| Patent application number | Description | Published |
| 20090204889 | ADAPTIVE SAMPLING OF WEB PAGES FOR EXTRACTION - Techniques are provided for improving the recall rate of an information extraction system by automatically selecting pages to surface to a user for annotation based on variation data. Techniques are provided for generating the variation data during the construction of the template that is to be used for extraction. During template construction, data is stored to indicate which template-construction pages saw or made changes to nodes in the template. After interesting nodes have been identified in the template, the data stored during template construction is used to determine which pages made changes to interesting-variation nodes. Techniques are also provided for generating the variation data during the extraction phase, when the template is being used to extract information from pages. During the extraction phase, variation data is generated in response to detecting that extraction for a given page resulted in one or more empty attributes. | 08-13-2009 |
| 20090216708 | STRUCTURAL CLUSTERING AND TEMPLATE IDENTIFICATION FOR ELECTRONIC DOCUMENTS - Subject matter disclosed herein may relate to clustering electronic documents, such as, for example, web pages, and may also relate to template identification for electronic documents. | 08-27-2009 |
| 20090248707 | SITE-SPECIFIC INFORMATION-TYPE DETECTION METHODS AND SYSTEMS - Methods and systems are provided herein that may allow for pertinent information-type(s) of data to be located or otherwise identified within one or more documents, such as, for example, web page documents associated with one or more websites. For example, exemplary methods and systems are provided that may be used to determine if information may be more likely to be of an “informative” type of information or possibly more likely to be of a “noise” type of information. | 10-01-2009 |
| 20100228738 | ADAPTIVE DOCUMENT SAMPLING FOR INFORMATION EXTRACTION - A method and apparatus for improved sampling of documents for training sets input to information extraction systems is provided, which improves the recall and robustness of wrapper extraction. A passive sampling technique provides a list of documents to present for human annotation, ordered by the representativeness of each document based on structural and content statistics. Thus, the document that has the most interesting attributes and is most representative of the cluster of structurally similar documents to which it belongs is presented for annotation first. The problem is mapped to the classical ‘Set-Cover’ problem and solved using a greedy approach. An active sampling technique refines and reorders the sample list produced by the passive sampling technique after initial annotations, based on the human annotation, spatial boundaries of the documents, and structural and content statistics. The proposed techniques work at a site level and perform page-level structural analysis using XPath-term frequency, XPath-document frequency, and XPath-importance. | 09-09-2010 |
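
The greedy Set-Cover approach mentioned in application 20100228738 can be illustrated with a minimal sketch. This is the textbook greedy heuristic, not the patented method: the function name `greedy_set_cover`, the page names, and the feature labels are hypothetical, and each page is assumed to be reduced to the set of structural features it exhibits. At every step the page that covers the most still-uncovered features is picked, which yields an annotation order in which the most representative pages come first.

```python
def greedy_set_cover(universe, candidates):
    """Textbook greedy Set-Cover heuristic (illustrative only).

    universe   -- iterable of features that should be covered
    candidates -- dict mapping a page name to the set of features it has
    Returns (order, uncovered): the pages in pick order and any features
    that no page could cover.
    """
    uncovered = set(universe)
    remaining = dict(candidates)
    order = []
    while uncovered and remaining:
        # Pick the page with the largest gain; sorting the names first
        # makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=lambda name: len(remaining[name] & uncovered))
        gain = remaining[best] & uncovered
        if not gain:
            break  # no remaining page adds anything new
        order.append(best)
        uncovered -= gain
        del remaining[best]
    return order, uncovered


# Hypothetical pages, each described by the features observed on it.
pages = {
    "page_a": {"title", "price"},
    "page_b": {"title", "price", "rating", "image"},
    "page_c": {"rating", "reviews"},
    "page_d": {"image"},
}
universe = set().union(*pages.values())

order, uncovered = greedy_set_cover(universe, pages)
print(order)      # pages in suggested annotation order
print(uncovered)  # features left uncovered, if any
```

In this toy input, `page_b` is chosen first because it alone covers four of the five features, and `page_c` follows because it supplies the one feature still missing. The greedy heuristic does not guarantee a minimum cover, but it is the standard approximation for this problem.
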
| Patent application number | Description | Published |
| 20100155460 | Ventilation Board, Ventilation Box, Ventilation System, Insulating Board and Method for Manufacturing Ventilation Board and Box - The present invention relates to a ventilation board, an insulating board, a ventilation system, and articles and architectural applications comprising said ventilation board and a method of manufacturing the ventilation board. The said ventilation board comprises a layer ( | 06-24-2010 |
| 20100193312 | FOLDABLE AND/OR DISPOSABLE LUGGAGE - The present invention provides luggage that is mainly foldable and/or disposable. Preferably, the present invention provides said foldable and/or disposable luggage made of corrugated board and having wheels adapted detachably to the luggage, which makes the luggage easy to handle, store, and stack. Said foldable and/or disposable luggage comprises two main panels, namely a top main panel and a bottom main panel, a plurality of side panels formed by folding at least one scored die-cut flat blank, an extended panel, and a ply having wheels aligning with the holes of said panel. | 08-05-2010 |