Patent application number | Description | Published |
20130061132 | SYSTEM AND METHOD FOR WEB PAGE SEGMENTATION USING ADAPTIVE THRESHOLD COMPUTATION - A system and method for adaptive-threshold Web page segmentation are disclosed. In one embodiment, a method performed by a physical computing system having one or more processors for segmenting a Web page including a plurality of nodes includes parsing content in the Web page into the plurality of nodes using the physical computing system, obtaining feature values between each pair of nodes using the physical computing system, estimating an adaptive threshold value from the obtained feature values using the physical computing system, and segmenting the Web page by comparing the feature values associated with each pair of nodes with the estimated adaptive threshold value. | 03-07-2013 |
20130091150 | DETERMINING SIMILARITY BETWEEN ELEMENTS OF AN ELECTRONIC DOCUMENT - Disclosed is a computer-implemented method of determining similarity between first and second elements of an electronic document. The method uses a computer to calculate a plurality of measures of similarity between the first and second elements in at least two representations of the electronic document. A computer program product and system implementing this method are also disclosed. | 04-11-2013 |
20130124684 | VISUAL SEPARATOR DETECTION IN WEB PAGES USING CODE ANALYSIS - A method for detection of visual separators in web pages using code analysis includes receiving a web page and its associated web code by a web page analysis device and analyzing the web code to detect visual separators in the web page. A web page analysis device for visual separator detection in web pages is also provided. | 05-16-2013 |
20130124953 | PRODUCING WEB PAGE CONTENT - A method for producing web page content includes identifying blocks within a web page. The blocks are selectively assembled into sections. The sections are selectively assembled into article candidates. An article candidate that includes article content is distinguished from article candidates that do not include article content. Content is produced only from the article candidate distinguished as including article content. | 05-16-2013 |
20130145255 | SYSTEMS AND METHODS FOR FILTERING WEB PAGE CONTENTS - A system and method for selectively filtering web page contents are disclosed. In one example embodiment a document object model (DOM) structure and visual information of the web page contents are generated. The document object model (DOM) structure and the visual information are analyzed to determine multiple web page content attributes. One or more filtering parameters are selected from the multiple web page content attributes. The web page is filtered based on the one or more filtering parameters. | 06-06-2013 |
20130159209 | PRODUCT INFORMATION - Disclosed is a method of generating a model representation of product information. The method obtains a list of products from a source of product information. A hierarchical tree is then constructed from the obtained list of products, wherein each hierarchical layer of the tree corresponds to a different category of product information. | 06-20-2013 |
20130159889 | Obtaining Rendering Co-ordinates Of Visible Text Elements - A computer-implemented method for obtaining the rendering co-ordinates of visible text elements on a web page is disclosed. The web page is represented by an input data structure comprising a plurality of text nodes, each of which represents a text element on the web page. The method comprises the following steps: | 06-20-2013 |
20130204835 | METHOD OF EXTRACTING NAMED ENTITY - Presented is a method of extracting named entities from a large-scale document corpus. The method includes identifying named entities in the corpus and forming a set of seed entities manually or automatically using some existing resources, constructing a named entity graph to discover same-type probability between any given pair of named entities, expanding the set of seed entities and performing a confidence propagation of the seed entities on the named entity graph. | 08-08-2013 |
20130204867 | Selection of Main Content in Web Pages - A system and method for selecting main content ( | 08-08-2013 |
20130212498 | Selecting Content Within a Web Page - A system and method of selecting content within a web page ( | 08-15-2013 |
20130238607 | SEED SET EXPANSION - Systems and methods for seed set expansion are provided. A context-based extractor ( | 09-12-2013 |
20130275854 | Segmenting a Web Page into Coherent Functional Blocks - Segmenting a web page ( | 10-17-2013 |
20130283148 | Extraction of Content from a Web Page - A system and method are provided for extracting main content from a web page. Web page segmentation is performed on a web page to provide affinity-grouped segments. Descriptive features of at least one of the affinity-grouped segments are computed. At least one of the affinity-grouped segments is classified as a main body segment based on the computed descriptive features. Additional affinity-grouped segments are classified as to a document function based on the computed descriptive features. Classified affinity-grouped segments are assembled according to their classified document functions to provide the main content. | 10-24-2013 |
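
The adaptive-threshold segmentation summarized in entry 20130061132 can be illustrated with a minimal sketch. This is not the method claimed in the application: the abstract does not say how the threshold is estimated or which pairs of nodes are compared. The sketch assumes that feature values are distances between adjacent nodes only, and that the threshold is the mean of those distances plus `k` standard deviations. The names `adaptive_threshold`, `segment`, and `k` are hypothetical.

```python
from statistics import mean, pstdev


def adaptive_threshold(distances, k=1.0):
    """Assumed estimator: mean of the distances plus k population std devs."""
    return mean(distances) + k * pstdev(distances)


def segment(nodes, distances, k=1.0):
    """Group nodes into segments.

    distances[i] is an assumed feature value between nodes[i] and nodes[i + 1].
    A new segment starts wherever that value exceeds the threshold estimated
    from the page's own distances.
    """
    if not nodes:
        return []
    if len(distances) != len(nodes) - 1:
        raise ValueError("need exactly one distance per adjacent node pair")
    if not distances:
        # A single node forms a single segment; there is nothing to compare.
        return [list(nodes)]

    threshold = adaptive_threshold(distances, k)
    segments = [[nodes[0]]]
    for node, dist in zip(nodes[1:], distances):
        if dist > threshold:
            segments.append([node])    # large gap: start a new segment
        else:
            segments[-1].append(node)  # small gap: extend the current segment
    return segments


# Illustrative data only: node labels and distances are made up.
nodes = ["nav1", "nav2", "p1", "p2", "p3", "footer"]
distances = [1, 9, 1, 2, 8]
result = segment(nodes, distances)
```

Because the threshold is derived from the page's own feature values rather than fixed in advance, the same code adapts to pages with tightly or loosely spaced content. A different estimator could be swapped into `adaptive_threshold` without changing `segment`.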