Ji-Rong Wen, Beijing CN


Patent application number | Description | Published
20080215561 | SCORING RELEVANCE OF A DOCUMENT BASED ON IMAGE TEXT - A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification. | 09-04-2008
20080215563 | Pseudo-Anchor Text Extraction for Vertical Search - A search method uses pseudo-anchor text associated with search objects to improve search performance. The pseudo-anchor text may be extracted in combination with an identifier of the search objects (such as a pseudo-URL) from a digital corpus such as a collection of documents. Pseudo-anchor texts for each object are preferably extracted from candidate anchor blocks using a machine learning based approach. The pseudo-anchor texts are made available for searching and used to help rank the objects in a search result to improve search performance. The method may be used in vertical search of objects such as published articles, products and images that lack explicit URLs and anchor text information. | 09-04-2008
20080250009 | ASSESSING MOBILE READINESS OF A PAGE USING A TRAINED SCORER - A method and system for ranking pages of a search result based on the mobile readiness of the pages is provided. A mobile-readiness system receives an indication of pages that are to be ranked. The mobile-readiness system evaluates the mobile readiness for each of the pages. Mobile readiness indicates suitability of the page for a mobile device. The mobile-readiness system then ranks the pages based on the generated mobile readiness and some other criterion such as a relevance score or an importance score. The mobile-readiness system may train a classifier to classify pages based on their mobile readiness. | 10-09-2008
20080256068 | METHOD AND SYSTEM FOR CALCULATING IMPORTANCE OF A BLOCK WITHIN A DISPLAY PAGE - A method and system for identifying the importance of information areas of a display page is provided. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages. | 10-16-2008
20080313165 | SCALABLE MODEL-BASED PRODUCT MATCHING - Aspects of the subject matter described herein relate to matching product information to products. In aspects, a product matching component receives product information. The product matching component normalizes the product information and obtains keywords from the product information. By querying a database of recognized products, the keywords are used to obtain a list of products that potentially match the product information. A confidence level is assigned to each of the potential matches in the list. A match may be returned for the highest matched product or for a selectable number of products whose confidence level(s) exceed a selectable threshold. | 12-18-2008
20090012956 | Retrieval of Structured Documents - This disclosure relates to performing a query for a search term of a database containing a plurality of structured documents. Those structured documents that do not include the search term are ferreted or filtered out during an initial search. Matched structured documents, which are those structured documents that do contain the search term, are evaluated by ranking the individual elements based on how well each individual element matches the search term, and by indicating to the user the ranking of the individual elements, wherein the individual elements can be accessed by the user. | 01-08-2009
20090024607 | QUERY SELECTION FOR EFFECTIVELY LEARNING RANKING FUNCTIONS - A learning system for a search ranking function model may include a computer program that iteratively refines the model using new queries and associated documents from an unlabeled training set. The unlabeled training set may include a set of queries for which the associated documents have not been labeled as “relevant” or otherwise labeled. The new queries may be selected based on a similarity to and an accuracy of each neighbor from a labeled training set, such as a labeled validation set. Upon selection, the documents associated with the new queries may be labeled. The new queries and their associated documents may be accumulated into a labeled training set, and a refined model may be learned based on the augmented labeled training set. The model may be iteratively refined until it is determined that the model is adequate. | 01-22-2009
20100145956 | PSEUDO-ANCHOR TEXT EXTRACTION - A search method uses pseudo-anchor text associated with search objects to improve search performance. The pseudo-anchor text may be extracted in combination with an identifier of the search objects (such as a pseudo-URL) from a digital corpus such as a collection of documents. Pseudo-anchor texts for each object are preferably extracted from candidate anchor blocks using a machine learning based approach. The pseudo-anchor texts are made available for searching and used to help rank the objects in a search result to improve search performance. The method may be used in vertical search of objects such as published articles, products and images that lack explicit URLs and anchor text information. | 06-10-2010
20100281009 | HIERARCHICAL CONDITIONAL RANDOM FIELDS FOR WEB EXTRACTION - A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct. | 11-04-2010
20110078131 | EXPERIMENTAL WEB SEARCH SYSTEM - Described is the running of search-related experiments on a full (or partial) offline snapshot copy of the search engine documents of an actual production system. A snapshot experimentation subsystem runs experimental code related to web searches on the offline data. This includes running experimental index building code to build an experimental index (e.g., to test a new document feature), and/or running experimental search-related code, such as to rank search results according to experimental ranking code, to implement an experimental search strategy, and/or to generate experimental captions. | 03-31-2011
20110078132 | FLEXIBLE INDEXING AND RANKING FOR SEARCH - Described is a flexible framework for index building and document retrieval in a search environment that allows different search scenario applications to reuse index building and document retrieval code for non-scenario-specific functionality. Interfaces to various functionality of an index builder and retrieval engine are defined. An application calls the interfaces to specify custom code to perform a search scenario when needed, or use default code when non-scenario-specific functionality may be used. | 03-31-2011
20110078162 | WEB-SCALE ENTITY SUMMARIZATION - Described is summarizing a web entity (e.g., a person, place, product or so forth) based upon the entity's appearance in web documents (e.g., on the order of hundreds of millions or billions of webpages). Webpages are separated into blocks, which are then processed according to various features to filter the number of blocks to further process, and to rank the most relevant remaining blocks with respect to the entity. A redundancy removal mechanism removes redundant blocks, leaving a set of remaining blocks that are used to provide a summary of information that is relevant to the entity. | 03-31-2011
20110078554 | WEBPAGE ENTITY EXTRACTION THROUGH JOINT UNDERSTANDING OF PAGE STRUCTURES AND SENTENCES - Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity. | 03-31-2011
20110087660 | SCORING RELEVANCE OF A DOCUMENT BASED ON IMAGE TEXT - A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification. | 04-14-2011
20110137886 | Data-Centric Search Engine Architecture - Described is a data-centric web search engine technology/architecture, in which document metadata, including offline-extracted metadata, is used as part of a search indexing and ranking pipeline. A web data management component receives crawled documents and extracts document metadata from the documents. An indexing component uses the document metadata to build an index for the documents. A serving component uses the index and the document metadata to serve content, e.g., search results. Also described is the use of query metadata extracted from queries of a query log for use in the pipeline. | 06-09-2011
