Benoit Dumoulin, Palo Alto, CA US

Patent application number | Description | Published
20090259629 | ABBREVIATION HANDLING IN WEB SEARCH - A method for handling abbreviations in web queries includes building a dictionary of a plurality of possible word expansions for a plurality of potential abbreviations related to query terms received or anticipated to be received by a search engine; accepting a query including an abbreviation; expanding the abbreviation into one of the plurality of word expansions if a probability that the expansion is correct is above a threshold value, wherein the probability is determined by taking into consideration a context of the abbreviation within the query, the context including at least anchor text; and sending the query with the expanded abbreviation to the search engine to generate a search results page related to the query. | 10-15-2009
20090259643 | NORMALIZING QUERY WORDS IN WEB SEARCH - A method for normalizing query words in web search includes populating a dictionary with join and split candidates and the corresponding joined and split words from an aggregate of query logs; determining a confidence score for the join and split candidates, the candidates with the highest confidence scores being characterized in the dictionary as must-join and must-split, respectively; accepting queries with words amenable to being split or joined, or amenable to the addition or deletion of a hyphen or an apostrophe; generating, based on the accepted queries, split candidates from the dictionary and join, hyphen, and apostrophe candidates algorithmically; submitting to a search engine the generated candidates characterized in the dictionary as must-join or must-split, to improve the search results returned in response to the queries; and applying a language dictionary to rank the generated candidates not characterized as must-join or must-split, and submitting the highest-ranked candidates to the search engine. | 10-15-2009
20100036784 | SYSTEMS AND METHODS FOR FINDING HIGH QUALITY CONTENT IN SOCIAL MEDIA - The present invention is directed towards systems and methods for identifying high quality content in a social media environment. The method according to one embodiment of the present invention comprises retrieving a content item and retrieving a plurality of quality features associated with said content item wherein said quality features comprise intrinsic, usage and relationship features. The method then performs an analysis of said content item against said quality features and generates a quality score based on said analysis. | 02-11-2010
20100094835 | AUTOMATIC QUERY CONCEPTS IDENTIFICATION AND DRIFTING FOR WEB SEARCH - Techniques are described for automatically determining which terms in a search query may be augmented by contextually similar terms such that more relevant results can be displayed to a user. Contextually similar words are determined based on training data, including a web corpus and a query log. Once contextually similar words are determined, they may be inserted into a search query and used to find more relevant results. Consequently, documents that contain helpful information but may not have exact word matches may be found more readily by a search engine. | 04-15-2010
20100114878 | SELECTIVE TERM WEIGHTING FOR WEB SEARCH BASED ON AUTOMATIC SEMANTIC PARSING - A method is provided for selecting relevant documents returned from a search query. When a search engine finds search terms in documents, the document score is based on the frequency of the occurrence of those terms, the category of the term, and the section of the document in which the term is found. Each (category type, document section) pair is assigned a weight that is used to modify the contribution of term frequency. The weights are determined in an offline process using historical data and human validation. Through this empirical process, the weight assignments are made to correlate high relevance scores with documents that humans would find relevant to a search query. | 05-06-2010
20100185623 | TOPICAL RANKING IN INFORMATION RETRIEVAL - An aggregate ranking model is generated, which comprises a general ranking model and one or more topical ranking models. Each topical ranking model is associated with a topic, or topic class, and is used to rank search result items determined to belong to that topic or topic class. As one example, the topical ranking model is trained using a set of topical training data (e.g., training data determined to belong to the topic or topic class), a general ranking model, and a residue, or error, determined from the general ranking that the general ranking model generates for the topical training data; the topical ranking model is trained to minimize the general ranking model's error in the aggregate ranking model. | 07-22-2010
20100191740 | SYSTEM AND METHOD FOR RANKING WEB SEARCHES WITH QUANTIFIED SEMANTIC FEATURES - A system and method for ranking web searches with quantified semantic features. A query for a web search is received from a user. The query is segmented and tagged into one or more linguistic segments using linguistic analysis. At least some of the linguistic segments are tagged with a linguistic type. A query execution plan is generated comprising the linguistic segments and, for each of the linguistic segments tagged with a linguistic type, at least one tag attribute comprising at least one domain specific feature of the linguistic type. A search is performed for documents matching the query. Each of the documents is scored for each of the linguistic segments of the query execution plan using the tag attributes of the respective linguistic segment. The documents are ranked using a function that uses the scores of the documents. A ranked list of the documents is transmitted back to the user. | 07-29-2010
20100191758 | SYSTEM AND METHOD FOR IMPROVED SEARCH RELEVANCE USING PROXIMITY BOOSTING - A system and method for improved search relevance using proximity boosting. A query for a web search is received from a user via a network, wherein the query comprises a plurality of query tokens. One or more concepts are identified in the query, wherein each concept comprises at least two query tokens. A relative concept strength is determined for each of the identified concepts. The query is then rewritten for submission to a search engine: for each concept, a syntax rule associated with that concept's relative strength is applied to the query tokens that make up the concept, so that the rewritten query represents the concepts and the proximity of those concepts is boosted in the search results that the search engine returns to the user in response to the rewritten query. | 07-29-2010
20100312778 | PREDICTIVE PERSON NAME VARIANTS FOR WEB SEARCH - Techniques are provided for determining when to rewrite a search query that includes a person's name, and which name variant candidates to use, in order to provide the most relevant search results. A determination is made whether a person name is present in a search query request entered by a user. Name variant candidates are generated for each person name. Then, the name variant candidates are ranked for each person name based upon one or more models that calculate a probability value for each name variant candidate. Based upon these rankings, the query may be rewritten to include the original person name and a specified number of top-ranked name variant candidates to present the user with the most relevant search results. | 12-09-2010
20110010353 | ABBREVIATION HANDLING IN WEB SEARCH - A method for handling abbreviations in web queries includes building a dictionary of possible word expansions for potential abbreviations related to query terms received and anticipated to be received by a search engine; accepting a query including an abbreviation from a searching user, where a probability of finding a most probably-correct expansion in the dictionary is a first probability, and a probability that the expansion is the abbreviation itself is a second probability; determining a ratio between the first and second probabilities; expanding the abbreviation in accordance with the most probably-correct expansion when the ratio is above a first threshold value; and highlighting the abbreviation with a suggested expansion of the most probably-correct expansion for the user so that the user may accept the suggested expansion when the ratio is between a second, lower threshold value and the first threshold value. | 01-13-2011
20110072021 | Semantic and Text Matching Techniques for Network Search - In one embodiment, access a search query comprising one or more query words, at least one of the query words representing one or more query concepts; access a network document identified for a search query by a search engine, the network document comprising one or more document words, at least one of the document words representing one or more document concepts; semantic-text match the search query and the network document to determine one or more negative semantic-text matches; and construct one or more negative features based on the negative semantic-text matches. | 03-24-2011
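
The two-threshold decision in application 20110010353 (which refines the single-threshold expansion of 20090259629) can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the threshold values, the Decision type, and the handle_abbreviation function are hypothetical, and the per-reading probabilities are assumed to be supplied already (for example, estimated from query context).

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical values: the abstract says only that the first (expansion)
# threshold is higher than the second (suggestion) threshold.
EXPAND_THRESHOLD = 10.0
SUGGEST_THRESHOLD = 2.0


@dataclass(frozen=True)
class Decision:
    action: str               # "expand", "suggest", or "keep"
    expansion: Optional[str]  # the chosen expansion, if any


def handle_abbreviation(abbrev: str,
                        readings: Dict[str, float],
                        t_high: float = EXPAND_THRESHOLD,
                        t_low: float = SUGGEST_THRESHOLD) -> Decision:
    """Decide what to do with `abbrev` given a probability per reading.

    `readings` maps every candidate reading, including the abbreviation
    itself, to a probability (assumed to be precomputed).
    """
    p_self = readings.get(abbrev, 0.0)  # second probability
    expansions = {r: p for r, p in readings.items() if r != abbrev}
    if not expansions:
        return Decision("keep", None)
    best, p_best = max(expansions.items(), key=lambda kv: kv[1])  # first probability
    ratio = p_best / p_self if p_self > 0 else float("inf")
    if ratio > t_high:
        return Decision("expand", best)    # rewrite the query outright
    if ratio > t_low:
        return Decision("suggest", best)   # highlight; the user may accept
    return Decision("keep", None)          # leave the abbreviation alone


print(handle_abbreviation("nyc", {"nyc": 0.02, "new york city": 0.95}))
# Decision(action='expand', expansion='new york city')
print(handle_abbreviation("sf", {"sf": 0.1, "san francisco": 0.85, "science fiction": 0.05}))
# Decision(action='suggest', expansion='san francisco')
print(handle_abbreviation("ml", {"ml": 0.4, "machine learning": 0.5}))
# Decision(action='keep', expansion=None)
```

Treating the abbreviation itself as one of the readings is what makes the ratio meaningful: a term that users often mean literally keeps a high second probability and is left alone.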
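
The normalization flow in application 20090259643 (dictionary must-join/must-split decisions first, language-dictionary ranking for everything else) might look roughly like this. The dictionaries and word counts are hypothetical stand-ins for data mined from query logs, the average-log-count score is an assumed ranking function, and apostrophe candidates are omitted for brevity.

```python
import math
from typing import Dict, Iterator, List, Tuple

# Hypothetical dictionaries; the application mines these from query logs.
MUST_JOIN: Dict[Tuple[str, str], str] = {("face", "book"): "facebook"}
MUST_SPLIT: Dict[str, List[str]] = {"newyork": ["new", "york"]}
# Hypothetical word counts standing in for the "language dictionary".
LANGUAGE_FREQ: Dict[str, int] = {
    "website": 5000, "web": 800, "site": 700, "builder": 300,
    "facebook": 9000, "login": 4000, "new": 9000, "york": 3000, "hotels": 2500,
}


def apply_must_rules(tokens: List[str], must_join, must_split) -> List[str]:
    """Apply the high-confidence join/split decisions stored in the dictionary."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in must_join:
            out.append(must_join[pair])
            i += 2
        elif tokens[i] in must_split:
            out.extend(must_split[tokens[i]])
            i += 1
        else:
            out.append(tokens[i])
            i += 1
    return out


def join_candidates(tokens: List[str]) -> Iterator[List[str]]:
    """Algorithmically join or hyphenate each adjacent token pair."""
    for i in range(len(tokens) - 1):
        a, b = tokens[i], tokens[i + 1]
        for joined in (a + b, f"{a}-{b}"):
            yield tokens[:i] + [joined] + tokens[i + 2:]


def lm_score(tokens: List[str], freq: Dict[str, int]) -> float:
    # Average log count per word; unseen words contribute log1p(0) = 0.
    return sum(math.log1p(freq.get(t, 0)) for t in tokens) / len(tokens)


def normalize_query(query: str, must_join=MUST_JOIN, must_split=MUST_SPLIT,
                    freq=LANGUAGE_FREQ) -> str:
    tokens = query.split()
    if not tokens:
        return ""
    forced = apply_must_rules(tokens, must_join, must_split)
    if forced != tokens:  # a must-join/must-split rule fired
        return " ".join(forced)
    candidates = [tokens] + list(join_candidates(tokens))
    return " ".join(max(candidates, key=lambda c: lm_score(c, freq)))


print(normalize_query("face book login"))   # facebook login
print(normalize_query("newyork hotels"))    # new york hotels
print(normalize_query("web site builder"))  # website builder
```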
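
The scoring rule in application 20100114878, where each (category, section) pair scales the contribution of term frequency, reduces to a weighted sum. In the sketch below the weight table and the term categories are hypothetical; the abstract says the real weights are tuned offline with historical data and human validation.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical weights per (term category, document section) pair.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("person", "title"): 3.0,
    ("person", "body"): 1.0,
    ("generic", "title"): 1.5,
    ("generic", "body"): 0.5,
}


def document_score(query_terms: Dict[str, str],
                   doc_sections: Dict[str, List[str]],
                   weights: Dict[Tuple[str, str], float] = WEIGHTS,
                   default_weight: float = 1.0) -> float:
    """Sum of term frequencies, each scaled by its (category, section) weight.

    query_terms maps each query term to its category (e.g. from a semantic
    parser); doc_sections maps each section name to its tokens.
    """
    score = 0.0
    for section, tokens in doc_sections.items():
        tf = Counter(tokens)  # missing terms count as 0
        for term, category in query_terms.items():
            score += weights.get((category, section), default_weight) * tf[term]
    return score


query = {"einstein": "person", "biography": "generic"}
doc = {
    "title": ["einstein", "biography"],
    "body": ["a", "short", "biography", "of", "einstein", "biography"],
}
print(document_score(query, doc))  # 3.0 + 1.5 + 1.0 + 2 * 0.5 = 6.5
```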
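
The residue-based training in application 20100185623 can be illustrated with a deliberately tiny stand-in: a fixed general model over one feature, and a topical model fitted by ordinary least squares to the general model's errors on topical data. Real ranking models would be learned rankers over many features; the one-feature linear models, general_model, and the training data here are all hypothetical.

```python
from typing import Callable, List

Model = Callable[[float], float]


def general_model(x: float) -> float:
    # Hypothetical pre-trained general ranker over a single relevance feature.
    return 2.0 * x


def fit_topical_model(xs: List[float], ys: List[float], general: Model) -> Model:
    """Fit r = a*x + b by least squares to the residues y - general(x)."""
    residues = [y - general(x) for x, y in zip(xs, ys)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residues) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residues))
    a = cov / var_x if var_x else 0.0
    b = mean_r - a * mean_x
    return lambda x: a * x + b


def aggregate_model(general: Model, topical: Model) -> Model:
    """Aggregate score = general score + topical correction of its error."""
    return lambda x: general(x) + topical(x)


# Hypothetical topical training data whose true relevance is 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 4.0, 7.0, 10.0]
topical = fit_topical_model(xs, ys, general_model)
model = aggregate_model(general_model, topical)
print(model(4.0))  # 13.0
```

Because the topical model only learns what the general model gets wrong on in-topic data, it can stay small while the general model continues to serve all other queries.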
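
The rewrite step in application 20100191758 maps each concept's relative strength to a syntax rule. The sketch assumes concept identification and strength estimation have already happened; the strength cut-offs, the quoted-phrase rule, and the NEAR/3 operator are hypothetical choices, not syntax taken from the application or any particular engine.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def apply_syntax_rule(concept_tokens: List[str], strength: float) -> str:
    """Map a concept's relative strength to a (hypothetical) query syntax."""
    if strength >= 0.8:
        return '"' + " ".join(concept_tokens) + '"'         # strong: exact phrase
    if strength >= 0.5:
        return "(" + " NEAR/3 ".join(concept_tokens) + ")"  # medium: proximity
    return " ".join(concept_tokens)                         # weak: no constraint


def rewrite_query(tokens: List[str], concepts: List[Span],
                  strengths: List[float]) -> str:
    """Rewrite a tokenized query, applying one syntax rule per concept.

    Concepts are assumed to be non-overlapping and to span at least two tokens.
    """
    by_start: Dict[int, Tuple[int, float]] = {
        start: (end, s) for (start, end), s in zip(concepts, strengths)
    }
    out, i = [], 0
    while i < len(tokens):
        if i in by_start:
            end, s = by_start[i]
            out.append(apply_syntax_rule(tokens[i:end], s))
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


tokens = "cheap new york hotels near central park".split()
print(rewrite_query(tokens, [(1, 3), (5, 7)], [0.9, 0.6]))
# cheap "new york" hotels near (central NEAR/3 park)
```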
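
The final rewrite in application 20100312778 (keep the original name plus a fixed number of top-ranked variants) can be sketched as below. Name detection and the probability models are assumed to have run already; the OR/quote syntax, the min_prob cut-off used to decide *when* to rewrite, and the example variants are hypothetical.

```python
from typing import Dict


def rewrite_with_name_variants(query: str, name: str,
                               variant_probs: Dict[str, float],
                               k: int = 2, min_prob: float = 0.2) -> str:
    """Replace `name` in `query` with an OR-group of the name and its top-k variants.

    variant_probs holds a model probability per candidate variant; variants
    below min_prob are dropped, and the query is left unchanged if none remain.
    """
    if name not in query:
        return query
    ranked = sorted(variant_probs.items(), key=lambda kv: kv[1], reverse=True)
    top = [v for v, p in ranked[:k] if p >= min_prob]
    if not top:
        return query
    group = " OR ".join(f'"{n}"' for n in [name] + top)
    return query.replace(name, f"({group})", 1)


print(rewrite_with_name_variants(
    "bob smith biography", "bob smith",
    {"robert smith": 0.8, "bobby smith": 0.4, "rob smith": 0.1}))
# ("bob smith" OR "robert smith" OR "bobby smith") biography
```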
