Patent application number | Description | Published |
20080250008 | Query Specialization - A system, a method and computer-readable media for identifying and presenting potential query refinements for a user's search input. Documents are identified as being responsive to the search input. A query log is accessed to identify previously entered queries that also returned one or more of the identified documents. From these previously entered queries, a portion of the queries are selected as potential query refinements. Thereafter, the potential query refinements are displayed to the user. | 10-09-2008 |
20080319941 | METHOD AND APPARATUS FOR DOCUMENT CLUSTERING AND DOCUMENT SKETCHING - A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, that automatically classifies new documents into the right clusters, and that may change the number or parameters of clusters under various circumstances. A second embodiment of the invention provides a technique for comparing two documents, in which a fingerprint or sketch of each document is computed. In particular, this embodiment of the invention uses a specific algorithm to compute the document's fingerprint. One embodiment uses a sentence in the document as a logical delimiter or window from which significant words are extracted and, thereafter, a hash is computed of all pair-wise permutations. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and the inverse document frequency. | 12-25-2008 |
20090073888 | DETERMINING QUALITY OF COMMUNICATION - A method, computer-readable medium, and system for providing a quality measurement based on communications within a communication application. Communication attributes that include information associated with a user's communications are obtained. In embodiments, such communication attributes may pertain to communication duration and communication frequency. Upon obtaining communication attributes, a quality measurement may be determined based on the communication attributes. Such a quality measurement provides an indication of the quality of the user's communications. In embodiments, the quality measurement may be stored, communicated to a user, or implemented within a communication application. | 03-19-2009 |
20090089266 | METHOD OF FINDING CANDIDATE SUB-QUERIES FROM LONGER QUERIES - A method is disclosed for identifying queries stored in a log which are semantically related to an input query that may include a large number of terms. A set of one or more subsequences are generated for each query stored in the log, and these sets of subsequences are stored in a lookup table. A set of one or more subsequences are also generated for the input query. The subsequences in the lookup table and of the input query are generated by hashing of the respective query terms to a value between 0 and 1 using a known technique of min-hashing. The present system then constructs the subsequences of the query using the k-min hashes of the query, where k is an integer based on the number of terms in the query. | 04-02-2009 |
20090157643 | SEMI-SUPERVISED PART-OF-SPEECH TAGGING - Relevant search results for a given query may be determined using click data for the query and the number of times the query is issued to a search engine. The number of clicks that a result receives for the given query may provide a feedback mechanism to the search engine on how relevant the result is for the given query. The frequency of a query along with the associated clicks provides the search engine with the effectiveness of the query in producing relevant results. Edges in a graph of queries versus results may be weighted in accordance with the click data and the efficiency to rank the search results provided to a user. | 06-18-2009 |
20090157644 | EXTRACTING SIMILAR ENTITIES FROM LISTS / TABLES - Large numbers of lists of entities may be mined for similar entities to related searches. A representation for each list may be determined to provide for a comparison between lists and to support membership checks. A score for an element in a list may be computed that represents the validity of an item in the corpus of lists. Thus, a spurious element would receive a very low score, where a valid element would receive a higher score. A list weight is then computed using the constituent element weights, and the element and list weight are used to compute the nearest neighbors of a given query element. | 06-18-2009 |
20090313286 | GENERATING TRAINING DATA FROM CLICK LOGS - Data from a click log may be used to generate training data for a search engine. The pages clicked as well as the pages skipped by a user may be used to assess the relevance of a page to a query. Labels for training data may be generated based on data from the click log. The labels may pertain to the relevance of a page to a query. | 12-17-2009 |
20100153370 | SYSTEM OF RANKING SEARCH RESULTS BASED ON QUERY SPECIFIC POSITION BIAS - A model based on a generalization of the Examination Hypothesis is disclosed that states that for a given query, the user click probability on a document in a given position is proportional to the relevance of the document and a query specific position bias. Based on this model the relevance and position bias parameters are learned for different queries and documents. This is done by translating the model into a system of linear equations that can be solved to obtain the best fit relevance and position bias values. A cumulative analysis of the position bias curves may be performed for different queries to understand the nature of these curves for navigational and informational queries. In particular, the position bias parameter values may be computed for a large number of queries. Such an exercise reveals whether the query is informational or navigational. A method is also proposed to solve the problem of dealing with sparse click data by inferring the goodness of unclicked documents for a given query from the clicks associated with similar queries. | 06-17-2010 |
20100153388 | METHODS AND APPARATUS FOR RESULT DIVERSIFICATION - Methods, apparatus, and systems directed to receiving search queries, retrieving documents, computing the number of categories to present for a given query, computing the number of results to show in each category, computing an ordering of categories, and for all the result pages beyond the first page employing user interface elements that optionally allow the user to quickly zoom in on a specific category and get more results belonging to that category. | 06-17-2010 |
20100228731 | LARGE GRAPH MEASUREMENT - As provided herein, a pairwise distance between nodes in a large graph can be determined efficiently. URL-sketches are generated for respective nodes in an index by extracting labels from respective nodes, which provide a reference to a link between the nodes, aggregating the labels into sets for respective nodes, and storing the sets of labels as URL-sketches. Neighborhood-sketches are generated for the respective nodes in the index using the URL-sketches, by determining a neighborhood for a node and generating a sketch using labels that are associated with the respective neighboring nodes. A distance between two nodes is determined by computing an approximate number of paths and an approximate path length between the two nodes, using the neighborhood sketches for the two nodes. | 09-09-2010 |
20100281022 | ESTIMATING RANK ON GRAPH STREAMS - The rank of nodes in a graph may be inferred from a calculated probability that each node in the graph appears in a single random walk of the graph. Short random walks may be generated for each node in the graph. The generated random walks may be combined to form a longer single random walk. Multiple single random walks may be generated in this fashion. The probability of each node appearing in a single random may be calculated from the generated single random walks. The rank of each node may then be inferred from the calculated probabilities. | 11-04-2010 |
20110119269 | Concept Discovery in Search Logs - Described is a search (e.g., web search) technology in which concepts are returned in response to a query in addition to (or instead of) search results in the form of traditional links. Each concept generally corresponds to a set of links to content that are more directed towards a possible user intention, or information need, with respect to that query. If a user selects a concept, that concept's links are exposed to facilitate selection of a document the user finds relevant. In this manner, much more than the top ten ranked links may be provided for a query, each set of other links arranged by the concepts. Also described is processing a query log or other data store to optionally find related queries and find the concepts, e.g., by clustering a relationship graph built from the query log to find dense subgraphs representative of the concepts. | 05-19-2011 |
20110145226 | PRODUCT SIMILARITY MEASURE - Queries submitted by users looking for products and/or services are monitored and collected over a time period. Webpages corresponding to products and/or services bought by the users in response to submitting the queries are also monitored and collected over the time period. Attributes are extracted from the webpages and the queries, and the attributes are correlated to identify attributes that are similar to one another. The attributes are correlated to identify attributes that are not substitutable in a query. The identified attributes may be used to rank products and/or services that are responsive to a query based on attributes associated with the products and/or services, or to recommend alternative queries based on a submitted query by substituting one or more attributes of the query with similar attributes. | 06-16-2011 |
20110202846 | ESTIMATING SHORTEST DISTANCES IN GRAPHS - Sketches are generated for each node in a graph. For undirected graphs, each sketch for a node may include an indicator of a node from a seed set of nodes and the shortest distance between the node and the indicated node. When a request is received for the shortest distance between two nodes of the graph, the sketches for each of the two nodes are retrieved, and nodes that are indicated in both of the sketches are determined. The distances between each of the two nodes and a determined node as indicated in the sketches is summed for each of the determined nodes, and the sum having the least distance is selected as the estimated shortest distance between the two nodes. | 08-18-2011 |
20110258033 | EFFECTIVE AD PLACEMENT - Search results and ads that satisfy a query may be formatted into a search results page where an initial placement of ads may result in an estimated click through rate for the presented ads. An adjustment factor may be determined based on parameters associated with the search results, such as the clicks on the search results themselves. The adjustment factor may be applied to a particular ad to determine if an estimated click through rate of the particular ad will change with respect to a position when there are a first number of mainline ads and a second number of side bar ads. Mainline exclusivity may be appropriate for an ad to increase the click through rate of the ad. The increase may be determined in accordance with the adjustment factor to decide whether to present the ad with mainline exclusivity. | 10-20-2011 |
20110264639 | LEARNING DIVERSE RANKINGS OVER DOCUMENT COLLECTIONS - A document selector selects and ranks documents that are relevant to a query. The document selector executes an instance of a multi-armed bandits algorithm to select a document for each slot of a results page according to one or more strategies. The documents are selected in an order defined by the results page and documents selected for previous slots are used to guide the selection of a document for a current slot. If a document in a slot is subsequently selected, the strategy used to select the document is rewarded with positive feedback. When the uncertainty in an estimate of the utility of a strategy is less than the variation between documents associated with the strategy, the strategy is subdivided into multiple strategies. The document selector is able to “zoom in” on effective strategies and provide more relevant search results. | 10-27-2011 |
20110307517 | RELAXATION FOR STRUCTURED QUERIES - A structured query may specify attribute values for attributes. An estimate of the number of items that will match the structured query if it is applied to a structured database is determined. If the estimated number of items is below a threshold, the structured query may be relaxed to form new candidate structured queries. The number of candidate queries may be determined based on a desired running time. Each of the candidate structured queries may be determined by changing one or more attribute values of the attributes of the structured query. Estimates of the number of items each of the candidate structured queries will match is determined, and the candidate structured query that has the highest matching estimation is used to query the database. The matching results may be output. | 12-15-2011 |
20110314012 | DETERMINING QUERY INTENT - A tree structure has a node associated with each category of a hierarchy of item categories. Child nodes of the tree are associated with sub-categories of the categories associated with parent nodes. Training data including received queries and indicators of a selected item category for each received query is combined with the tree structure by associating each query with the node corresponding to the selected category of the query. When a query is received, a classifier is applied to the nodes to generate a probability that the query is intended to match an item of the category associated with the node. The classifier is applied until the probability is below a threshold. One or more categories associated with the nodes that are closest to the intent of the received query are selected and indicators of items of those categories that match the received query are output. | 12-22-2011 |
20120078927 | LARGE GRAPH MEASUREMENT - As provided herein, a pairwise distance between nodes in a large graph can be determined efficiently. URL-sketches are generated for respective nodes in an index by extracting labels from respective nodes, which provide a reference to a link between the nodes, aggregating the labels into sets for respective nodes, and storing the sets of labels as URL-sketches. Neighborhood-sketches are generated for the respective nodes in the index using the URL-sketches, by determining a neighborhood for a node and generating a sketch using labels that are associated with the respective neighboring nodes. A distance between two nodes is determined by computing an approximate number of paths and an approximate path length between the two nodes, using the neighborhood sketches for the two nodes. | 03-29-2012 |
20120089588 | SEARCH RESULT DIVERSIFICATION - Methods, apparatus, and systems directed to receiving search queries, retrieving documents, computing the number of categories to present for a given query, computing the number of results to show in each category, computing an ordering of categories, and for all the result pages beyond the first page employing user interface elements that optionally allow the user to quickly zoom in on a specific category and get more results belonging to that category. | 04-12-2012 |
20120124070 | RECOMMENDING QUERIES ACCORDING TO MAPPING OF QUERY COMMUNITIES - A set of queries, such as a search log, is divided into commercial queries and non-commercial queries. A first set of query communities is determined from the non-commercial queries and a second set is determined from the commercial queries. The query communities are correlated based on the users who submitted the queries and instances where a query from the first set of query communities was followed by a query from the second set to generate a mapping between the first set of query communities and the second set. Later, a non-commercial query is received from a user, and the mapping is used to predict one or more commercial queries that the user is likely to submit in the future based on the non-commercial query. One or more of the commercial queries are presented to the user according to the mapping with search results responsive to the non-commercial query. | 05-17-2012 |
20120226661 | INDEXING FOR LIMITED SEARCH SERVER AVAILABILITY - Documents are replicated among servers comprising a search engine based on the value of each document by approximating its value as one of the top search results for one or more exemplary queries. Documents are allocated among servers comprising a search engine by calculating a relevance value for each document and then distributing the documents evenly to the servers. A subset of servers are selected from among a plurality of servers comprising a search engine using term-based, server-specific histograms reflecting the number of instances of the term in each document allocated to each server, and then selecting servers to service a query based on the documents on those servers. | 09-06-2012 |
20120226679 | FULFILLING QUERIES USING SPECIFIED AND UNSPECIFIED ATTRIBUTES - A query is received and processed to determine one or more specified and unspecified attributes in the query. The specified and unspecified attributes may correspond to attributes of one or more items. A graph is generated for the items and includes a node for each item and an edge between each unique pair of nodes. Each node is assigned a cost based on a distance between the specified attributes of the query and the attributes of the item associated with the node. Each edge is assigned a weight based on a distance between the unspecified attributes associated with the nodes of the node pair corresponding to the edge. A set of nodes from the graph is selected by minimizing a total cost of the nodes while maximizing a dispersion of the nodes based on the edge weights. | 09-06-2012 |
20120276992 | MOST VALUABLE PLAYER GAMER STATUS AND APPLICATIONS - A gaming environment is provided by an MVP gaming system provider in which “most valuable player” (MVP) gamers may compete. MVP gamers may be identified using achievements, gamer scores, game play during sanctioned gaming events, or other indicia of game skills. The MVP gamers may be sponsored by advertisers, and the MVP gamers' avatars may be branded based on sponsorship during gaming events. The sponsorships may be brokered by the MVP gaming system provider. Some gaming events may be sanctioned gaming events that are coordinated by the MVP gaming system provider and “televised” to allow viewers to watch the gaming events. Tutorials from MVP gamers may also be provided to gamers for viewing to assist in their game play. Further, gamers may be able to rent the avatars of MVP gamers for use during their game play. | 11-01-2012 |
20120277004 | BRINGING ACHIEVEMENTS TO AN OFFLINE WORLD - An achievement system tracks users' offline activities and awards achievements to users for participation in particular offline activities. The achievements that are awarded for particular activities and/or to particular users may be sponsored by merchants, who may compensate an achievement system provider for the opportunity to sponsor the achievements. To award users achievements, the users' offline activities are tracked. When a user participates in an offline activity for which achievements are available, the user is awarded an achievement. The achievement may be stored in an achievement profile for the user. In some embodiments, achievements earned by users may be converted into other benefits and alternative awards. | 11-01-2012 |
20120278154 | MARKETING INVENTORY BASED ON SPOILAGE - An inventory marketing system operates to identify and market inventory items that are likely to spoil. Initially, inventory items that have a particular likelihood of spoilage may be identified. Customers to target with offers for the inventory items may be identified based on the customers' current location or expected location near a spoilage time for the inventory items. Offers for the inventory items may be provided to the targeted customers, and purchases of the inventory items by customers may be facilitated. | 11-01-2012 |
20130226713 | BID DISCOUNTING USING EXTERNALITIES - Advertisers provide bids for the placement of one or more advertisements on web pages, along with a set of externalities. The externalities from an advertiser indicate discounts that can be applied to their bid based on information learned from a browse history of a user. The externalities may include externalities based on previous viewings of advertisements from the advertiser by the user, externalities based on previous viewings of advertisements from competitor advertisers viewed by the user, and externalities based on web pages or domains that have been visited by the user. When a request for an advertisement is received, the externalities are used to discount the provided bids based on the browse history of the user. An advertisement is then selected based on the discounted bids. | 08-29-2013 |
20140258303 | REFORMULATING QUERY TERMS IN STRUCTURED SEARCH - Search history data such as browse trails are collected over time. The browse trails, including associated queries and domains, are processed to identify free tokens of the queries that are also modifiers. Attribute value pairs of a structured data source that correspond to the modifiers are determined based on the search history data and a frequency of the attribute value pairs in the structured data source. When a subsequent query is received, modifiers in the query are identified and replaced with the determined combinations of attribute value pairs that correspond to the modifiers in a structured query that is generated from the received query. The structured query is used to identify items and/or services in the structured data source that are responsive to the received query. | 09-11-2014 |
20140278945 | ONLINE ALLOCATION WITH MINIMUM TARGETS - Various technologies described herein pertain to allocating requests based on revenue targets of providers for an online service. Information that indicates revenue budgets of providers for the online service and revenue targets of the providers for the online service can be received. The revenue budgets set maximums for total revenues from the providers and the revenue targets set minimums for the total revenues from the providers. Moreover, a request allocable to one of the providers having a total revenue generated thereby constrained by a corresponding revenue budget can be received. Further, bid values of the providers corresponding to the request can be received. An output of an algorithm can be computed based at least in part upon the bid values of the providers and the revenue targets of the providers. The request can be allocated to a selected provider from the providers based upon the output of the algorithm. | 09-18-2014 |