Patent application number | Description | Published |
20080270549 | Extracting link spam using random walks and spam seeds - Architecture for extracting link spam communities when given one or more members of the community. A link spam extraction algorithm is provided that takes as input link spam seeds and extracts other nearby link spam through a biased local random walk around the seed(s). The seed set is provided by a user (or an automated algorithm scrubbed by a human) which the algorithm uses to simulate a random walk on a web graph. The random walk can be biased to explore a local neighborhood around the seed set through use of decay probabilities. Truncation can be used to retain only the most frequently visited nodes. After termination, the nodes are sorted in decreasing order of final probabilities and presented to the user. Human judges need only make decisions at the spam community level, thereby limiting involvement, and human input can be scaled by several orders of magnitude. | 10-30-2008 |
20080307189 | Data partitioning via bucketing bloom filters - Multiple Bloom filters are generated to partition data between first and second disjoint data sets of elements. Each element in the first data set is assigned to a bucket of a first set of buckets, and each element in the second data set is assigned to a bucket of a second set of buckets. A Bloom filter is generated for each bucket of the first set of buckets. The Bloom filter generated for a bucket indicates that each element assigned to that bucket is part of the first data set, and that each element assigned to a corresponding bucket of the second set of buckets is not part of the first data set. Additionally, a Bloom filter corresponding to a subsequently received element can be determined and used to identify whether that subsequently received element is part of the first data set or the second data set. | 12-11-2008 |
20090248657 | WEB SEARCHING - Mislabeled URLs are identified and corrected based upon a click relevance ranking computed from user data comprising user click information. The click relevance ranking is formed by applying a set of relevance ordering rules to user log data aggregated by query and URL and by mapping the results of the relevance ordering rules into a linear ordering. For a given query, the aggregated user log data comprises a relative total number of impression, a relative total number of clicks received and a rank associated with the query/URL pair at the time of the total number of impressions and total number of clicks received. The click relevance ranking is used to identify and correct mislabeled query/URL pairs of other rankings according to a number of disclosed methods. | 10-01-2009 |
20090265317 | CLASSIFYING SEARCH QUERY TRAFFIC - A method for classifying search query traffic can involve receiving a plurality of labeled sample search query traffic and generating a feature set partitioned into human physical limit features and query stream behavioral features. A model can be generated using the plurality of labeled sample search query traffic and the feature set. Search query traffic can be received and the model can be utilized to classify the received search query traffic as generated by a human or automatically generated. | 10-22-2009 |
20110016116 | WEB SEARCHING - A human or hand-labeled ranking of URL results for a search query is compared against actual click data for the respective query/URL pairs (e.g., which URLs were actually clicked on by users when the URLs were presented to users when the search query was run in the real world). The human ranking or ordering of the URL results (e.g., pre-existing relevance ranking) for the query can then be adjusted, if necessary, based upon the real world click data (e.g., click relevance ranking). The modified pre-existing relevance ranking can be used in providing future search results. | 01-20-2011 |
20120323907 | WEB SEARCHING - A human or hand-labeled ranking of URL results for a search query is compared against actual click data for the respective query/URL pairs (e.g., which URLs were actually clicked on by users when the URLs were presented to users when the search query was run in the real world). The human ranking or ordering of the URL results (e.g., pre-existing relevance ranking) for the query can then be adjusted, if necessary, based upon the real world click data (e.g., click relevance ranking). The modified pre-existing relevance ranking can be used in providing future search results. | 12-20-2012 |
Patent application number | Description | Published |
20100306158 | SPEEDING UP ANALYSIS OF COMPRESSED WEB GRAPHS - Classes of web graph algorithms are extended to run directly on virtual node-type compressed web graphs where a reduction in runtime of the extended algorithms is realized which is approximately proportional to the compression ratio applied to the original (i.e., uncompressed) graph. In the virtual node compression technique, a succinct representation of a web graph is constructed by replacing dense subgraphs by sparse ones so that the resulting compressed graph has significantly fewer edges and a relatively small number of additional nodes. | 12-02-2010 |
20100306216 | SHORT PATHS IN WEB GRAPHS WITH SMALL QUERY TIME - Short paths are found with a small query time in scale-free directed graphs using a two-phase process by which data structures comprising shortest path trees are first pre-computed for a group of central vertices called “hubs” that have short paths to most other vertices in the graph. In a query time phase, a short path between two vertices of interest in the graph is found by looking up the path to the root in each of the shortest path trees. | 12-02-2010 |
20120047121 | CONTENT SIGNATURE NOTIFICATION - A client application installed on end user computers generates metadata from the content of web pages visited by end users and provides the metadata to a search engine. When an end user visits a web page, the end user's computer downloads and displays the web page to the end user. The client application may simultaneously access the web page content and generate this metadata in the form of a content signature of the web page from the web page content. The client application then provides the content signature to a search engine. The search engine may employ content signatures to identify new web pages to crawl and index. Additionally, the search engine may employ content signatures to identify changes to web pages and determine the crawl frequency of web pages. | 02-23-2012 |
20120323677 | CLICK PREDICTION USING BIN COUNTING - Methods, systems, and computer-storage media having computer-usable instructions embodied thereon for calculating event probabilities are provided. The event may be a click probability. Event probabilities are calculated using a system optimized for runtime model accuracy with an operable learning algorithm. Bin counting techniques are used to calculate event probabilities based on a count of event occurrences and non-event occurrences. Linear parameters, such and counts of clicks and non-clicks, may also be used in the system to allow for runtime adjustments. | 12-20-2012 |
Patent application number | Description | Published |
20080275847 | Scalable minimal perfect hashing - A minimal perfect hash function can be created for input data by dividing the input data into multiple collections, with each collection comprising fewer elements that the input data as a whole. Subsequently, minimal perfect hash functions can be created for each of the collections and the resulting hash values can be offset by a value equivalent to the number of input data in preceding collections. The minimal perfect hash function can, thereby, be derived in parallel and can consume substantially less storage space. To further save storage space, the internal state of each individual minimal perfect hash function can be further compressed using algorithms exploiting a skewed distribution of values in a lookup table comprising the internal state. | 11-06-2008 |
20080320498 | High Performance Script Behavior Detection Through Browser Shimming - The behavior of browser applications, such as web browsers, can be controlled in part by script-based instructions present within documents read by those browsers. To analyze such scripts in an efficient manner, a script analyzer can identify the scripts in the document, divide them into script modules, and order the modules to represent an interpretational flow. The script can be interpreted and executed on a line-by-line basis and its behavior analyzed. Prior to interpretation, the scripts can be reviewed for delay conditionals, and such statements can be modified for more efficient interpretation. Additionally, if, during interpretation, the script generates new script, or modifies existing script, such new scripts can be themselves interpreted. External function calls made by the script can be intercepted and responded to in a generic fashion, limiting the need to create a document object model, based on the document's data, solely for script analysis purposes. | 12-25-2008 |
20090222408 | DATA STORAGE STRUCTURE - Efficient data storage and retrieval (e.g., in terms of time and space requirements) is facilitated by implementing an indexing structure comprising an indexing array. That is, a functional relationship between elements of a source set and elements of a query result set can be stored in the indexing structure. This allows, for example, a query regarding whether an element is a member of a set (e.g., whether a particular website or Uniform Resource Locator (URL)) has been visited before) as well as a relationship between the member set and the query (e.g., the number of hyperlinks in the website the last time it was visited) to be resolved efficiently. | 09-03-2009 |
20120207391 | INTERACTIVE PAPER SYSTEM - A printer, scanner device and methods for using same are described herein. A printer device may include a dedicated input that, when actuated, generates and sends a request to a computer for known data or a predetermined print job, e.g., schedule information from a personal information management (PIM) application. A scanner device may include another dedicated input that, when actuated, automatically scans a document fed to the device by the user and sends the scanned image to IM (or other) software on a computer, bypassing the need to manipulate the scanned image using scanner software. The device may be used with printed metapaper, which includes a barcode or other indicia identifying the metapaper and corresponds to a stored template image of the metapaper. When the metapaper is rescanned, the scan can be compared to the stored template information to identify changes and synchronize the changes with the IM software. | 08-16-2012 |
Patent application number | Description | Published |
20090070354 | MINIMAL PERFECT HASH FUNCTIONS USING DOUBLE HASHING - Technologies are described herein for constructing a minimal perfect hash function. According to embodiments, a hash table is constructed by double hashing each of the strings in a set of strings. A computed double hash value is utilized to identify an empty location in the hash table for each string. A signature for each string is stored in the empty location of the hash table identified for the string. In order to obtain a minimal perfect hash value for an input string, the input string is iteratively double hashed until a location is identified in the hash table that contains a signature corresponding to the input string. The minimal perfect hash value is an integer value identifying the location in the hash table that contains the signature corresponding to the input string. | 03-12-2009 |
20090150375 | Detecting zero-result search queries - A processing device and method may be provided for determining whether a zero search result may be produced with respect to a search for a document including all words of a word group. An index, with respect to words included in a group of documents, may be searched for documents including all words of the word group when a zero search result is determined not likely to occur with respect to the search for the document including all of the words of the word group. A method for creating multiple types of data structures corresponding to word grouping collections may further be provided to store occurrence information indicating a likelihood of a presence of a document including all words of a word group. | 06-11-2009 |
20090192960 | EFFICIENT WEIGHTED CONSISTENT SAMPLING - A method and a processing device may be provided for performing efficient weighted consistent sampling. A group of sets having multiple elements with associated weights may be provided. A single hash function may be applied to each of the elements of the group of sets to produce consistent uniformly distributed non-negative random numbers. Transformed values corresponding to each of the elements may be produced by determining a w | 07-30-2009 |
20090193044 | WEB GRAPH COMPRESSION THROUGH SCALABLE PATTERN MINING - A method and a processing device are provided for compressing a web graph including multiple nodes and links between the multiple nodes. Nodes of the web graph may be clustered into groups including no more than a predetermined number of nodes. A list of links of the clustered nodes may be created and sorted based on a frequency of occurrence of each of the links. A prefix tree may be created based on the sorted list of links. The prefix tree may be walked to find candidate virtual nodes. The candidate virtual nodes may be analyzed according to a selection criteria and a virtual node may be selected. The prefix tree may be adjusted to account for the selection of the virtual node and the virtual node may be added to the web graph. | 07-30-2009 |
20100332476 | WEB GRAPH COMPRESSION THROUGH SCALABLE PATTERN MINING - A method and a processing device are provided for compressing a web graph including multiple nodes and links between the multiple nodes. Nodes of the web graph may be clustered into groups including no more than a predetermined number of nodes. A list of links of the clustered nodes may be created and sorted based on a frequency of occurrence of each of the links. A prefix tree may be created based on the sorted list of links. The prefix tree may be walked to find candidate virtual nodes. The candidate virtual nodes may be analyzed according to a selection criteria and a virtual node may be selected. The prefix tree may be adjusted to account for the selection of the virtual node and the virtual node may be added to the web graph. | 12-30-2010 |