Patent application number | Description | Published |
20090089244 | METHOD OF DETECTING SPAM HOSTS BASED ON CLUSTERING THE HOST GRAPH - Systems and methods for identifying spam hosts are disclosed in which hosts are known to the system and initially classified as spam or non-spam. Then the hosts are partitioned into clusters based on how each host is linked to other hosts. Each cluster is then analyzed and, depending on the number of spam and non-spam hosts it contains, the cluster may be classified as a spam cluster or a non-spam cluster. The hosts within the cluster may then be reclassified based on the cluster's classification. The results may then be used in many different ways including to filter search results based on host classifications so that spam hosts are not displayed or displayed last in a results set. | 04-02-2009 |
20090089285 | METHOD OF DETECTING SPAM HOSTS BASED ON PROPAGATING PREDICTION LABELS - Systems and methods for identifying spam hosts are disclosed in which hosts are known to the system and initially classified as spam or non-spam by a baseline classifier. The accuracy of the initial host classifications is then improved by propagating them using a random walk algorithm. The random walk used may be modified in order to obtain a weighted or skewed characterization of the host. The hosts may then be reclassified based on the characterization obtained from the random walk to obtain a final spam/non-spam classification. The final classification may then be used in many different ways including to filter search results based on host classifications so that spam hosts are not displayed or displayed last in a results set. | 04-02-2009 |
20090089373 | SYSTEM AND METHOD FOR IDENTIFYING SPAM HOSTS USING STACKED GRAPHICAL LEARNING - Systems and methods for identifying spam hosts are disclosed in which hosts are known to the system and initially classified as spam or non-spam by a baseline classifier. Then, for each node u in the host graph, a new feature is computed. This feature is an aggregate function of the initial classifications produced by the baseline classifier for the neighbors of the node u. The set of neighbors can be defined in many different ways: in-link neighbors, out-link neighbors, bi-directional neighbors, k-hop neighbors, etc. The new feature is then added to the existing set of features, and the baseline classifier is trained again, producing new predictions for each node. The results may then be used in many different ways including to filter search results based on host classifications so that spam hosts are not displayed or displayed last in a results set. | 04-02-2009 |
20090094416 | SYSTEM AND METHOD FOR CACHING POSTING LISTS - A method of caching posting lists to a search engine cache calculates the ratios between the frequencies of the query terms in a past query log and the sizes of the posting lists for each term, and uses these ratios to determine which posting lists should be cached by sorting the ratios in decreasing order and storing to the cache those posting lists corresponding to the highest ratio values. Further, a method of finding an optimal allocation between two parts of a search engine cache evaluates a past query stream based on a relationship between various properties of the stream and the total size of the cache, and uses this information to determine the respective sizes of both parts of the cache. | 04-09-2009 |
20100036784 | SYSTEMS AND METHODS FOR FINDING HIGH QUALITY CONTENT IN SOCIAL MEDIA - The present invention is directed towards systems and methods for identifying high quality content in a social media environment. The method according to one embodiment of the present invention comprises retrieving a content item and retrieving a plurality of quality features associated with said content item wherein said quality features comprise intrinsic, usage and relationship features. The method then performs an analysis of said content item against said quality features and generates a quality score based on said analysis. | 02-11-2010 |
20100082694 | QUERY LOG MINING FOR DETECTING SPAM-ATTRACTING QUERIES - Disclosed are methods and apparatus for detecting spam-attracting queries. In one embodiment, one or more graphs are generated using data obtained from a query log, where the one or more graphs include at least one of an anticlick graph or a view graph. Values of one or more syntactic features of the one or more graphs are ascertained. Values of one or more semantic features of the one or more graphs are determined by propagating categories from a web directory among nodes in each of the one or more graphs. Spam-attracting queries are then detected based upon the values of the syntactic features and the semantic features. | 04-01-2010 |
20100082752 | QUERY LOG MINING FOR DETECTING SPAM HOSTS - Disclosed are methods and apparatus for detecting spam hosts. In one embodiment, one or more graphs are generated using data obtained from a query log, where the one or more graphs include at least one of an anticlick graph or a view graph. Values of one or more syntactic features of the one or more graphs are ascertained. Values of one or more semantic features of the one or more graphs are determined by propagating categories from a web directory among nodes in each of the one or more graphs. Spam hosts are then detected based upon the values of the syntactic features and the semantic features. | 04-01-2010 |
20100094853 | SYSTEM AND METHODOLOGY FOR A MULTI-SITE SEARCH ENGINE - Techniques for query processing in a multi-site search engine are described. During an indexing phase, each site of a multi-site search engine indexes a set of assigned web resources and each site calculates, for each term in the set of assigned web resources, a site-specific upper bound ranking score on the contribution of the term to the search engine ranking function for a query containing the term. During a propagation phase, all sites exchange their site-specific upper bound ranking scores with each other. In response to a site receiving a query, the site determines the set of locally matching resources and compares the ranking score of a locally matching resource with the site-specific upper bound ranking scores for the terms of the query that were received during the propagation phase and determines whether to communicate the query to other sites. By exchanging appropriately defined site-specific upper bound ranking scores, the site initially receiving the query can determine whether the locally matching resources would be identical to the resources obtained from a single-site search system without having to communicate the query to each of the other sites. | 04-15-2010 |
20100114928 | DIVERSE QUERY RECOMMENDATIONS USING WEIGHTED SET COVER METHODOLOGY - A computer-implemented method provides suggested search queries based on an input search query. The search query is received (such as from a user providing the search query to a search engine service) and a first list of documents is determined that corresponds to processing the query by a search engine. A list of result queries is then determined such that executing the result queries on the search engine would yield a second list of documents that covers the documents of the first list. The list of result queries is returned as the suggested queries. Determining the list of result queries may include, for example, determining a list of potential queries, wherein each potential query, when executed by the search engine, results in at least one document in the first list of documents; and processing the potential queries to determine which of them to include in the list of result queries. | 05-06-2010 |
20100114929 | DIVERSE QUERY RECOMMENDATIONS USING CLUSTERING-BASED METHODOLOGY - A computer-implemented method provides suggested search queries based on an input search query. The input search query is received, and a first list of documents is determined that corresponds to processing the query by a search engine. A list of result queries is determined such that executing the result queries on the search engine would yield a second list of documents that covers the documents of the first list. Determining the list of result queries includes processing the first list of documents to determine clusters of documents, and determining potential queries that correspond to the determined clusters by comparing the results of the potential queries with the documents in the determined clusters. The list of result queries is based on the potential queries determined to correspond to the determined clusters. | 05-06-2010 |
20100125572 | Method And System For Generating A Hyperlink-Click Graph - A method of ascribing scores to web documents and search queries generates a hyperlink-click graph by taking the union of the hyperlink and click graphs, takes a random walk on the hyperlink-click graph, and associates the transition probabilities resulting from the random walk with scores for each of the documents and search queries. | 05-20-2010 |
20100161145 | SEARCH ENGINE DESIGN AND COMPUTATIONAL COST ANALYSIS - A computer-implemented system for architecting and designing search engine facilities. The system estimates the costs of power and networking based on system parameters, such as average CPU utilization, connection time, and bytes transferred over the network. Regional distribution of facilities may be evaluated to take into account the various parameters and optimize the cost and speed of the systems being designed. The parameters used in analyzing and formulating an architecture are independent of a particular indexing or query processing technique. | 06-24-2010 |
20100161643 | SEGMENTATION OF INTERLEAVED QUERY MISSIONS INTO QUERY CHAINS - The subject matter disclosed herein relates to segmentation of interleaved query missions into a plurality of query chains. | 06-24-2010 |
20110029475 | TAXONOMY-DRIVEN LUMPING FOR SEQUENCE MINING - Methods and apparatus are described for modeling sequences of events with Markov models whose states correspond to nodes in a provided taxonomy. Each state represents the events in the subtree under the corresponding node. By lumping observed events into states that correspond to internal nodes in the taxonomy, more compact models are achieved that are easier to understand and visualize, at the expense of a decrease in the data likelihood. The decision for selecting the best model is taken on the basis of two competing goals: maximizing the data likelihood, while minimizing the model complexity (i.e., the number of states). | 02-03-2011 |
20110078189 | NETWORK GRAPH EVOLUTION RULE GENERATION - A network's evolution is characterized by graph evolution rules. A graph that represents an evolutionary network is mined to identify evolutional patterns of the network, and graph evolution rules are generated using identified evolutional patterns. The generated graph evolution rules represent the evolutional patterns of the network. | 03-31-2011 |
20120246196 | NETWORK GRAPH EVOLUTION RULE GENERATION - A network's evolution is characterized by graph evolution rules. A graph that represents an evolutionary network is mined to identify evolutional patterns of the network, and graph evolution rules are generated using identified evolutional patterns. The generated graph evolution rules represent the evolutional patterns of the network. | 09-27-2012 |
20130151456 | SYSTEM AND METHOD OF MATCHING CONTENT ITEMS AND CONSUMERS - A matching between content items and consumers is disclosed. More particularly, items and consumers are matched using a matching approach that uses capacity constraints associated with each consumer, capacity constraints associated with each item, and relationship weights, each relationship weight representing a similarity between a consumer and an item. Edges representing the relationships between consumers and items can be selected using an iterative selection that includes a matching approach that permits capacity constraints. Alternatively, edges can be selected using an iterative approach that allows a solution to be identified prior to completion of the selection processing. | 06-13-2013 |
20130151514 | EXTRACTING TIPS - Embodiments disclosed herein may relate to extracting tips from online sources and/or selecting tips for display to a user on a computing platform. | 06-13-2013 |
20140006503 | SOCIAL NETWORKING FEED DELIVERY SYSTEM AND METHOD | 01-02-2014 |
20140115155 | NETWORK GRAPH EVOLUTION RULE GENERATION - A network's evolution is characterized by graph evolution rules. A graph that represents an evolutionary network, formed by merging multiple graphs representing multiple snapshots of the network, is mined to identify evolutional patterns of the network. A pattern is selected from the identified patterns. Graph evolution rules are generated using the identified evolutional patterns. The generated graph evolution rules represent the evolutional patterns of the network, the rules indicating that any occurrence of a child pattern of the selected pattern implies a corresponding occurrence of the selected pattern. | 04-24-2014 |
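The cluster-based reclassification step described in application 20090089244 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed method: it assumes the host graph has already been partitioned by some graph-clustering algorithm, and the thresholds `upper` and `lower` on a cluster's spam fraction, as well as the function name `reclassify_by_cluster`, are hypothetical, since the abstract only says the decision depends on the number of spam and non-spam hosts in each cluster.

```python
def reclassify_by_cluster(clusters, labels, upper=0.75, lower=0.25):
    """Relabel hosts according to the spam fraction of their cluster.

    clusters: list of host lists forming a partition of the host graph.
    labels: dict mapping host -> True if initially classified as spam.
    upper, lower: hypothetical thresholds on a cluster's spam fraction.
    """
    new_labels = dict(labels)  # leave the initial labels untouched
    for cluster in clusters:
        if not cluster:
            continue
        spam_fraction = sum(labels[h] for h in cluster) / len(cluster)
        if spam_fraction >= upper:
            # Mostly spam: classify as a spam cluster and relabel every host.
            for h in cluster:
                new_labels[h] = True
        elif spam_fraction <= lower:
            # Mostly non-spam: classify as a non-spam cluster.
            for h in cluster:
                new_labels[h] = False
        # Mixed clusters keep their hosts' initial labels.
    return new_labels


clusters = [["a", "b", "c", "d"], ["e", "f", "g", "h"], ["i", "j"]]
labels = {"a": True, "b": True, "c": True, "d": False,
          "e": True, "f": False, "g": False, "h": False,
          "i": True, "j": False}
result = reclassify_by_cluster(clusters, labels)
```

In this example, host `d` is pulled into a spam cluster, host `e` into a non-spam cluster, and the evenly split cluster `{i, j}` is left as it was.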
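Application 20090089285 refines baseline predictions by propagating them with a random walk. One simple way to illustrate the idea, assuming a random walk with restart that follows out-links and stops with probability `1 - alpha` at each step, is to compute for each host the expected baseline score of the host where the walk stops. The function name, the use of out-links, `alpha`, the 0.5 decision threshold, and the fixed iteration count are all assumptions; the weighted or skewed walks mentioned in the abstract are not reproduced.

```python
def propagate_spam_scores(out_links, initial, alpha=0.5, iterations=50):
    """Smooth baseline spam scores with a random walk with restart.

    out_links: dict host -> list of hosts it links to.
    initial: dict host -> baseline spam score in [0, 1].
    alpha: hypothetical probability of taking another step instead of stopping.
    """
    scores = dict(initial)
    for _ in range(iterations):
        new_scores = {}
        for host, base in initial.items():
            targets = [t for t in out_links.get(host, []) if t in scores]
            if targets:
                # Expected score after one step along a uniformly chosen out-link.
                step = sum(scores[t] for t in targets) / len(targets)
            else:
                # Dangling host: treat it as a self-loop (the walk stays put).
                step = scores[host]
            new_scores[host] = (1 - alpha) * base + alpha * step
        scores = new_scores
    return scores


out_links = {"a": ["b", "c"], "b": [], "c": []}
initial = {"a": 0.2, "b": 1.0, "c": 1.0}
scores = propagate_spam_scores(out_links, initial)
# Final spam/non-spam decision by thresholding (hypothetical cutoff).
is_spam = {h: s >= 0.5 for h, s in scores.items()}
```

Host `a` has a low baseline score of 0.2 but links only to spam hosts, so its propagated score rises to about 0.6 and it is reclassified as spam.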
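The feature-construction step of application 20090089373 aggregates the baseline predictions of each node's neighbors and appends the result to that node's feature vector. A minimal sketch follows, assuming in-link neighbors and the mean as the aggregate function (the abstract allows other neighborhoods and aggregates); the function names `in_link_neighbors` and `add_neighbor_feature` and the 0.0 default for nodes without neighbors are hypothetical. Retraining the baseline classifier on the extended features is left to whatever classifier is in use.

```python
from collections import defaultdict


def in_link_neighbors(edges):
    """Map each node to the list of nodes that link to it."""
    nbrs = defaultdict(list)
    for src, dst in edges:
        nbrs[dst].append(src)
    return nbrs


def add_neighbor_feature(features, predictions, neighbors):
    """Append the mean baseline prediction over each node's neighbors.

    features: dict node -> list of existing feature values.
    predictions: dict node -> baseline spam prediction.
    neighbors: dict node -> neighbor nodes (in-link, out-link, k-hop, ...).
    """
    stacked = {}
    for node, vec in features.items():
        nbrs = [v for v in neighbors.get(node, []) if v in predictions]
        # Nodes without neighbors get 0.0 (an arbitrary default).
        agg = sum(predictions[v] for v in nbrs) / len(nbrs) if nbrs else 0.0
        stacked[node] = vec + [agg]  # new list; original features unchanged
    return stacked


edges = [("a", "c"), ("b", "c"), ("c", "a")]
predictions = {"a": 1.0, "b": 0.0, "c": 0.5}
features = {"a": [0.1], "b": [0.2], "c": [0.3]}
stacked = add_neighbor_feature(features, predictions, in_link_neighbors(edges))
```

The vectors in `stacked` would then be used to train the baseline classifier again, producing new predictions for each node.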
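The posting-list caching policy in application 20090094416 ranks terms by the ratio of their frequency in a past query log to the size of their posting list and caches the lists with the highest ratios. The sketch below assumes the cache budget is measured in the same units as list sizes and that a list which no longer fits is skipped while smaller ones are still considered; the abstract does not say how overflow is handled, and the function name is hypothetical. The second part of the abstract, splitting the cache between two parts, is not covered.

```python
from collections import Counter


def select_posting_lists(query_log, list_sizes, capacity):
    """Pick the terms whose posting lists should be cached.

    query_log: iterable of past queries, each a list of terms.
    list_sizes: dict term -> posting-list size (e.g. number of postings).
    capacity: cache budget in the same units as list_sizes.
    """
    freq = Counter(term for query in query_log for term in query)
    # Ratio of past query frequency to posting-list size.
    ratios = {t: freq[t] / size for t, size in list_sizes.items()
              if size > 0 and freq[t] > 0}
    cached, used = [], 0
    # Take lists in decreasing ratio order, skipping any that no longer fit.
    for term in sorted(ratios, key=ratios.get, reverse=True):
        if used + list_sizes[term] <= capacity:
            cached.append(term)
            used += list_sizes[term]
    return cached


log = [["cheap", "flights"], ["cheap", "hotels"], ["flights"], ["the", "flights"]]
sizes = {"cheap": 10, "flights": 30, "hotels": 4, "the": 100}
print(select_posting_lists(log, sizes, capacity=40))  # ['hotels', 'cheap']
```

Although `flights` is the most frequent term, its long posting list gives it a low ratio, so the small, moderately frequent lists are cached first.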
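The decision in application 20100094853 of whether to forward a query can be illustrated as a comparison between the local top-k scores and the upper bounds each remote site published during the propagation phase. This sketch assumes an additive ranking function, so a remote site's best possible score for a query is the sum of its per-term bounds, and it forwards only to sites whose bound strictly exceeds the k-th best local score; the function name and the tie-handling rule are assumptions. An empty result means the local answer would match a single-site search under these assumptions.

```python
def sites_to_contact(query_terms, local_scores, remote_bounds, k):
    """Return the remote sites that might still contribute to the top-k results.

    query_terms: terms of the incoming query.
    local_scores: ranking scores of the locally matching resources.
    remote_bounds: dict site -> dict term -> that site's upper bound on the
        term's contribution, as received during the propagation phase.
    k: number of results requested.
    """
    ranked = sorted(local_scores, reverse=True)
    # Score a remote resource must beat to enter the top-k; with fewer than
    # k local matches, any remote match could.
    threshold = ranked[k - 1] if 0 < k <= len(ranked) else float("-inf")
    contact = []
    for site, bounds in remote_bounds.items():
        if not any(t in bounds for t in query_terms):
            continue  # the site indexes none of the query terms
        # Assumes an additive ranking function: sum the per-term bounds.
        best_possible = sum(bounds.get(t, 0.0) for t in query_terms)
        if best_possible > threshold:
            contact.append(site)
    return contact


bounds = {
    "site_b": {"cheap": 1.0, "flights": 1.5},
    "site_c": {"cheap": 2.0, "flights": 2.0},
    "site_d": {"hotels": 9.0},
}
print(sites_to_contact(["cheap", "flights"], [5.0, 3.0, 1.0], bounds, k=2))  # ['site_c']
```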
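Application 20100114928 frames query suggestion as choosing result queries whose documents cover the first list of documents. A common way to approximate weighted set cover is the greedy rule of repeatedly picking the candidate with the lowest cost per newly covered document; the sketch below uses that rule, with a default unit cost per query, as an assumption, since the abstract does not specify the weights or the selection procedure. The function name is hypothetical.

```python
def greedy_query_cover(target_docs, candidate_results, cost=None):
    """Choose result queries whose documents cover the input query's documents.

    target_docs: documents returned for the input query (the first list).
    candidate_results: dict potential query -> documents it returns.
    cost: optional dict query -> positive weight; defaults to 1.0 per query.
    """
    cost = cost or {}
    uncovered = set(target_docs)
    chosen = []
    while uncovered:
        best, best_ratio = None, None
        for query, docs in candidate_results.items():
            gain = len(set(docs) & uncovered)
            if gain == 0:
                continue  # adds no uncovered document from the first list
            ratio = cost.get(query, 1.0) / gain
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = query, ratio
        if best is None:
            break  # the remaining documents cannot be covered
        chosen.append(best)
        uncovered -= set(candidate_results[best])
    return chosen


candidates = {"q_a": {1, 2, 3}, "q_b": {3, 4}, "q_c": {4, 5}, "q_d": {5, 9}}
print(greedy_query_cover({1, 2, 3, 4, 5}, candidates))  # ['q_a', 'q_c']
```

Raising the cost of a broad query shifts the selection toward several narrower ones; for example, with `cost={"q_a": 6.0}` the same call returns `['q_b', 'q_c', 'q_a']`.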
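Application 20100125572 builds a hyperlink-click graph as the union of the hyperlink graph and the click graph, then derives scores for documents and queries from a random walk on it. The sketch below assumes a PageRank-style walk with uniform teleportation and uses each node's stationary probability as its score; treating click edges as undirected, the damping factor, the iteration count, and the function name are all assumptions rather than details from the abstract.

```python
from collections import defaultdict


def hyperlink_click_scores(hyperlinks, clicks, damping=0.85, iterations=100):
    """Stationary random-walk scores on the union of hyperlink and click graphs.

    hyperlinks: iterable of (source_doc, target_doc) pairs.
    clicks: iterable of (query, clicked_doc) pairs from a query log.
    damping: hypothetical probability of following an edge instead of jumping
        to a uniformly random node.
    """
    out = defaultdict(set)
    for src, dst in hyperlinks:
        out[src].add(dst)
    for query, doc in clicks:
        # Click edges can be walked in both directions.
        out[query].add(doc)
        out[doc].add(query)
    nodes = sorted(set(out) | {v for targets in out.values() for v in targets})
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Mass held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for t in out[v]:
                    new[t] += share
        rank = new
    return rank


scores = hyperlink_click_scores([("d1", "d2"), ("d3", "d2")], [("q1", "d2")])
print(max(scores, key=scores.get))  # d2
```

Document `d2`, which is both linked to by other documents and clicked for query `q1`, receives the highest score, and `q1` inherits much of that weight through the click edge.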