Patent application number | Description | Published |
20080319995 | RELIABILITY OF DUPLICATE DOCUMENT DETECTION ALGORITHMS - In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes so as to improve the precision of the system. When the projection of a document onto the primary set of attributes is below a threshold, then a secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold. | 12-25-2008 |
20100299290 | Web Query Classification - A query phrase may be automatically classified to one or more topics of interest (e.g., categories) to assist in routing the query phrase to one or more appropriate backend databases. A selectional preference query classification technique may be used to classify the query phrase based on a comparison between the query phrase and patterns of query phrases. Additionally, or alternatively, a combination of query classification techniques may be used to classify the query phrase. Topical classification of a query phrase also may be used to assist a search system in delivering auxiliary information to a user who entered the query phrase. Advertisements, for instance, may be tailored based on classification rather than query keywords. | 11-25-2010 |
20110276646 | RELIABILITY OF DUPLICATE DOCUMENT DETECTION ALGORITHMS - In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes so as to improve the precision of the system. When the projection of a document onto the primary set of attributes is below a threshold, then a secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold. | 11-10-2011 |
20120209870 | WEB QUERY CLASSIFICATION - A query phrase may be automatically classified to one or more topics of interest (e.g., categories) to assist in routing the query phrase to one or more appropriate backend databases. A selectional preference query classification technique may be used to classify the query phrase based on a comparison between the query phrase and patterns of query phrases. Additionally, or alternatively, a combination of query classification techniques may be used to classify the query phrase. Topical classification of a query phrase also may be used to assist a search system in delivering auxiliary information to a user who entered the query phrase. Advertisements, for instance, may be tailored based on classification rather than query keywords. | 08-16-2012 |
20130007026 | Reliability of Duplicate Document Detection Algorithms - In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes so as to improve the precision of the system. When the projection of a document onto the primary set of attributes is below a threshold, then a secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold. | 01-03-2013 |
20130007152 | ONLINE ADAPTIVE FILTERING OF MESSAGES - In general, a two or more stage spam filtering system is used to filter spam in an e-mail system. One stage includes a global e-mail classifier that classifies e-mail as it enters the e-mail system. The parameters of the global e-mail classifier generally may be determined by the policies of e-mail system owner and generally are set to only classify as spam those e-mails that are likely to be considered spam by a significant number of users of the e-mail system. Another stage includes personal e-mail classifiers at the individual mailboxes of the e-mail system users. The parameters of the personal e-mail classifiers generally are set by the users through retraining, such that the personal e-mail classifiers are refined to track the subjective perceptions of their respective user as to what e-mails are spam e-mails. | 01-03-2013 |
20130173518 | Simplifying Lexicon Creation in Hybrid Duplicate Detection and Inductive Classifier System - A classification system includes a signature-based duplicate detector and an inductive classifier that share attribute information. To perform the duplicate detection and the classification, the duplicate detector and inductive classifier are first initialized by generating a lexicon of attributes for the duplicate detector and a classification model for the classifier. To develop a classification model, a training set of documents of known class are used by the classifier to determine the attributes of the documents that are most useful in classifying an unknown document. The model is developed from these attributes. Attribute information containing the attributes determined by the classifier is then passed to the duplicate detector and the duplicate detector uses the attribute information to generate the lexicon of attributes. | 07-04-2013 |
20130173562 | Simplifying Lexicon Creation in Hybrid Duplicate Detection and Inductive Classifier System - A classification system includes a signature-based duplicate detector and an inductive classifier that share attribute information. To perform the duplicate detection and the classification, the duplicate detector and inductive classifier are first initialized by generating a lexicon of attributes for the duplicate detector and a classification model for the classifier. To develop a classification model, a training set of documents of known class are used by the classifier to determine the attributes of the documents that are most useful in classifying an unknown document. The model is developed from these attributes. Attribute information containing the attributes determined by the classifier is then passed to the duplicate detector and the duplicate detector uses the attribute information to generate the lexicon of attributes. | 07-04-2013 |
20130173563 | RELIABILITY OF DUPLICATE DOCUMENT DETECTION ALGORITHMS - In a single-signature duplicate document system, a secondary set of attributes is used in addition to a primary set of attributes so as to improve the precision of the system. When the projection of a document onto the primary set of attributes is below a threshold, then a secondary set of attributes is used to supplement the primary lexicon so that the projection is above the threshold. | 07-04-2013 |
20140344387 | ONLINE ADAPTIVE FILTERING OF MESSAGES - In general, a two or more stage spam filtering system is used to filter spam in an e-mail system. One stage includes a global e-mail classifier that classifies e-mail as it enters the e-mail system. The parameters of the global e-mail classifier generally may be a determined by the policies of e-mail system owner and generally are set to only classify as spam those e-mails that are likely to be considered spam by a significant number of users of the e-mail system. Another stage includes personal e-mail classifiers at the individual mailboxes of the e-mail system users. The parameters of the personal e-mail classifiers generally are set by the users through retraining, such that the personal e-mail classifiers are refined to track the subjective perceptions of their respective user as to what e-mails are spam e-mails. Retraining data for the personal e-mail classifiers may be aggregated and a subset of the aggregate may be chosen for use in retraining the global e-mail classifier. | 11-20-2014 |