Patent application number | Description | Published |
20080281806 | SEARCHING A DATABASE OF LISTINGS - A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm. | 11-13-2008 |
20080281827 | USING STRUCTURED DATABASE FOR WEBPAGE INFORMATION EXTRACTION - A structured database is used for webpage information extraction, and in particular, to obtain training data from the webpage for training a statistical model. The structured database has a plurality of entries, wherein each entry comprises a plurality of fields. One of the fields comprises a URL (uniform resource locator), while another field comprises information at least similar to other information to be located in a webpage associated with the URL. For at least some of the entries in the structured database, a webpage associated with the URL is retrieved. The webpage is analyzed and if information is found in the webpage similar to the information in the structured database, the webpage is identified as being suitable to be considered as a training sample. | 11-13-2008 |
20090019027 | Disambiguating residential listing search results - A directory assistance system includes a directory database and a search engine. The search engine is configured to search the directory database for a first set of residential listings based on at least one first search term. A second search term is received that is related to a cohabitant of the listing to be found. At least one search result is selected that satisfies the second search term. | 01-15-2009 |
20090037175 | Confidence measure generation for speech related searching - A voice search system has a speech recognizer, a search component, and a dialog manager. A confidence measure generator receives speech recognition features from the speech recognizer, search features from the search component, and dialog features from the dialog manager, and calculates an overall confidence measure for voice search results based upon the features received. The invention can be extended to include the generation of additional features, based on those received from the individual components of the voice search system. | 02-05-2009 |
20090150308 | MAXIMUM ENTROPY MODEL PARAMETERIZATION - Described is a technology by which a maximum entropy model used for classification is trained with a significantly smaller amount of training data than is normally used in training other maximum entropy models, yet provides similar accuracy to the others. The maximum entropy model is initially parameterized with parameter values determined from weights obtained by training a vector space model or an n-gram model. The weights may be scaled into the initial parameter values by determining a scaling factor. Gaussian mean values may also be determined, and used for regularization in training the maximum entropy model. Scaling may also be applied to the Gaussian mean values. After initial parameterization, training comprises using training data to iteratively adjust the initial parameters into adjusted parameters until convergence is determined. | 06-11-2009 |
20090276380 | COMPUTER-AIDED NATURAL LANGUAGE ANNOTATION - The present invention uses a natural language understanding system that is currently being trained to assist in annotating training data for training that natural language understanding system. Unannotated training data is provided to the system and the system proposes annotations to the training data. The user is offered an opportunity to confirm or correct the proposed annotations, and the system is trained with the corrected or verified annotations. | 11-05-2009 |
20090327260 | CONSTRUCTING A CLASSIFIER FOR CLASSIFYING QUERIES - To construct a classifier, a data structure correlating queries to items identified by the queries is received, where the data structure contains initial labeled queries that have been labeled with respect to predetermined classes, and unlabeled queries that have not been labeled with respect to the predetermined classes. The data structure is used to label at least some of the unlabeled queries with respect to the predetermined classes. Queries in the data structure that have been labeled with respect to the predetermined classes are used as training data to train the classifier. | 12-31-2009 |
20100145694 | REPLYING TO TEXT MESSAGES VIA AUTOMATED VOICE SEARCH TECHNIQUES - An automated “Voice Search Message Service” provides a voice-based user interface for generating text messages from an arbitrary speech input. Specifically, the Voice Search Message Service provides a voice-search information retrieval process that evaluates user speech inputs to select one or more probabilistic matches from a database of pre-defined or user-defined text messages. These probabilistic matches are also optionally sorted in terms of relevancy. A single text message from the probabilistic matches is then selected and automatically transmitted to one or more intended recipients. Optionally, one or more of the probabilistic matches are presented to the user for confirmation or selection prior to transmission. Correction or recovery of speech recognition errors is avoided since the probabilistic matches are intended to paraphrase the user speech input rather than exactly reproduce that speech, though exact matches are possible. Consequently, potential distractions to the user are significantly reduced relative to conventional speech recognition techniques. | 06-10-2010 |
20100169317 | Product or Service Review Summarization Using Attributes - Described is a technology in which product or service reviews are automatically processed to form a summary for each single product or service. Snippets from the reviews are extracted and classified into sentiment classes (e.g., as positive or negative) based on their wording. Attributes are assigned to the reviews, e.g., based on term frequency concepts, as nouns, which may be paired with adjectives and/or verbs. The summary of the reviews belonging to a single product or service is generated based on the automatically computed attributes and the classification of review snippets into attribute and sentiment classes. For example, the summary may indicate how many reviews were positive (the sentiment class), along with text corresponding to the most similar snippet based on its similarity to the attributes (the attribute class). | 07-01-2010 |
20100268725 | ACQUISITION OF SEMANTIC CLASS LEXICONS FOR QUERY TAGGING - A user's search experience may be enhanced by providing additional content based upon an understanding of the user's intent. Query tagging, the assigning of semantic labels to terms within a query, is one technique that may be utilized to determine the context of a user's search query. Accordingly, as provided herein, a query tagging model may be updated using one or more stratified lexicons. A list data structure (e.g., lists of phrases obtained from web pages) and seed distribution data (e.g., pre-labeled probability data) may be used by a graph learning technique to obtain an expanded set of phrases and their respective probabilities of corresponding with particular lexicons (e.g., semantic class lexicons). The expanded set of phrases may be used to group phrases into stratified lexicons. The stratified lexicons may be used as features for updating and/or executing the query tagging model. | 10-21-2010 |
20110179049 | Automatic Aggregation Across Data Stores and Content Types - Project-related data may be aggregated from various data sources, given context, and may be stored in a data repository or organizational knowledge base that may be available to and accessed by others. Documents, emails, contact information, calendar data, social networking data, and any other content that is related to a project may be brought together within a single user interface, irrespective of its data type. A user may organize and understand content, discover relevant information, and act on it without regard to where the information resides or how it was created. | 07-21-2011 |
20110179061 | Extraction and Publication of Reusable Organizational Knowledge - An analysis module, when triggered by a synchronization framework when a new data item is added to a project data store, runs a series of analysis feature extractors on the new content. An analysis may be conducted, and features of interest may be extracted from the data item. The analysis utilizes natural language processing, as well as other technologies, to provide an automatic or semi-automatic extraction of information. The extracted features of interest are saved as metadata within the project data store, and are associated with the data item from which they were extracted. The analysis module may be utilized to discover additional information that may be gleaned from content that is already in the project data store. | 07-21-2011 |
20110202533 | Dynamic Search Interaction - This patent application pertains to dynamic search interaction. One example includes an organizational component configured to obtain a search query from a user. The organizational component can also be configured to obtain related search queries. The organizational component can further be configured to organize the related search queries by topic and to estimate a relative likelihood that an intent of the user matches an individual topic. This example also includes an image generation component configured to cause the organized related search queries to be presented on a graphical user interface (GUI) in a manner that reflects the relative likelihood. | 08-18-2011 |
20110314003 | TEMPLATE CONCATENATION FOR CAPTURING MULTIPLE CONCEPTS IN A VOICE QUERY - Architecture that provides the capability to identify which parts (terms and phrases) of a voice query have been covered by predefined phrase templates, and then to concatenate matching phrase templates into a new paraphrased query. A match-drop-continue algorithm is disclosed that progressively masks out the portions (phrases, terms) of the query matched to the phrase templates. Ultimately, the matched phrase templates are accumulated and organized together dynamically into a rephrased version of the original voice query. A user interface is provided that allows the user to confirm/summarize the multiple concepts in a progressive manner. | 12-22-2011 |
20120158703 | SEARCH LEXICON EXPANSION - One or more techniques and/or systems are disclosed for creating an expanded or improved lexicon for use in search-based semantic tagging. A set of first documents can be identified using a set of first lexicon elements as queries, and one or more first document patterns can be extracted from the set of first documents. The document patterns can be used to find one or more second documents in a query log that comprise the document patterns, which are associated with query terms used to return the second documents. The query terms for the second documents can be extracted and used to expand the lexicon. Elements within the lexicon may be weighted based upon relevance to different query domains, for example. | 06-21-2012 |
20120185252 | CONFIDENCE MEASURE GENERATION FOR SPEECH RELATED SEARCHING - A method of generating a confidence measure generator is provided for use in a voice search system, the voice search system including voice search components comprising a speech recognition system, a dialog manager and a search system. The method includes selecting voice search features, from a plurality of the voice search components, to be considered by the confidence measure generator in generating a voice search confidence measure. The method includes training a model, using a computer processor, to generate the voice search confidence measure based on selected voice search features. | 07-19-2012 |
20120323948 | DIALOG-ENHANCED CONTEXTUAL SEARCH QUERY ANALYSIS - Embodiments of the present invention relate to systems, methods, and computer-storage media for a method of contextually analyzing terms within a search query. In one embodiment, a received search query is classified into a domain category. Additionally, information is assigned to a schema associated with the domain by analyzing the search query. Further, at least one search result that helps a user complete a task within the domain is provided based on the information in the schema. | 12-20-2012 |
20130006973 | Summarization of Conversation Threads - Automatically summarizing electronic communication conversation threads is provided. Electronic mails, text messages, tasks, questions and answers, meeting requests, calendar items, and the like are processed via a combination of natural language processing and heuristics. For a given conversation thread, for example, an electronic mail thread associated with a given task, a text summary of the thread is generated to highlight the most important text in the thread. The text summary is presented to a user in a visual user interface to allow the user to quickly understand the significance or relevance of the thread. | 01-03-2013 |
20130007648 | Automatic Task Extraction and Calendar Entry - Automatically detected and identified tasks and calendar items from electronic communications may be populated into one or more tasks applications and calendaring applications. Text content retrieved from one or more electronic communications may be extracted and parsed for determining whether keywords or terms contained in the parsed text may lead to a classification of the text content or part of the text content as a task. Identified tasks may be automatically populated into a tasks application. Similarly, text content from such sources may be parsed for keywords and terms that may be identified as indicating calendar items, for example, meeting requests. Identified calendar items may be automatically populated into a calendar application as a calendar entry. | 01-03-2013 |
20140101119 | META CLASSIFIER FOR QUERY INTENT CLASSIFICATION - Systems and methods are provided for classifying a search query. A first group of query classifiers can be used to evaluate a query relative to various subject matter domains. The evaluation results from the first group of domain classifiers can then be used by a second group of meta-classifiers. The meta-classifiers are based on non-linear classification models. The meta-classifiers are associated with meta-classifier categories that may correspond to a domain or that may correspond to a plurality of domains. The assigned meta-classifier category for a query can be used in any convenient manner, such as by triggering additional uses of the search query to match images or other alternative types of documents, or such as by allowing a subject matter domain to be assigned to the query. | 04-10-2014 |
20140236575 | EXPLOITING THE SEMANTIC WEB FOR UNSUPERVISED NATURAL LANGUAGE SEMANTIC PARSING - Structured web pages are accessed and parsed to obtain implicit annotation for natural language understanding tasks. Search queries that hit these structured web pages are automatically mined for information that is used to semantically annotate the queries. The automatically annotated queries may be used for automatically building statistical unsupervised slot filling models without using a semantic annotation guideline. For example, tags that are located on a structured web page that are associated with the search query may be used to annotate the query. The mined search queries may be filtered to create a set of queries that is in a form of a natural language query and/or remove queries that are difficult to parse. A natural language model may be trained using the resulting mined queries. Some queries may be set aside for testing and the model may be adapted using in-domain sentences that are not annotated. The models may be tested using these implicitly annotated natural-language-like queries in an unsupervised fashion. | 08-21-2014 |
20140279995 | QUERY SIMPLIFICATION - Methods, systems, and computer-readable media for query simplification are provided. A search engine executed by a server receives a query. In response, the search engine determines whether the query is a long or hard query. For long or hard queries, the search engine drops one or more terms based on search engine logs. The search engine may utilize statistical models like machine translation, conditional random fields, or maximum entropy, to identify the terms that should be dropped. The search engine obtains search results for the simplified query and transmits the results to a user that provided the query. | 09-18-2014 |
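
As an illustration of the term frequency-inverse document frequency (Tf/Idf) scoring named in application 20080281806, the following is a minimal sketch of ranking short listings against a query. It is a generic textbook formulation, not the method claimed in that application; the function names (`tokenize`, `build_index`, `score`, `search`), the smoothed Idf formula, and the sample listings are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_index(listings):
    # One term-count table per listing, plus a smoothed Idf per term.
    docs = [Counter(tokenize(text)) for text in listings]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # count each term once per listing
    # Smoothed Idf (an assumption): log((1 + N) / (1 + df)) + 1.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0
           for term, count in df.items()}
    return docs, idf


def score(query, doc, idf):
    # Sum of Tf * Idf over the distinct query terms.
    return sum(doc[term] * idf.get(term, 0.0)
               for term in set(tokenize(query)))


def search(query, listings, top_k=3):
    # Rank listings by score, dropping those that match no query term.
    docs, idf = build_index(listings)
    scored = [(score(query, doc, idf), text)
              for doc, text in zip(docs, listings)]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: -pair[0])
    return scored[:top_k]


if __name__ == "__main__":
    sample = ["Joe's Pizza Seattle",
              "Seattle Public Library",
              "Pizza Hut Bellevue"]
    for value, text in search("pizza seattle", sample):
        print(f"{value:.3f}  {text}")
```

Because listings are only a few words long, term counts are almost always 0 or 1, so the ranking in this sketch is driven mostly by how many query terms a listing contains and how rare those terms are across the database.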