Patent application number | Description | Published |
20080262995 | Multimodal rating system - A method of communicating information about a product evaluation between a system having a data store and a wireless client device is discussed. The method includes receiving, from the client device via a wireless communication link, a signal representative of an audible indication identifying the product about which evaluation information is to be communicated. The method further includes comparing an indication of the signal to data in the data store to match the indication with a portion of the data, and communicating evaluation information between the wireless client device and the system. | 10-23-2008 |
20080281806 | SEARCHING A DATABASE OF LISTINGS - A database having listings rather than long documents is searched using a term frequency-inverse document frequency (Tf/Idf) algorithm. | 11-13-2008 |
20080298562 | Voice aware demographic personalization - A voice interaction system is configured to analyze an utterance and identify inherent attributes that are indicative of a demographic characteristic of the system user that spoke the utterance. The system then selects and presents a personalized response to the user, the response being selected based at least in part on the identified demographic characteristic. In one embodiment, the demographic characteristic is one or more of the caller's age, gender, ethnicity, education level, emotional state, health status and geographic group. In another embodiment, the selection of the response is further based on consideration of corroborative caller data. | 12-04-2008 |
20090019027 | Disambiguating residential listing search results - A directory assistance system includes a directory database and a search engine. The search engine is configured to search the directory database for a first set of residential listings based on at least one first search term. A second search term is received that is related to a cohabitant of the listing to be found. At least one search result is selected that satisfies the second search term. | 01-15-2009 |
20090037174 | Understanding spoken location information based on intersections - In one embodiment, the present system recognizes a user's speech input using an automatically generated probabilistic context free grammar for street names that maps all pronunciation variations of a street name to a single canonical representation during recognition. A tokenizer expands the representation using position-dependent phonetic tokens and an intersection classifier classifies an intersection, despite the presence of recognition errors and incomplete street names. | 02-05-2009 |
20090037175 | Confidence measure generation for speech related searching - A voice search system has a speech recognizer, a search component, and a dialog manager. A confidence measure generator receives speech recognition features from the speech recognizer, search features from the search component, and dialog features from the dialog manager, and calculates an overall confidence measure for voice search results based upon the features received. The invention can be extended to include the generation of additional features, based on those received from the individual components of the voice search system. | 02-05-2009 |
20090043497 | Conveying Locations In Spoken Dialog Systems - The presentation of location information to a user that is distracted by traveling can result in the user quickly forgetting, or never even comprehending, key parts of the location information, such as the street number. Intersections and points of interest near the user's destination can be identified and provided instead of, or in addition to, the address, thereby increasing user comprehension and retention, especially when the user is distracted. Map data can be parsed into address, intersection and point-of-interest databases. These databases can be accessed to identify proximate intersections and points of interest, which can then be filtered and subsequently ranked to identify one intersection, one point of interest, or both, that can be presented to the user to aid the user in comprehending and retaining the location information even when distracted. | 02-12-2009 |
20090070112 | Automatic reading tutoring - A method of providing automatic reading tutoring is disclosed. The method includes retrieving a textual indication of a story from a data store and creating a language model, including constructing a target context free grammar indicative of a first portion of the story. A first acoustic input is received and a speech recognition engine is employed to recognize the first acoustic input. An output of the speech recognition engine is compared to the language model, and a signal is provided indicative of whether the output of the speech recognition engine matches at least a portion of the target context free grammar. | 03-12-2009 |
20090082037 | PERSONAL POINTS OF INTEREST IN LOCATION-BASED APPLICATIONS - Framework for receiving, processing, and re-using personal points of interest (PPOI) information of a user in a location-based application. A telephone dialog system provides location-based information related to the PPOI of a user. For example, the PPOI information can include major intersections that the user may normally travel, gas stations, clubs, etc., based on real-time data obtained via web services. The PPOI information can be acquired using common names and nicknames, which are added into the system lexicon and recognition grammars. Each PPOI is also tagged to the user (or “owner”) who defined it. The PPOI information can also be shared to support a community of users. The framework also resolves conflicting PPOI information between multiple users and multiple locations. PPOI information input by one user can be used to extract demographic information and personal preferences, and can be re-used by other users by automatically popping up the common names and attributes other users entered for the same nickname. | 03-26-2009 |
20090100340 | ASSOCIATIVE INTERFACE FOR PERSONALIZING VOICE DATA ACCESS - The claimed subject matter according to one aspect provides systems and/or methods that effectuate user development, customization, or utilization of dynamically configurable dialogue flow systems. The system can include devices and components that employ data associated with a user to retrieve navigation panes unique to the user, scan the navigation panes to identify adjustable attributes, and utilize the adjustable attributes to generate voice prompts communicated to the user via handheld devices; the user, in reply to the voice prompts, utters personalized responses, and based at least on the personalized responses the system initiates actions associated with the adjustable attributes. | 04-16-2009 |
20090150341 | GENERATION OF ALTERNATIVE PHRASINGS FOR SHORT DESCRIPTIONS - The claimed subject matter provides systems and/or methods that effectuate generation of alternative expressions or phrasings for short descriptions, proper nouns or places. The system can include devices that select an item and associate it with a prompt, display the selected item and then obscure it with the associated prompt, and elicit a response from users to the prompt based on a motivational statement constructed to solicit an appropriate response. The response elicited from the user and the selected item are associated with one another and then persisted to storage media. | 06-11-2009 |
20090248422 | INTRA-LANGUAGE STATISTICAL MACHINE TRANSLATION - Training data may be provided, the training data including pairs of source phrases and target phrases. The pairs may be used to train an intra-language statistical machine translation model, where the intra-language statistical machine translation model, when given an input phrase of text in the human language, can compute probabilities of semantic equivalence of the input phrase to possible translations of the input phrase in the human language. The statistical machine translation model may be used to translate between queries and listings. The queries may be text strings in the human language submitted to a search engine. The listing strings may be text strings of formal names of real world entities that are to be searched by the search engine to find matches for the query strings. | 10-01-2009 |
20090287626 | MULTI-MODAL QUERY GENERATION - A multi-modal search system (and corresponding methodology) is provided. The system employs text, speech, touch and gesture input to establish a search query. Additionally, a subset of the modalities can be used to obtain search results based upon exact or approximate matches to a search result. For example, wildcards, which can either be triggered by the user or inferred by the system, can be employed in the search. | 11-19-2009 |
20090287680 | MULTI-MODAL QUERY REFINEMENT - A multi-modal search query refinement system (and corresponding methodology) is provided. In accordance with the innovation, query suggestion results represent a word palette which can be used to select strings for inclusion or exclusion from a refined set of results. The system employs text, speech, touch and gesture input to refine a set of search query results. Wildcards can be employed in the search either prompted by the user or inferred by the system. Additionally, partial knowledge supplemented by speech can be employed to refine search results. | 11-19-2009 |
20090287681 | MULTI-MODAL SEARCH WILDCARDS - A multi-modal search system (and corresponding methodology) that employs wildcards is provided. Wildcards can be employed in the search query either initiated by the user or inferred by the system. These wildcards can represent uncertainty conveyed by a user in a multi-modal search query input. In examples, the words “something” or “whatchamacallit” can be used to convey uncertainty and partial knowledge about portions of the query and to dynamically trigger wildcard generation. | 11-19-2009 |
20100076752 | Automated Data Cleanup - The described implementations relate to automated data cleanup. One system includes a language model generated from language model seed text and a dictionary of possible data substitutions. This system also includes a transducer configured to cleanse a corpus utilizing the language model and the dictionary. | 03-25-2010 |
20100100384 | Speech Recognition System with Display Information - A language processing system may determine a display form of a spoken word by analyzing the spoken form using a language model that includes dictionary entries for display forms of homonyms. The homonyms may include trade names as well as given names and other phrases. The language processing system may receive spoken language and produce a display form of the language while displaying the proper form of the homonym. Such a system may be used in search systems where audio input is converted to a graphical display of a portion of the spoken input. | 04-22-2010 |
20100145694 | REPLYING TO TEXT MESSAGES VIA AUTOMATED VOICE SEARCH TECHNIQUES - An automated “Voice Search Message Service” provides a voice-based user interface for generating text messages from an arbitrary speech input. Specifically, the Voice Search Message Service provides a voice-search information retrieval process that evaluates user speech inputs to select one or more probabilistic matches from a database of pre-defined or user-defined text messages. These probabilistic matches are also optionally sorted in terms of relevancy. A single text message from the probabilistic matches is then selected and automatically transmitted to one or more intended recipients. Optionally, one or more of the probabilistic matches are presented to the user for confirmation or selection prior to transmission. Correction or recovery of speech recognition errors is avoided, since the probabilistic matches are intended to paraphrase the user speech input rather than exactly reproduce that speech, though exact matches are possible. Consequently, potential distractions to the user are significantly reduced relative to conventional speech recognition techniques. | 06-10-2010 |
20110137639 | ADAPTING A LANGUAGE MODEL TO ACCOMMODATE INPUTS NOT FOUND IN A DIRECTORY ASSISTANCE LISTING - A statistical language model is trained for use in a directory assistance system using the data in a directory assistance listing corpus. Calculations are made to determine how important words in the corpus are in distinguishing a listing from other listings, and how likely words are to be omitted or added by a user. The language model is trained using these calculations. | 06-09-2011 |
20110238414 | TELEPHONY SERVICE INTERACTION MANAGEMENT - A method for managing an interaction of a calling party to a communication partner is provided. The method includes automatically determining if the communication partner expects DTMF input. The method also includes translating speech input to one or more DTMF tones and communicating the one or more DTMF tones to the communication partner, if the communication partner expects DTMF input. | 09-29-2011 |
20110307252 | Using Utterance Classification in Telephony and Speech Recognition Applications - Described is the use of utterance classification based methods and other machine learning techniques to provide a telephony application or other voice menu application (e.g., an automotive application) that need not use Context-Free-Grammars to determine a user's spoken intent. A classifier receives text from an information retrieval-based speech recognizer and outputs a semantic label corresponding to the likely intent of a user's speech. The semantic label is then output, such as for use by a voice menu program in branching between menus. Also described is training, including training the language model from acoustic data without transcriptions, and training the classifier from speech-recognized acoustic data having associated semantic labels. | 12-15-2011 |
20110314003 | TEMPLATE CONCATENATION FOR CAPTURING MULTIPLE CONCEPTS IN A VOICE QUERY - Architecture that provides the capability to identify which parts (terms and phrases) of a voice query have been covered by predefined phrase templates, and then to concatenate matching phrase templates into a new paraphrased query. A match-drop-continue algorithm is disclosed that progressively masks out the portions (phrases, terms) of the query matched to the phrase templates. Ultimately, the matched phrase templates are accumulated and organized together dynamically into a rephrased version of the original voice query. A user interface is provided that allows the user to confirm/summarize the multiple concepts in a progressive manner. | 12-22-2011 |
20120109994 | ROBUST AUTO-CORRECTION FOR DATA RETRIEVAL - A data-retrieval method for use on a portable electronic device. The method comprises receiving a query string at a user interface of the device and displaying one or more index strings on the user interface such that the relative prominence of each index string displayed increases with increasing resemblance of that index string to the query string. The method further comprises displaying an index string with greater prominence when a fixed-length substring of the query string occurs anywhere in the index string, regardless of position. In this manner, the relevance of prominently displayed index strings increases as more characters are appended to the query string, even if the query string contains errors. | 05-03-2012 |
20120166196 | Word-Dependent Language Model - This document describes word-dependent language models, as well as their creation and use. A word-dependent language model can permit a speech-recognition engine to accurately verify that a speech utterance matches a multi-word phrase. This is useful in many contexts, including those where one or more letters of the expected phrase are known to the speaker. | 06-28-2012 |
20120185252 | CONFIDENCE MEASURE GENERATION FOR SPEECH RELATED SEARCHING - A method of generating a confidence measure generator is provided for use in a voice search system, the voice search system including voice search components comprising a speech recognition system, a dialog manager and a search system. The method includes selecting voice search features, from a plurality of the voice search components, to be considered by the confidence measure generator in generating a voice search confidence measure. The method includes training a model, using a computer processor, to generate the voice search confidence measure based on selected voice search features. | 07-19-2012 |
20120323967 | Spelling Using a Fuzzy Pattern Search - A multimedia system configured to receive user input in the form of a spelled character sequence is provided. In one implementation, a spell mode is initiated, and a user spells a character sequence. The multimedia system performs spelling recognition and recognizes a sequence of character representations having a possible ambiguity resulting from any user and/or system errors. The sequence of character representations with the possible ambiguity yields multiple search keys. The multimedia system performs a fuzzy pattern search by scoring each target item from a finite dataset of target items based on the multiple search keys. One or more relevant items are ranked and presented to the user for selection, each relevant item being a target item that exceeds a relevancy threshold. The user selects the intended character sequence from the one or more relevant items. | 12-20-2012 |
20130090921 | PRONUNCIATION LEARNING FROM USER CORRECTION - Systems and methods are described for adding entries to a custom lexicon used by a speech recognition engine of a speech interface in response to user interaction with the speech interface. In one embodiment, a speech signal is obtained when the user speaks a name of a particular item to be selected from among a finite set of items. If a phonetic description of the speech signal is not recognized by the speech recognition engine, then the user is presented with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item. After the user has selected the particular item via the means for selecting, the phonetic description of the speech signal is stored in association with a text description of the particular item in the custom lexicon. | 04-11-2013 |
20130159000 | Spoken Utterance Classification Training for a Speech Recognition System - The subject disclosure is directed towards training a classifier for spoken utterances without relying on human-assistance. The spoken utterances may be related to a voice menu program for which a speech comprehension component interprets the spoken utterances into voice menu options. The speech comprehension component provides confirmations to some of the spoken utterances in order to accurately assign a semantic label. For each spoken utterance with a denied confirmation, the speech comprehension component automatically generates a pseudo-semantic label that is consistent with the denied confirmation and selected from a set of potential semantic labels and updates a classification model associated with the classifier using the pseudo-semantic label. | 06-20-2013 |
20130262114 | Crowdsourced, Grounded Language for Intent Modeling in Conversational Interfaces - Different advantageous embodiments provide a crowdsourcing method for modeling user intent in conversational interfaces. One or more stimuli are presented to a plurality of describers. One or more sets of describer data are captured from the plurality of describers using a data collection mechanism. The one or more sets of describer data are processed to generate one or more models. Each of the one or more models is associated with a specific stimulus from the one or more stimuli. | 10-03-2013 |
20140244254 | FACILITATING DEVELOPMENT OF A SPOKEN NATURAL LANGUAGE INTERFACE - A development system is described for facilitating the development of a spoken natural language (SNL) interface. The development system receives seed templates from a developer, each of which provides a command phrasing that can be used to invoke a function, when spoken by an end user. The development system then uses one or more development resources, such as a crowdsourcing system and a paraphrasing system, to provide additional templates. This yields an extended set of templates. A generation system then generates one or more models based on the extended set of templates. A user device may install the model(s) for use in interpreting commands spoken by an end user. When the user device recognizes a command, it may automatically invoke a function associated with that command. Overall, the development system provides an easy-to-use tool for producing an SNL interface. | 08-28-2014 |
20140350928 | Method For Finding Elements In A Webpage Suitable For Use In A Voice User Interface - A voice interface for web pages or other documents identifies interactive elements such as links, obtains one or more phrases of each interactive element, such as link text, title text and alternative text for images, and adds the phrases to a grammar which is used for speech recognition. A click event is generated for an interactive element having a phrase which is a best match for the voice command of a user. In one aspect, the phrases of currently-displayed elements of the document are used for speech recognition. In another aspect, phrases which are not displayed, such as title text and alternative text for images, are used in the grammar. In another aspect, updates to the document are detected and the grammar is updated accordingly so that the grammar is synchronized with the current state of the document. | 11-27-2014 |
20140350941 | Method For Finding Elements In A Webpage Suitable For Use In A Voice User Interface (Disambiguation) - A disambiguation process for a voice interface for web pages or other documents. The process identifies interactive elements such as links, obtains one or more phrases of each interactive element, such as link text, title text and alternative text for images, and adds the phrases to a grammar which is used for speech recognition. A group of interactive elements are identified as potential best matches to a voice command when there is no single, clear best match. The disambiguation process modifies a display of the document to provide unique labels for each interactive element in the group, and the user is prompted to provide a subsequent spoken command identifying one of the unique labels. The selected unique label is identified and a click event is generated for the corresponding interactive element. | 11-27-2014 |
20150019216 | PERFORMING AN OPERATION RELATIVE TO TABULAR DATA BASED UPON VOICE INPUT - Described herein are various technologies pertaining to performing an operation relative to tabular data based upon voice input. An ASR system includes a language model that is customized based upon content of the tabular data. The ASR system receives a voice signal that is representative of speech of a user. The ASR system creates a transcription of the voice signal based upon the ASR being customized with the content of the tabular data. The operation relative to the tabular data is performed based upon the transcription of the voice signal. | 01-15-2015 |
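Application 20080281806 above describes searching a database of short listings, rather than long documents, with a term frequency-inverse document frequency (Tf/Idf) algorithm. A minimal sketch of the core idea follows; the listing data, tokenizer, and scoring details are illustrative assumptions, not the application's actual method.

```python
import math
from collections import Counter

# Hypothetical directory listings; a real system would index millions.
listings = [
    "Joe's Pizza and Pasta",
    "Main Street Pizza",
    "Pasta Palace",
]

def tokenize(text):
    return text.lower().replace("'", "").split()

# Inverse document frequency: terms appearing in few listings carry
# more weight, since rare terms best distinguish one listing from another.
doc_freq = Counter()
for listing in listings:
    doc_freq.update(set(tokenize(listing)))
idf = {t: math.log(len(listings) / df) for t, df in doc_freq.items()}

def score(query, listing):
    # Tf/Idf score: term frequency in the listing times inverse
    # document frequency across the whole corpus.
    tf = Counter(tokenize(listing))
    return sum(tf[t] * idf.get(t, 0.0) for t in tokenize(query))

def search(query):
    return max(listings, key=lambda l: score(query, l))
```

Because listings are short, term frequencies are mostly 0 or 1, so the idf weights dominate the ranking.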
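Applications 20090287626, 20090287680 and 20090287681 above describe wildcards in multi-modal search queries, triggered when a speaker conveys uncertainty with words such as "something" or "whatchamacallit". A minimal sketch of that triggering idea, assuming a regex-based matcher and an illustrative list of uncertainty words:

```python
import re

# Words a speaker might use to signal partial knowledge; this set is
# an illustrative assumption, not taken from the applications.
UNCERTAINTY_WORDS = {"something", "whatchamacallit", "whatever"}

def to_wildcard_pattern(spoken_query):
    """Replace each uncertainty word with a wildcard matching any single word."""
    parts = []
    for word in spoken_query.lower().split():
        if word in UNCERTAINTY_WORDS:
            parts.append(r"\w+")          # dynamically generated wildcard
        else:
            parts.append(re.escape(word))
    return re.compile(r"^" + r"\s+".join(parts) + r"$")

def match(spoken_query, candidates):
    """Return the candidates consistent with the (possibly wildcarded) query."""
    pattern = to_wildcard_pattern(spoken_query)
    return [c for c in candidates if pattern.match(c.lower())]
```

For example, the spoken query "harry something hotel" becomes the pattern `^harry\s+\w+\s+hotel$`, which accepts "Harry Potter Hotel" but rejects "Harry Hotel".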
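Application 20120323967 above describes scoring target items against a spelled character sequence whose recognition may be ambiguous. A minimal sketch of that kind of fuzzy scoring, assuming a hand-built table of acoustically confusable letters (the confusion sets and threshold are illustrative, not from the application):

```python
# Letters that are easy to confuse when spelled aloud (e.g. "b"/"d").
# These sets are illustrative assumptions.
CONFUSABLE = {
    "b": {"d", "e", "p", "v"},
    "d": {"b", "e", "p", "v"},
    "m": {"n"},
    "n": {"m"},
}

def score(spelled, target):
    """Score a target against the recognized character sequence,
    giving partial credit for confusable-letter substitutions."""
    if len(spelled) != len(target):
        return 0.0
    hits = 0.0
    for s, t in zip(spelled.lower(), target.lower()):
        if s == t:
            hits += 1.0
        elif t in CONFUSABLE.get(s, set()):
            hits += 0.5          # plausible misrecognition, partial credit
    return hits / len(target)

def search(spelled, dataset, threshold=0.6):
    """Rank dataset items by score and keep those above the threshold."""
    ranked = sorted(((score(spelled, t), t) for t in dataset), reverse=True)
    return [t for s, t in ranked if s >= threshold]
```

So if "bob" is misrecognized as "dob", "bob" still ranks first because the b/d substitution earns partial credit, while unrelated items fall below the relevancy threshold.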