| Nuance Communications, Inc. Patent applications |
| Patent application number | Title | Published |
| 20120130718 | METHOD AND SYSTEM FOR COLLECTING AUDIO PROMPTS IN A DYNAMICALLY GENERATED VOICE APPLICATION - A prompt collecting tool ( | 05-24-2012 |
| 20120123777 | ADJUSTING A SPEECH ENGINE FOR A MOBILE COMPUTING DEVICE BASED ON BACKGROUND NOISE - Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates. | 05-17-2012 |
| 20120123776 | ADJUSTING A SPEECH ENGINE FOR A MOBILE COMPUTING DEVICE BASED ON BACKGROUND NOISE - Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates. | 05-17-2012 |
| 20120109647 | System Enhancement of Speech Signals - A system enhances speech by detecting a speaker's utterance through a first microphone positioned a first distance from a source of interference. A second microphone may detect the speaker's utterance at a different position. A monitoring device may estimate the power level of a first microphone signal. A synthesizer may synthesize part of the first microphone signal by processing the second microphone signal. The synthesis may occur when power level is below a predetermined level. | 05-03-2012 |
| 20120106749 | Microphone Non-Uniformity Compensation System - A microphone compensation system responds to changes in the characteristics of individual microphones in an array of microphones. The microphone compensation system provides a communication system with consistent performance despite microphone aging, widely varying environmental conditions, and other factors that alter the characteristics of the microphones. Furthermore, lengthy, complex, and costly measurement and analysis phases for determining initial settings for filters in the communication system are eliminated. | 05-03-2012 |
| 20120096108 | MANAGING APPLICATION INTERACTIONS USING DISTRIBUTED MODALITY COMPONENTS - A method for managing multimodal interactions can include the step of registering a multitude of modality components with a modality component server, wherein each modality component handles an interface modality for an application. The modality component can be connected to a device. A user interaction can be conveyed from the device to the modality component for processing. Results from the user interaction can be placed on a shared memory area of the modality component server. | 04-19-2012 |
| 20120095765 | AUTOMATICALLY PROVIDING A USER WITH SUBSTITUTES FOR POTENTIALLY AMBIGUOUS USER-DEFINED SPEECH COMMANDS - A method for alleviating ambiguity issues of new user-defined speech commands. An original command for a user-defined speech command can be received. It can then be determined if the original command is likely to be confused with a set of existing speech commands. When confusion is unlikely, the original command can be automatically stored. When confusion is likely, a substitute command that is unlikely to be confused with existing commands can be automatically determined. The substitute can be presented as an alternative to the original command and can be selectively stored as the user-defined speech command. | 04-19-2012 |
| 20120095676 | ON DEMAND TTS VOCABULARY FOR A TELEMATICS SYSTEM - A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system (“GPS”) receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory. | 04-19-2012 |
| 20120089399 | Voice Over Short Messaging Service - A method of operating a mobile communication device is described. A text message is received over a wireless messaging channel, wherein the text message contains a non-text representation of an utterance. The non-text representation is extracted from the text message, and an audio representation of the spoken utterance is synthesized from the non-text representation. | 04-12-2012 |
| 20120078834 | Sparse Representations for Text Classification - A sparse representation method of text classification is described. An input text document is represented as a document feature vector y. A category dictionary H provides possible examples [h | 03-29-2012 |
| 20120072211 | USING CODEC PARAMETERS FOR ENDPOINT DETECTION IN SPEECH RECOGNITION - Systems, methods and apparatus for determining an estimated endpoint of human speech in a sound wave received by a mobile device having a speech encoder for encoding the sound wave to produce an encoded representation of the sound wave. The estimated endpoint may be determined by analyzing information available from the speech encoder, without analyzing the sound wave directly and without producing a decoded representation of the sound wave. The encoded representation of the sound wave may be transmitted to a remote server for speech recognition processing, along with an indication of the estimated endpoint. | 03-22-2012 |
| 20120065982 | DYNAMICALLY GENERATING A VOCAL HELP PROMPT IN A MULTIMODAL APPLICATION - Dynamically generating a vocal help prompt in a multimodal application, including detecting a help-triggering event for an input element of a VoiceXML dialog, where the detecting is implemented with a multimodal application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application is operatively coupled to a VoiceXML interpreter, and the multimodal application has no static help text. Dynamically generating a vocal help prompt in a multimodal application according to embodiments of the present invention typically also includes retrieving, by the VoiceXML interpreter from a source of help text, help text for an element of a speech recognition grammar, forming by the VoiceXML interpreter the help text into a vocal help prompt, and presenting by the multimodal application the vocal help prompt through a computer user interface to a user. | 03-15-2012 |
| 20120060113 | METHODS AND APPARATUS FOR DISPLAYING CONTENT - Some embodiments relate to using a carousel to display content. In some embodiments, a carousel having a plurality of slots may be displayed in a first portion of a display of a display device, and in response to user selection of one of the plurality of slots, content that is dynamically generated based on user input may be displayed in a second portion of the display, separate from the first portion. | 03-08-2012 |
| 20120059814 | METHODS AND APPARATUS FOR SELECTING A SEARCH ENGINE TO WHICH TO PROVIDE A SEARCH QUERY - Some embodiments relate to a method of performing a search for content on the Internet, in which a user may issue a search query, and the search engine or engines to which that query is provided may be determined dynamically based on any of a variety of factors. For example, in some embodiments, the search engine or engines to which the query is provided may be determined based on the content of the search query, the historical access patterns of the user that issued the query, or the historical access patterns of other users. | 03-08-2012 |
| 20120059813 | METHODS AND APPARATUS FOR SEARCHING THE INTERNET - Some embodiments relate to performing a search for content via the Internet, wherein user input specifying a search query is supplied to a mobile communications device, such as, for example, a smartphone. The mobile communications device separately issues the search query to a plurality of search engines and can receive the results from each search engine and display the results to the user. Thus, the user does not have to separately issue the query to each of the plurality of search engines. | 03-08-2012 |
| 20120059810 | METHOD AND APPARATUS FOR PROCESSING SPOKEN SEARCH QUERIES - Some embodiments relate to a method of performing a search for content on the Internet, in which a user may speak a search query and speech recognition may be performed on the spoken query to generate a text search query to be provided to a plurality of search engines. This enables a user to speak the search query rather than having to type it, and also allows the user to provide the search query only once, rather than having to provide it separately to multiple different search engines. | 03-08-2012 |
| 20120059658 | METHODS AND APPARATUS FOR PERFORMING AN INTERNET SEARCH - Embodiments of the present invention relate to searching for content on the Internet. A user may supply a search query to a device, and the device may issue the search query to a plurality of search engines, including at least one general purpose search engine and at least one site-specific search engine. In this way, the user need not separately issue search queries to each of the plurality of search engines. | 03-08-2012 |
| 20120059655 | METHODS AND APPARATUS FOR PROVIDING INPUT TO A SPEECH-ENABLED APPLICATION PROGRAM - Some embodiments are directed to allowing a user to provide speech input intended for a speech-enabled application program into a mobile communications device, such as a smartphone, that is not connected to the computer that executes the speech-enabled application program. The mobile communications device may provide the user's speech input as audio data to a broker application executing on a server, which determines to which computer the received audio data is to be provided. When the broker application determines the computer to which the audio data is to be provided, it sends the audio data to that computer. In some embodiments, automated speech recognition may be performed on the audio data before it is provided to the computer. In such embodiments, instead of providing the audio data, the broker application may send the recognition result generated from performing automated speech recognition to the identified computer. | 03-08-2012 |
| 20120046953 | ESTABLISHING A MULTIMODAL PERSONALITY FOR A MULTIMODAL APPLICATION - Methods, apparatus, and computer program products are described for establishing a multimodal personality for a multimodal application that include selecting, by the multimodal application, matching vocal and visual demeanors and incorporating, by the multimodal application, the matching vocal and visual demeanors as a multimodal personality into the multimodal application. | 02-23-2012 |
| 20120046951 | NUMERIC WEIGHTING OF ERROR RECOVERY PROMPTS FOR TRANSFER TO A HUMAN AGENT FROM AN AUTOMATED SPEECH RESPONSE SYSTEM - A method for a speech response system to automatically transfer users to human agents. The method can establish an interactive dialog session between a user and an automated speech response system. An error score can be established when the interactive dialog session is initiated. During the interactive dialog session, responses to dialog prompts can be received. Error weights can be assigned to received responses determined to be non-valid responses. Different non-valid responses can be assigned different error weights. For each non-valid response, the assigned error weight can be added to the error score. When a value of the error score exceeds a previously established error threshold, a user can be automatically transferred from the automated speech response system to a human agent. | 02-23-2012 |
| 20120046950 | RETRIEVAL AND PRESENTATION OF NETWORK SERVICE RESULTS FOR MOBILE DEVICE USING A MULTIMODAL BROWSER - A method of obtaining information using a mobile device can include receiving a request including speech data from the mobile device, and querying a network service using query information extracted from the speech data, whereby search results are received from the network service. The search results can be formatted for presentation on a display of the mobile device. The search results further can be sent, along with a voice grammar generated from the search results, to the mobile device. The mobile device then can render the search results. | 02-23-2012 |
| 20120046945 | MULTIMODAL AGGREGATING UNIT - In a voice processing system, a multimodal request is received from a plurality of modality input devices, and the requested application is run to provide a user with the feedback of the multimodal request. In the voice processing system, a multimodal aggregating unit is provided which receives a multimodal input from a plurality of modality input devices, and provides an aggregated result to an application control based on the interpretation of the interaction ergonomics of the multimodal input within the temporal constraints of the multimodal input. Thus, the multimodal input from the user is recognized within a temporal window. Interpretation of the interaction ergonomics of the multimodal input includes interpretation of interaction biometrics and interaction mechani-metrics, wherein the interaction input of at least one modality may be used to bring meaning to at least one other input of another modality. | 02-23-2012 |
| 20120044183 | MULTIMODAL AGGREGATING UNIT - In a voice processing system, a multimodal request is received from a plurality of modality input devices, and the requested application is run to provide a user with the feedback of the multimodal request. In the voice processing system, a multimodal aggregating unit is provided which receives a multimodal input from a plurality of modality input devices, and provides an aggregated result to an application control based on the interpretation of the interaction ergonomics of the multimodal input within the temporal constraints of the multimodal input. Thus, the multimodal input from the user is recognized within a temporal window. Interpretation of the interaction ergonomics of the multimodal input includes interpretation of interaction biometrics and interaction mechani-metrics, wherein the interaction input of at least one modality may be used to bring meaning to at least one other input of another modality. | 02-23-2012 |
| 20120041758 | SYNCHRONIZATION OF AN INPUT TEXT OF A SPEECH WITH A RECORDING OF THE SPEECH - A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data including a ratio between the respective pronunciation times of words included in the received text in the generated synthetic speech is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted in a recording medium and/or displayed on a display device. | 02-16-2012 |
| 20120035928 | SPEAKER ADAPTATION OF VOCABULARY FOR SPEECH RECOGNITION - A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed. | 02-09-2012 |
| 20120029921 | SPEECH RECOGNITION SYSTEM AND METHOD - According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface is facilitated to implement a hands free, voice driven environment to control processes and applications. A natural language model is used to parse voice initiated commands and data, and to route those voice initiated inputs to the required applications or processes. The use of an intelligent context based parser allows the system to intelligently determine what processes are required to complete a task which is initiated using natural language. A single window environment provides an interface which is comfortable to the user by preventing distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas. All activated processes are executed simultaneously to provide true multitasking. | 02-02-2012 |
| 20120022875 | SYNCHRONIZING VISUAL AND SPEECH EVENTS IN A MULTIMODAL APPLICATION - Exemplary methods, systems, and products are disclosed for synchronizing visual and speech events in a multimodal application, including receiving from a user speech; determining a semantic interpretation of the speech; calling a global application update handler; identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation; and executing the additional function. Typical embodiments may include updating a visual element after executing the additional function. Typical embodiments may include updating a voice form after executing the additional function. Typical embodiments also may include updating a state table after updating the voice form. Typical embodiments also may include restarting the voice form after executing the additional function. | 01-26-2012 |
| 20120011443 | ENABLING SPEECH WITHIN A MULTIMODAL PROGRAM USING MARKUP - A method for speech enabling an application can include the step of specifying a speech input within a speech-enabled markup. The speech-enabled markup can also specify an application operation that is to be executed responsive to the detection of the speech input. After the speech input has been defined within the speech-enabled markup, the application can be instantiated. The specified speech input can then be detected and the application operation can be responsively executed in accordance with the specified speech-enabled markup. | 01-12-2012 |
| 20120009878 | Vehicle Communication System - A vehicle communication system detects the presence of a passenger wearable communication device. The system receives audio signals from multiple sources inside or outside of a vehicle. The system processes the signals before routing the signals to multiple destinations. The destinations may include wearable personal communication devices, front and/or rear speakers, and/or a remote mobile device. | 01-12-2012 |
| 20120004912 | METHOD AND SYSTEM FOR USING INPUT SIGNAL QUALITY IN SPEECH RECOGNITION - A method and system for using input signal quality in an automatic speech recognition system. The method includes measuring the quality of an input signal into a speech recognition system and varying a rejection threshold of the speech recognition system at runtime in dependence on the measurement of the input signal quality. If the measurement of the input signal quality is low, the rejection threshold is reduced and, if the measurement of the input signal quality is high, the rejection threshold is increased. The measurement of the input signal quality may be based on one or more of the measurements of signal-to-noise ratio, loudness, including clipping, and speech signal duration. | 01-05-2012 |
| 20110320203 | METHOD AND SYSTEM FOR IDENTIFYING AND CORRECTING ACCENT-INDUCED SPEECH RECOGNITION DIFFICULTIES - A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one distinct-language lexicon model. | 12-29-2011 |
| 20110282672 | DISTRIBUTED VOICE BROWSER - The present invention can include a method of call processing using a distributed voice browser including allocating a plurality of service processors configured to interpret parsed voice markup language data and allocating a plurality of voice markup language parsers configured to retrieve and parse voice markup language data representing a telephony service. The plurality of service processors and the plurality of markup language parsers can be registered with one or more session managers. Accordingly, components of received telephony service requests can be distributed to the voice markup language parsers and the parsed voice markup language data can be distributed to the service processors. | 11-17-2011 |
| 20110276595 | HANDS FREE CONTACT DATABASE INFORMATION ENTRY AT A COMMUNICATION DEVICE - A method, system, and program provides for hands free contact database information entry at a communication device. A recording system at a communication device detects a user initiation to record. Responsive to detecting the user initiation to record, the recording system records the ongoing conversation supported between the communication device and a second remote communication device. The recording system converts the recording of the conversation into text. Next, the recording system extracts contact information from the text. Then, the recording system stores the extracted contact information in an entry of the contact database, such that contact information is added to the contact database of the communication device without manual entry of the contact information by the user. | 11-10-2011 |
| 20110270613 | INFERRING SWITCHING CONDITIONS FOR SWITCHING BETWEEN MODALITIES IN A SPEECH APPLICATION ENVIRONMENT EXTENDED FOR INTERACTIVE TEXT EXCHANGES - The disclosed solution includes a method for dynamically switching modalities based upon inferred conditions in a dialogue session involving a speech application. The method establishes a dialogue session between a user and the speech application. During the dialogue session, the user interacts using an original modality and a second modality. The speech application interacts using a speech modality only. A set of conditions indicative of interaction problems using the original modality can be inferred. Responsive to the inferring step, the original modality can be changed to the second modality. A modality transition to the second modality can be transparent to the speech application and can occur without interrupting the dialogue session. The original modality and the second modality can be different modalities; one including a text exchange modality and another including a speech modality. | 11-03-2011 |
| 20110255671 | PROVIDING CONTEXTUAL INFORMATION FOR SPOKEN INFORMATION - Techniques are described for providing relevant information to users (e.g., information that is at least potentially of interest to the users). Relevant information for a user may be automatically determined based on a determined context of the user and/or on a request for that information from the user. For example, voice-based information may be obtained from a user in one or more ways, and then analyzed to identify requests or other indications of information of interest and/or to otherwise determine a context of the user that corresponds to potential information of interest. Relevant information for a user may be provided to the user in various ways, such as via a voice-based response during a telephone call and/or via one or more electronic messages sent to the user (e.g., via emails, instant messages, paging messages, SMS or other text messages, etc.). | 10-20-2011 |
| 20110246197 | METHOD, APPARATUS, AND PROGRAM FOR CERTIFYING A VOICE PROFILE WHEN TRANSMITTING TEXT MESSAGES FOR SYNTHESIZED SPEECH - A mechanism is provided for authenticating and using a personal voice profile. The voice profile may be issued by a trusted third party, such as a certification authority. The personal voice profile may include information for generating a digest or digital signature for text messages. A speech synthesis system may speak the text message using the voice characteristics, such as prosodic characteristics, only if the voice profile is authenticated and the text message is valid and free of tampering. | 10-06-2011 |
| 20110231189 | METHODS AND APPARATUS FOR EXTRACTING ALTERNATE MEDIA TITLES TO FACILITATE SPEECH RECOGNITION - Techniques for generating a set of one or more alternate titles associated with stored digital media content and updating a speech recognition system to enable the speech recognition system to recognize the set of alternate titles. The system operates on an original media title to extract a set of alternate media titles by applying at least one rule to the original title. The extracted set of alternate media titles is used to update the speech recognition system prior to runtime. In one aspect, rules that are applied to original titles are determined by analyzing a corpus of original titles and corresponding possible alternate media titles that a user may use to refer to the original titles. | 09-22-2011 |
| 20110218806 | DETERMINING TEXT TO SPEECH PRONUNCIATION BASED ON AN UTTERANCE FROM A USER - Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words. | 09-08-2011 |
| 20110202349 | ESTABLISHING A MULTIMODAL ADVERTISING PERSONALITY FOR A SPONSOR OF A MULTIMODAL APPLICATION - Establishing a multimodal advertising personality for a sponsor of a multimodal application, including associating one or more vocal demeanors with a sponsor of a multimodal application and presenting a speech portion of the multimodal application for the sponsor using at least one of the vocal demeanors associated with the sponsor. | 08-18-2011 |
| 20110202346 | METHOD AND APPARATUS FOR GENERATING SYNTHETIC SPEECH WITH CONTRASTIVE STRESS - Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings. | 08-18-2011 |
| 20110202345 | METHOD AND APPARATUS FOR GENERATING SYNTHETIC SPEECH WITH CONTRASTIVE STRESS - Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings. | 08-18-2011 |
| 20110179032 | CONCEPTUAL WORLD REPRESENTATION NATURAL LANGUAGE UNDERSTANDING SYSTEM AND METHOD - A Natural Language Understanding system is provided for indexing of free text documents. The system according to the invention utilizes typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language independent domain knowledge concepts and language dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method. | 07-21-2011 |
| 20110161079 | Grammar and Template-Based Speech Recognition of Spoken Utterances - The present invention relates to a communication system, comprising a database including classes of speech templates, in particular, classified according to a predetermined grammar; an input configured to receive and to digitize speech signals corresponding to a spoken utterance; a speech recognizer configured to receive and recognize the digitized speech signals; and wherein the speech recognizer is configured to recognize the digitized speech signals based on speech templates stored in the database and a predetermined grammatical structure. | 06-30-2011 |
| 20110093868 | APPLICATION MODULE FOR MANAGING INTERACTIONS OF DISTRIBUTED MODALITY COMPONENTS - A method for managing application modalities using dialogue states can include the step of asserting a set of activation conditions associated with a dialogue state of an application. Each of the activation conditions can be linked to at least one programmatic action, wherein different programmatic actions can be executed by different modality components. The application conditions can be monitored. An application event can be detected resulting in an associated application condition being run. At least one programmatic action linked to the application condition can be responsively initiated. | 04-21-2011 |
| 20110078254 | Method and System for the Conversion and Processing of Documents in a Hybrid Network Environment - A method of converting a document for a user. The method includes receiving the document in a first format from a first user device through a telecommunications network. The method also includes automatically producing a new version of the document upon receipt of the document. The new version of the document is in a second format, which is selected from a group including a plurality of formats distinct from the first format. | 03-31-2011 |
| 20110047452 | ENABLING GRAMMARS IN WEB PAGE FRAME - Enabling grammars in web page frames, including receiving, in a multimodal application on a multimodal device, a frameset document, where the frameset document includes markup defining web page frames; obtaining by the multimodal application content documents for display in each of the web page frames, where the content documents include navigable markup elements; generating by the multimodal application, for each navigable markup element in each content document, a segment of markup defining a speech recognition grammar, including inserting in each such grammar markup identifying content to be displayed when words in the grammar are matched and markup identifying a frame where the content is to be displayed; and enabling by the multimodal application all the generated grammars for speech recognition. | 02-24-2011 |
| 20110029316 | SPEECH RECOGNITION SYSTEM AND METHOD - According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface is facilitated to implement a hands free, voice driven environment to control processes and applications. A natural language model is used to parse voice initiated commands and data, and to route those voice initiated inputs to the required applications or processes. The use of an intelligent context based parser allows the system to intelligently determine what processes are required to complete a task which is initiated using natural language. A single window environment provides an interface which is comfortable to the user by preventing the occurrence of distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas. All activated processes are executed simultaneously to provide true multitasking. | 02-03-2011 |
| 20110026732 | System for Detecting and Reducing Noise via a Microphone Array - A system for detecting noise in a signal received by a microphone array and a method for detecting noise in a signal received by a microphone array is disclosed. The system also provides for the reduction of noise in a signal received by a microphone array and a method for reducing noise in a signal received by a microphone array. The signal to noise ratio in handsfree systems may be improved, particularly in handsfree systems present in a vehicular environment. | 02-03-2011 |
| 20110019835 | Speaker Localization - The present invention relates to a method for localizing a sound source, in particular, a human speaker, comprising detecting sound generated by the sound source by means of a microphone array comprising more than two microphones and obtaining microphone signals, one for each of the microphones, selecting from the microphone signals a pair of microphone signals for a predetermined frequency range based on the distance of the microphones to each other and estimating the angle of the incidence of the sound on the microphone array based on the selected pair of microphone signals. | 01-27-2011 |
| 20110002449 | VOICE BROWSER WITH INTEGRATED TCAP AND ISUP INTERFACES - A voice browser configured to process voice markup language documents can include a voice processing application and an integrated communications interface for interacting with a voice processing system. The voice browser can be configured to load the voice processing application independently of a received telephone call. The integrated communications interface can include at least one of an integrated transaction capabilities application part component for receiving a transaction capabilities application part query and an integrated ISUP component for receiving a telephony control signal. | 01-06-2011 |
| 20100324889 | ENABLING GLOBAL GRAMMARS FOR A PARTICULAR MULTIMODAL APPLICATION - Methods, apparatus, and computer program products are described for enabling global grammars for a particular multimodal application according to the present invention by loading a multimodal web page; determining whether the loaded multimodal web page is one of a plurality of multimodal web pages of the particular multimodal application. If the loaded multimodal web page is one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes loading any currently unloaded global grammars of the particular multimodal application identified in the multimodal web page and maintaining any previously loaded global grammars. If the loaded multimodal web page is not one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes unloading any currently loaded global grammars. | 12-23-2010 |
| 20100305947 | Speech Recognition Method for Selecting a Combination of List Elements via a Speech Input - The invention provides a speech recognition method for selecting a combination of list elements via a speech input, wherein a first list element of the combination is part of a first set of list elements and a second list element of the combination is part of a second set of list elements, the method comprising the steps of receiving the speech input, comparing each list element of the first set with the speech input to obtain a first candidate list of best matching list elements, processing the second set using the first candidate list to obtain a subset of the second set, comparing each list element of the subset of the second set with the speech input to obtain a second candidate list of best matching list elements, and selecting a combination of list elements using the first and the second candidate list. | 12-02-2010 |
| 20100298010 | METHOD AND APPARATUS FOR BACK-UP OF CUSTOMIZED APPLICATION INFORMATION - A method of operating a mobile communication device having a set of one or more applications, each with its own associated user-configurable customization, the method comprising detecting whether the user-configurable customization of any of the applications has changed since an earlier time, and for all applications for which the user-configurable customization has changed since said earlier time, wirelessly transmitting those changes to a remote server. The method further comprises maintaining a set of flags indicating whether changes have occurred to the user-configurable customization, wherein detecting whether the user-configurable customization of any of the applications has changed since said earlier time includes reading the set of flags. The remote server is one of a carrier server and a third party provider server. | 11-25-2010 |
| 20100293446 | METHOD AND APPARATUS FOR COUPLING A VISUAL BROWSER TO A VOICE BROWSER - A method and apparatus for concurrently accessing network-based electronic content in a Voice Browser and a Visual Browser can include the steps of retrieving a network-based document formatted for display in the Visual Browser; identifying in the retrieved document a reference to the Voice Browser, the reference specifying electronic content formatted for audible presentation in the Voice Browser; and, transmitting the reference to the Voice Browser. The Voice Browser can retrieve the specified electronic content and audibly present the electronic content. Concurrently, the Visual Browser can visually present the network-based document formatted for visual presentation in the Visual Browser. Likewise, the method of the invention can include the steps of retrieving a network-based document formatted for audible presentation in the Voice Browser; identifying in the retrieved document a reference to the Visual Browser, the reference specifying electronic content formatted for visual presentation in the Visual Browser; and, transmitting the reference to the Visual Browser. The Visual Browser can retrieve the specified electronic content and visually present the specified electronic content. Concurrently, the Voice Browser can audibly present the network-based document formatted for audible presentation in the Voice Browser. | 11-18-2010 |
| 20100286981 | Method for Estimating a Fundamental Frequency of a Speech Signal - The invention provides a method for estimating a fundamental frequency of a speech signal comprising the steps of receiving a signal spectrum of the speech signal, filtering the signal spectrum to obtain a refined signal spectrum, determining a cross-power spectral density using the refined signal spectrum and the signal spectrum, transforming the cross-power spectral density into the time domain to obtain a cross-correlation function, and estimating the fundamental frequency of the speech signal based on the cross-correlation function. | 11-11-2010 |
| 20100246851 | Method for Determining a Noise Reference Signal for Noise Compensation and/or Noise Reduction - The invention provides a method for determining a noise reference signal for noise compensation and/or noise reduction. A first audio signal on a first signal path and a second audio signal on a second signal path are received. The first audio signal is filtered using a first adaptive filter to obtain a first filtered audio signal. The second audio signal is filtered using a second adaptive filter to obtain a second filtered audio signal. The first and the second filtered audio signal are combined to obtain the noise reference signal. The first and the second adaptive filter are adapted such as to minimize a wanted signal component in the noise reference signal. | 09-30-2010 |
| 20100246844 | Method for Determining a Signal Component for Reducing Noise in an Input Signal - The invention provides a method for determining a signal component for reducing noise in an input signal, which comprises a noise component, comprising the steps of: estimating the noise component in the input signal, estimating a reverberation component in the noise component, and removing the estimated reverberation component from the estimated noise component to obtain a modified estimate of the noise component. | 09-30-2010 |
| 20100217589 | Method for Automated Training of a Plurality of Artificial Neural Networks - The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is then trained using the common phoneme label. | 08-26-2010 |
| 20100215184 | Method for Determining a Set of Filter Coefficients for an Acoustic Echo Compensator - The invention provides a method for determining a set of filter coefficients for an acoustic echo compensator in a beamformer arrangement. The acoustic echo compensator compensates for echoes within the beamformed signal. A plurality of sets of filter coefficients for the acoustic echo compensator is provided. Each set of filter coefficients corresponds to one of a predetermined number of steering directions of the beamformer arrangement. The predetermined number of steering directions is equal to or greater than the number of microphones in the microphone array. For a current steering direction, a current set of filter coefficients for the acoustic echo compensator is determined based on the provided sets of filter coefficients. | 08-26-2010 |
| 20100211390 | Speech Recognition of a List Entry - The present invention relates to a method of generating a candidate list from a list of entries in accordance with a string of subword units corresponding to a speech input in a speech recognition system, the list of entries including plural list entries each comprising at least one fragment having one or more subword units. For each list entry, the fragments of the list entry are compared with the string of subword units. A matching score for each of the compared fragments based on the comparison is determined. The matching score for a fragment is further based on a comparison of at least one other fragment of the same list entry with the string of subword units. A total score for each list entry is determined based on the matching scores for the compared fragments of the respective list entry. A candidate list with the best matching entries from the list of entries based on the total scores of the list entries is generated. | 08-19-2010 |
| 20100198598 | Speaker Recognition in a Speech Recognition System - A method for recognizing a speaker of an utterance in a speech recognition system is disclosed. A likelihood score for each of a plurality of speaker models for different speakers is determined. The likelihood score indicates how well the speaker model corresponds to the utterance. For each of the plurality of speaker models, a probability that the utterance originates from that speaker is determined. The probability is determined based on the likelihood score for the speaker model and requires the estimation of a distribution of likelihood scores expected based at least in part on the training state of the speaker. | 08-05-2010 |
| 20100179805 | METHOD, APPARATUS, AND COMPUTER PROGRAM PRODUCT FOR ONE-STEP CORRECTION OF VOICE INTERACTION - A one-step correction mechanism for voice interaction is provided. Correction of a previous state is enabled simultaneously with recognition in a current or subsequent state. An application is decomposed into a set of tasks. Each task is associated with the collection of one piece of information. Each task may be in a different state. At any point during the interaction, while a task/state pair is active, the dialog manager may enable multiple other task/state pairs to be active in latent fashion. The application developer may then use those facilities or resources to the active task/state and the latent task/state pairs depending on contextual condition of the interaction state of the application. | 07-15-2010 |
| 20100150375 | Determination of the Coherence of Audio Signals - Embodiments of the invention disclose computer-implemented methods, systems, and computer program products for estimating signal coherence. First, a sound generated by a sound source is detected by a first microphone to obtain a first microphone signal and by a second microphone to obtain a second microphone signal. The first microphone signal is filtered by a first adaptive finite impulse response filter to obtain a first filtered signal. The second microphone signal is filtered by a second adaptive finite impulse response filter, to obtain a second filtered signal. The coherence of the first filtered signal and the second filtered signal is determined based upon the filtered signals. The first and the second microphone signals are filtered such that the difference between the acoustic transfer function for the transfer of the sound from the sound source to the first microphone and the transfer of the sound from the sound source to the second microphone is compensated in the first and second filtered signals. | 06-17-2010 |
| 20100150364 | Method for Determining a Time Delay for Time Delay Compensation - The invention provides a computer-implemented method for determining a time delay for time delay compensation of a microphone signal from a microphone array in a beamformer arrangement. For a given time, an instantaneous estimate of a position of a wanted sound source and/or of a direction of arrival of a signal originating from the wanted sound source is determined. The computer system then determines whether the instantaneous estimate deviates from a preset estimate of a position of the wanted sound source and/or of a direction of arrival of a signal originating from the wanted sound source according to a predetermined criterion. The predetermined criterion comprises a check whether the instantaneous estimate deviates from the preset estimate by at least a predetermined deviation threshold. If the predetermined criterion is fulfilled, the instantaneous estimate for the given time is set by the computer system as the preset estimate, and the computer system determines the time delay for time delay compensation of the microphone signal based on the instantaneous estimate. | 06-17-2010 |
| 20100145710 | Data-Driven Voice User Interface - A method for developing a voice user interface for a statistical semantic system is described. A set of semantic meanings is defined that reflect semantic classification of a user input dialog. Then, a set of speech dialog prompts is automatically developed from an annotated transcription corpus for directing user inputs to corresponding final semantic meanings. The statistical semantic system may be a call routing application where the semantic meanings are call routing destinations. | 06-10-2010 |
| 20100138222 | Method for Adapting a Codebook for Speech Recognition - A method for adapting a codebook for speech recognition, wherein the codebook is from a set of codebooks comprising a speaker-independent codebook and at least one speaker-dependent codebook is disclosed. A speech input is received and a feature vector based on the received speech input is determined. For each of the Gaussian densities, a first mean vector is estimated using an expectation process and taking into account the determined feature vector. For each of the Gaussian densities, a second mean vector using an Eigenvoice adaptation is determined taking into account the determined feature vector. For each of the Gaussian densities, the mean vector is set to a convex combination of the first and the second mean vector. Thus, this process allows for adaptation during operation and does not require a lengthy training phase. | 06-03-2010 |
| 20100131262 | Speech Recognition Based on a Multilingual Acoustic Model - Embodiments of the invention relate to methods for generating a multilingual acoustic model. A main acoustic model having probability distribution functions and a probabilistic state sequence model including first states is provided to a processor. At least one second acoustic model including probability distribution functions and a probabilistic state sequence model including states is also provided to the processor. The processor replaces each of the probability distribution functions of the at least one second acoustic model by one of the probability distribution functions and/or each of the states of the probabilistic state sequence model of the at least one second acoustic model with the state of the probabilistic state sequence model of the main acoustic model based on a criteria set to obtain at least one modified second acoustic model. The criteria set may be a distance measurement. The processor then combines the main acoustic model and the at least one modified second acoustic model to obtain the multilingual acoustic model. | 05-27-2010 |
| 20100125459 | STOCHASTIC PHONEME AND ACCENT GENERATION USING ACCENT CLASS - Exemplary embodiments provide for determining a sequence of words in a TTS system. An input text is analyzed using two models, a word n-gram model and an accent class n-gram model. A list of all possible words for each word in the input is generated for each model. Each word in each list for each model is given a score based on the probability that the word is the correct word in the sequence, based on the particular model. The two lists are combined and the two scores are combined for each word. A set of sequences of words is generated. Each sequence of words comprises a unique combination of an attribute and associated word for each word in the input. The combined score of each word in the sequence of words is combined. A sequence of words having the highest score is selected and presented to a user. | 05-20-2010 |
| 20100112997 | LOCAL TRIGGERING METHODS, SUCH AS APPLICATIONS FOR DEVICE-INITIATED DIAGNOSTIC OR CONFIGURATION MANAGEMENT - A system and method of performing device-initiated diagnostic or configuration management on a mobile device. In some examples, the system identifies a problem related to the mobile device, communicates with a remote server on a network a message pertaining to the problem, initiates a remedial action at the server, and changes the configuration of the mobile device based on the remedial action. In some cases, the system may transmit a message to the server that triggers the action to be performed. In some cases, the system may not require a user to understand how to solve the problem. | 05-06-2010 |
| 20100106503 | SPEAKER VERIFICATION METHODS AND APPARATUS - In one aspect, a method for determining a validity of an identity asserted by a speaker using a voice print that models speech of a user whose identity the speaker is asserting is provided. The method comprises acts of performing a first verification stage comprising acts of obtaining a first voice signal from the speaker uttering at least one first challenge utterance; and comparing at least one characteristic feature of the first voice signal with at least a portion of the voice print to assess whether the at least one characteristic feature of the first voice signal is similar enough to the at least a portion of the voice print to conclude that the first voice signal was obtained from an utterance by the user. The method further comprises performing a second verification stage if it is concluded in the first verification stage that the first voice signal was obtained from an utterance by the user, the second verification stage comprising acts of adapting at least one parameter of the voice print based, at least in part, on the first voice signal to obtain an adapted voice print, obtaining a second voice signal from the speaker uttering at least one second challenge utterance, and comparing at least one characteristic feature of the second voice signal with at least a portion of the adapted voice print to assess whether the at least one characteristic feature of the second voice signal is similar enough to the at least a portion of the adapted voice print to conclude that the second voice signal was obtained from an utterance by the user. | 04-29-2010 |
| 20100106502 | SPEAKER VERIFICATION METHODS AND APPARATUS - In one aspect, a method for determining validity of an identity asserted by a speaker using a voice print associated with a user whose identity the speaker is asserting, the voice print obtained from characteristic features of at least one first voice signal obtained from the user uttering at least one enrollment utterance including at least one enrollment word is provided. The method comprises acts of obtaining a second voice signal of the speaker uttering at least one challenge utterance, wherein the at least one challenge utterance includes at least one word that was not in the at least one enrollment utterance, obtaining at least one characteristic feature from the second voice signal, comparing the at least one characteristic feature with at least a portion of the voice print to determine a similarity between the at least one characteristic feature and the at least a portion of the voice print, and determining whether the speaker is the user based, at least in part, on the similarity between the at least one characteristic feature and the at least a portion of the voice print. | 04-29-2010 |
| 20100057462 | Speech Recognition - The present invention relates to a method for speech recognition of a speech signal comprising the steps of providing at least one codebook comprising codebook entries, in particular, multivariate Gaussians of feature vectors, that are frequency weighted such that higher weights are assigned to entries corresponding to frequencies below a predetermined level than to entries corresponding to frequencies above the predetermined level and processing the speech signal for speech recognition comprising extracting at least one feature vector from the speech signal and matching the feature vector with the entries of the codebook. | 03-04-2010 |
| 20100054085 | Method and Device for Locating a Sound Source - A method of locating a sound source based on sound received at an array of microphones comprises the steps of determining a correlation function of signals provided by microphones of the array and establishing a direction in which the sound source is located based on at least one eigenvector of a matrix having matrix elements which are determined based on the correlation function. The correlation function has first and second frequency components associated with a first and second frequency band, respectively. The first frequency component is determined based on signals from microphones having a first distance, and the second frequency component is determined based on signals from microphones having a second distance different from the first distance. | 03-04-2010 |
| 20100049529 | INTEGRATED SYSTEM AND METHOD FOR MOBILE AUDIO PLAYBACK AND DICTATION - A method and system provides for a single-pass review and feedback of a document. During audio playback of the document to be reviewed, voice-activated recording of feedback and submission of feedback relative to the location in the original document are accomplished. This provides for a fully integrated, single pass review and feedback of documentation to occur. | 02-25-2010 |
| 20100049521 | SELECTIVE ENABLEMENT OF SPEECH RECOGNITION GRAMMARS - A method for processing speech audio in a network connected client device can include selecting a speech grammar for use in a speech recognition system in the network connected client device; characterizing the selected speech grammar; and, based on the characterization, determining whether to process the speech grammar locally in the network connected client device, or remotely in a speech server in the network. In one aspect of the invention, the selecting step can include establishing a communications session with a speech server; and, querying the speech server for a speech grammar over the established communications session. Additionally, the selecting step can further include registering the speech grammar in the speech recognition system. | 02-25-2010 |
| 20100036659 | Noise-Reduction Processing of Speech Signals - The present invention relates to a method for signal processing comprising the steps of providing a set of prototype spectral envelopes, providing a set of reference noise prototypes, wherein the reference noise prototypes are obtained from at least a sub-set of the provided set of prototype spectral envelopes, detecting a verbal utterance by at least one microphone to obtain a microphone signal, processing the microphone signal for noise reduction based on the provided reference noise prototypes to obtain an enhanced signal and encoding the enhanced signal based on the provided prototype spectral envelopes to obtain an encoded enhanced signal. | 02-11-2010 |
| 20100035663 | Hands-Free Telephony and In-Vehicle Communication - The present invention relates to a signal processing system, comprising a number of microphones and loudspeakers, a hands-free set configured to receive a telephone signal from a remote party and to transmit a microphone signal supplied by at least one of the microphones to the remote party; an in-vehicle communication system configured to receive a microphone signal supplied by at least one of the microphones; receive the telephone signal; amplify the microphone signal to obtain at least one first output signal; output the at least one first output signal and/or a second output signal corresponding to the telephone signal to at least one of the loudspeakers; and wherein the signal processing system is configured to detect speech activity in the telephone signal and to control the in-vehicle communication system to reduce amplification of the microphone signal by a damping factor, if it is detected that speech activity is present in the telephone signal. | 02-11-2010 |
| 20100031151 | Enabling speech within a multimodal program using markup - A method for speech enabling an application can include the step of specifying a speech input within a speech-enabled markup. The speech-enabled markup can also specify an application operation that is to be executed responsive to the detection of the speech input. After the speech input has been defined within the speech-enabled markup, the application can be instantiated. The specified speech input can then be detected and the application operation can be responsively executed in accordance with the specified speech-enabled markup. | 02-04-2010 |
| 20100030561 | ANNOTATING PHONEMES AND ACCENTS FOR TEXT-TO-SPEECH SYSTEM - A system that outputs phonemes and accents of texts. The system has a storage section storing a first corpus in which spellings, phonemes, and accents of a text input beforehand are recorded separately for individual segmentations of the words that are contained in the text. A text for which phonemes and accents are to be output is acquired and the first corpus is searched to retrieve at least one set of spellings that match the spellings in the text from among sets of contiguous spellings. Then, the combination of a phoneme and an accent that has a higher probability of occurrence in the first corpus than a predetermined reference probability is selected as the phonemes and accent of the text. | 02-04-2010 |
| 20100030558 | Method for Determining the Presence of a Wanted Signal Component - This invention provides a method for determining, in a speech dialogue system issuing speech prompts, a score value as an indicator for the presence of a wanted signal component in an input signal stemming from a microphone, comprising the steps of: using a first likelihood function to determine a first likelihood value for the presence of the wanted signal component in the input signal, using a second likelihood function to determine a second likelihood value for the presence of a noise signal component in the input signal, and determining a score value based on the first and the second likelihood values, wherein the first likelihood function is based on a predetermined reference wanted signal, and the second likelihood function is based on a predetermined reference noise signal. | 02-04-2010 |
| 20100023331 | Speech recognition semantic classification training - An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data having little or no in-domain manually transcribed training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels. When the adaptation training data satisfies a pre-determined adaptation criterion, the semantic classification system is automatically retrained based on the adaptation training data. | 01-28-2010 |
| 20100017393 | Entry Selection from Long Entry Lists - A method, device, and computer program product for locating a desired entry in a list containing multiple list entries for use with a limited text display is described. A list of entries is partitioned into a number of sub-parts such that the desired entry is contained in one of the list sub-parts. At least one of the sub-parts is characterized within a limited text display to prompt a user for feedback regarding the location of the desired entry in the sub-parts. The user feedback is received from a view input element. In response to the user feedback, the sub-part containing the desired entry is selected. The partitioning, characterizing, receiving, and selecting steps are re-performed one or more times until a final sub-part is generated that contains a limited number of entries including the desired entry. | 01-21-2010 |
| 20100014782 | Automatic Correction of Digital Image Distortion - A method of de-skewing a digital image is described. An input camera image is initially received, and text within the input camera image is identified. A text direction of the identified text is determined to determine text lines within the camera image. A three-dimensional de-skewing transformation is determined of the text lines to make the text lines horizontal. Then the de-skewing transformation is applied to the input camera image to form a de-skewed output image. An unwarping transformation may also be applied to the input camera image for straightening text lines that are curved. | 01-21-2010 |
| 20100014690 | Beamforming Pre-Processing for Speaker Localization - Embodiments of the present invention relate to methods, systems, and computer program products for signal processing. A first plurality of microphone signals is obtained by a first microphone array. A second plurality of microphone signals is obtained by a second microphone array different from the first microphone array. The first plurality of microphone signals is beamformed by a first beamformer comprising beamforming weights to obtain a first beamformed signal. The second plurality of microphone signals is beamformed by a second beamformer comprising the same beamforming weights as the first beamformer to obtain a second beamformed signal. The beamforming weights are adjusted such that the power density of echo components and/or noise components present in the first and second plurality of microphone signals is substantially reduced. | 01-21-2010 |
| 20100010805 | RELATIVE DELTA COMPUTATIONS FOR DETERMINING THE MEANING OF LANGUAGE INPUTS - A method for processing language input can include the step of determining at least two possible meanings for a language input. For each possible meaning, a probability that the possible meaning is a correct interpretation of the language input can be determined. At least one relative delta computation can be computed based at least in part upon the probabilities. At least one irregularity within the language input can be detected based upon the relative delta computation. The irregularity can include mumble, ambiguous input, and/or compound input. At least one programmatic action can be performed responsive to the detection of the irregularity. | 01-14-2010 |
| 20090306977 | SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, COMPUTER-EXECUTABLE PROGRAM FOR CAUSING COMPUTER TO EXECUTE RECOGNITION METHOD, AND STORAGE MEDIUM - A speech recognition device and method configured to include a computer, for recognizing speech, including: a storage location for storing a feature quantity acquired from a speech signal for each frame; storage portions for storing acoustic model data and language model data; an echo speech component for generating echo speech model data from a speech signal acquired prior to a speech signal to be processed at the current time point and using the echo speech model data to generate adapted acoustic model data; and a processing component for utilizing the feature quantity, the adapted acoustic model data, and the language model data to provide a speech recognition result of the speech signal. | 12-10-2009 |
| 20090275320 | MEASURING END USER ACTIVITY OF SOFTWARE ON A MOBILE OR DISCONNECTED DEVICE - A hardware and/or software facility measures end user activity associated with a software application or service on a mobile phone or other mobile device. The facility tracks and stores usage data associated with a mobile user's use of the application or service. When the mobile user initiates transmission of the usage data, the facility retrieves from the mobile phone or other mobile device a usage code representing the usage data. The facility relies on user transcription, text input-buffer insertion, or other indirect means of data transport to deliver the usage code from the mobile phone or other mobile device to an application developer, service provider, or another entity. The recipient extracts the usage data contained in the usage code, and may perform various data mining and analysis techniques on the usage data in order to evaluate how the application or service is used. | 11-05-2009 |
| 20090275316 | Minimal Distraction Capture of Spoken Contact Information - Real-time automatic capturing and storing is described for contact information such as a telephone number or other well-structured contact information spoken during a conversation over the mobile telephone. A user input is received to capture contact information contained in recent audio data processed by the mobile device. Speech in the recent audio data is identified that corresponds to the contact information. Then speech recognition is used to extract the contact information from the identified speech. The contact information is stored in mobile device memory storage. | 11-05-2009 |
| 20090259613 | Knowledge Re-Use for Call Routing - A method is described for semantic classification in human-machine dialog applications, for example, call routing. Utterances in a new training corpus of a new semantic classification application are tagged using a pre-existing semantic classifier and associated pre-existing classification tags trained for an earlier semantic classification application. | 10-15-2009 |
| 20090259466 | Adaptive Confidence Thresholds for Speech Recognition - Adjusting confidence score thresholds is described for a speech recognition engine. The speech recognition engine is implemented in multiple computer processes functioning in a computer processor, and is characterized by an associated receiver operating characteristic (ROC) curve. A results confirmation process interprets user confirmation of speech recognition results within a given confidence score threshold to create a confirmed portion of the ROC curve for the speech recognition engine. A curve extension process extends the confirmed portion of the ROC curve by extrapolation of unconfirmed speech recognition results beyond the confidence score threshold to generate an extended ROC curve. A threshold adjustment process adjusts the confidence score threshold based on the extended ROC curve to meet target operating constraints for operating the speech recognition engine to perform automatic speech recognition of user speech inputs. | 10-15-2009 |
| 20090254912 | SYSTEM AND METHOD FOR BUILDING APPLICATIONS, SUCH AS CUSTOMIZED APPLICATIONS FOR MOBILE DEVICES - A system and method for building applications, such as applications that cause a mobile device to perform a task, is described. In some examples, the system provides one or more plugins, a framework for the plugins, and configures the plugins to build a customized application for a mobile device. The plugins may include code configured to perform a task, display one or more pages associated with performance of the task, perform a transaction during performance of the task, and so on. | 10-08-2009 |
| 20090216532 | Automatic Extraction and Dissemination of Audio Impression - A method of creating a voice message is described. A dictated audio input is converted by automatic speech recognition to produce a structured text report that includes report fields with report field data extracted from the dictated audio input. A report message is created for transmission over an electronic communication system to a message recipient. The report message has message fields with message field data based on corresponding report field data. A message audio extract is automatically extracted from a portion of the dictated audio input and attached to the report message. The report message with the message audio extract attachment is then forwarded over the electronic communication system to the message recipient. | 08-27-2009 |
| 20090202049 | Voice User Interfaces Based on Sample Call Descriptions - A design interface is described for maintaining call information for creating a voice user interface. An initial set of sample call paths is defined for a dialog application. Each sample call path has associated call information including a sequence of system prompts and caller responses that model a user interaction through the dialog application for the sample call path. A call design database stores the call information. A set of subsequent call paths is defined for the dialog application using the call information in the call design database. The call information in the call design database is updated to reflect current versions of the call information for all the call paths. | 08-13-2009 |
| 20090164881 | Scan-to-Redact Searchable Documents - An automatic scan-to-redacted electronic document is described. A user input is received which identifies a scanned document. Then the scanned document is automatically processed to produce a corresponding redacted document having searchable document text and a document image. The searchable document text includes coded redaction text satisfying defined redaction parameters. The document image includes redacted image areas corresponding to redacted elements. | 06-25-2009 |
| 20090138265 | Joint Discriminative Training of Multiple Speech Recognizers - Adjusting model parameters is described for a speech recognition system that combines recognition outputs from multiple speech recognition processes. Discriminative adjustments are made to model parameters of at least one acoustic model based on a joint discriminative criterion over multiple complementary acoustic models to lower recognition word error rate in the system. | 05-28-2009 |
| 20090117885 | SYSTEM AND METHOD FOR CONDUCTING A SEARCH USING A WIRELESS MOBILE DEVICE - A method and system are provided by which a wireless mobile device takes a vocally entered query and transmits it in a text message format over a wireless network to a search engine; receives search results based on the query from the search engine over the wireless network; and displays the search results. | 05-07-2009 |
| 20090074157 | Speech Recognition System for Electronic Switches In A Non-Wireline Communications Network - Voice activated dialing is described for use in a mobile telecommunications system. A voice input is received from a wireless network user. A telephone number to be dialed is determined by using speaker independent speech recognition to interpret a string of spoken digits in the voice input to determine the telephone number, or using speaker dependent speech recognition to interpret a spoken word in the voice input to determine the telephone number. A telephone call is then initiated by dialing the telephone number. | 03-19-2009 |
| 20090055184 | Creation and Use of Application-Generic Class-Based Statistical Language Models for Automatic Speech Recognition - A method of creating an application-generic class-based SLM includes, for each of a plurality of speech applications, parsing a corpus of utterance transcriptions to produce a first output set, in which expressions identified in the corpus are replaced with corresponding grammar tags from a grammar that is specific to the application. The method further includes, for each of the plurality of speech applications, replacing each of the grammar tags in the first output set with a class identifier of an application-generic class, to produce a second output set. The method further includes processing the resulting second output sets with a statistical language model (SLM) trainer to generate an application-generic class-based SLM. | 02-26-2009 |
| 20090055182 | Discriminative Training of Hidden Markov Models for Continuous Speech Recognition - Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment. | 02-26-2009 |
| 20090048841 | Synthesis by Generation and Concatenation of Multi-Form Segments - A speech synthesis system and method is described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text. | 02-19-2009 |
| 20090024390 | Multi-Class Constrained Maximum Likelihood Linear Regression - A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input. | 01-22-2009 |
| 20080255884 | Categorization of Information Using Natural Language Processing and Predefined Templates - A computer implemented method for generating a report that includes latent information, comprising receiving an input data stream that includes latent information, performing one of normalization, validation, and extraction of the input data stream, processing the input data stream to identify latent information within the data stream that is required for generation of a particular report, wherein said processing of the input data stream to identify latent information comprises identifying a relevant portion of the input data stream, bounding the relevant portion of the input data stream, classifying and normalizing the bounded data, activating a relevant report template based on said identified latent information, populating said template with template-specified data, and processing the template-specified data to generate a report. | 10-16-2008 |
| 20080252936 | Personal network scanning profiles - User-specific scanning of documents on a computer network is described. A user profile containing a set of user-specific scanning settings associated with pre-identified scanning preferences of a specific user is retrieved from a network storage device. A paper document is scanned on a network scanning device based on the user profile to produce a representative scanned document, which is automatically forwarded to a network location associated with the user profile. | 10-16-2008 |
| 20080240396 | Semi-supervised training of destination map for call handling applications - A method of semi-supervised synonym inference for a call handling application, such as automated directory assistance or call routing, is described. In one embodiment the method comprises examining a database of caller interaction results from a directory assistance system that includes an automated directory assistance engine, detecting a specified characteristic in the caller interaction results, and using the detected characteristic to automatically train a destination map, where the destination map is for use by the automated directory assistance engine in automatically mapping human speech to a destination. The detecting of the specified characteristic in the caller interaction results may include a statistical analysis of the caller interaction results for each of one or more speech recognition strings. | 10-02-2008 |