Word recognition

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression


704231000 - Recognition

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
704255000 Specialized models 138
704254000 Subportions 92
704252000 Preliminary matching 7
704253000 Endpoint detection 6
20100161334 UTTERANCE VERIFICATION METHOD AND APPARATUS FOR ISOLATED WORD N-BEST RECOGNITION RESULT - An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance. 06-24-2010
20110196678 SPEECH RECOGNITION APPARATUS AND SPEECH RECOGNITION METHOD - A distance calculation unit ( 08-11-2011
20100049517 AUTOMATIC ANSWERING DEVICE, AUTOMATIC ANSWERING SYSTEM, CONVERSATION SCENARIO EDITING DEVICE, CONVERSATION SERVER, AND AUTOMATIC ANSWERING METHOD - An automatic answering device and an automatic answering method for automatically answering a user utterance are configured: to prepare a conversation scenario that is a set of input sentences and reply sentences, the input sentences each corresponding to a user utterance assumed to be uttered by a user, the reply sentences each being an automatic reply to the input sentence; to accept a user utterance; to determine the reply sentence to the accepted user utterance on the basis of the conversation scenario; and to present the determined reply sentence to the user. Data of the conversation scenario have a data structure that enables the input sentences and the reply sentences to be expressed in a state transition diagram in which each of the input sentences is defined as a morphism and the reply sentence corresponding to the input sentence is defined as an object. 02-25-2010
20100076764 METHOD OF DIALING PHONE NUMBERS USING AN IN-VEHICLE SPEECH RECOGNITION SYSTEM - A method of dialing phone numbers using an in-vehicle speech recognition system includes receiving speech input at a vehicle, separating the speech input into a word segment and a digit segment, identifying the letters in a word segment, converting the letters in the word segment to digits, and operating an alphanumeric keypad based on the digit speech segment and the converted word segment. 03-25-2010
20130080171 BACKGROUND SPEECH RECOGNITION ASSISTANT - In one embodiment, a method receives an acoustic input signal at a speech recognizer configured to recognize the acoustic input signal in an always on mode. A set of responses based on the recognized acoustic input signal is determined and ranked based on criteria. A computing device determines if the response should be output based on a ranking of the response. The method determines an output method in a plurality of output methods based on the ranking of the response and outputs the response using the output method if it is determined the response should be output. 03-28-2013
20130035938 APPARATUS AND METHOD FOR RECOGNIZING VOICE - The present invention includes a hierarchical search process. The hierarchical search process includes three steps. In a first step, a word boundary is determined using a recognition method of determining a following word dependent on a preceding word, and a word boundary detector. In a second step, word unit based recognition is performed in each area by dividing an input voice into a plurality of areas based on the determined word boundary. Finally, in a third step, a language model is applied to induce an optimal sentence recognition result with respect to a candidate word that is determined for each area. The present invention may improve the voice recognition performance, and particularly, the sentence unit based consecutive voice recognition performance. 02-07-2013
20130041667 MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION - The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input. 02-14-2013
20100100381 System and Method for Automatic Verification of the Understandability of Speech - The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold. If any of the measured speech level, measured signal-to-noise ratio and calculated estimate of intelligibility of the user's message are determined to be below their respective thresholds, the user may be prompted to repeat at least a portion of the message. 04-22-2010
20100106505 USING WORD CONFIDENCE SCORE, INSERTION AND SUBSTITUTION THRESHOLDS FOR SELECTED WORDS IN SPEECH RECOGNITION - A method and system for improving the accuracy of a speech recognition system using word confidence score (WCS) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate, such that deletion errors are weighted more heavily than substitution and insertion errors. The occurrence distribution in WCS is different depending on whether the word was correctly identified and based on the type of error. This is used to determine thresholds in WCS for insertion and substitution errors. By processing the hypothetical word (HYP) (output of the decoder), a mHYP (modified HYP) is determined. In some circumstances, depending on the WCS's value in relation to insertion and substitution threshold values, mHYP is set equal to: null, a substituted HYP, or HYP. 04-29-2010
20090043580 System and Method for Controlling the Operation of a Device by Voice Commands - The present invention includes a speech recognition system comprising a light element, a power control switch, the power control switch varying the power delivered to the light element, a controller, a microphone, a speech recognizer coupled to the microphone for recognizing speech input signals and transmitting recognition results to the controller, and a speech synthesizer coupled to the controller for generating synthesized speech, wherein the controller varies the power to the light element in accordance with the recognition results received from the speech recognizer. Embodiments of the invention may alternatively include a low power wake up circuit. In another embodiment, the present invention is a method of controlling a device by voice commands. 02-12-2009
20090094032 Systems and methods of performing speech recognition using sensory inputs of human position - Embodiments of the present invention improve methods of performing speech recognition using sensory inputs of human position. In one embodiment, the present invention includes a speech recognition method comprising sensing a change in position of at least one part of a human body, selecting a recognition set based on the change of position, receiving a speech input signal, and recognizing the speech input signal in the context of the first recognition set. 04-09-2009
20090094031 Method, Apparatus and Computer Program Product for Providing Text Independent Voice Conversion - An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model. 04-09-2009
20130060571 INTEGRATED LOCAL AND CLOUD BASED SPEECH RECOGNITION - A system for integrating local speech recognition with cloud-based speech recognition in order to provide an efficient natural user interface is described. In some embodiments, a computing device determines a direction associated with a particular person within an environment and generates an audio recording associated with the direction. The computing device then performs local speech recognition on the audio recording in order to detect a first utterance spoken by the particular person and to detect one or more keywords within the first utterance. The first utterance may be detected by applying voice activity detection techniques to the audio recording. The first utterance and the one or more keywords are subsequently transferred to a server which may identify speech sounds within the first utterance associated with the one or more keywords and adapt one or more speech recognition techniques based on the identified speech sounds. 03-07-2013
20130060570 SYSTEM AND METHOD FOR ADVANCED TURN-TAKING FOR INTERACTIVE SPOKEN DIALOG SYSTEMS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which, if found, allows the system to communicate partial speech recognition results, is that the most recent word found in the partial results is statistically likely to be the termination of the utterance, also known as a terminal node. A second condition is the determination that all search paths within a speech lattice converge to a common node, also known as a pinch node, before branching out again. Upon finding either condition, the system can communicate the partial speech recognition results. Stability and correctness probabilities can also determine which partial results are communicated. 03-07-2013
20110066436 Speaker intent analysis system - A speaker intent analysis system and method for validating the truthfulness and intent of a plurality of participants' responses to questions. A computer stores, retrieves, and transmits a series of questions to be answered audibly by participants. The participants' answers are received by a data processor. The data processor analyzes and records the participants' speech parameters for determining the likelihood of dishonesty. In addition to analyzing participants' speech parameters for distinguishing stress or other abnormality, the processor may be equipped with voice recognition software to screen responses that, while not dishonest, are indicative of possible malfeasance on the part of the participants. Once the responses are analyzed, the processor produces an output that is indicative of the participant's credibility. The output may be sent to proper parties and/or devices such as a web page, computer, e-mail, PDA, pager, database, report, etc. for appropriate action. 03-17-2011
20090271199 Records Disambiguation In A Multimodal Application Operating On A Multimodal Device - Methods, apparatus, and products are disclosed for record disambiguation in a multimodal application operating on a multimodal device, the multimodal device supporting multiple modes of interaction including at least a voice mode and a visual mode, that include: prompting, by the multimodal application, a user to identify a particular record among a plurality of records; receiving, by the multimodal application in response to the prompt, a voice utterance from the user; determining, by the multimodal application, that the voice utterance ambiguously identifies more than one of the plurality of records; generating, by the multimodal application, a user interaction to disambiguate the records ambiguously identified by the voice utterance in dependence upon record attributes of the records ambiguously identified by the voice utterance; and selecting, by the multimodal application for further processing, one of the records ambiguously identified by the voice utterance in dependence upon the user interaction. 10-29-2009
20120116765 SPEECH PROCESSING DEVICE, METHOD, AND STORAGE MEDIUM - A speech recognition unit ( 05-10-2012
20120116764 Speech recognition method on sentences in all languages - A speech recognition method on all sentences in all languages is provided. A sentence can be a word, name or sentence. All sentences are represented by E×P=12×12 matrices of linear predictive coding cepstra (LPCC). 1000 different voices are transformed into 1000 matrices of LPCC to represent 1000 databases. E×P matrices of known sentences after deletion of time intervals between two words are put into their closest databases. To classify an unknown sentence, use the distance to find its F closest databases and then from known sentences in its F databases, find a known sentence to be the unknown one. The invention needs no samples and can find a sentence in one second using Visual Basic. Any person without training can immediately and freely communicate with a computer in any language. It can recognize up to 7200 English words, 500 sentences of any language and 500 Chinese words. 05-10-2012
20090037175 Confidence measure generation for speech related searching - A voice search system has a speech recognizer, a search component, and a dialog manager. A confidence measure generator receives speech recognition features from the speech recognizer, search features from the search component, and dialog features from the dialog manager, and calculates an overall confidence measure for voice search results based upon the features received. The invention can be extended to include the generation of additional features, based on those received from the individual components of the voice search system. 02-05-2009
20130166302 Methods and Apparatus for Audio Input for Customization of Digital Displays - Aspects of customizing digital signage are addressed. For example, an audio feed may be analyzed for keywords occurring in potential customers' speech. These keywords are then employed to customize display screens of a digital display. 06-27-2013
20080294439 Speech screening - This invention relates to screening of spoken audio data so as to detect threat words or phrases. The method is particularly useful for protecting children or vulnerable adults from unsuitable content and/or suspicious or threatening contact with others via a communication medium. The method is applicable to screening speech transmitted over a computer network such as the internet and provides screening of access to stored content, e.g. audio or multimedia data files, as well as real time speech such as live broadcasts or communication via voice over IP or similar communication protocols. The method allows an administrator, e.g. a parent, to identify groups of threat words or phrases to be monitored, to set user access levels and to determine appropriate responses when threat words or phrases are detected. 11-27-2008
20080294438 Apparatus for the processing of sales - The invention relates to an apparatus for the processing of sales of articles of a product assortment to a customer, in particular to store scales, having a microphone device for listening to a conversation between a customer and a salesperson for the conversion of connected spoken words of the conversation into an electrical speech signal, having a speech recognition device for the generation of a speech recognition result representing the words from the electrical speech signal having a comparator for the comparison of the speech recognition result with keywords stored in a memory device of the apparatus for keyword recognition in the speech recognition result, with at least some of the stored keywords being product names which define a group of keywords, and having a control device which, on the detection of a keyword belonging to the group and/or of a permitted combination of keywords, which includes a keyword belonging to the group, is adapted to output a piece of information associated with the detected keyword or with the detected permitted combination and available via a data source or to output an offer for the output of the information, in particular in the form of a section menu, by means of an output device of the apparatus. 11-27-2008
20080294437 LANGUAGE UNDERSTANDING DEVICE - A language understanding device includes: a language understanding model storing unit configured to store word transition data including pre-transition states, input words, predefined outputs corresponding to the input words, word weight information, and post-transition states, and concept weighting data including concepts obtained from language understanding results for at least one word, and concept weight information corresponding to the concepts; a finite state transducer processing unit configured to output understanding result candidates including the predefined outputs, to accumulate word weights so as to obtain a cumulative word weight, and to sequentially perform state transition operations; a concept weighting processing unit configured to accumulate concept weights so as to obtain a cumulative concept weight; and an understanding result determination unit configured to determine an understanding result from the understanding result candidates by referring to the cumulative word weight and the cumulative concept weight. 11-27-2008
20080294436 SPEECH RECOGNITION FOR IDENTIFYING ADVERTISEMENTS AND/OR WEB PAGES - A device may identify terms in a speech signal using speech recognition. The device may further retain one or more of the identified terms by comparing them to a set of words and send the retained terms and information associated with the retained terms to a remote device. The device may also receive messages that are related to the retained terms and to the information associated with the retained terms from the remote device. 11-27-2008
20120271634 Context Based Voice Activity Detection Sensitivity - A speech dialog system is described that adjusts a voice activity detection threshold during a speech dialog prompt C to reflect a context-based probability of user barge in speech occurring. For example, the context-based probability may be based on the location of one or more transition relevance places in the speech dialog prompt. 10-25-2012
20090089058 Part-of-speech tagging using latent analogy - Methods and apparatuses to assign part-of-speech tags to words are described. An input sequence of words is received. A global fabric of a corpus having training sequences of words may be analyzed in a vector space. A global semantic information associated with the input sequence of words may be extracted based on the analyzing. A part-of-speech tag may be assigned to a word of the input sequence based on POS tags from pertinent words in relevant training sequences identified using the global semantic information. The input sequence may be mapped into a vector space. A neighborhood associated with the input sequence may be formed in the vector space wherein the neighborhood represents one or more training sequences that are globally relevant to the input sequence. 04-02-2009
20090306983 USER ACCESS AND UPDATE OF PERSONAL HEALTH RECORDS IN A COMPUTERIZED HEALTH DATA STORE VIA VOICE INPUTS - Systems and methods for enabling user access and update of personal health records stored in a health data store via voice inputs are provided. The system may include a computer program having a recognizer module configured to process structured word data of a user voice input received from a voice platform, to produce a set of tagged structured word data based on a healthcare-specific glossary. The computer program may further include a health data store interface configured to apply a rule set to the tagged structured word data to produce a query to the health data store and receive a response from the health data store based on the query, and a grammar generator configured to generate a reply sentence based on the response received from the health data store and pass the reply sentence to the voice platform to be played as a voice reply to the user. 12-10-2009
20090012789 Method and process for performing category-based analysis, evaluation, and prescriptive practice creation upon stenographically written and voice-written text files - System and method for electronically identifying and analyzing the type and frequency of errors and mismatches in a stenographically or voice written text against a stored master file and dynamically creating personalized user feedback, drills, and practice based on identified errors and mismatches from within the context of the stored master file. The system provides the user with a plurality of methods to enter a text file for error identification and analysis including both realtime and non-realtime input. The text input is then compared to a stored master file through a word-by-word iterative process which produces a comparison of writing input and stored master wherein errors and mismatches are identified and grouped in a plurality of pre-defined and user-selected categories, each of which is color-coded to facilitate pattern recognition of type and frequency of errors and mismatches in the submitted writing. 01-08-2009
20110301955 Predicting and Learning Carrier Phrases for Speech Input - Predicting and learning users' intended actions on an electronic device based on free-form speech input. Users' actions can be monitored to develop a list of carrier phrases having one or more actions that correspond to the carrier phrases. A user can speak a command into a device to initiate an action. The spoken command can be parsed and compared to a list of carrier phrases. If the spoken command matches one of the known carrier phrases, the corresponding action(s) can be presented to the user for selection. If the spoken command does not match one of the known carrier phrases, search results (e.g., Internet search results) corresponding to the spoken command can be presented to the user. The actions of the user in response to the presented action(s) and/or the search results can be monitored to update the list of carrier phrases. 12-08-2011
20110288868 DISAMBIGUATION OF CONTACT INFORMATION USING HISTORICAL DATA - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for disambiguating contact information. A method includes receiving an audio signal, generating an affinity score based on a frequency with which a user has previously communicated with a contact associated with an item of contact information, and further based on a recency of one or more past interactions between the user and the contact associated with the item of contact information, inferring a probability that the user intends to initiate a communication using the item of contact information based on the affinity score generated for the item of contact information, and generating a communication initiation grammar. 11-24-2011
20110288867 NAMETAG CONFUSABILITY DETERMINATION - A method of and system for managing nametags including receiving a command from a user to store a nametag, prompting the user to input a number to be stored in association with the nametag, receiving an input for the number from the user, prompting the user to input the nametag to be stored in association with the number, receiving an input for the nametag from the user, processing the nametag input, and calculating confusability of the nametag input in multiple individual domains including a nametag domain, a number domain, and a command domain. 11-24-2011
20110295605 SPEECH RECOGNITION SYSTEM AND METHOD WITH ADJUSTABLE MEMORY USAGE - This speech recognition system provides a function that is capable of adjusting memory usage according to the different target resources. It extracts a sequence of feature vectors from an input speech signal. A module for constructing the search space reads a text file and generates a word-level search space in an off-line phase. After removing redundancy, the word-level search space is expanded to a phone-level one and is represented by a tree structure. This may be performed by combining the information from a dictionary which gives the mapping from a word to its phonetic sequence(s). In the online phase, a decoder traverses the search space, takes the dictionary and at least one acoustic model as input, computes the score of the feature vectors and outputs the decoding result. 12-01-2011
20100036665 GENERATING SPEECH-ENABLED USER INTERFACES - Methods, systems, and apparatus, including computer program products, for automatically creating a speech-based user interface involve identifying a software service definition that includes service inputs, service outputs, and context data and accessing a standard user interface incorporating the service input and output. The standard user interface defines a set of valid inputs for the service input and a set of available outputs, at least one of which is based on the context data. Audio data is associated with at least some of the inputs in the set of valid inputs to define a set of valid speech inputs. A speech-based user interface is automatically generated from the standard user interface and the set of valid speech inputs. 02-11-2010
20100114574 RETRIEVAL USING A GENERALIZED SENTENCE COLLOCATION - A method and system for identifying documents relevant to a query that specifies a part of speech is provided. A retrieval system receives from a user an input query that includes a word and a part of speech. Upon receiving an input query that includes a word and a part of speech, the retrieval system identifies documents with a sentence that includes that word collocated with a word that is used as that part of speech. The retrieval system displays to the user an indication of the identified documents. 05-06-2010
20100114575 System and Method for Extracting a Specific Situation From a Conversation - A system, method, and computer readable article of manufacture for extracting a specific situation in a conversation. The system includes: an acquisition unit for acquiring speech voice data of speakers in the conversation; a specific expression detection unit for detecting the speech voice data of a specific expression from speech voice data of a specific speaker in the conversation; and a specific situation extraction unit for extracting, from the speech voice data of the speakers in the conversation, a portion of the speech voice data that forms a speech pattern that includes the speech voice data of the specific expression detected by the specific expression detection unit. 05-06-2010
20090089057 SPOKEN LANGUAGE GRAMMAR IMPROVEMENT TOOL AND METHOD OF USE - A system and method of improving language skills and, more particularly, a spoken language grammar improvement tool and method of use is provided. The method includes monitoring and analyzing a user's speech pattern; matching a stored undesirable phrase and/or word with the user's speech pattern; and providing feedback to the user when a match is found between the stored undesirable phrase and/or word and the user's speech pattern. A system for monitoring spoken language includes a computer infrastructure being operable to: detect a user's speech pattern; compare the user's speech pattern with stored undesirable words and/or phrases; and provide a notification type to a user that the user's speech pattern matches with at least one of the stored words and/or phrases. 04-02-2009
20080262842 PORTABLE COMPUTER WITH SPEECH RECOGNITION FUNCTION AND METHOD FOR PROCESSING SPEECH COMMAND THEREOF - A portable computer with a speech recognition function and the method for processing a speech command thereof is disclosed. In the method of a speech command, the speech command has Y command character strings, wherein Y is a positive integer and which is greater than or equal to one. The method includes a step: providing a plurality of speech recognition databases and loading a corresponding speech recognition database responding to execute the X-th command string of the speech command, wherein X is a positive integer and is greater than or equal to one and is less than or equal to N. When the string corresponding to the X-th command character string is found in the loaded speech recognition database, an operation designated by the X-th command string is executed, and when X is not equal to Y, one is added to X. 10-23-2008
20080262841 APPARATUS AND METHOD FOR RENDERING CONTENTS, CONTAINING SOUND DATA, MOVING IMAGE DATA AND STATIC IMAGE DATA, HARMLESS - A method of rendering multimedia contents harmless is described. The method includes: reading out a predetermined word and the contents from a recording apparatus; replacing the predetermined word in transcript data with a different word, and setting the transcript data including the different word, and the predetermined word, respectively, as transcript data of harmless contents, and as transcript data of unique information; replacing the predetermined word with the different word, and setting the sound data including the different word and the predetermined word according to a time when the predetermined word appears in the firstly mentioned transcript data, respectively, as sound data of the harmless contents, and as sound data of the unique information; replacing the predetermined word in the presentation data with the different word, and the predetermined word, respectively, as presentation data of the harmless contents, and as presentation data of the unique information; recording the harmless contents; and recording the unique information. 10-23-2008
20120109652 Leveraging Interaction Context to Improve Recognition Confidence Scores - On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance. 05-03-2012
20090177471MODEL DEVELOPMENT AUTHORING, GENERATION AND EXECUTION BASED ON DATA AND PROCESSOR DEPENDENCIES - A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change and/or modify a process by editing a declaration file, and rerunning the compiler, thereby a new process is automatically generated.07-09-2009
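The dependency-driven ordering described in this abstract can be illustrated with a topological sort. The following is a minimal sketch, not the patented compiler: the function name `order_build_steps`, the step names, and the file names are all hypothetical, and Python's standard `graphlib` module stands in for whatever ordering algorithm the application actually uses. It derives the execution order from declared inputs and outputs and rejects the two ill-defined cases the abstract mentions (cyclic definitions and data produced by more than one step).

```python
from graphlib import TopologicalSorter, CycleError


def order_build_steps(steps):
    """Order build steps from their declared inputs and outputs.

    steps maps a step name to a pair (inputs, outputs), each a set of
    data file names. All names here are illustrative assumptions.
    """
    # Map every output file to the single step that produces it.
    producer = {}
    for name, (_inputs, outputs) in steps.items():
        for out in outputs:
            if out in producer:
                raise ValueError(
                    f"{out!r} is produced by both {producer[out]!r} and {name!r}"
                )
            producer[out] = name

    # A step depends on whichever steps produce its inputs; inputs with
    # no producer are treated as source data supplied from outside.
    graph = {
        name: {producer[i] for i in inputs if i in producer}
        for name, (inputs, _outputs) in steps.items()
    }

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"cyclic definition: {err.args[1]}") from err


steps = {
    "train": ({"features.dat"}, {"model.bin"}),
    "extract": ({"audio.wav"}, {"features.dat"}),
    "evaluate": ({"model.bin", "features.dat"}, {"report.txt"}),
}
order = order_build_steps(steps)
```

Declaring only the data relationships keeps the ordering logic in one place, so adding or changing a step means editing the declaration and recomputing the order.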
20090012791REFERENCE PATTERN ADAPTATION APPARATUS, REFERENCE PATTERN ADAPTATION METHOD AND REFERENCE PATTERN ADAPTATION PROGRAM - A method and apparatus for carrying out adaptation using input speech data information even at a low reference pattern recognition performance. A reference pattern adaptation device 01-08-2009
20090265171SEGMENTING WORDS USING SCALED PROBABILITIES - Systems, methods, and apparatuses including computer program products for segmenting words using scaled probabilities. In one implementation, a method is provided. The method includes receiving a probability of a n-gram identifying a word, determining a number of atomic units in the corresponding n-gram, identifying a scaling weight depending on the number of atomic units in the n-gram, and applying the scaling weight to the probability of the n-gram identifying a word to determine a scaled probability of the n-gram identifying a word.10-22-2009
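The scaling step in this abstract can be sketched in a few lines. The abstract does not say how the weight is applied or what counts as an atomic unit, so this sketch assumes characters as atomic units and a simple multiplication; the function name `scaled_probability` and the weight table are hypothetical.

```python
def scaled_probability(ngram, probability, weights):
    """Scale the probability that an n-gram identifies a word.

    Assumptions (not stated in the abstract): atomic units are
    characters, and the weight is applied by multiplication.
    """
    n_units = len(ngram)       # number of atomic units in the n-gram
    weight = weights[n_units]  # scaling weight chosen by unit count
    return weight * probability


# Hypothetical weight table keyed by the number of atomic units.
weights = {1: 1.0, 2: 2.0, 3: 4.0}
result = scaled_probability("ab", 0.1, weights)
```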
20100125456System and Method for Recognizing Proper Names in Dialog Systems - Embodiments of a dialog system that utilizes contextual information to perform recognition of proper names are described. Unlike present name recognition methods on large name lists that generally focus strictly on the static aspect of the names, embodiments of the present system take into account the temporal, recency and context effects when names are used, and formulate new questions to further constrain the search space or grammar for recognition of the past and current utterances.05-20-2010
20100088097USER FRIENDLY SPEAKER ADAPTATION FOR SPEECH RECOGNITION - Improved performance and user experience for speech recognition application and system by utilizing for example offline adaptation without tedious effort by a user. Interactions with a user may be in the form of a quiz, game, or other scenario wherein the user may implicitly provide vocal input for adaptation data. Queries with a plurality of candidate answers may be designed in an optimal and efficient way, and presented to the user, wherein detected speech from the user is then matched to one of the candidate answers, and may be used to adapt an acoustic model to the particular speaker for speech recognition.04-08-2010
20080208582Methods for statistical analysis of speech - Computer-implemented methods and apparatus are provided to facilitate the recognition of the content of a body of speech data. In one embodiment, a method for analyzing verbal communication is provided, comprising acts of producing an electronic recording of a plurality of spoken words; processing the electronic recording to identify a plurality of word alternatives for each of the spoken words, each of the plurality of word alternatives being identified by comparing a portion of the electronic recording with a lexicon, and each of the plurality of word alternatives being assigned a probability of correctly identifying a spoken word; loading the word alternatives and the probabilities to a database for subsequent analysis; and examining the word alternatives and the probabilities to determine at least one characteristic of the plurality of spoken words.08-28-2008
20110208526METHOD FOR VARIABLE RESOLUTION AND ERROR CONTROL IN SPOKEN LANGUAGE UNDERSTANDING - A method for variable resolution and error control in spoken language understanding (SLU) allows arranging the categories of the SLU into a hierarchy of different levels of specificity. The pre-determined hierarchy is used to identify different types of errors such as high-cost errors and low-cost errors and trade, if necessary, high cost errors for low cost errors.08-25-2011
20120296652OBTAINING INFORMATION ON AUDIO VIDEO PROGRAM USING VOICE RECOGNITION OF SOUNDTRACK - A method for obtaining information on an audio video program being presented on a consumer electronics (CE) device includes receiving at the CE device a viewer command to recognize the audio video program being presented on the CE device. The method also includes receiving signals from a microphone representative of audio from the audio video program as sensed by the microphone as the audio is played real time on the CE device. The method then includes executing voice recognition on the signals from the microphone to determine words in the audio from the audio video program as sensed by the microphone. Words are then uploaded to an Internet server, where they are correlated to at least one audio video script. The method then includes receiving back from the Internet server information correlated by the server using the words to the audio video program.11-22-2012
20090276219VOICE INPUT SYSTEM AND VOICE INPUT METHOD - In the present invention, a voice input system and a voice input method are provided. The voice input method includes the steps of: (A) initiating a speech recognition process by a first input associated with a first parameter of a first speech recognition subject; (B) providing a voice and a searching space constructed by a speech recognition model associated with the first speech recognition subject; (C) obtaining a sub-searching space from the searching space based on the first parameter; (D) searching at least one candidate item associated with the voice from the sub-searching space; and (E) showing the at least one candidate item.11-05-2009
20080281596CONTINUOUS ADAPTATION IN DETECTION SYSTEMS VIA SELF-TUNING FROM TARGET POPULATION SUBSETS - The present invention provides a system and method for treating distortion propagated through a detection system. The system includes a compensation module that compensates for untreated distortions propagating through the detection compensation system, a user model pool that comprises a plurality of model sets, and a model selector that selects at least one model set from the plurality of model sets in the user model pool. The compensation is accomplished by continually producing scores distributed according to a prescribed distribution for the at least one model set and mitigating the adverse effects of the scores being distorted and lying off a pre-set operating point.11-13-2008
20080228484Techniques for Aiding Speech-to-Speech Translation - Techniques for assisting in translation are provided. A speech recognition hypothesis is obtained, corresponding to a source language utterance. Information retrieval is performed on a supplemental database, based on a situational context, to obtain at least one word string that is related to the source language utterance. The speech recognition hypothesis and the word string are then formatted for display to a user, to facilitate an appropriate selection by the user for translation.09-18-2008
20080228483Method, Device And System for Implementing Speech Recognition Function - The present disclosure discloses a method, a device and a system for implementing a speech recognition function, in which a media resource control device controls a media resource processing device to recognize a speech input by a user via H.248 protocol. The method includes receiving, by the media resource processing device, an H.248 message carrying a speech recognition instruction and a related parameter sent by the media resource control device; performing speech recognition according to the speech recognition instruction and the parameter; and reporting a recognition result to the media resource control device. A corresponding device and system for implementing the speech recognition function is further provided.09-18-2008
20080312927Computer assisted interactive writing instruction method - A computer assisted method of instruction for writing that is directed toward students of English as a second language. The method includes instruction for a plurality of composition types with interactive assistance in the writing strategies and tone of writing. The student selects the type of composition and can choose from a plurality of instructions in writing strategy displayed and with audio playback. The student interactively proceeds with writing and revision as the strategies are reviewed until completion of the writing process.12-18-2008
20120185252CONFIDENCE MEASURE GENERATION FOR SPEECH RELATED SEARCHING - A method of generating a confidence measure generator is provided for use in a voice search system, the voice search system including voice search components comprising a speech recognition system, a dialog manager and a search system. The method includes selecting voice search features, from a plurality of the voice search components, to be considered by the confidence measure generator in generating a voice search confidence measure. The method includes training a model, using a computer processor, to generate the voice search confidence measure based on selected voice search features.07-19-2012
20080270133Speech model refinement with transcription error detection - A reliable transcription error-checking algorithm uses a word confidence score and a word duration probability to detect transcription errors for improved results through the automatic detection of transcription errors in a corpus. The transcription error-checking algorithm is combined with model training so as to use a current model to detect transcription errors, remove utterances which contain incorrect transcription (or manually fix the found errors), and retrain the model. This process can be repeated for several iterations to obtain an improved speech recognition model. The speech model is employed to achieve speech-transcription alignment to obtain a word boundary. A speech recognizer is then utilized to generate a word lattice. Using the word boundary and word lattice, error detection is computed using a word confidence score and a word duration probability.10-30-2008
20120143609METHOD AND SYSTEM FOR PROVIDING SPEECH RECOGNITION - An approach for providing speech recognition is disclosed. A name is retrieved from a user based on data provided by the user. The user is prompted for a name of the user. A first audio input is received from the user in response to the prompt. Speech recognition is applied to the first audio input using a name grammar database to output a recognized name. A determination is made whether the recognized name matches the retrieved name. If no match is determined, the user is re-prompted for the name of the user for a second audio input. Speech recognition is applied to the second audio input using a confidence database having entries less than the name grammar database.06-07-2012
20110270612Computer-Implemented Systems and Methods for Estimating Word Accuracy for Automatic Speech Recognition - Systems and methods are provided for scoring non-native, spontaneous speech. A spontaneous speech sample is received, where the sample is of spontaneous speech spoken by a non-native speaker. Automatic speech recognition is performed on the sample using an automatic speech recognition system to generate a transcript of the sample, where a speech recognizer metric is determined by the automatic speech recognition system. A word accuracy rate estimate is determined for the transcript of the sample generated by the automatic speech recognition system based on the speech recognizer metric. The spontaneous speech sample is scored using a preferred scoring model when the word accuracy rate estimate satisfies a threshold, and the spontaneous speech sample is scored using an alternate scoring model when the word accuracy rate estimate fails to satisfy the threshold.11-03-2011
20090083034VEHICLE CONTROL - The present invention relates to voice-activated vehicle control, and to the control of UAVs (unmanned air vehicles) using speech in particular. A method of controlling a vehicle is provided that includes receiving one or more instructions issued as speech and analyzing the speech using speech recognition software to provide a sequence of words and a word confidence measure for each word so recognized. The sequence of words is analyzed to identify a semantic concept corresponding to an instruction based on the analysis, and a semantic confidence level for the semantic concept identified derived at least in part with reference to the word confidence measures of the words associated with the semantic concept. A spoken confirmation of the semantic concept so identified based on the semantic confidence level is provided, and the semantic concept is used to provide a control input for the vehicle.03-26-2009
20090006095LEARNING TO REORDER ALTERNATES BASED ON A USER'S PERSONALIZED VOCABULARY - Learning to reorder alternates based on a user's personalized vocabulary may be provided. An alternate list provided to a user for replacing words input by the user via a character recognition application may be reordered based on data previously viewed or input by the user (personal data). The alternate list may contain generic data, for example, words for possible substitution with one or more words input by the user. By using the user's personal data and statistical learning methodologies in conjunction with generic data in the alternate list, the alternate list can be reordered to present a top alternate that more closely reflects the user's vocabulary. Accordingly, the user is presented with a top alternate that is more likely to be used by the user to replace data incorrectly input.01-01-2009
20090094033Systems and methods of performing speech recognition using historical information - Embodiments of the present invention improve speech recognition using historical information. In one embodiment, the present invention includes a method of performing speech recognition comprising receiving an identifier specifying a user of a kiosk, retrieving history information about the user using the identifier, receiving speech input, recognizing said speech input in the context of a first recognition set, resulting in first recognition results, and modifying the first recognition results using the history information.04-09-2009
20090012790SPEECH RECOGNITION APPARATUS AND CONTROL METHOD THEREOF - A speech recognition apparatus which improves the sound quality of speech output as a speech recognition result is provided. The speech recognition apparatus includes a recognition unit, which recognizes speech based on a recognition dictionary, and a registration unit, which registers a dictionary entry of a new recognition word in the recognition dictionary. The recognition unit includes a generation unit, which generates a dictionary entry including speech of the new recognition word item and feature parameters of the speech, and a modification unit, which makes a modification for improving the sound quality of the speech included in the dictionary entry generated by the generation unit. The recognition unit includes a speech output unit, which outputs speech which is included in a dictionary entry corresponding to the recognition result of input speech, and is modified by the modification unit.01-08-2009
20110144996ANALYZING AND PROCESSING A VERBAL EXPRESSION CONTAINING MULTIPLE GOALS - Disclosed is a method for parsing a verbal expression received from a user to determine whether or not the expression contains a multiple-goal command. Specifically, known techniques are applied to extract terms from the verbal expression. The extracted terms are assigned to categories. If two or more terms are found in the parsed verbal expression that are in associated categories and that do not overlap one another temporally, then the confidence levels of these terms are compared. If the confidence levels are similar, then the terms may be parallel entries in the verbal expression and may represent multiple goals. If a multiple-goal command is found, then the command is either presented to the user for review and possible editing or is executed. If the parsed multiple-goal command is presented to the user for review, then the presentation can be made via any appropriate interface including voice and text interfaces.06-16-2011
20110144995SYSTEM AND METHOD FOR TIGHTLY COUPLING AUTOMATIC SPEECH RECOGNITION AND SEARCH - Disclosed herein are systems, methods, and computer-readable storage media for performing a search. A system configured to practice the method first receives from an automatic speech recognition (ASR) system a word lattice based on speech query and receives indexed documents from an information repository. The system composes, based on the word lattice and the indexed documents, at least one triple including a query word, selected indexed document, and weight. The system generates an N-best path through the word lattice based on the at least one triple and re-ranks ASR output based on the N-best path. The system aggregates each weight across the query words to generate N-best listings and returns search results to the speech query based on the re-ranked ASR output and the N-best listings. The lattice can be a confusion network, the arc density of which can be adjusted for a desired performance level.06-16-2011
20090063148CALIBRATION OF WORD SPOTS SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT - An example embodiment of the invention may include a system, a method and/or a computer program product for enabling calibrating of word spots resulting from a spoken query, including, e.g., but not limited to, presenting a plurality of word spots to a user, each of the plurality of word spots having a confidence level; determining by the user whether at least one of the plurality of word spots is a hit or a false positive by determining whether the at least one of the plurality of word spots matches at least one word in the spoken query; receiving a maximum acceptable percentage of false positives from the user; and determining an acceptable confidence threshold value for the spoken query by locating the smallest confidence level in the plurality of word spots below which the percentage of word spots in the plurality of word spots that are false positives exceeds the maximum acceptable percentage of false positives.03-05-2009
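The threshold search in this abstract can be sketched as follows. This is one literal reading of the claim language, not a confirmed implementation: it scans the observed confidence levels in ascending order and returns the first level for which the word spots strictly below it contain more false positives, as a percentage, than the user allows. The function name `calibrate_threshold` and the sample data are hypothetical.

```python
def calibrate_threshold(word_spots, max_false_positive_pct):
    """Return the smallest confidence level below which the share of
    false positives exceeds max_false_positive_pct, or None if no
    level qualifies.

    word_spots is a list of (confidence, is_hit) pairs, where is_hit
    is the user's judgment (True = hit, False = false positive).
    """
    levels = sorted({conf for conf, _ in word_spots})
    for level in levels:
        # User judgments for all spots strictly below this level.
        below = [is_hit for conf, is_hit in word_spots if conf < level]
        if not below:
            continue  # nothing lies below the lowest level
        fp_pct = 100.0 * below.count(False) / len(below)
        if fp_pct > max_false_positive_pct:
            return level
    return None


# Hypothetical user-labeled word spots: (confidence, is_hit).
spots = [(0.3, True), (0.5, False), (0.7, True)]
threshold = calibrate_threshold(spots, 40.0)
```

In this sample, nothing lies below 0.3, the spots below 0.5 contain no false positives, and the spots below 0.7 are half false positives, so 0.7 is the first level to exceed the 40 percent limit.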
20090055180System and method for optimizing speech recognition in a vehicle - A system is provided for controlling personalized settings in a vehicle. The system includes a microphone for receiving spoken commands from a person in the vehicle, a location recognizer for identifying location of the speaker, and an identity recognizer for identifying the identity of the speaker. The system also includes a speech recognizer for recognizing the received spoken commands. The system further includes a controller for processing the identified location, identity and commands of the speaker. The controller controls one or more feature settings based on the identified location, identified identity and recognized spoken commands of the speaker. The system also optimizes on the beamforming microphone array used in the vehicle.02-26-2009
20090055181MOBILE TERMINAL AND METHOD OF INPUTTING MESSAGE THERETO - A mobile terminal and a method of inputting a message thereto are provided. The method of inputting a message includes analyzing an input voice signal and determining whether the voice signal corresponds to a message modification instruction, and modifying, if the voice signal corresponds to a message modification instruction, a message according to the voice signal. A user can thereby input a message through the input of a voice in the mobile terminal and modify the input message.02-26-2009
20090276220MEASURING DOUBLE TALK PERFORMANCE - A system evaluates a hands free communication system. The system automatically selects a consonant-vowel-consonant (CVC), vowel-consonant-vowel (VCV), or other combination of sounds from an intelligent database. The selection is transmitted with another communication stream that temporally overlaps the selection. The quality of the communication system is evaluated through an automatic speech recognition engine. The evaluation occurs at a location remote from the transmitted selection.11-05-2009
20090063147PHONETIC, SYNTACTIC AND CONCEPTUAL ANALYSIS DRIVEN SPEECH RECOGNITION SYSTEM AND METHOD - A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. Applications include improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command/response scenarios.03-05-2009
20090210228System for Dynamic Management of Customer Direction During Live Interaction - A system for customer interaction includes a telephony-enabled device for receiving voice calls from customers, a voice recognition engine connected to the telephony-enabled device for monitoring the voice channel, and an application server connected to the voice recognition engine for receiving notification when specific keywords phrases or tones are detected. The system is characterized in that the application server selects scripts for presentation to the customer based at least in part on the notifications received from the voice recognition engine.08-20-2009
20080288254Voice recognition apparatus and navigation apparatus - A voice recognition apparatus recognizes speaker's voice collected by a microphone, determines whether a telephone number is grouped into categories based on an inclusion of vocabulary in the telephone number that divides the telephone number into groups such as an area code, a city code and a subscriber number, and displays the telephone number in a display part in a grouped form of the area code, city code and subscriber number.11-20-2008
20110144994Automatic Sound Level Control - In one or more embodiments, one or more methods and/or systems described can perform determining two or more words of a written language from first data, determining at least one of a noise level external to a mobile device and a location of the mobile device, determining a sound output level based on the at least one of the noise level external to the mobile device and the location of the mobile device, and generating sound data based on the two or more words of the written language and the sound output level. The first data can include, for example, portable document format data that can include first text and/or an image that can include second text. In one or more embodiments, the location can be determined by using at least one of a global positioning system receiver and a location of an access point communicating with the mobile device.06-16-2011
20090210229Processing Received Voice Messages - A voice message processing system shortens received voice messages to reduce the time a user must spend in reviewing the user's voice messages. In some embodiments, a data file associated with a caller is created and updated with words and associated audio files that may be used to replace longer words or phrases in future voice messages from the caller. A user may manually configure preferences to aggressively shorten messages in some embodiments. A speech synthesizer may be employed to replace text in messages when sufficient audio files are not stored to provide sufficient processing of messages. An audible indicator may be played with a revised message to allow a user to play back at least a portion of the original, received message without the substituted portions. Such systems provide a user the opportunity to review messages in a reduced time.08-20-2009
20090248415USE OF METADATA TO POST PROCESS SPEECH RECOGNITION OUTPUT - A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of an audio stream to a text stream. The method compares personally identifiable data, such as a user's electronic address book and/or Caller/Recipient ID information (in the case of processing voice mail to text), to the n-best results generated by a speech recognition engine for each word that is output by the engine. A goal of this comparison is to correct a possible misrecognition of a spoken proper noun such as a name or company with its proper textual form or a spoken phone number to correctly formatted phone number with Arabic numerals to improve the overall accuracy of the output of the voice recognition system.10-01-2009
20090254344ACTIVE LABELING FOR SPOKEN LANGUAGE UNDERSTANDING - A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.10-08-2009
20100153111INPUT DEVICE AND INPUT METHOD FOR MOBILE BODY - Provided is an input device for a mobile body, the input device allowing a safe input operation at the time of operating an equipment such as a car regardless of whether the mobile body is traveling or stopping. The input device includes: an input section (06-17-2010
20100185446SPEECH RECOGNITION SYSTEM AND DATA UPDATING METHOD - Provided is a speech recognition system installed in a terminal coupled to a server via a network. The terminal holds map data including a landmark. The speech recognition system manages recognition data including a word corresponding to a name of the landmark, and sends update area information and updated time to the server. The server generates, when recognition data of the area of the update area information sent from the terminal has been changed after the updated time, difference data between the latest recognition data and the recognition data of the update area information as of the updated time, and sends the generated difference data and map data of the update area information to the terminal. The terminal updates the map data based on the map data sent from the server. The speech recognition system updates the recognition data managed by the terminal based on the difference data.07-22-2010
20080215326SPEAKER ADAPTATION OF VOCABULARY FOR SPEECH RECOGNITION - A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.09-04-2008
20090171662Robust Information Extraction from Utterances - The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size and scarce training data, as well as under noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.07-02-2009
20100161333ADAPTIVE PERSONAL NAME GRAMMARS - In one embodiment, an adaptive personal name grammar improves speech recognition by limiting or weighting the scope of potential addressable names based upon meta-information relative to the communications patterns, environmental considerations, or sociological/professional hierarchy of a user to increase the likelihood of a positive match.06-24-2010
20100228548TECHNIQUES FOR ENHANCED AUTOMATIC SPEECH RECOGNITION - Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function. The error correction function may represent a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using a same set of acoustic training data, and apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.09-09-2010
20100217597Systems and Methods for Monitoring Speech Data Labelers - Systems and methods for using an annotation guide to label utterances and speech data with a call type are disclosed. A method embodiment monitors labelers of speech data by presenting via a processor a test utterance to a labeler, receiving input from the labeler that selects a particular call type from a list of call types and determining via the processor if the labeler labeled the test utterance correctly. Based on the determining step, the method performs at least one of the following: revising the annotation guide, retraining the labeler or altering the test utterance.08-26-2010
20090292540SYSTEM AND METHOD FOR EXCERPT CREATION - A method including displaying content on a display of a device, receiving a speech input designating a segment of the content to be excerpted and transferring the excerpted content to a predetermined location for storage and retrieval.11-26-2009
20090138264SPEECH TO DTMF GENERATION - A method of speech to DTMF generation involving ASR-enabled and DTMF-controlled communications systems. The ASR-enabled system is used to recognize speech received from the DTMF-controlled telecommunications system using sampling rate independent speech recognition. It then identifies a speech segment contained in the speech received from the DTMF-controlled system that corresponds with at least one keyword associated with user-defined data. Then, the ASR-enabled system transmits at least one DTMF signal to the DTMF-controlled system in response to the identified speech segment. This allows a user of an ASR-enabled system such as a vehicle telematics unit to at least partially automate access to the DTMF-controlled system using the telematics unit, so that voice mailbox numbers, passwords, and the like normally entered via a telephone keypad can be automatically sent to the DTMF-controlled system from the telematics unit without having to be manually input each time by the user.05-28-2009
20130218562Sound Recognition Operation Apparatus and Sound Recognition Operation Method - According to one embodiment, a sound recognition operation apparatus includes a sound detection module, a keyword detection module, an audio mute module, and a transmission module. The sound detection module is configured to detect sound. The keyword detection module is configured to detect a particular keyword using voice recognition when the sound detection module detects sound. The audio mute module is configured to transmit an operation signal for muting audio sound when the keyword detection module detects the keyword. The transmission module is configured to recognize the voice command after the keyword is detected by the keyword detection module, and transmit an operation signal corresponding to the voice command.08-22-2013
20090319272METHOD AND SYSTEM FOR VOICE ORDERING UTILIZING PRODUCT INFORMATION - A method for voice ordering utilizing catalog taxonomies and hierarchical categorization relationships in product information management (PIM) systems includes: prompting a user with a query to input speech into a speech recognition engine; translating the inputted speech into a series of words; querying a product information management component (PIM) based on the series of words; wherein the querying is performed as a matching algorithm against PIM category and attribute keywords; returning coded results to a voice synthesizer to produce at least one of: a voice response, and a text response to the user; and wherein the coded results indicate one or more of: a not found message for zero matches, a confirmation of a suitable single match, a request for additional information in the event one or more of the following occurs: more than one matching item, category, and item attribute was found in the PIM.12-24-2009
20090112593SYSTEM FOR RECOGNIZING SPEECH FOR SEARCHING A DATABASE - A system is provided for recognizing speech for searching a database. The system receives speech input as a spoken search request and then processes the speech input in a speech recognition step using a vocabulary for recognizing the spoken request. By processing the speech input, words recognized in the speech input and included in the vocabulary are obtained to form at least one hypothesis. The hypothesis is then utilized to search a database using the at least one hypothesis as a search query. A search result is then received from the database and provided to the user.04-30-2009
20090106026Speech recognition method, device, and computer program - A speech recognition method including for a spoken expression: a) providing a vocabulary of words including predetermined subsets of words, b) assigning to each word of at least one subset an individual score as a function of the value of a criterion of the acoustic resemblance of that word to a portion of the spoken expression, c) for a plurality of subsets, assigning to each subset of the plurality of subsets a composite score corresponding to a sum of the individual scores of the words of said subset, d) determining at least one preferred subset having the highest composite score.04-23-2009
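The composite-score selection described in entry 20090106026 above can be sketched as follows. This is only an illustration under stated assumptions: the function name `best_subset`, the dictionary layout, and the example scores are hypothetical, and the acoustic-resemblance scores are taken as given rather than computed from audio.

```python
from typing import Dict, List, Tuple


def best_subset(subsets: Dict[str, List[str]],
                word_scores: Dict[str, float]) -> Tuple[str, float]:
    """Pick the subset whose composite score is highest.

    The composite score of a subset is the sum of the individual
    scores of its words; words without a score contribute 0.0
    (an assumption made for this sketch).
    """
    composite = {
        name: sum(word_scores.get(word, 0.0) for word in words)
        for name, words in subsets.items()
    }
    winner = max(composite, key=composite.get)
    return winner, composite[winner]


# Hypothetical vocabulary subsets and per-word acoustic scores.
subsets = {"cities": ["paris", "london"], "colors": ["red", "blue"]}
word_scores = {"paris": 0.5, "london": 0.25, "red": 0.25, "blue": 0.125}
print(best_subset(subsets, word_scores))
```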
20090037174Understanding spoken location information based on intersections - In one embodiment, the present system recognizes a user's speech input using an automatically generated probabilistic context free grammar for street names that maps all pronunciation variations of a street name to a single canonical representation during recognition. A tokenizer expands the representation using position-dependent phonetic tokens and an intersection classifier classifies an intersection, despite the presence of recognition errors and incomplete street names.02-05-2009
20090070111METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR SPOKEN LANGUAGE GRAMMAR EVALUATION - A method, system, and computer program product for spoken language grammar evaluation are provided. The method includes playing a recorded question to a candidate, recording a spoken answer from the candidate, and converting the spoken answer into text. The method further includes comparing the text to a grammar database, calculating a spoken language grammar evaluation score based on the comparison, and outputting the spoken language grammar evaluation score.03-12-2009
20090164216IN-VEHICLE CIRCUMSTANTIAL SPEECH RECOGNITION - A method of circumstantial speech recognition in a vehicle. A plurality of parameters associated with a plurality of vehicle functions are monitored as an indication of current vehicle circumstances. At least one vehicle function is identified as a candidate for user-intended ASR control based on user interaction with the vehicle. The identified vehicle function is then used to disambiguate between potential commands contained in speech received from the user.06-25-2009
20090138265Joint Discriminative Training of Multiple Speech Recognizers - Adjusting model parameters is described for a speech recognition system that combines recognition outputs from multiple speech recognition processes. Discriminative adjustments are made to model parameters of at least one acoustic model based on a joint discriminative criterion over multiple complementary acoustic models to lower recognition word error rate in the system.05-28-2009
20100332229APPARATUS CONTROL BASED ON VISUAL LIP SHARE RECOGNITION - An information processing apparatus that includes an image acquisition unit to acquire a temporal sequence of frames of image data, a detecting unit to detect a lip area and a lip image from each of the frames of the image data, a recognition unit to recognize a word based on the detected lip images of the lip areas, and a controller to control an operation at the information processing apparatus based on the word recognized by the recognition unit.12-30-2010
20080319748Conversation System and Conversation Software - A first domain satisfying a first condition concerning a current utterance understanding result and a second domain satisfying a second condition concerning a selection history are specified. For each of the first and second domains, indices representing reliability in consideration of the utterance understanding history, selection history, and utterance generation history are evaluated. Based on the evaluation results, one of the first, second, and third domains is selected as a current domain according to a selection rule.12-25-2008
20110029313METHODS AND SYSTEMS FOR ADAPTING A MODEL FOR A SPEECH RECOGNITION SYSTEM - Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation, of the model for the word or various models for the various words, based on the error rate. Apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The apparatus may further include a controller adapted to adjust an adaptation of the model for the word or various models for the various words, based on the error rate.02-03-2011
20110040562WORD CLOUD AUDIO NAVIGATION - The present invention is directed generally to linking a collection of words and/or phrases with locations in a video and/or audio stream where the words and/or phrases occur and/or associations of a collection of words and/or phrases with a call history.02-17-2011
20090063149Speech retrieval apparatus - A speech retrieval apparatus derives a time series of pitch or power values of speech input as a retrieval condition, obtains a pattern of local maxima, local minima, and inflection points in the time series, compares this pattern with similar patterns obtained from speech stored in a speech database, and outputs only stored speech for which the compared patterns approximately match. Correct retrieval results are thereby obtained even from speech input including multiple accent nuclei.03-05-2009
20110125499SPEECH RECOGNITION - Systems, methods, and apparatus, including computer program products for accepting a predetermined vocabulary-dependent characterization of a set of audio signals, the predetermined characterization including an identification of putative occurrences of each of a plurality of vocabulary items in the set of audio signals, the plurality of vocabulary items included in the vocabulary; accepting a new vocabulary item not included in the vocabulary; accepting putative occurrences of the new vocabulary item in the set of audio signals; and generating, by an analysis engine of a speech processing system, an augmented characterization of the set of audio signals based on the identified putative occurrences of the new vocabulary item.05-26-2011
20120035930Keyword Alerting in Conference Calls - A conferencing system is disclosed in which a participant to a conference call can program the embodiment to listen for one or more “keywords” in the conference call. The keywords might be a participant's name or words associated with him or her or words associated with his or her area of knowledge. The embodiment uses speech recognition technology to listen for those words. When the embodiment detects that those words have been spoken, the embodiment alerts the participant, using audible, visual, and/or tactile signals, that the participant's attention to the call is warranted. When the keywords are chosen wisely, the benefit can be great.02-09-2012
20100185445MACHINE, SYSTEM AND METHOD FOR USER-GUIDED TEACHING AND MODIFYING OF VOICE COMMANDS AND ACTIONS EXECUTED BY A CONVERSATIONAL LEARNING SYSTEM - A machine, system and method for user-guided teaching and modifications of voice commands and actions to be executed by a conversational learning system. The machine includes a system bus for communicating data and control signals received from the conversational learning system to a computer system, a vehicle data and control bus for connecting devices and sensors in the machine, a bridge module for connecting the vehicle data and control bus to the system bus, machine subsystems coupled to the vehicle data and control bus having a respective user interface for receiving a voice command or input signal from a user, a memory coupled to the system bus for storing action command sequences learned for a new voice command and a processing unit coupled to the system bus for automatically executing the action command sequences learned when the new voice command is spoken.07-22-2010
20090216533STORED PHRASE REUTILIZATION WHEN TESTING SPEECH RECOGNITION - A set of audio phrases and corresponding phrase characteristics can be maintained, such as in a database. The phrase characteristics can include a translation of speech in the associated audio phrase. A finite state grammar that includes a set of textual phrases can be received. A software algorithm can execute to compare the set of textual phrases against the translations associated with the maintained audio phrases. A result of the software algorithm execution can be produced, where the result indicates phrase coverage for the finite state grammar based upon the audio phrases.08-27-2009
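The phrase-coverage comparison in entry 20090216533 above can be illustrated with a small sketch. It assumes that a stored audio phrase "covers" a grammar phrase when their text matches after lowercasing and whitespace normalization; that matching rule, the function name `phrase_coverage`, and the sample phrases are all hypothetical and are not taken from the application.

```python
from typing import Iterable, List, Tuple


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def phrase_coverage(grammar_phrases: Iterable[str],
                    stored_transcriptions: Iterable[str]
                    ) -> Tuple[List[str], List[str]]:
    """Split grammar phrases into those that have stored audio and those that do not."""
    available = {_normalize(t) for t in stored_transcriptions}
    covered, missing = [], []
    for phrase in grammar_phrases:
        (covered if _normalize(phrase) in available else missing).append(phrase)
    return covered, missing


covered, missing = phrase_coverage(
    ["check balance", "transfer funds", "speak to agent"],
    ["Check  Balance", "speak to agent"],
)
print(covered, missing)
```

In a test setup like the one the entry describes, the `missing` list would indicate which grammar phrases still need recorded audio.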
20090216534VOICE-ACTIVATED EMERGENCY MEDICAL SERVICES COMMUNICATION AND DOCUMENTATION SYSTEM - A method of documenting information as well as a documentation and communication system for documenting information with a wearable computing device of the type that includes a processing unit and a touchscreen display is provided. The method includes displaying at least one screen on the touchscreen display. A field on the screen in which to enter data is selected and speech input from a user is received. The speech input is converted to machine readable input and the machine readable input is displayed in the field on the at least one screen.08-27-2009
20110178802APPARATUS FOR CLASSIFYING OR DISAMBIGUATING DATA - A computing system has a data storage device (07-21-2011
20100063818MULTI-TIERED VOICE FEEDBACK IN AN ELECTRONIC DEVICE - This invention is directed to providing voice feedback to a user of an electronic device. Because each electronic device display may include several speakable elements (i.e., elements for which voice feedback is provided), the elements may be ordered. To do so, the electronic device may associate a tier with the display of each speakable element. The electronic device may then provide voice feedback for displayed speakable elements based on the associated tier. To reduce the complexity in designing the voice feedback system, the voice feedback features may be integrated in a Model View Controller (MVC) design used for displaying content to a user. For example, the model and view of the MVC design may include additional variables associated with speakable properties. The electronic device may receive audio files for each speakable element using any suitable approach, including for example by providing a host device with a list of speakable elements and directing a text to speech engine of the host device to generate and provide the audio files.03-11-2010
20110153328OBSCENE CONTENT ANALYSIS APPARATUS AND METHOD BASED ON AUDIO DATA ANALYSIS - Provided is an obscene content analysis apparatus and method. The obscene content analysis apparatus includes a content input unit that receives content, an input data buffering unit that buffers the received content, wherein buffering is performed on content corresponding to a length of a previously set analysis section or a length longer than the analysis section, an obscenity analysis determining unit that determines whether or not the analysis section of audio data extracted from the buffered content is obscene by using a previously generated audio-based obscenity determining model and marks the analysis section with an obscenity mark when the analysis section is determined as obscene, a reproduction data buffering unit that accumulates and stores content in which obscenity has been determined by the obscenity analysis determining unit, and a content reproducing unit that reproduces the content while blocking the analysis section marked with the obscenity mark.06-23-2011
20110071833SPEECH RETRIEVAL APPARATUS AND SPEECH RETRIEVAL METHOD - Disclosed are a speech retrieval apparatus and a speech retrieval method for searching, in a speech database, for an audio file matching an input search term by using an acoustic model serialization code, a phonemic code, a sub-word unit, and a speech recognition result of speech. The speech retrieval apparatus comprises a first conversion device, a first division device, a first speech retrieval unit creation device, a second conversion device, a second division device, a second speech retrieval unit creation device, and a matching device. The speech retrieval method comprises a first conversion step, a first division step, a first speech retrieval unit creation step, a second conversion step, a second division step, a second speech retrieval unit creation step, and a matching step.03-24-2011
20110060589Multi-Purpose Contextual Control - A method and a system for activating functions including a first function and a second function, wherein the system is embedded in an apparatus, are disclosed. The system includes a control configured to be activated by a plurality of activation styles, wherein the control generates a signal indicative of a particular activation style from multiple activation styles; and a controller configured to activate either the first function or the second function based on the particular activation style, wherein the first function is configured to be executed based only on the activation style, and wherein the second function is further configured to be executed based on a speech input.03-10-2011
20080215327Method For Processing Speech Data For A Distributed Recognition System - Speech signal information is formatted, processed and transported in accordance with a format adapted for TCP/IP protocols used on the Internet and other communications networks. NULL characters are used for indicating the end of a voice segment. The method is useful for distributed speech recognition systems such as a client-server system, typically implemented on an intranet or over the Internet based on user queries at his/her computer, a PDA, or a workstation using a speech input interface.09-04-2008
20100063820CORRELATING VIDEO IMAGES OF LIP MOVEMENTS WITH AUDIO SIGNALS TO IMPROVE SPEECH RECOGNITION - A speech recognition device can include an audio signal receiver configured to receive audio signals from a speech source, a video signal receiver configured to receive video signals from the speech source, and a processing unit configured to process the audio signals and the video signals. In addition, the speech recognition device can include a conversion unit configured to convert the audio signals and the video signals to recognizable speech, and an implementation unit configured to implement a task based on the recognizable speech.03-11-2010
20100063819LANGUAGE MODEL LEARNING SYSTEM, LANGUAGE MODEL LEARNING METHOD, AND LANGUAGE MODEL LEARNING PROGRAM - A language model learning system for learning a language model on an identifiable basis relating to a word error rate used in speech recognition. The language model learning system (03-11-2010
20100036666METHOD AND SYSTEM FOR PROVIDING META DATA FOR A WORK - A method for providing meta data for a work includes designating a file for uploading data associated therewith to a telematics unit operatively connected to a vehicle; and using meta data associated with the designated file, obtaining phonetic meta data for the designated file from an on-line service. The method further includes creating a phonetic meta data file associated with the designated file and including the obtained phonetic meta data, and transferring the phonetic meta data file to the telematics unit. Also disclosed herein is a system for providing the same.02-11-2010
20080243505Method for variable resolution and error control in spoken language understanding - A method for variable resolution and error control in spoken language understanding (SLU) allows arranging the categories of the SLU into a hierarchy of different levels of specificity. The pre-determined hierarchy is used to identify different types of errors such as high-cost errors and low-cost errors and trade, if necessary, high cost errors for low cost errors.10-02-2008
20090119107SPEECH RECOGNITION BASED ON SYMBOLIC REPRESENTATION OF A TARGET SENTENCE - Systems and methods for processing a user speech input to determine whether the user has correctly read a target sentence string are provided. One disclosed method may include receiving a sentence array including component words of the target sentence string and processing the sentence array to generate a symbolic representation of the target sentence string. The symbolic representation may include a subset of words selected from the component words of the target sentence string, having fewer words than the sentence array. The method may include processing user speech input to recognize in the user speech input each of the words in the subset of words in the symbolic representation of the target sentence string. The method may further include, upon recognizing the subset of words, making a determination that the user has correctly read the target sentence string.05-07-2009
20100324899VOICE RECOGNITION SYSTEM, VOICE RECOGNITION METHOD, AND VOICE RECOGNITION PROCESSING PROGRAM - A speech recognition system for rapidly performing recognition processing while maintaining quality of speech recognition in a speech recognition device is provided. A speech recognition system includes a speech input device which inputs speech and displays a recognition result, and a speech recognition device which receives the speech from the speech input device, performs recognition processing, and sends back the speech to the speech input device. The speech input device includes a user dictionary section which stores words used for recognizing the input speech, and a reduced user dictionary creation unit which extracts words corresponding to the input speech from the user dictionary and creates a reduced user dictionary. The speech recognition device has a speech recognition unit which inputs the input speech and the reduced user dictionary from the speech input/output device and recognizes the input speech based on the reduced user dictionary and a system dictionary provided beforehand.12-23-2010
20110125501Method and device for automatic recognition of given keywords and/or terms within voice data - The present invention relates to a method of and a device (05-26-2011
20110071834SYSTEM AND METHOD FOR IMPROVING TEXT INPUT IN A SHORTHAND-ON-KEYBOARD INTERFACE - A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word.03-24-2011
20090292541METHODS AND APPARATUS FOR ENHANCING SPEECH ANALYTICS - Methods and apparatus for the enhancement of speech to text engines, by providing indications to the correctness of the found words, based on additional sources besides the internal indication provided by the STT engine. The enhanced indications comprise sources of data such as acoustic features, CTI features, phonetic search and others. The apparatus and methods also enable the detection of important or significant keywords found in audio files, thus enabling more efficient usages, such as further processing or transfer of interactions to relevant agents, escalation of issues, or the like. The methods and apparatus employ a training phase in which word model and key phrase model are generated for determining an enhanced correctness indication for a word and an enhanced importance indication for a key phrase, based on the additional features.11-26-2009
20110191106WORD RECOGNITION SYSTEM AND METHOD FOR CUSTOMER AND EMPLOYEE ASSESSMENT - One-to-many comparisons of callers' words and/or voice prints with known words and/or voice prints to identify any substantial matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract different words, such as words of anger. The system may also segment at least a portion of the customer's voice to create a tone profile, and it formats the segmented words and tone profiles for network transmission to a server. The server compares the customer's words and/or tone profiles with multiple known words and/or tone profiles stored on a database to determine any substantial matches. The identification of any matches may be used for a variety of purposes, such as providing representative feedback or customer follow-up.08-04-2011
20110191105Systems and Methods for Word Offensiveness Detection and Processing Using Weighted Dictionaries and Normalization - Computer-implemented systems and methods are provided for identifying language that would be considered obscene or otherwise offensive to a user or proprietor of a system. A plurality of offensive words are received, where each offensive word is associated with a severity score identifying the offensiveness of that word. A string of words is received. A distance between a candidate word and each offensive word in the plurality of offensive words is calculated, and a plurality of offensiveness scores for the candidate word are calculated, each offensiveness score based on the calculated distance between the candidate word and the offensive word and the severity score of the offensive word. A determination is made as to whether the candidate word is an offender word, where the candidate word is deemed to be an offender word when the highest offensiveness score in the plurality of offensiveness scores exceeds an offensiveness threshold value.08-04-2011
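The thresholding scheme in entry 20110191105 above can be sketched as follows. The abstract does not say which distance measure or scoring formula is used, so this sketch assumes Levenshtein edit distance and a score of severity multiplied by a length-normalized similarity. The function names, the formula, and the placeholder dictionary are hypothetical.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def is_offender(candidate: str,
                severity_by_word: Dict[str, float],
                threshold: float) -> bool:
    """Flag the candidate when its highest offensiveness score exceeds the threshold.

    Assumed score: severity * max(0, 1 - distance / len(offensive_word)).
    """
    scores = []
    for word, severity in severity_by_word.items():
        distance = levenshtein(candidate.lower(), word)
        similarity = max(0.0, 1.0 - distance / len(word))
        scores.append(severity * similarity)
    return max(scores, default=0.0) > threshold


# Mild placeholder words stand in for a real weighted dictionary.
dictionary = {"darn": 4.0, "heck": 2.0}
print(is_offender("d4rn", dictionary, threshold=2.5))
print(is_offender("hello", dictionary, threshold=2.5))
```

A distance-based match of this kind catches obfuscated spellings, but it also flags innocent near neighbors (for example "barn" against "darn"), so the threshold and the severity weights do most of the work.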
20110202342MULTI-MODAL WEB INTERACTION OVER WIRELESS NETWORK - A system, apparatus, and method are disclosed for receiving user input at a client device, interpreting the user input to identify a selection of at least one of a plurality of web interaction modes, producing a corresponding client request based in part on the user input and the web interaction mode; and sending the client request to a server via a network.08-18-2011
20100017210SYSTEM AND METHOD FOR SEARCHING STORED AUDIO DATA BASED ON A SEARCH PATTERN - A system for searching stored audio data is described. The system includes a memory configured to store audio data received from a radio receiver and a processing circuit. The processing circuit is configured to receive a search pattern, search the stored audio data for the search pattern, and provide audio data based on the search.01-21-2010
20080294440METHOD AND SYSTEM FOR ASSESSING PRONUNCIATION DIFFICULTIES OF NON-NATIVE SPEAKERS - The present disclosure presents a useful metric for assessing the relative difficulty which non-native speakers face in pronouncing a given utterance and a method and systems for using such a metric in the evaluation and assessment of the utterances of non-native speakers. In an embodiment, the metric may be based on both known sources of difficulty for language learners and a corpus-based measure of cross-language sound differences. The method may be applied to speakers who primarily speak a first language speaking utterances in any non-native second language.11-27-2008
20110307257METHODS AND APPARATUS FOR REAL-TIME INTERACTION ANALYSIS IN CALL CENTERS - A method and system for indicating in real time that an interaction is associated with a problem or issue, comprising: receiving a segment of an interaction in which a representative of the organization participates; extracting a feature from the segment; extracting a global feature associated with the interaction; aggregating the feature and the global feature; and classifying the segment or the interaction in association with the problem or issue by applying a model to the feature and the global feature. The method and system may also use features extracted from earlier segments within the interaction. The method and system can also evaluate the model based on features extracted from training interactions and manual tagging assigned to the interactions or segments thereof.12-15-2011
20100217596WORD SPOTTING FALSE ALARM PHRASES - In one aspect, a method for processing media includes accepting a query. One or more language patterns are identified that are similar to the query. A putative instance of the query is located in the media. The putative instance is associated with a corresponding location in the media. The media in a vicinity of the putative instance is compared to the identified language patterns and data characterizing the putative instance of the query is provided according to the comparing of the media to the language patterns, for example, as a score for the putative instance that is determined according to the comparing of the media to the language patterns.08-26-2010
20120209610VOICED PROGRAMMING SYSTEM AND METHOD - Provided herein are systems and methods for using context-sensitive speech recognition logic in a computer to create a software program, including context-aware voice entry of instructions that make up a software program, automatic context-sensitive instruction formatting, and automatic context-sensitive insertion-point positioning.08-16-2012
20100023330SPEED PODCASTING - Embodiments of the present invention address deficiencies of the art in respect to podcasting and provide a method, system and computer program product for speed podcasting. In an embodiment of the invention, a speed podcasting method can include speech recognizing an audio portion of a podcast, parsing the speech recognized audio portion to identify essential words, and playing back only audio segments and corresponding video segments of the podcast including the essential words while excluding from playback audio segments and corresponding video segments of the podcast including non-essential words.01-28-2010
20120022871SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS - A speech recognition circuit comprises a memory containing lexical data for word recognition, the lexical data comprising a plurality of lexical data structures stored in each of a plurality of parts of the memory; and a parallel processor structure connected to the memory to process speech parameters by performing parallel processing on a plurality of the lexical data structures.01-26-2012
20120059655METHODS AND APPARATUS FOR PROVIDING INPUT TO A SPEECH-ENABLED APPLICATION PROGRAM - Some embodiments are directed to allowing a user to provide speech input intended for a speech-enabled application program into a mobile communications device, such as a smartphone, that is not connected to the computer that executes the speech-enabled application program. The mobile communications device may provide the user's speech input as audio data to a broker application executing on a server, which determines to which computer the received audio data is to be provided. When the broker application determines the computer to which the audio data is to be provided, it sends the audio data to that computer. In some embodiments, automated speech recognition may be performed on the audio data before it is provided to the computer. In such embodiments, instead of providing the audio data, the broker application may send the recognition result generated from performing automated speech recognition to the identified computer.03-08-2012
20110093269METHOD AND SYSTEM FOR CONSIDERING INFORMATION ABOUT AN EXPECTED RESPONSE WHEN PERFORMING SPEECH RECOGNITION - A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system comprises adjusting the rejection threshold when speech input matches the predetermined expected response.04-21-2011
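The expected-response behavior in entry 20110093269 above can be illustrated with a short sketch. The abstract says only that the rejection threshold is adjusted when the input matches the expected response. The direction of the adjustment (lowering the threshold), the numeric values, and the function name `accept_hypothesis` are assumptions made for illustration.

```python
from typing import Optional


def accept_hypothesis(hypothesis: str,
                      confidence: float,
                      expected_response: Optional[str],
                      base_threshold: float = 0.6,
                      relaxation: float = 0.2) -> bool:
    """Accept or reject a recognition hypothesis by confidence.

    When the hypothesis equals the expected response, the rejection
    threshold is relaxed (assumed here to be lowered by `relaxation`).
    """
    threshold = base_threshold
    if expected_response is not None and hypothesis == expected_response:
        threshold = base_threshold - relaxation
    return confidence >= threshold


# The same confidence is accepted only when it matches the expected response.
print(accept_hypothesis("one two", 0.5, expected_response="one two"))
print(accept_hypothesis("one two", 0.5, expected_response=None))
```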
20110066435IMAGE TRANSMITTING APPARATUS, IMAGE TRANSMITTING METHOD, AND IMAGE TRANSMITTING PROGRAM EMBODIED ON COMPUTER READABLE MEDIUM - An MFP includes an accepting portion to accept an image and a speech, a speech recognition portion to recognize the accepted speech, a display screen generating portion, in response to an event that a keyword included in a predetermined output setting is recognized by the speech recognition portion, to generate a display screen in accordance with the output setting, the display screen including at least one of the accepted image and an image of object data that is stored in advance independently from the accepted image, and a transmission control portion to transmit, in accordance with the output setting, the generated display screen to at least one of a plurality of PCs operated respectively by a plurality of users.03-17-2011
20080270134Hybrid-captioning system - A hybrid-captioning system for editing captions for spoken utterances within video includes an editor-type caption-editing subsystem, a line-based caption-editing subsystem, and a mechanism. The editor-type subsystem is that in which captions are edited for spoken utterances within the video on a groups-of-line basis without respect to particular lines of the captions and without respect to temporal positioning of the captions in relation to the spoken utterances. The line-based subsystem is that in which captions are edited for spoken utterances within the video on a line-by-line basis with respect to particular lines of the captions and with respect to temporal positioning of the captions in relation to the spoken utterances. For each section of spoken utterances within the video, the mechanism is to select the editor-type or the line-based subsystem to provide captions for the section of spoken utterances in accordance with a predetermined criterion.10-30-2008
20120316878VOICE RECOGNITION GRAMMAR SELECTION BASED ON CONTEXT - The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user.12-13-2012
20090132250ROBOT APPARATUS WITH VOCAL INTERACTIVE FUNCTION AND METHOD THEREFOR - The present invention provides a robot apparatus with a vocal interactive function. The robot apparatus 05-21-2009
20120215538Performance measurement for customer contact centers - In one embodiment, a method includes identifying a first communication from a customer, identifying a second communication from the customer following a response to the first communication from a contact center, and analyzing the first and second communications at a contact center network device to determine a change in sentiment from the first communication to the second communication. An apparatus for contact center performance measurement is also disclosed.08-23-2012
20120316877DYNAMICALLY ADDING PERSONALIZATION FEATURES TO LANGUAGE MODELS FOR VOICE SEARCH - A dynamic exponential, feature-based, language model is continually adjusted per utterance by a user, based on the user's usage history. This adjustment of the model is done incrementally per user, over a large number of users, each with a unique history. The user history can include previously recognized utterances, text queries, and other user inputs. The history data for a user is processed to derive features. These features are then added into the language model dynamically for that user.12-13-2012
20100049516METHOD OF USING MICROPHONE CHARACTERISTICS TO OPTIMIZE SPEECH RECOGNITION PERFORMANCE - A system and method for tuning a speech recognition engine to an individual microphone using a database containing acoustical models for a plurality of microphones. Microphone performance characteristics are obtained from a microphone at a speech recognition engine, the database is searched for an acoustical model that matches the characteristics, and the speech recognition engine is then modified based on the matching acoustical model.02-25-2010
20110125500AUTOMATED DISTORTION CLASSIFICATION - A method of and system for automated distortion classification. The method includes steps of (a) receiving audio including a user speech signal and at least some distortion associated with the signal; (b) pre-processing the received audio to generate acoustic feature vectors; (c) decoding the generated acoustic feature vectors to produce a plurality of hypotheses for the distortion; and (d) post-processing the plurality of hypotheses to identify at least one distortion hypothesis of the plurality of hypotheses as the received distortion. The system can include one or more distortion models including distortion-related acoustic features representative of various types of distortion and used by a decoder to compare the acoustic feature vectors with the distortion-related acoustic features to produce the plurality of hypotheses for the distortion.05-26-2011
20080300878Method For Transporting Speech Data For A Distributed Recognition System - Speech signal information is formatted, processed and transported in accordance with a format adapted for TCP/IP protocols used on the Internet and other communications networks. NULL characters are used for indicating the end of a voice segment. The method is useful for distributed speech recognition systems such as a client-server system, typically implemented on an intranet or over the Internet based on user queries at his/her computer, a PDA, or a workstation using a speech input interface.12-04-2008
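The NULL-terminated framing mentioned in entry 20080300878 can be sketched as follows. The function name is hypothetical, and the sketch assumes the segment payload is encoded so that it never contains a NULL byte itself (for example, text-encoded feature data); the abstract does not say how the payload is encoded.

```python
def split_voice_segments(buffer: bytes):
    """Split a receive buffer into complete voice segments.

    Returns (segments, remainder): every segment that was terminated by a
    NULL byte, plus any trailing bytes still waiting for their terminator.
    Assumes the payload never contains b"\x00" (illustrative assumption).
    """
    # bytes.split always returns at least one element, so the last piece
    # is the (possibly empty) unterminated tail.
    *segments, remainder = buffer.split(b"\x00")
    return segments, remainder
```

A receiver would typically append newly read TCP data to `remainder` and call the function again, since a stream socket gives no guarantee that one read corresponds to one segment.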
20110004475METHODS AND APPARATUSES FOR AUTOMATIC SPEECH RECOGNITION - Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space.01-06-2011
20120239402SPEECH RECOGNITION DEVICE AND METHOD - A speech recognition device includes a speech recognition section that conducts a search, by speech recognition, on audio data stored in a first memory section to extract word-spoken portions where plural words transferred are each spoken and, of the word-spoken portions extracted, rejects the word-spoken portion for the word designated as a rejecting object; an acquisition section that obtains a derived word of a designated search target word, the derived word being generated in accordance with a derived word generation rule stored in a second memory section or read out from the second memory section; a transfer section that transfers the derived word and the search target word to the speech recognition section, the derived word being set to the outputting object or the rejecting object by the acquisition section; and an output section that outputs the word-spoken portion extracted and not rejected in the search.09-20-2012
20110131046FEATURES FOR UTILIZATION IN SPEECH RECOGNITION - A computer-implemented speech recognition system described herein includes a receiver component that receives a plurality of detected units of an audio signal, wherein the audio signal comprises a speech utterance of an individual. A selector component selects a subset of the plurality of detected units that correspond to a particular time-span. A generator component generates at least one feature with respect to the particular time-span, wherein the at least one feature is one of an existence feature, an expectation feature, or an edit distance feature. Additionally, a statistical speech recognition model outputs at least one word that corresponds to the particular time-span based at least in part upon the at least one feature generated by the feature generator component.06-02-2011
20120323576AUTOMATED ADVERSE DRUG EVENT ALERTS - Event audio data that is based on verbal utterances associated with a pharmaceutical event associated with a patient may be received. Medical history information associated with the patient may be obtained, based on information included in a medical history repository. At least one text string that matches at least one interpretation of the event audio data may be obtained, based on information included in a pharmaceutical speech repository, information included in a speech accent repository, and a drug matching function, the at least one text string being associated with a pharmaceutical drug. One or more adverse drug event (ADE) alerts may be determined based on matching the at least one text string and medical history attributes associated with the at least one patient with ADE attributes obtained from an ADE repository. An ADE alert report may be generated, based on the determined one or more ADE alerts.12-20-2012
20110238419BINAURAL METHOD AND BINAURAL CONFIGURATION FOR VOICE CONTROL OF HEARING DEVICES - A binaural configuration and an associated method have/utilize first and second hearing devices for the voice control of the hearing devices by voice commands. The configuration contains a first voice recognition module in the first hearing device and a second voice recognition module in the second hearing device. The second voice recognition module uses information data from the first voice recognition module for recognition of the voice commands. It is here advantageous that the rate of erroneously recognized voice commands (“false alarms”) is reduced.09-29-2011
20120278078INPUT AND DISPLAYED INFORMATION DEFINITION BASED ON AUTOMATIC SPEECH RECOGNITION DURING A COMMUNICATION SESSION - Methods and systems for providing contextually relevant information to a user are provided. In particular, a user context is determined. The determination of the user context can be made from information stored on or entered in a user device. The determined user context is provided to an automatic speech recognition (ASR) engine as a watch list. A voice stream is monitored by the ASR engine. In response to the detection of a word on the watch list by the ASR engine, the context engine is notified. The context engine then modifies a display presented to the user, to provide a selectable item that the user can select to access relevant information.11-01-2012
20120089397LANGUAGE MODEL GENERATING DEVICE, METHOD THEREOF, AND RECORDING MEDIUM STORING PROGRAM THEREOF - A text in a corpus including a set of world wide web (web) pages is analyzed. At least one word appropriate for a document type set according to a voice recognition target is extracted based on an analysis result. A word set is generated from the extracted at least one word. A retrieval engine is caused to perform a retrieval process using the generated word set as a retrieval query of the retrieval engine on the Internet, and a link to a web page from the retrieval result is acquired. A language model for voice recognition is generated from the acquired web page.04-12-2012
20120095765AUTOMATICALLY PROVIDING A USER WITH SUBSTITUTES FOR POTENTIALLY AMBIGUOUS USER-DEFINED SPEECH COMMANDS - A method for alleviating ambiguity issues of new user-defined speech commands. An original command for a user-defined speech command can be received. It can then be determined if the original command is likely to be confused with a set of existing speech commands. When confusion is unlikely, the original command can be automatically stored. When confusion is likely, a substitute command that is unlikely to be confused with existing commands can be automatically determined. The substitute can be presented as an alternative to the original command and can be selectively stored as the user-defined speech command.04-19-2012
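The store-or-substitute flow in entry 20120095765 can be sketched as below. This is only an illustration: text similarity from Python's `difflib.SequenceMatcher` stands in for whatever acoustic confusability measure the application uses, and the function names, the 0.8 threshold, and the list of candidate substitutes are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Text similarity in [0, 1]; a stand-in for acoustic confusability."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_confusable(command, existing, threshold=0.8):
    """True if the command is too similar to any existing command."""
    return any(similarity(command, other) >= threshold for other in existing)


def choose_command(original, existing, substitutes, threshold=0.8):
    """Return the original command if confusion is unlikely; otherwise the
    first candidate substitute that is not confusable; otherwise None."""
    if not is_confusable(original, existing, threshold):
        return original
    for candidate in substitutes:
        if not is_confusable(candidate, existing, threshold):
            return candidate
    return None
```

In the flow the abstract describes, a returned substitute would be presented to the user as an alternative rather than stored silently.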
20110307258REAL-TIME APPLICATION OF INTERACTION ANLYTICS - A method and apparatus for providing real-time assistance related to an interaction associated with a contact center, comprising steps or components for: receiving at least a part of an audio signal of an interaction captured by a capturing device associated with an organization, and metadata information associated with the interaction; performing audio analysis of the at least part of the audio signal, while the interaction is still in progress to obtain audio information; categorizing at least a part of the metadata information and the audio information, to determine a category associated with the interaction, while the interaction is still in progress to obtain audio information; and taking an action associated with the category.12-15-2011
20110320202LOCATION VERIFICATION SYSTEM USING SOUND TEMPLATES - A system using sound templates is presented that may receive a first template for an audio signal and compares it to templates from different sound sources to determine a correlation between them. A location history database is created that assists in identifying the location of a user in response to audio templates generated by the user over time and at different locations. Comparisons can be made using templates of different richness to achieve confidence levels and confidence levels may be represented based on the results of the comparisons. Queries may be run against the database to track users by templates generated from their voice. In addition, background information may be filtered out of the voice signal and separately compared against the database to assist in identifying a location based on the background noise.12-29-2011
20110320201SOUND VERIFICATION SYSTEM USING TEMPLATES - An audio signal verification system is presented for verifying that a sound is from a predetermined source. Various methods for analyzing the sound are presented and the various methods may be combined to varying degrees to determine an appropriate correlation with a predefined pattern. Moreover a confidence level or other indication may be used to indicate the determination was successful. The sound may be reduced to templates with varying degrees of richness. Also different templates may be created using the same sound source and different sounds from the same source may be aggregated to form a single template. Comparisons may be made comparing a sound or a template derived from that sound with stored sounds or templates derived from that stored sound. Moreover comparisons can be made using templates of different richness to achieve confidence levels and confidence levels may be represented based on the results of the comparisons.12-29-2011
20120290303SPEECH RECOGNITION SYSTEM AND METHOD BASED ON WORD-LEVEL CANDIDATE GENERATION - A speech recognition system and method based on word-level candidate generation are provided. The speech recognition system may include a speech recognition result verifying unit to verify a word sequence and a candidate word for at least one word included in the word sequence when the word sequence and the candidate word are provided as a result of speech recognition. A word sequence displaying unit may display the word sequence in which the at least one word is visually distinguishable from other words of the word sequence. The word sequence displaying unit may display the word sequence by replacing the at least one word with the candidate word when the at least one word is selected by a user.11-15-2012
20100169095DATA PROCESSING APPARATUS, DATA PROCESSING METHOD, AND PROGRAM - A data processing apparatus includes a speech recognition unit configured to perform continuous speech recognition on speech data, a related word acquiring unit configured to acquire a word related to at least one word obtained through the continuous speech recognition as a related word that is related to content corresponding to content data including the speech data, and a speech retrieval unit configured to retrieve an utterance of the related word from the speech data so as to acquire the related word whose utterance has been retrieved as metadata for the content.07-01-2010
20130013310SPEECH RECOGNITION SYSTEM - A speech recognition system comprising a recognition dictionary for use in speech recognition and a controller configured to recognize an inputted speech by using the recognition dictionary is disclosed. The controller detects a speech section based on a signal level of the inputted speech, recognizes speech data corresponding to the speech section by using the recognition dictionary, and displays a recognition result of the recognition process and a correspondence item that corresponds to the recognition result in the form of a list. The correspondence item displayed in the form of a list is manually operable.01-10-2013
20110161083METHODS AND SYSTEMS FOR ASSESSING AND IMPROVING THE PERFORMANCE OF A SPEECH RECOGNITION SYSTEM - A method for assessing a performance of a speech recognition system may include determining a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and the grade is based on a recognition rate and at least one recognition factor. An apparatus for assessing a performance of a speech recognition system may include a processor that determines a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and wherein the grade is based on a recognition rate and at least one recognition factor.06-30-2011
20110161082METHODS AND SYSTEMS FOR ASSESSING AND IMPROVING THE PERFORMANCE OF A SPEECH RECOGNITION SYSTEM - A method for assessing a performance of a speech recognition system may include determining a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and the grade is based on a recognition rate and at least one recognition factor. An apparatus for assessing a performance of a speech recognition system may include a processor that determines a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and wherein the grade is based on a recognition rate and at least one recognition factor.06-30-2011
20080288253AUTOMATIC SPEECH RECOGNITION METHOD AND APPARATUS, USING NON-LINEAR ENVELOPE DETECTION OF SIGNAL POWER SPECTRA - An automatic speech recognition method includes converting an acoustic signal into a digital signal; determining a power spectrum of at least one portion of the digital signal; and non-linearly determining envelope values of the power spectrum at a plurality of respective frequencies, based on a combination of the power spectrum with a filter function. Non-linearly determining envelope values involves calculating each envelope value based on a respective number of values of the power spectrum and of the filter function and the respective number of values is correlated to the respective frequency of the envelope value.11-20-2008
20080235019AGE DETERMINATION USING SPEECH - A device may include logic configured to receive voice data from a user, identify a result from the voice data, calculate a confidence score associated with the result, and determine a likely age range associated with the user based on the confidence score.09-25-2008
20080235018Method and System for Determing the Topic of a Conversation and Locating and Presenting Related Content - A method and system are disclosed for determining the topic of a conversation and obtaining and presenting related content. The disclosed system provides a “creative inspirator” in an ongoing conversation. The system extracts keywords from the conversation and utilizes the keywords to determine the topic(s) being discussed. The disclosed system then conducts searches to obtain supplemental content based on the topic(s) of the conversation. The content can be presented to the participants in the conversation to supplement their discussion. A method is also disclosed for determining the topic of a text document including transcripts of audio tracks, newspaper articles, and journal papers.09-25-2008
20080221890UNSUPERVISED LEXICON ACQUISITION FROM SPEECH AND TEXT - Techniques for acquiring, from an input text and an input speech, a set of a character string and a pronunciation thereof which should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings which are candidates to be recognized as a word; generates plural pronunciation candidates of the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates.09-11-2008
20080221889MOBILE CONTENT SEARCH ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a content search software application resident on a mobile communication facility, where speech may be recorded using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the content search software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. In addition, the speech recognition facility may be adapted based on usage.09-11-2008
20130173269METHODS, APPARATUSES AND COMPUTER PROGRAM PRODUCTS FOR JOINT USE OF SPEECH AND TEXT-BASED FEATURES FOR SENTIMENT DETECTION - An apparatus for generating a review based in part on detected sentiment may include a processor and memory storing executable computer code causing the apparatus to at least perform operations including determining a location(s) of the apparatus and a time(s) that the location(s) was determined responsive to capturing voice data of speech content associated with spoken reviews of entities. The computer program code may further cause the apparatus to analyze textual and acoustic data corresponding to the voice data to detect whether the textual or acoustic data includes words indicating a sentiment(s) of a user speaking the speech content. The computer program code may further cause the apparatus to generate a review of an entity corresponding to a spoken review(s) based on assigning a predefined sentiment to a word(s) responsive to detecting that the word indicates the sentiment of the user. Corresponding methods and computer program products are also provided.07-04-2013
20080215325TECHNIQUE FOR ACCURATELY DETECTING SYSTEM FAILURE - An apparatus, method and program for dividing a conversational dialog into utterances. The apparatus includes a computer processor; a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in a channel making a main speech among conversational dialogs inputted in at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not making the main speech; a boundary-candidate extracting section which extracts boundary candidates in the main speech, by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and a recognizing unit which outputs a word string of the main speech segmented by one of the extracted boundary candidates after dividing the segmented speech into optimal utterances with reference to the word database and grammar database.09-04-2008
20130096918RECOGNIZING DEVICE, COMPUTER-READABLE RECORDING MEDIUM, RECOGNIZING METHOD, GENERATING DEVICE, AND GENERATING METHOD - A recognizing device includes a memory and a processor coupled to the memory. The memory stores words included in a sentence and positional information indicating a position of the words in the sentence. The processor executes a process including comparing an input voice signal with reading information of a character string that connects a plurality of words stored in the memory to calculate a similarity; calculating a connection score indicating a proximity between the plurality of connected words based on positional information of the words stored in the memory; and determining a character string corresponding to the voice signal based on the similarity and the connection score.04-18-2013
20110313768COMPOUND GESTURE-SPEECH COMMANDS - A multimedia entertainment system combines both gestures and voice commands to provide an enhanced control scheme. A user's body position or motion may be recognized as a gesture, and may be used to provide context to recognize user generated sounds, such as speech input. Likewise, speech input may be recognized as a voice command, and may be used to provide context to recognize a body position or motion as a gesture. Weights may be assigned to the inputs to facilitate processing. When a gesture is recognized, a limited set of voice commands associated with the recognized gesture are loaded for use. Further, additional sets of voice commands may be structured in a hierarchical manner such that speaking a voice command from one set of voice commands leads to the system loading a next set of voice commands.12-22-2011
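The gesture-scoped, hierarchical command sets in entry 20110313768 can be pictured with a small sketch. The gesture names, command phrases, and class below are invented for illustration and do not come from the application; the sketch also omits the input weighting the abstract mentions.

```python
# Hypothetical mapping: recognized gesture -> limited set of voice commands.
COMMAND_SETS = {
    "point_at_screen": {"play", "pause", "stop"},
    "raise_hand": {"volume up", "volume down"},
}

# Hypothetical hierarchy: speaking a command loads the next command set.
NEXT_SETS = {
    "play": {"next", "previous"},
}


class CommandContext:
    """Tracks which voice commands are currently loaded."""

    def __init__(self):
        self.active = set()

    def on_gesture(self, gesture):
        # Load only the commands associated with the recognized gesture.
        self.active = set(COMMAND_SETS.get(gesture, ()))

    def on_speech(self, phrase):
        # Accept the phrase only if it is in the currently loaded set.
        phrase = phrase.strip().lower()
        if phrase not in self.active:
            return None
        # A command with a follow-up set replaces the active set.
        if phrase in NEXT_SETS:
            self.active = set(NEXT_SETS[phrase])
        return phrase
```

Restricting the active vocabulary this way is one plausible reason a gesture can "provide context" for speech: the recognizer only has to discriminate among a handful of phrases at a time.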
20110313767SYSTEM AND METHOD FOR DATA INTENSIVE LOCAL INFERENCE - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages.12-22-2011
20110218807Method for Automated Sentence Planning in a Task Classification System - The invention relates to a method for sentence planning (09-08-2011
20110218806DETERMINING TEXT TO SPEECH PRONUNCIATION BASED ON AN UTTERANCE FROM A USER - Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.09-08-2011
20110218805SPOKEN TERM DETECTION APPARATUS, METHOD, PROGRAM, AND STORAGE MEDIUM - A spoken term detection apparatus in which processing performed by a processor includes: a feature extraction process extracting an acoustic feature from speech data accumulated in an accumulation part and storing the extracted acoustic feature in an acoustic feature storage part, a first calculation process calculating a standard score from a similarity between an acoustic feature stored in the acoustic feature storage part and an acoustic model stored in an acoustic model storage part, a second calculation process comparing an acoustic model corresponding to an input keyword with the acoustic feature stored in the acoustic feature storage part to calculate a score of the keyword, and a retrieval process retrieving speech data including the keyword from speech data accumulated in the accumulation part based on the score of the keyword calculated by the second calculation process and the standard score stored in a standard score storage part.09-08-2011
20100286984METHOD FOR SPEECH ROCOGNITION - A method for the voice recognition of a spoken expression to be recognized, comprising a plurality of expression parts that are to be recognized. Partial voice recognition takes place on a first selected expression part, and depending on a selection of hits for the first expression part detected by the partial voice recognition, voice recognition on the first and further expression parts is executed.11-11-2010
20130204622ENHANCED CONTEXT AWARENESS FOR SPEECH RECOGNITION - A method comprising establishing a call connection (08-08-2013
20100318357VOICE CONTROL OF MULTIMEDIA CONTENT - Techniques are described for managing various types of content in various ways, such as based on voice commands or other voice-based control instructions provided by a user. In some situations, at least some of the content being managed includes content of a variety of types, such as music and other audio information, photos, images, non-television video information, videogames, Internet Web pages and other data, etc., which may be managed via the voice controls in a variety of ways, such as to allow a user to locate and identify content of potential interest, to schedule recordings of selected content, to manage previously recorded content (e.g., to play or delete the content), to control live television, etc. This abstract is provided to comply with rules requiring it, and is submitted with the intention that it will not be used to interpret or limit the scope or meaning of the claims.12-16-2010
20100318356APPLICATION OF USER-SPECIFIED TRANSFORMATIONS TO AUTOMATIC SPEECH RECOGNITION RESULTS - Textual transcription of speech is generated and formatted according to user-specified transformation and behavior requirements for a speech recognition system having input grammars and transformations. An apparatus may include a speech recognition platform configured to receive a user-specified transformation requirement, recognize speech in speech data into recognized speech according to a set of recognition grammars; and apply transformations to the recognized speech according to the user-specified transformation requirement. The apparatus may further be configured to receive a user-specified behavior requirement and transform the recognized speech according to the behavior requirement. Other embodiments are described and claimed.12-16-2010
20120029919USING LINGUISTICALLY-AWARE VARIABLES IN COMPUTER-GENERATED TEXT - One embodiment of the present invention provides a system for placing linguistically-aware variables in computer-generated text. During operation, the system receives a sentence at a computer system, wherein the sentence comprises two or more words. Next, the system analyzes the sentence to identify a first variable, wherein the first variable is a place-holder for a first word. The system then receives the first word. After that, the system automatically determines a gender of the first word. Next, the system analyzes the sentence to identify a first dependent word that is dependent on the first word, wherein a spelling of the first dependent word is dependent on the gender of the first word. The system then determines the spelling of the first dependent word that corresponds to the gender of the first word. Next, the system replaces the first variable in the sentence with the first word. If necessary, the system modifies the spelling of the first dependent word in the sentence to match the gender of the first word. Finally, the system outputs the sentence.02-02-2012
20120035931 Automatically Monitoring for Voice Input Based on Context - In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation. 02-09-2012
20120072221 DISTRIBUTED VOICE USER INTERFACE - A distributed voice user interface system includes a local device which receives speech input issued from a user. Such speech input may specify a command or a request by the user. The local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself. If not, the local device initiates communication with a remote system for further processing of the speech input. 03-22-2012
20120072220 Matching text sets - Matching text sets is disclosed, including: extracting a text set from data associated with a current period; storing the text set with a plurality of text sets; extracting a keyword from the text set; determining a weight value associated with the keyword associated with the text set; determining a degree of similarity between the text set and another text set based at least in part on a weight value associated with the keyword associated with the text set and a weight value associated with a keyword associated with the other text set; and determining whether the text set is related to the other text set based at least in part on the determined degree of similarity. 03-22-2012
20120072219 SYSTEM AND METHOD FOR ENHANCING VOICE-ENABLED SEARCH BASED ON AUTOMATED DEMOGRAPHIC IDENTIFICATION - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information. 03-22-2012
