Specialized models

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

704231000 - Recognition

704251000 - Word recognition

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
704257000 | Natural language | 83
704256000 | Markov | 31
Entries
Document | Title | Date
20080215329 | Methods and Apparatus for Generating Dialog State Conditioned Language Models - Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user. | 09-04-2008
20100076765 | STRUCTURED MODELS OF REPITITION FOR SPEECH RECOGNITION - Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences (as recognized by one or more recognizers) and associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words. | 03-25-2010
20090271201 | STANDARD-MODEL GENERATION FOR SPEECH RECOGNITION USING A REFERENCE MODEL - A standard model creating apparatus which provides a high-precision standard model used for pattern recognition such as speech recognition, character recognition, or image recognition using a probability model based on a hidden Markov model, Bayesian theory, or linear discrimination analysis; intention interpretation using a probability model such as a Bayesian net; data-mining performed using a probability model; and so forth. The standard model creating apparatus includes a reference model preparing unit that prepares at least one reference model; a reference model storing unit that stores the reference model prepared by the reference model preparing unit; and a standard model creating unit that creates a standard model by calculating statistics of the standard model so as to maximize or locally maximize the probability or likelihood with respect to the reference model stored in the reference model storing unit. | 10-29-2009
20110046953 | METHOD OF RECOGNIZING SPEECH - A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same. | 02-24-2011
20090030691 | USING AN UNSTRUCTURED LANGUAGE MODEL ASSOCIATED WITH AN APPLICATION OF A MOBILE COMMUNICATION FACILITY - A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech is recorded by a user using a capture facility resident on the mobile communication facility. A speech recognition facility generates results of the recorded speech using an unstructured language model based at least in part on information relating to the recording. An application resident on the mobile communications facility is identified, wherein the resident application is capable of taking the results generated by the speech recognition facility as an input. The generated results are input to the application. | 01-29-2009
20080294441 | Speech Recognition System with Huge Vocabulary - The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method. | 11-27-2008
20090150155 | KEYWORD EXTRACTING DEVICE - The present invention aims at extracting a keyword of conversation without preparations by advanced anticipation of keywords of conversation. A keyword extracting device of the present invention includes an audio input section | 06-11-2009
20100292989 | SYMBOL INSERTION APPARATUS AND SYMBOL INSERTION METHOD - Enables symbol insertion evaluation in consideration of a difference in speaking style features between speakers. For a word sequence transcribing voice information, the symbol insertion likelihood calculation means | 11-18-2010
20100125458 | METHOD AND APPARATUS FOR ERROR CORRECTION IN SPEECH RECOGNITION APPLICATIONS - In one embodiment, the present invention is a method and apparatus for error correction in speech recognition applications. In one embodiment, a method for recognizing user speech includes receiving a first utterance from the user, receiving a subsequent utterance from the user, and combining acoustic evidence from the first utterance with acoustic evidence from the subsequent utterance in order to recognize the first utterance. It is assumed that, if the first utterance has been incorrectly recognized on a first attempt, the user will repeat the first utterance (or at least the incorrectly recognized portion of the first utterance) in the subsequent utterance. | 05-20-2010
20080262844 | Method and system for analyzing separated voice data of a telephonic communication to determine the gender of the communicant - A method and system for determining the gender of a communicant in a communication is provided. According to the method, at least one aural segment corresponding to at least one word spoken by a communicant is identified. The aural segment is then analyzed by applying a gender detection model to the aural segment, and gender detection data is generated based on the application of the gender detection model. | 10-23-2008
20100185447 | Markup language-based selection and utilization of recognizers for utterance processing - Embodiments are provided for selecting and utilizing multiple recognizers to process an utterance based on a markup language document. The markup language document and an utterance are received in a computing device. One or more recognizers are selected from among the multiple recognizers for returning a results set for the utterance based on markup language in the markup language document. The results set is received from the one or more selected recognizers in a format determined by a processing method specified in the markup language document. An event is then executed on the computing device in response to receiving the results set. | 07-22-2010
20100191531 | QUANTIZING FEATURE VECTORS IN DECISION-MAKING APPLICATIONS - A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions. | 07-29-2010
20100070278 | Method for Creating a Speech Model - A transformation can be derived which would represent that processing required to convert a male speech model to a female speech model. That transformation is subjected to a predetermined modification, and the modified transformation is applied to a female speech model to produce a synthetic children's speech model. The male and female models can be expressed in terms of a vector representing key values defining each speech model and the derived transformation can be in the form of a matrix that would transform the vector of the male model to the vector of the female model. The modification to the derived matrix comprises applying an exponential p which has a value greater than zero and less than 1. | 03-18-2010
20110099013 | SYSTEM AND METHOD FOR IMPROVING SPEECH RECOGNITION ACCURACY USING TEXTUAL CONTEXT - Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data. | 04-28-2011
20100305948 | Phoneme Model for Speech Recognition - A sub-phoneme model given acoustic data which corresponds to a phoneme. The acoustic data is generated by sampling an analog speech signal producing a sampled speech signal. The sampled speech signal is windowed and transformed into the frequency domain producing Mel frequency cepstral coefficients of the phoneme. The sub-phoneme model is used in a speech recognition system. The acoustic data of the phoneme is divided into either two or three sub-phonemes. A parameterized model of the sub-phonemes is built, where the model includes Gaussian parameters based on Gaussian mixtures and a length dependency according to a Poisson distribution. A probability score is calculated while adjusting the length dependency of the Poisson distribution. The probability score is a likelihood that the parameterized model represents the phoneme. The phoneme is subsequently recognized using the parameterized model. | 12-02-2010
20100004932 | SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION PROGRAM, AND SPEECH RECOGNITION METHOD - A speech recognition system includes the following: a feature calculating unit; a sound level calculating unit that calculates an input sound level in each frame; a decoding unit that matches the feature of each frame with an acoustic model and a linguistic model, and outputs a recognized word sequence; a start-point detector that determines a start frame of a speech section based on a reference value; an end-point detector that determines an end frame of the speech section based on a reference value; and a reference value updating unit that updates the reference value in accordance with variations in the input sound level. The start-point detector updates the start frame every time the reference value is updated. The decoding unit starts matching before being notified of the end frame and corrects the matching results every time it is notified of the start frame. The speech recognition system can suppress a delay in response time while performing speech recognition based on a proper speech section. | 01-07-2010
20080319750 | CONCEPT MONITORING IN SPOKEN-WORD AUDIO - Monitoring a spoken-word audio stream for a relevant concept is disclosed. A speech recognition engine may recognize a plurality of words from the audio stream. Function words that do not indicate content may be removed from the plurality of words. A concept may be determined from at least one word recognized from the audio stream. The concept may be determined via a morphological normalization of the plurality of words. The concept may be associated with a time related to when the at least one word was spoken. A relevance metric may be computed for the concept. Computing the relevance metric may include assessing the temporal frequency of the concept within the audio stream. The relevance metric for the concept may be based on respective confidence scores of the at least one word. The concept, time, and relevance metric may be displayed in a graphical display. | 12-25-2008
20100324901 | SPEECH RECOGNITION SYSTEM - Various methods and apparatus are described for a speech recognition system. In an embodiment, the statistical language model (SLM) provides probability estimates of how linguistically likely a sequence of linguistic items is to occur in that sequence, based on the number of times the sequence of linguistic items occurs in text and phrases in general use. The speech recognition decoder module requests from a correction module one or more corrected probability estimates P′(z|xy) of how likely a linguistic item z follows a given sequence of linguistic items x followed by y, where x, y, and z are three variable linguistic items supplied from the decoder module. The correction module is trained to linguistics of a specific domain, and is located in between the decoder module and the SLM in order to adapt the probability estimates supplied by the SLM to the specific domain when those probability estimates from the SLM significantly disagree with the linguistic probabilities in that domain. | 12-23-2010
20080312928 | Natural language speech recognition calculator - Disclosed herein is a computer implemented method and system for evaluating a mathematical expression spoken in a natural language by a user. The disclosed method and system provides a natural language speech recognition calculator comprising a speech recognition engine. The spoken mathematical expression is transmitted to the speech recognition engine via an audio input device. Mathematical entities of the spoken mathematical expression are extracted and represented in a hierarchical recursive format of a speech recognition grammar implemented by the speech recognition engine. A symbolic mathematical expression is generated from the extracted mathematical entities and then normalized with common measurement units. The normalized mathematical expression is then evaluated to generate a mathematical result. The mathematical result may be synthesized by a text-to-speech engine to produce a voice output. The mathematical result may be provided on an audio output device, a video display unit, a printer, and an electronic device in a network. | 12-18-2008
20120221337 | METHOD AND APPARATUS FOR PREDICTING WORD ACCURACY IN AUTOMATIC SPEECH RECOGNITION SYSTEMS - The invention comprises a method and apparatus for predicting word accuracy. Specifically, the method comprises obtaining an utterance in speech data where the utterance comprises an actual word string, processing the utterance for generating an interpretation of the actual word string, processing the utterance to identify at least one utterance frame, and predicting a word accuracy associated with the interpretation according to at least one stationary signal-to-noise ratio and at least one non-stationary signal-to-noise ratio, wherein the at least one stationary signal-to-noise ratio and the at least one non-stationary signal-to-noise ratio are determined according to a frame energy associated with each of the at least one utterance frame. | 08-30-2012
20120095766 | SPEECH RECOGNITION APPARATUS AND METHOD - A speech recognition apparatus is provided. The speech recognition apparatus includes a primary speech recognition unit configured to perform speech recognition on input speech and thus to generate word lattice information, a word string generation unit configured to generate one or more word strings based on the word lattice information, a language model score calculation unit configured to calculate bidirectional language model scores of the generated word strings selectively using forward and backward language models for each of words in each of the generated word strings, and a sentence output unit configured to output one or more of the generated word strings with high scores as results of the speech recognition of the input speech based on the calculated bidirectional language model scores. | 04-19-2012
20110137653 | SYSTEM AND METHOD FOR RESTRICTING LARGE LANGUAGE MODELS - Disclosed herein are systems, methods, and computer-readable storage media for performing speech recognition based on a masked language model. A system configured to practice the method receives a masked language model including a plurality of words, wherein a bit mask identifies whether each of the plurality of words is allowed or disallowed with regard to an adaptation subset, receives input speech, generates a speech recognition lattice based on the received input speech using the masked language model, removes from the generated lattice words identified as disallowed by the bit mask for the adaptation subset, and recognizes the received speech based on the lattice. Alternatively during the generation step, the system can only add words indicated as allowed by the bit mask. The bit mask can be separate from or incorporated as part of the masked language model. The system can dynamically update the adaptation subset and bit mask. | 06-09-2011
20080255844 | Minimizing empirical error training and adaptation of statistical language models and context free grammar in automatic speech recognition - Architecture for minimizing an empirical error rate by discriminative adaptation of a statistical language model in a dictation and/or dialog application. The architecture allows assignment of an improved weighting value to each term or phrase to reduce empirical error. Empirical errors are minimized whether a user provides correction results or not based on criteria for discriminatively adapting the user language model (LM)/context-free grammar (CFG) to the target. Moreover, algorithms are provided for the training and adaptation processes of LM/CFG parameters for criteria optimization. | 10-16-2008
20100318358 | RECOGNIZER WEIGHT LEARNING DEVICE, SPEECH RECOGNIZING DEVICE, AND SYSTEM - A speech recognition apparatus | 12-16-2010
