Patent application number | Description | Published |
20090138265 | Joint Discriminative Training of Multiple Speech Recognizers - Adjusting model parameters is described for a speech recognition system that combines recognition outputs from multiple speech recognition processes. Discriminative adjustments are made to model parameters of at least one acoustic model based on a joint discriminative criterion over multiple complementary acoustic models to lower recognition word error rate in the system. | 05-28-2009 |
20110035217 | SPEECH-DRIVEN SELECTION OF AN AUDIO FILE - A system and method for detecting a refrain in an audio file having vocal components. The method and system includes generating a phonetic transcription of a portion of the audio file, analyzing the phonetic transcription and identifying a vocal segment in the generated phonetic transcription that is repeated frequently. The method and system further relate to the speech-driven selection based on similarity of detected refrain and user input. | 02-10-2011 |
20120245940 | Guest Speaker Robust Adapted Speech Recognition - A method for speech recognition is implemented in the specific form of computer processes that function in a computer processor. That is, one or more computer processes: process a speech input to produce a sequence of representative speech vectors and perform multiple recognition passes to determine a recognition output corresponding to the speech input. At least one generic recognition pass is based on a generic speech recognition arrangement using generic modeling of a broad general class of input speech. And at least one adapted recognition pass is based on a speech adapted arrangement using pre-adapted modeling of a specific sub-class of the general class of input speech. | 09-27-2012 |
20120259627 | Efficient Exploitation of Model Complementariness by Low Confidence Re-Scoring in Automatic Speech Recognition - A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result. | 10-11-2012 |
20120259632 | Online Maximum-Likelihood Mean and Variance Normalization for Speech Recognition - A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search. | 10-11-2012 |
20150149174 | DIFFERENTIAL ACOUSTIC MODEL REPRESENTATION AND LINEAR TRANSFORM-BASED ADAPTATION FOR EFFICIENT USER PROFILE UPDATE TECHNIQUES IN AUTOMATIC SPEECH RECOGNITION - A computer-implemented method is described for speaker adaptation in automatic speech recognition. Speech recognition data from a particular speaker is used for adaptation of an initial speech recognition acoustic model to produce a speaker adapted acoustic model. A speaker dependent differential acoustic model is determined that represents differences between the initial speech recognition acoustic model and the speaker adapted acoustic model. In addition, an approach is also disclosed to estimate speaker-specific feature or model transforms over multiple sessions. This is achieved by updating the previously estimated transform using only adaptation statistics of the current session. | 05-28-2015 |
20150206527 | FEATURE NORMALIZATION INPUTS TO FRONT END PROCESSING FOR AUTOMATIC SPEECH RECOGNITION - A computer-implemented method is described for front end speech processing for automatic speech recognition. A sequence of speech features which characterize an unknown speech input is received with a computer process. A first subset of the speech features is normalized with a computer process using a first feature normalizing function. A second subset of the speech features is normalized with a computer process using a second feature normalizing function different from the first feature normalizing function. The normalized speech features in the first and second subsets are combined with a computer process to produce a sequence of mixed normalized speech features for automatic speech recognition. | 07-23-2015 |
20150262575 | META-DATA INPUTS TO FRONT END PROCESSING FOR AUTOMATIC SPEECH RECOGNITION - A computer-implemented method is described for front end speech processing for automatic speech recognition. A sequence of speech features which characterize an unknown speech input provided on an audio input channel and associated meta-data data which characterize the audio input channel are received. The speech features are transformed with a computer process that uses a trained mapping function controlled by the meta-data, and automatic speech recognition is performed of the transformed speech features. | 09-17-2015 |