Class / Patent application number | Description | Number of patent applications / Date published |
704256000 | Markov | 39 |
20080300879 | FACTORIAL HIDDEN MARKOV MODEL WITH DISCRETE OBSERVATIONS - A method for analyzing hidden dynamics, includes acquiring discrete observations, each discrete observation having an observed value selected from two or more allowed discrete values. A factorial hidden Markov model (FHMM) relating the discrete observations with a plurality of hidden dynamics is constructed. A contribution of the state of each hidden dynamic to the discrete observation may be represented in the FHMM as a parameter of a nominal distribution which is scaled by a function of the state of the hidden dynamic. States of the hidden dynamics are inferred from the discrete observations based on the FHMM. Information corresponding to at least one inferred state of at least one of the hidden dynamics is output. The parameters of the contribution of each dynamic to the hidden states may be learnt from a large number of observations. An example of a networked printing system is used to demonstrate the applicability of the method. | 12-04-2008 |
20080300880 | MULTI-LINGUAL OUTPUT DEVICE - This application discloses a multi-lingual output device for output of transactional information for a given customer. The device includes a database for determining what transaction information needs to be output, the local language in which the information is to be output, and the preferred language of the customer in which the information is to be output; and a local transaction subsystem in communication with said database, wherein said local transaction subsystem includes input device receiving means for accepting an input device and output generating means for generating a signal to an output device. | 12-04-2008 |
20100049519 | Recognizing the Numeric Language in Natural Spoken Dialogue - A system and a method are provided. A speech recognition processor receives unconstrained input speech and outputs a string of words. The speech recognition processor is based on a numeric language that represents a subset of a vocabulary. The subset includes a set of words identified as being for interpreting and understanding number strings. A numeric understanding processor contains classes of rules for converting the string of words into a sequence of digits. The speech recognition processor utilizes an acoustic model database. A validation database stores a set of valid sequences of digits. A string validation processor outputs validity information based on a comparison of a sequence of digits output by the numeric understanding processor with valid sequences of digits in the validation database. | 02-25-2010 |
20100217599 | Computer Implemented Method for Determining All Markov Boundaries and its Application for Discovering Multiple Maximally Accurate and Non-Redundant Predictive Models - Methods for discovery of a Markov boundary from data constitute one of the most important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Even though there is always a single Markov boundary of the response variable in faithful distributions, distributions with violations of the intersection property of probability theory may have multiple Markov boundaries. Such distributions are abundant in practical data-analytic applications, and there are several reasons why it is important to discover all Markov boundaries from such data. The present invention is a novel computer implemented generative method (termed TIE*) that can discover all Markov boundaries from a data sample drawn from a distribution. TIE* can be instantiated to discover all and only Markov boundaries independent of data distribution. TIE* has been tested with simulated and re-simulated data and then applied to (a) identify the set of maximally accurate and non-redundant molecular signatures and to (b) discover Markov boundaries in datasets from several application domains including but not limited to: biology, medicine, economics, ecology, digit recognition, text categorization, and computational biology. | 08-26-2010 |
20100312561 | Information Processing Apparatus, Information Processing Method, and Computer Program - An apparatus and a method for performing a grounding process using the POMDP are provided. The configuration is designed so that, in order to understand a request from a user through the utterances from the user, a grounding process is performed using the POMDP (Partially Observable Markov Decision Process) in which analysis information acquired from a language analyzing unit that receives the utterances of the user and performs language analysis and pragmatic information including task feasibility information acquired from the task manager that performs a task are set as observation information. Accordingly, understanding can be efficiently achieved, and high-speed and accurate recognition of the user request and task execution based on the user request can be provided. | 12-09-2010 |
20120041763 | RECOGNIZING THE NUMERIC LANGUAGE IN NATURAL SPOKEN DIALOGUE - A system and a method are provided. A speech recognition processor receives unconstrained input speech and outputs a string of words. The speech recognition processor is based on a numeric language that represents a subset of a vocabulary. The subset includes a set of words identified as being for interpreting and understanding number strings. A numeric understanding processor contains classes of rules for converting the string of words into a sequence of digits. The speech recognition processor utilizes an acoustic model database. A validation database stores a set of valid sequences of digits. A string validation processor outputs validity information based on a comparison of a sequence of digits output by the numeric understanding processor with valid sequences of digits in the validation database. | 02-16-2012 |
20120143610 | Sound Event Detecting Module and Method Thereof - A sound event detecting module for detecting whether a sound event with characteristic of repeating is generated. A sound end recognizing unit recognizes ends of sounds according to a sound signal to generate sound sections and multiple sets of feature vectors of the sound sections correspondingly. A storage unit stores at least M sets of feature vectors. A similarity comparing unit compares the at least M sets of feature vectors with each other, and correspondingly generates a similarity score matrix, which stores similarity scores of any two of the sound sections of the at least M of the sound sections. A correlation arbitrating unit determines the number of sound sections with high correlations to each other according to the similarity score matrix. When the number is greater than one threshold value, the correlation arbitrating unit indicates that the sound event with the characteristic of repeating is generated. | 06-07-2012 |
20140163988 | Recognizing the Numeric Language in Natural Spoken Dialogue - A system and a method are provided. A speech recognition processor receives unconstrained input speech and outputs a string of words. The speech recognition processor is based on a numeric language that represents a subset of a vocabulary. The subset includes a set of words identified as being for interpreting and understanding number strings. A numeric understanding processor contains classes of rules for converting the string of words into a sequence of digits. The speech recognition processor utilizes an acoustic model database. A validation database stores a set of valid sequences of digits. A string validation processor outputs validity information based on a comparison of a sequence of digits output by the numeric understanding processor with valid sequences of digits in the validation database. | 06-12-2014 |
20190147854 | Speech Recognition Source to Target Domain Adaptation | 05-16-2019 |
704256100 | Hidden Markov Model (HMM) (EPO) | 30 |
20080235020 | METHOD AND APPARATUS FOR TRAINING A TEXT INDEPENDENT SPEAKER RECOGNITION SYSTEM USING SPEECH DATA WITH TEXT LABELS - There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states. | 09-25-2008 |
20080288255 | System and method for quantifying, representing, and identifying similarities in data streams - A method of quantifying similarities between sequential data streams typically includes providing a pair of sequential data streams, designing a Hidden Markov Model (HMM) of at least a portion of each stream; and computing a quantitative measure of similarity between the streams using the HMMs. For a plurality of sequential data streams, a matrix of quantitative measures of similarity may be created. A spectral analysis may be performed on the matrix of quantitative measures of similarity to define a multi-dimensional diffusion space, and the plurality of sequential data streams may be graphically represented and/or sorted according to the similarities therebetween. In addition, semi-supervised and active learning algorithms may be utilized to learn a user's preferences for data streams and recommend additional data streams that are similar to those preferred by the user. Multi-task learning algorithms may also be applied. | 11-20-2008 |
20090144059 | HIGH PERFORMANCE HMM ADAPTATION WITH JOINT COMPENSATION OF ADDITIVE AND CONVOLUTIVE DISTORTIONS - A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance. | 06-04-2009 |
20090326946 | Frame Erasure Concealment Technique for a Bitstream-Based Feature Extractor - A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique. | 12-31-2009 |
20100070279 | PIECEWISE-BASED VARIABLE-PARAMETER HIDDEN MARKOV MODELS AND THE TRAINING THEREOF - A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech under many different conditions. Each Gaussian mixture component of the VPHMMs is characterized by a mean parameter μ and a variance parameter Σ. Each of these Gaussian parameters varies as a function of at least one environmental conditioning parameter, such as, but not limited to, instantaneous signal-to-noise-ratio (SNR). The way in which a Gaussian parameter varies with the environmental conditioning parameter(s) can be approximated as a piecewise function, such as a cubic spline function. Further, the recognition system formulates the mean parameter μ and the variance parameter Σ of each Gaussian mixture component in an efficient form that accommodates the use of discriminative training and parameter sharing. Parameter sharing is carried out so that the otherwise very large number of parameters in the VPHMMs can be effectively reduced with practically feasible amounts of training data. | 03-18-2010 |
20100145698 | Systems and Methods for Assessment of Non-Native Spontaneous Speech - Computer-implemented systems and methods are provided for assessing non-native spontaneous speech pronunciation. Speech recognition on digitized speech is performed using a non-native acoustic model trained with non-native speech to generate word hypotheses for the digitized speech. Time alignment is performed between the digitized speech and the word hypotheses using a reference acoustic model trained with native-quality speech. Statistics are calculated regarding individual words and phonemes in the word hypotheses based on the alignment. A plurality of features for use in assessing pronunciation of the speech are calculated based on the statistics, an assessment score is calculated based on one or more of the calculated features, and the assessment score is stored in a computer-readable memory. | 06-10-2010 |
20100185448 | DEALING WITH SWITCH LATENCY IN SPEECH RECOGNITION - In embodiments of the present invention improved capabilities are described for interacting with a mobile communication facility comprising receiving a switch activation from a user to initiate a speech recognition recording session, wherein the speech recognition recording session comprises a voice command from the user followed by the speech to be recognized from the user; recording the speech recognition recording session using a mobile communication facility resident capture facility; recognizing at least a portion of the voice command as an indication that user speech for recognition will begin following the end of the at least a portion of the voice command; recognizing the recorded speech using a speech recognition facility to produce an external output; and using the selected output to perform a function on the mobile communication facility. | 07-22-2010 |
20110202343 | CONCISE DYNAMIC GRAMMARS USING N-BEST SELECTION - A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar. | 08-18-2011 |
20110257976 | Robust Speech Recognition - Speech recognition includes structured modeling, irrelevant variability normalization and unsupervised online adaptation of speech recognition parameters. | 10-20-2011 |
20110288869 | ROBUSTNESS TO ENVIRONMENTAL CHANGES OF A CONTEXT DEPENDENT SPEECH RECOGNIZER - An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application, that includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies. | 11-24-2011 |
20120041764 | SPEECH PROCESSING SYSTEM AND METHOD - A speech processing method, comprising: | 02-16-2012 |
20120059657 | Radar Microphone Speech Recognition - A method for detecting and recognizing speech is provided that remotely detects body motions from a speaker during vocalization with one or more radar sensors. Specifically, the radar sensors include a transmit aperture that transmits one or more waveforms towards the speaker, and each of the waveforms has a distinct wavelength. A receiver aperture is configured to receive the scattered radio frequency energy from the speaker. Doppler signals correlated with the speaker vocalization are extracted with a receiver. Digital signal processors are configured to develop feature vectors utilizing the vocalization Doppler signals, and words associated with the feature vectors are recognized with a word classifier. | 03-08-2012 |
20120065976 | DEEP BELIEF NETWORK FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION - A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units with connecting weights between layers trained by a pretraining step followed by a fine-tuning step. | 03-15-2012 |
20120130716 | SPEECH RECOGNITION METHOD FOR ROBOT - A speech recognition method for a robot. The speech recognition method for the robot includes one fundamental acoustic model. Whenever the noisy environment and the speaker are changed, the speech recognition method generates a plurality of parallel acoustic models in which the characteristic for each noisy environment and the characteristic for each speaker are reflected. As a result, the speech recognition method for the robot can freely recognize one of several acoustic models according to individual environments and speakers, such that it can basically remove mismatch between the model training environment and the test environment, thereby improving speech recognition capabilities. | 05-24-2012 |
20120330664 | METHOD AND APPARATUS FOR COMPUTING GAUSSIAN LIKELIHOODS - The present invention relates to a method and apparatus for computing Gaussian likelihoods. One embodiment of a method for processing a speech sample includes generating a feature vector for each frame of the speech signal, evaluating the feature vector in accordance with a hierarchical Gaussian shortlist, and producing a hypothesis regarding a content of the speech signal, based on the evaluating. | 12-27-2012 |
20130132085 | Systems and Methods for Non-Negative Hidden Markov Modeling of Signals - Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications. | 05-23-2013 |
20130151254 | SPEECH RECOGNITION USING SPEECH CHARACTERISTIC PROBABILITIES - A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities. | 06-13-2013 |
20140180693 | Histogram Based Pre-Pruning Scheme for Active HMMS - Embodiments of the present invention include an acoustic processing device, a method for acoustic signal processing, and a speech recognition system. The speech processing device can include a processing unit, a histogram pruning unit, and a pre-pruning unit. The processing unit is configured to calculate one or more Hidden Markov Model (HMM) pruning thresholds. The histogram pruning unit is configured to prune one or more HMM states to generate one or more active HMM states. The pruning is based on the one or more pruning thresholds. The pre-pruning unit is configured to prune the one or more active HMM states based on an adjustable pre-pruning threshold. Further, the adjustable pre-pruning threshold is based on the one or more pruning thresholds. | 06-26-2014 |
20140180694 | Phoneme Score Accelerator - Embodiments of the present invention include an acoustic processing device and a method for traversing a Hidden Markov Model (HMM). The acoustic processing device can include a senone scoring unit (SSU), a memory device, a HMM module, and an interface module. The SSU is configured to receive feature vectors from an external computing device and to calculate senones. The memory device is configured to store the senone scores and HMM information, where the HMM information includes HMM IDs and HMM state scores. The HMM module is configured to traverse the HMM based on the senone scores and the HMM information. Further, the interface module is configured to transfer one or more HMM scoring requests from the external computing device to the HMM module and to transfer the HMM state scores to the external computing device. | 06-26-2014 |
20140365221 | METHOD AND APPARATUS FOR SPEECH RECOGNITION - A computer-implemented method performed by a computerized device, a computerized apparatus and a computer program product for recognizing speech, the method comprising: receiving a signal; extracting audio features from the signal; performing acoustic level processing on the audio features; receiving additional data; extracting additional features from the additional data; fusing the audio features and the additional features into a unified structure; receiving a Hidden Markov Model (HMM); and performing a quantum search over the features using the HMM and the unified structure. | 12-11-2014 |
704256200 | Training of HMM (EPO) | 4 |
20090055182 | Discriminative Training of Hidden Markov Models for Continuous Speech Recognition - Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. For a mixture component of a hidden Markov model state, a gradient adjustment is calculated of the standard deviation of the mixture component. If the calculated gradient adjustment is greater than a first threshold amount, an adjustment is performed of the standard deviation of the mixture component using the first threshold. If the calculated gradient adjustment is less than a second threshold amount, an adjustment is performed of the standard deviation of the mixture component using the second threshold. Otherwise, an adjustment is performed of the standard deviation of the mixture component using the calculated gradient adjustment. | 02-26-2009 |
20090112595 | DISCRIMINATIVE TRAINING OF MULTI-STATE BARGE-IN MODELS FOR SPEECH PROCESSING - Disclosed are systems and methods for training a barge-in model for speech processing in a spoken dialogue system comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by only allowing speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in model for speech processing. | 04-30-2009 |
20100070280 | PARAMETER CLUSTERING AND SHARING FOR VARIABLE-PARAMETER HIDDEN MARKOV MODELS - A speech recognition system uses Gaussian mixture variable-parameter hidden Markov models (VPHMMs) to recognize speech. The VPHMMs include Gaussian parameters that vary as a function of at least one environmental conditioning parameter. The relationship of each Gaussian parameter to the environmental conditioning parameter(s) is modeled using a piecewise fitting approach, such as by using spline functions. In a training phase, the recognition system can use clustering to identify classes of spline functions, each class grouping together spline functions which are similar to each other based on some distance measure. The recognition system can then store sets of spline parameters that represent respective classes of spline functions. An instance of a spline function that belongs to a class can make reference to an associated shared set of spline parameters. The Gaussian parameters can be represented in an efficient form that accommodates the use of sharing in the above-summarized manner. | 03-18-2010 |
20110010176 | HMM LEARNING DEVICE AND METHOD, PROGRAM, AND RECORDING MEDIUM - An HMM (Hidden Markov Model) learning device includes: a learning unit for learning a state transition probability as the function of actions that an agent can execute, with learning with HMM performed based on actions that the agent has executed, and time series information made up of an observation signal; and a storage unit for storing learning results by the learning unit as internal model data including a state-transition probability table and an observation probability table; with the learning unit calculating frequency variables used for estimation calculation of HMM state-transition and HMM observation probabilities; with the storage unit holding the frequency variables corresponding to each of state-transition probabilities and each of observation probabilities respectively, of the state-transition probability table; and with the learning unit using the frequency variables held by the storage unit to perform learning, and estimating the state-transition probability and the observation probability based on the frequency variables. | 01-13-2011 |
704256400 | Duration modeling in HMM, e.g., semi HMM, segmental models, transition probabilities (EPO) | 3 |
20080243506 | SPEECH RECOGNITION APPARATUS AND METHOD AND PROGRAM THEREFOR - A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and select a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability. | 10-02-2008 |
20150371631 | CACHING SPEECH RECOGNITION SCORES - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for caching speech recognition scores. In some implementations, one or more values comprising data about an utterance are received. An index value is determined for the one or more values. An acoustic model score for the one or more received values is selected, from a cache of acoustic model scores that were computed before receiving the one or more values, based on the index value. A transcription for the utterance is determined using the selected acoustic model score. | 12-24-2015 |
20160155440 | GENERATION DEVICE, RECOGNITION DEVICE, GENERATION METHOD, AND COMPUTER PROGRAM PRODUCT | 06-02-2016 |
704256500 | Hidden Markov (HM) network (EPO) | 2 |
20090055183 | System and Method for Text Tagging and Segmentation Using a Generative/Discriminative Hybrid Hidden Markov Model - A method for sequence tagging medical patient records includes providing a labeled corpus of sentences taken from a set of medical records, initializing generative parameters θ and discriminative parameters θ̃, providing a functional LL − C × Penalty, where LL is a log-likelihood function | 02-26-2009 |
20140337031 | METHOD AND APPARATUS FOR DETECTING A TARGET KEYWORD - A method of detecting a target keyword for activating a function in an electronic device is disclosed. The method includes receiving an input sound starting from one of the plurality of portions of the target keyword. The input sound may be periodically received based on a duty cycle. The method extracts a plurality of sound features from the input sound, and obtains state information on a plurality of states associated with the portions of the target keyword. Based on the extracted sound features and the state information, the input sound may be detected as the target keyword. The plurality of states includes a predetermined number of entry states indicative of a predetermined number of the plurality of portions. | 11-13-2014 |
704256600 | State emission probability (EPO) | 1 |
704256700 | Continuous density, e.g., Gaussian distribution, Laplace (EPO) | 1 |
20100191532 | Model-based comparative measure for vector sequences and word spotting using same - An object comparison method comprises: generating a first ordered vector sequence representation of a first object; generating a second ordered vector sequence representation of a second object; representing the first object by a first ordered sequence of model parameters generated by modeling the first ordered vector sequence representation using a semi-continuous hidden Markov model employing a universal basis; representing the second object by a second ordered sequence of model parameters generated by modeling the second ordered vector sequence representation using a semi-continuous hidden Markov model employing the universal basis; and comparing the first and second ordered sequences of model parameters to generate a quantitative comparison measure. | 07-29-2010 |
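
As an illustration of one entry in this listing, the threshold rule summarized in the abstract of application 20090055182 (Discriminative Training of Hidden Markov Models for Continuous Speech Recognition) can be sketched in a few lines of Python. This is a minimal sketch, not the applicants' implementation. The function names, the threshold values, and the choice to add the limited adjustment to the standard deviation are all assumptions made for the example; the abstract only states which of the three values (first threshold, second threshold, or calculated adjustment) is used.

```python
def limit_adjustment(gradient_adjustment, first_threshold, second_threshold):
    """Pick the adjustment value according to the three cases in the abstract.

    first_threshold acts as an upper limit and second_threshold as a lower
    limit (hypothetical names; the abstract does not name them this way).
    """
    if gradient_adjustment > first_threshold:
        return first_threshold       # too large: use the first threshold
    if gradient_adjustment < second_threshold:
        return second_threshold      # too small: use the second threshold
    return gradient_adjustment       # otherwise: use the calculated value


def adjust_std(std, gradient_adjustment,
               first_threshold=0.25, second_threshold=-0.25):
    """Apply the limited adjustment to a mixture component's standard deviation.

    Assumption: the adjustment is additive. The abstract does not say how
    the adjustment is applied, and the default thresholds are made up.
    """
    return std + limit_adjustment(
        gradient_adjustment, first_threshold, second_threshold
    )


if __name__ == "__main__":
    for g in (0.5, -0.5, 0.125):
        print(g, "->", adjust_std(1.0, g))
```

Limiting the step in this way keeps a single large gradient from moving the standard deviation by more than the chosen bounds in one update.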