
Creating patterns for matching

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

704231000 - Recognition

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
704244000 | Update patterns | 37
704245000 | Clustering | 10
Entries
Document | Title | Date
20100161330 | SPEECH MODELS GENERATED USING COMPETITIVE TRAINING, ASYMMETRIC TRAINING, AND DATA BOOSTING | 06-24-2010
Speech models are trained using one or more of three different training systems. They include competitive training, which reduces a distance between a recognized result and a true result; data boosting, which divides and weights training data; and asymmetric training, which trains different model components differently.
20130211835 | SPEECH RECOGNITION CIRCUIT AND METHOD | 08-15-2013
A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to said scores.
20120245939 | METHOD AND SYSTEM FOR CONSIDERING INFORMATION ABOUT AN EXPECTED RESPONSE WHEN PERFORMING SPEECH RECOGNITION | 09-27-2012
A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system according to the invention has several embodiments including: comparing the observed speech features to the models of the expected response separately from the usual hypothesis search in order to speed up the recognition system; modifying the usual hypothesis search to emphasize the expected response; updating and adapting the models when the recognized speech matches the expected response to improve the accuracy of the recognition system.
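The second embodiment in this abstract, emphasizing an expected response during the hypothesis search, can be illustrated with a minimal sketch. All names, the n-best list representation, and the fixed additive score boost are assumptions for illustration, not the patented implementation:

```python
# Illustrative sketch: bias an n-best hypothesis list toward an expected
# response. The fixed additive boost is an assumption, not from the patent.

def bias_toward_expected(hypotheses, expected, boost=5.0):
    """hypotheses: list of (text, score) with higher scores being better.
    Any hypothesis equal to the expected response gets a score boost,
    and the list is then re-ranked."""
    return sorted(
        ((text, score + (boost if text == expected else 0.0))
         for text, score in hypotheses),
        key=lambda h: h[1],
        reverse=True,
    )

hyps = [("one too free", 11.0), ("one two three", 10.0)]
ranked = bias_toward_expected(hyps, "one two three")
```

With the boost applied, the expected reading overtakes the acoustically better-scoring competitor.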
20130085756 | System and Method of Semi-Supervised Learning for Spoken Language Understanding Using Semantic Role Labeling | 04-04-2013
A system and method are disclosed for providing semi-supervised learning for a spoken language understanding module using semantic role labeling. The method embodiment relates to a method of generating a spoken language understanding module. Steps in the method comprise selecting at least one predicate/argument pair as an intent from a set of the most frequent predicate/argument pairs for a domain, labeling training data using mapping rules associated with the selected at least one predicate/argument pair, training a call-type classification model using the labeled training data, re-labeling the training data using the call-type classification model, and iteratively repeating several of the above steps until the training set labels converge.
20090157401 | Semantic Decoding of User Queries | 06-18-2009
An intelligent query system for processing voice-based queries is disclosed, which uses semantic-based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries.
20100106501 | Updating a Voice Template | 04-29-2010
Updating a voice template for recognizing a speaker on the basis of a voice uttered by the speaker is disclosed. Stored voice templates indicate distinctive characteristics of utterances from speakers. Distinctive characteristics are extracted for a specific speaker based on a voice message utterance received from that speaker. The distinctive characteristics are compared to the characteristics indicated by the stored voice templates to select a template that matches within a predetermined threshold. The selected template is updated on the basis of the extracted characteristics.
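The match-then-update loop this abstract describes can be sketched as follows. The cosine similarity measure, the threshold, and the update rate are illustrative assumptions rather than details from the application:

```python
# Sketch: select the stored template closest to newly extracted features,
# and if it matches within a threshold, nudge it toward those features.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def update_template(templates, features, threshold=0.9, rate=0.1):
    """templates: dict speaker-id -> feature vector. Returns the matched
    speaker id (and updates its template in place), or None."""
    best_id, best_sim = max(
        ((tid, cosine(tpl, features)) for tid, tpl in templates.items()),
        key=lambda t: t[1])
    if best_sim >= threshold:
        templates[best_id] = [
            (1 - rate) * t + rate * f
            for t, f in zip(templates[best_id], features)]
        return best_id
    return None

templates = {"alice": [1.0, 0.0, 0.5], "bob": [0.0, 1.0, 0.2]}
matched = update_template(templates, [0.9, 0.1, 0.5])
```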
20130090926 | MOBILE DEVICE CONTEXT INFORMATION USING SPEECH DETECTION | 04-11-2013
Systems and methods for speech detection in association with a mobile device are described herein. A method described herein for identifying presence of speech associated with a mobile device includes obtaining a plurality of audio samples from the mobile device while the mobile device operates in a mode distinct from a voice call operating mode, generating spectrogram data from the plurality of audio samples, and determining whether the plurality of audio samples include information indicative of speech by classifying the spectrogram data.
20090030688 | TAGGING SPEECH RECOGNITION RESULTS BASED ON AN UNSTRUCTURED LANGUAGE MODEL FOR USE IN A MOBILE COMMUNICATION FACILITY APPLICATION | 01-29-2009
Entering information into a software application resident on a mobile communication facility comprises recording speech presented by a user using a mobile communication facility resident capture facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, tagging the results with information about the words in the results, transmitting the results and tags to the mobile communications facility, and loading the results and tags into the software application.
20090030687 | ADAPTING AN UNSTRUCTURED LANGUAGE MODEL SPEECH RECOGNITION SYSTEM BASED ON USAGE | 01-29-2009
A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech that is recorded by a user using a mobile communication facility resident capture facility is transmitted through a wireless communication facility to a speech recognition facility. The speech recognition facility generates results using an unstructured language model based at least in part on information relating to the recording. The results are transmitted to the mobile communications facility, where an action is performed on the mobile communication facility based on the results, and the speech recognition facility is adapted based on usage.
20120116762 | SYSTEM AND METHOD FOR COMMUNICATION TERMINAL SURVEILLANCE BASED ON SPEAKER RECOGNITION | 05-10-2012
A Candidate Isolation System (CIS) detects subscribers of phone call services as candidates to be surveillance targets. A Voice Matching System (VMS) then decides whether or not a given candidate Communication Terminal (CT) should be tracked by determining, using speaker recognition techniques, whether the subscriber operating the candidate CT is a known target subscriber. The CIS receives from the network call event data that relate to CTs in the network. The CIS detects candidate CTs using a unique candidate isolation process, which applies predefined selection criteria to the received call event data.
20130166295 | METHOD AND APPARATUS FOR SPEAKER-CALIBRATED SPEAKER DETECTION | 06-27-2013
The present invention relates to a method and apparatus for speaker-calibrated speaker detection. One embodiment of a method for generating a speaker model for use in detecting a speaker of interest includes identifying one or more speech features that best distinguish the speaker of interest from a plurality of impostor speakers and then incorporating the speech features in the speaker model.
20130166296 | METHOD AND APPARATUS FOR GENERATING SPEAKER-SPECIFIC SPOKEN PASSWORDS | 06-27-2013
The present invention relates to a method and apparatus for generating speaker-specific spoken passwords. One embodiment of a method for generating a spoken password for use by a speaker of interest includes identifying one or more speech features that best distinguish the speaker of interest from a plurality of impostor speakers and incorporating the speech features in the spoken password.
20130166297 | Discriminative Training of Document Transcription System | 06-27-2013
A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
20110301953 | SYSTEM AND METHOD OF MULTI MODEL ADAPTATION AND VOICE RECOGNITION | 12-08-2011
Provided is a voice recognition system that adapts and stores a speaker's voice, for each feature, to each of a basic voice model and new independent multi-models, and that provides stable real-time voice recognition through recognition using a multi-adaptive model.
20110295602 | APPARATUS AND METHOD FOR MODEL ADAPTATION FOR SPOKEN LANGUAGE UNDERSTANDING | 12-01-2011
An apparatus and a method are provided for building a spoken language understanding model. Labeled data may be obtained for a target application. A new classification model may be formed for use with the target application by using the labeled data for adaptation of an existing classification model. In some implementations, the existing classification model may be used to determine the most informative examples to label.
20100268536 | SYSTEM AND METHOD FOR IMPROVING PERFORMANCE OF SEMANTIC CLASSIFIERS IN SPOKEN DIALOG SYSTEMS | 10-21-2010
A method and apparatus for continuously improving the performance of semantic classifiers in the scope of spoken dialog systems are disclosed. Rule-based or statistical classifiers are replaced with better performing rule-based or statistical classifiers and/or certain parameters of existing classifiers are modified. The replacement classifiers or new parameters are trained and tested on a collection of transcriptions and annotations of utterances which are generated manually or in a partially automated fashion. Automated quality assurance leads to more accurate training and testing data, higher classification performance, and feedback into the design of the spoken dialog system by suggesting changes to improve system behavior.
20100292988 | SYSTEM AND METHOD FOR SPEECH RECOGNITION | 11-18-2010
A speech recognition system samples speech signals having the same meaning, and obtains frequency spectrum images of the speech signals. Training objects are obtained by modifying the frequency spectrum images to be the same width. The speech recognition system obtains specific data of the speech signals by analyzing the training objects. The specific data is linked with the meaning of the speech signals. The specific data may include probability values representing probabilities that the training objects appear at different points in an image area of the training objects. A speech command may be sampled, and a frequency spectrum image of the speech command is modified to be the same width as the training objects. The speech recognition system can determine a meaning of a speech command by determining a matching degree of the modified frequency spectrum image of the speech command and the specific data of the speech signals.
20090319269 | Method of Trainable Speaker Diarization | 12-24-2009
A novel and useful method of using labeled training data and machine learning tools to train a speaker diarization system. Intra-speaker variability profiles are created from training data consisting of an audio stream labeled where speaker changes occur (i.e. which participant is speaking at any given time). These intra-speaker variability profiles are then applied to an unlabeled audio stream to segment the audio stream into speaker homogeneous segments and to cluster segments according to speaker identity.
20090037172 | Method for generating a vector codebook, method and device for compressing data, and distributed speech recognition system | 02-05-2009
A method for compressing data, the data being represented by an input vector having Q features, wherein Q is an integer higher than 1, including the steps of 1) providing a vector codebook of sub-sets of indexed Q-feature reference vectors and threshold values associated with the sub-sets for a prefixed feature; 2) identifying a sub-set of reference vectors among the sub-sets by progressively comparing the value of a feature of the input vector which corresponds to the prefixed feature, with the threshold values associated with the sub-sets; and 3) identifying the reference vector which, within the sub-set identified in step 2), provides the lowest distortion with respect to the input vector.
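The three numbered steps in this abstract amount to a coarse-to-fine codebook search: a threshold scan on one designated feature narrows the search to a sub-set, then the nearest reference vector within that sub-set is returned. A hypothetical sketch, where the data layout and the squared-Euclidean distortion measure are assumptions:

```python
# Sketch of a two-stage codebook lookup. codebook is a list of
# (threshold, subset) pairs sorted by ascending threshold; each subset
# is a list of (index, reference_vector). Illustrative only.

def quantize(codebook, input_vec, feature_idx=0):
    """Pick the first sub-set whose threshold covers the input's
    prefixed feature, then return the index of the reference vector
    with the lowest distortion (squared Euclidean distance)."""
    value = input_vec[feature_idx]
    subset = codebook[-1][1]          # fall back to the last sub-set
    for threshold, candidates in codebook:
        if value <= threshold:
            subset = candidates
            break

    def dist2(v):
        return sum((a - b) ** 2 for a, b in zip(v, input_vec))

    best_idx, _ = min(((i, dist2(v)) for i, v in subset),
                      key=lambda t: t[1])
    return best_idx

codebook = [
    (0.5, [(0, [0.1, 0.2]), (1, [0.4, 0.9])]),
    (1.0, [(2, [0.7, 0.1]), (3, [0.9, 0.8])]),
]
idx = quantize(codebook, [0.8, 0.75])
```

The threshold scan avoids computing distortions against the whole codebook, which is the point of the sub-set structure.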
20080249773 | METHOD AND SYSTEM FOR THE AUTOMATIC GENERATION OF SPEECH FEATURES FOR SCORING HIGH ENTROPY SPEECH | 10-09-2008
A method and system for automatically generating a scoring model for scoring a speech sample are disclosed. One or more training speech samples are received in response to a prompt. One or more speech features are determined for each of the training speech samples. A scoring model is then generated based on the speech features. At least one of the training speech samples may be a high entropy speech sample. An evaluation speech sample is received and a score is assigned to the evaluation speech sample using the scoring model. The evaluation speech sample may be a high entropy speech sample.
20120046946 | SYSTEM AND METHOD FOR MERGING AUDIO DATA STREAMS FOR USE IN SPEECH RECOGNITION APPLICATIONS | 02-23-2012
A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
20080281593 | Apparatus for Reducing Spurious Insertions in Speech Recognition | 11-13-2008
Techniques for improving an automatic baseform generation system. More particularly, the invention provides techniques for reducing insertion of spurious speech events in a word or phone sequence generated by an automatic baseform generation system. Such automatic baseform generation techniques may be accomplished by enhancing the scores of long-lasting speech events with respect to the scores of short-lasting events. For example, this may be achieved by merging competing candidates that relate to the same speech event (e.g., phone or word) and that overlap in time into a single candidate, the score of which may be equal to the sum of the scores of the merged candidates.
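The merging step described here, collapsing time-overlapping candidates for the same event into one candidate whose score is the sum of the merged scores, can be sketched in a few lines. The candidate representation and the sort-then-sweep approach are illustrative assumptions:

```python
# Sketch: merge overlapping candidates for the same speech event.
# A candidate is (label, start_time, end_time, score); illustrative only.

def merge_candidates(candidates):
    """Candidates with the same label that overlap in time are merged
    into one candidate spanning their union, with the summed score
    (so long-lasting events accumulate higher scores)."""
    merged = []
    for label, start, end, score in sorted(candidates,
                                           key=lambda c: (c[0], c[1])):
        if merged:
            mlabel, mstart, mend, mscore = merged[-1]
            if label == mlabel and start <= mend:   # same event, overlapping
                merged[-1] = (mlabel, mstart, max(mend, end), mscore + score)
                continue
        merged.append((label, start, end, score))
    return merged

cands = [("AH", 0.0, 0.4, 1.0), ("AH", 0.3, 0.6, 0.8), ("T", 0.6, 0.7, 0.5)]
out = merge_candidates(cands)
```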
20090150148 | VOICE RECOGNITION APPARATUS AND MEMORY PRODUCT | 06-11-2009
A voice recognition apparatus performs a recognition process, by pronunciation unit such as a syllable, on voice data based on voice produced by a speaker, and further performs recognition by a method such as Word Spotting that matches against the phrases stored in a phrase database; in doing so it can reduce false recognition caused by matching against phrases composed of a small number of syllables. The voice recognition apparatus performs a recognition process that compares the result of the recognition process by pronunciation unit with extended phrases obtained by adding an additional phrase before and/or behind the respective phrases.
20100138222 | Method for Adapting a Codebook for Speech Recognition | 06-03-2010
A method for adapting a codebook for speech recognition is disclosed, wherein the codebook is from a set of codebooks comprising a speaker-independent codebook and at least one speaker-dependent codebook. A speech input is received and a feature vector based on the received speech input is determined. For each of the Gaussian densities, a first mean vector is estimated using an expectation process and taking into account the determined feature vector. For each of the Gaussian densities, a second mean vector using an Eigenvoice adaptation is determined taking into account the determined feature vector. For each of the Gaussian densities, the mean vector is set to a convex combination of the first and the second mean vector. Thus, this process allows for adaptation during operation and does not require a lengthy training phase.
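The per-density combination step reduces to a single interpolation. A minimal sketch, where the interpolation weight alpha is an illustrative assumption (the application does not fix its value here):

```python
# Sketch: convex combination of two mean-vector estimates per Gaussian
# density. alpha in [0, 1] weights the expectation-step estimate against
# the Eigenvoice estimate; its value here is illustrative.

def combine_means(em_mean, eigenvoice_mean, alpha=0.7):
    """Return alpha * em_mean + (1 - alpha) * eigenvoice_mean,
    element-wise."""
    return [alpha * a + (1 - alpha) * b
            for a, b in zip(em_mean, eigenvoice_mean)]

adapted = combine_means([1.0, 2.0], [3.0, 4.0], alpha=0.5)
```

Because the combination is convex, the adapted mean always lies between the two estimates, which keeps online adaptation stable.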
20090132249 | MODIFYING METHOD FOR SPEECH MODEL AND MODIFYING MODULE THEREOF | 05-21-2009
A modifying method for a speech model and a modifying module thereof are provided. The modifying method is as follows. First, a correct sequence of a speech is generated according to a correct sequence generating method and the speech model. Next, a candidate sequence generating method is selected from a plurality of candidate sequence generating methods, and a candidate sequence of the speech is generated according to the selected candidate sequence generating method and the speech model. Finally, the speech model is modified according to the correct sequence and the candidate sequence. Therefore, the present invention increases a discrimination of the speech model.
20090138263 | Data Process unit and data process unit control program | 05-28-2009
To provide a data process unit and data process unit control program which are suitable for generating acoustic models for unspecified speakers, taking the distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment, and which are suitable for providing acoustic models intended for unspecified speakers and adapted to the speech of a specific person.
20090006091 | APPARATUSES AND METHODS FOR HANDLING RECORDED VOICE STRINGS | 01-01-2009
An apparatus, method, and computer program product for facilitating the identification and manipulation of recorded voice strings is provided. The apparatus includes a processor for receiving a voice string that has been recorded. The processor automatically assigns the recorded voice string a name that is indicative of the content of the voice string or of a characteristic of the voice string but which may include other information regarding the voice string. Thus, the voice string may be assigned a name that provides the user with an idea of the content or circumstance of the voice string when it was recorded without requiring the user to input a name for the recorded voice string. In this way, the user may be able to access the recorded voice string more easily. The apparatus may also include a microphone, memory element, and/or a display for presenting a list of recorded voice strings.
20110144991 | Compressing Feature Space Transforms | 06-16-2011
Methods for compressing a transform associated with a feature space are presented. For example, a method for compressing a transform associated with a feature space includes obtaining the transform including a plurality of transform parameters, assigning each of a plurality of quantization levels for the plurality of transform parameters to one of a plurality of quantization values, and assigning each of the plurality of transform parameters to one of the plurality of quantization values to which one of the plurality of quantization levels is assigned. One or more of obtaining the transform, assigning of each of the plurality of quantization levels, and assigning of each of the transform parameters are implemented as instruction code executed on a processor device. Further, a Viterbi algorithm may be employed for use in non-uniform level/value assignments.
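A toy version of the level/value assignment can make the two-step mapping concrete: parameters are binned into quantization levels, and each level is assigned a quantization value. This sketch uses uniform levels with midpoint values; the non-uniform, Viterbi-based assignment the abstract mentions is not attempted here:

```python
# Sketch: compress transform parameters by mapping each one to the
# quantization value of its level. Uniform bins and midpoint values are
# simplifying assumptions; parameters are assumed to span a nonzero range.

def quantize_params(params, n_levels):
    """Bin each parameter into one of n_levels uniform levels over the
    parameter range, and replace it with that level's midpoint value."""
    lo, hi = min(params), max(params)
    step = (hi - lo) / n_levels
    # level -> value assignment: the midpoint of each bin
    values = [lo + (i + 0.5) * step for i in range(n_levels)]

    def level_of(p):
        return min(int((p - lo) / step), n_levels - 1)

    # parameter -> value assignment, via its level
    return [values[level_of(p)] for p in params]

compressed = quantize_params([0.0, 0.26, 0.51, 1.0], 4)
```

Only the level table and per-parameter level indices would need to be stored, which is where the compression comes from.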
20110144993 | Disfluent-utterance tracking system and method | 06-16-2011
A disfluent-utterance tracking system includes a speech transducer; one or more targeted-disfluent-utterance records stored in a memory; a real-time speech recording mechanism operatively connected with the speech transducer for recording a real-time utterance; and an analyzer operatively coupled with the targeted-disfluent-utterance record and with the real-time speech recording mechanism, the analyzer configured to compare one or more real-time snippets of the recorded speech with the targeted-disfluent-utterance record to determine and indicate to a user a level of correlation therebetween.
20110231191 | Weight Coefficient Generation Device, Voice Recognition Device, Navigation Device, Vehicle, Weight Coefficient Generation Method, and Weight Coefficient Generation Program | 09-22-2011
A weight coefficient generation device, a speech recognition device, a navigation system, a vehicle, a weight coefficient generation method, and a weight coefficient generation program are provided for the purpose of improving speech recognition performance for place names. In order to address the above purpose, an address database
20100153109 | METHOD AND APPARATUS FOR SPEECH SEGMENTATION | 06-17-2010
Machine-readable media, methods, apparatus and system for speech segmentation are described. In some embodiments, a fuzzy rule may be determined to discriminate a speech segment from a non-speech segment. An antecedent of the fuzzy rule may include an input variable and an input variable membership. A consequent of the fuzzy rule may include an output variable and an output variable membership. An instance of the input variable may be extracted from a segment. An input variable membership function associated with the input variable membership and an output variable membership function associated with the output variable membership may be trained. The instance of the input variable, the input variable membership function, the output variable, and the output variable membership function may be operated, to determine whether the segment is the speech segment or the non-speech segment.
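A single fuzzy rule of the kind described can be sketched with trapezoidal membership functions and the usual min t-norm for AND. The choice of input variables (frame energy and zero-crossing rate), the trapezoid breakpoints, and the t-norm are all illustrative assumptions:

```python
# Sketch: fire one fuzzy rule for speech/non-speech discrimination.
# Rule: IF energy is high AND zero-crossing rate is moderate THEN speech.
# Input variables and all breakpoints are made-up for illustration.

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], flat at 1 on [b, c],
    falls on [c, d], zero elsewhere."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def speech_degree(energy, zero_crossing_rate):
    """Degree to which a segment matches the speech rule (min t-norm)."""
    energy_high = trapezoid(energy, 0.2, 0.5, 1.0, 1.2)
    zcr_moderate = trapezoid(zero_crossing_rate, 0.05, 0.1, 0.3, 0.5)
    return min(energy_high, zcr_moderate)

deg = speech_degree(0.8, 0.2)
```

A segment could then be labeled "speech" whenever its degree exceeds some defuzzification threshold; training would tune the breakpoints rather than fix them by hand.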
20100153108 | METHOD FOR DYNAMIC LEARNING OF INDIVIDUAL VOICE PATTERNS | 06-17-2010
The present invention is a system and method for generating a personal voice font, including monitoring voice segments automatically from phone conversations of a user by a voice learning processor to generate a personalized voice font, and delivering the personalized voice font (PVF) to a server.
20080215322 | Method and System for Generating Training Data for an Automatic Speech Recogniser | 09-04-2008
The invention describes a method and a system for generating training data (D
20080208578 | Robust Speaker-Dependent Speech Recognition System | 08-28-2008
The present invention provides a method of incorporating speaker-dependent expressions into a speaker-independent speech recognition system providing training data for a plurality of environmental conditions and for a plurality of speakers. The speaker-dependent expression is transformed into a sequence of feature vectors and a mixture density of the set of speaker-independent training data is determined that has a minimum distance to the generated sequence of feature vectors. The determined mixture density is then assigned to a Hidden-Markov-Model (HMM) state of the speaker-dependent expression. Therefore, speaker-dependent training data and references no longer have to be explicitly stored in the speech recognition system. Moreover, by representing a speaker-dependent expression by speaker-independent training data, an environmental adaptation is inherently provided. Additionally, the invention provides generation of artificial feature vectors on the basis of the speaker-dependent expression providing a substantial improvement for the robustness of the speech recognition system with respect to varying environmental conditions.
20090228275 | SYSTEM FOR ENHANCING LIVE SPEECH WITH INFORMATION ACCESSED FROM THE WORLD WIDE WEB | 09-10-2009
A system that includes a speaker workstation and a system that includes an auditor device. The speaker workstation is configured to perform a method for generating a Speech Hyperlink-Time table in conjunction with a system of universal time. The speaker workstation creates a Speech Hyperlink table. While a speech is being spoken by a speaker, the speaker workstation recognizes each hyperlinked term of the Speech Hyperlink table being spoken by the speaker, and for each recognized hyperlinked term, generates a row in the Speech Hyperlink-Time table. The auditor device is configured to perform a method for processing a speech in conjunction with a system of universal time. The auditor device determines and records, in a record of a Selections Hyperlink-Time table, a universal time corresponding to a hyperlinked term spoken during a speech.
20130132083 | GENERIC FRAMEWORK FOR LARGE-MARGIN MCE TRAINING IN SPEECH RECOGNITION | 05-23-2013
A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
20110029311 | VOICE PROCESSING DEVICE AND METHOD, AND PROGRAM | 02-03-2011
There is provided a voice processing device. The device includes: score calculation unit configured to calculate a score indicating compatibility of a voice signal input on the basis of an utterance of a user with each of plural pieces of intention information indicating each of a plurality of intentions; intention selection unit configured to select the intention information indicating the intention of the utterance of the user among the plural pieces of intention information on the basis of the score calculated by the score calculation unit; and intention reliability calculation unit configured to calculate the reliability with respect to the intention information selected by the intention selection unit on the basis of the score calculated by the score calculation unit.
20110029312 | METHODS AND SYSTEMS FOR ADAPTING A MODEL FOR A SPEECH RECOGNITION SYSTEM | 02-03-2011
Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation, of the model for the word or various models for the various words, based on the error rate. Apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The apparatus may further include a controller adapted to adjust an adaptation of the model for the word or various models for the various words, based on the error rate.
20110144992 | UNSUPERVISED LEARNING USING GLOBAL FEATURES, INCLUDING FOR LOG-LINEAR MODEL WORD SEGMENTATION | 06-16-2011
Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
20100131272 | APPARATUS AND METHOD FOR GENERATING AND VERIFYING A VOICE SIGNATURE OF A MESSAGE AND COMPUTER READABLE MEDIUM THEREOF | 05-27-2010
Apparatuses and methods for generating and verifying a voice signature of a message and computer readable medium thereof are provided. The generation and verification ends both use the same set of pronounceable symbols. The set of pronounceable symbols comprises a plurality of pronounceable units, and each of the pronounceable units comprises an index and a pronounceable symbol. The generation end converts the message into a message digest by a hash function and generates a plurality of designated pronounceable symbols according to the message digest. A user utters the designated pronounceable symbols to generate the voice signature. After receiving the message and the voice signature, the verification end performs voice authentication to determine a user identity of the voice signature, performs speech recognition to determine the relation between the message and the voice signature, and determines whether the user generates the voice signature for the message.
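The digest-to-symbols step can be sketched with a standard hash function. The symbol set, the number of symbols, and the use of SHA-256 with byte-wise indexing are assumptions for illustration; the application does not specify them here:

```python
# Sketch: derive designated pronounceable symbols from a message digest.
# SYMBOLS, the symbol count, and SHA-256 are illustrative assumptions.
import hashlib

# Shared by generation and verification ends; index -> pronounceable symbol.
SYMBOLS = ["ba", "ko", "mi", "ta", "ru", "se", "no", "da"]

def designated_symbols(message, count=4):
    """Hash the message, then use successive digest bytes as indices
    into the shared pronounceable-symbol set."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return [SYMBOLS[digest[i] % len(SYMBOLS)] for i in range(count)]

sender = designated_symbols("transfer $100 to Bob")
receiver = designated_symbols("transfer $100 to Bob")
```

Because both ends derive the symbols deterministically from the message, the verification end can check that the uttered symbols match the ones the received message implies.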
20100057461 | METHOD AND SYSTEM FOR CREATING OR UPDATING ENTRIES IN A SPEECH RECOGNITION LEXICON | 03-04-2010
In a method and a system (
20100057462 | Speech Recognition | 03-04-2010
The present invention relates to a method for speech recognition of a speech signal comprising the steps of providing at least one codebook comprising codebook entries, in particular, multivariate Gaussians of feature vectors, that are frequency weighted such that higher weights are assigned to entries corresponding to frequencies below a predetermined level than to entries corresponding to frequencies above the predetermined level and processing the speech signal for speech recognition comprising extracting at least one feature vector from the speech signal and matching the feature vector with the entries of the codebook.
20100070276METHOD AND APPARATUS FOR INTERACTION OR DISCOURSE ANALYTICS - A method and apparatus for analyzing and segmenting a vocal interaction captured in a test audio source, the test audio source captured within an environment. The method and apparatus first use text and acoustic features extracted from the interaction, together with tagging information, to construct a model. Then, at production time, text and acoustic features are extracted from the interactions and, by applying the model, tagging information is retrieved for the interaction, enabling analysis, flow visualization or further processing of the interaction.03-18-2010
20110153327SYSTEMS AND METHODS FOR IDENTITY MATCHING BASED ON PHONETIC AND EDIT DISTANCE MATCHING - According to embodiments of the present disclosure, a matching module is configured to accurately match a probe identity of an entity to a collection of entities. The matching module is configured to match the probe identity of the entity to the collection of entities based on a combination of phonetic matching processes and edit distance processes. The matching module is configured to create phonetic groups for name parts of identities in the collection. The matching module is configured to compare probe name parts of the probe identity to the name parts associated with the phonetic groups.06-23-2011
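The combination of phonetic grouping and edit-distance comparison described in the entry above can be sketched as follows. This is an illustrative sketch, not the patented method: a simplified Soundex-style code stands in for whatever phonetic algorithm the matching module actually uses, and Levenshtein distance stands in for its edit-distance process.

```python
from collections import defaultdict

def soundex(name: str) -> str:
    """A simplified Soundex code: first letter plus consonant-class digits."""
    codes = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
             "l": "4", "mn": "5", "r": "6"}
    name = name.upper()
    if not name:
        return ""
    out, prev = name[0], ""
    for ch in name[1:]:
        digit = next((d for letters, d in codes.items()
                      if ch.lower() in letters), "")
        if digit and digit != prev:
            out += digit
        prev = digit
    return (out + "000")[:4]

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def match_probe(probe: str, collection: list[str], max_dist: int = 2) -> list[str]:
    """Group the collection by phonetic code, then rank the candidates in
    the probe's group by edit distance."""
    groups = defaultdict(list)
    for name in collection:
        groups[soundex(name)].append(name)
    candidates = groups.get(soundex(probe), [])
    return sorted(n for n in candidates if edit_distance(probe, n) <= max_dist)
```

Grouping first keeps the quadratic edit-distance comparison confined to phonetically plausible candidates, which is the point of combining the two processes.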
20110082696SYSTEM AND METHOD FOR SPEECH-ENABLED ACCESS TO MEDIA CONTENT - Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information. The information can be a list of actors, directors, composers, titles, and/or locations. The graph that models how the media are interconnected can further model pieces of common information between two or more media. The method can further cause the computing device to weight the graph based on the retrieved information. The graph can further model relative popularity information in the list. The method can rank information based on a PageRank algorithm.04-07-2011
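The PageRank-style ranking step mentioned in the entry above can be illustrated with a minimal power-iteration implementation over a plain adjacency dict. This is a generic sketch of the algorithm named in the abstract, not the patent's actual graph construction; the damping factor and iteration count are conventional defaults.

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict {node: [neighbors]}."""
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            nbrs = graph.get(node, [])
            if nbrs:
                share = damping * rank[node] / len(nbrs)
                for n in nbrs:
                    new[n] += share
            else:
                # dangling node: spread its rank uniformly over every node
                for n in nodes:
                    new[n] += damping * rank[node] / len(nodes)
        rank = new
    return rank
```

In the setting the abstract describes, nodes would be media items plus shared pieces of information (actors, directors, titles), edges the "appears in" links between them, and the resulting scores would order the information fed into the speech recognition model.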
20110060588Method and System for Automatic Speech Recognition with Multiple Contexts - A method and a system for activating functions including a first function and a second function, wherein the system is embedded in an apparatus, are disclosed. The system includes a control configured to be activated by a plurality of activation styles, wherein the control generates a signal indicative of a particular activation style from multiple activation styles; and a controller configured to activate either the first function or the second function based on the particular activation style, wherein the first function is configured to be executed based only on the activation style, and wherein the second function is further configured to be executed based on a speech input.03-10-2011
20120203553RECOGNITION DICTIONARY CREATING DEVICE, VOICE RECOGNITION DEVICE, AND VOICE SYNTHESIZER - A recognition dictionary creating device includes a user dictionary in which a phoneme label string of an inputted voice is registered and an interlanguage acoustic data mapping table in which a correspondence between phoneme labels in different languages is defined, and refers to the interlanguage acoustic data mapping table to convert the phoneme label string registered in the user dictionary and expressed in a language set at the time of creating the user dictionary into a phoneme label string expressed in another language which the recognition dictionary creating device has switched.08-09-2012
20090119103SPEAKER RECOGNITION SYSTEM - A method automatically recognizes speech received through an input. The method accesses one or more speaker-independent speaker models. The method detects whether the received speech input matches a speaker model according to an adaptable predetermined criterion. The method creates a speaker model assigned to a speaker model set when no match occurs based on the input.05-07-2009
20110071829IMAGE PROCESSING APPARATUS, SPEECH RECOGNITION PROCESSING APPARATUS, CONTROL METHOD FOR SPEECH RECOGNITION PROCESSING APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM FOR COMPUTER PROGRAM - An image processing apparatus includes a speech input portion that receives an input of speech from a user, a dictionary storage portion that stores a dictionary configured by phrase information pieces for recognizing the speech, a compound phrase generation portion that generates a plurality of compound phrases formed by all combinations of a plurality of predetermined phrases in different orders, a compound phrase registration portion that registers the plurality of compound phrases that have been generated in the dictionary as the phrase information pieces, a speech recognition portion that, in a case where speech including a speech phrase formed by the plurality of predetermined phrases said in an arbitrary order has been input, performs speech recognition on the speech by searching the dictionary for a compound phrase that matches the speech phrase.03-24-2011
20100324897AUDIO RECOGNITION DEVICE AND AUDIO RECOGNITION METHOD - Acoustic models and language models are learned according to a speaking length which indicates a length of a speaking section in speech data, and speech recognition process is implemented by using the learned acoustic models and language models. A speech recognition apparatus includes means (12-23-2010
20120010885System and Method for Unsupervised and Active Learning for Automatic Speech Recognition - A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.01-12-2012
20110166858METHOD OF RECOGNIZING SPEECH - A method for recognizing speech involves presenting an utterance to a speech recognition system and determining, via the speech recognition system, that the utterance contains a particular expression, where the particular expression is capable of being associated with at least two different meanings. The method further involves splitting the utterance into a plurality of speech frames, where each frame is assigned a predetermined time segment and a frame number, and indexing the utterance to i) a predetermined frame number, or ii) a predetermined time segment. The indexing of the utterance identifies that one of the frames includes the particular expression. Then the frame including the particular expression is re-presented to the speech recognition system to verify that the particular expression was actually recited in the utterance.07-07-2011
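The frame-splitting and indexing step described in the entry above can be sketched as follows. The 16 kHz sample rate and 25 ms frame length are illustrative assumptions, not values taken from the patent.

```python
def split_into_frames(samples, rate_hz=16000, frame_ms=25):
    """Split audio samples into fixed-length frames, each tagged with a
    frame number and the start time of its time segment in seconds."""
    frame_len = rate_hz * frame_ms // 1000
    frames = []
    for num, start in enumerate(range(0, len(samples), frame_len)):
        frames.append({
            "frame_number": num,
            "start_time_s": start / rate_hz,
            "samples": samples[start:start + frame_len],
        })
    return frames

def frame_for_time(time_s, frame_ms=25):
    """Index an utterance to a frame number from a time offset."""
    return int(time_s * 1000 // frame_ms)
```

Once an utterance is indexed this way, the frame found to contain the ambiguous expression can be cut out and re-presented to the recognizer in isolation, which is the verification step the abstract describes.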
20120059654SPEAKER-ADAPTIVE SYNTHESIZED VOICE - An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.03-08-2012
20120059653METHODS AND SYSTEMS FOR OBTAINING LANGUAGE MODELS FOR TRANSCRIBING COMMUNICATIONS - A method for producing speech recognition results on a device includes receiving first speech recognition results, obtaining a language model, wherein the language model represents information stored on the device, and using the first speech recognition results and the language model to generate second speech recognition results.03-08-2012
20110093265Systems and Methods for Creating and Using Geo-Centric Language Models - Systems and methods for creating and using geo-centric language models are provided herein. An exemplary method includes assigning each of a plurality of listings to a local service area, determining a geographic center for the local service area, computing a listing density for the local service area, and selecting a desired number of listings for a geo-centric listing set. The geo-centric listing set includes a subset of the plurality of listings. The exemplary method further includes dividing the local service area into regions based upon the listing density and the number of listings in the geo-centric listing set, and building a language model for the geo-centric listing set.04-21-2011
20120072217SYSTEM AND METHOD FOR USING PROSODY FOR VOICE-ENABLED SEARCH - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used.03-22-2012
20120130715METHOD AND APPARATUS FOR GENERATING A VOICE-TAG - According to one embodiment, an apparatus for generating a voice-tag includes an input unit, a recognition unit, and a combination unit. The input unit is configured to input a registration speech. The recognition unit is configured to recognize the registration speech to obtain N-best recognition results, wherein N is an integer greater than or equal to 2. The combination unit is configured to combine the N-best recognition results as a voice-tag of the registration speech.05-24-2012
20100250251ADAPTATION FOR STATISTICAL LANGUAGE MODEL - Architecture that suppresses the unexpected appearance of words by applying appropriate restrictions to long-term and short-term memory. The quickness of adaptation is also realized by leveraging the restriction. The architecture includes a history component for processing user input history for conversion of a phonetic string by a conversion process that output conversion results, and an adaptation component for adapting the conversion process to the user input history based on restriction(s) applied to short-term memory that impacts word appearances during the conversion process. The architecture performs probability boosting based on context-dependent probability differences (short-term memory), and dynamic linear-interpolation between long-term memory and baseline language model based on frequency of preceding context of word (long-term memory).09-30-2010
20120271631SPEECH RECOGNITION USING MULTIPLE LANGUAGE MODELS - In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.10-25-2012
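The frequency-based partition of training utterances described in the entry above reduces to a counting step and a threshold split. A minimal sketch (the threshold value is an arbitrary placeholder, and ties at the threshold are handled here by sending them to the low-frequency set, which the abstract leaves unspecified):

```python
from collections import Counter

def split_by_frequency(utterances, threshold=5):
    """Partition training utterances into a high-frequency set (count
    exceeds the threshold) and a low-frequency set (count at or below it).
    The high set would train a grammar-based language model, the low set
    a statistical one."""
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c > threshold]
    low = [u for u, c in counts.items() if c <= threshold]
    return high, low
```

The intuition behind the split is that very frequent utterances are stereotyped enough to be captured by a grammar, while the long tail is better served by a statistical model.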
20120221333Phonetic Features for Speech Recognition - Techniques are disclosed for using phonetic features for speech recognition. For example, a method comprises the steps of obtaining a first dictionary and a training data set associated with a speech recognition system, computing one or more support parameters from the training data set, transforming the first dictionary into a second dictionary, wherein the second dictionary is a function of one or more phonetic labels of the first dictionary, and using the one or more support parameters to select one or more samples from the second dictionary to create a set of one or more exemplar-based class identification features for a pattern recognition task.08-30-2012
20100204990SPEECH ANALYZER AND SPEECH ANALYSIS METHOD - A speech analyzer includes a vocal tract and sound source separating unit which separates a vocal tract feature and a sound source feature from an input speech, based on a speech generation model, a fundamental frequency stability calculating unit which calculates a temporal stability of a fundamental frequency of the input speech in the sound source feature, from the separated sound source feature, a stable analyzed period extracting unit which extracts time information of a stable period, based on the temporal stability, and a vocal tract feature interpolation unit which interpolates a vocal tract feature which is not included in the stable period, using a vocal tract feature included in the extracted stable period.08-12-2010
20100063817ACOUSTIC MODEL REGISTRATION APPARATUS, TALKER RECOGNITION APPARATUS, ACOUSTIC MODEL REGISTRATION METHOD AND ACOUSTIC MODEL REGISTRATION PROCESSING PROGRAM - An acoustic model registration apparatus, a talker recognition apparatus, an acoustic model registration method and an acoustic model registration processing program, each of which reliably prevents an acoustic model having a low recognition capability for a talker from being registered, are provided.03-11-2010
20120215535METHOD AND APPARATUS FOR AUTOMATIC CORRELATION OF MULTI-CHANNEL INTERACTIONS - A method and apparatus for multi-channel categorization, comprising capturing a vocal interaction and a non-vocal interaction, using logging or capturing devices; retrieving a first word from the vocal interaction and a second word from the non-vocal interaction; assigning the vocal interaction into a first category using the first word; assigning the non-vocal interaction into a second category using the second word; and associating the first category and the second category into a multi-channel category, thus aggregating the vocal interaction and the non-vocal interaction.08-23-2012
20120232902SYSTEM AND METHOD FOR SPEECH RECOGNITION MODELING FOR MOBILE VOICE SEARCH - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating an acoustic model for use in speech recognition. A system configured to practice the method first receives training data and identifies non-contextual lexical-level features in the training data. Then the system infers sentence-level features from the training data and generates a set of decision trees by node-splitting based on the non-contextual lexical-level features and the sentence-level features. The system decorrelates training vectors, based on the training data, for each decision tree in the set of decision trees to approximate full-covariance Gaussian models, and then can train an acoustic model for use in speech recognition based on the training data, the set of decision trees, and the training vectors.09-13-2012
20080300876Speech Recognition Device Using Statistical Language Model - [Object] To provide recognition of natural speech for a speech application built with a grammar method, at little effort and cost.12-04-2008
20110004473APPARATUS AND METHOD FOR ENHANCED SPEECH RECOGNITION - A method and apparatus for improving speech recognition results for an audio signal captured within an organization, comprising: receiving the audio signal captured by a capturing or logging device; extracting a phonetic feature and an acoustic feature from the audio signal; decoding the phonetic feature into a phonetic searchable structure; storing the phonetic searchable structure and the acoustic feature in an index; performing phonetic search for a word or a phrase in the phonetic searchable structure to obtain a result; activating an audio analysis engine which receives the acoustic feature to validate the result and obtain an enhanced result.01-06-2011
20120239399VOICE RECOGNITION DEVICE - Disclosed is a voice recognition device which creates a recognition dictionary (statically-created dictionary) in advance for a vocabulary having words to be recognized whose number is equal to or larger than a threshold, and creates a recognition dictionary (dynamically-created dictionary) for a vocabulary having words to be recognized whose number is smaller than the threshold in an interactive situation.09-20-2012
20110046952ACOUSTIC MODEL LEARNING DEVICE AND SPEECH RECOGNITION DEVICE - Parameters of a first variation model, a second variation model and an environment-independent acoustic model are estimated so as to maximize an integrated degree of fitness obtained by integrating the degree of fitness of the first variation model to the sample speech data, the degree of fitness of the second variation model to the sample speech data, and the degree of fitness of the environment-independent acoustic model to the sample speech data. Therefore, when constructing an acoustic model using sample speech data affected by a plurality of acoustic environments, the effect that each of the acoustic environments has on a speech can be extracted with high accuracy.02-24-2011
20110276329SPEECH DIALOGUE APPARATUS, DIALOGUE CONTROL METHOD, AND DIALOGUE CONTROL PROGRAM - A speech dialogue apparatus, a dialogue control method, and a dialogue control program are provided, whereby an appropriate dialogue control is enabled by determining a user's proficiency level in a dialogue behavior correctly and performing an appropriate dialogue control according to the user's proficiency level correctly determined, without being influenced by an accidental one-time behavior of the user. An input unit 11-10-2011
20120101821SPEECH RECOGNITION APPARATUS - A speech recognition apparatus is disclosed. The apparatus converts a speech signal into digitalized speech data, and performs speech recognition based on the speech data. The apparatus compares the speech data inputted the last time with the speech data inputted the time before last, in response to a user's indication that the speech recognition has resulted in erroneous recognition multiple times in a row. When the speech data inputted the last time is determined to substantially match the speech data inputted the time before last, the apparatus outputs a guidance prompting the user to utter the input target by calling it by another name.04-26-2012
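The repeated-error handling described in the entry above can be sketched as a comparison of the two most recent inputs. This is an illustrative sketch: the similarity function below is a naive element-wise stand-in for whatever comparison of speech data the apparatus actually performs, and the 0.9 threshold is an assumption.

```python
def similarity(a, b):
    """Stand-in similarity score; a real system would compare acoustic
    features of the two inputs, not raw elements."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)

def handle_recognition_error(history, error_count, threshold=0.9):
    """After repeated misrecognitions, compare the last input with the one
    before it; if they substantially match, the user is evidently repeating
    the same name, so prompt for an alternative name instead."""
    if error_count < 2 or len(history) < 2:
        return "Please repeat your request."
    if similarity(history[-1], history[-2]) >= threshold:
        return ("That name is not being recognized; "
                "please try another name for the same item.")
    return "Please repeat your request."
```

The design point is that asking the user to repeat an utterance the recognizer has already failed on twice is unlikely to help, whereas an alternative name may fall inside the recognition vocabulary.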
20100169093INFORMATION PROCESSING APPARATUS, METHOD AND RECORDING MEDIUM FOR GENERATING ACOUSTIC MODEL - An information processing apparatus for speech recognition includes a first speech dataset storing speech data uttered by low recognition rate speakers; a second speech dataset storing speech data uttered by a plurality of speakers; a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset; a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset; a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.07-01-2010
20080221884MOBILE ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a software application resident on a mobile communication facility, where recorded speech may be presented by the user using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility, and may be accompanied by information related to the software application. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the software application and the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility.09-11-2008
20110218804SPEECH PROCESSOR, A SPEECH PROCESSING METHOD AND A METHOD OF TRAINING A SPEECH PROCESSOR - A speech recognition method, the method involving: 09-08-2011
20110231190METHOD OF AND SYSTEM FOR PROVIDING ADAPTIVE RESPONDENT TRAINING IN A SPEECH RECOGNITION APPLICATION - A system for conducting a telephonic speech recognition application includes an automated telephone device for making telephonic contact with a respondent and a speech recognition device which, upon the telephonic contact being made, presents the respondent with at least one introductory prompt for the respondent to reply to; receives a spoken response from the respondent; and performs a speech recognition analysis on the spoken response to determine a capability of the respondent to complete the application. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is capable of completing the application, the speech recognition device presents at least one application prompt to the respondent. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is not capable of completing the application, the speech recognition system presents instructions on completing the application to the respondent.09-22-2011
20110231189METHODS AND APPARATUS FOR EXTRACTING ALTERNATE MEDIA TITLES TO FACILITATE SPEECH RECOGNITION - Techniques for generating a set of one or more alternate titles associated with stored digital media content and updating a speech recognition system to enable the speech recognition system to recognize the set of alternate titles. The system operates on an original media title to extract a set of alternate media titles by applying at least one rule to the original title. The extracted set of alternate media titles are used to update the speech recognition system prior to runtime. In one aspect rules that are applied to original titles are determined by analyzing a corpus of original titles and corresponding possible alternate media titles that a user may use to refer to the original titles.09-22-2011
20130151252SYSTEM AND METHOD FOR STANDARDIZED SPEECH RECOGNITION - Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.06-13-2013
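The model-selection fallback chain in the entry above (user-specific supervised model, then unsupervised model, then generic model) is simple enough to sketch directly. The dict-based lookups are an illustrative assumption about how models might be stored, not the patent's infrastructure.

```python
def select_model(user_id, supervised, unsupervised, generic):
    """Retrieve a speech model for a user via the fallback chain:
    user-specific supervised model, else user-specific unsupervised
    model, else the generic model."""
    if user_id in supervised:
        return supervised[user_id]
    if user_id in unsupervised:
        return unsupervised[user_id]
    return generic
```

The received speech would then be recognized with whichever model the chain returns, so every user gets the most specific model available without the pipeline ever failing for lack of one.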
20130185070NORMALIZATION BASED DISCRIMINATIVE TRAINING FOR CONTINUOUS SPEECH RECOGNITION - A speech recognition system trains a plurality of feature transforms and a plurality of acoustic models using an irrelevant variability normalization based discriminative training. The speech recognition system employs the trained feature transforms to absorb or ignore variability within an unknown speech that is irrelevant to phonetic classification. The speech recognition system may then recognize the unknown speech using the trained recognition models. The speech recognition system may further perform an unsupervised adaptation to adapt the feature transforms for the unknown speech and thus increase the accuracy of recognizing the unknown speech.07-18-2013