Recognition

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications
704251000 | Word recognition | 418
704235000 | Speech to image | 412
704246000 | Voice recognition | 270
704233000 | Detect speech in noise | 196
704236000 | Specialized equations or comparisons | 132
704243000 | Creating patterns for matching | 124
704232000 | Neural network | 13
704234000 | Normalizing | 10
Entries
Document | Title | Date
20130030802 | MAINTAINING AND SUPPLYING SPEECH MODELS | 01-31-2013
Maintaining and supplying a plurality of speech models is provided. A plurality of speech models and metadata for each speech model are stored. A query for a speech model is received from a source. The query includes one or more conditions. The speech model with metadata most closely matching the supplied one or more conditions is determined. The determined speech model is provided to the source. A refined speech model is received from the source, and the refined speech model is stored.

20100023328 | Audio Recognition System | 01-28-2010
A system and method of identifying an audio track uses music identification software that produces a fingerprint or audio profile for an audio segment recorded with a portable communication device. The audio profile is transmitted from the portable communication device to a remote service provider over a communication network. The remote server receives the transmitted audio track profile and compares the profile to a stored database of audio tracks. If a matching audio track is identified by the remote server, metadata relating to the identified audio track is transmitted from the remote server to the portable communication device. The received audio track metadata is then displayed on the portable communication device.
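The fingerprint-and-lookup flow in the abstract above can be sketched roughly as follows. The profile scheme (one dominant DFT bin per frame) and the exact-match dictionary lookup are simplifying assumptions for illustration, not the patented fingerprinting method.

```python
import math

def dominant_bin(frame):
    # Index of the strongest DFT bin in 0..N/2 (naive DFT; fine for tiny frames).
    n = len(frame)
    best_k, best_mag = 0, -1.0
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k

def fingerprint(signal, frame_len=64):
    # One dominant-frequency index per frame; the tuple is the audio profile.
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return tuple(dominant_bin(f) for f in frames)

def identify(profile, database):
    # Match the transmitted profile against stored track profiles;
    # return the track metadata, or None if no track matches.
    return database.get(profile)
```

A real service would use an FFT and a noise-robust, partial-match index rather than exact tuple equality.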
20110202337 | Method and Discriminator for Classifying Different Segments of a Signal | 08-18-2011
For classifying different segments of a signal which has segments of at least a first type and second type, e.g. audio and speech segments, the signal is short-term classified on the basis of at least one short-term feature extracted from the signal, and a short-term classification result is delivered. The signal is also long-term classified on the basis of the at least one short-term feature and at least one long-term feature extracted from the signal, and a long-term classification result is delivered. The short-term classification result and the long-term classification result are combined to provide an output signal indicating whether a segment of the signal is of the first type or of the second type.

20110202339 | SPEECH SOUND DETECTION APPARATUS | 08-18-2011
A speech sound detection apparatus receives an input audio signal (as a sound reception unit), and computes input power that indicates a magnitude of the sound represented by the audio signal (as an input power computation unit). The apparatus estimates a correction function that is a continuous function defining a relation between a certain frequency and a correction coefficient used to approximate the input power computed at that frequency to the reference power predetermined for that frequency (as a correction function estimation unit). The apparatus corrects the input power at every frequency, based upon the correction coefficient that is obtained in accordance with the relation defined by the estimated correction function (as an input power correcting unit). The apparatus further determines whether or not the sound represented by the received audio signal is speech sound, based upon the corrected input power (as a speech sound detection unit).
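The correction-function idea in the abstract above can be illustrated with a toy sketch. The linear form of the continuous correction function and the mean-power decision rule are assumptions made for the example; the abstract does not specify either.

```python
def estimate_correction(freqs, input_power, reference_power):
    # Least-squares fit of a linear correction c(f) = a*f + b such that
    # c(f) * input_power(f) approximates reference_power(f).
    # (Linear form is an assumption; any continuous function would do.)
    targets = [r / p for r, p in zip(reference_power, input_power)]
    n = len(freqs)
    mf = sum(freqs) / n
    mt = sum(targets) / n
    a = sum((f - mf) * (t - mt) for f, t in zip(freqs, targets)) / \
        sum((f - mf) ** 2 for f in freqs)
    b = mt - a * mf
    return lambda f: a * f + b

def is_speech(freqs, input_power, correction, threshold):
    # Correct the power at every frequency via the estimated function,
    # then apply a simple mean-power threshold as the detection decision.
    corrected = [correction(f) * p for f, p in zip(freqs, input_power)]
    return sum(corrected) / len(corrected) > threshold
```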
20110202338 | SYSTEM AND METHOD FOR RECOGNITION OF ALPHANUMERIC PATTERNS INCLUDING LICENSE PLATE NUMBERS | 08-18-2011
Voice recognition technology is combined with external information sources and/or contextual information to enhance the quality of voice recognition results, specifically for the use case of reading out or speaking an alphanumeric identifier. The alphanumeric identifier may be associated with a good, service, person, account, or other entity. For example, the identifier may be a vehicle license plate number.

20120245932 | VOICE RECOGNITION APPARATUS | 09-27-2012
According to one embodiment, a voice recognition apparatus includes a determination unit, an estimating unit, and a voice recognition unit. The determination unit determines whether a component with a frequency of not less than 1000 Hz and with a level not lower than a predetermined level is included in a sound input from a plurality of microphones. The estimating unit estimates a sound source direction of the sound when the determination unit determines that the component is included in the sound. The voice recognition unit recognizes whether the sound obtained in the sound source direction coincides with a voice model registered beforehand.
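The first step in the abstract above, checking for a component at 1000 Hz or above whose level exceeds a threshold, can be sketched as follows. The naive DFT and the amplitude normalization are illustrative choices, not the apparatus's actual implementation.

```python
import math

def has_high_band_component(samples, sample_rate, level):
    # True if any DFT bin at >= 1000 Hz has (normalized) amplitude >= level.
    # Naive DFT for clarity; a real implementation would use an FFT.
    n = len(samples)
    k0 = math.ceil(1000 * n / sample_rate)   # first bin at or above 1000 Hz
    for k in range(k0, n // 2 + 1):
        re = sum(samples[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(samples[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        # 2*|X_k|/n recovers the amplitude of a pure sinusoid at bin k.
        if 2 * math.sqrt(re * re + im * im) / n >= level:
            return True
    return False
```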
20130080160 | DOCUMENT READING-OUT SUPPORT APPARATUS AND METHOD | 03-28-2013
According to one embodiment, a document reading-out support apparatus is provided with first to third acquisition units, an extraction unit, a decision unit and a user verification unit. The first acquisition unit acquires a document having texts. The second acquisition unit acquires metadata having definitions, each of which includes an applicable condition and a reading-out style. The extraction unit extracts features of the document. The third acquisition unit acquires execution environment information. The decision unit decides candidates of reading-out parameters based on the features and the information. The user verification unit presents the candidates and accepts a verification instruction.

20130080159 | DETECTION OF CREATIVE WORKS ON BROADCAST MEDIA | 03-28-2013
This disclosure relates to systems and methods for proactively determining identification information for a plurality of audio segments within a plurality of broadcast media streams, and providing identification information associated with specific audio portions of a broadcast media stream automatically or upon request.

20130080161 | SPEECH RECOGNITION APPARATUS AND METHOD | 03-28-2013
According to one embodiment, a speech recognition apparatus includes the following units. The service estimation unit estimates a service being performed by a user, by using non-speech information, and generates service information. The speech recognition unit performs speech recognition on speech information in accordance with a speech recognition technique corresponding to the service information. The feature quantity extraction unit extracts a feature quantity related to the service of the user from the speech recognition result. The service estimation unit re-estimates the service by using the feature quantity. The speech recognition unit then performs speech recognition based on the re-estimation result.

20090157399 | APPARATUS AND METHOD FOR EVALUATING PERFORMANCE OF SPEECH RECOGNITION | 06-18-2009
An apparatus for evaluating the performance of speech recognition includes a speech database for storing N test speech signals for evaluation. A speech recognizer is located in an actual environment and executes speech recognition of the test speech signals, reproduced from the speech database using a loudspeaker in the actual environment, to produce speech recognition results. A performance evaluation module evaluates the performance of the speech recognition by comparing correct recognition answers with the speech recognition results.

20100106497 | INTERNAL AND EXTERNAL SPEECH RECOGNITION USE WITH A MOBILE COMMUNICATION FACILITY | 04-29-2010
In embodiments of the present invention, improved capabilities are described for a user interacting with a mobile communication facility, where speech presented by the user is recorded using a mobile communication facility resident capture facility. The recorded speech may be recognized using an external speech recognition facility to produce an external output and a resident speech recognition facility to produce an internal output, where at least one of the external output and the internal output may be selected based on a criterion.

20120191448 | SPEECH RECOGNITION USING DOCK CONTEXT | 07-26-2012
Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data.

20110015924 | ACOUSTIC SOURCE SEPARATION | 01-20-2011
A method of separating a mixture of acoustic signals from a plurality of sources comprises: providing pressure signals indicative of time-varying acoustic pressure in the mixture; defining a series of time windows; and for each time window: a) providing from the pressure signals a series of sample values of measured directional pressure gradient; b) identifying different frequency components of the pressure signals; c) for each frequency component, defining an associated direction; and d) from the frequency components and their associated directions, generating a separated signal for one of the sources.
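Step d) of the abstract above, turning per-frequency directions into per-source signals, can be sketched as a grouping step. The nearest-angle assignment and the assumption that candidate source angles are already known are simplifications; the gradient measurement and direction estimation of steps a) to c) are omitted.

```python
def separate(components, source_angles):
    # Assign each (frequency, magnitude, angle) component to the source
    # whose direction is closest, and return one component list per source.
    # Resynthesizing a time-domain signal from each list is not shown.
    out = {a: [] for a in source_angles}
    for freq, mag, angle in components:
        nearest = min(source_angles, key=lambda a: abs(a - angle))
        out[nearest].append((freq, mag))
    return out
```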
20130060566 | SPEECH COMMUNICATION SYSTEM AND METHOD, AND ROBOT APPARATUS | 03-07-2013
This invention realizes a speech communication system and method, and a robot apparatus, capable of significantly improving entertainment property. A speech communication system with a function to make conversation with a conversation partner is provided with a speech recognition means for recognizing speech of the conversation partner, a conversation control means for controlling conversation with the conversation partner based on the recognition result of the speech recognition means, an image recognition means for recognizing the face of the conversation partner, and a tracking control means for tracing the existence of the conversation partner based on one or both of the recognition result of the image recognition means and the recognition result of the speech recognition means. The conversation control means controls conversation so as to continue depending on tracking of the tracking control means.

20120226497 | SOUND RECOGNITION METHOD AND SYSTEM | 09-06-2012
A method for generating an anti-model of a sound class is disclosed. A plurality of candidate sound data is provided for generating the anti-model. A plurality of similarity values between the plurality of candidate sound data and a reference sound model of a sound class is determined. An anti-model of the sound class is generated based on at least one candidate sound data having the similarity value within a similarity threshold range.
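The candidate-selection step of the anti-model abstract above reduces to a threshold-range filter like the following sketch. The similarity function passed in is a stand-in; the abstract does not define how similarity is computed, nor how the anti-model is trained from the survivors.

```python
def select_anti_model_data(candidates, similarity, reference, lo, hi):
    # Keep candidate sounds whose similarity to the reference model falls
    # in the [lo, hi) threshold range: close enough to be confusable with
    # the sound class, but not close enough to belong to it.
    return [c for c in candidates if lo <= similarity(c, reference) < hi]
```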
20110046948 | AUTOMATIC SOUND RECOGNITION BASED ON BINARY TIME FREQUENCY UNITS | 02-24-2011
The invention relates to a method of automatic sound recognition. The object of the present invention is to provide an alternative scheme for automatically recognizing sounds, e.g. human speech. The problem is solved by providing a training database comprising a number of models, each model representing a sound element in the form of a binary mask comprising binary time frequency (TF) units which indicate the energetic areas in time and frequency of the sound element in question, or of characteristic features or statistics extracted from the binary mask; providing an input signal comprising an input sound element; and estimating the input sound element based on the models of the training database to provide an output sound element. The method has the advantage of being relatively simple and adaptable to the application in question. The invention may e.g. be used in devices comprising automatic sound recognition, e.g. for voice control of a device, or in listening devices, e.g. hearing aids, for improving speech perception.

20130066629 | Speech & Music Discriminator for Multi-Media Applications | 03-14-2013
The present invention relates to means and methods of classifying speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.

20110022385 | METHOD AND EQUIPMENT OF PATTERN RECOGNITION, ITS PROGRAM AND ITS RECORDING MEDIUM | 01-27-2011
The present invention provides a method and equipment of pattern recognition capable of efficiently pruning partial hypotheses without lowering recognition accuracy, its pattern recognition program, and its recording medium. In a second search unit, a likelihood calculation unit calculates an acoustic likelihood by matching time series data of acoustic feature parameters against a lexical tree stored in a second database and an acoustic model stored in a third database to determine an accumulated likelihood by accumulating the acoustic likelihood in a time direction. A self-transition unit causes each partial hypothesis to make a self-transition in a search process. An LR transition unit causes each partial hypothesis to make an LR transition. A reward attachment unit adds a reward R(x) in accordance with the number of reachable words to each partial hypothesis to raise the accumulated likelihood. A pruning unit excludes partial hypotheses with less likelihood from search targets.

20090012785 | SAMPLING RATE INDEPENDENT SPEECH RECOGNITION | 01-08-2009
A sampling-rate-independent method of automated speech recognition (ASR). Speech energies of a plurality of codebooks generated from training data created at an ASR sampling rate are compared to speech energies in a current frame of acoustic data generated from received audio created at an audio sampling rate below the ASR sampling rate. A codebook is selected from the plurality of codebooks that has speech energies corresponding to the speech energies in the current frame over a spectral range corresponding to the audio sampling rate. Speech energies above the spectral range are copied from the selected codebook and appended to the current frame.
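The codebook selection and band-copying described in the abstract above can be sketched as follows. Squared-error distance over the shared low band is an assumed matching metric; the abstract only says the selected codebook's energies "correspond" to the frame's.

```python
def extend_frame(frame_low, codebooks):
    # frame_low: speech energies of the current frame over the spectral
    #   range supported by the (lower) audio sampling rate.
    # codebooks: full-band energy vectors built at the ASR sampling rate.
    # Pick the codebook closest to the frame over the shared range
    # (squared error, an assumption), then append its high-band energies.
    n = len(frame_low)

    def dist(cb):
        return sum((a - b) ** 2 for a, b in zip(frame_low, cb[:n]))

    best = min(codebooks, key=dist)
    return frame_low + best[n:]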
20130166290 | VOICE RECOGNITION APPARATUS | 06-27-2013
A voice recognition apparatus includes a command recognizer and a data recognizer. The command recognizer recognizes a command portion of a voice input and outputs a command based on a voice recognition result of the voice input. The data recognizer recognizes a data portion of the voice input and outputs data based on a voice recognition result of the voice input. The data recognizer further includes a plurality of data-category recognizers, each using a data-category dictionary for recognizing the data portion of the voice input and outputting a data result. A voice recognition result selection unit of the voice recognition apparatus selects one of the data results from the data-category recognizers based on the command recognized by the command recognizer.

20080294431 | Displaying text of speech in synchronization with the speech | 11-27-2008
Displays a character string representing content of speech in synchronization with reproduction of the speech. An apparatus includes: a unit for obtaining scenario data representing the speech; a unit for dividing textual data resulting from recognition of the speech to generate pieces of recognition data; a unit for detecting in the scenario data a character matching each character contained in each piece of recognition data for which no matching character string has been detected, to detect in the scenario data a character string that matches the piece of recognition data; and a unit for setting the display timing of displaying each of the character strings contained in the scenario data to the timing at which the speech recognized as the piece of recognition data that matches the character string is reproduced.

20110282661 | METHOD FOR SPEAKER SOURCE CLASSIFICATION | 11-17-2011
A method for classifying a pair of audio signals into an agent audio signal and a customer audio signal. One embodiment relates to unsupervised training, in which the training corpus comprises a multiplicity of audio signal pairs, wherein each pair comprises an agent signal and a customer signal, and wherein it is unknown for each signal whether it is by the agent or by the customer. Training is based on the agent signals being more similar to one another than the customer signals. An agent cluster and a customer cluster are determined. The input signals are associated with the agent or the customer according to the higher-scoring combination of the input signals and the clusters.

20110282662 | Customer Service Data Recording Device, Customer Service Data Recording Method, and Recording Medium | 11-17-2011
To enable determining the correlation between customer satisfaction and employee satisfaction, a speech acquisition unit

20090076812 | Media usage monitoring and measurement system and method | 03-19-2009
Media monitoring and measurement systems and methods are disclosed. Some embodiments of the present invention provide a media measurement system and method that utilizes audience data to enhance content identifications. Some embodiments analyze media player log data to enhance content identification. Other embodiments of the present invention analyze sample sequence data to enhance content identifications. Other embodiments analyze sequence data to enhance content identification and/or to establish channel identification. Yet other embodiments provide a system and method in which sample construction and selection parameters are adjusted based upon identification results. Yet other embodiments provide a method in which play-altering activity of an audience member is deduced from content offset values of identifications corresponding to captured samples. Yet other embodiments provide a monitoring and measurement system in which a media monitoring device is adapted to receive a wireless or non-wireless audio signal from a media player, the audio signal also being received wirelessly by headphones of a user of the monitoring device.

20110040559 | SYSTEMS, COMPUTER-IMPLEMENTED METHODS, AND TANGIBLE COMPUTER-READABLE STORAGE MEDIA FOR TRANSCRIPTION ALIGNMENT | 02-17-2011
Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting via a processor a pair of anchor words in the media presentation based on the ASR output and transcription, and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event.
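Anchor-word candidate selection, as described in the transcription-alignment abstract above, can be approximated with a simple sketch. Requiring that a word occur exactly once in both texts stands in for the patent's similarity threshold; the stop-list exclusion follows the abstract directly.

```python
def anchor_words(asr, transcript, stop_list=()):
    # Candidate anchors: words occurring exactly once in both the ASR
    # output and the transcription (so their positions pair up
    # unambiguously) and not on the common-word stop list.
    def once(seq):
        return {w for w in seq if seq.count(w) == 1}

    asr_once, tr_once = once(asr), once(transcript)
    return [w for w in transcript
            if w in asr_once and w in tr_once and w not in stop_list]
```

Captions would then be generated by aligning the two texts between consecutive anchors.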
20110301950 | SPEECH INPUT DEVICE, SPEECH RECOGNITION SYSTEM AND SPEECH RECOGNITION METHOD | 12-08-2011
A device for speech input includes a speech input unit configured to convert a speech of a user to a speech signal; an angle detection unit configured to detect an angle of the speech input unit; a distance detection unit configured to detect a distance between the speech input unit and the user; and an input switch unit configured to control on and off of the speech input unit based on the angle and the distance.

20110301949 | SPEAKER-CLUSTER DEPENDENT SPEAKER RECOGNITION (SPEAKER-TYPE AUTOMATED SPEECH RECOGNITION) | 12-08-2011
In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR employs the speaker type for transcribing audio from the new user.

20110288859 | LANGUAGE CONTEXT SENSITIVE COMMAND SYSTEM AND METHOD | 11-24-2011
A system and method implements a command system in a speech recognition context in such a way as to enable a user to speak a voice command in a first spoken language to a computer that is operating an application in a second spoken language configuration. The command system identifies the first spoken language the user is speaking, recognizes the voice command, identifies the second spoken language of a target application, and selects the command action in the second spoken language that correlates to the voice command provided in the first spoken language.

20110295601 | SYSTEM AND METHOD FOR AUTOMATIC IDENTIFICATION OF SPEECH CODING SCHEME | 12-01-2011
Methods and systems for extracting speech from packet streams. The methods and systems analyze the encoded speech in a given packet stream and automatically identify the actual speech coding scheme that was used to produce it. These techniques may be used, for example, in interception systems where the identity of the actual speech coding scheme is sometimes unavailable or inaccessible. For instance, the identity of the actual speech coding scheme may be sent in a separate signaling stream that is not intercepted. As another example, the identity of the actual speech coding scheme may be sent in the same packet stream as the encoded speech, but in encrypted form.

20120089392 | SPEECH RECOGNITION USER INTERFACE | 04-12-2012
Speech recognition techniques are disclosed herein. In one embodiment, a novice mode is available such that when the user is unfamiliar with the speech recognition system, a voice user interface (VUI) may be provided to guide them. The VUI may display one or more speech commands that are presently available. The VUI may also provide feedback to train the user. After the user becomes more familiar with speech recognition, the user may enter speech commands without the aid of the novice mode. In this “experienced mode,” the VUI need not be displayed. Therefore, the user interface is not cluttered.

20120130711 | SPEECH DETERMINATION APPARATUS AND SPEECH DETERMINATION METHOD | 05-24-2012
A signal portion per frame is extracted from an input signal, thus generating a per-frame signal. The per-frame signal in the time domain is converted into a per-frame signal in the frequency domain, thereby generating a spectral pattern of spectra. It is determined whether an energy ratio is higher than a threshold level; the energy ratio is the ratio of each spectral energy to the energy of the subband containing that spectrum, where the subbands are obtained by dividing the frequency band with a specific bandwidth. Based on a result of this determination, it is determined whether the per-frame signal is a speech segment. Average energy is derived in the frequency direction for the spectra in the spectral pattern in each subband, and subband energy is derived per subband by averaging the average energy in the time domain.
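The per-spectrum energy-ratio test in the abstract above can be sketched as follows: each spectral energy is compared, as a share of its subband's energy, against a threshold. Fixed-width subbands follow the abstract; how the per-bin flags are combined into a final speech/non-speech decision is left out.

```python
def speech_flags(spectrum, band_width, ratio_threshold):
    # spectrum: energies of the frequency bins of one frame.
    # Split bins into fixed-width subbands, and flag each bin whose share
    # of its subband's energy exceeds the threshold (a dominant peak).
    flags = []
    for i, e in enumerate(spectrum):
        start = (i // band_width) * band_width
        band = spectrum[start:start + band_width]
        flags.append(e / sum(band) > ratio_threshold)
    return flags
```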
20100036660 | Emotion Detection Device and Method for Use in Distributed Systems | 02-11-2010
A prosody analyzer enhances the interpretation of natural language utterances. The analyzer is distributed over a client/server architecture, so that the scope of emotion recognition processing tasks can be allocated on a dynamic basis based on processing resources, channel conditions, client loads, etc. The partially processed prosodic data can be sent separately or combined with other speech data from the client device and streamed to a server for a real-time response. Training of the prosody analyzer with real-world expected responses improves emotion modeling and the real-time identification of potential features such as emphasis, intent, attitude and semantic meaning in the speaker's utterances.

20090076811 | Decision Analysis System | 03-19-2009
A decision analysis system (

20080319741 | SYSTEM AND METHOD FOR IMPROVING ROBUSTNESS OF SPEECH RECOGNITION USING VOCAL TRACT LENGTH NORMALIZATION CODEBOOKS | 12-25-2008
Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.

20120239393 | MULTIPLE AUDIO/VIDEO DATA STREAM SIMULATION | 09-20-2012
A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the data of the first audio and/or video data streams without the emotional attributes. The computing system stores the second audio and/or video data streams.

20080208577 | Multi-stage speech recognition apparatus and method | 08-28-2008
Provided are a multi-stage speech recognition apparatus and method. The multi-stage speech recognition apparatus includes a first speech recognition unit performing initial speech recognition on a feature vector, which is extracted from an input speech signal, and generating a plurality of candidate words; and a second speech recognition unit rescoring the candidate words, which are provided by the first speech recognition unit, using a temporal posterior feature vector extracted from the speech signal.

20090112583 | Language Processing System, Language Processing Method and Program | 04-30-2009
A language processing system, method, and program for obtaining text analysis results automatically and in a timely manner. The system comprises a plurality of text analysis units, each performing a different type of text analysis processing; an analysis order control unit controlling the order of analysis of text by each of the text analysis units; and an additional processing execution unit receiving, from a user, additional processing for the text analysis results of each of the text analysis units and executing it. At the stage at which a text analysis result by any one of the text analysis units is outputted and the additional processing execution unit operates, the analysis order control unit performs control to start text analysis processing for the other text analysis units.

20090287484 | System and Method for Targeted Tuning of a Speech Recognition System | 11-19-2009
A system and method of targeted tuning of a speech recognition system are disclosed. In a particular embodiment, a method includes determining a frequency of occurrence of a particular type of utterance and determining whether the frequency of occurrence exceeds a threshold. The method further includes tuning a speech recognition system to improve recognition of the particular type of utterance when the frequency of occurrence of the particular type of utterance exceeds the threshold.

20120296644 | Hybrid Speech Recognition | 11-22-2012
A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.

20080243499 | SYSTEM AND METHOD OF SPEECH RECOGNITION TRAINING BASED ON CONFIRMED SPEAKER UTTERANCES | 10-02-2008
An interactive speech recognition training process and system is disclosed. A speech recognition process is applied to a received speaker utterance. Utterance data are matched by the system with data in a grammar database, and the speaker is requested to confirm a determined match. If the system determines from the speaker's response that the match is not confirmed, a negative score is assigned to the utterance data. If the match is determined by the system to be confirmed, a positive score is assigned to the utterance data. Scores for a plurality of such speaker utterances are accumulated in a log file, and the accumulated scores are used to adjust acoustic models for the grammar database.

20080243498 | METHOD AND SYSTEM FOR PROVIDING INTERACTIVE SPEECH RECOGNITION USING SPEAKER DATA | 10-02-2008
An interactive speech recognition process and system is disclosed. A user is prompted for selection of one of a number of designated phrases represented in a grammar database. Speech recognition processing is applied to an uttered response from the user to match data in the grammar database, thereby identifying the selected phrase. The user is requested to confirm a determined match. If the match is not confirmed, the data corresponding to the matched phrase is removed from the grammar database and the user is re-prompted to select from the remaining phrases.

20080249770 | Method and apparatus for searching for music based on speech recognition | 10-09-2008
Provided is a method and apparatus for searching for music based on speech recognition. By calculating search scores with respect to a speech input using an acoustic model, calculating preferences in music using a user preference model, reflecting the preferences in the search scores, and extracting a music list according to the search scores in which the preferences are reflected, a personalized search result using speech recognition can be achieved, and an error or imperfection of a speech recognition result can be compensated for.

20080281590 | Method of Deriving a Set of Features for an Audio Input Signal | 11-13-2008
The invention describes a method of deriving a set of features (S) of an audio input signal (M), which method comprises identifying a number of first-order features (f

20080215318 | EVENT RECOGNITION | 09-04-2008
Recognition of events can be performed by accessing an audio signal having static and dynamic features. A value for the audio signal can be calculated by utilizing different weights for the static and dynamic features, such that a frame of the audio signal can be associated with a particular event. A filter can also be used to aid in determining the event for the frame.

20080275699 | Systems and methods of performing speech recognition using global positioning (GPS) information | 11-06-2008
Embodiments of the present invention improve content selection systems and methods using speech recognition. In one embodiment, the present invention includes a speech recognition method comprising receiving location parameters from a global positioning system, retrieving location data using the location parameters, and configuring one or more recognition sets of a speech recognizer using the location data.

20080235012 | System and method of identifying contact information | 09-25-2008
A system and method for identifying contact information is provided. A system to identify contact information may include an input to receive a data stream. The data stream may include audio content, video content or both. The system may also include an analysis module to detect contact information within the data stream. The system may also include a memory to store a record of the contact information.

20110208519 | REAL-TIME DATA PATTERN ANALYSIS SYSTEM AND METHOD OF OPERATION THEREOF | 08-25-2011
A method of operation of a real-time data-pattern analysis system includes: providing a memory module, a computational unit, and an integrated data transfer module arranged within an integrated circuit die; storing a data pattern within the memory module; transferring the data pattern from the memory module to the computational unit using the integrated data transfer module; and comparing processed data to the data pattern using the computational unit.

20090006087 | SYNCHRONIZATION OF AN INPUT TEXT OF A SPEECH WITH A RECORDING OF THE SPEECH | 01-01-2009
A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data, including a ratio between the respective pronunciation times of the words of the received text in the generated synthetic speech, is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted to a recording medium and/or displayed on a display device.
20080306735SYSTEMS AND METHODS FOR INDICATING PRESENCE OF DATA - Included are systems and methods for indicating presence of data. At least one embodiment of a method includes receiving communications data associated with a communication session and determining at least one point of audio silence in the communications session. Some embodiments include creating tagging data configured to indicate the at least one point of audio silence in the communications session.12-11-2008
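Silence-point detection as described above can be approximated by scanning fixed-size frames for low mean amplitude. The frame size and threshold below are illustrative choices, not values from the patent.

```python
# Minimal sketch of locating points of audio silence: tag the start index of
# each frame whose mean absolute amplitude falls below a threshold. These
# indices could serve as the "tagging data" the abstract mentions.
def silence_points(samples, frame=4, threshold=0.05):
    points = []
    for start in range(0, len(samples) - frame + 1, frame):
        window = samples[start:start + frame]
        if sum(abs(s) for s in window) / frame < threshold:
            points.append(start)
    return points

# Loud, quiet, loud: only the middle frame is tagged as silence.
audio = [0.4, -0.3, 0.5, -0.2, 0.01, 0.0, -0.02, 0.01, 0.6, -0.5, 0.4, -0.3]
tags = silence_points(audio)
```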
20110224978INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM - An information processing device includes an audio-based speech recognition processing unit which is input with audio information as observation information of a real space, executes an audio-based speech recognition process, thereby generating word information that is determined to have a high probability of being spoken, an image-based speech recognition processing unit which is input with image information as observation information of the real space, analyzes mouth movements of each user included in the input image, thereby generating mouth movement information, an audio-image-combined speech recognition score calculating unit which is input with the word information and the mouth movement information and executes a score setting process in which a mouth movement close to the word information is set with a high score, and an information integration processing unit which is input with the score and executes a speaker specification process.09-15-2011
20090024388METHOD AND APPARATUS FOR SEARCHING A MUSIC DATABASE - A method for a user to buy a song from a remote music source, the method comprising the steps of 01-22-2009
20120078623Method and Apparatus for Communication Between Humans and Devices - This invention relates to methods and apparatus for improving communications between humans and devices. The invention provides a method of modulating operation of a device, comprising: providing an attentive user interface for obtaining information about an attentive state of a user; and modulating operation of a device on the basis of the obtained information, wherein the operation that is modulated is initiated by the device. Preferably, the information about the user's attentive state is eye contact of the user with the device that is sensed by the attentive user interface.03-29-2012
20120078621SPARSE REPRESENTATION FEATURES FOR SPEECH RECOGNITION - Techniques are disclosed for generating and using sparse representation features to improve speech recognition performance. In particular, principles of the invention provide sparse representation exemplar-based recognition techniques. For example, a method comprises the following steps. A test vector and a training data set associated with a speech recognition system are obtained. A subset of the training data set is selected. The test vector is mapped with the selected subset of the training data set as a linear combination that is weighted by a sparseness constraint such that a new test feature set is formed wherein the training data set is moved more closely to the test vector subject to the sparseness constraint. An acoustic model is trained on the new test feature set.03-29-2012
20090210223APPARATUS AND METHOD FOR SOUND RECOGNITION IN PORTABLE DEVICE - Provided are an apparatus and a method capable of recognizing a sound through a reduced burden of computations and a noise-tolerant technique. The sound recognition apparatus in a portable device includes a memory unit that stores at least one base sound and a sound input unit that receives a sound input. The sound recognition apparatus also includes a control unit that receives the sound input from the sound input unit, extracts peak values of the sound input, calculates statistical data by using the peak values, and determines whether the sound input is equal to a base sound by using the statistical data.08-20-2009
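The peak-statistics comparison this abstract outlines can be sketched simply: extract local peaks, summarize them, and compare the summaries against a stored base sound. The statistics chosen (mean and variance) and the tolerance are assumptions for illustration.

```python
# Illustrative sketch: summarize a signal's local peak values with simple
# statistics, then compare against a stored base sound's statistics.
def peak_stats(samples):
    peaks = [samples[i] for i in range(1, len(samples) - 1)
             if samples[i] > samples[i - 1] and samples[i] > samples[i + 1]]
    mean = sum(peaks) / len(peaks)
    var = sum((p - mean) ** 2 for p in peaks) / len(peaks)
    return mean, var

def matches_base(sound, base, tolerance=0.1):
    """True when both peak statistics fall within the tolerance."""
    m1, v1 = peak_stats(sound)
    m2, v2 = peak_stats(base)
    return abs(m1 - m2) <= tolerance and abs(v1 - v2) <= tolerance

m, v = peak_stats([0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0])  # peaks: 1, 2, 1
```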
20120035922METHOD AND APPARATUS FOR CONTROLLING WORD-SEPARATION DURING AUDIO PLAYOUT - A word-separation control capability is provided herein. An apparatus having a word-separation control capability includes a processor configured for controlling a length of separation between adjacent words of audio during playout of the audio. The processor is configured for analyzing a locator analysis region of buffered audio for identifying boundaries between adjacent words of the buffered audio, and, for each identified boundary between adjacent words, associating a boundary marker with the identified boundary. The locator analysis region of the buffered audio may be analyzed using syntactic and/or non-syntactic speech recognition capabilities. The boundary markers may all have the same thickness, or the thickness of the boundary markers may vary based on the length of separation between the adjacent words of the respective boundaries. The boundary markers are associated with the buffered audio for use in controlling the word-separation during the playout of the audio.02-09-2012
20100179810Method for recognizing and distributing music - A customer for music distributed over the internet may select a composition from a menu of written identifiers (such as the song title and singer or group) and then confirm that the composition is indeed the one desired by listening to a corrupted version of the composition. If the customer has forgotten the song title or the singer or other words that provide the identifier, he or she may hum or otherwise vocalize a few bars of the desired composition, or pick the desired composition out on a simulated keyboard. A music-recognition system then locates candidates for the selected composition and displays identifiers for these candidates to the customer.07-15-2010
20100191529Systems And Methods For Managing Multiple Grammars in a Speech Recognition System - Systems and methods are described for a speech system that manages multiple grammars from one or more speech-enabled applications. The speech system includes a speech server that supports different grammars and different types of grammars by exposing several methods to the speech-enabled applications. The speech server supports static grammars that do not change and dynamic grammars that may change after a commit. The speech server provides persistence by supporting persistent grammars that enable a user to issue a command to an application even when the application is not loaded. In such a circumstance, the application is automatically launched and the command is processed. The speech server may enable or disable a grammar in order to limit confusion between grammars. Global and yielding grammars are also supported by the speech server. Global grammars are always active (e.g., “call 9-1-1”) while yielding grammars may be deactivated when an interaction whose grammar requires priority is active.07-29-2010
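The global-versus-yielding grammar behavior described above can be modeled with a small registry. This is a toy model with invented names; the patent's speech server exposes far more (persistence, dynamic grammars, application launching) than is shown here.

```python
# Toy sketch of grammar management: "global" grammars are always active,
# while "yielding" grammars deactivate when a priority interaction is active.
class GrammarManager:
    def __init__(self):
        self.grammars = {}           # name -> kind ("global" or "yielding")
        self.priority_active = False # set while a priority interaction runs

    def register(self, name, kind):
        self.grammars[name] = kind

    def active_grammars(self):
        return [name for name, kind in self.grammars.items()
                if kind == "global" or not self.priority_active]

mgr = GrammarManager()
mgr.register("emergency", "global")      # e.g. "call 9-1-1", never deactivated
mgr.register("media_player", "yielding") # yields to priority interactions
mgr.priority_active = True
```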
20080215320Apparatus And Method To Reduce Recognition Errors Through Context Relations Among Dialogue Turns - Disclosed is an apparatus and method to reduce recognition errors through context relations among multiple dialogue turns. The apparatus includes a rule set storage unit having a rule set containing one or more rules, an evolutionary rule generation module connected to the rule set storage unit, and a rule trigger unit connected to the rule set storage unit. The rule set uses dialogue turn as a unit for the information described by each rule. The method analyzes a dialogue history through an evolutionary massive parallelism approach to get a rule set describing the context relation among dialogue turns. Based on the rule set and recognition result of an ASR system, it reevaluates the recognition result, and measures the confidence measure of the reevaluated recognition result. After each successful dialogue turn, the rule set is dynamically adapted.09-04-2008
20100191528SPEECH SIGNAL PROCESSING APPARATUS - A speech signal processing apparatus comprising: a control signal output unit configured to receive as an input signal either one of a first speech signal corresponding to a sound uttered by a user and a second speech signal corresponding to a sound output from an eardrum of the user when the user utters a sound, and output a control signal corresponding to a noise level of the input signal; and a speech signal output unit configured to output either one of the first speech signal and the second speech signal according to the control signal.07-29-2010
20100145693METHOD OF DECODING NONVERBAL CUES IN CROSS-CULTURAL INTERACTIONS AND LANGUAGE IMPAIRMENT - A method for extracting verbal cues is presented which enhances a speech signal to increase the saliency and recognition of verbal cues including emotive verbal cues. In a further embodiment of the method, the method works in conjunction with a computer that displays a face which gestures and articulates non-verbal cues in accord with speech patterns that are also modified to enhance their verbal cues. The methods work to provide a means for allowing non-fluent speakers to better understand and learn foreign languages.06-10-2010
20100217588APPARATUS AND METHOD FOR RECOGNIZING A CONTEXT OF AN OBJECT - Moving information of an object is input, and first sound information around the object is input. A motion status of the object is recognized based on the moving information. Second sound information is selectively extracted from the first sound information, based on the motion status. A first feature quantity is extracted from the second sound information. A plurality of models is stored in a memory. Each model has a second feature quantity and a corresponding specified context. The second feature quantity is previously extracted by the second extraction unit before the first feature quantity is extracted. A present context of the object is decided based on the specified context corresponding to the second feature quantity most similar to the first feature quantity. The present context of the object is output.08-26-2010
20100235168TERMINAL AND METHOD FOR EFFICIENT USE AND IDENTIFICATION OF PERIPHERALS HAVING AUDIO LINES - A communication system comprises a terminal configured for being able to communicate with a computer and to operate according to at least one operational parameter. A peripheral device for use with the terminal has a characterizing parameter associated therewith. The terminal is operable for reading the characterizing parameter from the peripheral device when the device is coupled to the terminal. The terminal is further operable for configuring itself to operate according to an operational parameter associated with the characterizing parameter of the peripheral device.09-16-2010
20090043576SYSTEM AND METHOD FOR TUNING AND TESTING IN A SPEECH RECOGNITION SYSTEM - Systems and methods for improving the performance of a speech recognition system. In some embodiments a tuner module and/or a tester module are configured to cooperate with a speech recognition system. The tester and tuner modules can be configured to cooperate with each other. In one embodiment, the tuner module may include a module for playing back a selected portion of a digital data audio file, a module for creating and/or editing a transcript of the selected portion, and/or a module for displaying information associated with a decoding of the selected portion, the decoding generated by a speech recognition engine. In other embodiments, the tester module can include an editor for creating and/or modifying a grammar, a module for receiving a selected portion of a digital audio file and its corresponding transcript, and a scoring module for producing scoring statistics of the decoding based at least in part on the transcript.02-12-2009
20100235167SPEECH RECOGNITION LEARNING SYSTEM AND METHOD - One or more embodiments include a speech recognition learning system for improved speech recognition. The learning system may include a speech optimizing system. The optimizing system may receive a first stimulus data package including spoken utterances having at least one phoneme, and contextual information. A number of result data packages may be retrieved which include stored spoken utterances and contextual information. A determination may be made as to whether the first stimulus data package requires improvement. A second stimulus data package may be generated based on the determination. A number of speech recognition implementation rules for implementing the second stimulus data package may be received. The rules may be associated with the contextual information. A determination may be made as to whether the second stimulus data package requires further improvement. Based on the determination, one or more additional speech recognition implementation rules for improved speech recognition may be generated.09-16-2010
20120191449SPEECH RECOGNITION USING DOCK CONTEXT - Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for performing speech recognition using dock context. In one aspect, a method includes accessing audio data that includes encoded speech. Information that indicates a docking context of a client device is accessed, the docking context being associated with the audio data. A plurality of language models is identified. At least one of the plurality of language models is selected based on the docking context. Speech recognition is performed on the audio data using the selected language model to identify a transcription for a portion of the audio data.07-26-2012
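The model-selection step in this abstract reduces to choosing a language model keyed by docking context. The sketch below illustrates that lookup with a fallback default; the context names and model names are invented for illustration.

```python
# Hypothetical sketch of dock-context language model selection: map each
# docking context to a language model, falling back to a general model.
LANGUAGE_MODELS = {
    "car_dock": "navigation_lm",    # driving: expect navigation queries
    "desk_dock": "dictation_lm",    # at a desk: expect dictation
    "media_dock": "music_lm",       # media dock: expect playback commands
}

def select_language_model(docking_context, default="general_lm"):
    return LANGUAGE_MODELS.get(docking_context, default)
```

Speech recognition would then be performed on the audio data using the selected model.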
20120173232ACOUSTIC PROCESSING APPARATUS AND METHOD - An acoustic processing apparatus is provided. The acoustic processing apparatus including a first extracting unit configured to extract a first acoustic model that corresponds with a first position among positions set in a speech recognition target area, a second extracting unit configured to extract at least one second acoustic model that corresponds with, respectively, at least one second position in proximity to the first position, and an acoustic model generating unit configured to generate a third acoustic model based on the first acoustic model, the second acoustic model, or a combination thereof.07-05-2012
20090204398Measurement of Spoken Language Training, Learning & Testing - The fluency of a spoken utterance or passage is measured and presented to the speaker and to others. In one embodiment, a method is described that includes recording a spoken utterance, evaluating the spoken utterance for accuracy, evaluating the spoken utterance for duration, and assigning a score to the spoken utterance based on the accuracy and the duration.08-13-2009
20130138435CHARACTER-BASED AUTOMATED SHOT SUMMARIZATION - Methods, devices, systems and tools are presented that allow the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms. High-content media files are shortened in a manner that preserves important details, by splitting the files into segments, rating the segments, and reassembling preferred segments into a final abridged piece. Summarization of media can be customized by user selection of criteria, and opens new possibilities for delivering entertainment, news, and information in the form of dense, information-rich content that can be viewed by means of broadcast or cable distribution, “on-demand” distribution, internet and cell phone digital video streaming, or can be downloaded onto an iPod™ and other portable video playback devices.05-30-2013
20090164213Digital Media Recognition Apparatus and Methods - One of the embodiments of the invention includes a method of identifying illegal uses of copyright material. The method preferably includes the steps of: (a) providing a primary digital media object, (b) associating an auxiliary construct with the object, (c) transforming the construct using at least one of the attributes of the object to generate a unique key representative of the primary object, (d) receiving a plurality of secondary digital media objects, (e) performing steps (b) and (c) on the secondary objects to generate unique keys representative of the secondary objects, (f) comparing the keys of the secondary objects with the key of the primary object to identify if any of the secondary objects are substantially similar to the primary object.06-25-2009
20100332224METHOD AND APPARATUS FOR CONVERTING TEXT TO AUDIO AND TACTILE OUTPUT - In accordance with an example embodiment of the present invention, an apparatus comprises a controller configured to process punctuated text data, and to identify punctuation in said punctuated text data; and an output unit configured to generate audio output corresponding to said punctuated text data, and to generate tactile output corresponding to said identified punctuation.12-30-2010
20110010170USE OF MULTIPLE SPEECH RECOGNITION SOFTWARE INSTANCES - A wireless communication device is disclosed that accepts recorded audio data from an end-user. The audio data can be in the form of a command requesting user action. Likewise, the audio data can be converted into a text file. The audio data is reduced to a digital file in a format that is supported by the device hardware, such as a .wav, .mp01-13-2011
20110029306AUDIO SIGNAL DISCRIMINATING DEVICE AND METHOD - An audio signal discriminating device includes a plurality of audio discriminators for discriminating an input audio signal as a speech signal or a non-speech signal by using at least one feature parameter, and determines whether to drive the next-connected audio discriminator according to the preceding discriminator's discrimination result.02-03-2011
20110035215METHOD, DEVICE AND SYSTEM FOR SPEECH RECOGNITION - Disclosed is a method and apparatus for signal processing and signal pattern recognition. According to some embodiments of the present invention, events in the signal to be processed/recognized may be used to pace or clock the operation of one or more processing elements. The detected events may be based on signal energy level measurements. The processing/recognition elements may be neuron models. The signal to be processed/recognized may be a speech signal.02-10-2011
20110087490ADJUSTING RECORDER TIMING - A portion of audio content of a multimedia program, such as a television program, is captured from a network. An audio fingerprint is generated based on the portion of audio content, and the audio fingerprint is matched to one of multiple theme song fingerprints stored in a database. An expected theme song time offset associated with the matched theme song fingerprint is retrieved from the database. It is determined whether the program is running on-schedule, based on the time the portion of audio content occurred, a scheduled start time of the program, and/or the expected theme song time offset. If it is determined that the program is running off-schedule, an adjusted start time and/or an adjusted end time of the program are calculated. The program is recorded by a recorder based on the adjusted start time and/or the adjusted end time.04-14-2011
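The timing adjustment in this abstract amounts to shifting the recording window by the difference between when the theme song was actually heard and when it was expected. A minimal sketch, with all times and names illustrative:

```python
# Sketch of the recorder-timing adjustment: if the theme song is heard later
# (or earlier) than its expected offset after the scheduled start, shift the
# recording window by that drift.
def adjusted_times(scheduled_start, scheduled_end, theme_heard_at, expected_offset):
    """Times in minutes since midnight; expected_offset is minutes into the show."""
    drift = theme_heard_at - (scheduled_start + expected_offset)
    return scheduled_start + drift, scheduled_end + drift

# Show scheduled 20:00-20:30 (1200-1230), theme expected 1 minute in,
# but actually heard at 20:04 -> the program is running 3 minutes late.
start, end = adjusted_times(1200, 1230, 1204, 1)
```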
20090222262Systems And Methods For Blind Source Signal Separation - Signal separation techniques based on frequency dependency are described. In one implementation, a blind signal separation process is provided that avoids the permutation problem of previous signal separation processes. In the process, two or more signal sources are provided, with each signal source having recognized frequency dependencies. The process uses these inter-frequency dependencies to more robustly separate the source signals. The process receives a set of mixed signal input signals, and samples each input signal using a rolling window process. The sampled data is transformed into the frequency domain, which provides channel inputs to the inter-frequency dependent separation process. Since frequency dependencies have been defined for each source, the process is able to use the frequency dependency to more accurately separate the signals. The process can use a learning algorithm that preserves frequency dependencies within each source signal, and can remove dependencies between or among the signal sources.09-03-2009
20100063813SYSTEM AND METHOD FOR MULTIDIMENSIONAL GESTURE ANALYSIS - Hand gestures are translated by first detecting the hand gestures with an electronic sensor and converting the detected gestures into respective electrical transfer signals in a frequency band corresponding to that of speech. These transfer signals are inputted in the audible-sound frequency band into a speech-recognition system where they are analyzed.03-11-2010
20100057450Hybrid Speech Recognition - A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.03-04-2010
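One plausible arbitration policy (the abstract does not specify the rule) is to prefer whichever recognizer reports higher confidence, falling back to the client result when the server is unreachable. A hedged sketch:

```python
# Hypothetical confidence-based arbitration between client-side and
# server-side recognition results for the same speech.
def arbitrate(client_result, server_result):
    """Each result is a (transcript, confidence) pair; server_result may be
    None when the server is unavailable."""
    if server_result is None:
        return client_result[0]
    return server_result[0] if server_result[1] > client_result[1] else client_result[0]

best = arbitrate(("call home", 0.62), ("call Holmes", 0.81))
```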
20100057451Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition.03-04-2010
20100070273SPEECH SYNTHESIS AND VOICE RECOGNITION IN METROLOGIC EQUIPMENT - An electronic test equipment apparatus is provided. A metrologic device is adapted for creating stimulus signals and capturing responses from electronic devices under test (DUTs). An auditory device is in communication with the metrologic device. The auditory device is adapted for converting an output of the metrologic device to an audio signal to be heard by a user.03-18-2010
20110082694REAL-TIME DATA PATTERN ANALYSIS SYSTEM AND METHOD OF OPERATION THEREOF - A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units.04-07-2011
20110071823SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION METHOD, AND STORAGE MEDIUM STORING PROGRAM FOR SPEECH RECOGNITION - A purpose is to suppress recognition process delay generated due to load in signal processing. Included is a speech input means 03-24-2011
20120303365Audio Signal De-Identification - Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio.11-29-2012
20110060586VOICE APPLICATION NETWORK PLATFORM - A distributed voice applications system includes a voice applications rendering agent and at least one voice applications agent that is configured to provide voice applications to an individual user. A management system may control and direct the voice applications rendering agent to create voice applications that are personalized for individual users based on user characteristics, information about the environment in which the voice applications will be performed, prior user interactions and other information. The voice applications agent and components of customized voice applications may be resident on a local user device which includes a voice browser and speech recognition capabilities. The local device, voice applications rendering agent and management system may be interconnected via a communications network.03-10-2011
20110029307SYSTEM AND METHOD FOR MOBILE AUTOMATIC SPEECH RECOGNITION - A system and method of updating automatic speech recognition parameters on a mobile device are disclosed. The method comprises storing user account-specific adaptation data associated with ASR on a computing device associated with a wireless network, generating new ASR adaptation parameters based on transmitted information from the mobile device when a communication channel between the computing device and the mobile device becomes available and transmitting the new ASR adaptation data to the mobile device when a communication channel between the computing device and the mobile device becomes available. The new ASR adaptation data on the mobile device more accurately recognizes user utterances.02-03-2011
20120150536MODEL RESTRUCTURING FOR CLIENT AND SERVER BASED AUTOMATIC SPEECH RECOGNITION - Access is obtained to a large reference acoustic model for automatic speech recognition. The large reference acoustic model has L states modeled by L mixture models, and the large reference acoustic model has N components. A desired number of components N06-14-2012
20080215319Query by humming for ringtone search and download - Described is a technology by which a user hums, sings or otherwise plays a user-provided rendition of a ringtone (or ringback tone) through a mobile telephone to a ringtone search service (e.g., a WAP, interactive voice response or SMS-based search platform). The service matches features of the user's rendition against features of actual ringtones to determine one or more matching candidate ringtones for downloading. Features may include pitch contours (up or down), pitch intervals and durations of notes. Matching candidates may be ranked based on the determined similarity, possibly in conjunction with weighting criterion such as the popularity of the ringtone and/or the importance of the matched part. The candidate set may be augmented with other ringtones independent of the matching, such as the most popular ones downloaded by other users, ringtones from similar artists, and so forth.09-04-2008
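The pitch-contour feature mentioned above (up or down movements between notes) can be sketched as a Parsons-style code, with candidates ranked by how much of the query's contour they match. The prefix-matching rule is an illustrative simplification; the patent also weighs durations, intervals, and popularity.

```python
# Sketch of contour-based matching: reduce note sequences (MIDI numbers) to
# Up/Down/Repeat symbols and rank candidates by shared leading symbols.
def contour(notes):
    return "".join("U" if b > a else "D" if b < a else "R"
                   for a, b in zip(notes, notes[1:]))

def rank_candidates(query_notes, candidates):
    q = contour(query_notes)
    def shared_prefix(name):
        c = contour(candidates[name])
        n = 0
        while n < min(len(q), len(c)) and q[n] == c[n]:
            n += 1
        return n
    return sorted(candidates, key=shared_prefix, reverse=True)

ringtones = {"tune_a": [60, 62, 64, 62], "tune_b": [60, 59, 57, 59]}
# A hummed query with an off-pitch note still matches tune_a's U-U-D contour.
ranked = rank_candidates([60, 62, 63, 62], ringtones)
```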
20080208576Digital Video Reproducing Apparatus - Character information recognition means (08-28-2008
20110257969MAIL RECEIPT APPARATUS AND METHOD BASED ON VOICE RECOGNITION - A mail receipt method based on voice recognition includes receiving input voice data required for a mail receipt; and recognizing information about the mail receipt from the received input voice data. Further, the mail receipt method based on the voice recognition includes storing the recognized information about the mail receipt to complete the mail receipt.10-20-2011
20110131040MULTI-MODE SPEECH RECOGNITION - A method and an in-vehicle system having a speech recognition component are provided for improving speech recognition performance. The speech recognition component may have multiple vocabulary dictionaries, each of which may include phonetics associated with commands. When the in-vehicle system receives speech input, the speech recognition component may determine whether the received speech input includes a speech access command. If the received speech input is determined to include a speech access command, then a dictionary changing component may transition a currently-used dictionary of the speech recognition component to a vocabulary dictionary associated with the determined speech access command. Otherwise, the dictionary changing component may transition the currently-used dictionary to a first vocabulary dictionary. A command included in the received speech input may then be recognized by the speech recognition component using the transitioned currently-used dictionary.06-02-2011
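The dictionary-switching flow above can be modeled as: a recognized access command transitions the currently-used dictionary; any other input reverts to the first dictionary. This toy model uses invented dictionary and command names.

```python
# Toy sketch of multi-mode dictionary switching. An utterance matching a
# speech access command (here, a dictionary name) switches dictionaries;
# any other utterance reverts to the first vocabulary dictionary.
DICTIONARIES = {
    "navigation": {"go home", "zoom in"},
    "media": {"play", "pause"},
}

class MultiModeRecognizer:
    def __init__(self):
        self.current = "navigation"  # the first vocabulary dictionary

    def handle(self, utterance):
        if utterance in DICTIONARIES:   # treated as a speech access command
            self.current = utterance
        else:
            self.current = "navigation"
        return self.current

rec = MultiModeRecognizer()
```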
20080255836METHOD AND SYSTEM FOR A RECOGNITION SYSTEM HAVING A VERIFICATION RECOGNITION SYSTEM - A method and system for performing computer implemented recognition is disclosed. In one method embodiment, the present invention first accesses user input stored in a memory of a mobile device. On the mobile device, the present invention performs a coarse recognition process on the user input to generate a coarse result. The coarse process may operate in real-time. The embodiment then displays a portion of the coarse result on a display screen of the mobile device. The embodiment further performs a detailed recognition process on the user input to generate a detailed result. The detailed process has more recognition patterns and computing resources available to it. The present embodiment performs a comparison of the detailed result and the coarse result. The present embodiment displays a portion of the comparison on the display screen.10-16-2008
20080255835USER DIRECTED ADAPTATION OF SPOKEN LANGUAGE GRAMMAR - A method and system for interacting with a speech recognition system. A lattice of candidate words is displayed. The lattice of candidate words may include the output of a speech recognizer. Candidate words representing temporally serial utterances may be directly joined in the lattice. A path through the lattice represents a selection of one or more candidate words interpreting one or more corresponding utterances. An interface allows a user to select a path in the lattice. A selection of the path in the lattice may be received and the selection may be stored. The selection may be provided as positive feedback to the speech recognizer.10-16-2008
20110054890APPARATUS AND METHOD FOR AUDIO MAPPING - A mobile phone, and corresponding method, arranged to detect sounds of different types and to indicate to a user the direction from which those sounds are coming. The mobile phone includes a microphone for recording sound and a display for providing feedback to the user. The phone also includes a sound mapping program which is arranged to interpret the sound recorded by the microphone and to provide an audio map of detected sounds. This is presented to the user on the display.03-03-2011
20110022384WIND TURBINE CONTROL SYSTEM AND METHOD FOR INPUTTING COMMANDS TO A WIND TURBINE CONTROLLER - A method and a control system are provided for inputting commands to a wind turbine controller during a service or maintenance procedure. A command orally input by a user is transformed into an electrical signal representing the orally input command. The electrical signal is transformed into an input command signal which is further transformed into a reproduction signal. The user is provided the reproduction signal along with a confirmation request in a form recognized by the user, such as a visual or speech representation. After the user confirms the request, a signal based on the input command is sent to the wind turbine controller.01-27-2011
20110125496SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND PROGRAM - A speech recognition device includes a sound source separation unit configured to separate a mixed signal of outputs of a plurality of sound sources into signals corresponding to individual sound sources and generate separation signals of a plurality of channels; a speech recognition unit configured to input the separation signals of the plurality of channels, the separation signals being generated by the sound source separation unit, perform a speech recognition process, generate a speech recognition result corresponding to each channel, and generate additional information serving as evaluation information on the speech recognition result corresponding to each channel; and a channel selection unit configured to input the speech recognition result and the additional information, calculate a score of the speech recognition result corresponding to each channel by applying the additional information, and select and output a speech recognition result having a high score.05-26-2011
20110137649 METHOD FOR DYNAMIC SUPPRESSION OF SURROUNDING ACOUSTIC NOISE WHEN LISTENING TO ELECTRICAL INPUTS - A listening instrument includes a) a microphone unit for picking up an input sound from the current acoustic environment of the user and converting it to an electric microphone signal; b) a microphone gain unit for applying a specific microphone gain to the microphone signal and providing a modified microphone signal; c) a direct electric input signal representing an audio signal; d) a direct gain unit for applying a specific direct gain to the direct electric input signal and providing a modified direct electric input signal; e) a detector unit for classifying the current acoustic environment and providing one or more classification parameters; f) a control unit for controlling the specific microphone gain applied to the electric microphone signal and/or the specific direct gain applied to the direct electric input signal based on the one or more classification parameters.06-09-2011
20100312555LOCAL AND REMOTE AGGREGATION OF FEEDBACK DATA FOR SPEECH RECOGNITION - A local feedback mechanism for customizing training models based on user data and directed user feedback is provided in speech recognition applications. The feedback data is filtered at different levels to address privacy concerns for local storage and for submittal to a system developer for enhancement of generic training models.12-09-2010
20110257970VOICED PROGRAMMING SYSTEM AND METHOD - A voiced programming system and methods are provided herein.10-20-2011
20110137648SYSTEM AND METHOD FOR IMPROVED AUTOMATIC SPEECH RECOGNITION PERFORMANCE - Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker using the main speech recognizer and the supplemental speech recognizer in parallel, and combines results from the main and supplemental speech recognizers. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. A scheduling algorithm can tailor a particular combination of speech recognition resources and release the supplemental speech recognizer based on increased demand.06-09-2011
20110307251Sound Source Separation Using Spatial Filtering and Regularization Phases - Described is a multiple-phase process/system that combines spatial filtering with regularization to separate sound from different sources, such as the speech of two different speakers. In a first phase, frequency domain signals corresponding to the sensed sounds are processed into separated spatially filtered signals, including by inputting the signals into a plurality of beamformers (which may include nullformers) followed by nonlinear spatial filters. In a regularization phase, the separated spatially filtered signals are input into an independent component analysis mechanism that is configured with multi-tap filters, followed by secondary nonlinear spatial filters. Separated audio signals are then provided via an inverse transform.12-15-2011
20110307250Modular Speech Recognition Architecture - A speech recognition system is provided. The speech recognition system includes a speech recognition module; a plurality of domain specific dialog manager modules that communicate with the speech recognition module to perform speech recognition; and a speech interface module that communicates with the plurality of domain specific dialog manager modules to selectively enable the speech recognition.12-15-2011
20090287483METHOD AND SYSTEM FOR IMPROVED SPEECH RECOGNITION - A method for speech recognition includes: prompting a user with a first query to input speech into a speech recognition engine; determining if the inputted speech is correctly recognized; wherein in the event the inputted speech is correctly recognized proceeding to a new task; wherein in the event the inputted speech is not correctly recognized, prompting the user repeatedly with the first query to input speech into the speech recognition engine, and determining if the inputted speech is correctly recognized until a predefined limit on repetitions has been met; wherein in the event the predefined limit has been met without correctly recognizing the inputted user speech, prompting speech input from the user with a secondary query for redundant information; and cross-referencing the user's n-best result from the first query with the n-best result from the second query to obtain a top hypothesis.11-19-2009
20110166855Systems and Methods for Hands-free Voice Control and Voice Search - In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step.07-07-2011
20120065968SPEECH RECOGNITION METHOD - In a speech recognition method, a number of audio signals are obtained from a voice input of a number of utterances of at least one speaker into a pickup system. The audio signals are examined using a speech recognition algorithm and a recognition result is obtained for each audio signal. For a reliable recognition of keywords in a conversation, it is proposed that a recognition result for at least one other audio signal is included in the examination of one of the audio signals by the speech recognition algorithm.03-15-2012
20120016670METHODS AND APPARATUSES FOR IDENTIFYING AUDIBLE SAMPLES FOR USE IN A SPEECH RECOGNITION CAPABILITY OF A MOBILE DEVICE - Techniques are provided which may be implemented using various methods and/or apparatuses in a mobile device to allow for speech recognition based, at least in part, on context information associated with at least a portion of at least one navigational region, e.g., associated with a location of the mobile device. A speech recognition capability may, for example, be provided with a set of audible samples based, at least in part, on the context information. Such speech recognition capability may be provided by the mobile device and/or by one or more other devices coupled to the mobile device.01-19-2012
20090281804PROCESSING UNIT, SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION METHOD, STORAGE MEDIUM STORING SPEECH RECOGNITION PROGRAM - A processing unit is provided which executes speech recognition on speech signals captured by a microphone for capturing sounds uttered in an environment. The processing unit has: an initial reflection component extraction portion that extracts initial reflection components by removing diffuse reverberation components from a reverberation pattern of an impulse response generated in the environment; and an acoustic model learning portion that learns an acoustic model for the speech recognition by reflecting the initial reflection components to speech data for learning.11-12-2009
20120022862SPEECH RECOGNITION CIRCUIT AND METHOD - A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying the scores by adding the score updates to the scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to the scores.01-26-2012
20120072211USING CODEC PARAMETERS FOR ENDPOINT DETECTION IN SPEECH RECOGNITION - Systems, methods and apparatus for determining an estimated endpoint of human speech in a sound wave received by a mobile device having a speech encoder for encoding the sound wave to produce an encoded representation of the sound wave. The estimated endpoint may be determined by analyzing information available from the speech encoder, without analyzing the sound wave directly and without producing a decoded representation of the sound wave. The encoded representation of the sound wave may be transmitted to a remote server for speech recognition processing, along with an indication of the estimated endpoint.03-22-2012
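The endpoint-detection idea in the abstract above can be sketched briefly. This is a minimal illustration, not the patent's method: per-frame energies stand in for whatever parameters a real speech encoder exposes, and the threshold and hangover values are invented for the example.

```python
def estimate_endpoint(frame_energies, silence_threshold=0.01, hangover=5):
    """Return the index of the last speech frame, or None if no speech.

    A frame counts as speech if its energy exceeds silence_threshold.
    The endpoint is declared once `hangover` consecutive silence frames
    follow a speech frame, so brief pauses do not end the utterance.
    """
    endpoint = None
    silence_run = 0
    for i, energy in enumerate(frame_energies):
        if energy > silence_threshold:
            endpoint = i
            silence_run = 0
        else:
            silence_run += 1
            if endpoint is not None and silence_run >= hangover:
                break
    return endpoint

# Example: speech frames with one brief pause, then a long silence tail.
energies = [0.5, 0.6, 0.4, 0.005, 0.45, 0.002, 0.001, 0.003, 0.002, 0.001]
print(estimate_endpoint(energies))  # frame 4 is the last speech frame
```

Because the estimate uses only encoder-side parameters, the device can mark the endpoint and ship the still-encoded audio to the server without ever decoding it locally, which is the efficiency the abstract claims.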
20120316870COMMUNICATION DEVICE WITH SPEECH RECOGNITION AND METHOD THEREOF - A communication unit, a voice input unit, a storage unit, and a processor are included in a communication device. The communication unit enables communication between the device and other communication devices. The voice input unit receives voice signals, which may correspond to a stored speech command and a related operation. The processor detects a match and executes the desired operation. A related communication method is also provided.12-13-2012
20120130712MOBILE TERMINAL AND MENU CONTROL METHOD THEREOF - A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.05-24-2012
20120130710ONLINE DISTORTED SPEECH ESTIMATION WITHIN AN UNSCENTED TRANSFORMATION FRAMEWORK - Noise and channel distortion parameters in the vectorized logarithmic or the cepstral domain for an utterance may be estimated, and subsequently the distorted speech parameters in the same domain may be updated using an unscented transformation framework during online automatic speech recognition. An utterance, including speech generated from a transmission source for delivery to a receiver, may be received by a computing device. The computing device may execute instructions for applying the unscented transformation framework to speech feature vectors, representative of the speech, in order to estimate, in a sequential or online manner, static noise and channel distortion parameters and dynamic noise distortion parameters in the unscented transformation framework. The static and dynamic parameters for the distorted speech in the utterance may then be updated from clean speech parameters and the noise and channel distortion parameters using non-linear mapping.05-24-2012
20120215531Increased User Interface Responsiveness for System with Multi-Modal Input and High Response Latencies - A multi-modal user interface with increased responsiveness is described. A graphical user interface (GUI) supports multiple different user input modalities including low delay inputs which respond to user inputs without significant delay, and high latency inputs which have a significant response latency after receiving a user input before providing a corresponding completed response. The GUI accepts user inputs in a sequence of mixed input modalities independently of response latencies without waiting for responses to high latency inputs. The GUI also provides interim indication during response latencies of pending responses at a position in the GUI where the completed response will be presented.08-23-2012
20120136658SYSTEMS AND METHODS FOR CUSTOMIZING BROADBAND CONTENT BASED UPON PASSIVE PRESENCE DETECTION OF USERS - Systems and methods for customizing broadband content based upon passive presence detection of users are provided. A sample of ambient audio may be collected by a customer premise device configured to output programming content received from a service provider. One or more audio components associated with the output of the customer premise device may be removed. Following the removal, a remainder of the collected sample may be compared to one or more stored user voice samples. Based at least in part on the comparison, one of an identity of a user or one or more user characteristics may be determined. Based at least in part on the determination, the content output by the customer premise device may be customized.05-31-2012
20120136659APPARATUS AND METHOD FOR PREPROCESSING SPEECH SIGNALS - Disclosed herein are an apparatus and method for preprocessing speech signals to perform speech recognition. The apparatus includes a voiced sound interval detection unit, a preprocessing method determination unit, and a clipping signal processing unit. The voiced sound interval detection unit detects a voiced sound interval including a voiced sound signal in a voice interval. The preprocessing method determination unit detects a clipping signal present in the voiced sound interval. The clipping signal processing unit extracts signal samples adjacent to the clipping signal, and performs interpolation on the clipping signal using the adjacent signal samples.05-31-2012
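The clipping repair described in the abstract above — replacing clipped samples by interpolating from adjacent unclipped samples — can be sketched as follows. This is a hedged sketch using simple linear interpolation; the patent does not specify the interpolation scheme, and the clip level is an assumed normalized full-scale value.

```python
def repair_clipping(samples, clip_level=0.99):
    """Replace runs of clipped samples with values linearly
    interpolated between the nearest unclipped neighbours."""
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if abs(out[i]) >= clip_level:
            start = i
            # Find the end of the clipped run.
            while i < n and abs(out[i]) >= clip_level:
                i += 1
            left = out[start - 1] if start > 0 else 0.0
            right = out[i] if i < n else 0.0
            run = i - start + 1
            # Fill the run with a straight line from left to right.
            for k in range(start, i):
                t = (k - start + 1) / run
                out[k] = left + t * (right - left)
        else:
            i += 1
    return out

# Two clipped samples between 0.2 and 0.4 become evenly spaced values.
print(repair_clipping([0.2, 1.0, 1.0, 0.4]))
```

A production preprocessor would restrict this to the detected voiced-sound interval, as the abstract describes, rather than scanning the whole signal.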
20120173233COMMUNICATION METHOD AND APPARATUS FOR PHONE HAVING VOICE RECOGNITION FUNCTION - A method and apparatus for communicating through a phone having a voice recognition function are provided. The method of performing communication using a phone having a voice recognition function includes converting to an incoming call notification and voice recognition mode when a phone call is received; converting to a communication connection and speakerphone mode when voice information related to a communication connection instruction is recognized; performing communication using a speakerphone; and ending communication when voice information related to a communication end instruction is recognized during communication using the speakerphone. Therefore, when a phone call is received, a mode of a phone is converted to a speakerphone mode with a voice instruction using a voice recognition function, and thus communication can be performed without using a hand.07-05-2012
20120316871Speech Recognition Using Loosely Coupled Components - An automatic speech recognition system includes an audio capture component, a speech recognition processing component, and a result processing component which are distributed among two or more logical devices and/or two or more physical devices. In particular, the audio capture component may be located on a different logical device and/or physical device from the result processing component. For example, the audio capture component may be on a computer connected to a microphone into which a user speaks, while the result processing component may be on a terminal server which receives speech recognition results from a speech recognition processing server.12-13-2012
20100049513AUTOMATIC CONVERSATION SYSTEM AND CONVERSATION SCENARIO EDITING DEVICE - A conversation scenario editor generates/edits a conversation scenario for an automatic conversation system. The system includes a conversation device and a conversation server. The conversation device generates an input sentence through speech recognition of an utterance by a user. When the conversation device requests a reply sentence to the input sentence, the conversation server determines the reply sentence based on the conversation scenario. The editor includes a language model generator for generating a language model to be used for the speech recognition based on the conversation scenario. Using the editor, a non-expert can generate a language model that supports adequate conversation based on the speech recognition.02-25-2010
20120179464CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.07-12-2012
20120179463CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.07-12-2012
20120232893MULTI-LAYERED SPEECH RECOGNITION APPARATUS AND METHOD - A multi-layered speech recognition apparatus and method, the apparatus includes a client checking whether the client recognizes the speech using a characteristic of speech to be recognized and recognizing the speech or transmitting the characteristic of the speech according to a checked result; and first through N-th servers, wherein the first server checks whether the first server recognizes the speech using the characteristic of the speech transmitted from the client, and recognizes the speech or transmits the characteristic according to a checked result, and wherein an n-th (2≤n≤N) server checks whether the n-th server recognizes the speech using the characteristic of the speech transmitted from an (n−1)-th server, and recognizes the speech or transmits the characteristic according to a checked result.09-13-2012
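The client-to-server cascade in the abstract above can be sketched with a simple chain: each layer either recognizes the speech itself or forwards the extracted characteristic to the next layer. Everything here is a hypothetical stand-in — the `Layer` class, the vocabulary-membership check, and the string "characteristic" are illustration only, not the patent's mechanism.

```python
class Layer:
    def __init__(self, name, vocabulary):
        self.name = name
        self.vocabulary = vocabulary  # words this layer can recognize

    def try_recognize(self, characteristic):
        # A layer "checks whether it recognizes the speech" -- modelled
        # here as a simple vocabulary membership test (an assumption).
        return characteristic if characteristic in self.vocabulary else None

def cascade_recognize(layers, characteristic):
    """Walk the client and servers in order; the first layer able to
    recognize the characteristic returns (layer name, result)."""
    for layer in layers:
        result = layer.try_recognize(characteristic)
        if result is not None:
            return layer.name, result
    return None, None  # no layer could recognize the speech

layers = [
    Layer("client", {"yes", "no"}),
    Layer("server-1", {"call", "hang up"}),
    Layer("server-N", {"navigate home"}),
]
print(cascade_recognize(layers, "call"))  # handled by server-1
```

The appeal of this shape is that cheap, common commands terminate at the client, while rarer or harder utterances pay the network cost of escalating to progressively more capable servers.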
20120232894DEVICE FOR RECONSTRUCTING SPEECH BY ULTRASONICALLY PROBING THE VOCAL APPARATUS - The invention provides a portable device for recognizing and/or reconstructing speech by ultrasound probing of the vocal apparatus, the device including at least one ultrasound transducer.09-13-2012
20120232892SYSTEM AND METHOD FOR ISOLATING AND PROCESSING COMMON DIALOG CUES - A method, system and machine-readable medium are provided. Speech input is received at a speech recognition component and recognized output is produced. A common dialog cue from the received speech input or input from a second source is recognized. An action is performed corresponding to the recognized common dialog cue. The performed action includes sending a communication from the speech recognition component to the speech generation component while bypassing a dialog component.09-13-2012
20120232891SPEECH COMMUNICATION SYSTEM AND METHOD, AND ROBOT APPARATUS - This invention realizes a speech communication system and method, and a robot apparatus capable of significantly improving entertainment property. A speech communication system with a function to make conversation with a conversation partner is provided with a speech recognition means for recognizing speech of the conversation partner, a conversation control means for controlling conversation with the conversation partner based on the recognition result of the speech recognition means, an image recognition means for recognizing the face of the conversation partner, and a tracking control means for tracking the existence of the conversation partner based on one or both of the recognition result of the image recognition means and the recognition result of the speech recognition means. The conversation control means controls conversation so as to continue depending on tracking of the tracking control means.09-13-2012
20120253799SYSTEM AND METHOD FOR RAPID CUSTOMIZATION OF SPEECH RECOGNITION MODELS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.10-04-2012
20120253800System and Method for Modifying and Updating a Speech Recognition Program - The system provides a speech recognition program, an update website for updating a speech recognition program, and a way of storing data. A user may utilize an update website to add, modify, and delete items that may comprise speech commands, DLLs, multimedia files, executable code, and other information. The speech recognition program may communicate with the update website to request information about possible updates. The update website may send a response consisting of information to the speech recognition program. The speech recognition program may utilize the received information to decide what items to download. The speech recognition program may send one or more requests to the update website to download items. The update website may respond by transmitting requested items to the speech recognition program, which overwrites existing items with the newly received items.10-04-2012
20120259627Efficient Exploitation of Model Complementariness by Low Confidence Re-Scoring in Automatic Speech Recognition - A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result.10-11-2012
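The low-confidence re-scoring scheme in the abstract above can be sketched in a few lines: keep words the initial recognizer was sure of, and hand the unreliable ones to a complementary second recognizer. This is an illustrative skeleton under stated assumptions — the per-word confidence format, the threshold, and the `reevaluate` callable are all invented for the example, not taken from the patent.

```python
def rescore_low_confidence(initial, reevaluate, threshold=0.5):
    """initial: list of (word, confidence) pairs from the first pass.
    reevaluate: callable that re-recognizes an unreliable word using
    the complementary second recognizer.
    Words below the confidence threshold are replaced by the
    re-evaluation result; confident words pass through unchanged."""
    final = []
    for word, confidence in initial:
        final.append(reevaluate(word) if confidence < threshold else word)
    return final

# Toy second pass: a lookup table stands in for the re-evaluation
# recognizer (hypothetical; a real system would re-decode the audio).
first_pass = [("please", 0.9), ("wreck", 0.3), ("speech", 0.8)]
second_pass = lambda w: {"wreck": "recognize"}.get(w, w)
print(rescore_low_confidence(first_pass, second_pass))
```

The key property the abstract relies on is that the two recognizers are complementary: re-running the *whole* utterance through a recognizer with the same error profile would gain little, whereas targeting only the low-confidence spans lets each model cover the other's mistakes cheaply.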
20090018827Media usage monitoring and measurement system and method - Media monitoring and measurement systems and methods are disclosed. Some embodiments of the present invention provide a media measurement system and method that utilizes audience data to enhance content identifications. Some embodiments analyze media player log data to enhance content identification. Other embodiments of the present invention analyze sample sequence data to enhance content identifications. Other embodiments analyze sequence data to enhance content identification and/or to establish channel identification. Yet other embodiments provide a system and method in which sample construction and selection parameters are adjusted based upon identification results. Yet other embodiments provide a method in which play-altering activity of an audience member is deduced from content offset values of identifications corresponding to captured samples. Yet other embodiments provide a monitoring and measurement system in which a media monitoring device is adapted to receive a wireless or non-wireless audio signal from a media player, the audio signal also being received wirelessly by headphones of a user of the monitoring device.01-15-2009
20080300870Method and Module for Improving Personal Speech Recognition Capability - A method and a module for improving personal speech recognition capability for use in a portable electronic device are provided. The portable electronic device has a predetermined recognition model constructed of a phoneme model for recognizing at least a command speech from a user. The method comprises the steps of: establishing a database having specific characters which are related to the command speech; constructing an adaptation parameter by retrieving a plurality of speech data spoken by the user according to the database; and modulating the recognition model by integrating the phoneme model and the adaptation parameter. The user can effectively adapt the recognition model to improve the recognition capability according to the above steps.12-04-2008
20120330654IDENTIFYING AND GENERATING AUDIO COHORTS BASED ON AUDIO DATA INPUT - A computer implemented method, system, and/or computer program product generates an audio cohort. Audio data from a set of audio sensors is received by an audio analysis engine. The audio data, which is associated with a plurality of objects, comprises a set of audio patterns. The audio data is processed to identify audio attributes associated with the plurality of objects to form digital audio data. This digital audio data comprises metadata that describes the audio attributes of the set of objects. A set of audio cohorts is generated using the audio attributes associated with the digital audio data and cohort criteria, where each audio cohort in the set of audio cohorts is a cohort of accompanied customers in a store, and where processing the audio data identifies a type of zoological creature that is accompanying each of the accompanied customers.12-27-2012
20110046949DEVICE, METHOD AND SYSTEM FOR DETECTING UNWANTED CONVERSATIONAL MEDIA SESSION - Some embodiments of the invention relate to a method and a system for detecting unwanted conversational media session data. In accordance with one aspect of the invention, a method of detecting unwanted conversational media session data according to some embodiments of the invention may include calculating two or more progressive similarity scores, each with respect to a different instant during the progress of a real-time conversational media session, wherein each of said scores is associated with a similarity between the conversational media session's media data that was available at the associated instant and a reference data item corresponding to media data of a previous conversational media session, and evaluating progressive similarity between the real-time conversational media session and the reference data item based upon the two or more progressive similarity scores.02-24-2011
20110238415Hybrid Speech Recognition - A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.09-29-2011
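The hybrid architecture in the abstract above — client-side and server-side recognizers feeding an arbitration engine — can be sketched with a minimal confidence-based arbiter. This is one plausible arbitration rule, assumed for illustration; the patent's abstract does not say how its engine chooses, and the `(text, confidence)` result format is an invention of this example.

```python
def arbitrate(client_result, server_result):
    """client_result / server_result: a (text, confidence) pair, or
    None if that engine produced no result (e.g. network failure for
    the server, or an out-of-vocabulary utterance for the client).
    Returns the text of the more confident available result."""
    candidates = [r for r in (client_result, server_result) if r is not None]
    if not candidates:
        return None
    # Pick the hypothesis with the higher confidence score.
    return max(candidates, key=lambda r: r[1])[0]

print(arbitrate(("call mom", 0.6), ("call tom", 0.8)))  # server wins here
```

A practical benefit of arbitrating over *both* results rather than always preferring the server is graceful degradation: when connectivity drops and only the client-side result exists, the same code path still returns an answer.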
20120089393ACOUSTIC SIGNAL PROCESSING DEVICE AND METHOD - A highlight section including an exciting scene is appropriately extracted with a smaller amount of processing. A reflection coefficient calculating unit (04-12-2012
20120101818DEVICE AND METHOD FOR CREATING DATA RECORDS IN A DATA-STORE BASED ON MESSAGES - Updating a data-store associated with an electronic communications device includes wirelessly communicating an electronic message. A location identifier representative of a physical location is identified within the electronic message. The physical location of the electronic communications device is measured or estimated as needed, after which validating the location identifier occurs when the measured or estimated physical location is calculated to be within a threshold distance of the physical location represented by the location identifier. Initiating creation of a new data record in the data-store is then performed, with the new data record storing at least the validated location identifier and a time identifier.04-26-2012
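The threshold-distance validation step above can be sketched with a standard great-circle distance check; the 500 m default threshold and the function names are assumptions of this sketch, not values from the application:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two latitude/longitude points.
    R = 6371000.0  # mean Earth radius in metres
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

def validate_location(claimed, measured, threshold_m=500.0):
    # Accept the message's location identifier only if the device's measured
    # (or estimated) position lies within the threshold distance of the
    # physical location the identifier represents.
    return haversine_m(*claimed, *measured) <= threshold_m
```

A validated identifier would then be stored in the new data record together with a time identifier.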
20120101817SYSTEM AND METHOD FOR GENERATING MODELS FOR USE IN AUTOMATIC SPEECH RECOGNITION - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a model for use with automatic speech recognition. These principles can be implemented as part of a streamlined tool for automatic training and tuning of speech or other models with a fast turnaround and with limited human involvement. A system configured to practice the method receives, as part of a request to generate a model, input data and a seed model. The system receives a cost function indicating accuracy and at least one of speed and memory usage. The system processes the input data based on the seed model and based on parameters that optimize the cost function to yield an updated model, and outputs the updated model.04-26-2012
20100198591PORTABLE TERMINAL AND MANAGEMENT SYSTEM - A portable terminal having an audio pickup means that acquires sound, an absolute position detection unit that detects the absolute position of the portable terminal, a relative position detection unit that detects the relative position of the portable terminal, and a speech recognition and synthesis unit that recognizes the audio acquired by the audio pickup means as speech, is achieved with a simple configuration. A portable terminal (08-05-2010
20100169088AUTOMATED DEMOGRAPHIC ANALYSIS - A method of generating demographic information relating to an individual is provided. The method includes monitoring an environment for a voice activity of an individual and detecting the voice activity of the individual. The method further includes analyzing the detected voice activity of the individual and determining, based on the detected voice activity of the individual, a demographic descriptor of the individual.07-01-2010
20110161077METHOD AND SYSTEM FOR PROCESSING MULTIPLE SPEECH RECOGNITION RESULTS FROM A SINGLE UTTERANCE - A method of and system for accurately determining a caller response by processing speech-recognition results and returning that result to a directed-dialog application for further interaction with the caller. Multiple speech-recognition engines are provided that process the caller response in parallel. Returned speech-recognition results comprising confidence-score values and word-score values from each of the speech-recognition engines may be modified based on context information provided by the directed-dialog application and grammars associated with each speech-recognition engine. An optional context database may be used to further reduce or add weight to confidence-score values and word-score values, remove phrases and/or words, and add phrases and/or words to the speech-recognition engine results. In situations where a predefined threshold-confidence-score value is not exceeded, a new dynamic grammar may be created. A set of n-best hypotheses of what the caller uttered is returned to the directed-dialog application.06-30-2011
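The confidence-score adjustment and thresholding described above can be sketched as follows; the additive context boosts, the n-best list format, and the 0.5 threshold are assumptions of this sketch, not the patented weighting scheme:

```python
def adjust_scores(hypotheses, context_boosts):
    # hypotheses: n-best list of (phrase, confidence) from one engine.
    # context_boosts: phrase -> additive weight supplied by the
    # directed-dialog context (a stand-in for the context database).
    return [(p, min(1.0, c + context_boosts.get(p, 0.0))) for p, c in hypotheses]

def best_hypothesis(engine_results, context_boosts, threshold=0.5):
    # Merge n-best lists from several parallel engines, apply context
    # weighting, and return the top hypothesis only if it clears the
    # predefined confidence threshold; otherwise return None (the point
    # at which a new dynamic grammar might be created).
    pooled = []
    for hyps in engine_results:
        pooled.extend(adjust_scores(hyps, context_boosts))
    best = max(pooled, key=lambda pc: pc[1])
    return best if best[1] >= threshold else None
```
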
20110161076Intuitive Computing Methods and Systems - A smart phone senses audio, imagery, and/or other stimulus from a user's environment, and acts autonomously to fulfill inferred or anticipated user desires. In one aspect, the detailed technology concerns phone-based cognition of a scene viewed by the phone's camera. The image processing tasks applied to the scene can be selected from among various alternatives by reference to resource costs, resource constraints, other stimulus information (e.g., audio), task substitutability, etc. The phone can apply more or less resources to an image processing task depending on how successfully the task is proceeding, or based on the user's apparent interest in the task. In some arrangements, data may be referred to the cloud for analysis, or for gleaning. Cognition, and identification of appropriate device response(s), can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed.06-30-2011
20130173264METHODS, APPARATUSES AND COMPUTER PROGRAM PRODUCTS FOR IMPLEMENTING AUTOMATIC SPEECH RECOGNITION AND SENTIMENT DETECTION ON A DEVICE - An apparatus for utilizing textual data and acoustic data corresponding to speech data to detect sentiment may include a processor and memory storing executable computer code causing the apparatus to at least perform operations including evaluating textual data and acoustic data corresponding to voice data associated with captured speech content. The computer program code may further cause the apparatus to analyze the textual data and the acoustic data to detect whether the textual data or the acoustic data includes one or more words indicating at least one sentiment of a user that spoke the speech content. The computer program code may further cause the apparatus to assign at least one predefined sentiment to at least one of the words in response to detecting that the word(s) indicates the sentiment of the user. Corresponding methods and computer program products are also provided.07-04-2013
20130179162TOUCH FREE OPERATION OF DEVICES BY USE OF DEPTH SENSORS - An inventive system and method for touch free operation of a device is presented. The system can comprise a depth sensor for detecting a movement, motion software to receive the detected movement from the depth sensor, deduce a gesture based on the detected movement, and filter the gesture to accept an applicable gesture, and client software to receive the applicable gesture at a client computer for performing a task in accordance with client logic based on the applicable gesture. The client can be a mapping device and the task can be one of various mapping operations. The system can also comprise hardware for making the detected movement an applicable gesture. The system can also comprise voice recognition providing voice input for enabling the client to perform the task based on the voice input in conjunction with the applicable gesture. The applicable gesture can be a movement authorized using facial recognition.07-11-2013
20130090923Framework For User-Created Device Applications - A method to provide an interface for launching applications is described. The method includes receiving information indicative of a record stored in an electronic device application. The method also includes determining whether the record is associated with a software application command. In response to determining that the record is associated with a software application command, the software application command is activated. Apparatus and computer readable media are also described.04-11-2013
20110313762SPEECH OUTPUT WITH CONFIDENCE INDICATION - A method, system, and computer program product are provided for speech output with confidence indication. The method includes receiving a confidence score for segments of speech or text to be synthesized to speech. The method includes modifying a speech segment by altering one or more parameters of the speech proportionally to the confidence score.12-22-2011
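One way to alter synthesis parameters proportionally to the confidence score, as the entry above describes, is sketched below; the specific scaling ranges (volume 0.5–1.0, rate 0.8–1.0) are assumptions of this sketch:

```python
def prosody_for_confidence(confidence, base_volume=1.0, base_rate=1.0):
    # Scale output parameters in proportion to the confidence score so a
    # listener can hear which segments the synthesizer is unsure about:
    # low-confidence segments come out quieter and slightly slower.
    confidence = max(0.0, min(1.0, confidence))  # clamp to [0, 1]
    return {"volume": base_volume * (0.5 + 0.5 * confidence),
            "rate": base_rate * (0.8 + 0.2 * confidence)}
```

Other parameters (pitch, pauses, or an audible marker) could be modulated the same way.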
20080201140AUTOMATIC IDENTIFICATION OF SOUND RECORDINGS - Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device.08-21-2008
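The vector-lookup step in the entry above can be sketched as a nearest-neighbour search with a distance cutoff; the Euclidean metric, the linear scan, and the cutoff value are assumptions of this sketch (a production system would use an index, not a scan):

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def identify(query_vec, database, max_dist=1.0):
    # database: track_id -> compact feature vector (the abstract notes
    # vectors may be under 100 bytes). Find the nearest stored vector and
    # report a match only if it is close enough, so that compressed or
    # otherwise manipulated copies still match their original.
    best_id, best_d = None, float("inf")
    for track_id, vec in database.items():
        d = euclidean(query_vec, vec)
        if d < best_d:
            best_id, best_d = track_id, d
    return best_id if best_d <= max_dist else None
```
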
20080201139Generic framework for large-margin MCE training in speech recognition - A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.08-21-2008
20130151249INFORMATION PRESENTATION DEVICE, INFORMATION PRESENTATION METHOD, INFORMATION PRESENTATION PROGRAM, AND INFORMATION TRANSMISSION SYSTEM - An information presentation device includes an audio signal input unit configured to input an audio signal, an image signal input unit configured to input an image signal, an image display unit configured to display an image indicated by the image signal, a sound source localization unit configured to estimate direction information for each sound source based on the audio signal, a sound source separation unit configured to separate the audio signal into sound-source-classified audio signals for each sound source, an operation input unit configured to receive an operation input and generate coordinate designation information indicating a part of a region of the image, and a sound source selection unit configured to select a sound-source-classified audio signal of a sound source associated with a coordinate which is included in a region indicated by the coordinate designation information, and which corresponds to the direction information.06-13-2013
20100318353COMPRESSOR AUGMENTED ARRAY PROCESSING - The present invention relates generally to the use of compressors, with an optional noise extractor, to improve audio sensing performance of one or more microphones. The audio sensing performance of a single element microphone array with dynamic range compression can be improved by the use of a noise extractor, to modify the operation of the compressor, typically to avoid noise floor amplification. Dynamic range compression can be applied to the output of two or more element microphone array processing with the optional use of a noise extractor. Dynamic range compression can precede the microphone array processing with the optional use of a noise extractor. Syllabic dynamic range compression may be used in one or more element microphone arrays, with the optional use of a noise extractor, which increases speech recognition accuracy.12-16-2010
20120284022NOISE REDUCTION SYSTEM USING A SENSOR BASED SPEECH DETECTOR - Speech detection is a technique to determine and classify periods of speech. In a normal conversation, each speaker speaks less than half the time. The remaining time is devoted to listening to the other end and pauses between speech and silence. Embodiments of the current invention provide systems and methods that may be implemented in a communication device. A system may include one or more sensors for detecting information corresponding to a user. The user is in a state of verbal communication. The system further includes one or more sensors for determining periods of speech and non-speech, in the verbal communication, based on the detected information and the audio signal captured by the microphones. The determined periods of speech and non-speech may be used in the coding, compression, noise reduction and other aspects of signal processing.11-08-2012
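The speech/non-speech classification described above can be sketched with a short-time-energy detector over audio frames; the energy metric and the threshold value are assumptions of this sketch, and the patented system additionally fuses non-audio sensor information, which is omitted here:

```python
def speech_periods(frames, energy_threshold=0.02):
    # Classify each frame of microphone samples as speech (True) or
    # non-speech (False) by mean short-time energy; a crude stand-in for
    # the sensor-assisted detector the entry describes.
    flags = []
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)
        flags.append(energy >= energy_threshold)
    return flags
```

The resulting flags could then gate noise reduction or coding, since less than half of a conversation is typically active speech.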
20130191123Automatic Door - In some implementations, a storage device having a voice-recognition engine stored thereon is coupled to a microcontroller, and a device-controller for an automatic door is operably coupled to the microcontroller.07-25-2013
20130191122Voice Electronic Listening Assistant - The invention comprises music and information delivery systems and methods. One system comprises a voice-activated sound system wherein a user speaks and the sound system recognizes the speech and searches an internet database such as Rhapsody™ to obtain a list of matching audio files and display the list on a dashboard screen of a vehicle. The user is able to identify the audio file by voice activation and the system is configured to receive the audio file.07-25-2013
20120029916METHOD FOR PROCESSING MULTICHANNEL ACOUSTIC SIGNAL, SYSTEM THEREFOR, AND PROGRAM - A method for processing multichannel acoustic signals which is characterized by calculating the feature quantity of each channel from the input signals of a plurality of channels, calculating similarity between the channels in the feature quantity of each channel, selecting channels having high similarity, and separating signals using the input signals of the selected channels.02-02-2012
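The feature-similarity channel selection above can be sketched as follows; using Pearson correlation of per-channel feature sequences as the similarity measure, and the 0.9 threshold, are assumptions of this sketch:

```python
def correlation(x, y):
    # Pearson correlation between two equal-length feature sequences,
    # used here as the inter-channel similarity measure.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def select_similar_channels(features, threshold=0.9):
    # features: one feature sequence per input channel. Return index pairs
    # of channels whose features are highly similar, i.e. channels likely
    # to carry the same source and therefore useful for separation.
    pairs = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if correlation(features[i], features[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```
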
20130197906TECHNIQUES TO NORMALIZE NAMES EFFICIENTLY FOR NAME-BASED SPEECH RECOGNITION GRAMMARS - Techniques to normalize names for name-based speech recognition grammars are described. Some embodiments are particularly directed to techniques to normalize names for name-based speech recognition grammars more efficiently by caching, and on a per-culture basis. A technique may comprise receiving a name for normalization during name processing for a name-based speech grammar generating process. A normalization cache may be examined to determine if the name is already in the cache in a normalized form. When the name is not already in the cache, the name may be normalized and added to the cache. When the name is in the cache, the normalization result may be retrieved and passed to the next processing step. Other embodiments are described and claimed.08-01-2013
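The per-culture normalization cache described above can be sketched as a small memoizing wrapper; the class shape, the `(culture, name)` key, and the miss counter are assumptions of this sketch, not the patented implementation:

```python
class NameNormalizer:
    # Cache keyed by (culture, name) so each distinct name is normalized
    # at most once per culture while a name-based speech grammar is built.
    def __init__(self, normalize_fn):
        self._normalize = normalize_fn   # the expensive culture-aware worker
        self._cache = {}                 # (culture, name) -> normalized form
        self.misses = 0                  # counts actual normalization calls

    def normalized(self, name, culture="en-US"):
        key = (culture, name)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._normalize(name, culture)
        return self._cache[key]
```

A repeated name hits the cache and skips the worker; the same name under a different culture is normalized again, since normalization rules differ per culture.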
20130197907SERVICES IDENTIFICATION AND INITIATION FOR A SPEECH-BASED INTERFACE TO A MOBILE DEVICE - A method of providing hands-free services using a mobile device having wireless access to computer-based services includes establishing a short range wireless connection between a mobile device and one or more audio devices that include at least a microphone and speaker; receiving at the mobile device speech inputted via the microphone from a user and sent via the short range wireless connection; wirelessly transmitting the speech input from the mobile device to a speech recognition server that provides automated speech recognition (ASR); receiving at the mobile device a speech recognition result representing the content of the speech input; determining a desired service by processing the speech recognition result using a first, service-identifying grammar; determining a user service request by processing at least some of the speech recognition result using a second, service-specific grammar associated with the desired service; initiating the user service request and receiving a service response; generating an audio message from the service response; and presenting the audio message to the user via the speaker.08-01-2013
20120296645Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition.11-22-2012
20120072213SPEECH SOUND INTELLIGIBILITY ASSESSMENT SYSTEM, AND METHOD AND PROGRAM THEREFOR - The speech sound intelligibility assessment system includes: an output section for presenting a speech sound to a user; a biological signal measurement section for measuring an electroencephalogram signal of the user; a positive component determination section for determining presence/absence of a positive component of an event-related potential in the electroencephalogram signal in a zone from 600 ms to 800 ms from a starting point, which is a point in time at which the output section presents a speech sound; a negative component determination section for determining presence/absence of a negative component of an event-related potential in the electroencephalogram signal in a zone from 100 ms to 300 ms from the same starting point; and an assessment section for evaluating whether the user has clearly aurally comprehended the presented speech sound or not based on the results of determination of presence/absence of the positive and negative components, respectively.03-22-2012
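The windowed component check in the entry above can be sketched as follows. The 1 kHz sampling rate, the amplitude thresholds, and the assessment rule (late positive component absent implies clear comprehension) are assumptions of this sketch; the abstract only states that the assessment combines the positive- and negative-component determinations:

```python
def mean_in_window(signal, start_ms, end_ms, fs=1000):
    # Mean amplitude of the event-related potential in a latency window,
    # with sample 0 at the starting point (speech-sound presentation)
    # and fs samples per second.
    i0, i1 = start_ms * fs // 1000, end_ms * fs // 1000
    seg = signal[i0:i1]
    return sum(seg) / len(seg)

def comprehended(erp, pos_thresh=1.0):
    # Determine presence of the positive component in the 600-800 ms zone;
    # here its absence is taken to mean the user clearly comprehended the
    # presented speech sound (one reading of the assessment rule).
    late_positive = mean_in_window(erp, 600, 800) >= pos_thresh
    return not late_positive
```

The 100–300 ms negative-component determination would be implemented the same way with a negative threshold.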
20120072212SYSTEM AND METHOD FOR MOBILE AUTOMATIC SPEECH RECOGNITION - A system and method of updating automatic speech recognition parameters on a mobile device are disclosed. The method comprises storing user account-specific adaptation data associated with ASR on a computing device associated with a wireless network, generating new ASR adaptation parameters based on transmitted information from the mobile device when a communication channel between the computing device and the mobile device becomes available, and transmitting the new ASR adaptation data to the mobile device over that channel. The new ASR adaptation data enables the mobile device to recognize user utterances more accurately.03-22-2012
20120078622SPOKEN DIALOGUE APPARATUS, SPOKEN DIALOGUE METHOD AND COMPUTER PROGRAM PRODUCT FOR SPOKEN DIALOGUE - According to one embodiment, a spoken dialogue apparatus includes a detection unit configured to detect speech of a user; a recognition unit configured to recognize the speech; an output unit configured to output a response voice corresponding to the result of speech recognition; an estimate unit configured to estimate the probability variation of a barge-in utterance, that is, the time variation of the probability that the user will interrupt with an utterance while the response voice is being output; and a control unit configured to determine whether to adopt the barge-in utterance based on the probability variation of the barge-in utterance.03-29-2012
