Detect speech in noise

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

704231000 - Recognition

Entries
Document | Title | Date
20130030803MICROPHONE-ARRAY-BASED SPEECH RECOGNITION SYSTEM AND METHOD - A microphone-array-based speech recognition system combines a noise cancelling technique for cancelling noise of input speech signals from an array of microphones, according to at least an inputted threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least a speech model and at least a filler model, then computes a confidence measure score with the at least a speech model and the at least a filler model for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling for achieving a maximum confidence measure score, thereby outputting a speech recognition result related to the maximum confidence measure score.01-31-2013
20110208520VOICE ACTIVITY DETECTION BASED ON PLURAL VOICE ACTIVITY DETECTORS - A voice activity detection (VAD) system includes a first voice activity detector, a second voice activity detector and control logic. The first voice activity detector is included in a device and produces a first VAD signal. The second voice activity detector is located externally to the device and produces a second VAD signal. The control logic combines the first and second VAD signals into a VAD output signal. Voice activity may be detected based on the VAD output signal. The second VAD signal can be represented as a flag included in a packet containing digitized audio. The packet can be transmitted to the device from the externally located VAD over a wireless link.08-25-2011
20100088094DEVICE AND METHOD FOR VOICE ACTIVITY DETECTION - A voice activity detection (VAD) device and method are disclosed, so that the VAD threshold can be adaptive to the background noise variation. The VAD device includes: a background analyzing unit, adapted to: analyze background noise features of a current signal according to an input VAD judgment result, obtain parameters related to the background noise variation, and output these parameters; a VAD threshold adjusting unit, adapted to: obtain a bias of the VAD threshold according to the parameters output by the background analyzing unit, and output the bias of the VAD threshold; and a VAD judging unit, adapted to: modify a VAD threshold to be modified according to the bias of the VAD threshold output by the VAD threshold adjusting unit, judge the background noise by using the modified VAD threshold, and output a VAD judgment result.04-08-2010
20100153104Noise Suppressor for Robust Speech Recognition - Described is noise reduction technology generally for speech input in which a noise-suppression related gain value for the frame is determined based upon a noise level associated with that frame in addition to the signal to noise ratios (SNRs). In one implementation, a noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a threshold low level, and a low gain value set or computed to accomplish large noise suppression above a threshold high noise level. A noise-power dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning parameters used in noise reduction via a step-adaptive discriminative learning algorithm.06-17-2010
20090192796FILTERING OF BEAMFORMED SPEECH SIGNALS - The invention relates to speech signal processing that detects a speech signal from more than one microphone and obtains microphone signals that are processed by a beamformer to obtain a beamformed signal. The beamformed signal is post-filtered with a filter that employs adaptable filter weights to obtain an enhanced beamformed signal, with the post-filter adapting the filter weights using previously learned filter weights.07-30-2009
20090192795System and method for receiving audible input in a vehicle - A steering wheel system for a vehicle. The steering wheel system includes a first microphone mounted in a steering wheel and a second microphone mounted in the vehicle. The first and second microphones are each configured to receive an audible input. The audible input includes an oral command component and a noise component. The steering wheel system also includes a controller configured to identify the noise component by determining that the noise component received at the first microphone is out of phase with the noise component received at the second microphone. The controller is configured to cancel the noise component from the audible input.07-30-2009
20130211832SPEECH SIGNAL PROCESSING RESPONSIVE TO LOW NOISE LEVELS - A method of speech recognition in a vehicle. Audio including noise and a speech signal representative of an utterance from a user is received via a microphone, and a signal-to-noise ratio (SNR) for the received audio is calculated using a processor. It is determined whether the calculated SNR is greater than a predetermined SNR. If so, then a noise distribution is identified for addition to the received audio, and noise corresponding to the identified noise distribution is injected into the received audio to produce noise-injected audio including the speech signal.08-15-2013
20110202340SPEAKER VERIFICATION - A speaker verification method is proposed that first builds a general model of user utterances using a set of general training speech data. The user also trains the system by providing a training utterance, such as a passphrase or other spoken utterance. Then in a test phase, the user provides a test utterance which includes some background noise as well as a test voice sample. The background noise is used to bring the condition of the training data closer to that of the test voice sample by modifying the training data and a reduced set of the general data, before creating adapted training and general models. Match scores are generated based on the comparison between the adapted models and the test voice sample, with a final match score calculated based on the difference between the match scores. This final match score gives a measure of the degree of matching between the test voice sample and the training utterance and is based on the degree of matching between the speech characteristics from extracted feature vectors that make up the respective speech signals, and is not a direct comparison of the raw signals themselves. Thus, the method can be used to verify a speaker without necessarily requiring the speaker to provide an identical test phrase to the phrase provided in the training sample.08-18-2011
20100076759APPARATUS AND METHOD FOR RECOGNIZING A SPEECH - A noisy vector is extracted from a noisy speech, which is a clean speech on which a noise is superimposed. A noise parameter of the noise is estimated from the noisy vector. A prior distribution parameter of a clean vector of the clean speech is already stored. A joint Gaussian distribution parameter between the clean vector and the noisy vector is calculated by unscented transformation, from the noise parameter and the prior distribution parameter. A posterior distribution parameter of the clean vector is calculated by the joint Gaussian distribution parameter, from the noisy vector. By comparing the posterior distribution parameter with a standard pattern of each word previously stored, a word sequence of the noisy speech is output.03-25-2010
20100076758PHASE SENSITIVE MODEL ADAPTATION FOR NOISY SPEECH RECOGNITION - A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates of distortions based on a phase-sensitive model in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.03-25-2010
20100076757ADAPTING A COMPRESSED MODEL FOR USE IN SPEECH RECOGNITION - A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.03-25-2010
20120245933ADAPTIVE AMBIENT SOUND SUPPRESSION AND SPEECH TRACKING - A device for suppressing ambient sounds from speech received by a microphone array is provided. One embodiment of the device comprises a microphone array, a processor, an analog-to-digital converter, and memory comprising instructions stored therein that are executable by the processor. The instructions stored in the memory are configured to receive a plurality of digital sound signals, each digital sound signal based on an analog sound signal originating at the microphone array, receive a multi-channel speaker signal, generate a monophonic approximation signal of the multi-channel speaker signal, apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal, generate a combined directionally-adaptive sound signal from a combination of each digital sound signal by a combination of time-invariant and adaptive beamforming techniques, and apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal.09-27-2012
20130085753Hybrid Client/Server Speech Recognition In A Mobile Device - A computing device is able to use an embedded speech recognizer and a network speech recognizer for speech recognition. In response to detecting speech in the captured audio, the computing device may forward the captured audio to its embedded speech recognizer and to a speech client for the network speech recognizer. The embedded speech recognizer provides an embedded-recognizer result for the captured audio. If a network-recognition criterion is met, the speech client forwards the captured audio to the network speech recognizer and receives a network-recognizer result for the captured audio from the network speech recognizer. A speech recognition result for the captured audio is forwarded to at least one application, wherein the speech recognition result is based on at least one of the embedded-recognizer result and the network-recognizer result.04-04-2013
20130035935DEVICE AND METHOD FOR DETERMINING SEPARATION CRITERION OF SOUND SOURCE, AND APPARATUS AND METHOD FOR SEPARATING SOUND SOURCE - The present invention draws on the way a person recognizes the location of a sound source in a three-dimensional space using two ears, and applies a method of separating a sound source in a certain orientation to improve the performance of an application technology using speech in a noisy environment. The present invention acquires a speech signal using two sensors and determines an orientation angle of a sound source in a zero-crossing point step with respect to a frequency separated signal with a band pass filter bank. An object of the present invention is to obtain excellent sound source orientation detection and separation performance that is difficult to obtain with an existing cross-correlation method calculated in units of time frames in a noisy environment with a plurality of sound sources.02-07-2013
20120185248VOICE DETECTOR AND A METHOD FOR SUPPRESSING SUB-BANDS IN A VOICE DETECTOR - Embodiments of the present invention relate to a voice detector receiving an input signal that is divided into sub-signals that represent a frequency sub-band. The voice detector calculates, for each sub-band, a signal-to-noise (SNR) value based on a corresponding sub-signal for each sub-band and a background signal for each sub-band. The voice detector also calculates a power SNR value for each sub-band, where at least one of the power SNR values is calculated based on a non-linear function. The voice detector forms a single value based on the calculated power SNR values and compares the single value and a given threshold value to make a voice activity decision presented on an output port.07-19-2012
20120185247UNIFIED MICROPHONE PRE-PROCESSING SYSTEM AND METHOD - A unified microphone pre-processing system includes a plurality of microphones arranged within a vehicle passenger compartment, a processing circuit or system configured to receive signals from one or more of the plurality of microphones, and the processing circuit configured to enhance the received signals for use by at least two of a telephony processing application, an automatic speech recognition processing application, and a noise cancellation processing application. The method includes receiving signals from one or more of a plurality of microphones arranged within a vehicle passenger compartment, and enhancing the received signals for use by at least two of a telephony processing application, an automatic speech recognition processing application, and a noise cancellation processing application. A computer readable medium containing executable instructions to cause a processor to perform a method in accordance with an embodiment of the invention is also described.07-19-2012
20090125305METHOD AND APPARATUS FOR DETECTING VOICE ACTIVITY - A robust method and apparatus to detect voice activity based on the power level of an audio frame. The method may include performing primary active/non-active voice period determination of an input audio frame according to a power level of the audio frame, extracting a noise power prediction value and a signal power prediction value by referring to power levels of current and previous audio frames according to a primary active/non-active voice period determination value, and performing secondary active/non-active voice period determination for the input audio frame by comparing the extracted signal power prediction value with the extracted noise power prediction value.05-14-2009
20090125304METHOD AND APPARATUS TO DETECT VOICE ACTIVITY - A method and apparatus to detect voice activity by using a zero-crossing rate includes removing noise included in an audio signal, adding a random signal having energy of a predetermined size to the audio signal from which noise is removed, extracting predetermined voice detection parameters from the audio signal to which the random signal is added, and comparing the extracted predetermined voice detection parameters with a threshold value and determining voice and non-voice activities.05-14-2009
20100094624SYSTEM AND METHOD FOR MACHINE-BASED DETERMINATION OF SPEECH INTELLIGIBILITY IN AN AIRCRAFT DURING FLIGHT OPERATIONS - A method for effecting a machine-based determination of speech intelligibility in an aircraft during flight operations includes: (a) in no particular order: (1) providing a representation of a machine-based speech evaluation signal; and (2) providing a representation of in-flight noise; (b) combining the representation of a machine-based speech evaluation signal and the representation of in-flight noise to obtain a combined noise signal; and (c) employing the combined noise signal to present the machine-based determination of speech intelligibility in an aircraft during flight operations.04-15-2010
20130046536Method and Apparatus for Performing Song Detection on Audio Signal - Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.02-21-2013
20120191450SYSTEM AND METHOD FOR NOISE REDUCTION IN PROCESSING SPEECH SIGNALS BY TARGETING SPEECH AND DISREGARDING NOISE - A system and method for processing a speech signal delivered in a noisy channel or with ambient noise that focuses on a subset of harmonics that are least corrupted by noise, that disregards the signal harmonics with low signal-to-noise ratio(s), and that disregards amplitude modulations inconsistent with speech.07-26-2012
20130073285Robust Downlink Speech and Noise Detector - A voice activity detection process is robust to a low and high signal-to-noise ratio speech and signal loss. A process divides an aural signal into one or more bands. Signal magnitudes of frequency components and the respective noise components are estimated. A noise adaptation rate modifies estimates of noise components based on differences between the signal to the estimated noise and signal variability.03-21-2013
20110015925SPEECH RECOGNITION SYSTEM AND METHOD - A speech recognition method, comprising: 01-20-2011
20130060567Front-End Noise Reduction for Speech Recognition Engine - VoIP phones according to the present invention include a microphone, which may be internal or external, and allow the user to communicate unobtrusively, check voice mail and conduct other activities in an environment which can be noisy in general and extremely noisy sometimes. Speech recognition functionality may also be used to generate and send touch tone or DTMF tones such as in response to call trees or voice recognition functionality used by airlines, credit card companies, voice mail systems, and other applications. A system and method of audio processing which provides enhanced speech recognition is provided. Audio input is received at the microphone which is processed by adaptive noise cancellation to generate an enhanced audio signal. The operation of the speech recognition engine and the adaptive noise canceller may be advantageously controlled based on Voice Activity Detection (VAD).03-07-2013
20090271190Method and Apparatus for Voice Activity Determination - In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector for making a first voice activity detection decision based at least in part on the voice activity of a first audio signal received from a first microphone. The apparatus also comprises a second voice activity detector for making a second voice activity detection decision based at least in part on an estimate of a direction of the first audio signal and an estimate of a direction of a second audio signal received from a second microphone. The apparatus further comprises a classifier for making a third voice activity detection decision based at least in part on the first and second voice activity detection decisions.10-29-2009
20090271188Adjusting A Speech Engine For A Mobile Computing Device Based On Background Noise - Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.10-29-2009
20090030679AMBIENT NOISE INJECTION FOR USE IN SPEECH RECOGNITION - A method of ambient noise injection for use with speech recognition in a production vehicle. The method includes the steps of monitoring audio including user speech, receiving an utterance from the user speech, retrieving vehicle-specific ambient noise, and prepending the vehicle-specific ambient noise to the utterance before pre-processing and decoding the utterance.01-29-2009
20090012786Adaptive Noise Cancellation - Speech-free noise estimation by cancellation of speech content from an audio input where the speech content is estimated by noise suppression. Adaptive noise cancellation with primary and noise-reference inputs and an adaptive noise cancellation filter for estimating primary noise from noise-reference input. Speech Suppressor (Noise Estimation) applied to noise-reference input provides speech-free noise estimates for noise cancellation in the primary input.01-08-2009
20080294432Signal enhancement and speech recognition - Provides speech enhancement techniques which are effective even for extemporaneous noise without a noise interval and unknown extemporaneous noise. An example of a signal enhancement device includes: spectral subtraction means for subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; an adaptive filter applied to the reference signal; and coefficient control means for controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. In the signal enhancement device, a database of a signal model concerning the target signal expressing a given feature by means of a given statistical model is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means.11-27-2008
20110071824Systems and Methods for Multiple Pitch Tracking - An apparatus includes a function module, a strength module, and a filter module. The function module compares an input signal, which has a component, to a first delayed version of the input signal and a second delayed version of the input signal to produce a multi-dimensional model. The strength module calculates a strength of each extremum from a plurality of extrema of the multi-dimensional model based on a value of at least one opposite extremum of the multi-dimensional model. The strength module then identifies a first extremum from the plurality of extrema, which is associated with a pitch of the component of the input signal, that has the strength greater than the strength of the remaining extrema. The filter module extracts the pitch of the component from the input signal based on the strength of the first extremum.03-24-2011
20110282663TRANSIENT NOISE REJECTION FOR SPEECH RECOGNITION - A method of and system for transient noise rejection for improved speech recognition. The method comprises the steps of (a) receiving audio including user speech and at least some transient noise associated with the speech, (b) converting the received audio into digital data, (c) segmenting the digital data into acoustic frames, and (d) extracting acoustic feature vectors from the acoustic frames. The method also comprises the steps of (e) evaluating the acoustic frames for transient noise on a frame-by-frame basis, (f) rejecting those acoustic frames having transient noise, (g) accepting as speech frames those acoustic frames having no transient noise and, thereafter, (h) recognizing the user speech using the speech frames.11-17-2011
20110288860SYSTEMS, METHODS, APPARATUS, AND COMPUTER-READABLE MEDIA FOR PROCESSING OF SPEECH SIGNALS USING HEAD-MOUNTED MICROPHONE PAIR - A noise cancelling headset for voice communications contains a microphone at each of the user's ears and a voice microphone. The headset shares the use of the ear microphones for improving signal-to-noise ratio on both the transmit path and the receive path.11-24-2011
20120109647System Enhancement of Speech Signals - A system enhances speech by detecting a speaker's utterance through a first microphone positioned a first distance from a source of interference. A second microphone may detect the speaker's utterance at a different position. A monitoring device may estimate the power level of a first microphone signal. A synthesizer may synthesize part of the first microphone signal by processing the second microphone signal. The synthesis may occur when power level is below a predetermined level.05-03-2012
20090089053MULTIPLE MICROPHONE VOICE ACTIVITY DETECTOR - Voice activity detection using multiple microphones can be based on a relationship between an energy at each of a speech reference microphone and a noise reference microphone. The energy output from each of the speech reference microphone and the noise reference microphone can be determined. A speech to noise energy ratio can be determined and compared to a predetermined voice activity threshold. In another embodiment, the absolute values of the autocorrelations of the speech and noise reference signals are determined and a ratio based on the autocorrelation values is determined. Ratios that exceed the predetermined threshold can indicate the presence of a voice signal. The speech and noise energies or autocorrelations can be determined using a weighted average or over a discrete frame size.04-02-2009
20120130713SYSTEMS, METHODS, AND APPARATUS FOR VOICE ACTIVITY DETECTION - Systems, methods, apparatus, and machine-readable media for voice activity detection in a single-channel or multichannel audio signal are disclosed.05-24-2012
20090271189Testing A Grammar Used In Speech Recognition For Reliability In A Plurality Of Operating Environments Having Different Background Noise - Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.10-29-2009
20100114570APPARATUS AND METHOD FOR RESTORING VOICE - An apparatus and a method for restoring voice are provided. The apparatus reduces noise included in a voice signal input to a microphone and outputs a voice signal having reduced noise, detects harmonic frequencies from the voice signal having reduced noise, and restores the voice signal having reduced noise approximate to its original state before being input to the microphone according to detected harmonic frequencies of the voice signal having reduced noise.05-06-2010
20100268533APPARATUS AND METHOD FOR DETECTING SPEECH - A speech detection apparatus and method are provided. The speech detection apparatus and method determine whether a frame is speech or not using feature information extracted from an input signal. The speech detection apparatus may estimate a situation related to an input frame and determine which feature information is required for speech detection for the input frame in the estimated situation. The speech detection apparatus may detect a speech signal using dynamic feature information that may be more suitable to the situation of a particular frame, instead of using the same feature information for each and every frame.10-21-2010
20090150145Learning word segmentation from non-white space languages corpora - Illustrative embodiments provide a computer implemented method, apparatus, and computer program product for learning word segmentation from non-white space language corpora. In one illustrative embodiment, the computer implemented method receives text input characters and calculates a ratio-measure for each pair of characters in the input characters. The computer implemented method further determines whether the ratio-measure of each pair of characters is equal to a predetermined threshold value. Responsive to determining the ratio-measure is less than the predetermined threshold value, and a local-minimum value, the computer method further identifies the pair as a weak pair and breaks the weak pair of characters.06-11-2009
20090150144Robust voice detector for receive-side automatic gain control - A voice detector improves voice output quality. The voice detector may be incorporated into a cellphone, hands-free car phone, or any other device that provides voice output. The voice detector provides excellent voice output quality even when signal dropouts and other significant signal artifacts are present in the received signal. Not only does the high quality voice output improve the listening experience, it also benefits downstream processing systems that further process the voice signal.06-11-2009
20090055173SUB BAND VAD - The present invention relates to a voice detector being responsive to an input signal being divided into sub-signals representing a frequency sub-band, comprising: means to calculate, for each sub-band, an SNR value snr[n] based on a corresponding sub-signal for each sub-band and a background signal for each sub-band. The voice detector further comprises: means to calculate a power SNR value for each sub-band, wherein at least one of said power SNR values is calculated based on a non-linear function, means to form a single value snr_sum based on the calculated power SNR values, and means to compare said single value snr_sum and a given threshold value vad_thr to make a voice activity decision vad_prim presented on an output port. The invention also relates to a voice activity detector, a node and a method for selectively suppressing sub-bands in a voice detector.02-26-2009
20120239394ERRONEOUS DETECTION DETERMINATION DEVICE, ERRONEOUS DETECTION DETERMINATION METHOD, AND STORAGE MEDIUM STORING ERRONEOUS DETECTION DETERMINATION PROGRAM - An erroneous detection determination device includes: a signal acquisition unit configured to acquire, from each of microphones, a plurality of audio signals relating to ambient sound including sound from a sound source in a certain direction; a result acquisition unit configured to acquire a recognition result including voice activity information indicating the inclusion of a voice activity relating to at least one of the audio signals; a calculation unit configured to calculate, for each of audio signals on the basis of the signals in respective unit times and the certain direction, a speech arrival rate representing the proportion of the sound from the certain direction to the ambient sound in each of the unit times; and an error detection unit configured to determine, on the basis of the recognition result and the speech arrival rate, whether or not the voice activity information is the result of erroneous detection.09-20-2012
20100088093Voice Command Acquisition System and Method - A voice command acquisition method and system for motor vehicles is improved in that noise source information is obtained directly from the vehicle system bus. Upon receiving an input signal with a voice command, the system bus is queried for one or more possible sources of a noise component in the input signal. In addition to vehicle-internal information (e.g., window status, fan blower speed, vehicle speed), the system may acquire external information (e.g., weather status) in order to better classify the noise component in the input signal. If the noise source is found to be a window, for example, the driver may be prompted to close the window. In addition, if the fan blower is at a high speed level, it may be slowed down automatically.04-08-2010
20100082341Speaker recognition device and method using voice signal analysis - A device includes a speaker recognition device operable to perform a method that identifies a speaker using voice signal analysis. The speaker recognition device and method identifies the speaker by analyzing a voice signal and comparing the signal with voice signal characteristics of speakers, which are statistically classified. The device and method is applicable to a case where a voice signal is a voiced sound or a voiceless sound or to a case where no information on a voice signal is present. Since voice/non-voice determination is performed, the speaker can be reliably identified from the voice signal. The device and method is adaptable to applications that require a real-time process due to a small amount of data to be calculated and fast processing. Furthermore, the device and method can be variously applied to portable devices due to low power consumption.04-01-2010
20090281805INTEGRATED SPEECH INTELLIGIBILITY ENHANCEMENT SYSTEM AND ACOUSTIC ECHO CANCELLER - A system and method is described that improves the intelligibility of a far-end telephone speech signal to a user of a telephony device in the presence of near-end background noise. As described herein, the system and method improves the intelligibility of the far-end telephone speech signal in a manner that does not require user input and that minimizes the distortion of the far-end telephone speech signal. The system is integrated with an acoustic echo canceller and shares information therewith.11-12-2009
20080300871METHOD AND APPARATUS FOR IDENTIFYING ACOUSTIC BACKGROUND ENVIRONMENTS TO ENHANCE AUTOMATIC SPEECH RECOGNITION - Disclosed are systems, methods, and computer readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of a received audio signal from a caller, receiving meta-data information, classifying a background environment of the caller based on the analyzed acoustic features and the meta-data, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model.12-04-2008
20090287485ADAPTIVELY FILTERING A MICROPHONE SIGNAL RESPONSIVE TO VIBRATION SENSED IN A USER'S FACE WHILE SPEAKING - Electronic devices and methods are disclosed that adaptively filter a microphone signal responsive to vibration that is sensed in the face of a user speaking into a microphone of the device. An electronic device can include a microphone, a vibration sensor, a vibration characterization unit, and an adaptive sound filter. The microphone generates a microphone signal that can include a user speech component and a background noise component. The vibration sensor senses vibration of the face while a user speaks into the microphone, and generates a vibration signal containing frequency components that are indicative of the sensed vibration. The vibration characterization unit generates speech characterization data that characterize at least one of the frequency components of the vibration signal that is associated with the speech component of the microphone signal. The adaptive sound filter filters the microphone signal using filter coefficients that are tuned in response to the speech characterization data to generate a filtered speech signal with an attenuated background noise component relative to the user speech component from the microphone signal.11-19-2009
20090112584DYNAMIC NOISE REDUCTION - A speech enhancement system improves the speech quality and intelligibility of a speech signal. The system includes a time-to-frequency converter that converts segments of a speech signal into frequency bands. A signal detector measures the signal power of the frequency bands of each speech segment. A background noise estimator measures a background noise detected in the speech signal. A dynamic noise reduction controller dynamically models the background noise in the speech signal. The speech enhancement renders a speech signal perceptually pleasing to a listener by dynamically attenuating a portion of the noise that occurs in a portion of the spectrum of the speech signal.04-30-2009
20090150146MICROPHONE ARRAY BASED SPEECH RECOGNITION SYSTEM AND TARGET SPEECH EXTRACTING METHOD OF THE SYSTEM - A microphone-array-based speech recognition system using blind source separation (BSS) and a target speech extraction method in the system are provided. The speech recognition system performs an independent component analysis (ICA) to separate mixed signals input through a plurality of microphones into sound-source signals, extracts one target speech spoken for speech recognition from the separated sound-source signals by using a Gaussian mixture model (GMM) or a hidden Markov model (HMM), and automatically recognizes a desired speech from the extracted target speech. Accordingly, it is possible to obtain a high speech recognition rate even in a noisy environment.06-11-2009
20090276213ROBUST DOWNLINK SPEECH AND NOISE DETECTOR - A voice activity detection process is robust to a low and high signal-to-noise ratio speech and signal loss. A process divides an aural signal into one or more bands. Signal magnitudes of frequency components and the respective noise components are estimated. A noise adaptation rate modifies estimates of noise components based on differences between the signal to the estimated noise and signal variability.11-05-2009
20110208521Hidden Markov Model for Speech Processing with Training Method - A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.08-25-2011
20080281591METHOD OF PATTERN RECOGNITION USING NOISE REDUCTION UNCERTAINTY - A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by the amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.11-13-2008
20080228478Targeted speech - A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a digital converter that converts a time-varying input signal into a digital-domain signal. A window function passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a criterion based on an output of the background voice detector and an output of the noise estimator.09-18-2008
20080228477Method and Device For Processing a Voice Signal For Robust Speech Recognition - A speech signal is processed for subsequent speech recognition. The speech signal is tainted by noise and represents at least one speech command. The following steps are executed: a) recording of the noise-tainted speech signal; b) use of noise reduction on the speech signal to generate a noise-reduced speech signal; c) normalization of the noise-reduced speech signal to a target signal value with the aid of a normalization factor, to generate a noise-reduced, normalized speech signal.09-18-2008
20090265169Techniques for Comfort Noise Generation in a Communication System - A technique of operating a communication device includes dividing a frequency band associated with a background noise signal into respective sub-bands. Respective individual level estimates for each of the respective sub-bands are then determined. A total level estimate for the background noise signal is determined. Finally, a comfort noise signal (whose characteristics are based on the respective individual level estimates and the total level estimate) is provided.10-22-2009
20090089054APPARATUS AND METHOD OF NOISE AND ECHO REDUCTION IN MULTIPLE MICROPHONE AUDIO SYSTEMS - Multiple microphone noise suppression apparatus and methods are described herein. The apparatus and methods implement a variety of noise suppression techniques and apparatus that can be selectively applied to signals received using multiple microphones. The microphone signals received at each of the multiple microphones can be independently processed to cancel echo signal components that can be generated from a local audio source. The echo cancelled signals may be processed by some or all modules within a signal separator that operates to separate or otherwise isolate a speech signal from noise signals. The signal separator can include a pre-processing de-correlator followed by a blind source separator. The output of the blind source separator can be post filtered to provide post separation de-correlation. The separated speech and noise signals can be non-linearly processed for further noise reduction, and additional post processing can be implemented following the non-linear processing.04-02-2009
20100004929APPARATUS AND METHOD FOR CANCELING NOISE OF VOICE SIGNAL IN ELECTRONIC APPARATUS - An apparatus and a method for canceling noise in a voice signal in an electronic apparatus are provided. The apparatus includes a Generalized Sidelobe Canceller (GSC) and a decision unit. The GSC cancels noise components from signals with different phases input via a plurality of microphones. The decision unit estimates a Signal-to-Noise Ratio (SNR) of an input signal to determine a step-size of a filter included in the GSC.01-07-2010
20090132248TIME-DOMAIN RECEIVE-SIDE DYNAMIC CONTROL - A system improves the speech intelligibility and the speech quality of a speech segment. The system includes a dynamic controller that detects a background noise from an input by modeling a signal. A variable gain amplifier adjusts the variable gain of the amplifier in response to an output of the dynamic controller. A shaping filter adjusts a speech signal by tilting portions of the speech signal of the dynamic controller.05-21-2009
20080306736METHOD AND SYSTEM FOR A SUBBAND ACOUSTIC ECHO CANCELLER WITH INTEGRATED VOICE ACTIVITY DETECTION - Methods and systems for a subband acoustic echo canceller with integrated voice activity detection are disclosed and may include adjusting transmit and/or receive powers of wirelessly communicated audio signals based on voice activity detection via subband analysis of the wirelessly communicated audio signals. The receive power may be adjusted by utilizing a reduced duty cycle, or by conveying voice activity detection information via an asynchronous control channel in a Bluetooth application. A plurality of subbands may be generated utilizing a fast Fourier transform, and a first subset of the subbands corresponding to voice activity may be selected and a second subset of the subbands may be selected that corresponds to background noise. The processing of the subsets may be dynamically adjusted due to variations in the voice activity or background noise. Comfort noise may be generated and transmitted at a reduced bandwidth utilizing the second subset of the subbands.12-11-2008
20130218560METHOD AND APPARATUS FOR AUDIO INTELLIGIBILITY ENHANCEMENT AND COMPUTING APPARATUS - Method and apparatus for audio intelligibility enhancement and computing apparatus are provided. The method includes the following steps. Environment noise is detected by performing voice activity detection according to a detected audio signal from at least a microphone of a computing device. Noise information is obtained according to the detected environment noise and a first audio signal. A second audio signal is outputted by boosting the first audio signal under an adjustable headroom by the computing device according to the noise information and the first audio signal.08-22-2013
20110224979Enhancing Speech Recognition Using Visual Information - A speech recognition device uses visual information to narrow down the range of likely adaptation parameters even before a speaker makes an utterance. Images of the speaker and/or the environment are collected using an image capturing device, and then processed to extract biometric features and environmental features. The extracted biometric features and environmental features are then used to estimate adaptation parameters. A voice sample may also be collected to refine the adaptation parameters for more accurate speech recognition.09-15-2011
20090198492ADAPTIVE NOISE MODELING SPEECH RECOGNITION SYSTEM - An adaptive noise modeling speech recognition system improves speech recognition by modifying an activation of the system's grammar rules or models based on detected noise characteristics. An adaptive noise modeling speech recognition system includes a sensor that receives acoustic data having a speech component and a noise component. A processor analyzes the acoustic data and generates a noise indicator that identifies a characteristic of the noise component. An integrating decision logic processes the noise indicator and generates a noise model activation data structure that includes data that may be used by a speech recognition engine to adjust the activation of associated grammar rules or models.08-06-2009
20110144988EMBEDDED AUDITORY SYSTEM AND METHOD FOR PROCESSING VOICE SIGNAL - An embedded auditory system includes a voice detecting unit for receiving a voice signal as an input and dividing the voice signal into a voice section and a non-voice section; a noise removing unit for removing a noise in the voice section of the voice signal using noise information in the non-voice section of the voice signal; and a keyword spotting unit for extracting a feature vector from the voice signal noise-removed by the noise removing unit and detecting a keyword from the voice section of the voice signal using the feature vector. A method for processing a voice signal includes receiving a voice signal as an input and dividing the voice signal into a voice section and a non-voice section; removing a noise in the voice section of the voice signal using noise information in the non-voice section of the voice signal; and extracting a feature vector from the voice signal noise-removed by the noise removing unit and detecting a keyword from the voice section of the voice signal using the feature vector.06-16-2011
20110224980SPEECH RECOGNITION SYSTEM AND SPEECH RECOGNIZING METHOD - A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.09-15-2011
20090063143SYSTEM FOR SPEECH SIGNAL ENHANCEMENT IN A NOISY ENVIRONMENT THROUGH CORRECTIVE ADJUSTMENT OF SPECTRAL NOISE POWER DENSITY ESTIMATIONS - A system that estimates the spectral noise power density of an audio signal includes a spectral noise power density estimation unit, a correction term processor, and a combination processor. The spectral noise power density estimation unit may provide a first estimate of the spectral noise power density of the audio signal. The correction term processor may provide a time dependent correction term based, at least in part, on a spectral noise power density estimation error of the actual spectral noise power density. The correction term may be determined so that the spectral noise power density estimation error is reduced. The combination processor may combine the first estimate with the correction term to obtain a second estimate of the spectral noise power density that may be used for subsequent signal processing to enhance a desired signal component of the audio signal.03-05-2009
20120078624METHOD FOR DETECTING VOICE SECTION FROM TIME-SPACE BY USING AUDIO AND VIDEO INFORMATION AND APPARATUS THEREOF - The present invention relates to a method for detecting a voice section in time-space by using audio and video information. According to an embodiment of the present invention, a method for detecting a voice section from time-space by using audio and video information comprises the steps of: detecting a voice section in an audio signal which is inputted into a microphone array; verifying a speaker from the detected voice section; sensing the face of the speaker by using a video signal which is inputted into a camera if the speaker is successfully verified, and then estimating the direction of the face of the speaker; and determining the detected voice section as the voice section of the speaker if the estimated face direction corresponds to a reference direction which is previously stored.03-29-2012
20090006088SYSTEM AND METHOD OF PERFORMING SPEECH RECOGNITION BASED ON A USER IDENTIFIER - Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise and transducer information such as transducer response characteristics to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices and repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants as well as environments such as office, home or vehicle while maintaining the accuracy of the speech recognition.01-01-2009
20090210224SYSTEM, METHOD AND PROGRAM FOR SPEECH PROCESSING - The present invention relates to a system, method and program for speech recognition. In an embodiment of the invention a method for processing a speech signal consists of receiving a power spectrum of a speech signal and generating a log power spectrum signal of the power spectrum. The method further consists of performing discrete cosine transformation on the log power spectrum signal and cutting off cepstrum upper and lower terms of the discrete cosine transformed signal. The method further consists of performing inverse discrete cosine transformation on the signal from which the cepstrum upper and lower terms are cut off. The method further consists of converting the inverse discrete cosine transformed signal so as to bring the signal back to a power spectrum domain and filtering the power spectrum of the speech signal by using, as a filter, the signal which is brought back to the power spectrum domain.08-20-2009
20090222264SUB-BAND CODEC WITH NATIVE VOICE ACTIVITY DETECTION - An augmented version of a Low-Complexity Sub-band Coder (LC-SBC) is described herein that is better suited than conventional LC-SBC for wideband voice communication in the Bluetooth™ framework, where minimizing the power consumption is of paramount importance. The augmented version of LC-SBC utilizes certain techniques associated with speech coding, such as Voice Activity Detection (VAD), to reduce bandwidth usage and power consumption while maintaining voice quality. The augmented version of LC-SBC reduces the average bit rate used for transmitting wideband speech in a manner that does not add significant computational complexity. Furthermore, the augmented version of LC-SBC may advantageously be implemented in a manner that does not require any modification of the underlying logic/structure of LC-SBC.09-03-2009
20090254341APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT FOR JUDGING SPEECH/NON-SPEECH - A spectrum calculating unit calculates, for each of the frames, a spectrum by performing a frequency analysis on an acoustic signal. An estimating unit estimates a noise spectrum. An energy calculating unit calculates an energy characteristic amount. An entropy calculating unit calculates a normalized spectral entropy value. A generating unit generates a characteristic vector based on the energy characteristic amounts and the normalized spectral entropy values that have been calculated for a plurality of frames. A likelihood calculating unit calculates a speech likelihood value of a target frame that corresponds to the characteristic vector. In a case where the speech likelihood value is larger than a threshold value, a judging unit judges that the target frame is a speech frame.10-08-2009
20120197639SYSTEM THAT DETECTS AND IDENTIFIES PERIODIC INTERFERENCE - A system improves speech detection or processing by identifying registration signals. The system encodes a limited frequency band by varying the amplitude of a pulse width modulated signal between predefined values. The signal is separated into frequency bins that identify amplitude and phase. The registration signal is measured by comparing a difference in average acoustic power in a plurality of adjacent bins over time.08-02-2012
20100262425NOISE SUPPRESSION DEVICE AND NOISE SUPPRESSION METHOD - Disclosed is a noise suppression device capable of better noise suppression by means of a simpler structure and with a lighter computational load. A noise suppression device (10-14-2010
20100262424 Method of Eliminating Background Noise and a Device Using the Same - The present invention provides a method of eliminating background noise and a device using the same. The method of eliminating background noise comprises the steps of: detecting an effective value of a received audio signal, and generating an average power signal of the received audio signal; generating a noise eliminating control signal by comparing the average power signal with a first threshold; and eliminating the noise, and amplifying the voice signal using the noise eliminating control signal. A device of eliminating background noise comprises a detecting unit, which is configured to detect an effective value, and generate an average power signal of the received audio signal; a first signal generating unit, which is configured to generate a noise eliminating control signal; and an amplifying unit, which is configured to eliminate the noise, and amplify the voice signal.10-14-2010
20100262423FEATURE COMPENSATION APPROACH TO ROBUST SPEECH RECOGNITION - Described is a technology by which a feature compensation approach to speech recognition uses a high-order vector Taylor series (HOVTS) approximation of a model of distortions to improve recognition accuracy. Speech recognizer models trained with clean speech degrade when later dealing with speech that is corrupted by additive noises and convolutional distortions. The approach attempts to remove any such noise/distortions from the input speech. To use the HOVTS approximation, a Gaussian mixture model is trained and used to convert cepstral domain feature vectors to log spectrum components. HOVTS computes statistics for the components, which are transformed back to the cepstral domain. A noise/distortion estimate is obtained, and used to provide a clean speech estimate to the recognizer.10-14-2010
20100161326SPEECH RECOGNITION SYSTEM AND METHOD - A speech recognition system includes: a speed level classifier for measuring a moving speed of a moving object by using a noise signal at an initial time of speech recognition to determine a speed level of the moving object; a first speech enhancement unit for enhancing sound quality of an input speech signal of the speech recognition by using a Wiener filter, if the speed level of the moving object is equal to or lower than a specific level; and a second speech enhancement unit for enhancing the sound quality of the input speech signal by using a Gaussian mixture model, if the speed level of the moving object is higher than the specific level. The system further includes an end point detection unit for detecting start and end points, and an elimination unit for eliminating sudden noise components based on a sudden noise Gaussian mixture model.06-24-2010
20100217590SPEAKER LOCALIZATION SYSTEM AND METHOD - A system and method for performing speaker localization is described. The system and method utilizes speaker recognition to provide an estimate of the direction of arrival (DOA) of speech sound waves emanating from a desired speaker with respect to a microphone array included in the system. Candidate DOA estimates may be preselected or generated by one or more other DOA estimation techniques. The system and method is suited to support steerable beamforming as well as other applications that utilize or benefit from DOA estimation. The system and method provides robust performance even in systems and devices having small microphone arrays and thus may advantageously be implemented to steer a beamformer in a cellular telephone or other mobile telephony terminal featuring a speakerphone mode.08-26-2010
20090076815Speech Recognition Apparatus, Speech Recognition Apparatus and Program Thereof - Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction. Further, maximum likelihood estimation is executed by using voice data of the component of the sound source direction passed through these processes, and a voice model obtained by predetermined modeling of the voice data, and speech recognition is carried out based on an obtained assumption value.03-19-2009
20090076813METHOD FOR SPEECH RECOGNITION USING UNCERTAINTY INFORMATION FOR SUB-BANDS IN NOISE ENVIRONMENT AND APPARATUS THEREOF - According to the method and apparatus of the present invention for speech recognition in a noise environment using uncertainty information for sub-bands, uncertainty information of each sub-band is extracted from estimated clean speech using noise modeling, and the extracted uncertainty information is used as a weight for each sub-band to help extract speech features that are robust to noise. Also, an acoustic model is converted according to each sub-band weight, and speech recognition is performed based on the converted acoustic model and the extracted speech features. As a result, even when the noise modeling over time is not very accurate, the noise influence resulting from highly corrupted sub-bands can be reduced according to the uncertainty information of the corresponding sub-band, and speech recognition performance in complex noise environments can be improved.03-19-2009
20100241428METHOD AND SYSTEM FOR BEAMFORMING USING A MICROPHONE ARRAY09-23-2010
20100211388Speech Enhancement with Voice Clarity - A method for enhancing speech components of an audio signal composed of speech and noise components processes subbands of the audio signal, the processing including controlling the gain of the audio signal in ones of the subbands, wherein the gain in a subband is controlled at least by processes that convey either additive/subtractive differences in gain or multiplicative ratios of gain so as to reduce gain in a subband as the level of noise components increases with respect to the level of speech components in the subband and increase gain in a subband when speech components are present in subbands of the audio signal, the processes each responding to subbands of the audio signal and controlling gain independently of each other to provide a processed subband audio signal.08-19-2010
20090222263Method and Apparatus for Transmitting Speech Data To a Remote Device In a Distributed Speech Recognition System - A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.09-03-2009
20080312918VOICE PERFORMANCE EVALUATION SYSTEM AND METHOD FOR LONG-DISTANCE VOICE RECOGNITION - A system and a method are provided for evaluating a voice performance in order to recognize a long-distance voice. The system implements a voice performance evaluation function for long-distance voice input in a robot. Particularly, in robots including a network robot, it is required to normally perform voice recognition so that a speaking subject and a surrounding situation can be recognized by a robot. Accordingly, in order to obtain optimal voice quality, it is important to find a noise removal algorithm through an optimal hardware configuration and an optimal combination of that hardware configuration and software. Therefore, a method is provided for finding a noise removal algorithm appropriate for each case, including one case where the distance from a speaking subject is fixed and another case where the distance from a speaking subject changes. As a result, optimal voice quality can be obtained regardless of the noise environment even when the speaking subject is a long distance away from the robot.12-18-2008
20090177468SPEECH RECOGNITION WITH NON-LINEAR NOISE REDUCTION ON MEL-FREQUENCY CEPTRA - In an automatic speech recognition system, a feature extractor extracts features from a speech signal, and speech is recognized by the automatic speech recognition system based on the extracted features. Noise reduction as part of the feature extractor is provided by feature enhancement in which feature-domain noise reduction in the form of Mel-frequency cepstra is provided based on the minimum mean square error criterion. Specifically, the devised method takes into account the random phase between the clean speech and the mixing noise. The feature-domain noise reduction is performed in a dimension-wise fashion to the individual dimensions of the feature vectors input to the automatic speech recognition system, in order to perform environment-robust speech recognition.07-09-2009
20110040560METHOD AND MEANS FOR DECODING BACKGROUND NOISE INFORMATION - A basic idea of the invention is to ascertain information on the course of the bit rate switching during an active speech phase. According to the invention, during the speech phase, information on the percentage proportion of broadband active speech frames in comparison to narrowband active speech frames is compiled on the part of the decoder. A high percentage proportion of broadband active speech frames indicates that a broadband use is preferred on the part of the codec and therefore a need exists for synthesizing noise information in broadband form during a DTX phase.02-17-2011
20080249772APPARATUS AND METHOD FOR ENHANCING SPEECH INTELLIGIBILITY IN A MOBILE TERMINAL - An apparatus and a method for enhancing speech intelligibility in a mobile terminal. A complex spectrum calculator calculates complex spectra of one input frame of an input speech signal by Fourier transform, a speech level calculator calculates its instant levels, an average speech level calculator calculates an average speech level of the speech frame using the instant levels, if the input frame is a speech frame, a scaling factor calculator calculates scaling factors by comparing the average speech level with the instant levels, an HPF characteristic calculator calculates amplitude characteristics using the scaling factors, a HPF high-pass-filters the complex spectra using the amplitude characteristics, a synthesizer converts high-pass-filtered signals to time signals by inverse Fourier transform and synthesizes the time signals, and a combiner outputs an enhanced intelligibility speech signal by combining the synthesized time signal with the input frame.10-09-2008
20090070108METHOD AND SYSTEM FOR IDENTIFYING SPEECH SOUND AND NON-SPEECH SOUND IN AN ENVIRONMENT - In a method and system for identifying speech sound and non-speech sound in an environment, a speech signal and other non-speech signals are identified from a mixed sound source having a plurality of channels. The method includes the following steps: (a) using a blind source separation (BSS) unit to separate the mixed sound source into a plurality of sound signals; (b) storing spectrum of each of the sound signals; (c) calculating spectrum fluctuation of each of the sound signals in accordance with stored past spectrum information and current spectrum information sent from the blind source separation unit; and (d) identifying one of the sound signals that has a largest spectrum fluctuation as the speech signal.03-12-2009
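Steps (b) to (d) of the entry above can be sketched as follows. The entry does not say how spectrum fluctuation is computed, so the mean absolute difference between the stored and current magnitude spectra used here is an assumption, as are the function names. The blind source separation of step (a) is taken as already done.

```python
from typing import List, Sequence


def spectrum_fluctuation(past: Sequence[float], current: Sequence[float]) -> float:
    """Mean absolute change per bin between stored and current spectra (assumed measure)."""
    return sum(abs(c - p) for p, c in zip(past, current)) / len(current)


def pick_speech_channel(past_spectra: List[Sequence[float]],
                        current_spectra: List[Sequence[float]]) -> int:
    """Return the index of the separated signal with the largest fluctuation."""
    fluctuations = [spectrum_fluctuation(p, c)
                    for p, c in zip(past_spectra, current_spectra)]
    return max(range(len(fluctuations)), key=fluctuations.__getitem__)


# Two separated signals: the first is nearly stationary, the second varies strongly.
past = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]]
current = [[1.0, 1.0, 1.1], [3.0, 0.5, 4.0]]
print(pick_speech_channel(past, current))
```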
20130138437SPEECH RECOGNITION APPARATUS BASED ON CEPSTRUM FEATURE VECTOR AND METHOD THEREOF - A speech recognition apparatus, includes a reliability estimating unit configured to estimate reliability of a time-frequency segment from an input voice signal; and a reliability reflecting unit configured to reflect the reliability of the time-frequency segment to a normalized cepstrum feature vector extracted from the input speech signal and a cepstrum average vector included for each state of an HMM in decoding. Further, the speech recognition apparatus includes a cepstrum transforming unit configured to transform the cepstrum feature vector and the average vector through a discrete cosine transformation matrix and calculate a transformed cepstrum vector. Furthermore, the speech recognition apparatus includes an output probability calculating unit configured to calculate an output probability value of time-frequency segments of the input speech signal by applying the transformed cepstrum vector to the cepstrum feature vector and the average vector.05-30-2013
20130144618METHODS AND ELECTRONIC DEVICES FOR SPEECH RECOGNITION - A disclosed embodiment provides a speech recognition method to be performed by an electronic device. The method includes: collecting user-specific information that is specific to a user through the user's usage of the electronic device; recording an utterance made by the user; letting a remote server generate a remote speech recognition result for the recorded utterance; generating rescoring information for the recorded utterance based on the collected user-specific information; and letting the remote speech recognition result be rescored based on the rescoring information.06-06-2013
20090187402Performance Prediction For An Interactive Speech Recognition System - The present invention provides an interactive speech recognition system and a corresponding method for determining a performance level of a speech recognition procedure on the basis of recorded background noise. The inventive system effectively exploits speech pauses that occur before the user enters speech that becomes subject to speech recognition. Preferably, the inventive performance prediction makes effective use of trained noise classification models. Moreover, predicted performance levels are indicated to the user in order to give a reliable feedback of the performance of the speech recognition procedure. In this way the interactive speech recognition system may react to noise conditions that are inappropriate for generating reliable speech recognition.07-23-2009
20110004472Speech Recognition Using Channel Verification - A method for automatic speech recognition includes determining for an input signal a plurality of scores representative of certainties that the input signal is associated with corresponding states of a speech recognition model, using the speech recognition model and the determined scores to compute an average signal, computing a difference value representative of a difference between the input signal and the average signal, and processing the input signal in accordance with the difference value.01-06-2011
20110010171Singular Value Decomposition for Improved Voice Recognition in Presence of Multi-Talker Background Noise - A system and method for providing speech recognition functionality offers improved accuracy and robustness in noisy environments having multiple speakers. The described technique includes receiving speech energy and converting the received speech energy to a digitized form. The digitized speech energy is decomposed into features that are then projected into a feature space having multiple speaker subspaces. The projected features fall either into one of the multiple speaker subspaces or outside of all speaker subspaces. A speech recognition operation is performed on a selected one of the multiple speaker subspaces to resolve the utterance to a command or data.01-13-2011
20110010172NOISE REDUCTION SYSTEM USING A SENSOR BASED SPEECH DETECTOR - Speech detection is a technique to determine and classify periods of speech. In a normal conversation, each speaker speaks less than half the time. The remaining time is devoted to listening to the other end and pauses between speech and silence. The classification is usually done by comparing the signal energy to a threshold. Classifying speech as noise and noise as speech may affect the performance of the communication device. The current invention overcomes such problems by utilizing an alternate sensor signal indicating the presence or absence of speech. In the current invention, the communication device receives an audio signal via single or multiple microphones. The speech sensor may generate a unique signal based on the facial, bone, lips and/or throat movements. The system then combines the information received by the microphones and the speech sensor to decide the presence or absence of speech. This decision can be used in the coding, compression, noise reduction and other aspects of signal processing.01-13-2011
20110029310PROCEDURE FOR PROCESSING NOISY SPEECH SIGNALS, AND APPARATUS AND COMPUTER PROGRAM THEREFOR - Provided are a noise state determination method and an apparatus and a computer readable recording medium therefor. A noisy speech signal processing method according to the present invention includes calculating a transformed spectrum by transforming an input noisy speech signal to a frequency domain; calculating a smoothed magnitude spectrum by reducing magnitude differences of the transformed spectrum between neighboring frames; calculating a search spectrum which represents an estimated noise component of the smoothed magnitude spectrum; and calculating an identification ratio which represents a ratio of a noise component included in the input noisy speech signal, by using the smoothed magnitude spectrum and the search spectrum. Since a small amount of calculation is required and a large-capacity memory is not required, the present invention may be easily implemented as hardware or software. Also, since an adaptive operation is performed with respect to each frequency sub-band, the accuracy of determining a noise state may be improved.02-03-2011
20110029308Speech & Music Discriminator for Multi-Media Application - The present invention relates to means and methods of classifying speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.02-03-2011
20110029309SIGNAL SEPARATING APPARATUS AND SIGNAL SEPARATING METHOD - Provided are a signal separating apparatus and a signal separating method capable of solving the permutation problem and separating user speech to be extracted. The signal separating apparatus separates a specific speech signal and a noise signal from a received sound signal. First, a joint probability density distribution estimation unit of a permutation solving unit calculates joint probability density distributions of the respective separated signals. Then, a classifying determination unit of the permutation solving unit determines classifying based on shapes of the calculated joint probability density distributions.02-03-2011
20110035216SPEECH RECOGNITION METHOD FOR ALL LANGUAGES WITHOUT USING SAMPLES - The invention can recognize any several languages at the same time without using samples. The important skill is that features of known words in any language are extracted from unknown words or continuous voices. These unknown words represented by matrices are spread in the 144-dimensional space. The feature of a known word of any language represented by a matrix is simulated by the surrounding unknown words.02-10-2011
20110246193SIGNAL SEPARATION METHOD, AND COMMUNICATION SYSTEM SPEECH RECOGNITION SYSTEM USING THE SIGNAL SEPARATION METHOD - A method for signal separation, communication system, and voice recognition system using the method are disclosed. The method which is performed by an apparatus for signal separation includes receiving a mixed signal, wherein a first signal based on a first sound source signal and a second signal based on a second sound source signal are mixed, via a single voice input sensor, applying the modified BSS algorithm for separating the first sound source signal and the second sound source signal based on the received mixed signal, and separating the first sound source signal according to the result of applying the modified BSS algorithm.10-06-2011
20100131268VOICE-ESTIMATION INTERFACE AND COMMUNICATION SYSTEM - An apparatus having a voice-estimation (VE) interface that probes the vocal tract of a user with sub-threshold acoustic waves to estimate the user's voice while the user speaks silently or audibly in a noisy or socially sensitive environment. In one embodiment, the VE interface is integrated into a cell phone that directs an estimated-voice signal over a network to a remote party to enable (i) the user to have a conversation with the remote party without disturbing other people, e.g., at a meeting, conference, movie, or performance, and (ii) the remote party to more-clearly hear the user whose voice would otherwise be overwhelmed by a relatively loud ambient noise due to the user being, e.g., in a nightclub, disco, or flying aircraft.05-27-2010
20090216529ELECTRONIC DEVICES AND METHODS THAT ADAPT FILTERING OF A MICROPHONE SIGNAL RESPONSIVE TO RECOGNITION OF A TARGETED SPEAKER'S VOICE - Electronic devices and methods are disclosed that adaptively filter a microphone signal responsive to recognition of a targeted speaker's voice. An electronic device can include a microphone, a speaker characterization circuit, an adaptive sound filter circuit, and a speaker recognition circuit. The speaker characterization circuit operates in a training mode to learn characteristics of the targeted speaker's voice component in the microphone signal, and to store the learned characteristics. The adaptive sound filter circuit adaptively filters the microphone signal responsive to a control signal. The speaker recognition circuit uses the learned characteristics to recognize the presence of the targeted speaker's voice in the microphone signal and to regulate the control signal to cause the adaptive sound filter circuit to adapt the filtering to increase the targeted speaker's voice component of the microphone signal relative to other components.08-27-2009
20090216530Interference detector - A system improves speech detection or processing by identifying registration signals. The system encodes a limited frequency band by varying the amplitude of a pulse width modulated signal between predefined values. The signal is separated into frequency bins that identify amplitude and phase. The registration signal is measured by comparing a difference in average acoustic power in a plurality of adjacent bins over time.08-27-2009
20100070274APPARATUS AND METHOD FOR SPEECH RECOGNITION BASED ON SOUND SOURCE SEPARATION AND SOUND SOURCE IDENTIFICATION - An apparatus for a speech recognition based on source separation and identification includes: a sound source separator for separating mixed signals, which are input to two or more microphones, into sound source signals by using independent component analysis (ICA), and estimating direction information of the separated sound source signals; and a speech recognizer for calculating normalized log likelihood probabilities of the separated sound source signals. The apparatus further includes a speech signal identifier identifying a sound source corresponding to a user's speech signal by using both of the estimated direction information and the reliability information based on the normalized log likelihood probabilities.03-18-2010
20100057454SYSTEM AND METHOD FOR ECHO CANCELLATION - An echo canceller for improved recognition and removal of an echo from a communication device. The echo canceller can dynamically reduce echo using an improved energy estimator and an improved adaptive filter. The improved energy estimator can determine if conversation is in a single talk period or a double talk period based on the combined energy of both the near end background noise and speech. The improved adaptive filter can reduce echo by dynamically changing adaptation speed or step size. In double talk, the adaptive filter(s) can dynamically slow-down or stop adaptation. In single talk, the filter can dynamically increase the speed of adaptation to improve accuracy, or decrease adaptation speed for stability.03-04-2010
20110099010MULTI-CHANNEL NOISE SUPPRESSION SYSTEM - Techniques are described herein that provide multi-channel noise suppression based on a Teager energy ratio. A Teager energy ratio is a ratio of an average Teager energy operator (TEO) energy of a first signal to an average TEO energy of a second signal. The average TEO energy of a signal is defined by the equation:04-28-2011
20110071825DEVICE, METHOD AND PROGRAM FOR VOICE DETECTION AND RECORDING MEDIUM - To this end, a voice detection device includes a band-based power calculation unit that calculates a total of signal power values (sub-band power) of signals entered from the microphones from one preset frequency width (sub-band) to another. The voice detection device also includes a band-based noise estimation unit that estimates the sub-band based noise power, and a sub-band based SNR calculation unit. The sub-band based SNR calculation unit calculates a sub-band SNR from one sub-band to another to output the largest one of the sub-band SNRs as an SNR for a microphone of interest. The voice detection device further includes a voice/non-voice decision unit that determines the voice/non-voice using the SNR for the microphone of interest.03-24-2011
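The sub-band SNR decision in the entry above can be sketched as follows. This is an illustrative reading only: the noise powers are passed in rather than estimated, and the decibel scale, the `threshold_db` value, and all names are assumptions introduced here.

```python
import math
from typing import List, Sequence


def subband_powers(power_spectrum: Sequence[float], band_width: int) -> List[float]:
    """Sum the power in consecutive sub-bands of band_width bins each."""
    return [sum(power_spectrum[i:i + band_width])
            for i in range(0, len(power_spectrum), band_width)]


def microphone_snr_db(power_spectrum: Sequence[float], noise_powers: Sequence[float],
                      band_width: int) -> float:
    """Return the largest sub-band SNR (in dB) as the SNR for this microphone."""
    floor = 1e-12  # guards against division by zero and log of zero
    bands = subband_powers(power_spectrum, band_width)
    snrs = [10.0 * math.log10(max(p, floor) / max(n, floor))
            for p, n in zip(bands, noise_powers)]
    return max(snrs)


def is_voice(power_spectrum: Sequence[float], noise_powers: Sequence[float],
             band_width: int, threshold_db: float = 6.0) -> bool:
    """Voice/non-voice decision; the threshold value is a hypothetical choice."""
    return microphone_snr_db(power_spectrum, noise_powers, band_width) >= threshold_db


spectrum = [1.0, 1.0, 10.0, 10.0, 1.0, 1.0]   # per-bin power for one frame
noise = [2.0, 2.0, 2.0]                        # estimated noise power per sub-band
print(microphone_snr_db(spectrum, noise, 2), is_voice(spectrum, noise, 2))
```

Taking the maximum over sub-bands means a frame counts as voice as soon as any single band stands clearly above its noise estimate, which is the behaviour the entry describes.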
20120303367Robust Noise Estimation - An enhancement system improves the estimate of noise from a received signal. The system includes a spectrum monitor that divides a portion of the signal at more than one frequency resolution. Adaptation logic derives a noise adaptation factor of the received signal. A plurality of devices tracks the characteristics of an estimated noise in the received signal and modifies multiple noise adaptation rates. Weighting logic applies the modified noise adaptation rates derived from the signal divided at a first frequency resolution to the signal divided at a second frequency resolution.11-29-2012
20120303366SYSTEM FOR DETECTING SPEECH WITH BACKGROUND VOICE ESTIMATES AND NOISE ESTIMATES - A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a window function that passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.11-29-2012
20130191124VOICE PROCESSING APPARATUS, METHOD AND PROGRAM - Provided is a voice processing apparatus including a feature quantity calculation section extracting a feature quantity from a target frame of an input voice signal, a sound pressure estimation candidate point updating section making each frame of the input voice signal a sound pressure estimation candidate point, retaining the feature quantity of each sound pressure estimation candidate point, and updating the sound pressure estimation candidate point based on the feature quantity of the sound pressure estimation candidate point and the feature quantity of the target frame, a sound pressure estimation section calculating an estimated sound pressure of the input voice signal, based on the feature quantity of the sound pressure estimation candidate point, a gain calculation section calculating a gain applied to the input voice signal based on the estimated sound pressure, and a gain application section performing a gain adjustment of the input voice signal based on the gain.07-25-2013
20100004928VOICE/MUSIC DETERMINING APPARATUS AND METHOD - A voice/music determining apparatus is configured to calculate first feature parameters for discriminating between a voice signal and a musical signal; and calculate second feature parameters for discriminating between a musical signal and a background-sound-superimposed voice signal. A first score is calculated to indicate likelihood that the input audio signal is a voice signal or a musical signal as a sum of weight-multiplied first feature parameters. A second score is calculated to indicate likelihood that the input audio signal is a musical signal or a background-sound-superimposed voice signal as a sum of weight-multiplied second feature parameters. It is determined whether the input audio signal is a voice signal or a musical signal on the basis of the first score. Further, it is determined whether the input audio signal is a musical signal or a background-sound-superimposed voice signal on the basis of the second score.01-07-2010
20120203550INTERIOR REARVIEW MIRROR SYSTEM FOR VEHICLE - An interior rearview mirror system suitable for use in a vehicle includes an interior rearview mirror assembly having a mirror head and a reflective element. The mirror head includes a first microphone operable to generate a first analog signal and a second microphone operable to generate a second analog signal. The first analog signal is converted to a first digital signal by at least one analog to digital converter and the second analog signal is converted to a second digital signal by the at least one analog to digital converter. A digital sound processor is operable to process the first and second digital signals. Responsive to the processing of the first and second digital signals, the digital sound processor generates a digital output, and the digital output, at least in part, distinguishes a human voice present in the vehicle from noise present in the vehicle.08-09-2012
20120203549NOISE REJECTION APPARATUS, NOISE REJECTION METHOD AND NOISE REJECTION PROGRAM - A speech-segment determination process is performed to determine whether audio data is a speech segment. A result of the speech-segment determination process is memorized. A noise rejection process is performed to reject a noise component of the audio data while performing an adaptive process to change filter coefficients for adaptive filtration if a result of the determination process indicates that the audio data is not the speech segment. The noise component is rejected with no adaptive process if the result of the determination process indicates that the audio data is the speech segment. The determination process is performed again to the audio data having the noise component rejected and the rejection process is performed again to the audio data if a result of the determination process performed again is different from the memorized result of the determination process.08-09-2012
20090240496SPEECH RECOGNIZER AND SPEECH RECOGNIZING METHOD - According to one aspect of the invention, a speech recognizer includes: an audio data acquiring portion configured to acquire audio data via a microphone; a speech section detecting portion configured to detect a talking start time and a talking end time based on the audio data; a spoken word identifying portion configured to identify the audio in a speech section from the talking start time to the talking end time; and a noise suppressing portion configured to suppress a generation of a noise from an electrical noise source for the speech section.09-24-2009
20110213611METHOD AND DEVICE FOR CONTROLLING THE TRANSPORT OF AN OBJECT TO A PREDETERMINED DESTINATION - A method and a device control the transport of an object to a predetermined destination. The object is provided with information on a destination to which the object is to be transported. The destination information with which the object is provided is inputted into a speech detection station. A speech recognition system evaluates the destination information detected by the speech detection station. A conveying device transports the object. The destination, the information of which is provided to the object, is determined. The evaluation result of the speech recognition system is used to determine the destination. A release signal is produced. The release signal triggers two processes: the speech detection station is released for the input of destination information on another object. The conveying device transports the object. The transport of the object to the determined destination is triggered.09-01-2011
20110257971Camera-Assisted Noise Cancellation and Speech Recognition - Methods, system, and articles are described herein for receiving an audio input and a facial image sequence for a period of time, in which the audio input includes speech input from multiple speakers. The audio input is extracted based on the received facial image sequence to extract a speech input of a particular speaker.10-20-2011
20110054892SYSTEM FOR DETECTING SPEECH INTERVAL AND RECOGNIZING CONTINUOUS SPEECH IN A NOISY ENVIRONMENT THROUGH REAL-TIME RECOGNITION OF CALL COMMANDS - The present invention relates to a continuous speech recognition system that is very robust in a noisy environment. In order to recognize continuous speech smoothly in a noisy environment, the system selects call commands, configures a minimum recognition network in token, which consists of the call commands and mute intervals including noises, recognizes the inputted speech continuously in real time, analyzes the reliability of speech recognition continuously and recognizes the continuous speech from a speaker. When a speaker delivers a call command, the system for detecting the speech interval and recognizing continuous speech in a noisy environment through the real-time recognition of call commands measures the reliability of the speech after recognizing the call command, and recognizes the speech from the speaker by transferring the speech interval following the call command to a continuous speech-recognition engine at the moment when the system recognizes the call command.03-03-2011
20110054891METHOD OF FILTERING NON-STEADY LATERAL NOISE FOR A MULTI-MICROPHONE AUDIO DEVICE, IN PARTICULAR A "HANDS-FREE" TELEPHONE DEVICE FOR A MOTOR VEHICLE - The method comprises the following steps in the frequency domain: 03-03-2011
20100292987CIRCUIT STARTUP METHOD AND CIRCUIT STARTUP APPARATUS UTILIZING UTTERANCE ESTIMATION FOR USE IN SPEECH PROCESSING SYSTEM PROVIDED WITH SOUND COLLECTING DEVICE - A circuit startup method utilizing utterance estimation in a speech processing system including a sound collecting device is provided. The circuit startup method includes a subset power supply step of supplying power to the sound collecting device and a signal processing circuit, and a sound collecting step of inputting a sound from the sound collecting device through the signal processing circuit. The circuit startup method further includes an utterance estimation step of estimating whether or not a speech is contained in the inputted sound, and a power supply step of supplying power to the speech processing circuit for an utterance interval when it is estimated that a speech is contained from an estimation result of the utterance estimation step.11-18-2010
20110125497Method and System for Voice Activity Detection - A method of voice activity detection is provided that includes measuring a first signal level in a first sample of a first audio signal from a first audio capture device and a second signal level in a second sample of a second audio signal from a second audio capture device, and detecting voice activity based on the first signal level, the second signal level, and an activity threshold.05-26-2011
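A two-microphone level comparison of the kind described in the entry above might look like the sketch below. The entry does not state how the two levels and the threshold are combined; comparing the level difference in dB against the threshold is an assumption, as are the function names and the default threshold.

```python
import math
from typing import Sequence


def level_db(samples: Sequence[float]) -> float:
    """Mean-square signal level of one block of samples, in dB."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def detect_voice(first: Sequence[float], second: Sequence[float],
                 activity_threshold_db: float = 6.0) -> bool:
    """Assumed rule: voice is active when the first device is louder by the threshold."""
    return level_db(first) - level_db(second) >= activity_threshold_db


near = [0.5, -0.5, 0.5, -0.5]      # block from the device close to the talker
far = [0.05, -0.05, 0.05, -0.05]   # block from the device farther away
print(detect_voice(near, far))
```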
20120310640MIC COVERING DETECTION IN PERSONAL AUDIO DEVICES - A personal audio device, such as a wireless telephone, includes a noise canceling circuit that adaptively generates an anti-noise signal from a reference microphone signal and injects the anti-noise signal into the speaker or other transducer output to cause cancellation of ambient audio sounds. An error microphone may also be provided proximate the speaker to estimate an electro-acoustical path from the noise canceling circuit through the transducer. A processing circuit uses the reference and/or error microphone, optionally along with a microphone provided for capturing near-end speech, to determine whether one of the reference or error microphones is obstructed by comparing their received signal content and takes action to avoid generation of erroneous anti-noise.12-06-2012
20100121636Multisensory Speech Detection - A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.05-13-2010
20120310641Method And Apparatus For Voice Activity Determination - In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector for making a first voice activity detection decision based at least in part on the voice activity of a first audio signal received from a first microphone. The apparatus also comprises a second voice activity detector for making a second voice activity detection decision based at least in part on an estimate of a direction of the first audio signal and an estimate of a direction of a second audio signal received from a second microphone. The apparatus further comprises a classifier for making a third voice activity detection decision based at least in part on the first and second voice activity detection decisions.12-06-2012
20110178800Distortion Measurement for Noise Suppression System - The present technology measures distortion introduced by a noise suppression system. The distortion may be measured as the difference between a noise-reduced speech signal and an estimated idealized noise reduced reference (EINRR). The EINRR may be determined from a speech component and noise component that are pre-processed, and the EINRR may be used with masks associated with energies lost and added in the speech component and noise component. The EINRR may be calculated on a time varying basis.07-21-2011
20100082340SPEECH RECOGNITION SYSTEM AND METHOD FOR GENERATING A MASK OF THE SYSTEM - The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.04-01-2010
20090299742SYSTEMS, METHODS, APPARATUS, AND COMPUTER PROGRAM PRODUCTS FOR SPECTRAL CONTRAST ENHANCEMENT - Systems, methods, and apparatus for spectral contrast enhancement of speech signals, based on information from a noise reference that is derived by a spatially selective processing filter from a multichannel sensed audio signal, are disclosed.12-03-2009
20090299741Detection and Use of Acoustic Signal Quality Indicators - A computer-driven device assists a user in self-regulating speech control of the device. The device processes an input signal representing human speech to compute acoustic signal quality indicators indicating conditions likely to be problematic to speech recognition, and advises the user of those conditions.12-03-2009
20090254342DETECTING BARGE-IN IN A SPEECH DIALOGUE SYSTEM - A method for detecting barge-in in a speech dialogue system comprising determining whether a speech prompt is output by the speech dialogue system, and detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or based on speaker information, where the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined. If speech activity is detected in the input signal, the speech prompt may be interrupted or faded out. A speech dialogue system configured to detect barge-in is also disclosed.10-08-2009
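The prompt-dependent sensitivity threshold described in the entry above can be illustrated with a minimal sketch. It assumes a per-frame energy value as the speech-activity feature and a simple multiplicative boost while a prompt plays; the names `barge_in_detected`, `base_threshold`, and `prompt_boost` are hypothetical and do not come from the patent, which also mentions speaker information that is not modeled here.

```python
def barge_in_detected(frame_energy, prompt_playing,
                      base_threshold=1.0, prompt_boost=2.0):
    """Return True if the frame is treated as user speech.

    Assumption: the detector compares frame energy to a threshold.
    The threshold is raised while the system's own prompt is playing
    (so prompt leakage is less likely to trigger) and lowered otherwise.
    """
    if prompt_playing:
        threshold = base_threshold * prompt_boost  # less sensitive
    else:
        threshold = base_threshold                 # more sensitive
    return frame_energy > threshold
```

With these illustrative defaults, a frame of energy 1.5 counts as speech when no prompt is playing but is ignored during a prompt.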
20110144987USING PITCH DURING SPEECH RECOGNITION POST-PROCESSING TO IMPROVE RECOGNITION ACCURACY - A method of automated speech recognition in a vehicle. The method includes receiving audio in the vehicle, pre-processing the received audio to generate acoustic feature vectors, decoding the generated acoustic feature vectors to produce at least one speech hypothesis, and post-processing the at least one speech hypothesis using pitch to improve speech recognition accuracy. The speech hypothesis can be accepted as recognized speech during post-processing if pitch is present in the received audio. Alternatively, a pitch count for the received audio can be determined, N-best speech hypotheses can be post-processed by comparing the pitch count to syllable counts associated with the speech hypotheses, and the speech hypothesis having a syllable count equal to the pitch count can be accepted as recognized speech.06-16-2011
20110307253Speech and Noise Models for Speech Recognition - An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.12-15-2011
20090043577SIGNAL PRESENCE DETECTION USING BI-DIRECTIONAL COMMUNICATION DATA - A system and method for using bi-directional conversation data to improve signal presence detection are disclosed. The detector module is adapted to communicate with a signal enhancement module. The detector module collects data from a transmit direction of the connection and a receive direction of a data connection. The collected data from the transmit and the receive direction is used to classify at least one of data in the transmit direction and data in the receive direction. Responsive to the classification, the signal enhancement module enhances data in one of the transmit direction and the receive direction. Hence, data classification accuracy is improved by using data from both the transmit and receive directions. In one embodiment, the detector module applies a voice activity detection module (VAD) process to detect the presence or absence of voice data in the collected data.02-12-2009
20120209603ACOUSTIC VOICE ACTIVITY DETECTION - Techniques for acoustic voice activity detection (AVAD) are described, including detecting a signal associated with a subband from a microphone, performing an operation on data associated with the signal, the operation generating a value associated with the subband, and determining whether the value distinguishes the signal from noise by using the value to determine a signal-to-noise ratio and comparing the value to a threshold.08-16-2012
20120209604Method And Background Estimator For Voice Activity Detection - The present invention relates to a method and a background estimator in a voice activity detector for updating a background noise estimate for an input signal. The input signal for a current frame is received and it is determined whether the current frame of the input signal comprises non-noise. Further, an additional determination is performed whether the current frame of the non-noise input comprises noise by analyzing characteristics at least related to correlation and energy level of the input signal, and the background noise estimate is updated if it is determined that the current frame comprises noise.08-16-2012
20120004909SPEECH AUDIO PROCESSING - A speech processing engine is provided that in some embodiments, employs Kalman filtering with a particular speaker's glottal information to clean up an audio speech signal for more efficient automatic speech recognition.01-05-2012
20120046944ENVIRONMENT RECOGNITION OF AUDIO INPUT - The present disclosure introduces a new technique for environmental recognition of audio input using feature selection. In one embodiment, audio data may be identified using feature selection. A plurality of audio descriptors may be ranked by calculating a Fisher's discriminant ratio for each audio descriptor. Next, a configurable number of highest ranking audio descriptors based on the Fisher's discriminant ratio of each audio descriptor are selected to obtain a selected feature set. The selected feature set is then applied to audio data. Other embodiments are also described.02-23-2012
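The ranking step in the entry above can be sketched as follows. This is only an illustration under stated assumptions: it uses the common two-class form of Fisher's discriminant ratio, (mean_a - mean_b)^2 / (var_a + var_b), with class labels 0 and 1, which the abstract does not specify. The function names `fisher_ratio` and `select_descriptors` are hypothetical.

```python
from statistics import mean, pvariance

def fisher_ratio(class_a, class_b):
    """Two-class Fisher's discriminant ratio for one descriptor (assumed form)."""
    gap = (mean(class_a) - mean(class_b)) ** 2
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        # Degenerate case: no within-class variance at all.
        return 0.0 if gap == 0 else float("inf")
    return gap / spread

def select_descriptors(rows, labels, k):
    """Return indices of the k highest-ranking descriptors.

    rows   -- list of samples, each a list of descriptor values
    labels -- 0 or 1 for each sample (two environments, by assumption)
    k      -- configurable number of descriptors to keep
    """
    n_desc = len(rows[0])
    scores = []
    for j in range(n_desc):
        a = [r[j] for r, y in zip(rows, labels) if y == 0]
        b = [r[j] for r, y in zip(rows, labels) if y == 1]
        scores.append(fisher_ratio(a, b))
    ranked = sorted(range(n_desc), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

A descriptor whose class means are far apart relative to its within-class variance ranks first; one whose class means coincide scores zero and is dropped when `k` is small.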
20110166856NOISE PROFILE DETERMINATION FOR VOICE-RELATED FEATURE - Systems, methods, and devices for noise profile determination for a voice-related feature of an electronic device are provided. In one example, an electronic device capable of such noise profile determination may include a microphone and data processing circuitry. When a voice-related feature of the electronic device is not in use, the microphone may obtain ambient sounds. The data processing circuitry may determine a noise profile based at least in part on the obtained ambient sounds. The noise profile may enable the data processing circuitry to at least partially filter other ambient sounds obtained when the voice-related feature of the electronic device is in use.07-07-2011
20120022864METHOD AND DEVICE FOR CLASSIFYING BACKGROUND NOISE CONTAINED IN AN AUDIO SIGNAL - Embodiments of methods and devices for classifying background noise contained in an audio signal are disclosed. In one embodiment, the device includes a module for extracting from the audio signal a background noise signal, termed the noise signal. Also included is a second module that calculates a first parameter, termed the temporal indicator. The temporal indicator relates to the temporal evolution of the noise signal. The second module also calculates a second parameter, termed the frequency indicator. The frequency indicator relates to the frequency spectrum of the noise signal. Finally, the device includes a third module that classifies the background noise by selecting, as a function of the calculated values of the temporal indicator and of the frequency indicator, a class of background noise from among a predefined set of classes of background noise.01-26-2012
20120022863METHOD AND APPARATUS FOR VOICE ACTIVITY DETECTION - A method and apparatus for detecting voice activity are disclosed. The method of detecting voice activity is performed in a Continuous Listening environment which includes: extracting a feature parameter from a frame signal; determining whether the frame signal is a voice signal or a noise signal by comparing the feature parameter with model parameters of a plurality of comparison signals, respectively; and outputting the frame signal when the frame signal is determined to be a voice signal. The apparatus includes a classifier module which extracts a feature parameter from a frame signal, and generates labeling information with respect to the frame signal by comparing the feature parameter with model parameters of a plurality of comparison signals; and01-26-2012
20120158404APPARATUS AND METHOD FOR ISOLATING MULTI-CHANNEL SOUND SOURCE - In an apparatus and method for isolating a multi-channel sound source, the probability of speaker presence calculated when noise of a sound source signal separated by GSS is estimated is used to calculate a gain. Thus, it is not necessary to additionally calculate the probability of speaker presence when calculating the gain, the speaker's voice signal can be easily and quickly separated from peripheral noise and reverb and distortion are minimized. As such, if several interference sound sources, each of which has directivity, and speakers are simultaneously present in a room with high reverb, a plurality of sound sources generated from several microphones can be separated from one another with low sound quality distortion, and the reverb can also be removed.06-21-2012
20110106533Multi-Microphone Voice Activity Detector - A dual microphone voice activity detector system is presented. A voice activity detector system estimates the signal level and noise level at each microphone. A level differential between the two microphones of nearby sounds such as the signal is greater than the level differential of more distant sounds such as the noise. Thus, the voice activity detector detects the presence of nearby sounds.05-05-2011
20100094625METHODS AND APPARATUS FOR NOISE ESTIMATION - A system and method are disclosed for noise level/spectrum estimation and speech activity detection. Some embodiments include a probabilistic model to estimate noise level and subsequently detect the presence of speech. These embodiments outperform standard voice activity detectors (VADs), producing improved detection in a variety of noisy environments.04-15-2010
20120316872ADAPTIVE ACTIVE NOISE CANCELING FOR HANDSET - Embodiments of the present invention provide an adaptive noise canceling system. The adaptive noise canceling system may be used in a handset to cancel background noise by generating an anti-noise signal. The adaptive noise canceling system may include a first input to receive a first signal from a feedforward microphone; a second input to receive a second signal from an error microphone; a controller coupled to the inputs, the controller configured to adaptively generate an anti-noise signal according to the received signals, wherein the controller derives a profile of the anti-noise signal from the first signal and derives a magnitude of the anti-noise signal from both the first and second signals; and an output to transmit the anti-noise signal to a speaker.12-13-2012
20120123777ADJUSTING A SPEECH ENGINE FOR A MOBILE COMPUTING DEVICE BASED ON BACKGROUND NOISE - Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.05-17-2012
20120123776ADJUSTING A SPEECH ENGINE FOR A MOBILE COMPUTING DEVICE BASED ON BACKGROUND NOISE - Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.05-17-2012
20120084084Noise cancellation device for communications in high noise environments - This invention presents a noise cancellation device for improved personal face-to-face and radio communications in high noise environments. The device comprises speech acquisition components, an audio signal processing module, a loudspeaker, and a radio interface. With the noise cancellation device, the signal-to-noise ratio can be improved by as much as 30 dB.04-05-2012
20120166190APPARATUS FOR REMOVING NOISE FOR SOUND/VOICE RECOGNITION AND METHOD THEREOF - The present invention has been made in an effort to provide an apparatus for removing noise for sound/voice recognition removing a TV sound corresponding to a noise signal by using an adaptive filter capable of adapting a filter coefficient in order to remove an analogue signal and performing sound and/or voice recognition and a method thereof.06-28-2012
20100204988SPEECH RECOGNITION METHOD - A speech recognition method includes receiving a speech input signal in a first noise environment which includes a sequence of observations, determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, adapting the model trained in a second noise environment to that of the first environment, wherein adapting the model trained in the second environment to that of the first environment includes using second order or higher order Taylor expansion coefficients derived for a group of probability distributions and the same expansion coefficient is used for the whole group.08-12-2010
20100204987In-vehicle speech recognition device - A speech recognition device is disclosed. The device obtains sound of speech of a user and an image of a lip shape of the user. The device determines whether a sudden noise is generated during user speaking. When it is determined that a sudden noise is not generated, the device recognizes content of the speech based on the sound of the speech. When it is determined that a sudden noise is generated, the device recognizes the content of the speech based on the image of the lip shape of the user.08-12-2010
20110184734METHOD AND APPARATUS FOR VOICE ACTIVITY DETECTION, AND ENCODER - A method and an apparatus for Voice Activity Detection (VAD) and an encoder are provided. The method for VAD includes: acquiring a fluctuant feature value of a background noise when an input signal is the background noise, in which the fluctuant feature value is used to represent fluctuation of the background noise; performing adaptive adjustment on a VAD decision criterion related parameter according to the fluctuant feature value; and performing VAD decision on the input signal by using the decision criterion related parameter on which the adaptive adjustment is performed. The method, the apparatus, and the encoder can be adaptive to fluctuation of the background noise to perform VAD decision, so as to enhance the VAD decision performance, save limited channel bandwidth resources, and use the channel bandwidth efficiently.07-28-2011
20120173234VOICE ACTIVITY DETECTION APPARATUS, VOICE ACTIVITY DETECTION METHOD, PROGRAM THEREOF, AND RECORDING MEDIUM - The processing efficiency and estimation accuracy of a voice activity detection apparatus are improved. An acoustic signal analyzer receives a digital acoustic signal containing a speech signal and a noise signal, generates a non-speech GMM and a speech GMM adapted to a noise environment, by using a silence GMM and a clean-speech GMM in each frame of the digital acoustic signal, and calculates the output probabilities of dominant Gaussian distributions of the GMMs. A speech state probability to non-speech state probability ratio calculator calculates a speech state probability to non-speech state probability ratio based on a state transition model of a speech state and a non-speech state, by using the output probabilities; and a voice activity detection unit judges, from the speech state probability to non-speech state probability ratio, whether the acoustic signal in the frame is in the speech state or in the non-speech state and outputs only the acoustic signal in the speech state.07-05-2012
20100299145ACOUSTIC DATA PROCESSOR AND ACOUSTIC DATA PROCESSING METHOD - An acoustic data processor according to the present invention is used for processing acoustic data including signal sounds to reduce noises generated by a mechanical apparatus. The acoustic data processor includes a motion status obtaining section for obtaining motion status of the mechanical apparatus, an acoustic data obtaining section for obtaining acoustic data corresponding to the obtained motion status, and a database for storing various motion statuses of the mechanical apparatus in a unit time and corresponding acoustic data as templates. The acoustic data processor further includes a database searching section for searching the database to retrieve the template having the motion status closest to the obtained motion status; and a template subtraction section for subtracting the acoustic data of the template having the motion status closest to the obtained motion status from the obtained acoustic data to reduce noises generated by the mechanical apparatus.11-25-2010
20100299144METHOD AND APPARATUS FOR THE USE OF CROSS MODAL ASSOCIATION TO ISOLATE INDIVIDUAL MEDIA SOURCES - Apparatus for isolation of a media stream of a first modality from a complex media source having at least two media modalities, and multiple objects, and events, comprises: recording devices for the different modalities; an associator for associating between events recorded in said first modality and events recorded in said second modality, and providing an association output; and an isolator that uses the association output for isolating those events in the first mode correlating with events in the second mode associated with a predetermined object, thereby to isolate an isolated media stream associated with said predetermined object. Thus it is possible to identify events such as hand or mouth movements, and associate these with sounds, and then produce a filtered track of only those sounds associated with the events. In this way a particular speaker or musical instrument can be isolated from a complex scene.11-25-2010
20100049514DYNAMIC SPEECH SHARPENING - An enhanced system for speech interpretation is provided. The system may include receiving a user verbalization and generating one or more preliminary interpretations of the verbalization by identifying one or more phonemes in the verbalization. An acoustic grammar may be used to map the phonemes to syllables or words, and the acoustic grammar may include one or more linking elements to reduce a search space associated with the grammar. The preliminary interpretations may be subject to various post-processing techniques to sharpen accuracy of the preliminary interpretation. A heuristic model may assign weights to various parameters based on a context, a user profile, or other domain knowledge. A probable interpretation may be identified based on a confidence score for each of a set of candidate interpretations generated by the heuristic model. The model may be augmented or updated based on various information associated with the interpretation of the verbalization.02-25-2010
20100017206Sound source separation method and system using beamforming technique - A system and method for sound source separation. The system and method use a beamforming technique. The sound source separation system includes a windowing processor; a DFT transformer; a transfer function estimator; and a noise estimator. The system also includes a voice signal extractor that cancels individual voice signals, except an individual voice signal that is desired to be extracted among individual voice signals, from the integrated voice signals. The system further includes a voice signal detector that cancels a noise part provided through the noise estimator from a transfer function of an individual voice signal which is desired to be detected and extracts a noise-canceled individual voice signal. Even when two or more sound sources are simultaneously input, the sound sources can be separated from each other and separately stored and managed, or an initial sound source can be stored and managed.01-21-2010
20120232895APPARATUS AND METHOD FOR DISCRIMINATING SPEECH, AND COMPUTER READABLE MEDIUM - According to one embodiment, an apparatus for discriminating speech/non-speech of a first acoustic signal includes a weight assignment unit, a feature extraction unit, and a speech/non-speech discrimination unit. The weight assignment unit is configured to assign a weight to each frequency band, based on a frequency spectrum of the first acoustic signal including a user's speech and a frequency spectrum of a second acoustic signal including a disturbance sound. The feature extraction unit is configured to extract a feature from the frequency spectrum of the first acoustic signal, based on the weight of each frequency band. The speech/non-speech discrimination unit is configured to discriminate speech/non-speech of the first acoustic signal, based on the feature.09-13-2012
20120232896METHOD AND AN APPARATUS FOR VOICE ACTIVITY DETECTION - A voice activity detection apparatus (09-13-2012
20120226498MOTION-BASED VOICE ACTIVITY DETECTION - Motion-based voice activity detection may be provided. A data stream may be received and a determination may be made whether at least one non-audio element associated with the data stream indicates that the data stream comprises speech. In response to determining that the at least one non-audio element associated with the data stream indicates that the data stream comprises speech, a speech to text conversion may be performed on at least one audio element associated with the data stream.09-06-2012
20120259631Speech and Noise Models for Speech Recognition - An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.10-11-2012
20120259630DISPLAY APPARATUS AND VOICE CONVERSION METHOD THEREOF - The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.10-11-2012
20120259628ACCELEROMETER VECTOR CONTROLLED NOISE CANCELLING METHOD - A telecommunication device is disclosed, comprising: a microphone array comprising a plurality of microphones, wherein each microphone receives an analogue acoustic signal; a position sensing device for determining how the telecommunication device is positioned in three-dimensions with respect to a user's mouth; at least one analogue/digital converter for converting each analogue acoustic signal into a digital signal; a digital signal processor for performing signal processing on the received digital signals comprising a controller, a plurality of delay circuits for delaying each received signal based on an input from the controller and a plurality of preamplifiers for adjusting the gain of each received signal based on a gain input from the controller, wherein the controller selects the appropriate delay and gain values applied to each received signal to remove noise from the received signals based on the determined position of the telecommunication device. A method for creating and controlling a location of a virtual microphone near a telecommunication device so as to reduce background noise in a speech signal is also disclosed.10-11-2012
20120265526APPARATUS AND METHOD FOR VOICE ACTIVITY DETECTION - An input signal is received. A plurality of electrical characteristics from the input signal is obtained. A plurality of acoustic features is determined from the obtained electrical characteristics and each of the acoustic features being different from the others. At least some of the acoustic features are compared to a plurality of predetermined criteria. Based upon the comparing of the acoustic features to the plurality of predetermined criteria, it is determined when the signal is a voice signal or a noise signal.10-18-2012
20090076814Apparatus and method for determining speech signal - Provided are a method and apparatus for discriminating a speech signal. The apparatus for discriminating a speech signal includes: an input signal quality improver for reducing additional noise from an acoustic signal received from outside; a first start/end-point detector for receiving the acoustic signal from the input signal quality improver and detecting an end-point of a speech signal included in the acoustic signal; a voiced-speech feature extractor for extracting voiced-speech features of the input signal included in the acoustic signal received from the first start/end-point detector; a voiced-speech/unvoiced-speech discrimination model for storing a voiced-speech model parameter corresponding to a discrimination reference of the voiced-speech feature parameter extracted from the voiced-speech feature extractor; and a voiced-speech/unvoiced-speech discriminator for discriminating a voiced-speech portion using the voiced-speech features extracted by the voiced-speech feature extractor and the voiced-speech discrimination model parameter of the voiced/unvoiced-speech discrimination model.03-19-2009
20110231187VOICE PROCESSING DEVICE, VOICE PROCESSING METHOD AND PROGRAM - A voice processing device includes a zone detection unit which detects a voice zone including a voice signal or a non-steady sound zone including a non-steady signal other than the voice signal from an input signal and a filter calculation unit that calculates a filter coefficient for holding the voice signal in the voice zone and for suppressing the non-steady signal in the non-steady sound zone according to the detection result by the zone detection unit, in which the filter calculation unit calculates the filter coefficient by using a filter coefficient calculated in the non-steady sound zone for the voice zone and using a filter coefficient calculated in the voice zone for the non-steady sound zone.09-22-2011
20110231186SPEECH DETECTION METHOD - A speech detection method is presented, which includes the following steps. A first voice captured device samples a first signal and a second voice captured device samples a second signal. The first voice captured device is closer to a speech signal source than the second voice captured device. A first energy corresponding to the first signal within an interval is calculated, a second energy corresponding to the second signal within the interval is calculated, and a first ratio is calculated according to the first energy and the second energy. The first ratio is transformed into a second ratio. A threshold value is set. It is determined whether the speech signal source is detected by comparing the second ratio and the threshold value.09-22-2011
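The two-microphone energy comparison in the entry above can be sketched roughly as below. The abstract does not say how the first ratio is transformed into the second, so a decibel (log) transform is assumed here purely for illustration; `frame_energy`, `detect_speech`, `threshold_db`, and `eps` are hypothetical names and values.

```python
import math

def frame_energy(samples):
    """Sum of squared samples over one interval."""
    return sum(s * s for s in samples)

def detect_speech(near, far, threshold_db=6.0, eps=1e-12):
    """Decide whether the nearby speech source is active in this interval.

    near -- samples from the capture device closer to the speaker
    far  -- samples from the more distant capture device
    """
    # First ratio: near-device energy over far-device energy.
    first_ratio = (frame_energy(near) + eps) / (frame_energy(far) + eps)
    # Second ratio: assumed here to be the first ratio expressed in dB.
    second_ratio = 10.0 * math.log10(first_ratio)
    return second_ratio > threshold_db
```

A nearby talker raises the energy at the closer device much more than at the distant one, so the ratio exceeds the threshold; diffuse noise reaches both devices at similar levels and the ratio stays near 0 dB.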
20110238418Method and Device for Tracking Background Noise in Communication System - A method and a device for tracking background noise in a communication system, where the method includes: calculating a SNR of a current frame according to input audio signals; increasing a frame counter, and calculating tone features and signal steadiness features of the current frame if the SNR of the current frame is not smaller than a first threshold; judging the possibility of a time window including a noise interval according to the calculated tone feature values and signal steadiness feature values of each frame of the time window when the frame counter is increased to the length of the time window; and extracting noise features in the time window. Existence of background noise is analyzed continuously in a time window, so that background noise that changes frequently and dramatically can be detected or tracked rapidly.09-29-2011
20110238417SPEECH DETECTION APPARATUS - According to one embodiment, a speech detection apparatus includes a first acoustic signal analyzing unit configured to analyze a frequency spectrum of a first acoustic signal, and a feature extracting unit configured to remove a frequency spectrum of the first acoustic signal from a third acoustic signal, which is obtained by suppressing an echo component of the first acoustic signal contained in a second acoustic signal, so as to extract a feature of a frequency spectrum of the third acoustic signal.09-29-2011
20110238416Acoustic Model Adaptation Using Splines - Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.09-29-2011
20120330655VOICE RECOGNITION DEVICE - A voice recognition device includes a voice recognition dictionary in which a word which is recognized as a result of voice recognition on an inputted voice is registered, a reply voice data storage unit for storing recorded voice data about words registered in the voice recognition dictionary, a dialog control unit for, when a word registered in the voice recognition dictionary is recognized, acquiring recorded voice data corresponding to the word from the reply voice data storage unit, a reproduction noise reduction unit for carrying out a process of reducing noise included in the recorded voice data, an amplitude adjusting unit for adjusting an amplitude of the recorded voice data in which the noise has been reduced to a predetermined amplitude level, and a voice reproduction unit for reproducing a voice from the amplitude-adjusted recorded voice data.12-27-2012
20120330656VOICE ACTIVITY DETECTION - Discrimination between two classes comprises receiving a set of frames including an input signal and determining at least two different feature vectors for each of the frames. Discrimination between two classes further comprises classifying the two different feature vectors using sets of preclassifiers trained for at least two classes of events and from that classification, and determining values for at least one weighting factor. Discrimination between two classes still further comprises calculating a combined feature vector for each of the received frames by applying the weighting factor to the feature vectors and classifying the combined feature vector for each of the frames by using a set of classifiers trained for at least two classes of events.12-27-2012
20120095761SPEECH RECOGNITION SYSTEM AND SPEECH RECOGNIZING METHOD - A speech recognition system and a speech recognizing method for high-accuracy speech recognition in the environment with ego noise are provided. A speech recognition system according to the present invention includes a sound source separating and speech enhancing section; an ego noise predicting section; and a missing feature mask generating section for generating missing feature masks using outputs of the sound source separating and speech enhancing section and the ego noise predicting section; an acoustic feature extracting section for extracting an acoustic feature of each sound source using an output for said each sound source of the sound source separating and speech enhancing section; and a speech recognizing section for performing speech recognition using outputs of the acoustic feature extracting section and the missing feature masks.04-19-2012
20120330657SPEECH FEATURE EXTRACTION APPARATUS, SPEECH FEATURE EXTRACTION METHOD, AND SPEECH FEATURE EXTRACTION PROGRAM - A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frames, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature.12-27-2012
20120101819SYSTEM AND A METHOD FOR PROVIDING SOUND SIGNALS - A sound system, the sound system including: (i) a processor, configured to: (a) receive a requested sound signal and an ambient sound input signal; and (b) generate a modified requested signal by processing, in response to a desired level of ambient sound that is defined by a user, the requested sound signal and the ambient sound input signal, wherein an inclusion level of the ambient sound input signal in the modified requested signal is responsive to the desired level of ambient sound; and (ii) a signal provider configured to provide the modified requested signal to multiple speakers of a headset.04-26-2012
20120290297Speaker Liveness Detection - A signal representative of an unpredictable audio stimulus is provided to a putative live speaker within a putative live recording environment. A second signal purportedly emanating from the putative live speaker and/or the environment is received. This second signal is examined for influence of the unpredictable audio stimulus on the putative live speaker and/or the putative live recording environment. The examining includes at least one of audio feedback analysis, Lombard analysis, and evoked otoacoustic response analysis. Based on the examining, a determination is made as to whether the putative live speaker is an actual live speaker and/or whether the putative live recording environment is an actual live recording environment.11-15-2012
20100198593Speech Enhancement with Noise Level Estimation Adjustment - Enhancing speech components of an audio signal composed of speech and noise components includes controlling the gain of the audio signal in ones of its subbands, wherein the gain in a subband is reduced as the level of estimated noise components increases with respect to the level of speech components, wherein the level of estimated noise components is determined at least in part by (1) comparing an estimated noise components level with the level of the audio signal in the subband and increasing the estimated noise components level in the subband by a predetermined amount when the input signal level in the subband exceeds the estimated noise components level in the subband by a limit for more than a defined time, or (2) obtaining and monitoring the signal-to-noise ratio in the subband and increasing the estimated noise components level in the subband by a predetermined amount when the signal-to-noise ratio in the subband exceeds a limit for more than a defined time. 08-05-2010
20100198592Method for recognizing and interpreting patterns in noisy data sequences - This invention maps possibly noisy digital input from any of a number of different hardware or software sources such as keyboards, automatic speech recognition systems, cell phones, smart phones or the web onto an interpretation consisting of an action and one or more physical objects, such as robots, machinery, vehicles, etc. or digital objects such as data files, tables and databases. Tables and lists of (i) homonyms and misrecognitions, (ii) thematic relation patterns, and (iii) lexicons are used to generate alternative forms of the input which are scored to determine the best interpretation of the noisy input. The actions may be executed internally or output to any device which contains a digital component such as, but not limited to, a computer, a robot, a cell phone, a smart phone or the web. This invention may be implemented on sequential and parallel compute engines and systems.08-05-2010
20100169089Voice Recognizing Apparatus, Voice Recognizing Method, Voice Recognizing Program, Interference Reducing Apparatus, Interference Reducing Method, and Interference Reducing Program - A voice recognizing apparatus includes a microphone 07-01-2010
20100169090WEIGHTED SEQUENTIAL VARIANCE ADAPTATION WITH PRIOR KNOWLEDGE FOR NOISE ROBUST SPEECH RECOGNITION - A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal, determining a first estimated variance scaling vector using an estimated 2-order polynomial and the noise estimation, wherein the estimated 2-order polynomial represents a priori knowledge of a dependency of a variance scaling vector on noise, determining a second estimated variance scaling vector using statistics from prior portions of the speech signal, determining a variance scaling factor using the first estimated variance scaling vector and the second estimated variance scaling vector, and using the variance scaling factor to adapt an acoustic model.07-01-2010
20100131269SYSTEMS, METHODS, APPARATUS, AND COMPUTER PROGRAM PRODUCTS FOR ENHANCED ACTIVE NOISE CANCELLATION - Uses of an enhanced sidetone signal in an active noise cancellation operation are disclosed.05-27-2010
20110161078PITCH MODEL FOR NOISE ESTIMATION - Pitch is tracked for individual samples, which are taken much more frequently than an analysis frame. Speech is identified based on the tracked pitch and the speech components of the signal are removed with a time-varying filter, leaving only an estimate of a time-varying noise signal. This estimate is then used to generate a time-varying noise model which, in turn, can be used to enhance speech related systems.06-30-2011
20080249771System and method of voice activity detection in noisy environments - An efficient voice activity detection method and system suitable for real-time operation in low SNR (signal-to-noise) environments corrupted by non-Gaussian non-stationary background noise. The method utilizes rank order statistics to generate a binary voice detection output based on deviations between a short-term energy magnitude signal and a short-term noise reference signal. The method does not require voice-free training periods to track the background noise nor is it susceptible to rapid changes in overall noise level making it very robust. In addition a long-term adaptation mechanism is applied to reject harmonic or tonal interference.10-09-2008
20080235013METHOD AND APPARATUS FOR ESTIMATING NOISE BY USING HARMONICS OF VOICE SIGNAL - Disclosed is a method and an apparatus for estimating noise included in a sound signal during sound signal processing. The method includes estimating harmonics components in a frame of an input sound signal; using the estimated harmonics components, computing a Voice Presence Probability (VPP) on the frame of the input sound signal; determining a weight of an equation necessary to estimate a noise spectrum, depending on the computed VPP; and using the determined weight and the equation necessary to estimate a noise spectrum, estimating the noise spectrum, and updating the noise spectrum.09-25-2008
20130179163IN-CAR COMMUNICATION SYSTEM FOR MULTIPLE ACOUSTIC ZONES - An In-Car Communication (ICC) system supports the communication paths within a car by receiving the speech signals of a speaking passenger and playing it back for one or more listening passengers. Signal processing tasks are split into a microphone related part and into a loudspeaker related part. A sound processing system suitable for use in a vehicle having multiple acoustic zones includes a plurality of microphone In-Car Communication (Mic-ICC) instances coupled and a plurality of loudspeaker In-Car Communication (Ls-ICC) instances. The system further includes a dynamic audio routing matrix with a controller and coupled to the Mic-ICC instances, a mixer coupled to the plurality of Mic-ICC instances and a distributor coupled to the Ls-ICC instances.07-11-2013
20130096915System and Method for Dynamic Noise Adaptation for Robust Automatic Speech Recognition - A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.04-18-2013
20130103398Method and Apparatus for Audio Signal Classification - An apparatus comprising at least one processor and at least one memory including computer program code the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform determining a signal identification value for an audio signal, determining at least one noise level value for the audio signal, comparing the signal identification value against a signal identification threshold and each of the at least one noise level value against an associated noise level threshold, and identifying the audio signal dependent on the comparison.04-25-2013
20130103397SYSTEMS, DEVICES AND METHODS FOR LIST DISPLAY AND MANAGEMENT - Exemplary embodiments provide systems, devices and methods that allow creation and management of lists of items in an integrated manner on an interactive graphical user interface. A user may speak a plurality of list items in a natural unbroken manner to provide an audio input stream into an audio input device. Exemplary embodiments may automatically process the audio input stream to convert the stream into a text output, and may process the text output into one or more n-grams that may be used as list items to populate a list on a user interface.04-25-2013
20110313763PICKUP SIGNAL PROCESSING APPARATUS, METHOD, AND PROGRAM PRODUCT - According to one embodiment, a pickup signal processing apparatus includes microphones, a sound determining unit, a signal level calculating unit, a setting unit, and a calculating unit. The sound determining unit determines whether pickup signals picked up by the microphones are signals from a neighboring sound source or a background noise signal. The signal level calculating unit calculates the signal levels for the microphones. The setting unit sets a gain value of at least one microphone and reduces a difference between the signal levels for the microphones on the basis of the signal levels for the microphones, when determined that the pickup signal is the background noise signal. The calculating unit multiplies the pickup signal of the at least one microphone by the gain value set by the setting unit.12-22-2011
20130132078VOICE ACTIVITY SEGMENTATION DEVICE, VOICE ACTIVITY SEGMENTATION METHOD, AND VOICE ACTIVITY SEGMENTATION PROGRAM - Provided is a noise-robust voice activity segmentation device which updates parameters used in the determination of voice-active segments without burdening the user, and also provided are a voice activity segmentation method and a voice activity segmentation program.05-23-2013
20130132077Semi-Supervised Source Separation Using Non-Negative Techniques - Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.05-23-2013
20110213612Acoustic Signal Classification System - A system classifies the source of an input signal. The system determines whether a sound source belongs to classes that may include human speech, musical instruments, machine noise, or other classes of sound sources. The system is robust, performing classification despite variation in sound level and noise masking. Additionally, the system consumes relatively few computational resources and adapts over time to provide consistently accurate classification.09-01-2011
20120259629NOISE REDUCTION COMMUNICATION DEVICE - To provide a noise reduction transmitter which can secure clarity of sounds collected in very noisy environments and maintain a quality of sounds without devising a noise insulation cover particularly.10-11-2012
20100318354NOISE ADAPTIVE TRAINING FOR SPEECH RECOGNITION - Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.12-16-2010
20120284023METHOD OF SELECTING ONE MICROPHONE FROM TWO OR MORE MICROPHONES, FOR A SPEECH PROCESSOR SYSTEM SUCH AS A "HANDS-FREE" TELEPHONE DEVICE OPERATING IN A NOISY ENVIRONMENT - The method comprises the steps of: digitizing sound signals picked up simultaneously by two microphones (N, M); executing a short-term Fourier transform on the signals (x11-08-2012
20130185067NOISE REDUCTION METHOD, PROGRAM PRODUCT AND APPARATUS - A probability model represented as the product of the probability distribution of a mismatch vector g (or clean speech x) with an observed value y as a factor and the probability distribution of a mismatch vector g (or clean speech x) with a confidence index β for each band as a factor, executes MMSE estimation on the probability model, and estimates a clean speech estimated value x̂. As a result, each band influences the result of MMSE estimation, with a degree of contribution in accordance with the level of its confidence. Further, the higher the S/N ratio of observation speech, the more the output value becomes shifted to the observed value. As a result, the output of a front-end is optimized.07-18-2013
20130185068SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD AND PROGRAM - The present invention provides a speech recognition device that includes a threshold value candidate generation unit which extracts a feature indicating likeliness of being speech from a temporal sequence of input sound, and generates a plurality of threshold value candidates for discriminating between speech and non-speech; a speech determination unit which, by comparing the feature indicating likeliness of being speech with the plurality of threshold value candidates, determines respective speech sections, and outputs determination information as a result of the determination; a search unit which corrects each of the speech sections represented by the determination information, using a speech model and a non-speech model; and a parameter update unit which estimates a threshold value for determining a speech section, on the basis of distribution profiles of the feature respectively in utterance sections and in non-utterance sections, within each of the corrected speech sections, and makes an update with the threshold value.07-18-2013
20130185066METHOD AND SYSTEM FOR USING VEHICLE SOUND INFORMATION TO ENHANCE AUDIO PROMPTING - Sound related vehicle information representing one or more sounds may be received in a processor associated with a vehicle. The sound related vehicle information may or may not include an audio signal. An audio signal output to a passenger may be modified based on the sound related vehicle information.07-18-2013
20130185065METHOD AND SYSTEM FOR USING SOUND RELATED VEHICLE INFORMATION TO ENHANCE SPEECH RECOGNITION - An audio signal may be received, in a processor associated with a vehicle. Sound related vehicle information representing one or more sounds may be received by the processor. The sound related vehicle information may or may not include an audio signal. A speech recognition process or system may be modified based on the sound related vehicle information.07-18-2013
20130204617APPARATUS, SYSTEM AND METHOD FOR NOISE CANCELLATION AND COMMUNICATION FOR INCUBATORS AND RELATED DEVICES - Systems, apparatuses and methods for integrating adaptive noise cancellation (ANC) with communication features in an enclosure, such as an incubator, bed, and the like. Utilizing one or more error and reference microphones, a controller for a noise cancellation portion reduces noise within a quiet area of the enclosure. Voice communications are provided to allow external voice signals to be transmitted to the enclosure with minimized interference with noise processing. Vocal communications from within the enclosure may be processed to determine certain characteristics/features of the vocal communications. Using these characteristics, certain emotive and/or physiological states may be identified.08-08-2013
20120078625WAVEFORM ANALYSIS OF SPEECH - A waveform analysis of speech is disclosed. Embodiments include methods for analyzing captured sounds produced by animals, such as human vowel sounds, and accurately determining the sound produced. Some embodiments utilize computer processing to identify the location of the sound within a waveform, select a particular time within the sound, and measure a fundamental frequency and one or more formants at the particular time. Embodiments compare the fundamental frequency and the one or more formants to known thresholds and multiples of the fundamental frequency, such as by a computer-run algorithm. The results of this comparison identify the sound with a high degree of accuracy.03-29-2012
20120084085METHOD AND DEVICE FOR TRACKING BACKGROUND NOISE IN COMMUNICATION SYSTEM - A method and a device for tracking background noise in a communication system are provided. The method includes: calculating a SNR of a current frame according to input audio signals; increasing a frame counter, and calculating tone features and signal steadiness features of the current frame if the SNR of the current frame is not less than a first threshold; determining the possibility of a time window including a noise interval according to the calculated tone feature values and signal steadiness feature values of each frame of the time window when the frame counter is increased to the length of the time window; and extracting noise features in the time window. Existence of background noise is analyzed continuously in a time window, so that background noise that changes frequently can be tracked.04-05-2012
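
Illustrative note on entry 20120330657 (speech feature extraction): the abstract describes a delta spectrum that is normalized, per frequency bin, by a function of the average spectrum. The following is a minimal sketch of that idea, not the patented implementation. The function name `normalized_delta`, the simple first difference between consecutive frames, the zero delta for the first frame, the identity as the "function of the average spectrum", and the small `eps` guard are all assumptions made for illustration.

```python
def normalized_delta(spectrum, eps=1e-12):
    """Return per-frame, per-bin normalized delta features.

    spectrum: list of frames, each a list of per-bin magnitudes.
    Assumptions (not taken from the abstract): the delta is a first
    difference between consecutive frames, the first frame has a zero
    delta, and the normalizer is the plain per-bin average over frames.
    """
    n_frames = len(spectrum)
    n_bins = len(spectrum[0])

    # Average spectrum per frequency bin across all frames.
    avg = [sum(frame[k] for frame in spectrum) / n_frames
           for k in range(n_bins)]

    features = []
    for t in range(n_frames):
        # The first frame has no predecessor, so it is differenced with itself.
        prev = spectrum[t - 1] if t > 0 else spectrum[t]
        features.append([(spectrum[t][k] - prev[k]) / (avg[k] + eps)
                         for k in range(n_bins)])
    return features


# Hypothetical three-frame, two-bin input.
feats = normalized_delta([[1.0, 2.0], [3.0, 2.0], [2.0, 2.0]])
```

In this sketch, multiplying every value in a bin by a constant gain leaves the feature essentially unchanged, because the difference and the average scale together.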