Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Masafumi Nishimura, Yokohama-Shi JP

Masafumi Nishimura, Yokohama-Shi JP

Patent application numberDescriptionPublished
20080221872METHODS AND APPARATUS FOR NATURAL SPOKEN LANGUAGE SPEECH RECOGNITION - A word prediction method and apparatus improves precision and accuracy. For the prediction of a sixth word “?”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then, “donyu”, which is the sixth word from “sho-senkyoku no”, is predicted. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only by “sho-senkyoku no”.09-11-2008
20080221873METHODS AND APPARATUS FOR NATURAL SPOKEN LANGUAGE SPEECH RECOGNITION - A word prediction apparatus and method that improves the precision accuracy, and a speech recognition method and an apparatus therefor are provided. For the prediction of a sixth word “?”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then, “donyu”, which is the sixth word from “sho-senkyoku no”, is predicted. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only by “sho-senkyoku no”.09-11-2008
20080221890UNSUPERVISED LEXICON ACQUISITION FROM SPEECH AND TEXT - Techniques for acquiring, from an input text and an input speech, a set of a character string and a pronunciation thereof which should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings which are candidates to be recognized as a word; generates plural pronunciation candidates of the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates.09-11-2008
20080270131METHOD, PREPROCESSOR, SPEECH RECOGNITION SYSTEM, AND PROGRAM PRODUCT FOR EXTRACTING TARGET SPEECH BY REMOVING NOISE - The present invention relates to a method, preprocessor, speech recognition system, and program product for extracting a target speech by removing noise. In an embodiment of the invention target speech is extracted from two input speeches, which are obtained through at least two speech input devices installed in different places in a space, applies a spectrum subtraction process by using a noise power spectrum (Uω) estimated by one or both of the two speech input devices (Xω(T)) and an arbitrary subtraction constant (α) to obtain a resultant subtracted power spectrum (Yω(T)). The invention further applies a gain control based on the two speech input devices to the resultant subtracted power spectrum to obtain a gain-controlled power spectrum (Dω(T)). The invention further applies a flooring process to said resultant gain-controlled power spectrum on the basis of arbitrary Flooring factor (β) to obtain a power spectrum for speech recognition (Zω(T)).10-30-2008
20080294432Signal enhancement and speech recognition - Provides speech enhancement techniques which are effective even for extemporaneous noise without a noise interval and unknown extemporaneous noise. An example of a signal enhancement device includes: spectral subtraction means for subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; an adaptive filter applied to the reference signal; and coefficient control means for controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. In the signal enhancement device, a database of a signal model concerning the target signal expressing a given feature by means of a given statistical model is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means.11-27-2008
20090043570METHOD FOR PROCESSING SPEECH SIGNAL DATA - Method for processing speech signal data. A speech signal is divided into frames. Each frame is characterized by a frame number T representing a unique interval of time. Each speech signal is characterized by a power spectrum with respect to frame T and frequency band ω. A speech segment and a reverberation segment of the speech signal is determined. L filter coefficients W(k) (k=1, 2, . . . , L) respectively corresponding to L frames immediately preceding frame T are computed such that the L filter coefficients minimize a function Φ that is a linear combination of sum of squares of a residual speech power in the reverberation segment and a sum of squares of a subtracted speech power in the speech segment. The computed L filter coefficients are stored within storage media of the computing apparatus.02-12-2009
20090210224SYSTEM, METHOD AND PROGRAM FOR SPEECH PROCESSING - The present invention relates to a system, method and program for speech recognition. In an embodiment of the invention a method for processing a speech signal consists of receiving a power spectrum of a speech signal and generating a log power spectrum signal of the power spectrum. The method further consists of performing discrete cosine transformation on the log power spectrum signal and cutting off cepstrum upper and lower terms of the discrete cosine transformed signal. The method further consists of performing inverse discrete cosine transformation on the signal from which the cepstrum upper and lower terms are cut off. The method further consists of converting the inverse discrete cosine transformed signal so as to bring the signal back to a power spectrum domain and filtering the power spectrum of the speech signal by using, as a filter, the signal which is brought back to the power spectrum domain.08-20-2009
20090222258VOICE ACTIVITY DETECTION SYSTEM, METHOD, AND PROGRAM PRODUCT - A voice activity detection method in a low SNR environment. The voice activity detection is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal and increasing difference in feature vectors between speech and non-speech (i) using the long-term spectrum variation component feature or (ii) using a long-term spectrum variation component extraction and a harmonic structure feature extraction. A correct rate and an accuracy rate of the voice activity detection is improved over conventional methods by using a long-term spectrum variation component having a window length over an average phoneme duration of an utterance in the speech signal. The voice activity detection system and method provides speech processing, automatic speech recognition, and speech output capable of very accurate voice activity detection.09-03-2009
20100125459STOCHASTIC PHONEME AND ACCENT GENERATION USING ACCENT CLASS - Exemplary embodiments provide for determining a sequence of words in a TTS system. An input text is analyzed using two models, a word n-gram model and an accent class n-gram model. A list of all possible words for each word in the input is generated for each model. Each word in each list for each model is given a score based on the probability that the word is the correct word in the sequence, based on the particular model. The two lists are combined and the two scores are combined for each word. A set of sequences of words are generated. Each sequence of words comprises a unique combination of an attribute and associated word for each word in the input. The combined score of each of word in the sequence of words is combined. A sequence of words having the highest score is selected and presented to a user.05-20-2010
20110131044TARGET VOICE EXTRACTION METHOD, APPARATUS AND PROGRAM PRODUCT - An apparatus, program product and method is provided for separating a target voice from a plurality of other voices having different directions of arrival. The method comprises the steps of disposing a first and a second voice input device at a predetermined distance from one another and upon receipt of voice signals at said devices calculating discrete Fourier transforms for the signals and calculating a CSP (cross-power spectrum phase) coefficient by superpositioning multiple frequency-bin components based on correlation of the two spectra signals received and then calculating a weighted CSP coefficient from said two discrete Fourier-transformed speech signals. A target voice is separated when received by said devices from other voice signals in a spectrum by using the calculated weighted CSP coefficient.06-02-2011

Patent applications by Masafumi Nishimura, Yokohama-Shi JP