Patent application number | Description | Published |
20080221872 | METHODS AND APPARATUS FOR NATURAL SPOKEN LANGUAGE SPEECH RECOGNITION - A word prediction method and apparatus improves precision and accuracy. For the prediction of a sixth word “?”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then, “donyu”, which is the sixth word from “sho-senkyoku no”, is predicted. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only by “sho-senkyoku no”. | 09-11-2008 |
20080221873 | METHODS AND APPARATUS FOR NATURAL SPOKEN LANGUAGE SPEECH RECOGNITION - A word prediction apparatus and method that improves the precision accuracy, and a speech recognition method and an apparatus therefor are provided. For the prediction of a sixth word “?”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then, “donyu”, which is the sixth word from “sho-senkyoku no”, is predicted. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only by “sho-senkyoku no”. | 09-11-2008 |
20080221890 | UNSUPERVISED LEXICON ACQUISITION FROM SPEECH AND TEXT - Techniques for acquiring, from an input text and an input speech, a set of a character string and a pronunciation thereof which should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings which are candidates to be recognized as a word; generates plural pronunciation candidates of the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates. | 09-11-2008 |
20080270131 | METHOD, PREPROCESSOR, SPEECH RECOGNITION SYSTEM, AND PROGRAM PRODUCT FOR EXTRACTING TARGET SPEECH BY REMOVING NOISE - The present invention relates to a method, preprocessor, speech recognition system, and program product for extracting a target speech by removing noise. In an embodiment of the invention target speech is extracted from two input speeches, which are obtained through at least two speech input devices installed in different places in a space, applies a spectrum subtraction process by using a noise power spectrum (Uω) estimated by one or both of the two speech input devices (Xω(T)) and an arbitrary subtraction constant (α) to obtain a resultant subtracted power spectrum (Yω(T)). The invention further applies a gain control based on the two speech input devices to the resultant subtracted power spectrum to obtain a gain-controlled power spectrum (Dω(T)). The invention further applies a flooring process to said resultant gain-controlled power spectrum on the basis of arbitrary Flooring factor (β) to obtain a power spectrum for speech recognition (Zω(T)). | 10-30-2008 |
20080294432 | Signal enhancement and speech recognition - Provides speech enhancement techniques which are effective even for extemporaneous noise without a noise interval and unknown extemporaneous noise. An example of a signal enhancement device includes: spectral subtraction means for subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; an adaptive filter applied to the reference signal; and coefficient control means for controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. In the signal enhancement device, a database of a signal model concerning the target signal expressing a given feature by means of a given statistical model is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means. | 11-27-2008 |
20090043570 | METHOD FOR PROCESSING SPEECH SIGNAL DATA - Method for processing speech signal data. A speech signal is divided into frames. Each frame is characterized by a frame number T representing a unique interval of time. Each speech signal is characterized by a power spectrum with respect to frame T and frequency band ω. A speech segment and a reverberation segment of the speech signal is determined. L filter coefficients W(k) (k=1, 2, . . . , L) respectively corresponding to L frames immediately preceding frame T are computed such that the L filter coefficients minimize a function Φ that is a linear combination of sum of squares of a residual speech power in the reverberation segment and a sum of squares of a subtracted speech power in the speech segment. The computed L filter coefficients are stored within storage media of the computing apparatus. | 02-12-2009 |
20090210224 | SYSTEM, METHOD AND PROGRAM FOR SPEECH PROCESSING - The present invention relates to a system, method and program for speech recognition. In an embodiment of the invention a method for processing a speech signal consists of receiving a power spectrum of a speech signal and generating a log power spectrum signal of the power spectrum. The method further consists of performing discrete cosine transformation on the log power spectrum signal and cutting off cepstrum upper and lower terms of the discrete cosine transformed signal. The method further consists of performing inverse discrete cosine transformation on the signal from which the cepstrum upper and lower terms are cut off. The method further consists of converting the inverse discrete cosine transformed signal so as to bring the signal back to a power spectrum domain and filtering the power spectrum of the speech signal by using, as a filter, the signal which is brought back to the power spectrum domain. | 08-20-2009 |
20090222258 | VOICE ACTIVITY DETECTION SYSTEM, METHOD, AND PROGRAM PRODUCT - A voice activity detection method in a low SNR environment. The voice activity detection is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal and increasing difference in feature vectors between speech and non-speech (i) using the long-term spectrum variation component feature or (ii) using a long-term spectrum variation component extraction and a harmonic structure feature extraction. A correct rate and an accuracy rate of the voice activity detection is improved over conventional methods by using a long-term spectrum variation component having a window length over an average phoneme duration of an utterance in the speech signal. The voice activity detection system and method provides speech processing, automatic speech recognition, and speech output capable of very accurate voice activity detection. | 09-03-2009 |
20100125459 | STOCHASTIC PHONEME AND ACCENT GENERATION USING ACCENT CLASS - Exemplary embodiments provide for determining a sequence of words in a TTS system. An input text is analyzed using two models, a word n-gram model and an accent class n-gram model. A list of all possible words for each word in the input is generated for each model. Each word in each list for each model is given a score based on the probability that the word is the correct word in the sequence, based on the particular model. The two lists are combined and the two scores are combined for each word. A set of sequences of words are generated. Each sequence of words comprises a unique combination of an attribute and associated word for each word in the input. The combined score of each of word in the sequence of words is combined. A sequence of words having the highest score is selected and presented to a user. | 05-20-2010 |
20110131044 | TARGET VOICE EXTRACTION METHOD, APPARATUS AND PROGRAM PRODUCT - An apparatus, program product and method is provided for separating a target voice from a plurality of other voices having different directions of arrival. The method comprises the steps of disposing a first and a second voice input device at a predetermined distance from one another and upon receipt of voice signals at said devices calculating discrete Fourier transforms for the signals and calculating a CSP (cross-power spectrum phase) coefficient by superpositioning multiple frequency-bin components based on correlation of the two spectra signals received and then calculating a weighted CSP coefficient from said two discrete Fourier-transformed speech signals. A target voice is separated when received by said devices from other voice signals in a spectrum by using the calculated weighted CSP coefficient. | 06-02-2011 |
20130268275 | SPEECH SYNTHESIS SYSTEM, SPEECH SYNTHESIS PROGRAM PRODUCT, AND SPEECH SYNTHESIS METHOD - Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values. | 10-10-2013 |
20140358533 | PRONUNCIATION ACCURACY IN SPEECH RECOGNITION - A reading accuracy-improving system includes: a reading conversion unit for retrieving a plurality of candidate word strings from speech recognition results to determine the reading of each candidate word string; a reading score calculating unit for determining the speech recognition score for each of one or more candidate word strings with the same reading to determine a reading score; and a candidate word string selection unit for selecting a candidate to output from the plurality of candidate word strings on the basis of the reading score and speech recognition score corresponding to each candidate word string. | 12-04-2014 |
20150051899 | CORRECTING N-GRAM PROBABILITIES BY PAGE VIEW INFORMATION - Methods and a system for calculating N-gram probabilities in a language model. A method includes counting N-grams in each page of a plurality of pages or in each document of a plurality of documents to obtain respective N-gram counts therefor. The method further includes applying weights to the respective N-gram counts based on at least one of view counts and rankings to obtain weighted respective N-gram counts. The view counts and the rankings are determined with respect to the plurality of pages or the plurality of documents. The method also includes merging the weighted respective N-gram counts to obtain merged weighted respective N-gram counts for the plurality of pages or the plurality of documents. The method additionally includes calculating a respective probability for each of the N-grams based on the merged weighted respective N-gram counts. | 02-19-2015 |
20150051902 | CORRECTING N-GRAM PROBABILITIES BY PAGE VIEW INFORMATION - Methods and a system for calculating N-gram probabilities in a language model. A method includes counting N-grams in each page of a plurality of pages or in each document of a plurality of documents to obtain respective N-gram counts therefor. The method further includes applying weights to the respective N-gram counts based on at least one of view counts and rankings to obtain weighted respective N-gram counts. The view counts and the rankings are determined with respect to the plurality of pages or the plurality of documents. The method also includes merging the weighted respective N-gram counts to obtain merged weighted respective N-gram counts for the plurality of pages or the plurality of documents. The method additionally includes calculating a respective probability for each of the N-grams based on the merged weighted respective N-gram counts. | 02-19-2015 |