Patent application number | Description | Published |
20080306742 | APPARATUS, METHOD, AND PROGRAM FOR SUPPORTING SPEECH INTERFACE DESIGN - For design of a speech interface accepting speech control options, speech samples are stored on a computer-readable medium. A similarity calculating unit calculates a certain indication of similarity of first and second sets of ones of the speech samples, the first set of speech samples being associated with a first speech control option and the second set of speech samples being associated with a second speech control option. A display unit displays the similarity indication. | 12-11-2008 |
20090070115 | SPEECH SYNTHESIS SYSTEM, SPEECH SYNTHESIS PROGRAM PRODUCT, AND SPEECH SYNTHESIS METHOD - It is an objective of the present invention to provide waveform concatenation speech synthesis with high sound quality utilizing its advantages in the case where there is a large quantity of speech segments while providing waveform concatenation speech synthesis with accurate accents in other cases. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. In the preferred embodiment of the present invention, an accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values. | 03-12-2009 |
20090076815 | Speech Recognition Apparatus, Speech Recognition Apparatus and Program Thereof - Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction. Further, maximum likelihood estimation is executed by using voice data of the component of the sound source direction passed through these processes, and a voice model obtained by predetermined modeling of the voice data, and speech recognition is carried out based on an obtained assumption value. | 03-19-2009 |
20090306977 | SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, COMPUTER-EXECUTABLE PROGRAM FOR CAUSING COMPUTER TO EXECUTE RECOGNITION METHOD, AND STORAGE MEDIUM - A speech recognition device and method configured to include a computer, for recognizing speech, including: a storage location for storing a feature quantity acquired from a speech signal for each frame; storage portions for storing acoustic model data and language model data; a echo speech component for generating echo speech model data from a speech signal acquired prior to a speech signal to be processed at the current time point and using the echo speech model data to generate adapted acoustic model data; and a processing component for utilizing the feature quantity, the adapted acoustic model data, and the language model data to provide a speech recognition result of the speech signal. | 12-10-2009 |
20100008516 | METHOD AND SYSTEM FOR POSITION DETECTION OF A SOUND SOURCE - A position detection method, system, and computer readable article of manufacture tangibly embodying computer readable instructions for executing the method for detecting the position of a sound source using at least two microphones. The method includes the steps of: emitting a reproduced sound from the sound source; observing the reproduced sound and an observed sound at the microphones; converting the reproduced sound and the observed sound into electrical signals; transforming the signals of the reproduced sound and of the observed sound into frequency spectra by a frequency spectrum transformer apparatus; calculating Crosspower Spectrum Phase (CSP) coefficients of the frequency spectra of the signals by a CSP coefficient calculator apparatus; and calculating distances between the position of the sound source and the positions of the microphones based on the calculated CSP coefficients by a distance calculating apparatus, thereby detecting the position of the sound source. | 01-14-2010 |
20100030561 | ANNOTATING PHONEMES AND ACCENTS FOR TEXT-TO-SPEECH SYSTEM - A system that outputs phonemes and accents of texts. The system has a storage section storing a first corpus in which spellings, phonemes, and accents of a text input beforehand are recorded separately for individual segmentations of the words that are contained in the text. A text for which phonemes and accents are to be output is acquired and the first corpus is searched to retrieve at least one set of spellings that match the spellings in the text from among sets of contiguous spellings. Then, the combination of a phoneme and an accent that has a higher probability of occurrence in the first corpus than a predetermined reference probability is selected as the phonemes and accent of the text. | 02-04-2010 |
20100114575 | System and Method for Extracting a Specific Situation From a Conversation - A system, method, and computer readable article of manufacture for extracting a specific situation in a conversation. The system includes: an acquisition unit for acquiring speech voice data of speakers in the conversation; a specific expression detection unit for detecting the speech voice data of a specific expression from speech voice data of a specific speaker in the conversation; and a specific situation extraction unit for extracting, from the speech voice data of the speakers in the conversation, a portion of the speech voice data that forms a speech pattern that includes the speech voice data of the specific expression detected by the specific expression detection unit. | 05-06-2010 |
20110301945 | SPEECH SIGNAL PROCESSING SYSTEM, SPEECH SIGNAL PROCESSING METHOD AND SPEECH SIGNAL PROCESSING PROGRAM PRODUCT FOR OUTPUTTING SPEECH FEATURE - A speech signal processing system which outputs a speech feature, divides an input speech signal into frames so that each pair of consecutive frames have a frame shift length equal to at least one period of the speech signal and have an overlap equal to at least a predetermined length, applies discrete Fourier transform to each of the frames, calculates a CSP coefficient for the pair, searches a predetermined search range in which a speech wave lags a period equal to at least one period to obtain the maximum value of the CSP coefficient for the pair, and generates time-series data of the maximum CSP coefficient values arranged in the order in which the frames appear. A method and a computer readable article of manufacture for the implementing the same are also provided. | 12-08-2011 |
20120059654 | SPEAKER-ADAPTIVE SYNTHESIZED VOICE - An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts. | 03-08-2012 |
20120185243 | SPEECH FEATURE EXTRACTION APPARATUS, SPEECH FEATURE EXTRACTION METHOD, AND SPEECH FEATURE EXTRACTION PROGRAM - A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frame, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature. | 07-19-2012 |
20120197644 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, INFORMATION PROCESSING SYSTEM, AND PROGRAM - An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation. | 08-02-2012 |
20120316880 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, INFORMATION PROCESSING SYSTEM, AND PROGRAM - An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation. | 12-13-2012 |
20120330657 | SPEECH FEATURE EXTRACTION APPARATUS, SPEECH FEATURE EXTRACTION METHOD, AND SPEECH FEATURE EXTRACTION PROGRAM - A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frame, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature. | 12-27-2012 |