Class / Patent application number | Description | Number of patent applications / Date published |
704267000 | Time element | 18 |
20080270140 | SYSTEM AND METHOD FOR HYBRID SPEECH SYNTHESIS - A speech synthesis system receives symbolic input describing an utterance to be synthesized. In one embodiment, different portions of the utterance are constructed from different sources, one of which is a speech corpus recorded from a human speaker whose voice is to be modeled. The other sources may include other human speech corpora or speech produced using Rule-Based Speech Synthesis (RBSS). At least some portions of the utterance may be constructed by modifying prototype speech units to produce adapted speech units that are contextually appropriate for the utterance. The system concatenates the adapted speech units with the other speech units to produce a speech waveform. In another embodiment, a speech unit of a speech corpus recorded from a human speaker lacks transitions at one or both of its edges. A transition is synthesized using RBSS and concatenated with the speech unit in producing a speech waveform for the utterance. | 10-30-2008 |
20080319755 | Text-to-speech apparatus - According to an aspect of an embodiment, an apparatus for converting text data into sound signal, comprises: a phoneme determiner for determining phoneme data corresponding to a plurality of phonemes and pause data corresponding to a plurality of pauses to be inserted among a series of phonemes in the text data to be converted into sound signal; a phoneme length adjuster for modifying the phoneme data and the pause data by determining lengths of the phonemes, respectively in accordance with a speed of the sound signal and selectively adjusting the length of at least one of the phonemes which is placed immediately after one of the pauses so that the at least one of the phonemes is relatively extended timewise as compared to other phonemes; and a output unit for outputting sound signal on the basis of the adjusted phoneme data and pause data by the phoneme length adjuster. | 12-25-2008 |
20090043585 | SYSTEM AND METHOD FOR PERFORMING SPEECH SYNTHESIS WITH A CACHE OF PHONEME SEQUENCES - Disclosed are systems, methods, and computer readable media for performing speech synthesis. The method embodiment comprises applying a first part of a speech synthesizer to a text corpus to obtain a plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences, for each of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize each of the plurality of respective phoneme sequences, and adding the identified joins to a cache for use in speech synthesis. | 02-12-2009 |
20090048844 | Speech synthesis method and apparatus - A phoneme sequence corresponding to a target speech is divided into a plurality of segments. A plurality of speech units for each segment is selected from a speech unit memory that stores speech units having at least one frame. The plurality of speech units has a prosodic feature accordant or similar to the target speech. A formant parameter having at least one formant frequency is generated for each frame of the plurality of speech units. A fused formant parameter of each frame is generated from formant parameters of each frame of the plurality of speech units. A fused speech unit of each segment is generated from the fused formant parameter of each frame. A synthesized speech is generated by concatenating the fused speech unit of each segment. | 02-19-2009 |
20090216537 | SPEECH SYNTHESIS APPARATUS AND METHOD THEREOF - A speech synthesis apparatus includes a text obtaining device that obtains text data for speech synthesis from the outside, a language processor that carries out morphological analysis/parsing to the text data, a prosodic processor that outputs, to a speech synthesizer, a synthesis unit string based on the prosodic and language related attributes of the text data such as accents and word classes, the speech synthesizer that generates synthesized speech from the synthesis unit string, and a speech waveform output device that reproduces a prescribed amount of output synthesized speech after it is accumulated or sequentially as it is output. | 08-27-2009 |
20100057467 | Speech synthesis with dynamic constraints - A method is disclosed for providing speech parameters to be used for synthesis of a speech utterance. In at least one embodiment, the method includes receiving an input time series of first speech parameter vectors, preparing at least one input time series of second speech parameter vectors consisting of dynamic speech parameters, extracting from the input time series of first and second speech parameter vectors partial time series of first speech parameter vectors and corresponding partial time series of second speech parameter vectors, converting the corresponding partial time series of first and second speech parameter vectors into partial time series of third speech parameter vectors, wherein the conversion is done independently for each set of partial time series and can be started as soon as the vectors of the input time series of the first speech parameter vectors have been received. The speech parameter vectors of the partial time series of third speech parameter vectors are combined to form a time series of output speech parameter vectors to be used for synthesis of the speech utterance. At least one embodiment of the method allows a continuous providing of speech parameter vectors for synthesis of the speech utterance. The latency and the memory requirements for the synthesis of a speech utterance are reduced. | 03-04-2010 |
20100145707 | SYSTEM AND METHOD FOR PRONUNCIATION MODELING - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes. | 06-10-2010 |
20100324906 | METHOD OF SYNTHESIZING OF AN UNVOICED SPEECH SIGNAL - The present invention relates to a method of synthesizing a signal comprising the steps of
| 12-23-2010 |
20110004476 | Apparatus and Method for Creating Singing Synthesizing Database, and Pitch Curve Generation Apparatus and Method - Variation over time in fundamental frequency in singing voices is separated into a melody-dependent component and a phoneme-dependent component, modeled for each of the components and stored into a singing synthesizing database. In execution of singing synthesis, a pitch curve indicative of variation over time in fundamental frequency of the melody is synthesized in accordance with an arrangement of notes represented by a singing synthesizing score and the melody-dependent component, and the pitch curve is corrected, for each of pitch curve sections corresponding to phonemes constituting lyrics, using a phoneme-dependent component model corresponding to the phoneme. Such arrangements can accurately model a singing expression, unique to a singing person and appearing in a melody singing style of the person, while taking into account phoneme-dependent pitch variation, and thereby permits synthesis of singing voices that sound more natural. | 01-06-2011 |
20110071836 | SYSTEM AND METHOD FOR GENERALIZED PRESELECTION FOR UNIT SELECTION SYNTHESIS - Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporates the supplemental phoneset as an extra feature. | 03-24-2011 |
20110288870 | INFORMATION PROVIDING APPARATUS AND INFORMATION PROVIDING METHOD - Part units of speech information are arranged in a predetermined order to generate a sentence unit of a speech information set. To each of a plurality of speech part units of the speech information, an attribute of “interrupt possible after reproduction” with which reproduction of priority interrupt information can be started after the speech part unit of the speech information is reproduced or another attribute of “interrupt impossible after reproduction” with which reproduction of the priority interrupt information cannot be started even after the speech part unit of the speech information is reproduced is set. When the priority interrupt information having a high priority rank than the speech information set being currently reproduced is inputted, if the attribute of the speech information being reproduced at the point in time is “interrupt impossible after reproduction,” then the priority interrupt information is reproduced after the speech information is reproduced. | 11-24-2011 |
20120310651 | Voice Synthesis Apparatus - A voice signal is synthesized using a plurality of phonetic piece data each indicating a phonetic piece containing at least two phoneme sections corresponding to different phonemes. In the apparatus, a phonetic piece adjustor forms a target section from first and second phonetic pieces so as to connect the first and second phonetic pieces to each other such that the target section includes a rear phoneme section of the first piece and a front phoneme section of the second piece, and expands the target section by a target time length to form an adjustment section such that a central part is expanded at an expansion rate higher than that of front and rear parts of the target section, to thereby create synthesized phonetic piece data having the target time length. A voice synthesizer creates a voice signal from the synthesized phonetic piece data. | 12-06-2012 |
20130030811 | NATURAL QUERY INTERFACE FOR CONNECTED CAR - Sensors within the vehicle monitor driver movement, such as face and head movement to ascertain the direction a driver is looking, and gestural movement to ascertain what the driver may be pointing at. This information is combined with video camera data taken of the external vehicle surroundings. The apparatus uses these data to assist the speech dialogue processor to disambiguate phrases uttered by the driver. The apparatus can issue informative responses or control vehicular functions based on queries automatically generated based on the disambiguated phrases. | 01-31-2013 |
20130262121 | SOUND SYNTHESIZING APPARATUS - A sound synthesizing apparatus includes a processor coupled to a memory. The processor configured to execute computer-executable units comprising: an information acquirer adapted to acquire synthesis information which specifies a duration and an utterance content for each unit sound; a prolongation setter adapted to set whether prolongation is permitted or inhibited for each of a plurality of phonemes corresponding to the utterance content of the each unit sound; and a sound synthesizer adapted to generate a synthesized sound corresponding to the synthesis information by connecting a plurality of sound fragments corresponding to the utterance content of the each unit sound. The sound synthesizer prolongs a sound fragment corresponding to the phoneme the prolongation of which is permitted in accordance with the duration of the unit sound. | 10-03-2013 |
20150025892 | METHOD AND SYSTEM FOR TEMPLATE-BASED PERSONALIZED SINGING SYNTHESIS - A system and method for speech-to-singing synthesis is provided. The method includes deriving characteristics of a singing voice for a first individual and modifying vocal characteristics of a voice for a second individual in response to the characteristics of the singing voice of the first individual to generate a synthesized singing voice for the second individual. In one embodiment the method includes deriving a template of first speech characteristics and first singing characteristics in response to a first individual's speaking voice and singing voice and extracting second speech characteristics from a second individual's speaking voice, then modifying the second speech characteristics in accordance with the template to generate the second individual's approximated singing voice and aligning acoustic features of the second individual's approximated singing voice in response to the first speech characteristics, the first singing characteristics and the second speech characteristics to generate the second individual's synthesized singing voice. | 01-22-2015 |
20150302845 | ESTIMATION SYSTEM OF SPECTRAL ENVELOPES AND GROUP DELAYS FOR SOUND ANALYSIS AND SYNTHESIS, AND AUDIO SIGNAL SYNTHESIS SYSTEM - For high-accuracy analysis and high-quality synthesis of voice sound (singing and speech), provided herein are a system and a method for estimating from an audio signal spectral envelopes and group delays for sound analysis and synthesis with high accuracy and high temporal resolution. An estimation system of spectral envelopes and group delays includes a fundamental frequency estimation section, an amplitude spectrum acquisition section, a group delay extraction section, a spectral envelope integration section, and a group delay integration section. The spectral envelope integration section sequentially obtains a spectral envelope for sound synthesis by averaging overlapped spectra. The group delay integration section selects from a plurality of group delays a group delay corresponding to the maximum envelope of each frequency component of the spectral envelope and integrates groups delays thus selected to sequentially obtain a group delay for sound synthesis. | 10-22-2015 |
20160099003 | DIGITAL WATERMARK EMBEDDING DEVICE, DIGITAL WATERMARK EMBEDDING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - A digital watermark embedding device includes a synthesized voice generating unit that outputs a synthesized voice according to an input text and outputs phoneme-based alignment regarding phonemes included in the synthesized voice; an estimating unit that estimates whether or not a potentially risky expression is included in the input text, and outputs a potentially risky segment in which the potentially risky expression is estimated to be included; an embedding control unit that associates the potentially risky segment with the phoneme-based alignment, and decides and outputs an embedding time for embedding a watermark in the synthesized voice; and an embedding unit that embeds a digital watermark in the synthesized voice at a time specified as the embedding time for the synthesized voice. | 04-07-2016 |
20160111083 | PHONEME INFORMATION SYNTHESIS DEVICE, VOICE SYNTHESIS DEVICE, AND PHONEME INFORMATION SYNTHESIS METHOD - Provided is a phoneme information synthesis device, including: an operation intensity information acquisition unit configured to acquire information indicating an operation intensity; and a phoneme information generation unit configured to output phoneme information for specifying a phoneme of a singing voice to be synthesized based on the information indicating the operation intensity supplied from the operation intensity information acquisition unit. | 04-21-2016 |