Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Pitch

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

704201000 - For storage or transmission

704205000 - Frequency

704206000 - Specialized information

Patent class list (only not empty are listed)

Deeper subclasses:

Class / Patent application numberDescriptionNumber of patent applications / Date published
704208000 Voiced or unvoiced 25
Entries
DocumentTitleDate
20110184732SIGNAL PRESENCE DETECTION USING BI-DIRECTIONAL COMMUNICATION DATA - A system and method for using bi-directional conversation data to improve signal presence detection are disclosed. The detector module is adapted to communicate with a signal enhancement module. The detector module collects data from a transmit direction of the connection and a receive direction of a data connection. The collected data from the transmit and the receive direction is used to classify at least one of data in the transmit direction and data in the receive direction. Responsive to the classification, the signal enhancement module enhances data in one of the transmit direction and the receive direction. Hence, data classification accuracy is improved by using data from both the transmit and receive directions. In one embodiment, the detector module applies a voice activity detection module (VAD) process to detect the presence or absence of voice data in the collected data.07-28-2011
20080275695Method and system for pitch contour quantization in audio coding - A method and device for improving coding efficiency in audio coding. From the pitch values of a pitch contour of an audio signal, a plurality of simplified pitch contour segments are generated to approximate the pitch contour, based on one or more pre-selected criteria. The contour segments can be linear or non-linear with each contour segment represented by a first end point and a second end point. If the contour segments are linear, then only the information regarding the end points, instead of the pitch values, are provided to a decoder for reconstructing the audio signal. The contour segment can have a fixed maximum length or a variable length, but the deviation between a contour segment and the pitch values in that segment is limited by a maximum value.11-06-2008
20130211827SAMPLE RATE CONVERTER WITH AUTOMATIC ANTI-ALIASING FILTER - The subject disclosure is directed towards dynamically computing anti-aliasing filter coefficients for sample rate conversion in digital audio. In one aspect, for each input-to-output sampling rate ratio (pitch) obtained, anti-aliasing filter coefficients are interpolated based upon the pitch (e.g., using the fractional part of the ratio) from two filters (coefficient sets) selected based upon the pitch (e.g., using the integer part of the ratio). The interpolation provides for fine-grained cutoff frequencies, and by re-computation for each pitch, smooth anti-aliasing with dynamically changing ratios.08-15-2013
20110196674SPECTRUM CODING APPARATUS, SPECTRUM DECODING APPARATUS, ACOUSTIC SIGNAL TRANSMISSION APPARATUS, ACOUSTIC SIGNAL RECEPTION APPARATUS AND METHODS THEREOF - A spectrum coding apparatus capable of performing coding at a low bit rate and with high quality is disclosed. This apparatus is provided with a section that performs the frequency transformation of a first signal and calculates a first spectrum, a section that converts the frequency of a second signal and calculates a second spectrum, a section that estimates the shape of the second spectrum in a band of FL≦k08-11-2011
20110196673CONCEALING LOST PACKETS IN A SUB-BAND CODING DECODER - An electronic device for reconstructing a lost packet in a Sub-Band Coding (SBC) decoder is described. The electronic device includes a processor and instructions stored in memory. The electronic device detects a lost packet, obtains a zero-input response of a synthesis filter bank and obtains a coarse pitch estimate. The electronic device also obtains a fine pitch estimate based on the zero-input response and the coarse pitch estimate. The electronic device selects a last pitch period based on the fine pitch estimate and uses samples from the last pitch period for the lost packet.08-11-2011
20090319261CODING OF TRANSITIONAL SPEECH FRAMES FOR LOW-BIT-RATE APPLICATIONS - Systems, methods, and apparatus for low-bit-rate coding of transitional speech frames are disclosed.12-24-2009
20130041656SYSTEM AND METHOD FOR TRACKING SOUND PITCH ACROSS AN AUDIO SIGNAL - A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and an estimated fractional chirp rate of the harmonics at the estimated pitch. The estimated pitch and the estimated fractional chirp rate may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.02-14-2013
20130041657SYSTEM AND METHOD FOR TRACKING SOUND PITCH ACROSS AN AUDIO SIGNAL USING HARMONIC ENVELOPE - A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and a representation of harmonic envelope at the estimated pitch. The estimated pitch and the representation of harmonic envelope may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.02-14-2013
20090157395Adaptive codebook gain control for speech coding - In accordance with one aspect of the invention, a selector supports the selection of a first encoding scheme or the second encoding scheme based upon the detection or absence of the triggering characteristic in the interval of the input speech signal. The first encoding scheme has a pitch pre-processing procedure for processing the input speech signal to form a revised speech signal biased toward an ideal voiced and stationary characteristic. The pre-processing procedure allows the encoder to fully capture the benefits of a bandwidth-efficient, long-term predictive procedure for a greater amount of speech components of an input speech signal than would otherwise be possible. In accordance with another aspect of the invention, the second encoding scheme entails a long-term prediction mode for encoding the pitch on a sub-frame by sub-frame basis. The long-term prediction mode is tailored to where the generally periodic component of the speech is generally not stationary or less than completely periodic and requires greater frequency of updates from the adaptive codebook to achieve a desired perceptual quality of the reproduced speech under a long-term predictive procedure.06-18-2009
20100106489Method and System for Speech Quality Prediction of the Impact of Time Localized Distortions of an Audio Transmission System - Method and processing system for establishing the impact of time response distortion of an input signal which is applied to an audio transmission system (04-29-2010
20100106488VOICE ENCODING DEVICE AND VOICE ENCODING METHOD - Provided is an audio encoding device which can detect an optimal pitch pulse when using pitch pulse information as redundant information. The device includes: a search start decision unit (04-29-2010
20120185244SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT - According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.07-19-2012
20130046533IDENTIFYING FEATURES IN A PORTION OF A SIGNAL REPRESENTING SPEECH - Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a region of the signal representing speech. The region can comprise a portion of a frame of the signal representing speech classified as a voiced frame. The region can be marked based on one or more pitch estimates for the region. A cord can be identified within the region based on occurrence of one or more events within the region of the signal. For example, the one or more events can comprise one or more glottal pulses. In such cases, cord can begin with onset of a first glottal pulse and extend to a point prior to an onset of a second glottal pulse. The cord may exclude a portion of the region of the signal prior to the onset of the second glottal pulse.02-21-2013
20090043569Pitch prediction for use by a speech decoder to conceal packet loss - There is provided a pitch lag predictor for use by a speech decoder to generate a predicted pitch lag parameter. The pitch lag predictor comprises a summation calculator configured to generate a first summation based on a plurality of previous pitch lag parameters, and a second summation based on a plurality of previous pitch lag parameters and a position of each of the plurality of previous pitch lag parameters with respect to the predicted pitch lag parameter; a coefficient calculator configured to generate a first coefficient using a first equation based on the first summation and the second summation, and a second coefficient using a second equation based on the first summation and the second summation, wherein the first equation is different than the second equation; and a predictor configured to generate the predicted pitch lag parameter based on the first coefficient and the second coefficient.02-12-2009
20090012780Speech signal decoding method and apparatus - In a speech signal decoding method, information containing at least a sound source signal, gain, and filter coefficients is decoded from a received bit stream. Voiced speech and unvoiced speech of a speech signal are identified using the decoded information. Smoothing processing based on the decoded information is performed for at least either one of the decoded gain and decoded filter coefficients in the unvoiced speech. The speech signal is decoded by driving a filter having the decoded filter coefficients by an excitation signal obtained by multiplying the decoded sound source signal by the decoded gain using the result of the smoothing processing. A speech signal decoding apparatus is also disclosed.01-08-2009
20100070270CELP Post-processing for Music Signals - In one embodiment, a method of receiving a decoded audio signal that has a transmitted pitch lag is disclosed. The method includes estimating pitch correlations of possible short pitch lags that are smaller than a minimum pitch limitation and have an approximated multiple relationship with the transmitted pitch lag, checking if one of the pitch correlations of the possible short pitch lags is large enough compared to a pitch correlation estimated with the transmitted pitch lag, and selecting a short pitch lag as a corrected pitch lag if a corresponding pitch correlation is large enough. The postprocessing is performed using the corrected pitch lag. In another embodiment, when the existence of irregular harmonics or wrong pitch lag is detected, a coded-excited linear prediction (CELP) postfilter is made more aggressive.03-18-2010
20130166288Very Short Pitch Detection and Coding - System and method embodiments are provided for very short pitch detection and coding for speech or audio signals. The system and method include detecting whether there is a very short pitch lag in a speech or audio signal that is shorter than a conventional minimum pitch limitation using a combination of time domain and frequency domain pitch detection techniques. The pitch detection techniques include using pitch correlations in time domain and detecting a lack of low frequency energy in the speech or audio signal in frequency domain. The detected very short pitch lag is coded using a pitch range from a predetermined minimum very short pitch limitation that is smaller than the conventional minimum pitch limitation.06-27-2013
20130166287Adaptively Encoding Pitch Lag For Voiced Speech - System and method embodiments for dual modes pitch coding are provided. The system and method embodiments are configured to adaptively code pitch lags of a voiced speech signal using one of two pitch coding modes according to a pitch length, stability, or both. The two pitch coding modes include a first pitch coding mode with relatively high precision and reduced dynamic range, and a second pitch coding mode with relatively large dynamic range and reduced precision. The first pitch coding mode is used upon determining that the voiced speech signal has a relatively short or substantially stable pitch. The second pitch coding mode is used upon determining that the voiced speech signal has a relatively long or less stable pitch or is a substantially noisy signal.06-27-2013
20100241424Open-Loop Pitch Track Smoothing - There is provided a speech encoder for performing an algorithm that comprises obtaining (09-23-2010
20110125493VOICE QUALITY CONVERSION APPARATUS, PITCH CONVERSION APPARATUS, AND VOICE QUALITY CONVERSION METHOD - The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.05-26-2011
20100268530Signal Pitch Period Estimation - A method and apparatus for estimating the pitch period of a signal. The method includes identifying a first candidate pitch period by performing a search only over a first range of potential pitch periods. The method further includes determining a second candidate pitch period by dividing the first candidate pitch period by an integer, wherein the second candidate pitch period is outside the first range of potential pitch periods. The method further includes selecting as the estimate of the pitch period of the signal the smaller of the candidate pitch periods that is such that portions of the signal separated by that candidate pitch period are well correlated.10-21-2010
20120109645DSP-BASED DEVICE FOR AUDITORY SEGREGATION OF MULTIPLE SOUND INPUTS - There is provided a unique signal processing technique for localizing and characterizing each of a number of differently located acoustic sources. Specifically there is provided a method for auditory segregation of multiple voice inputs comprising the steps of: receiving a plurality of voice input signals from different source locations; filtering said voice input signals with head related transfer functions (HRTF) using a digital signal processor (DSP) thereby assigning the voice input signals to different locations in virtual auditory space; and changing the HRTF filtered voice input signals in two dimensions, wherein pitch is changed and the signal is filtered with different filters emulating vocal tracts of different sizes thereby further segregating the voice input signals from each other.05-03-2012
20090089050Device and Method For Frame Lost Concealment - A device and a method for frame lost concealment are disclosed. A pitch period of a current lost frame is obtained on the basis of a pitch period of the last good frame before the current lost frame. An excitation signal of the current lost frame is recovered on the basis of the pitch period of the current lost frame and an excitation signal of the last good frame before the lost frame. Thereby, the hearing contrast of a receiver is reduced, and the quality of speech is improved. Further, in the present invention, a pitch period of continual lost frames is adjusted on the basis of the change trend of the pitch period of the last good frame before the lost frame. Therefore, a buzz effect produced by the continual lost frames is avoided, and the quality of speech is further improved. In addition, by attenuating the energy of the excitation signal obtained from the continual lost frames, the device and method accord with the hearing physiological characteristics of human and reduce the hearing contrast of the receiver.04-02-2009
20090281797BIT ERROR CONCEALMENT FOR AUDIO CODING SYSTEMS - A bit error concealment (BEC) system and method is described herein that detects and conceals the presence of click-like artifacts in an audio signal caused by bit errors introduced during transmission of the audio signal within an audio communications system. A particular embodiment of the present invention utilizes a low-complexity design that introduces no added delay and that is particularly well-suited for applications such as Bluetooth® wireless audio devices which have low cost and low power dissipation requirements.11-12-2009
20080243493Method for Restoring Partials of a Sound Signal - A method for restoring a partial between a peak P10-02-2008
20080288246Selection of preferential pitch value for speech processing - There is provided a method of using a processing circuitry for selecting a preferential pitch lag value from a plurality of pitch lag values, including a first pitch lag value and a second pitch lag value, for coding an input speech signal. The method comprises determining a first timing relationship between a previous pitch lag value and at least one of the plurality of pitch lag values; determining a second timing relationship between the first pitch lag value and the second pitch lag value; favoring one of the first pitch lag value and the second pitch lag value based on the first timing relationship and the second timing relationship to select one of the first pitch lag value and the second pitch lag value as the preferential pitch lag value; and converting the input speech signal into an encoded speech using the preferential pitch lag value.11-20-2008
20080312914SYSTEMS, METHODS, AND APPARATUS FOR SIGNAL ENCODING USING PITCH-REGULARIZING AND NON-PITCH-REGULARIZING CODING - A time shift calculated during a pitch-regularizing (PR) encoding of a frame of an audio signal is used to time-shift a segment of another frame during a non-PR encoding.12-18-2008
20080312913Pitch-Estimation Method and System, and Pitch-Estimation Program - A pitch-estimation method, a pitch-estimation system, and a pitch-estimation program are provided, which estimate a weight of a probability density function of a fundamental frequency and relative amplitude of a harmonic component through fewer computations than ever. In the improved pitch-estimation method, 1200 log12-18-2008
20100305944Pitch Or Periodicity Estimation - A method of estimating a pitch period of a first portion of a signal wherein the first portion overlaps a previous portion. The method comprises computing a first autocorrelation value for part of the first portion not overlapping the previous portion. The method further comprises retrieving a stored second autocorrelation value for part of the first portion overlapping the previous portion, the second autocorrelation value having been computed during estimation of a pitch period of the previous portion. The method further comprises forming a combined autocorrelation value using the first and second autocorrelation values, and selecting the estimated pitch period in dependence on the combined autocorrelation value.12-02-2010
20090125300SCALABLE ENCODING APPARATUS, SCALABLE DECODING APPARATUS, AND METHODS THEREOF - A scalable encoding apparatus capable of reducing the bit rates of encoded parameters and also capable of efficiently encoding even audio signals in which a plurality of harmonic structures are coexistent. In the apparatus, an MDCT analyzing part (05-14-2009
20100023321Voice processing apparatus and method - Character extraction section extracts character amounts, pertaining to a prosody of voice, from a voice signal sequentially in a time-serial manner. Difference value calculation calculates a difference value between each of the extracted character amounts and a reference value. Processing values, corresponding to the individual character amounts, are generated in accordance with the respective difference values, and a voice processing section controls the individual character amounts of the voice signal in accordance with the processing values corresponding to the character amounts and thereby generates an output signal having a prosody changed from the prosody of the voice signal.01-28-2010
20110144981CONTINUOUS PITCH-CORRECTED VOCAL CAPTURE DEVICE COOPERATIVE WITH CONTENT SERVER FOR BACKING TRACK MIX - Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices and as well as desktops, workstations, gaming stations, even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.06-16-2011
20090144053SPEECH PROCESSING APPARATUS AND SPEECH SYNTHESIS APPARATUS - An information extraction unit extracts spectral envelope information of L-dimension from each frame of speech data. The spectral envelope information does not have a spectral fine structure. A basis storage unit stores N bases (L>N>1). Each basis is differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension. A value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain is zero. Two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlap. A parameter calculation unit minimizes a distortion between the spectral envelope information and a linear combination of each basis with a coefficient by changing the coefficient, and sets the coefficient of each basis from which the distortion is minimized to a spectral envelope parameter of the spectral envelope information.06-04-2009
20090326930SPEECH DECODING APPARATUS AND SPEECH ENCODING APPARATUS - Provided is an audio decoding device capable of suppressing an information amount for a lost frame compensation process and encoding efficiency. In this device, a decoded sound source generation unit (12-31-2009
20110144982CONTINUOUS SCORE-CODED PITCH CORRECTION - Vocal musical performances may be captured and continuously pitch-corrected at a mobile device for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. In some cases, such pitch correction settings code a particular key or scale for the vocal performance or for portions thereof. In some cases, pitch correction settings include a score-coded melody sequence of note targets supplied with, or for association with, the lyrics and/or backing track. In some cases, pitch correction settings are dynamically variable based on gestures captured at a user interface.06-16-2011
20110144983WORLD STAGE FOR PITCH-CORRECTED VOCAL PERFORMANCES - Techniques have been developed to facilitate the capture performances on handheld or other portable computing devices and, in some cases, the pitch-correction and mixing of such vocal performances with backing tracks for audible rendering on such devices. Captivating visual animations and/or facilities for listener comment and ranking are provided in association with an audible rendering of a performance, e.g., a vocal performance captured and pitch-corrected at another similarly configured mobile device and mixed with backing instrumentals and/or vocals. Geocoding of captured vocal performances and/or listener feedback may facilitate animations or display artifacts in ways that are suggestive of a performance or endorsement emanating from a particular geographic locale on a user manipulable globe. In this way, implementations of the described functionality can transform otherwise mundane mobile devices into social instruments that foster a unique sense of global connectivity and community.06-16-2011
20120089390PITCH CORRECTED VOCAL CAPTURE FOR TELEPHONY TARGETS - Vocal musical performances may be captured and pitch corrected and supplied to telephony targets such as conventional voice terminal equipment (telephone handsets, answering machines, etc.), wireless telephony devices and information services wherein particular device or subscriber targets are identifiable using telephone numbers or alphanumeric IDs (e.g., mobile phones with or without text/multimedia messaging support, VoIP terminals, answering or voicemail services, ASP-based telephony services, etc.) and/or telco or premises-based telephony equipment, such as switches, with support for customizable ringback tones. To facilitate the foregoing, techniques have been developed for capture and audible rendering of vocal performances on handheld or other portable devices using signal processing techniques suitable given the somewhat limited capabilities of such devices and in ways that facilitate efficient encoding and communication of such captured performances via ubiquitous, though bandwidth limited, wireless networks and through communication channels typical of the wired and wireless telephony networks.04-12-2012
20090240490METHOD AND APPARATUS FOR CONCEALING PACKET LOSS, AND APPARATUS FOR TRANSMITTING AND RECEIVING SPEECH SIGNAL - A method and apparatus for concealing frame loss and an apparatus for transmitting and receiving a speech signal that are capable of reducing speech quality degradation caused by packet loss are provided. In the method, when loss of a current received frame occurs, a random excitation signal having the highest correlation with a periodic excitation signal (i.e., a pitch excitation signal) decoded from a previous frame received without loss is used as a noise excitation signal to recover an excitation signal of a current lost frame. Furthermore, a third, new attenuation constant (AS) is obtained by summing a first attenuation constant (NS) obtained based on the number of continuously lost frames and a second attenuation constant (PS) predicted in consideration of change in amplitude of previously received frames to adjust the amplitude of the recovered excitation signal for the current lost frame. Speech quality degradation caused by packet loss can be reduced for enhanced communication quality in a packet network environment with continuous frame loss.09-24-2009
20090204396METHOD AND APPARATUS FOR IMPLEMENTING SPEECH DECODING IN SPEECH DECODER FIELD OF THE INVENTION - The present disclosure relates to a decoding method and apparatus. The method includes: receiving data frames from the coder; if any erroneous frame appears, calculating a pitch lag parameter of the erroneous frame; decoding the data frames according to the calculated pitch lag parameter of the erroneous frame, and obtaining decoded data. The process of determining the pitch lag parameter includes: determining the number of continuous erroneous frames and the pitch lag parameter of the previous frame; adjusting the pitch lag parameter of the previous frame according to the number of the continuous erroneous frames and a preset adjustment policy, and calculating and determining the pitch lag parameter of a current erroneous frame, wherein the preset adjustment policy is adjusting the determined pitch lag parameter of the current erroneous frame within a preset value range according to the number of the continuous erroneous frames.08-13-2009
20080262836PITCH ESTIMATION APPARATUS, PITCH ESTIMATION METHOD, AND PROGRAM - In a pitch estimation apparatus, a function estimation part estimates a fundamental frequency probability density function of an audio signal by repeating a weight calculation process and an estimated shape specification process. The weight calculation process calculates a weight of each tone model of each fundamental frequency based on an estimated shape of each tone model of each fundamental frequency. The estimated shape indicates a degree of dominancy of a corresponding tone model in a total harmonic structure of the audio signal. The estimated shape specification process specifies each estimated shape of each tone model based on an amplitude spectrum of the audio signal, the harmonic structure of each tone model and the weight of each tone model. A similarity analysis part calculates a similarity index value indicating a degree of similarity between each tone model and corresponding estimated shape. A weight correction part reduces a weight of a tone model of a certain fundamental frequency having the similarity index value indicating that the tone model and the corresponding estimated shape are not similar to each other.10-23-2008
20090222260System and method for multi-channel pitch detection - A method and system for multi-channel detection of pitch may comprise one or more of the following steps and/or means therefore: (a) sampling an audio input stream including at least a first channel and a second channel; (b) setting a search frequency for each of the first channel and the second channel; and (c) detecting a pitch of the first channel and a pitch of the second channel.09-03-2009
20090222259APPARATUS, METHOD AND COMPUTER PROGRAM PRODUCT FOR FEATURE EXTRACTION - A feature extraction apparatus includes a spectrum calculating unit that calculates, based on an input speech signal, a frequency spectrum having frequency components obtained at regular intervals on a logarithmic frequency scale for each of frames that are defined by regular time intervals, and thereby generates a time series of the frequency spectrum; a cross-correlation coefficients calculating unit that calculates, for each target frame of the frames, a cross-correlation coefficients between frequency spectra calculated for two different frames that are in vicinity of the target frame and a predetermined frame width apart from each other; and a shift amount predicting unit that predicts a shift amount of the frequency spectra on the logarithmic frequency scale with respect to the predetermined frame width by use of the cross-correlation coefficients.09-03-2009
20100161323AUDIO ENCODING DEVICE, AUDIO DECODING DEVICE, AND THEIR METHOD - Provided is an audio encoding device capable of preventing audio quality degradation of a decoded signal. In the audio encoding device, a noise analysis unit (06-24-2010
20100228543METHOD AND APPARATUS FOR EXTENDING THE BANDWIDTH OF A SPEECH SIGNAL - A bandwidth extension module, and an associated method and computer-readable medium, suitable for use in artificially extending the bandwidth of a lowband speech signal. The bandwidth extension module comprises a band-pass filter configured to produce a band-pass signal from the lowband speech signal; at least one carrier frequency modulator, each carrier frequency modulator configured to pitch-synchronously modulate the band-pass signal about a respective carrier frequency, the at least one carrier frequency modulator collectively producing a highband speech signal component; a synthesis filter configured to determine a highband speech signal based on the highband speech signal component; and a summation module configured to combine the lowband speech signal with the highband speech signal to obtain a bandwidth-extended speech signal.09-09-2010
20090076808METHOD AND DEVICE FOR PERFORMING FRAME ERASURE CONCEALMENT ON HIGHER-BAND SIGNAL - A method for performing a frame erasure concealment for a higher-band signal involves calculating a periodic intensity of the higher-band signal with respect to pitch period information of a lower-band signal; comparing the periodic intensity to a preconfigured threshold and, if the periodic intensity is greater or equal to the preconfigured threshold, performing the frame erasure concealment with a pitch period repetition based method. If the periodic intensity is less than the preconfigured threshold, performing the frame erasure concealment with a previous frame data repetition based method. A device for performing a frame erasure concealment includes a periodic intensity calculation module, a pitch period repetition module, and a previous frame data repetition module. The pitch period repetition module performs the frame erasure concealment with a pitch period repetition based method; and the previous frame data repetition module performs the frame erasure concealment with a previous frame data repetition based method.03-19-2009
20100235166APPARATUS AND METHOD FOR TRANSFORMING AUDIO CHARACTERISTICS OF AN AUDIO RECORDING - A method of audio processing comprises composing one or more transformation profiles for transforming audio characteristics of an audio recording and then generating for the or each transformation profile, a metadata set comprising transformation profile data and location data indicative of where in the recording the transformation profile data is to be applied; the or each metadata set is then stored in association with the corresponding recording. A corresponding method of audio reproduction comprises reading a recording and a meta-data set associated with that recording from storage, applying transformations to the recording data in accordance with the metadata set transformation profile; and then outputting the transformed recording.09-16-2010
20090043568Accent information extracting apparatus and method thereof - An accent type is determined by outputting mora synchronized signals, extracting a pitch pattern which is a variation pattern of a voice height (fundamental frequency) from a speech signal entered by a user, generating mora synchronized pattern from the pitch pattern and the mora synchronized signal, storing typical patterns for respective accent types, collating the mora synchronized pattern and reference accent pattern, calculating matching of the mora synchronized patterns with respect to the respective accent types, referring the matching and determining the accent type.02-12-2009
20100211384Pitch detection method and apparatus - A pitch detection method and apparatus are disclosed. The method includes: performing pitch detection on an input signal in a signal domain, and obtaining a candidate pitch; performing linear prediction (LP) on the input signal, and obtaining an LP residual signal; setting a candidate pitch range that includes the candidate pitch; searching the candidate pitch range for the LP residual signal, and obtaining a selected pitch.08-19-2010
20120143601Method and System for Determining a Perceived Quality of an Audio System - The invention relates to a method for determining a quality indicator representing a perceived quality of an output signal of an audio system with respect to a reference signal. The reference signal and the output signal are processed and compared. The processing includes dividing the reference signal and the output signal into mutually corresponding time frames. Additionally, the processing includes scaling the intensity of the reference signal towards a fixed intensity level, and then performing measurements on time frames within the scaled reference signal for determining reference signal time frame characteristics. The intensity of the reference signal is then scaled from the fixed intensity level towards an intensity level related to the output signal. Further on in the method, the loudness of the output signal is scaled towards a fixed loudness level in the perceptual loudness domain. This scaling action uses the reference signal time frame characteristics. Finally, the loudness of the reference signal is scaled from a loudness level corresponding to the output signal related intensity level towards a loudness level related to the loudness level of the scaled output signal in the perceptual loudness domain. This scaling action also uses the reference signal time frame characteristics.06-07-2012
20120143600Speech Synthesis information Editing Apparatus - In a speech synthesis information editing apparatus, a phoneme storage unit stores phoneme information that designates a duration of each phoneme of speech to be synthesized. A feature storage unit stores feature information that designates a time variation in a feature of the speech. An edition processing unit changes a duration of each phoneme designated by the phoneme information with an expansion/compression degree depending on a feature designated by the feature information in correspondence to the phoneme.06-07-2012
20100191524NON-SPEECH SECTION DETECTING METHOD AND NON-SPEECH SECTION DETECTING DEVICE - A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not containing voice data based on speech uttered by a person, the device including: a calculating part calculating a bias of a spectrum obtained by converting sound data of each frame into components on a frequency axis; a judging part judging whether the bias is greater than or equal to a given threshold or alternatively smaller than or equal to a given threshold; a counting part counting the number of consecutive frames judged as having a bias greater than or equal to the threshold or alternatively smaller than or equal to the threshold; a count judging part judging whether the obtained number of consecutive frames is greater than or equal to a given value.07-29-2010
20090210220Speech analyzer detecting pitch frequency, speech analyzing method, and speech analyzing program - A speech analyzer includes a speech acquiring section, a frequency converting section, an autocorrelation section, and a pitch detection section. The frequency converting section converts the speech signal acquired by the speech acquiring section into a frequency spectrum. The autocorrelation section determines an autocorrelation waveform by shifting the frequency spectrum along the frequency axis. The pitch detection section determines the pitch frequency from the distance between two local crests or troughs of the autocorrelation waveform.08-20-2009
20090319262CODING SCHEME SELECTION FOR LOW-BIT-RATE APPLICATIONS - Systems, methods, and apparatus for low-bit-rate coding of transitional speech frames are disclosed.12-24-2009
20090063139Signal modification method for efficient coding of speech signals - For determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, a feature of the sound signal is located in a previous frame, a corresponding feature of the sound signal is located in a current frame, and the long-term-prediction delay parameter is determined for the current frame while mapping, with the long term prediction, the signal feature of the previous frame with the corresponding signal feature of the current frame. In a signal modification method for implementation into a technique for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, each frame of the sound signal is partitioned into a plurality of signal segments, and at least a part of the signal segments of the frame are warped while constraining the warped signal segments inside the frame. For searching pitch pulses in a sound signal, a residual signal is produced by filtering the sound signal through a linear prediction analysis filter, a weighted sound signal is produced by processing the sound signal through a weighting filter, the weighted sound signal being indicative of signal periodicity, a synthesized weighted sound signal is produced by filtering a synthesized speech signal produced during a last subframe of a previous frame of the sound signal through the weighting filter, a last pitch pulse of the sound signal of the previous frame is located from the residual signal, a pitch pulse prototype of given length is extracted around the position of the last pitch pulse of the sound signal of the previous frame using the synthesized weighted sound signal, and the pitch pulses are located in a current frame using the pitch pulse prototype.03-05-2009
20110246188NONVOLATILE STORAGE SYSTEM AND MUSIC SOUND GENERATION SYSTEM - A music sound generation system is formed with a high sound quality and with a small size using a large-capacity NAND flash memory for storing music sound data. Music sound data is divided into N pitch groups and stored into N different storage modules as being divided in these storage modules. A sound generation command classification unit (10-06-2011
20110087489Method and Apparatus for Performing Packet Loss or Frame Erasure Concealment - The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.04-14-2011
20100063804ADAPTIVE SOUND SOURCE VECTOR QUANTIZATION DEVICE AND ADAPTIVE SOUND SOURCE VECTOR QUANTIZATION METHOD - Provided is an adaptive sound source vector quantization device which can always perform a pitch cycle search with a resolution appropriate for any section of the pitch cycle search range of a second sub-frame when a pitch cycle search range of the second sub-frame changes in accordance with a pitch cycle of a first sub-frame. The device includes a first pitch cycle instruction unit (03-11-2010
20100063806Classification of Fast and Slow Signal - Low bit rate audio coding such as BWE algorithm often encounters conflict goal of achieving high time resolution and high frequency resolution at the same time. In order to achieve best possible quality, input signal can be first classified into fast signal and slow signal. This invention focuses on classifying signal into fast signal and slow signal, based on at least one of the following parameters or a combination of the following parameters: spectral sharpness, temporal sharpness, pitch correlation (pitch gain), and/or spectral envelope variation. This classification information can help to choose different BWE algorithms, different coding algorithms, and different postprocessing algorithms respectively for fast signal and slow signal.03-11-2010
20100070269Adding Second Enhancement Layer to CELP Based Core Layer - In an embodiment, a method of transmitting an input audio signal is disclosed. A first coding error of the input audio signal with a scalable codec having a first enhancement layer is encoded, and a second coding error is encoded using a second enhancement layer after the first enhancement layer. Encoding the second coding error includes coding fine spectrum coefficients of the second coding error to produce coded fine spectrum coefficients, and coding a spectral envelope of the second coding error to produce a coded spectral envelope. The coded fine spectrum coefficients and the coded spectral envelope are transmitted.03-18-2010
20110099005FRAMING METHOD AND APPARATUS - A framing method and apparatus are disclosed to overcome inconsistency of gains between sub-frames caused by simple average framing in the prior art. The method includes: obtaining the Linear Prediction Coding (LPC) order and the pitch of the signal; removing the samples inapplicable to Long-Term Prediction (LTP) synthesis according to the LPC prediction order and the pitch; and splitting the remaining samples of the signal into several sub-frames. The technical solution under the present invention is applicable to the multimedia speech coding field.04-28-2011
20110087488SPEECH SYNTHESIS APPARATUS AND METHOD - According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.04-14-2011
20120303361Method and Apparatus for Sculpting Synthesized Speech - Methods and systems for sculpting synthesized speech using a graphic user interface are disclosed. An operator enters a stream of text that is used to produce a stream of target phonetic-units. The stream of target phonetic-units is then submitted to a unit-selection process to produce a stream of selected phonetic-units, each selected phonetic-unit derived from a database of sample phonetic-units. After the stream of sample phonetic-units is selected, an operator can remove various selected phonetic-units from the stream of selected phonetic-units, prune the sample phonetic-database and edit various cost functions using the graphic user interface. The edited speech information can then be submitted to the unit-selection process to produce a second stream of selected phonetic-units.11-29-2012
20100332221ENCODING DEVICE, DECODING DEVICE, AND METHOD THEREOF - It is possible to improve quality of a decoding signal in a band spread for estimating a high band from a low band of a decoding signal. A first layer encoding unit (12-30-2010
20110251842COMPUTATIONAL TECHNIQUES FOR CONTINUOUS PITCH CORRECTION AND HARMONY GENERATION - Using signal processing techniques described herein, pitch detection and correction of a user's vocal performance can be performed continuously and in real-time with respect to the audible rendering of the backing track at the handheld or portable computing device. In some implementations, pitch detection builds on time-domain pitch correction techniques that employ average magnitude difference function (AMDF) or autocorrelation-based techniques together with zero-crossing and/or peak picking techniques to identify differences between pitch of a captured vocal signal and score-coded target pitches. Based on detected differences, pitch correction based on pitch synchronous overlapped add (PSOLA) and/or linear predictive coding (LPC) techniques allow captured vocals to be pitch shifted in real-time to “correct” notes in accord with pitch correction settings that code score-coded melody targets and harmonies.10-13-2011
20110251841COORDINATING AND MIXING VOCALS CAPTURED FROM GEOGRAPHICALLY DISTRIBUTED PERFORMERS - Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.10-13-2011
20110251840PITCH-CORRECTION OF VOCAL PERFORMANCE IN ACCORD WITH SCORE-CODED HARMONIES - Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at a portable computing device (such as a mobile phone, personal digital assistant, laptop computer, notebook computer, pad-type computer or netbook) in accord with pitch correction settings. In some cases, pitch correction settings include a score-coded melody and/or harmonies supplied with, or for association with, the lyrics and backing tracks. Harmonies notes or chords may be coded as explicit targets or relative to the score coded melody or even actual pitches sounded by a vocalist, if desired.10-13-2011
20100063805NON-CAUSAL POSTFILTER - A decoder arrangement comprising a receiver input for parameters of frame-based coded signals and a decoder arranged to provide frames of decoded audio signals based on the parameters. The receiver input and/or the decoder is arranged to establish a time difference between the occasion when parameters of a first frame is available at the receiver input and the occasion when a decoded audio signal of the first frame is available at an output of the decoder, which time difference corresponds to at least one frame. A postfilter is connected to the output of the decoder and to the receiver input. The postfilter is arranged to provide a filtering of the frames of decoded audio signals into an output signal in response to parameters of a respective subsequent frame.03-11-2010
20090119097PITCH SELECTION MODULES IN A SYSTEM FOR AUTOMATIC TRANSCRIPTION OF SUNG OR HUMMED MELODIES - The technology disclosed relates to audio signal processing. It includes a series of modules that individually are useful to solve audio signal processing problems. Among the problems addressed are buzz removal, selecting a pitch candidate among pitch candidates based on local continuity of pitch and regional octave consistency, making small adjustments in pitch, ensuring that a selected pitch is consistent with harmonic peaks, determining whether a given frame or region of frames includes harmonic, voiced signal, extracting harmonics from voice signals and detecting vibrato. One environment in which these modules are useful is transcribing singing or humming into a symbolic melody. Another environment that would usefully employ some of these modules is speech processing. Some of the modules, such as buzz removal, are useful in many other environments as well.05-07-2009
20110054886EFFECT DEVICE - An effect device may be configured such that when an input audio signal switches from a consonant to a vowel and an input level of the switched vowel is greater than a threshold value Lc (and a variable t is greater than time Ts), an audio effect signal A may be generated. Such an effect device may allow for increasing the occurrences when portamento is simulated, while still sounding natural. In general, a detecting module detects whether an audio signal is a vowel sound or a consonant sound and whether the audio signal changed from a consonant sound to a vowel sound; and a pitch change module changes a pitch of the audio signal and changes, based on a prescribed function, an amount the pitch is changed to produce a modified audio signal, when the audio signal changed from a consonant sound to a vowel sound.03-03-2011
20110125492Speech Intelligibility - The perceived quality of a narrowband speech signal truncated from a wideband speech signal is improved by generating in a third frequency band third speech components matching first speech components in a first frequency band of the narrowband signal, and generating in a fourth frequency band fourth speech components matching second speech components in a second frequency band of the narrowband signal. A first gain factor is applied to the third speech components to generate adjusted third speech components, and a second gain factor is applied to the fourth speech components to generate adjusted fourth speech components, the gain factors being selected such that the ratios of the average powers of the adjusted third and fourth speech components to the average power of the first speech components are predetermined values.05-26-2011
20110125491Speech Intelligibility - The perceived quality of a speech signal is improved by estimating the average power of first and second signal components and applying a first gain factor to the second signal components to generate adjusted second signal components. The first gain factor is selected such that on application of the first gain factor to the second signal components, the ratio of the average power of the first signal components to the average power of the adjusted second signal components would be a first predetermined value, the first predetermined value being such as to inhibit perceptual distortion of the improved speech signal.05-26-2011
20090171656Method and apparatus for performing packet loss or frame erasure concealment - The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.07-02-2009
20090299736Pitch period equalizing apparatus and pitch period equalizing method, and speech coding apparatus, speech decoding apparatus, and speech coding method - To provide a speech coding technology that realizes a low bit rate and can suppress distortion of reproduction speech as compared with a conventional technology.12-03-2009
20110191102SYSTEMS AND METHODS FOR SPEECH EXTRACTION - In some embodiments, a processor-readable medium stores code representing instructions to cause a processor to receive an input signal having a first component and a second component. An estimate of the first component of the input signal is calculated based on an estimate of a pitch of the first component of the input signal. An estimate of the input signal is calculated based on the estimate of the first component of the input signal and an estimate of the second component of the input signal. The estimate of the first component of the input signal is modified based on a scaling function to produce a reconstructed first component of the input signal. The scaling function is a function of at least one of the input signal, the estimate of the first component of the input signal, the estimate of the second component of the input signal, or a residual signal.08-04-2011
20100017201Data embedding apparatus, data extraction apparatus, and voice communication system - A voice communication system having, on a transmission side, a data embedding apparatus provided with an embedding allowability judgment unit (01-21-2010
20100017200ENCODING DEVICE, DECODING DEVICE, AND METHOD THEREOF - Disclosed is an encoding device which can accurately specify a band having a large error among all the bands by using a small calculation amount. The device includes: a first position identification unit (01-21-2010
20090119096PARTIAL SPEECH RECONSTRUCTION - A system enhances the quality of a digital speech signal that may include noise. The system identifies vocal expressions that correspond to the digital speech signal. A signal-to-noise ratio of the digital speech signal is measured before a portion of the digital speech signal is synthesized. The selected portion of the digital speech signal may have a signal-to-noise ratio below a predetermined level and the synthesis of the digital speech signal may be based on speaker identification.05-07-2009
20100010810POST FILTER AND FILTERING METHOD - When a decoding audio signal is to be acquired by pitch-filtering a combined signal of a sub-frame length, a decoding audio signal is continuously changed at the boundary between sub-frames. The post filter includes: a first filter coefficient calculation unit (01-14-2010
20120004908VOICE RECOGNITION TERMINAL - A voice recognition terminal executes a local voice recognition process and utilizes an external center voice recognition process. The terminal includes: a voice message synthesizing element for synthesizing at least one of a voice message to be output from a speaker according to the external center voice recognition process and a voice message to be output from the speaker according to the local voice recognition process so as to distinguish between characteristics of the voice message to be output from the speaker according to the external center voice recognition process and characteristics of the voice message to be output from the speaker according to the local voice recognition process; and a voice output element for outputting a synthesized voice message from the speaker.01-05-2012
20120004907SYSTEM AND METHOD FOR BIOMETRIC ACOUSTIC NOISE REDUCTION - Embodiments of the invention provide a communication device and methods for generating enhanced audio signals. An audio signal comprising a speech signal and a noise signals is acquired at the communication device. A noise processor of the communication device detects a pitch estimation of the audio signal. Thereafter, the audio signal is processed based on the pitch estimation and processing parameters of the audio signals to remove noise signals and generate an enhanced audio signal.01-05-2012
20090063138Method and System for Determining Predominant Fundamental Frequency - Methods, digital systems, and computer readable media are provided for determining a predominant fundamental frequency of a frame of an audio signal by finding a maximum absolute signal value in history data for the frame, determining a number of bits for downshifting based on the maximum absolute signal value, computing autocorrelations for the frame using signal values downshifted by the number of bits, and determining the predominant fundamental frequency using the computed autocorrelations.03-05-2009
20090177464Speech gain quantization strategy - A speech encoder that analyzes and classifies each frame of speech as being periodic-like speech or non-periodic like speech where the speech encoder performs a different gain quantization process depending if the speech is periodic or not. If the speech is periodic, the improved speech encoder obtains the pitch gains from the unquantized weighted speech signal and performs a pre-vector quantization of the adaptive codebook gain G07-09-2009
20120116756METHOD FOR TONE/INTONATION RECOGNITION USING AUDITORY ATTENTION CUES - In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.05-10-2012
20120209598STATE DETECTING DEVICE AND STORAGE MEDIUM STORING A STATE DETECTING PROGRAM - A state detecting device includes an input unit that receives an input voice sound; an analyzer that calculates a feature parameter of each of plurality of frames extracted from the voice sound; a calculator that calculates the average of the feature parameters of the frames, determines a threshold on the basis of the average and statistical data representing relationships between other averages of other feature parameters obtained from a plurality of speakers and cumulative frequencies of the other feature parameters, and calculates an appearance frequency of a frame that is among the plurality of frames and whose feature parameter is larger than the threshold; a determining unit that determines, on the basis of the appearance frequency, a strained state of a vocal cord that has made the voice sound; and an output unit that outputs a result of the determination.08-16-2012
20120072208DETERMINING PITCH CYCLE ENERGY AND SCALING AN EXCITATION SIGNAL - An electronic device for determining a set of pitch cycle energy parameters is described. The electronic device includes a processor and executable instructions stored in memory. The electronic device obtains a frame, a set of filter coefficients and a residual signal based on the frame and the set of filter coefficients. The electronic device determines a set of peak locations based on the residual signal and segments the residual signal such that each segment includes one peak. The electronic device determines a first set of pitch cycle energy parameters based on a frame region between two consecutive peak locations and maps regions between peaks in the residual signal to regions between peaks in a synthesized excitation signal to produce a mapping. The electronic device determines a second set of pitch cycle energy parameters based on the first set of pitch cycle energy parameters and the mapping.03-22-2012
20120072209ESTIMATING A PITCH LAG - An electronic device for estimating a pitch lag is described. The electronic device includes a processor and executable instructions stored in memory that is in electronic communication with the processor. The electronic device obtains a current frame. The electronic device also obtains a residual signal based on the current frame. The electronic device additionally determines a set of peak locations based on the residual signal. Furthermore, the electronic device obtains a set of pitch lag candidates based on the set of peak locations. The electronic device also estimates a pitch lag based on the set of pitch lag candidates.03-22-2012
20110066426REAL-TIME SPEAKER-ADAPTIVE SPEECH RECOGNITION APPARATUS AND METHOD - A speech recognition apparatus and method for real-time speaker adaptation are provided. The speech recognition apparatus may estimate a pitch of a speech section from an inputted speech signal, extract a speech feature for speech recognition based on the estimated pitch, and perform speech recognition with respect to the speech signal based on the speech feature. The speech recognition apparatus may be adaptively normalized depending on a speaker. Thus, the speech recognition apparatus may extract a speech feature for speech recognition, and may improve a performance of speech recognition based on the extracted speech feature.03-17-2011
20120166187SYSTEM AND METHOD FOR AUDIO SYNTHESIZER UTILIZING FREQUENCY APERTURE ARRAYS - A system and method for audio synthesizer utilizing frequency aperture cells (FAC) and frequency aperture arrays (FAA). In accordance with an embodiment, an audio processing system can be provided for the transformation of audio-band frequencies for musical and other purposes. In accordance with an embodiment, a single stream of mono, stereo, or multi-channel monophonic audio can be transformed into polyphonic music, based on a desired target musical note or set of multiple notes. At its core, the system utilizes an input waveform(s) (which can be either file-based or streamed) which is then fed into an array of filters, which are themselves optionally modulated, to generate a new synthesized audio output.06-28-2012
20120136655SPEECH PROCESSING APPARATUS AND SPEECH PROCESSING METHOD - A signal portion is extracted per frame having a specific duration from an input signal, thus generating a per-frame input signal. The per-frame input signal in the time domain is converted into a per-frame input signal in the frequency domain, thereby generating a spectral pattern of spectra. Peak spectra having peaks are detected in the spectral pattern. A harmonic spectrum is determined, in the peak spectra, having a harmonic structure showing a relationship between a fundamental pitch and a harmonic overtone.05-31-2012
20100185441Error Concealment - A method of updating a state of a decoder that decodes successive portions of a data stream representing an encoded voice signal in dependence on its state, the method comprising: at the decoder, decoding portions of the data stream to form decoded portions; storing the decoded portions; storing respective decoder states held by the decoder after forming each decoded portion; identifying that a portion of the data stream is degraded; estimating a pitch period of a stored decoded portion formed by decoding a portion of the data stream that precedes the degraded portion of the data stream; selecting a stored decoder state held by the decoder after decoding a portion of the data stream that precedes the degraded portion by a multiple of the estimated pitch period; and updating the state of the decoder with the selected decoder state.07-22-2010
20100174536METHOD AND DEVICE FOR ADAPTIVE BANDWIDTH PITCH SEARCH IN CODING WIDEBAND SIGNALS - A pitch search method and device for digitally encoding a wideband signal, in particular but not exclusively a speech signal, in view of transmitting, or storing, and synthesizing this wideband sound signal. The new method and device which achieve efficient modeling of the harmonic structure of the speech spectrum uses several forms of low pass filters applied to a pitch codevector, the one yielding higher prediction gain (i.e. the lowest pitch prediction error) is selected and the associated pitch codebook parameters are forwarded.07-08-2010
20100174535Filtering speech - A method of filtering a speech signal for speech encoding in a communications network, includes determining a cut off frequency for a filter, wherein a component of the speech signal in a frequency range less than the cut off frequency is to be attenuated by the filter; receiving the speech signal at the filter; determining at least one parameter of the received speech signal, the at least one parameter providing an indication of the energy of the component of the received speech signal that is to be attenuated; and adjusting the cut off frequency in dependence on the at least one parameter, thereby adjusting the frequency range to be attenuated.07-08-2010
20100174534Speech coding - A method of encoding speech, the method comprising: receiving a signal representative of speech to be encoded; at each of a plurality of intervals during the encoding, determining a pitch lag between portions of the signal having a degree of repetition; selecting for a set of said intervals a pitch lag vector from a pitch lag codebook of such vectors, each pitch lag vector comprising a set of offsets corresponding to the offset between the pitch lag determined for each said interval and an average pitch lag for said set of intervals, and transmitting an indication of the selected vector and said average over a transmission medium as part of the encoded signal representative of said speech.07-08-2010
20100049505METHOD AND DEVICE FOR PERFORMING PACKET LOSS CONCEALMENT - A method, device and system to implement hiding the loss packet are provided. The provided method, device and system recover the lost frame according to the data before and after the lost frame and enhances the correlation of the recovered lost frame data and the data after the lost frame. A method and device for estimating pitch period are also provided which select a pitch period from the initial pitch period and the pitch periods corresponding to the frequencies which are one or more times higher than the frequencies corresponding to the initial pitch period as the final estimated pitch period, may improve frequency multiplication when estimating the pitch period; in addition, by tuning of the pitch period by matching the waves, the error of estimating pitch period may be reduced and the quality of the audio data may be improved.02-25-2010
20100049506METHOD AND DEVICE FOR PERFORMING PACKET LOSS CONCEALMENT - A method, device and system to implement hiding the loss packet are provided. The provided method, device and system recover the lost frame according to the data before and after the lost frame and enhances the correlation of the recovered lost frame data and the data after the lost frame. A method and device for estimating pitch period are also provided which select a pitch period from the initial pitch period and the pitch periods corresponding to the frequencies which are one or more times higher than the frequencies corresponding to the initial pitch period as the final estimated pitch period, may improve frequency multiplication when estimating the pitch period; in addition, by tuning of the pitch period by matching the waves, the error of estimating pitch period may be reduced and the quality of the audio data may be improved.02-25-2010
20080300867SYSTEM AND METHOD OF ANALYZING VOICE VIA VISUAL AND ACOUSTIC DATA - A method and system for the assessment and diagnosis of voice in normal and diseased states can include determining at least one quantitative measure of vocal fold vibration using a laryngeal image recording of a subject's vocal fold obtained from an endoscopic device or an auditory recording of a subject during a phonatory task, and can include subsequent analysis of a waveform selected from waveform types comprising a) an acoustic recording, and b) a glottal waveform that is extracted from the laryngeal image recording. The method and system can generate a comprehensive, at-a-glance, physician friendly visual pattern and characteristics of vocal fold vibrations and correlate with specific voice conditions for diagnosis and assessment of voice and therapies and treatments of voice disorder.12-04-2008
20110004467VOCAL AND INSTRUMENTAL AUDIO EFFECTS - Systems, methods, and computer program products are provided for producing audio and/or visual effects according to a correlation between reference data and estimated note data derived from an input acoustic audio waveform. Some embodiments calculate a pitch score as a function of a pitch estimate derived from the input waveform, a reference pitch, and a real-time-adjustable pitch gating window. Other embodiments calculate the pitch score as a function of pitch and timing estimates derived from the input waveform, reference pitch and note timing data, an adjustable rhythm gating window, and an adjustable pitch gating window. The audio and/or visual effects are produced according to the pitch score, and may be used to generate outputs (e.g., in real time) for affecting a live performance, an audio mix, a video gaming environment, an educational feedback environment, etc.01-06-2011
20120089391ESTIMATION OF SPEECH MODEL PARAMETERS - Methods for estimating speech model parameters are disclosed. For pulsed parameter estimation, a speech signal is divided into multiple frequency bands or channels using bandpass filters. Channel processing reduces sensitivity to pole magnitudes and frequencies and reduces impulse response time duration to improve pulse location and strength estimation performance. These methods are useful for high quality speech coding and reproduction at various bit rates for applications such as satellite and cellular voice communication.04-12-2012
20110276324Adaptive Filter Pitch Extraction - An enhancement system extracts pitch from a processed speech signal. The system estimates the pitch of voiced speech by deriving filter coefficients of an adaptive filter and using the obtained filter coefficients to derive pitch. The pitch estimation may be enhanced by using various techniques to condition the input speech signal, such as spectral modification of the background noise and the speech signal, and/or reduction of the tonal noise from the speech signal.11-10-2011
20110276323SPEECH-BASED SPEAKER RECOGNITION SYSTEMS AND METHODS - The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.11-10-2011
20120101815QUERY BY HUMMING FOR RINGTONE SEARCH AND DOWNLOAD - Described is a technology by which a user hums, sings or otherwise plays a user-provided rendition of a ringtone (or ringback tone) through a mobile telephone to a ringtone search service (e.g., a WAP, interactive voice response or SMS-based search platform). The service matches features of the user's rendition against features of actual ringtones to determine one or more matching candidate ringtones for downloading. Features may include pitch contours (up or down), pitch intervals and durations of notes. Matching candidates may be ranked based on the determined similarity, possibly in conjunction with weighting criterion such as the popularity of the ringtone and/or the importance of the matched part. The candidate set may be augmented with other ringtones independent of the matching, such as the most popular ones downloaded by other users, ringtones from similar artists, and so forth.04-26-2012
20120101814Artifact Reduction in Packet Loss Concealment - Various techniques are disclosed for improving packet loss concealment to reduce artifacts by using audio character measures of the audio signal. These techniques include attenuation to a noise fill instead of attenuation to silence, varying how long to wait before attenuating the extrapolation, varying the rate of attenuation of the extrapolation, attenuating periodic extrapolation at a different rate than non-periodic extrapolation, and performing period extrapolation on successively longer fill data based on the audio character measures, adjusting weighting between periodic and non-periodic extrapolation based on the audio character measures, and adjusting weighting between periodic extrapolation and non-periodic extrapolation non-linearly.04-26-2012
20100169084METHOD AND APPARATUS FOR PITCH SEARCH - The present invention relates to a method and apparatus for pitch search. One method includes: obtaining a characteristic function value of a residual signal, where the residual signal is a result of removing a Long-Term Prediction (LTP) contribution signal from input speech signals; and obtaining a pitch according to the characteristic function value of the residual signal.07-01-2010
20100169085MODEL BASED REAL TIME PITCH TRACKING SYSTEM AND SINGER EVALUATION METHOD - The various embodiments herein provide a system and method to track the pitch of a human being in real time using time varying model. According to one embodiment, the input voice is synthesised to obtain a lower order model. The lower model is down sampled and fitted to a time varying 2nd order model. The down sampled signal is passed through a pitch tracking filter, a fading filter and a gradient filter to obtain a pitch signal in real time. The noise included in the pitch signal is removed by passing the acquired pitch signal through a Kalman filter to obtain a smoothened pitch signal in real time.07-01-2010
20130144611CODING DEVICE, DECODING DEVICE, CODING METHOD, AND DECODING METHOD - A coding device includes: a pitch contour detection unit which detects a pitch contour of an input audio signal; a dynamic time warping unit which determines the number of pitch nodes based on the pitch contour and generates a first time warping parameter including information indicating the determined number of pitch nodes, a pitch change position, and a pitch change ratio; a first encoder which codes the first time warping parameter; a time warping unit which corrects pitch, using the information obtained from the first time warping parameter, to approximate the pitches of the number of pitch nodes to a predetermined reference value; a second encoder which codes the input audio signal at the corrected pitch; and a multiplexer which multiplexes the coded time warping parameter and the coded audio signal to generate a bitstream.06-06-2013
20130096912SELECTIVE BASS POST FILTER - In one aspect, the invention provides an audio encoding method characterized by a decision being made as to whether the device which will decode the resulting bit stream Bitstream should apply post filtering including attenuation of interharmonic noise. Hence, the decision whether to use the post filter, which is encoded in the bit stream, is taken separately from the decision as to the most suitable coding mode. In another aspect, there is provided an audio decoding method with a decoding step followed by a post-filtering step, including interharmonic noise attenuation, and being characterized in a step of disabling the post filter in accordance with post filtering information encoded in the bit stream signal. Such a method is well suited for mixed-origin audio signals by virtue of its capability to deactivate the post filter in dependence of the post filtering information only, hence independently of factors such as the current coding mode.04-18-2013
20110313759Method for changing the caller voice during conversation in voice communication device - The invention relates to a cellular phone terminal system and in particular to a method for changing caller's voice of speech signal during conversation. The cellular phone terminal system has a filter for filtering signal. The method comprises the steps of: waiting for a caller voice selector key input for a desired caller voice when a caller voice converter key is pressed during conversation; and setting an even or odd harmonic deletion bins on the frequency domain of the uncompressed speech signal correspondingly to the caller voice selector key input to change caller voice.12-22-2011
20130132075METHODS AND ARRANGEMENTS IN A TELECOMMUNICATIONS NETWORK - The present invention relates to a postfilter and a postfilter control to be associated with a postfilter for improving perceived quality of speech reconstructed at a speech decoder. The postfilter control comprises means for measuring stationarity of a speech signal reconstructed at a decoder, means for determining a coefficient to a postfilter control parameter based on the measured stationarity, and means for transmitting the determined coefficient to a postfilter, such that the postfilter can process the reconstructed speech signal by applying the determined coefficient to the postfilter control parameter to obtain an enhanced speech signal.05-23-2013
20090076807METHOD AND DEVICE FOR PERFORMING FRAME ERASURE CONCEALMENT TO HIGHER-BAND SIGNAL - The present invention discloses a method for performing a frame erasure concealment to a higher-band signal, including: calculating a periodic intensity of a higher-band signal with respect to a lower-band signal; judging whether the periodic intensity of the higher-band signal is higher than or equal to a preconfigured threshold; if the periodic intensity of the higher-band signal is higher than or equal to the preconfigured threshold, using a pitch period repetition method to perform the frame erasure concealment to the higher-band signal of a current lost frame; and if the periodic intensity of the higher-band signal is lower than the preconfigured threshold, using a previous frame data repetition method to perform the frame erasure concealment to the higher-band signal of the current lost frame. The present invention further discloses a device for performing a frame erasure concealment to a higher-band signal and a speech decoder. The problem that the quality of the voice signal is lowered is avoided.03-19-2009
20110218800METHOD AND APPARATUS FOR OBTAINING PITCH GAIN, AND CODER AND DECODER - The present invention relates to a method and apparatus for obtaining a pitch gain, and a coder and a decoder. The method includes: obtaining information about an input signal; and obtaining a pitch gain corresponding to the information about the input signal according to the correspondence between the signal information and the pitch gain. The embodiments of the present invention obtain the corresponding pitch gain according to the signal information by using the obtained correspondence between the signal information and the pitch gain, and the pitch gain is applicable to the coder and the decoder, thus making it unnecessary for the coder to transmit the pitch gain to the decoder and solving the problem of bit overhead. The embodiments of the present invention determine the pitch gain adaptively according to the signal information, avoid consumption of extra bits for quantizing the pitch gain, avoid impact on the coding performance, and improve the compression ratio.09-08-2011
20110224977ROBOT, METHOD AND PROGRAM OF CONTROLLING ROBOT - A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.09-15-2011
20130151245Method for Determining Fundamental-Frequency Courses of a Plurality of Signal Sources - The invention relates to a method for establishing fundamental frequency curves of a plurality of signal sources from a single-channel audio recording of a mix signal, said method including the following steps:06-13-2013
20100318349SYNTHESIS OF LOST BLOCKS OF A DIGITAL AUDIO SIGNAL, WITH PITCH PERIOD CORRECTION - The present invention relates to signal modification before pitch period repetition for the synthesis of blocks lost on decoding digital audio signals. The effects of repetition of transitories, such as the plosives of a speech signal, are avoided by comparing the samples of a pitch period with those of the previous pitch period. The signal is modified preferentially by taking the minimum between a current sample (e(12-16-2010

Patent applications in class Pitch

Patent applications in all subclasses Pitch