Pitch

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

704201000 - For storage or transmission

704205000 - Frequency

704206000 - Specialized information

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number	Description	Number of patent applications / Date published
704208000	Voiced or unvoiced	43

Document	Title	Date
20080243493	Method for Restoring Partials of a Sound Signal - A method for restoring a partial between a peak P	10-02-2008
20080262836	PITCH ESTIMATION APPARATUS, PITCH ESTIMATION METHOD, AND PROGRAM - In a pitch estimation apparatus, a function estimation part estimates a fundamental frequency probability density function of an audio signal by repeating a weight calculation process and an estimated shape specification process. The weight calculation process calculates a weight of each tone model of each fundamental frequency based on an estimated shape of each tone model of each fundamental frequency. The estimated shape indicates a degree of dominancy of a corresponding tone model in a total harmonic structure of the audio signal. The estimated shape specification process specifies each estimated shape of each tone model based on an amplitude spectrum of the audio signal, the harmonic structure of each tone model and the weight of each tone model. A similarity analysis part calculates a similarity index value indicating a degree of similarity between each tone model and corresponding estimated shape. A weight correction part reduces a weight of a tone model of a certain fundamental frequency having the similarity index value indicating that the tone model and the corresponding estimated shape are not similar to each other.	10-23-2008
20080275695	Method and system for pitch contour quantization in audio coding - A method and device for improving coding efficiency in audio coding. From the pitch values of a pitch contour of an audio signal, a plurality of simplified pitch contour segments are generated to approximate the pitch contour, based on one or more pre-selected criteria. The contour segments can be linear or non-linear with each contour segment represented by a first end point and a second end point. If the contour segments are linear, then only the information regarding the end points, instead of the pitch values, are provided to a decoder for reconstructing the audio signal. The contour segment can have a fixed maximum length or a variable length, but the deviation between a contour segment and the pitch values in that segment is limited by a maximum value.	11-06-2008
20080288246	Selection of preferential pitch value for speech processing - There is provided a method of using a processing circuitry for selecting a preferential pitch lag value from a plurality of pitch lag values, including a first pitch lag value and a second pitch lag value, for coding an input speech signal. The method comprises determining a first timing relationship between a previous pitch lag value and at least one of the plurality of pitch lag values; determining a second timing relationship between the first pitch lag value and the second pitch lag value; favoring one of the first pitch lag value and the second pitch lag value based on the first timing relationship and the second timing relationship to select one of the first pitch lag value and the second pitch lag value as the preferential pitch lag value; and converting the input speech signal into an encoded speech using the preferential pitch lag value.	11-20-2008
20080300867	SYSTEM AND METHOD OF ANALYZING VOICE VIA VISUAL AND ACOUSTIC DATA - A method and system for the assessment and diagnosis of voice in normal and diseased states can include determining at least one quantitative measure of vocal fold vibration using a laryngeal image recording of a subject's vocal fold obtained from an endoscopic device or an auditory recording of a subject during a phonatory task, and can include subsequent analysis of a waveform selected from waveform types comprising a) an acoustic recording, and b) a glottal waveform that is extracted from the laryngeal image recording. The method and system can generate a comprehensive, at-a-glance, physician friendly visual pattern and characteristics of vocal fold vibrations and correlate with specific voice conditions for diagnosis and assessment of voice and therapies and treatments of voice disorder.	12-04-2008
20080312913	Pitch-Estimation Method and System, and Pitch-Estimation Program - A pitch-estimation method, a pitch-estimation system, and a pitch-estimation program are provided, which estimate a weight of a probability density function of a fundamental frequency and relative amplitude of a harmonic component through fewer computations than ever. In the improved pitch-estimation method, 1200 log	12-18-2008
20080312914	SYSTEMS, METHODS, AND APPARATUS FOR SIGNAL ENCODING USING PITCH-REGULARIZING AND NON-PITCH-REGULARIZING CODING - A time shift calculated during a pitch-regularizing (PR) encoding of a frame of an audio signal is used to time-shift a segment of another frame during a non-PR encoding.	12-18-2008
20090012780	Speech signal decoding method and apparatus - In a speech signal decoding method, information containing at least a sound source signal, gain, and filter coefficients is decoded from a received bit stream. Voiced speech and unvoiced speech of a speech signal are identified using the decoded information. Smoothing processing based on the decoded information is performed for at least either one of the decoded gain and decoded filter coefficients in the unvoiced speech. The speech signal is decoded by driving a filter having the decoded filter coefficients by an excitation signal obtained by multiplying the decoded sound source signal by the decoded gain using the result of the smoothing processing. A speech signal decoding apparatus is also disclosed.	01-08-2009
20090043568	Accent information extracting apparatus and method thereof - An accent type is determined by outputting mora synchronized signals, extracting a pitch pattern which is a variation pattern of a voice height (fundamental frequency) from a speech signal entered by a user, generating a mora synchronized pattern from the pitch pattern and the mora synchronized signal, storing typical patterns for respective accent types, collating the mora synchronized pattern and reference accent pattern, calculating matching of the mora synchronized patterns with respect to the respective accent types, referring to the matching and determining the accent type.	02-12-2009
20090043569	Pitch prediction for use by a speech decoder to conceal packet loss - There is provided a pitch lag predictor for use by a speech decoder to generate a predicted pitch lag parameter. The pitch lag predictor comprises a summation calculator configured to generate a first summation based on a plurality of previous pitch lag parameters, and a second summation based on a plurality of previous pitch lag parameters and a position of each of the plurality of previous pitch lag parameters with respect to the predicted pitch lag parameter; a coefficient calculator configured to generate a first coefficient using a first equation based on the first summation and the second summation, and a second coefficient using a second equation based on the first summation and the second summation, wherein the first equation is different than the second equation; and a predictor configured to generate the predicted pitch lag parameter based on the first coefficient and the second coefficient.	02-12-2009
20090063138	Method and System for Determining Predominant Fundamental Frequency - Methods, digital systems, and computer readable media are provided for determining a predominant fundamental frequency of a frame of an audio signal by finding a maximum absolute signal value in history data for the frame, determining a number of bits for downshifting based on the maximum absolute signal value, computing autocorrelations for the frame using signal values downshifted by the number of bits, and determining the predominant fundamental frequency using the computed autocorrelations.	03-05-2009
20090063139	Signal modification method for efficient coding of speech signals - For determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, a feature of the sound signal is located in a previous frame, a corresponding feature of the sound signal is located in a current frame, and the long-term-prediction delay parameter is determined for the current frame while mapping, with the long term prediction, the signal feature of the previous frame with the corresponding signal feature of the current frame. In a signal modification method for implementation into a technique for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, each frame of the sound signal is partitioned into a plurality of signal segments, and at least a part of the signal segments of the frame are warped while constraining the warped signal segments inside the frame. For searching pitch pulses in a sound signal, a residual signal is produced by filtering the sound signal through a linear prediction analysis filter, a weighted sound signal is produced by processing the sound signal through a weighting filter, the weighted sound signal being indicative of signal periodicity, a synthesized weighted sound signal is produced by filtering a synthesized speech signal produced during a last subframe of a previous frame of the sound signal through the weighting filter, a last pitch pulse of the sound signal of the previous frame is located from the residual signal, a pitch pulse prototype of given length is extracted around the position of the last pitch pulse of the sound signal of the previous frame using the synthesized weighted sound signal, and the pitch pulses are located in a current frame using the pitch pulse prototype.	03-05-2009
20090076807	METHOD AND DEVICE FOR PERFORMING FRAME ERASURE CONCEALMENT TO HIGHER-BAND SIGNAL - The present invention discloses a method for performing a frame erasure concealment to a higher-band signal, including: calculating a periodic intensity of a higher-band signal with respect to a lower-band signal; judging whether the periodic intensity of the higher-band signal is higher than or equal to a preconfigured threshold; if the periodic intensity of the higher-band signal is higher than or equal to the preconfigured threshold, using a pitch period repetition method to perform the frame erasure concealment to the higher-band signal of a current lost frame; and if the periodic intensity of the higher-band signal is lower than the preconfigured threshold, using a previous frame data repetition method to perform the frame erasure concealment to the higher-band signal of the current lost frame. The present invention further discloses a device for performing a frame erasure concealment to a higher-band signal and a speech decoder. The problem that the quality of the voice signal is lowered is avoided.	03-19-2009
20090076808	METHOD AND DEVICE FOR PERFORMING FRAME ERASURE CONCEALMENT ON HIGHER-BAND SIGNAL - A method for performing a frame erasure concealment for a higher-band signal involves calculating a periodic intensity of the higher-band signal with respect to pitch period information of a lower-band signal; comparing the periodic intensity to a preconfigured threshold and, if the periodic intensity is greater than or equal to the preconfigured threshold, performing the frame erasure concealment with a pitch period repetition based method; and, if the periodic intensity is less than the preconfigured threshold, performing the frame erasure concealment with a previous frame data repetition based method. A device for performing a frame erasure concealment includes a periodic intensity calculation module, a pitch period repetition module, and a previous frame data repetition module. The pitch period repetition module performs the frame erasure concealment with a pitch period repetition based method; and the previous frame data repetition module performs the frame erasure concealment with a previous frame data repetition based method.	03-19-2009
20090089050	Device and Method For Frame Lost Concealment - A device and a method for frame lost concealment are disclosed. A pitch period of a current lost frame is obtained on the basis of a pitch period of the last good frame before the current lost frame. An excitation signal of the current lost frame is recovered on the basis of the pitch period of the current lost frame and an excitation signal of the last good frame before the lost frame. Thereby, the hearing contrast of a receiver is reduced, and the quality of speech is improved. Further, in the present invention, a pitch period of continual lost frames is adjusted on the basis of the change trend of the pitch period of the last good frame before the lost frame. Therefore, a buzz effect produced by the continual lost frames is avoided, and the quality of speech is further improved. In addition, by attenuating the energy of the excitation signal obtained from the continual lost frames, the device and method accord with the physiological characteristics of human hearing and reduce the hearing contrast of the receiver.	04-02-2009
20090119096	PARTIAL SPEECH RECONSTRUCTION - A system enhances the quality of a digital speech signal that may include noise. The system identifies vocal expressions that correspond to the digital speech signal. A signal-to-noise ratio of the digital speech signal is measured before a portion of the digital speech signal is synthesized. The selected portion of the digital speech signal may have a signal-to-noise ratio below a predetermined level and the synthesis of the digital speech signal may be based on speaker identification.	05-07-2009
20090119097	PITCH SELECTION MODULES IN A SYSTEM FOR AUTOMATIC TRANSCRIPTION OF SUNG OR HUMMED MELODIES - The technology disclosed relates to audio signal processing. It includes a series of modules that individually are useful to solve audio signal processing problems. Among the problems addressed are buzz removal, selecting a pitch candidate among pitch candidates based on local continuity of pitch and regional octave consistency, making small adjustments in pitch, ensuring that a selected pitch is consistent with harmonic peaks, determining whether a given frame or region of frames includes harmonic, voiced signal, extracting harmonics from voice signals and detecting vibrato. One environment in which these modules are useful is transcribing singing or humming into a symbolic melody. Another environment that would usefully employ some of these modules is speech processing. Some of the modules, such as buzz removal, are useful in many other environments as well.	05-07-2009
20090125300	SCALABLE ENCODING APPARATUS, SCALABLE DECODING APPARATUS, AND METHODS THEREOF - A scalable encoding apparatus capable of reducing the bit rates of encoded parameters and also capable of efficiently encoding even audio signals in which a plurality of harmonic structures are coexistent. In the apparatus, an MDCT analyzing part (	05-14-2009
20090144053	SPEECH PROCESSING APPARATUS AND SPEECH SYNTHESIS APPARATUS - An information extraction unit extracts spectral envelope information of L-dimension from each frame of speech data. The spectral envelope information does not have a spectral fine structure. A basis storage unit stores N bases (L>N>1). Each basis is differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension. A value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain is zero. Two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlap. A parameter calculation unit minimizes a distortion between the spectral envelope information and a linear combination of each basis with a coefficient by changing the coefficient, and sets the coefficient of each basis from which the distortion is minimized to a spectral envelope parameter of the spectral envelope information.	06-04-2009
20090157395	Adaptive codebook gain control for speech coding - In accordance with one aspect of the invention, a selector supports the selection of a first encoding scheme or the second encoding scheme based upon the detection or absence of the triggering characteristic in the interval of the input speech signal. The first encoding scheme has a pitch pre-processing procedure for processing the input speech signal to form a revised speech signal biased toward an ideal voiced and stationary characteristic. The pre-processing procedure allows the encoder to fully capture the benefits of a bandwidth-efficient, long-term predictive procedure for a greater amount of speech components of an input speech signal than would otherwise be possible. In accordance with another aspect of the invention, the second encoding scheme entails a long-term prediction mode for encoding the pitch on a sub-frame by sub-frame basis. The long-term prediction mode is tailored to where the generally periodic component of the speech is generally not stationary or less than completely periodic and requires greater frequency of updates from the adaptive codebook to achieve a desired perceptual quality of the reproduced speech under a long-term predictive procedure.	06-18-2009
20090171656	Method and apparatus for performing packet loss or frame erasure concealment - The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.	07-02-2009
20090177464	Speech gain quantization strategy - A speech encoder that analyzes and classifies each frame of speech as being periodic-like speech or non-periodic like speech where the speech encoder performs a different gain quantization process depending if the speech is periodic or not. If the speech is periodic, the improved speech encoder obtains the pitch gains from the unquantized weighted speech signal and performs a pre-vector quantization of the adaptive codebook gain G	07-09-2009
20090204396	METHOD AND APPARATUS FOR IMPLEMENTING SPEECH DECODING IN SPEECH DECODER - The present disclosure relates to a decoding method and apparatus. The method includes: receiving data frames from the coder; if any erroneous frame appears, calculating a pitch lag parameter of the erroneous frame; decoding the data frames according to the calculated pitch lag parameter of the erroneous frame, and obtaining decoded data. The process of determining the pitch lag parameter includes: determining the number of continuous erroneous frames and the pitch lag parameter of the previous frame; adjusting the pitch lag parameter of the previous frame according to the number of the continuous erroneous frames and a preset adjustment policy, and calculating and determining the pitch lag parameter of a current erroneous frame, wherein the preset adjustment policy is adjusting the determined pitch lag parameter of the current erroneous frame within a preset value range according to the number of the continuous erroneous frames.	08-13-2009
20090210220	Speech analyzer detecting pitch frequency, speech analyzing method, and speech analyzing program - A speech analyzer includes a speech acquiring section, a frequency converting section, an autocorrelation section, and a pitch detection section. The frequency converting section converts the speech signal acquired by the speech acquiring section into a frequency spectrum. The autocorrelation section determines an autocorrelation waveform by shifting the frequency spectrum along the frequency axis. The pitch detection section determines the pitch frequency from the distance between two local crests or troughs of the autocorrelation waveform.	08-20-2009
20090222259	APPARATUS, METHOD AND COMPUTER PROGRAM PRODUCT FOR FEATURE EXTRACTION - A feature extraction apparatus includes a spectrum calculating unit that calculates, based on an input speech signal, a frequency spectrum having frequency components obtained at regular intervals on a logarithmic frequency scale for each of frames that are defined by regular time intervals, and thereby generates a time series of the frequency spectrum; a cross-correlation coefficients calculating unit that calculates, for each target frame of the frames, cross-correlation coefficients between frequency spectra calculated for two different frames that are in the vicinity of the target frame and a predetermined frame width apart from each other; and a shift amount predicting unit that predicts a shift amount of the frequency spectra on the logarithmic frequency scale with respect to the predetermined frame width by use of the cross-correlation coefficients.	09-03-2009
20090222260	System and method for multi-channel pitch detection - A method and system for multi-channel detection of pitch may comprise one or more of the following steps and/or means therefore: (a) sampling an audio input stream including at least a first channel and a second channel; (b) setting a search frequency for each of the first channel and the second channel; and (c) detecting a pitch of the first channel and a pitch of the second channel.	09-03-2009
20090240490	METHOD AND APPARATUS FOR CONCEALING PACKET LOSS, AND APPARATUS FOR TRANSMITTING AND RECEIVING SPEECH SIGNAL - A method and apparatus for concealing frame loss and an apparatus for transmitting and receiving a speech signal that are capable of reducing speech quality degradation caused by packet loss are provided. In the method, when loss of a current received frame occurs, a random excitation signal having the highest correlation with a periodic excitation signal (i.e., a pitch excitation signal) decoded from a previous frame received without loss is used as a noise excitation signal to recover an excitation signal of a current lost frame. Furthermore, a third, new attenuation constant (AS) is obtained by summing a first attenuation constant (NS) obtained based on the number of continuously lost frames and a second attenuation constant (PS) predicted in consideration of change in amplitude of previously received frames to adjust the amplitude of the recovered excitation signal for the current lost frame. Speech quality degradation caused by packet loss can be reduced for enhanced communication quality in a packet network environment with continuous frame loss.	09-24-2009
20090281797	BIT ERROR CONCEALMENT FOR AUDIO CODING SYSTEMS - A bit error concealment (BEC) system and method is described herein that detects and conceals the presence of click-like artifacts in an audio signal caused by bit errors introduced during transmission of the audio signal within an audio communications system. A particular embodiment of the present invention utilizes a low-complexity design that introduces no added delay and that is particularly well-suited for applications such as Bluetooth® wireless audio devices which have low cost and low power dissipation requirements.	11-12-2009
20090299736	Pitch period equalizing apparatus and pitch period equalizing method, and speech coding apparatus, speech decoding apparatus, and speech coding method - To provide a speech coding technology that realizes a low bit rate and can suppress distortion of reproduction speech as compared with a conventional technology.	12-03-2009
20090319261	CODING OF TRANSITIONAL SPEECH FRAMES FOR LOW-BIT-RATE APPLICATIONS - Systems, methods, and apparatus for low-bit-rate coding of transitional speech frames are disclosed.	12-24-2009
20090319262	CODING SCHEME SELECTION FOR LOW-BIT-RATE APPLICATIONS - Systems, methods, and apparatus for low-bit-rate coding of transitional speech frames are disclosed.	12-24-2009
20090326930	SPEECH DECODING APPARATUS AND SPEECH ENCODING APPARATUS - Provided is an audio decoding device capable of suppressing an information amount for a lost frame compensation process and encoding efficiency. In this device, a decoded sound source generation unit (	12-31-2009
20100010810	POST FILTER AND FILTERING METHOD - When a decoding audio signal is to be acquired by pitch-filtering a combined signal of a sub-frame length, a decoding audio signal is continuously changed at the boundary between sub-frames. The post filter includes: a first filter coefficient calculation unit (	01-14-2010
20100017200	ENCODING DEVICE, DECODING DEVICE, AND METHOD THEREOF - Disclosed is an encoding device which can accurately specify a band having a large error among all the bands by using a small calculation amount. The device includes: a first position identification unit (	01-21-2010
20100017201	Data embedding apparatus, data extraction apparatus, and voice communication system - A voice communication system having, on a transmission side, a data embedding apparatus provided with an embedding allowability judgment unit (	01-21-2010
20100023321	Voice processing apparatus and method - Character extraction section extracts character amounts, pertaining to a prosody of voice, from a voice signal sequentially in a time-serial manner. Difference value calculation calculates a difference value between each of the extracted character amounts and a reference value. Processing values, corresponding to the individual character amounts, are generated in accordance with the respective difference values, and a voice processing section controls the individual character amounts of the voice signal in accordance with the processing values corresponding to the character amounts and thereby generates an output signal having a prosody changed from the prosody of the voice signal.	01-28-2010
20100049505	METHOD AND DEVICE FOR PERFORMING PACKET LOSS CONCEALMENT - A method, device and system for concealing packet loss are provided. The provided method, device and system recover the lost frame according to the data before and after the lost frame and enhance the correlation between the recovered lost frame data and the data after the lost frame. A method and device for estimating pitch period are also provided, which select, as the final estimated pitch period, a pitch period from among the initial pitch period and the pitch periods corresponding to frequencies that are one or more times higher than the frequency corresponding to the initial pitch period; this may improve the handling of frequency multiplication when estimating the pitch period. In addition, by tuning the pitch period by matching the waves, the error in estimating the pitch period may be reduced and the quality of the audio data may be improved.	02-25-2010
20100049506	METHOD AND DEVICE FOR PERFORMING PACKET LOSS CONCEALMENT - A method, device and system for concealing packet loss are provided. The provided method, device and system recover the lost frame according to the data before and after the lost frame and enhance the correlation between the recovered lost frame data and the data after the lost frame. A method and device for estimating pitch period are also provided, which select, as the final estimated pitch period, a pitch period from among the initial pitch period and the pitch periods corresponding to frequencies that are one or more times higher than the frequency corresponding to the initial pitch period; this may improve the handling of frequency multiplication when estimating the pitch period. In addition, by tuning the pitch period by matching the waves, the error in estimating the pitch period may be reduced and the quality of the audio data may be improved.	02-25-2010
20100063804	ADAPTIVE SOUND SOURCE VECTOR QUANTIZATION DEVICE AND ADAPTIVE SOUND SOURCE VECTOR QUANTIZATION METHOD - Provided is an adaptive sound source vector quantization device which can always perform a pitch cycle search with a resolution appropriate for any section of the pitch cycle search range of a second sub-frame when a pitch cycle search range of the second sub-frame changes in accordance with a pitch cycle of a first sub-frame. The device includes a first pitch cycle instruction unit (	03-11-2010
20100063805	NON-CAUSAL POSTFILTER - A decoder arrangement comprising a receiver input for parameters of frame-based coded signals and a decoder arranged to provide frames of decoded audio signals based on the parameters. The receiver input and/or the decoder is arranged to establish a time difference between the occasion when parameters of a first frame are available at the receiver input and the occasion when a decoded audio signal of the first frame is available at an output of the decoder, which time difference corresponds to at least one frame. A postfilter is connected to the output of the decoder and to the receiver input. The postfilter is arranged to provide a filtering of the frames of decoded audio signals into an output signal in response to parameters of a respective subsequent frame.	03-11-2010
20100063806	Classification of Fast and Slow Signal - Low bit rate audio coding such as a BWE algorithm often encounters the conflicting goals of achieving high time resolution and high frequency resolution at the same time. In order to achieve the best possible quality, the input signal can first be classified into fast signal and slow signal. This invention focuses on classifying a signal into fast signal and slow signal, based on at least one of the following parameters or a combination of them: spectral sharpness, temporal sharpness, pitch correlation (pitch gain), and/or spectral envelope variation. This classification information can help to choose different BWE algorithms, different coding algorithms, and different postprocessing algorithms respectively for fast signal and slow signal.	03-11-2010
20100070269	Adding Second Enhancement Layer to CELP Based Core Layer - In an embodiment, a method of transmitting an input audio signal is disclosed. A first coding error of the input audio signal with a scalable codec having a first enhancement layer is encoded, and a second coding error is encoded using a second enhancement layer after the first enhancement layer. Encoding the second coding error includes coding fine spectrum coefficients of the second coding error to produce coded fine spectrum coefficients, and coding a spectral envelope of the second coding error to produce a coded spectral envelope. The coded fine spectrum coefficients and the coded spectral envelope are transmitted.	03-18-2010
20100070270	CELP Post-processing for Music Signals - In one embodiment, a method of receiving a decoded audio signal that has a transmitted pitch lag is disclosed. The method includes estimating pitch correlations of possible short pitch lags that are smaller than a minimum pitch limitation and have an approximated multiple relationship with the transmitted pitch lag, checking if one of the pitch correlations of the possible short pitch lags is large enough compared to a pitch correlation estimated with the transmitted pitch lag, and selecting a short pitch lag as a corrected pitch lag if a corresponding pitch correlation is large enough. The postprocessing is performed using the corrected pitch lag. In another embodiment, when the existence of irregular harmonics or wrong pitch lag is detected, a code-excited linear prediction (CELP) postfilter is made more aggressive.	03-18-2010
20100106488	VOICE ENCODING DEVICE AND VOICE ENCODING METHOD - Provided is an audio encoding device which can detect an optimal pitch pulse when using pitch pulse information as redundant information. The device includes: a search start decision unit (	04-29-2010
20100106489	Method and System for Speech Quality Prediction of the Impact of Time Localized Distortions of an Audio Transmission System - Method and processing system for establishing the impact of time response distortion of an input signal which is applied to an audio transmission system (	04-29-2010
20100161323	AUDIO ENCODING DEVICE, AUDIO DECODING DEVICE, AND THEIR METHOD - Provided is an audio encoding device capable of preventing audio quality degradation of a decoded signal. In the audio encoding device, a noise analysis unit (	06-24-2010
20100169084	METHOD AND APPARATUS FOR PITCH SEARCH - The present invention relates to a method and apparatus for pitch search. One method includes: obtaining a characteristic function value of a residual signal, where the residual signal is a result of removing a Long-Term Prediction (LTP) contribution signal from input speech signals; and obtaining a pitch according to the characteristic function value of the residual signal.	07-01-2010
20100169085	MODEL BASED REAL TIME PITCH TRACKING SYSTEM AND SINGER EVALUATION METHOD - The various embodiments herein provide a system and method to track the pitch of a human being in real time using time varying model. According to one embodiment, the input voice is synthesised to obtain a lower order model. The lower model is down sampled and fitted to a time varying 2nd order model. The down sampled signal is passed through a pitch tracking filter, a fading filter and a gradient filter to obtain a pitch signal in real time. The noise included in the pitch signal is removed by passing the acquired pitch signal through a Kalman filter to obtain a smoothened pitch signal in real time.	07-01-2010
20100174534	Speech coding - A method of encoding speech, the method comprising: receiving a signal representative of speech to be encoded; at each of a plurality of intervals during the encoding, determining a pitch lag between portions of the signal having a degree of repetition; selecting for a set of said intervals a pitch lag vector from a pitch lag codebook of such vectors, each pitch lag vector comprising a set of offsets corresponding to the offset between the pitch lag determined for each said interval and an average pitch lag for said set of intervals, and transmitting an indication of the selected vector and said average over a transmission medium as part of the encoded signal representative of said speech.	07-08-2010
20100174535	Filtering speech - A method of filtering a speech signal for speech encoding in a communications network, includes determining a cut off frequency for a filter, wherein a component of the speech signal in a frequency range less than the cut off frequency is to be attenuated by the filter; receiving the speech signal at the filter; determining at least one parameter of the received speech signal, the at least one parameter providing an indication of the energy of the component of the received speech signal that is to be attenuated; and adjusting the cut off frequency in dependence on the at least one parameter, thereby adjusting the frequency range to be attenuated.	07-08-2010
20100174536	METHOD AND DEVICE FOR ADAPTIVE BANDWIDTH PITCH SEARCH IN CODING WIDEBAND SIGNALS - A pitch search method and device for digitally encoding a wideband signal, in particular but not exclusively a speech signal, in view of transmitting, or storing, and synthesizing this wideband sound signal. The new method and device, which achieve efficient modeling of the harmonic structure of the speech spectrum, use several forms of low pass filters applied to a pitch codevector; the one yielding the higher prediction gain (i.e. the lowest pitch prediction error) is selected and the associated pitch codebook parameters are forwarded.	07-08-2010
20100185441	Error Concealment - A method of updating a state of a decoder that decodes successive portions of a data stream representing an encoded voice signal in dependence on its state, the method comprising: at the decoder, decoding portions of the data stream to form decoded portions; storing the decoded portions; storing respective decoder states held by the decoder after forming each decoded portion; identifying that a portion of the data stream is degraded; estimating a pitch period of a stored decoded portion formed by decoding a portion of the data stream that precedes the degraded portion of the data stream; selecting a stored decoder state held by the decoder after decoding a portion of the data stream that precedes the degraded portion by a multiple of the estimated pitch period; and updating the state of the decoder with the selected decoder state.	07-22-2010
20100191524	NON-SPEECH SECTION DETECTING METHOD AND NON-SPEECH SECTION DETECTING DEVICE - A non-speech section detecting device generating a plurality of frames having a given time length on the basis of sound data obtained by sampling sound, and detecting a non-speech section having a frame not containing voice data based on speech uttered by a person, the device including: a calculating part calculating a bias of a spectrum obtained by converting sound data of each frame into components on a frequency axis; a judging part judging whether the bias is greater than or equal to a given threshold or alternatively smaller than or equal to a given threshold; a counting part counting the number of consecutive frames judged as having a bias greater than or equal to the threshold or alternatively smaller than or equal to the threshold; a count judging part judging whether the obtained number of consecutive frames is greater than or equal to a given value.	07-29-2010
20100211384	Pitch detection method and apparatus - A pitch detection method and apparatus are disclosed. The method includes: performing pitch detection on an input signal in a signal domain, and obtaining a candidate pitch; performing linear prediction (LP) on the input signal, and obtaining an LP residual signal; setting a candidate pitch range that includes the candidate pitch; searching the candidate pitch range for the LP residual signal, and obtaining a selected pitch.	08-19-2010
20100228543	METHOD AND APPARATUS FOR EXTENDING THE BANDWIDTH OF A SPEECH SIGNAL - A bandwidth extension module, and an associated method and computer-readable medium, suitable for use in artificially extending the bandwidth of a lowband speech signal. The bandwidth extension module comprises a band-pass filter configured to produce a band-pass signal from the lowband speech signal; at least one carrier frequency modulator, each carrier frequency modulator configured to pitch-synchronously modulate the band-pass signal about a respective carrier frequency, the at least one carrier frequency modulator collectively producing a highband speech signal component; a synthesis filter configured to determine a highband speech signal based on the highband speech signal component; and a summation module configured to combine the lowband speech signal with the highband speech signal to obtain a bandwidth-extended speech signal.	09-09-2010
20100235166	APPARATUS AND METHOD FOR TRANSFORMING AUDIO CHARACTERISTICS OF AN AUDIO RECORDING - A method of audio processing comprises composing one or more transformation profiles for transforming audio characteristics of an audio recording and then generating for the or each transformation profile, a metadata set comprising transformation profile data and location data indicative of where in the recording the transformation profile data is to be applied; the or each metadata set is then stored in association with the corresponding recording. A corresponding method of audio reproduction comprises reading a recording and a meta-data set associated with that recording from storage, applying transformations to the recording data in accordance with the metadata set transformation profile; and then outputting the transformed recording.	09-16-2010
20100241424	Open-Loop Pitch Track Smoothing - There is provided a speech encoder for performing an algorithm that comprises obtaining (	09-23-2010
20100268530	Signal Pitch Period Estimation - A method and apparatus for estimating the pitch period of a signal. The method includes identifying a first candidate pitch period by performing a search only over a first range of potential pitch periods. The method further includes determining a second candidate pitch period by dividing the first candidate pitch period by an integer, wherein the second candidate pitch period is outside the first range of potential pitch periods. The method further includes selecting as the estimate of the pitch period of the signal the smaller of the candidate pitch periods that is such that portions of the signal separated by that candidate pitch period are well correlated.	10-21-2010
20100305944	Pitch Or Periodicity Estimation - A method of estimating a pitch period of a first portion of a signal wherein the first portion overlaps a previous portion. The method comprises computing a first autocorrelation value for part of the first portion not overlapping the previous portion. The method further comprises retrieving a stored second autocorrelation value for part of the first portion overlapping the previous portion, the second autocorrelation value having been computed during estimation of a pitch period of the previous portion. The method further comprises forming a combined autocorrelation value using the first and second autocorrelation values, and selecting the estimated pitch period in dependence on the combined autocorrelation value.	12-02-2010
20100318349	SYNTHESIS OF LOST BLOCKS OF A DIGITAL AUDIO SIGNAL, WITH PITCH PERIOD CORRECTION - The present invention relates to signal modification before pitch period repetition for the synthesis of blocks lost on decoding digital audio signals. The effects of repetition of transients, such as the plosives of a speech signal, are avoided by comparing the samples of a pitch period with those of the previous pitch period. The signal is modified preferentially by taking the minimum between a current sample (e(	12-16-2010
20100332221	ENCODING DEVICE, DECODING DEVICE, AND METHOD THEREOF - It is possible to improve quality of a decoding signal in a band spread for estimating a high band from a low band of a decoding signal. A first layer encoding unit (	12-30-2010
20110004467	VOCAL AND INSTRUMENTAL AUDIO EFFECTS - Systems, methods, and computer program products are provided for producing audio and/or visual effects according to a correlation between reference data and estimated note data derived from an input acoustic audio waveform. Some embodiments calculate a pitch score as a function of a pitch estimate derived from the input waveform, a reference pitch, and a real-time-adjustable pitch gating window. Other embodiments calculate the pitch score as a function of pitch and timing estimates derived from the input waveform, reference pitch and note timing data, an adjustable rhythm gating window, and an adjustable pitch gating window. The audio and/or visual effects are produced according to the pitch score, and may be used to generate outputs (e.g., in real time) for affecting a live performance, an audio mix, a video gaming environment, an educational feedback environment, etc.	01-06-2011
20110054886	EFFECT DEVICE - An effect device may be configured such that when an input audio signal switches from a consonant to a vowel and an input level of the switched vowel is greater than a threshold value Lc (and a variable t is greater than time Ts), an audio effect signal A may be generated. Such an effect device may allow for increasing the occurrences when portamento is simulated, while still sounding natural. In general, a detecting module detects whether an audio signal is a vowel sound or a consonant sound and whether the audio signal changed from a consonant sound to a vowel sound; and a pitch change module changes a pitch of the audio signal and changes, based on a prescribed function, an amount the pitch is changed to produce a modified audio signal, when the audio signal changed from a consonant sound to a vowel sound.	03-03-2011
20110066426	REAL-TIME SPEAKER-ADAPTIVE SPEECH RECOGNITION APPARATUS AND METHOD - A speech recognition apparatus and method for real-time speaker adaptation are provided. The speech recognition apparatus may estimate a pitch of a speech section from an inputted speech signal, extract a speech feature for speech recognition based on the estimated pitch, and perform speech recognition with respect to the speech signal based on the speech feature. The speech recognition apparatus may be adaptively normalized depending on a speaker. Thus, the speech recognition apparatus may extract a speech feature for speech recognition, and may improve a performance of speech recognition based on the extracted speech feature.	03-17-2011
20110087488	SPEECH SYNTHESIS APPARATUS AND METHOD - According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.	04-14-2011
20110087489	Method and Apparatus for Performing Packet Loss or Frame Erasure Concealment - The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.	04-14-2011
20110099005	FRAMING METHOD AND APPARATUS - A framing method and apparatus are disclosed to overcome inconsistency of gains between sub-frames caused by simple average framing in the prior art. The method includes: obtaining the Linear Prediction Coding (LPC) order and the pitch of the signal; removing the samples inapplicable to Long-Term Prediction (LTP) synthesis according to the LPC prediction order and the pitch; and splitting the remaining samples of the signal into several sub-frames. The technical solution under the present invention is applicable to the multimedia speech coding field.	04-28-2011
20110125491	Speech Intelligibility - The perceived quality of a speech signal is improved by estimating the average power of first and second signal components and applying a first gain factor to the second signal components to generate adjusted second signal components. The first gain factor is selected such that on application of the first gain factor to the second signal components, the ratio of the average power of the first signal components to the average power of the adjusted second signal components would be a first predetermined value, the first predetermined value being such as to inhibit perceptual distortion of the improved speech signal.	05-26-2011
20110125492	Speech Intelligibility - The perceived quality of a narrowband speech signal truncated from a wideband speech signal is improved by generating in a third frequency band third speech components matching first speech components in a first frequency band of the narrowband signal, and generating in a fourth frequency band fourth speech components matching second speech components in a second frequency band of the narrowband signal. A first gain factor is applied to the third speech components to generate adjusted third speech components, and a second gain factor is applied to the fourth speech components to generate adjusted fourth speech components, the gain factors being selected such that the ratios of the average powers of the adjusted third and fourth speech components to the average power of the first speech components are predetermined values.	05-26-2011
20110125493	VOICE QUALITY CONVERSION APPARATUS, PITCH CONVERSION APPARATUS, AND VOICE QUALITY CONVERSION METHOD - The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.	05-26-2011
20110144981	CONTINUOUS PITCH-CORRECTED VOCAL CAPTURE DEVICE COOPERATIVE WITH CONTENT SERVER FOR BACKING TRACK MIX - Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices as well as desktops, workstations, gaming stations, even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.	06-16-2011
20110144982	CONTINUOUS SCORE-CODED PITCH CORRECTION - Vocal musical performances may be captured and continuously pitch-corrected at a mobile device for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. In some cases, such pitch correction settings code a particular key or scale for the vocal performance or for portions thereof. In some cases, pitch correction settings include a score-coded melody sequence of note targets supplied with, or for association with, the lyrics and/or backing track. In some cases, pitch correction settings are dynamically variable based on gestures captured at a user interface.	06-16-2011
20110144983	WORLD STAGE FOR PITCH-CORRECTED VOCAL PERFORMANCES - Techniques have been developed to facilitate the capture of vocal performances on handheld or other portable computing devices and, in some cases, the pitch-correction and mixing of such vocal performances with backing tracks for audible rendering on such devices. Captivating visual animations and/or facilities for listener comment and ranking are provided in association with an audible rendering of a performance, e.g., a vocal performance captured and pitch-corrected at another similarly configured mobile device and mixed with backing instrumentals and/or vocals. Geocoding of captured vocal performances and/or listener feedback may facilitate animations or display artifacts in ways that are suggestive of a performance or endorsement emanating from a particular geographic locale on a user manipulable globe. In this way, implementations of the described functionality can transform otherwise mundane mobile devices into social instruments that foster a unique sense of global connectivity and community.	06-16-2011
20110184732	SIGNAL PRESENCE DETECTION USING BI-DIRECTIONAL COMMUNICATION DATA - A system and method for using bi-directional conversation data to improve signal presence detection are disclosed. The detector module is adapted to communicate with a signal enhancement module. The detector module collects data from a transmit direction of the connection and a receive direction of a data connection. The collected data from the transmit and the receive direction is used to classify at least one of data in the transmit direction and data in the receive direction. Responsive to the classification, the signal enhancement module enhances data in one of the transmit direction and the receive direction. Hence, data classification accuracy is improved by using data from both the transmit and receive directions. In one embodiment, the detector module applies a voice activity detection module (VAD) process to detect the presence or absence of voice data in the collected data.	07-28-2011
20110191102	SYSTEMS AND METHODS FOR SPEECH EXTRACTION - In some embodiments, a processor-readable medium stores code representing instructions to cause a processor to receive an input signal having a first component and a second component. An estimate of the first component of the input signal is calculated based on an estimate of a pitch of the first component of the input signal. An estimate of the input signal is calculated based on the estimate of the first component of the input signal and an estimate of the second component of the input signal. The estimate of the first component of the input signal is modified based on a scaling function to produce a reconstructed first component of the input signal. The scaling function is a function of at least one of the input signal, the estimate of the first component of the input signal, the estimate of the second component of the input signal, or a residual signal.	08-04-2011
20110196673	CONCEALING LOST PACKETS IN A SUB-BAND CODING DECODER - An electronic device for reconstructing a lost packet in a Sub-Band Coding (SBC) decoder is described. The electronic device includes a processor and instructions stored in memory. The electronic device detects a lost packet, obtains a zero-input response of a synthesis filter bank and obtains a coarse pitch estimate. The electronic device also obtains a fine pitch estimate based on the zero-input response and the coarse pitch estimate. The electronic device selects a last pitch period based on the fine pitch estimate and uses samples from the last pitch period for the lost packet.	08-11-2011
20110196674	SPECTRUM CODING APPARATUS, SPECTRUM DECODING APPARATUS, ACOUSTIC SIGNAL TRANSMISSION APPARATUS, ACOUSTIC SIGNAL RECEPTION APPARATUS AND METHODS THEREOF - A spectrum coding apparatus capable of performing coding at a low bit rate and with high quality is disclosed. This apparatus is provided with a section that performs the frequency transformation of a first signal and calculates a first spectrum, a section that converts the frequency of a second signal and calculates a second spectrum, a section that estimates the shape of the second spectrum in a band of FL≦k	08-11-2011
20110218800	METHOD AND APPARATUS FOR OBTAINING PITCH GAIN, AND CODER AND DECODER - The present invention relates to a method and apparatus for obtaining a pitch gain, and a coder and a decoder. The method includes: obtaining information about an input signal; and obtaining a pitch gain corresponding to the information about the input signal according to the correspondence between the signal information and the pitch gain. The embodiments of the present invention obtain the corresponding pitch gain according to the signal information by using the obtained correspondence between the signal information and the pitch gain, and the pitch gain is applicable to the coder and the decoder, thus making it unnecessary for the coder to transmit the pitch gain to the decoder and solving the problem of bit overhead. The embodiments of the present invention determine the pitch gain adaptively according to the signal information, avoid consumption of extra bits for quantizing the pitch gain, avoid impact on the coding performance, and improve the compression ratio.	09-08-2011
20110224977	ROBOT, METHOD AND PROGRAM OF CONTROLLING ROBOT - A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.	09-15-2011
20110246188	NONVOLATILE STORAGE SYSTEM AND MUSIC SOUND GENERATION SYSTEM - A music sound generation system is formed with a high sound quality and with a small size using a large-capacity NAND flash memory for storing music sound data. Music sound data is divided into N pitch groups and stored into N different storage modules as being divided in these storage modules. A sound generation command classification unit (	10-06-2011
20110251840	PITCH-CORRECTION OF VOCAL PERFORMANCE IN ACCORD WITH SCORE-CODED HARMONIES - Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at a portable computing device (such as a mobile phone, personal digital assistant, laptop computer, notebook computer, pad-type computer or netbook) in accord with pitch correction settings. In some cases, pitch correction settings include a score-coded melody and/or harmonies supplied with, or for association with, the lyrics and backing tracks. Harmony notes or chords may be coded as explicit targets or relative to the score-coded melody or even actual pitches sounded by a vocalist, if desired.	10-13-2011
20110251841	COORDINATING AND MIXING VOCALS CAPTURED FROM GEOGRAPHICALLY DISTRIBUTED PERFORMERS - Despite many practical limitations imposed by mobile device platforms and application execution environments, vocal musical performances may be captured and continuously pitch-corrected for mixing and rendering with backing tracks in ways that create compelling user experiences. Based on the techniques described herein, even mere amateurs are encouraged to share with friends and family or to collaborate and contribute vocal performances as part of virtual “glee clubs.” In some implementations, these interactions are facilitated through social network- and/or eMail-mediated sharing of performances and invitations to join in a group performance. Using uploaded vocals captured at clients such as a mobile device, a content server (or service) can mediate such virtual glee clubs by manipulating and mixing the uploaded vocal performances of multiple contributing vocalists.	10-13-2011
20110251842	COMPUTATIONAL TECHNIQUES FOR CONTINUOUS PITCH CORRECTION AND HARMONY GENERATION - Using signal processing techniques described herein, pitch detection and correction of a user's vocal performance can be performed continuously and in real-time with respect to the audible rendering of the backing track at the handheld or portable computing device. In some implementations, pitch detection builds on time-domain pitch correction techniques that employ average magnitude difference function (AMDF) or autocorrelation-based techniques together with zero-crossing and/or peak picking techniques to identify differences between pitch of a captured vocal signal and score-coded target pitches. Based on detected differences, pitch correction based on pitch synchronous overlap-add (PSOLA) and/or linear predictive coding (LPC) techniques allows captured vocals to be pitch shifted in real-time to “correct” notes in accord with pitch correction settings that code score-coded melody targets and harmonies.	10-13-2011
20110276323	SPEECH-BASED SPEAKER RECOGNITION SYSTEMS AND METHODS - The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.	11-10-2011
20110276324	Adaptive Filter Pitch Extraction - An enhancement system extracts pitch from a processed speech signal. The system estimates the pitch of voiced speech by deriving filter coefficients of an adaptive filter and using the obtained filter coefficients to derive pitch. The pitch estimation may be enhanced by using various techniques to condition the input speech signal, such as spectral modification of the background noise and the speech signal, and/or reduction of the tonal noise from the speech signal.	11-10-2011
20110313759	Method for changing the caller voice during conversation in voice communication device - The invention relates to a cellular phone terminal system and in particular to a method for changing the caller's voice in a speech signal during conversation. The cellular phone terminal system has a filter for filtering a signal. The method comprises the steps of: waiting for a caller voice selector key input for a desired caller voice when a caller voice converter key is pressed during conversation; and setting even or odd harmonic deletion bins in the frequency domain of the uncompressed speech signal, corresponding to the caller voice selector key input, to change the caller voice.	12-22-2011
20120004907	SYSTEM AND METHOD FOR BIOMETRIC ACOUSTIC NOISE REDUCTION - Embodiments of the invention provide a communication device and methods for generating enhanced audio signals. An audio signal comprising a speech signal and a noise signal is acquired at the communication device. A noise processor of the communication device detects a pitch estimation of the audio signal. Thereafter, the audio signal is processed based on the pitch estimation and processing parameters of the audio signal to remove noise signals and generate an enhanced audio signal.	01-05-2012
20120004908	VOICE RECOGNITION TERMINAL - A voice recognition terminal executes a local voice recognition process and utilizes an external center voice recognition process. The terminal includes: a voice message synthesizing element for synthesizing at least one of a voice message to be output from a speaker according to the external center voice recognition process and a voice message to be output from the speaker according to the local voice recognition process so as to distinguish between characteristics of the voice message to be output from the speaker according to the external center voice recognition process and characteristics of the voice message to be output from the speaker according to the local voice recognition process; and a voice output element for outputting a synthesized voice message from the speaker.	01-05-2012
20120072208	DETERMINING PITCH CYCLE ENERGY AND SCALING AN EXCITATION SIGNAL - An electronic device for determining a set of pitch cycle energy parameters is described. The electronic device includes a processor and executable instructions stored in memory. The electronic device obtains a frame, a set of filter coefficients and a residual signal based on the frame and the set of filter coefficients. The electronic device determines a set of peak locations based on the residual signal and segments the residual signal such that each segment includes one peak. The electronic device determines a first set of pitch cycle energy parameters based on a frame region between two consecutive peak locations and maps regions between peaks in the residual signal to regions between peaks in a synthesized excitation signal to produce a mapping. The electronic device determines a second set of pitch cycle energy parameters based on the first set of pitch cycle energy parameters and the mapping.	03-22-2012
20120072209	ESTIMATING A PITCH LAG - An electronic device for estimating a pitch lag is described. The electronic device includes a processor and executable instructions stored in memory that is in electronic communication with the processor. The electronic device obtains a current frame. The electronic device also obtains a residual signal based on the current frame. The electronic device additionally determines a set of peak locations based on the residual signal. Furthermore, the electronic device obtains a set of pitch lag candidates based on the set of peak locations. The electronic device also estimates a pitch lag based on the set of pitch lag candidates.	03-22-2012
20120089390	PITCH CORRECTED VOCAL CAPTURE FOR TELEPHONY TARGETS - Vocal musical performances may be captured and pitch corrected and supplied to telephony targets such as conventional voice terminal equipment (telephone handsets, answering machines, etc.), wireless telephony devices and information services wherein particular device or subscriber targets are identifiable using telephone numbers or alphanumeric IDs (e.g., mobile phones with or without text/multimedia messaging support, VoIP terminals, answering or voicemail services, ASP-based telephony services, etc.) and/or telco or premises-based telephony equipment, such as switches, with support for customizable ringback tones. To facilitate the foregoing, techniques have been developed for capture and audible rendering of vocal performances on handheld or other portable devices using signal processing techniques suitable given the somewhat limited capabilities of such devices and in ways that facilitate efficient encoding and communication of such captured performances via ubiquitous, though bandwidth limited, wireless networks and through communication channels typical of the wired and wireless telephony networks.	04-12-2012
20120089391	ESTIMATION OF SPEECH MODEL PARAMETERS - Methods for estimating speech model parameters are disclosed. For pulsed parameter estimation, a speech signal is divided into multiple frequency bands or channels using bandpass filters. Channel processing reduces sensitivity to pole magnitudes and frequencies and reduces impulse response time duration to improve pulse location and strength estimation performance. These methods are useful for high quality speech coding and reproduction at various bit rates for applications such as satellite and cellular voice communication.	04-12-2012
20120101814	Artifact Reduction in Packet Loss Concealment - Various techniques are disclosed for improving packet loss concealment to reduce artifacts by using audio character measures of the audio signal. These techniques include attenuation to a noise fill instead of attenuation to silence, varying how long to wait before attenuating the extrapolation, varying the rate of attenuation of the extrapolation, attenuating periodic extrapolation at a different rate than non-periodic extrapolation, performing periodic extrapolation on successively longer fill data based on the audio character measures, adjusting weighting between periodic and non-periodic extrapolation based on the audio character measures, and adjusting weighting between periodic extrapolation and non-periodic extrapolation non-linearly.	04-26-2012
20120101815	QUERY BY HUMMING FOR RINGTONE SEARCH AND DOWNLOAD - Described is a technology by which a user hums, sings or otherwise plays a user-provided rendition of a ringtone (or ringback tone) through a mobile telephone to a ringtone search service (e.g., a WAP, interactive voice response or SMS-based search platform). The service matches features of the user's rendition against features of actual ringtones to determine one or more matching candidate ringtones for downloading. Features may include pitch contours (up or down), pitch intervals and durations of notes. Matching candidates may be ranked based on the determined similarity, possibly in conjunction with weighting criterion such as the popularity of the ringtone and/or the importance of the matched part. The candidate set may be augmented with other ringtones independent of the matching, such as the most popular ones downloaded by other users, ringtones from similar artists, and so forth.	04-26-2012
20120109645	DSP-BASED DEVICE FOR AUDITORY SEGREGATION OF MULTIPLE SOUND INPUTS - There is provided a unique signal processing technique for localizing and characterizing each of a number of differently located acoustic sources. Specifically there is provided a method for auditory segregation of multiple voice inputs comprising the steps of: receiving a plurality of voice input signals from different source locations; filtering said voice input signals with head related transfer functions (HRTF) using a digital signal processor (DSP) thereby assigning the voice input signals to different locations in virtual auditory space; and changing the HRTF filtered voice input signals in two dimensions, wherein pitch is changed and the signal is filtered with different filters emulating vocal tracts of different sizes thereby further segregating the voice input signals from each other.	05-03-2012
20120116756	METHOD FOR TONE/INTONATION RECOGNITION USING AUDITORY ATTENTION CUES - In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.	05-10-2012
20120136655	SPEECH PROCESSING APPARATUS AND SPEECH PROCESSING METHOD - A signal portion is extracted per frame having a specific duration from an input signal, thus generating a per-frame input signal. The per-frame input signal in the time domain is converted into a per-frame input signal in the frequency domain, thereby generating a spectral pattern of spectra. Peak spectra having peaks are detected in the spectral pattern. A harmonic spectrum is determined, in the peak spectra, having a harmonic structure showing a relationship between a fundamental pitch and a harmonic overtone.	05-31-2012
20120143600	Speech Synthesis Information Editing Apparatus - In a speech synthesis information editing apparatus, a phoneme storage unit stores phoneme information that designates a duration of each phoneme of speech to be synthesized. A feature storage unit stores feature information that designates a time variation in a feature of the speech. An edition processing unit changes a duration of each phoneme designated by the phoneme information with an expansion/compression degree depending on a feature designated by the feature information in correspondence to the phoneme.	06-07-2012
20120143601	Method and System for Determining a Perceived Quality of an Audio System - The invention relates to a method for determining a quality indicator representing a perceived quality of an output signal of an audio system with respect to a reference signal. The reference signal and the output signal are processed and compared. The processing includes dividing the reference signal and the output signal into mutually corresponding time frames. Additionally, the processing includes scaling the intensity of the reference signal towards a fixed intensity level, and then performing measurements on time frames within the scaled reference signal for determining reference signal time frame characteristics. The intensity of the reference signal is then scaled from the fixed intensity level towards an intensity level related to the output signal. Further on in the method, the loudness of the output signal is scaled towards a fixed loudness level in the perceptual loudness domain. This scaling action uses the reference signal time frame characteristics. Finally, the loudness of the reference signal is scaled from a loudness level corresponding to the output signal related intensity level towards a loudness level related to the loudness level of the scaled output signal in the perceptual loudness domain. This scaling action also uses the reference signal time frame characteristics.	06-07-2012
20120166187	SYSTEM AND METHOD FOR AUDIO SYNTHESIZER UTILIZING FREQUENCY APERTURE ARRAYS - A system and method for audio synthesizer utilizing frequency aperture cells (FAC) and frequency aperture arrays (FAA). In accordance with an embodiment, an audio processing system can be provided for the transformation of audio-band frequencies for musical and other purposes. In accordance with an embodiment, a single stream of mono, stereo, or multi-channel monophonic audio can be transformed into polyphonic music, based on a desired target musical note or set of multiple notes. At its core, the system utilizes an input waveform(s) (which can be either file-based or streamed) which is then fed into an array of filters, which are themselves optionally modulated, to generate a new synthesized audio output.	06-28-2012
20120185244	SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT - According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform.	07-19-2012
20120209598	STATE DETECTING DEVICE AND STORAGE MEDIUM STORING A STATE DETECTING PROGRAM - A state detecting device includes an input unit that receives an input voice sound; an analyzer that calculates a feature parameter of each of a plurality of frames extracted from the voice sound; a calculator that calculates the average of the feature parameters of the frames, determines a threshold on the basis of the average and statistical data representing relationships between other averages of other feature parameters obtained from a plurality of speakers and cumulative frequencies of the other feature parameters, and calculates an appearance frequency of a frame that is among the plurality of frames and whose feature parameter is larger than the threshold; a determining unit that determines, on the basis of the appearance frequency, a strained state of a vocal cord that has made the voice sound; and an output unit that outputs a result of the determination.	08-16-2012
20120303361	Method and Apparatus for Sculpting Synthesized Speech - Methods and systems for sculpting synthesized speech using a graphic user interface are disclosed. An operator enters a stream of text that is used to produce a stream of target phonetic-units. The stream of target phonetic-units is then submitted to a unit-selection process to produce a stream of selected phonetic-units, each selected phonetic-unit derived from a database of sample phonetic-units. After the stream of sample phonetic-units is selected, an operator can remove various selected phonetic-units from the stream of selected phonetic-units, prune the sample phonetic-database and edit various cost functions using the graphic user interface. The edited speech information can then be submitted to the unit-selection process to produce a second stream of selected phonetic-units.	11-29-2012
20130041656	SYSTEM AND METHOD FOR TRACKING SOUND PITCH ACROSS AN AUDIO SIGNAL - A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and an estimated fractional chirp rate of the harmonics at the estimated pitch. The estimated pitch and the estimated fractional chirp rate may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.	02-14-2013
20130041657	SYSTEM AND METHOD FOR TRACKING SOUND PITCH ACROSS AN AUDIO SIGNAL USING HARMONIC ENVELOPE - A system and method may be configured to analyze audio information derived from an audio signal. The system and method may track sound pitch across the audio signal. The tracking of pitch across the audio signal may take into account change in pitch by determining at individual time sample windows in the signal duration an estimated pitch and a representation of harmonic envelope at the estimated pitch. The estimated pitch and the representation of harmonic envelope may then be implemented to determine an estimated pitch for another time sample window in the signal duration with an enhanced accuracy and/or precision.	02-14-2013
20130046533	IDENTIFYING FEATURES IN A PORTION OF A SIGNAL REPRESENTING SPEECH - Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a region of the signal representing speech. The region can comprise a portion of a frame of the signal representing speech classified as a voiced frame. The region can be marked based on one or more pitch estimates for the region. A cord can be identified within the region based on occurrence of one or more events within the region of the signal. For example, the one or more events can comprise one or more glottal pulses. In such cases, cord can begin with onset of a first glottal pulse and extend to a point prior to an onset of a second glottal pulse. The cord may exclude a portion of the region of the signal prior to the onset of the second glottal pulse.	02-21-2013
20130096912	SELECTIVE BASS POST FILTER - In one aspect, the invention provides an audio encoding method characterized by a decision being made as to whether the device which will decode the resulting bit stream should apply post filtering including attenuation of interharmonic noise. Hence, the decision whether to use the post filter, which is encoded in the bit stream, is taken separately from the decision as to the most suitable coding mode. In another aspect, there is provided an audio decoding method with a decoding step followed by a post-filtering step, including interharmonic noise attenuation, and being characterized in a step of disabling the post filter in accordance with post filtering information encoded in the bit stream signal. Such a method is well suited for mixed-origin audio signals by virtue of its capability to deactivate the post filter in dependence of the post filtering information only, hence independently of factors such as the current coding mode.	04-18-2013
20130117014	MULTIPLE MICROPHONE BASED LOW COMPLEXITY PITCH DETECTOR - Disclosed are various embodiments of multiple microphone based pitch detection. In one embodiment, a method includes obtaining a primary signal and a secondary signal associated with multiple microphones. A pitch value is determined based at least in part upon a level difference between the primary and secondary signals. In another embodiment, a system includes a plurality of microphones configured to provide a primary signal and a secondary signal. A level difference detector is configured to determine a level difference between the primary and secondary signals and a pitch identifier is configured to clip the primary and secondary signals based at least in part upon the level difference. In another embodiment, a method determines the presence of voice activity based upon a pitch prediction gain variation that is determined based at least in part upon a pitch lag.	05-09-2013
20130117015	AUDIO SIGNAL DECODER, AUDIO SIGNAL ENCODER, METHOD FOR DECODING AN AUDIO SIGNAL, METHOD FOR ENCODING AN AUDIO SIGNAL AND COMPUTER PROGRAM USING A PITCH-DEPENDENT ADAPTATION OF A CODING CONTEXT - An audio signal decoder includes a context-based spectral value decoder configured to decode a codeword describing one or more spectral values or at least a portion of a number representation thereof in dependence on a context state. The audio signal decoder also includes a context state determinator configured to determine a current context state in dependence on one or more previously decoded spectral values and a time warping frequency-domain-to-time-domain converter configured to provide a time-warped time-domain representation of a given audio frame on the basis of a set of decoded spectral values provided by the context-based spectral value decoder and in dependence on the time warp information. The context-state determinator is configured to adapt the determination of the context state to a change of a fundamental frequency between subsequent audio frames. An audio signal encoder applies a comparable concept.	05-09-2013
20130132075	METHODS AND ARRANGEMENTS IN A TELECOMMUNICATIONS NETWORK - The present invention relates to a postfilter and a postfilter control to be associated with a postfilter for improving perceived quality of speech reconstructed at a speech decoder. The postfilter control comprises means for measuring stationarity of a speech signal reconstructed at a decoder, means for determining a coefficient to a postfilter control parameter based on the measured stationarity, and means for transmitting the determined coefficient to a postfilter, such that the postfilter can process the reconstructed speech signal by applying the determined coefficient to the postfilter control parameter to obtain an enhanced speech signal.	05-23-2013
20130144611	CODING DEVICE, DECODING DEVICE, CODING METHOD, AND DECODING METHOD - A coding device includes: a pitch contour detection unit which detects a pitch contour of an input audio signal; a dynamic time warping unit which determines the number of pitch nodes based on the pitch contour and generates a first time warping parameter including information indicating the determined number of pitch nodes, a pitch change position, and a pitch change ratio; a first encoder which codes the first time warping parameter; a time warping unit which corrects pitch, using the information obtained from the first time warping parameter, to approximate the pitches of the number of pitch nodes to a predetermined reference value; a second encoder which codes the input audio signal at the corrected pitch; and a multiplexer which multiplexes the coded time warping parameter and the coded audio signal to generate a bitstream.	06-06-2013
20130151245	Method for Determining Fundamental-Frequency Courses of a Plurality of Signal Sources - The invention relates to a method for establishing fundamental frequency curves of a plurality of signal sources from a single-channel audio recording of a mix signal, said method including the following steps:	06-13-2013
20130166287	Adaptively Encoding Pitch Lag For Voiced Speech - System and method embodiments for dual modes pitch coding are provided. The system and method embodiments are configured to adaptively code pitch lags of a voiced speech signal using one of two pitch coding modes according to a pitch length, stability, or both. The two pitch coding modes include a first pitch coding mode with relatively high precision and reduced dynamic range, and a second pitch coding mode with relatively large dynamic range and reduced precision. The first pitch coding mode is used upon determining that the voiced speech signal has a relatively short or substantially stable pitch. The second pitch coding mode is used upon determining that the voiced speech signal has a relatively long or less stable pitch or is a substantially noisy signal.	06-27-2013
20130166288	Very Short Pitch Detection and Coding - System and method embodiments are provided for very short pitch detection and coding for speech or audio signals. The system and method include detecting whether there is a very short pitch lag in a speech or audio signal that is shorter than a conventional minimum pitch limitation using a combination of time domain and frequency domain pitch detection techniques. The pitch detection techniques include using pitch correlations in time domain and detecting a lack of low frequency energy in the speech or audio signal in frequency domain. The detected very short pitch lag is coded using a pitch range from a predetermined minimum very short pitch limitation that is smaller than the conventional minimum pitch limitation.	06-27-2013
20130211827	SAMPLE RATE CONVERTER WITH AUTOMATIC ANTI-ALIASING FILTER - The subject disclosure is directed towards dynamically computing anti-aliasing filter coefficients for sample rate conversion in digital audio. In one aspect, for each input-to-output sampling rate ratio (pitch) obtained, anti-aliasing filter coefficients are interpolated based upon the pitch (e.g., using the fractional part of the ratio) from two filters (coefficient sets) selected based upon the pitch (e.g., using the integer part of the ratio). The interpolation provides for fine-grained cutoff frequencies, and by re-computation for each pitch, smooth anti-aliasing with dynamically changing ratios.	08-15-2013
20130226567	System for Concealing Missing Audio Waveforms - In one embodiment, a method can include: (i) establishing an internet protocol (IP) connection; (ii) forming a buffered version of a plurality of voice frame slices from received audio packets; and (iii) when an erasure is detected, performing a packet loss concealment (PLC) to provide a synthesized speech signal for the erasure, where the PLC can include: (a) identifying first and second pitches from the buffered version of the plurality of voice frame slices; and (b) forming the synthesized speech signal by using the first and second pitches, and more if needed, followed by an overlay-add (OLA).	08-29-2013
20130226568	AUDIO SIGNALS BY ESTIMATIONS AND USE OF HUMAN VOICE ATTRIBUTES - Disclosed embodiments include means and methods of enhancing the quality of an audio signal by estimations and manipulations of voice attributes. Disclosed embodiments include means and methods of estimating a pitch period of an input audio signal, converting the audio signal into the frequency domain using a FFT, decreasing the pitch estimation value based on a fundamental frequency of the signal based on a first predefined condition and increasing the fundamental frequency of the audio signal based upon a second predefined condition.	08-29-2013
20130231924	Formant Based Speech Reconstruction from Noisy Signals - Implementations of systems, methods and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.	09-05-2013
20130231925	Monaural Noise Suppression Based on Computational Auditory Scene Analysis - The present technology provides a robust noise suppression system that may concurrently reduce noise and echo components in an acoustic signal while limiting the level of speech distortion. A time-domain acoustic signal may be received and be transformed to frequency-domain sub-band signals. Features, such as pitch, may be identified and tracked within the sub-band signals. Initial speech and noise models may then be estimated at least in part from a probability analysis based on the tracked pitch sources. Speech and noise models may be resolved from the initial speech and noise models and noise reduction may be performed on the sub-band signals. An acoustic signal may be reconstructed from the noise-reduced sub-band signals.	09-05-2013
20130231926	METHOD AND DEVICE FOR ESTIMATING A PATTERN IN A SIGNAL - The present invention relates to a method for estimating a pattern, in particular a pitch and/or a fundamental frequency, in a signal having a periodic, quasiperiodic or virtually periodic component, wherein the signal is transformed from a time-domain to a frequency-domain to obtain a spectrum of the signal, the spectrum is processed to obtain a zero-phase spectrum of the signal, the spectrum of the signal is transformed to the time-domain to obtain a correlation signal, the spectrum and the correlation signals are combined to a combined spectrum, and the pattern is estimated on the basis of the combined spectrum.	09-05-2013
20130304459	METHODS AND APPARATUS FOR PROCESSING AUDIO SIGNALS - A method for processing an audio signal (i(t)), comprises: receiving a first set (x(t)) of time-varying signals representing a first sound comprised in the audio signal (i(t)), the first set (x(t)) of time-varying signals comprising an amplitude modulation signal (a(t)), a carrier frequency signal (f	11-14-2013
20130304460	Method for Encoding Signal, and Method for Decoding Signal - The present disclosure relates to a method, apparatus, and system for encoding and decoding signals. The encoding method includes: converting a first-domain signal into a second-domain signal; performing Linear Prediction (LP) processing and Long-Term Prediction (LTP) processing for the second-domain signal; obtaining a long-term flag value according to a decision criterion; obtaining a second-domain predictive signal according to the LP processing result and the LTP processing result when the long-term flag value is a first value; obtaining a second-domain predictive signal according to the LP processing result when the long-term flag value is a second value; converting the second-domain predictive signal into a first-domain predictive signal, and calculating a first-domain predictive residual signal; and outputting a bit stream that includes the first-domain predictive residual signal.	11-14-2013
20130311173	Method for exemplary voice morphing - A method of morphing speech from an original speaker into the speech of a second, target speaker without decomposing either speech into source and filter, and without the need to determine the formant positions by warping spectral envelopes.	11-21-2013
20130325455	VOCAL SOURCE EXTRACTION BY MAXIMUM PHASE DETECTION - Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.	12-05-2013
20130332151	APPARATUS AND METHOD FOR PROCESSING A DECODED AUDIO SIGNAL IN A SPECTRAL DOMAIN - An apparatus for processing a decoded audio signal including a filter for filtering the decoded audio signal to obtain a filtered audio signal, a time-spectral converter stage for converting the decoded audio signal and the filtered audio signal into corresponding spectral representations, each spectral representation having a plurality of subband signals, a weighter for performing a frequency selective weighting of the filtered audio signal by multiplying subband signals by respective weighting coefficients to obtain a weighted filtered audio signal, a subtractor for performing a subband-wise subtraction between the weighted filtered audio signal and the spectral representation of the decoded audio signal, and a spectral-time converter for converting the result audio signal or a signal derived from the result audio signal into a time domain representation to obtain a processed decoded audio signal.	12-12-2013
20130339011	SYSTEMS, METHODS, APPARATUS, AND COMPUTER-READABLE MEDIA FOR PITCH TRAJECTORY ANALYSIS - Systems, methods, and apparatus for pitch trajectory analysis are described. Such techniques may be used to remove vocals and/or vibrato from an audio mixture signal. For example, such a technique may be used to pre-process the signal before an operation to decompose the mixture signal into individual instrument components.	12-19-2013
20140039883	SOCIAL MUSIC SYSTEM AND METHOD WITH CONTINUOUS, REAL-TIME PITCH CORRECTION OF VOCAL PERFORMANCE AND DRY VOCAL CAPTURE FOR SUBSEQUENT RE-RENDERING BASED ON SELECTIVELY APPLICABLE VOCAL EFFECT(S) SCHEDULE(S) - Vocal musical performances may be captured and, in some cases or embodiments, pitch-corrected and/or processed in accord with a user selectable vocal effects schedule for mixing and rendering with backing tracks in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured on mobile devices in the context of a karaoke-style presentation of lyrics in correspondence with audible renderings of a backing track. Such performances can be pitch-corrected in real-time at the mobile device in accord with pitch correction settings. Vocal effects schedules may also be selectively applied to such performances. In these ways, even amateur user/performers with imperfect pitch are encouraged to take a shot at “stardom” and/or take part in a game play, social network or vocal achievement application architecture that facilitates musical collaboration on a global scale and/or, in some cases or embodiments, to initiate revenue generating in-application transactions.	02-06-2014
20140136191	SPEECH SIGNAL PROCESSING APPARATUS AND METHOD - A speech signal processing apparatus includes an amplitude and phase signal generation section that, based on an analyzing signal expressed by a complex signal generated from a speech signal applied with pitch marks every 1 pitch cycle, generates an amplitude signal and a phase signal on the time axis of the speech signal, a phase signal conversion section that converts the phase signal into a phase signal of a target pitch cycle width for each section of the 1 pitch cycle width based on the pitch marks, and a pitch conversion speech signal generation section that generates a speech signal in which pitch cycle is converted to the target pitch cycle based on an amplitude signal of the target pitch cycle width of a section corresponding to the section of the amplitude signal and based on a phase signal of the target pitch cycle width.	05-15-2014
20140136192	WAVEFORM PROCESSING DEVICE, WAVEFORM PROCESSING METHOD, AND WAVEFORM PROCESSING PROGRAM - There is provided a waveform processing device for changing power of each pitch waveform of a segment in order to acquire a natural synthesis speech. A power calculation means	05-15-2014
20140142931	PITCH DETECTION METHOD AND APPARATUS - The present invention discloses a pitch detection method and apparatus, which belong to the field of speech and audio. The pitch detection method includes: performing pitch detection on a speech signal in a time domain to obtain an initial pitch period; converting the speech signal to a frequency domain to obtain a frequency spectrum of the speech signal, where the frequency spectrum includes a magnitude spectrum of the frequency spectrum; extracting a feature parameter according to the initial pitch period and the frequency spectrum of the speech signal; and performing fine pitch period detection according to the initial pitch period and the feature parameter to obtain a fine pitch period.	05-22-2014
20140156267	Packet Loss Concealment for Speech Coding - A speech coding method of reducing error propagation due to voice packet loss is achieved by limiting or reducing a pitch gain only for the first subframe or the first two subframes within a speech frame. The method is used for a voiced speech class. A pitch cycle length is compared to a subframe size to decide to reduce the pitch gain for the first subframe or the first two subframes within the frame. A strongly voiced class is decided by checking if the pitch lags are stable and the pitch gains are high enough within the frame; for the strongly voiced frame, the pitch lags and the pitch gains can be encoded more efficiently than other speech classes.	06-05-2014
20140180682	NOISE DETECTION DEVICE, NOISE DETECTION METHOD, AND PROGRAM - There is provided a noise detection device including an amplitude feature quantity calculator, a frequency feature quantity calculator, a feature variation calculator, an interval specification unit, a feature quantity set generation unit, and a noise determination unit.	06-26-2014
20140214412	APPARATUS AND METHOD FOR PROCESSING VOICE SIGNAL - A voice signal processing method processes voice signals acquired by a microphone. A voice processing device acquires first voice signals according to a first sampling frequency, and samples second voice signals from the first voice signals according to a second sampling frequency. The second voice signals are encoded to obtain a basic voice package. A voiceprint data package of each voice signal frame of the first voice signals is obtained using a curve fitting method, and a pitch data package of each voice signal frame of the first voice signals is obtained according to pitch distribution of twelve central octave keys of a standard piano. The voiceprint data package and the pitch data package are embedded into the basic audio package to generate a final voice package of the first voice signals.	07-31-2014
20140236584	SYSTEMS AND METHODS FOR QUANTIZING AND DEQUANTIZING PHASE INFORMATION - A method for quantizing phase information on an electronic device is described. The method includes obtaining a speech signal. The method also includes determining a prototype pitch period signal based on the speech signal and transforming the prototype pitch period signal into a first frequency-domain signal. The method additionally includes mapping the first frequency-domain signal into a plurality of subbands. The method also includes determining a global alignment based on the first frequency-domain signal and quantizing the global alignment utilizing scalar quantization to obtain a quantized global alignment. The method additionally includes determining a plurality of band alignments corresponding to the plurality of subbands. The method also includes quantizing the plurality of band alignments utilizing vector quantization to obtain a quantized plurality of band alignments. The method further includes transmitting the quantized global alignment and the quantized plurality of band alignments.	08-21-2014
20140236585	SYSTEMS AND METHODS FOR DETERMINING PITCH PULSE PERIOD SIGNAL BOUNDARIES - A method for determining pitch pulse period signal boundaries by an electronic device is described. The method includes obtaining a signal. The method also includes determining a first averaged curve based on the signal. The method further includes determining at least one first averaged curve peak position based on the first averaged curve and a threshold. The method additionally includes determining pitch pulse period signal boundaries based on the at least one first averaged curve peak position. The method also includes synthesizing a speech signal.	08-21-2014
20140249807	DEVICE AND METHOD FOR REDUCING QUANTIZATION NOISE IN A TIME-DOMAIN DECODER - The present disclosure relates to a device and method for reducing quantization noise in a signal contained in a time-domain excitation decoded by a time-domain decoder. The decoded time-domain excitation is converted into a frequency-domain excitation. A weighting mask is produced for retrieving spectral information lost in the quantization noise. The frequency-domain excitation is modified to increase spectral dynamics by application of the weighting mask. The modified frequency-domain excitation is converted into a modified time-domain excitation. The method and device can be used for improving music content rendering of linear-prediction (LP) based codecs. Optionally, a synthesis of the decoded time-domain excitation may be classified into one of a first set of excitation categories and a second set of excitation categories, the second set including INACTIVE or UNVOICED categories, the first set including an OTHER category.	09-04-2014
20140249808	Methods and Arrangements in a Telecommunications Network - The present invention relates to a postfilter and a postfilter control to be associated with a postfilter for improving perceived quality of speech reconstructed at a speech decoder. The postfilter control comprises means for measuring stationarity of a speech signal reconstructed at a decoder, means for determining a coefficient to a postfilter control parameter based on the measured stationarity, and means for transmitting the determined coefficient to a postfilter, such that the postfilter can process the reconstructed speech signal by applying the determined coefficient to the postfilter control parameter to obtain an enhanced speech signal.	09-04-2014
20140297271	SPEECH SIGNAL ENCODING/DECODING METHOD AND APPARATUS - The present invention relates to a speech signal encoding method for encoding an inputted first speech signal into a second speech signal having a narrower available bandwidth than the first speech signal. The method comprises generating a pitch-scaled version of higher frequencies of the first speech signal and including in the second speech signal lower frequencies of the first speech signal and the pitch-scaled version of the higher frequencies. At least a part of the higher frequencies are frequencies that are outside the available bandwidth of the second speech signal. The pitch-scaled version of the higher frequencies is preferably included in the second speech signal with a gain factor having a value of 1 or a value higher than 1. The present invention further relates to a corresponding speech signal decoding method for decoding an inputted first speech signal into a second speech signal having a wider available bandwidth than the first speech signal.	10-02-2014
20140303968	DYNAMIC CONTROL OF VOICE CODEC DATA RATE - A method, system, and computer-usable non-transitory storage device for dynamic voice codec adaptation are disclosed. The voice codec adapts in real time to devote more bits to audio quality when it is most needed, and fewer bits to less important parts of utterances. Dialog knowledge is utilized for compression opportunities to adjust the bitrate moment-by-moment, based on the inferred value of each frame. Frame importance and appropriate transmission fidelity are predicted based on prosodic features and models of dialog dynamics. This technique provides the same communications quality with less spectrum needs, fewer antennas, and less battery drain.	10-09-2014
20140337019	MUSIC SIGNAL PROCESSING APPARATUS AND METHOD, AND PROGRAM - A music signal processing apparatus includes a frequency spectrum transform unit, a filter, a frequency feature amount generation unit, and a melody feature amount sequence acquisition unit. The frequency spectrum transform unit is configured to transform a music signal into a frequency spectrum, the music signal being a signal of a musical piece containing a part with a melody. The filter is configured to remove a steep peak of the frequency spectrum. The frequency feature amount generation unit is configured to generate, from a signal output from the filter, a frequency feature amount in which a fundamental frequency component of the part is emphasized. The melody feature amount sequence acquisition unit is configured to acquire, based on the frequency feature amount, a melody feature amount sequence that specifies a fundamental frequency of the part at each time.	11-13-2014
20140343933	SYSTEM AND METHOD FOR CALCULATING SIMILARITY OF AUDIO FILE - A method for calculating a similarity of audio files includes constituting a pitch sequence of a first audio file and a pitch sequence of a second audio file; calculating an eigenvector of the first audio file according to the pitch sequence of the first audio file, and calculating an eigenvector of the second audio file according to the pitch sequence of the second audio file; calculating a similarity between the first audio file and the second audio file according to the eigenvector of the first audio file and the eigenvector of the second audio file.	11-20-2014
20140358530	METHOD OF PROCESSING A VOICE SEGMENT AND HEARING AID - A method of processing a voice segment includes checking whether a voice segment is a vowel segment. If the voice segment is not a vowel segment, then the process checks whether the voice segment is a high frequency consonant or a low frequency consonant. If the voice segment is a high frequency consonant, then the voice segment will be processed to lower its frequency.	12-04-2014
20150051905	Adaptive High-Pass Post-Filter - In accordance with an embodiment of the present invention, a method of speech processing includes receiving a coded audio signal having coding noise. The method further includes generating a decoded audio signal from the coded audio signal, and determining a pitch corresponding to the fundamental frequency of the audio signal. The method also includes determining the minimum allowable pitch and determining if the pitch of the audio signal is less than the minimum allowable pitch. If the pitch of the audio signal is less than the minimum allowable pitch, applying an adaptive high pass filter on the decoded audio signal to lower the coding noise at frequencies below the fundamental frequency.	02-19-2015
20150057998	METHODS AND SYSTEMS FOR ENHANCING PITCH ASSOCIATED WITH AN AUDIO SIGNAL PRESENTED TO A COCHLEAR IMPLANT PATIENT - An exemplary method of enhancing pitch of an audio signal presented to a cochlear implant patient includes 1) determining a frequency spectrum of an audio signal presented to a cochlear implant patient, the frequency spectrum comprising a plurality of frequency bins that each contain spectral energy, 2) generating a modified spectral envelope of the frequency spectrum of the audio signal, 3) identifying each frequency bin included in the plurality of frequency bins that contains spectral energy above the modified spectral envelope and each frequency bin included in the plurality of frequency bins that contains spectral energy below the modified spectral envelope, 4) enhancing the spectral energy contained in each frequency bin identified as containing spectral energy above the modified spectral envelope, and 5) compressing the spectral energy contained in each frequency bin identified as containing spectral energy below the modified spectral envelope. Corresponding methods and systems are also disclosed.	02-26-2015
20150066491	TIME WARP ACTIVATION SIGNAL PROVIDER, AUDIO SIGNAL ENCODER, METHOD FOR PROVIDING A TIME WARP ACTIVATION SIGNAL, METHOD FOR ENCODING AN AUDIO SIGNAL AND COMPUTER PROGRAMS - An audio encoder has a window function controller, a windower, a time warper with a final quality check functionality, a time/frequency converter, a TNS stage or a quantizer encoder, the window function controller, the time warper, the TNS stage or an additional noise filling analyzer are controlled by signal analysis results obtained by a time warp analyzer or a signal classifier. Furthermore, a decoder applies a noise filling operation using a manipulated noise filling estimate depending on a harmonic or speech characteristic of the audio signal.	03-05-2015
20150066492	TIME WARP ACTIVATION SIGNAL PROVIDER, AUDIO SIGNAL ENCODER, METHOD FOR PROVIDING A TIME WARP ACTIVATION SIGNAL, METHOD FOR ENCODING AN AUDIO SIGNAL AND COMPUTER PROGRAMS - An audio encoder has a window function controller, a windower, a time warper with a final quality check functionality, a time/frequency converter, a TNS stage or a quantizer encoder, the window function controller, the time warper, the TNS stage or an additional noise filling analyzer are controlled by signal analysis results obtained by a time warp analyzer or a signal classifier. Furthermore, a decoder applies a noise filling operation using a manipulated noise filling estimate depending on a harmonic or speech characteristic of the audio signal.	03-05-2015
20150073781	Method and Apparatus for Detecting Correctness of Pitch Period - A method and an apparatus for detecting correctness of a pitch period. The method for detecting correctness of a pitch period includes determining, according to an initial pitch period of an input signal in a time domain, a pitch frequency bin of the input signal, where the initial pitch period is obtained by performing open-loop detection on the input signal; determining, based on an amplitude spectrum of the input signal in a frequency domain, a pitch period correctness decision parameter, associated with the pitch frequency bin, of the input signal; and determining correctness of the initial pitch period according to the pitch period correctness decision parameter. The method and apparatus for detecting correctness of a pitch period according to the embodiments of the present invention can improve, based on a relatively less complex algorithm, accuracy of detecting correctness of a pitch period.	03-12-2015
20150325248	SYSTEM AND METHOD FOR PROSODICALLY MODIFIED UNIT SELECTION DATABASES - Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.	11-12-2015
20150332668	AUDIO SIGNAL RECOGNITION METHOD AND ELECTRONIC DEVICE SUPPORTING THE SAME - An electronic device is provided. The electronic device includes a signal acquisition module configured to transmit a signal toward an object and receive an echo signal obtained by transformation of the signal through a collision with one surface of the object; a feature extraction module configured to extract a signal descriptor from the echo signal and analyze the extracted signal descriptor; a conversion module configured to convert the signal descriptor into an audio descriptor; and a synthesis module configured to convert the audio descriptor into an audio signal in a determined frequency band and output the converted audio signal.	11-19-2015
20150332700	APPARATUS AND METHOD FOR PROCESSING AN ENCODED SIGNAL AND ENCODER AND METHOD FOR GENERATING AN ENCODED SIGNAL - An apparatus for processing an encoded signal, the encoded signal having an encoded audio signal having information on a pitch delay or a pitch gain, and a bass post-filter control parameter, has: an audio signal decoder for decoding the encoded audio signal using the information on the pitch delay or the pitch gain to obtain a decoded audio signal; a controllable bass post-filter for filtering the decoded audio signal to obtain a processed signal, wherein the controllable bass post-filter has the variable bass post-filter characteristic controllable by the bass post-filter control parameter; and a controller for setting the variable bass post-filter characteristic in accordance with the bass post-filter control parameter included in the encoded signal.	11-19-2015
20150348566	AUDIO CORRECTION APPARATUS, AND AUDIO CORRECTION METHOD THEREOF - An audio correction apparatus and an audio correction method. The audio correction method includes: receiving audio data, which may be input by a user and/or an instrument uttering sounds; detecting onset information by analyzing harmonic components of the received audio data; detecting pitch information of the received audio data based on the detected onset information; comparing the audio data with reference audio data and aligning the two based on the detected onset information and the detected pitch information; and correcting the aligned audio data to match the reference audio data.	12-03-2015
20150348567	DYNAMICALLY ADAPTED PITCH CORRECTION BASED ON AUDIO INPUT - Systems and methods for adjusting pitch of an audio signal include detecting input notes in the audio signal, mapping the input notes to corresponding output notes, each output note having an associated upper note boundary and lower note boundary, and modifying at least one of the upper note boundary and the lower note boundary of at least one output note in response to previously received input notes. Pitch of the input notes may be shifted to match an associated pitch of corresponding output notes. Delay of the pitch shifting process may be dynamically adjusted based on detected stability of the input notes.	12-03-2015
20160005416	Continuous Pitch-Corrected Vocal Capture Device Cooperative with Content Server for Backing Track Mix - Techniques have been developed to facilitate (1) the capture and pitch correction of vocal performances on handheld or other portable computing devices and (2) the mixing of such pitch-corrected vocal performances with backing tracks for audible rendering on targets that include such portable computing devices as well as desktops, workstations, gaming stations, even telephony targets. Implementations of the described techniques employ signal processing techniques and allocations of system functionality that are suitable given the generally limited capabilities of such handheld or portable computing devices and that facilitate efficient encoding and communication of the pitch-corrected vocal performances (or precursors or derivatives thereof) via wireless and/or wired bandwidth-limited networks for rendering on portable computing devices or other targets.	01-07-2016
20160099012	ESTIMATING PITCH USING SYMMETRY CHARACTERISTICS - An estimate of a pitch of a signal may be computed by using correlations of frequency portions of a frequency representation of the signal. An initial pitch estimate may be obtained and frequency portions of the frequency representation may be identified using multiples of the initial pitch estimate. Correlations of the frequency portions may be computed, and a score for the initial pitch estimate may be determined using the correlations. A second pitch estimate may be determined using the first score, and the process may be repeated.	04-07-2016
20160104500	Adaptive Codebook Gain Control for Speech Coding - In accordance with one aspect of the invention, a selector supports the selection of a first encoding scheme or the second encoding scheme based upon the detection or absence of the triggering characteristic in the interval of the input speech signal. The first encoding scheme has a pitch pre-processing procedure for processing the input speech signal to form a revised speech signal biased toward an ideal voiced and stationary characteristic. The pre-processing procedure allows the encoder to fully capture the benefits of a bandwidth-efficient, long-term predictive procedure for a greater amount of speech components of an input speech signal than would otherwise be possible. In accordance with another aspect of the invention, the second encoding scheme entails a long-term prediction mode for encoding the pitch on a sub-frame by sub-frame basis. The long-term prediction mode is tailored to where the generally periodic component of the speech is generally not stationary or less than completely periodic and requires greater frequency of updates from the adaptive codebook to achieve a desired perceptual quality of the reproduced speech under a long-term predictive procedure.	04-14-2016
20160111082	VOICE AND TEXT COMMUNICATION SYSTEM, METHOD AND APPARATUS - The disclosure relates to systems, methods and apparatus to convert speech to text and vice versa. One apparatus comprises a vocoder, a speech to text conversion engine, a text to speech conversion engine, and a user interface. The vocoder is operable to convert speech signals into packets and convert packets into speech signals. The speech to text conversion engine is operable to convert speech to text. The text to speech conversion engine is operable to convert text to speech. The user interface is operable to receive a user selection of a mode from among a plurality of modes, wherein a first mode enables the speech to text conversion engine, a second mode enables the text to speech conversion engine, and a third mode enables the speech to text conversion engine and the text to speech conversion engine.	04-21-2016
20160111094	APPARATUS AND METHOD FOR IMPROVED CONCEALMENT OF THE ADAPTIVE CODEBOOK IN A CELP-LIKE CONCEALMENT EMPLOYING IMPROVED PULSE RESYNCHRONIZATION - An apparatus for reconstructing a frame including a speech signal as a reconstructed frame is provided, the apparatus including a determination unit and a frame reconstructor being configured to reconstruct the reconstructed frame, such that the reconstructed frame completely or partially includes the first reconstructed pitch cycle, such that the reconstructed frame completely or partially includes a second reconstructed pitch cycle, and such that the number of samples of the first reconstructed pitch cycle differs from a number of samples of the second reconstructed pitch cycle.	04-21-2016
20160118053	APPARATUS AND METHOD FOR IMPROVED CONCEALMENT OF THE ADAPTIVE CODEBOOK IN A CELP-LIKE CONCEALMENT EMPLOYING IMPROVED PITCH LAG ESTIMATION - An apparatus for determining an estimated pitch lag is provided. The apparatus includes an input interface for receiving a plurality of original pitch lag values, and a pitch lag estimator for estimating the estimated pitch lag. The pitch lag estimator is configured to estimate the estimated pitch lag depending on a plurality of original pitch lag values and depending on a plurality of information values, wherein for each original pitch lag value of the plurality of original pitch lag values, an information value of the plurality of information values is assigned to the original pitch lag value.	04-28-2016
20160163326	PITCH FILTER FOR AUDIO SIGNALS - In some embodiments, a pitch filter for filtering a preliminary audio signal generated from an audio bitstream is disclosed. The pitch filter has an operating mode selected from one of either: (i) an active mode where the preliminary audio signal is filtered using filtering information to obtain a filtered audio signal, and (ii) an inactive mode where the pitch filter is disabled. The preliminary audio signal is generated in an audio encoder or audio decoder having a coding mode selected from at least two distinct coding modes, and the pitch filter is capable of being selectively operated in either the active mode or the inactive mode while operating in the coding mode based on control information.	06-09-2016
20160189725	Voice Processing Method and Apparatus, and Recording Medium Therefor - A processing unit of a voice processing apparatus first generates a target voice signal in a time domain by adjusting a fundamental frequency of a target voice signal to a fundamental frequency of an initial voice signal, so as to generate a spectrum of the target voice signal after pitch is adjusted. Second, the processing unit reallocates, along a frequency axis, the spectrum of the target voice characteristics by having the spectrum correspond to each of the fundamental frequencies of the initial voice signal. The processing unit then generates a converted spectrum by adjusting component values of the spectrum of the target voice characteristics, which spectrum has been reallocated, so as to correspond to the component values of the spectrum of the initial voice signal, and by adapting the component values of the spectrum of the initial voice signal to specific frequency bands of the spectrum of the target voice characteristics, with each specific frequency band including one of the harmonic frequencies corresponding to the fundamental frequency of the initial voice signal.	06-30-2016
20160196836	Transmission Method And Device For Voice Data	07-07-2016
20160203827	Audio-Visual Dialogue System and Method	07-14-2016
20160203829	SYSTEMS AND METHODS FOR SPEECH EXTRACTION	07-14-2016
20160254003	DIGITAL WATERMARK DETECTING DEVICE, METHOD, AND PROGRAM	09-01-2016
20160379672	COMMUNICATING DATA WITH AUDIBLE HARMONIES - In some implementations, a process for communicating data over audio is performed. In one aspect, one or more ordered sequences of audio attribute values that are selected based on a musical relationship between the audio attribute values and associated with data values may be played by a first device and received by a second device. This technique may allow for sound-based communications to take place between devices that listeners may find pleasant.	12-29-2016
