For storage or transmission

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression


Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
704205000 | Frequency | 276
704226000 | Noise | 203
704219000 | Linear prediction | 123
704211000 | Time | 82
704225000 | Gain control | 68
704203000 | Transformation | 67
704221000 | Pattern matching vocoders | 46
704230000 | Quantization | 36
704229000 | Adaptive bit allocation | 16
704220000 | Analysis by synthesis | 10
704224000 | Normalizing | 10
704202000 | Neural network | 1
20120290294 - NEURAL TRANSLATOR - A method and apparatus are provided for processing a set of communicated signals associated with a set of muscles, such as the muscles near the larynx of the person, or any other muscles the person uses to achieve a desired response. The method includes the steps of attaching a single integrated sensor, for example, near the throat of the person proximate to the larynx and detecting an electrical signal through the sensor. The method further includes the steps of extracting features from the detected electrical signal and continuously transforming them into speech sounds without the need for further modulation. The method also includes comparing the extracted features to a set of prototype features and selecting a prototype feature of the set of prototype features providing a smallest relative difference. (published 11-15-2012)
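The prototype-selection step this abstract describes is, in essence, a nearest-neighbor search. A minimal sketch follows; the feature representation and the "relative difference" metric here are assumptions for illustration, not the patent's actual definitions.

```python
def smallest_relative_difference(features, prototypes):
    """Return the index of the prototype with the smallest relative
    difference from the extracted feature vector."""
    def rel_diff(a, b):
        # Sum of element-wise relative differences (an assumed metric).
        return sum(abs(x - y) / max(abs(x), abs(y), 1e-9) for x, y in zip(a, b))
    # Pick the prototype minimizing the assumed relative-difference metric.
    return min(range(len(prototypes)), key=lambda i: rel_diff(features, prototypes[i]))
```

In a real system the candidate index would select the prototype's associated speech sound for output.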
20130085750 - COMMUNICATION SYSTEM, METHOD, AND APPARATUS - A server apparatus acquires content based on instruction information; decodes image data of the acquired content; compression encodes captured image data using a predetermined encoding scheme; decodes an audio signal and compression encodes the decoded audio signal using the predetermined encoding scheme; stores the image and the audio signal in a packet; and sends the packet to a packet forwarding apparatus. A mobile terminal receives the packet, decodes and displays the compression-encoded image data stored in the packet, and decodes and reproduces the compression-encoded audio signal. (published 04-04-2013)
20130085749 - SOUND PROCESSING TO PRODUCE TARGET VOLUME LEVEL - A sound processing apparatus includes a processor. The processor may execute instructions, which are stored in a memory, and when executed cause the sound processing apparatus to perform operations. An obtaining operation may obtain sound data in a remote site. A first determining operation may determine volume levels of voice and noise in the remote site based on the sound data. A second determining operation may determine a volume level of noise in a local site based on the sound in the local site. A third determining operation may determine a target volume level based on the volume level of the voice in the remote site, the volume level of the noise in the remote site, and the volume level of the noise in the local site. A notifying operation may notify a user of the target volume level. (published 04-04-2013)
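The abstract names the three inputs to the target-volume decision but not the rule that combines them. One plausible rule, purely as a hedged sketch (the formula and the clamping limits are assumptions, not the patent's method), is to reproduce the remote voice's margin over noise relative to the local noise floor:

```python
def target_volume_db(remote_voice_db, remote_noise_db, local_noise_db,
                     min_db=40.0, max_db=85.0):
    """Illustrative rule: keep the voice's margin over noise, measured
    remotely, above the local noise floor; clamp to a safe range."""
    margin = remote_voice_db - remote_noise_db   # voice-over-noise margin at remote site
    target = local_noise_db + margin             # reproduce that margin locally
    return max(min_db, min(max_db, target))
```

The clamp prevents a very noisy local site from driving the suggested level to an unsafe loudness.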
20130211826 - Audio Signals as Buffered Streams of Audio Signals and Metadata - Historically, most audio recording and communication control has been exerted through the use of physical buttons, sliders, and knobs, e.g. to start/stop recording or communicating, control speaker and microphone volume settings, etc. The present invention describes improvements to this approach, such as detecting and analyzing audio signals with human voice components, e.g. to start/stop recording and communicating, set local and remote recording and playback volumes and filters, and manage metadata associated with temporal ranges in audio streams. (published 08-15-2013)
20090182555 - Speech Enhancement Device and Method for the Same - A speech enhancement device and a method for the same are provided. The device includes a down-converter, a speech enhancement processor, and an up-converter. The method includes steps of down-converting audio signals to generate down-converted audio signals; performing speech enhancement on the down-converted audio signals to generate speech-enhanced audio signals; and up-converting the speech-enhanced audio signals to generate up-converted audio signals. (published 07-16-2009)
20100076753 - DIALOGUE GENERATION APPARATUS AND DIALOGUE GENERATION METHOD - A dialogue generation apparatus includes a reception unit configured to receive a first text from a dialogue partner, an information storage unit configured to store profile information specific to a person who can be the dialogue partner and a fixed-pattern text associated with the person, a presentation unit configured to present the first text to a user, a speech recognition unit configured to perform speech recognition on speech the user has uttered about the first text presented to the user, and generate a speech recognition result showing the content of the speech, a generation unit configured to generate a second text from the profile information about the dialogue partner, fixed-pattern text about the dialogue partner, and the speech recognition result, and a transmission unit configured to transmit the second text to the dialogue partner. (published 03-25-2010)
20100114565 - AUDIBLE ERRORS DETECTION AND PREVENTION FOR SPEECH DECODING, AUDIBLE ERRORS CONCEALING - A method and apparatus for providing an audio output to a user in a communications system, in which the audio to be output to a user, preferably an audio frame, is assessed before it is broadcast to the user, and then selectively changed on the basis of the assessment. The assessment may be carried out in the audio encoding process, in the audio decoding process, and/or after the audio decoding process. The selective changing of the audio output may comprise selectively replacing the audio output and/or re-encoding the audio output. (published 05-06-2010)
20130085748 - Method and device for modifying a compounded voice message - A method and device are provided for modifying a compounded voice message having at least one first voice component. The method includes a step of obtaining at least one second voice component, a step of updating at least one item of information belonging to a group of items of information associated with the compounded voice message as a function of the at least one second voice component, and a step of making available the compounded voice message comprising the at least one first and second voice components, and the group of items of information associated with the compounded voice message. The compounded voice message is intended to be consulted by at least one recipient user. (published 04-04-2013)
20120185241 - AUDIO DECODING APPARATUS, AUDIO CODING APPARATUS, AND SYSTEM COMPRISING THE APPARATUSES - An audio decoding apparatus comprises: a plurality of decoding units; a band replicating unit which processes a decoded signal obtained when a corresponding decoding unit decodes a coded signal, according to a scheme specified by transmitted information; and an information transmitting unit which transmits, to a signal processing unit, information identifying the corresponding decoding unit from among the plurality of decoding units. (published 07-19-2012)
20120185240 - SYSTEM AND METHOD FOR GENERATING AND SENDING A SIMPLIFIED MESSAGE USING SPEECH RECOGNITION - An embodiment provides a system and method for generating and sending a simplified message using speech recognition. The system provides speech recognition software that may be utilized for receiving audio, converting audio to text derived from audio, comparing text derived from audio to match fields to find matches, replacing matched text with the contents of replacement fields associated with the match fields, generating an output message incorporating the replacement text into the text derived from audio, transmitting the output message to a messaging system, and redistributing the output message to recipients. (published 07-19-2012)
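The match-field/replacement-field step above amounts to table-driven phrase substitution over the recognized text. A minimal sketch, assuming the match and replacement fields are plain strings (the patent does not specify their data model):

```python
def simplify_message(text, rules):
    """Replace each matched phrase in the recognized text with the
    contents of its replacement field. `rules` is a hypothetical
    mapping of match field -> replacement field contents."""
    for match_field, replacement in rules.items():
        text = text.replace(match_field, replacement)
    return text
```

A fuller implementation would likely match on word boundaries and handle overlapping rules deterministically.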
20090125299 - Speech recognition system - A speech recognition system comprises at least a speech recognition engine and a display device that contains a signal status interface and a textual interface. The signal status interface is used to show a recording status, a speech processing status, or a complete speech recognition status based on a waveform display. The textual interface is used to show word units of the speech recognition results. Two sets of commands are connected with each waveform unit on the signal status interface and each word unit on the textual interface, respectively, in order to allow users to correct recognition errors or to adjust the speech recognition system. (published 05-14-2009)
20120166185 - SYSTEM AND METHOD FOR THE CREATION AND AUTOMATIC DEPLOYMENT OF PERSONALIZED, DYNAMIC AND INTERACTIVE VOICE SERVICES WITH CLOSED LOOP TRANSACTION PROCESSING - A method and system for accomplishing closed-loop transaction processing in conjunction with interactive, real-time, voice transmission of information to a user is disclosed. A voice-based communication between a user and a first system is established and a report is transmitted to the user. The report might comprise information and at least one request for user input based on said information. In response to the report, the user can request a transaction based on said information. The requested transaction is completed automatically by connecting to a second system for processing. (published 06-28-2012)
20090281794 - METHOD AND SYSTEM FOR ORDERING A GIFT WITH A PERSONALIZED CELEBRITY AUDIBLE MESSAGE - A method for providing a gift with a personalized celebrity message is disclosed. The method includes providing a database of prerecorded audio celebrity messages; allowing a customer to select a celebrity message from the database of prerecorded audio celebrity messages; allowing the customer to produce a personalized audio message; combining the personalized audio message with the celebrity message into an incorporated audio message; saving the incorporated audio message on a storage medium incorporated with a playback device; and coupling the playback device with the saved incorporated audio message with a gift. (published 11-12-2009)
20120226494 - IDENTIFYING AN ENCODING FORMAT OF AN ENCODED VOICE SIGNAL - A digital broadcast transmitting device is described that includes a packet generation unit configured to generate packetized elementary stream (PES) data by converting an inputted voice signal into an encoded voice signal and generating a voice stream packet including the encoded voice signal; a descriptor updating unit configured to update a component descriptor to include a component type identification (ID) and a change reservation ID, the component type ID indicating that an encoding format of the encoded voice signal is MPEG Surround format and the change reservation ID indicating a change of a format of the encoded voice signal to the MPEG Surround format; a packetizing unit configured to generate section data by packetizing the component descriptor; a multiplexing unit configured to multiplex the PES data and the section data; and a modulation unit configured to modulate and transmit multiplexed data acquired from the multiplexing unit. (published 09-06-2012)
20130166285 - MULTI-CORE PROCESSING FOR PARALLEL SPEECH-TO-TEXT PROCESSING - This specification describes technologies relating to multi-core processing for parallel speech-to-text processing. In some implementations, a computer-implemented method is provided that includes the actions of receiving an audio file; analyzing the audio file to identify portions of the audio file as corresponding to one or more audio types; generating a time-ordered classification of the identified portions, the time-ordered classification indicating the one or more audio types and position within the audio file of each portion; generating a queue using the time-ordered classification, the queue including a plurality of jobs where each job includes one or more identifiers of a portion of the audio file classified as belonging to the one or more speech types; distributing the jobs in the queue to a plurality of processors; performing speech-to-text processing on each portion to generate a corresponding text file; and merging the corresponding text files to generate a transcription file. (published 06-27-2013)
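The queue-distribute-merge pipeline above can be sketched with a worker pool: each classified portion becomes a job, jobs run in parallel, and results are merged back into time order. The `recognize` callable here stands in for a real speech-to-text engine; the `(start_time, chunk)` job shape is an assumption for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_portions(portions, recognize):
    """portions: list of (start_time, audio_chunk) jobs.
    Jobs are dispatched to a pool of workers, then the per-portion
    texts are merged in time order into one transcription."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda job: (job[0], recognize(job[1])), portions))
    results.sort(key=lambda item: item[0])   # restore original time order
    return " ".join(text for _, text in results)
```

A CPU-bound recognizer would use `ProcessPoolExecutor` instead so the jobs actually run on multiple cores.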
20090030675 - Apparatus and method of encoding and decoding audio signal - In one embodiment, the method includes receiving the audio signal having a plurality of random access units. The random access unit includes one or more frames, and at least one of the frames is a random access frame. The random access frame is a frame encoded such that previous frames are not necessary to decode the random access frame. The embodiment further includes reading location information from the audio signal. The location information indicates whether random access unit size information is stored or not in the audio signal. If the random access unit size information is stored, the location information further indicates a location where the random access unit size information is stored in the audio signal. Random access unit size information is read according to the location information. The random access unit size information indicates a distance between random access frames in bytes. The random access units are decoded based on the random access unit size information. (published 01-29-2009)
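The location-information logic above (a flag saying whether size info exists, and if so where) can be sketched against a toy byte layout. The layout below is entirely invented for illustration; the real bitstream syntax is defined by the codec and is not reproduced here.

```python
import struct

def read_random_access_info(buf):
    """Parse a toy layout: 1 flag byte (size info present?); if set,
    a 2-byte big-endian offset, and at that offset a 4-byte random
    access unit size in bytes. Returns the size, or None if absent."""
    present = buf[0] == 1
    if not present:
        return None                           # no size info in the stream
    offset, = struct.unpack_from(">H", buf, 1)  # where the size is stored
    size, = struct.unpack_from(">I", buf, offset)
    return size
```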
20120197635 - METHOD FOR GENERATING AN AUDIO SIGNAL - A method for generating an audio signal of a user is provided. According to the method, a first audio signal inside of an ear of the user and a second audio signal outside of the ear are detected. The first audio signal and the second audio signal comprise at least a voice signal component generated by the user. Depending on the first audio signal, the second audio signal is processed and output as the audio signal. (published 08-02-2012)
20120197634 - VOICE CORRECTION DEVICE, VOICE CORRECTION METHOD, AND RECORDING MEDIUM STORING VOICE CORRECTION PROGRAM - A voice correction device includes a detector that detects a response from a user, a calculator that calculates an acoustic characteristic amount of an input voice signal, an analyzer that outputs an acoustic characteristic amount of a predetermined amount when having acquired a response signal due to the response from the detector, a storage unit that stores the acoustic characteristic amount output by the analyzer, a controller that calculates a correction amount of the voice signal on the basis of a result of a comparison between the acoustic characteristic amount calculated by the calculator and the acoustic characteristic amount stored in the storage unit, and a correction unit that corrects the voice signal on the basis of the correction amount calculated by the controller. (published 08-02-2012)
20090234643 - TRANSCRIPTION SYSTEM AND METHOD - A transcription system and method for facilitating the transcription of audio messages are disclosed. The transcription system may include a telephony server for receiving an audio message from a customer, an audio broadcast server coupled to the telephony server for streaming the message from the customer in real time, at least one agent transcriber for receiving the streamed audio message from the audio broadcast server for facilitating the transcribing of the streamed audio message into a transcription text file in real time, and a computer server for providing the customer access to the transcribed text file. (published 09-17-2009)
20090306970 - SYSTEM AND METHOD OF AN IN-BAND MODEM FOR DATA COMMUNICATIONS OVER DIGITAL WIRELESS COMMUNICATION NETWORKS - A system is provided for transmitting information through a speech codec (in-band) such as found in a wireless communication network. A modulator transforms the data into a spectrally noise-like signal based on the mapping of a shaped pulse to predetermined positions within a modulation frame, and the signal is efficiently encoded by a speech codec. A synchronization sequence provides modulation frame timing at the receiver and is detected based on analysis of a correlation peak pattern. A request/response protocol provides reliable transfer of data using message redundancy, retransmission, and/or robust modulation modes dependent on the communication channel conditions. (published 12-10-2009)
20090018823 - Speech coding - A method of encoding a speech signal for transmission in a communications network involves transforming the signal into a sequence of frames, each frame including a plurality of coefficients; dividing the frame into a set of sub-bands each containing a sub-set of the plurality of coefficients; applying an optimization function to calculate respective test values corresponding to respective candidate sets of pulses representing a coded form of at least some of the coefficients; and selecting a set of pulses having a test value which meets a selectability criterion. If the optimization function is an error function, the selectability criterion is minimization of the function. If the optimization function is an iterative function, the selectability criterion is selecting an iteration in which a certain condition is reached. (published 01-15-2009)
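The error-function branch of the selectability criterion is a straightforward minimization over candidates, which can be sketched as follows; the candidate representation and the error function are placeholders, not the codec's actual definitions.

```python
def select_pulse_set(coefficients, candidates, error):
    """Evaluate each candidate set of pulses against the sub-band
    coefficients and keep the one whose error value is smallest.
    `error` stands in for the codec's optimization function."""
    return min(candidates, key=lambda pulses: error(coefficients, pulses))
```

A real encoder would not enumerate all candidates exhaustively; structured searches over pulse positions and signs keep the cost tractable.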
20100088088 - CUSTOMIZABLE METHOD AND SYSTEM FOR EMOTIONAL RECOGNITION - An automated emotional recognition system is adapted to determine emotional states of a speaker based on the analysis of a speech signal. The emotional recognition system includes at least one server function and at least one client function in communication with the at least one server function for receiving assistance in determining the emotional states of the speaker. The at least one client function includes an emotional features calculator adapted to receive the speech signal and to extract therefrom a set of speech features indicative of the emotional state of the speaker. The emotional state recognition system further includes at least one emotional state decider adapted to determine the emotional state of the speaker exploiting the set of speech features based on a decision model. The server function includes at least a decision model trainer adapted to update the selected decision model according to the speech signal. The decision model to be used by the emotional state decider for determining the emotional state of the speaker is selectable based on a context of use of the recognition system. (published 04-08-2010)
20090204393 - Systems and Methods For Adaptive Multi-Rate Protocol Enhancement - A method of processing a codec sample is provided. The method includes: removing from a first portion of the codec sample a first number of first information bits. The first information bits are indicative of frame information associated with the codec sample. The method also includes inserting at the first portion of the codec sample, from a second portion of the codec sample, a second number of data bits. The first number of the first information bits is greater than or equal to the second number of the data bits. The method also includes removing the second portion of the codec sample. The method may also include encrypting and decrypting the codec sample. In some embodiments, the codec sample is an adaptive multi-rate codec sample. In some embodiments, the adaptive multi-rate codec sample is a 5.15 mode adaptive multi-rate codec sample. (published 08-13-2009)
20120101812 - METHODS AND APPARATUS FOR GENERATING, UPDATING AND DISTRIBUTING SPEECH RECOGNITION MODELS - Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability. Voice dialing, telephone control, and/or other services are provided by the speech processing facility in response to speech recognition results. (published 04-26-2012)
20100070266 - Performance metrics for telephone-intensive personnel - Systems and methods for generating performance metrics to monitor and/or enhance the performance of telephone-intensive personnel are disclosed. The method generally includes detecting voice activity on a receive and/or a transmit channel in a communications system, outputting voicing decision outputs based on the detecting, storing the voicing decision outputs over a period of time to memory, and generating voice activity performance metrics based on the voicing decision output stored in the memory. The generating may include generating a running average ratio of duration of voice activity on the transmit channel to duration of voice activity on the receive channel (talk-listen ratio) over a certain period of time for one or more agents. The talk-listen ratio may be compared to a target ratio. The system may generally include a voice activity detector (VAD) configured to detect voice activity on a receive and/or transmit channel in a communications system, a memory to store outputs from the VAD, and a voice activity analyzer configured to generate performance metrics based on the VAD outputs stored in the memory. (published 03-18-2010)
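The talk-listen ratio described above can be computed directly from stored per-frame voicing decisions. The frame-based representation below is an assumption about how the VAD outputs are stored, not the patent's data format.

```python
def talk_listen_ratio(tx_vad, rx_vad):
    """tx_vad / rx_vad: per-frame voicing decisions (1 = voice active)
    on the transmit (agent talking) and receive (caller talking)
    channels over the same window. Returns talk time / listen time."""
    talk = sum(tx_vad)
    listen = sum(rx_vad)
    # An agent who never listens gets an unbounded ratio.
    return talk / listen if listen else float("inf")
```

Comparing the result against a target ratio (e.g. flagging agents far above 1.0) gives the monitoring metric the abstract describes.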
20130024187 - METHOD AND APPARATUS FOR SOCIAL NETWORK COMMUNICATION OVER A MEDIA NETWORK - A system that incorporates teachings of the present disclosure may include, for example, transmitting a request to initiate a communication session with a member device of a social network, activating a speech capture element, maintaining activation of the speech capture element in accordance with a pattern of prior speech messages, detecting a speech message at the activated speech capture element, and transmitting the detected speech message, or a derivative thereof, to the member device of the social network. Other embodiments are disclosed. (published 01-24-2013)
20110301944 - DIVER AUDIO COMMUNICATION SYSTEM - An underwater communications system is provided that transmits electromagnetic and/or magnetic signals to a remote receiver. The transmitter includes a data input. A digital data compressor compresses data to be transmitted. A modulator modulates compressed data onto a carrier signal. An electrically insulated, magnetically coupled antenna transmits the compressed, modulated signals. The receiver has an electrically insulated, magnetically coupled antenna for receiving a compressed, modulated signal. A demodulator is provided for demodulating the signal to reveal compressed data. A de-compressor de-compresses the data. An appropriate human interface is provided to present transmitted data in text/audio/visible form. Similarly, the transmit system comprises appropriate audio/visual/text entry mechanisms. (published 12-08-2011)
20110288857 - Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition. (published 11-24-2011)
20110295597 - SYSTEM AND METHOD FOR AUTOMATED ANALYSIS OF EMOTIONAL CONTENT OF SPEECH - A method and apparatus for automated analysis of emotional content of speech is presented. Telephony calls are routed via a network such as the public switched telephone network (PSTN) and delivered to an interactive voice response system (IVR) where prerecorded or synthesized prompts guide a caller to speech responses. Speech responses are analyzed for emotional content in real time, or collected via recording and analyzed in batch. If performed in real time, results of emotional content analysis (ECA) may be used as input to IVR call processing and call routing. In some applications this might involve ECA input to an expert system process whose results interact with an IVR for prompt creation and call processing. In any case, ECA data is valuable on its own and may be culled and restated in the form of reports for business application. (published 12-01-2011)
20100063801 - Postfilter For Layered Codecs - A scalable decoder device ( (published 03-11-2010)
20110218798 - OBFUSCATING SENSITIVE CONTENT IN AUDIO SOURCES - Techniques implemented as systems, methods, and apparatuses, including computer program products, for obfuscating sensitive content in an audio source representative of an interaction between a contact center caller and a contact center agent. The techniques include performing, by an analysis engine of a contact center system, a context-sensitive content analysis of the audio source to identify each audio source segment that includes content determined by the analysis engine to be sensitive content based on its context; and processing, by an obfuscation engine of the contact center system, one or more identified audio source segments to generate corresponding altered audio source segments each including obfuscated sensitive content. (published 09-08-2011)
20100280823 - Method and Apparatus for Encoding and Decoding - An encoding method includes: extracting background noise characteristic parameters within a hangover period; for a first superframe after the hangover period, performing background noise encoding based on the extracted background noise characteristic parameters; for superframes after the first superframe, performing background noise characteristic parameter extraction and DTX decision for each frame in those superframes; and for the superframes after the first superframe, performing background noise encoding based on extracted background noise characteristic parameters of the current superframe, background noise characteristic parameters of a plurality of superframes previous to the current superframe, and a final DTX decision. Also, a decoding method and apparatus and an encoding apparatus are disclosed. Bandwidth occupancy may be reduced substantially while the signal quality may be guaranteed. (published 11-04-2010)
20090006082 - ACTIVITY-WARE FOR NON-TEXTUAL OBJECTS - Providing for summarization and analysis of audio content is described herein. By way of example, an oral conversation can be analyzed, such that points of interest within the oral conversation can be identified and file locations related to such points of interest can be marked. Points of interest can be inferred based on a level of energy, e.g., excitement, pitch, tone, pace, or the like, associated with one or more speakers. Alternatively, or in addition, speaker and/or reviewer activity can form the basis for identifying points of interest within the conversation. Moreover, a compilation of the identified points of interest and portions of the original oral conversation related thereto can be assembled. As described herein, audio content can be succinctly summarized with respect to inferred and/or indicated points of interest, to facilitate an efficient and pertinent review of such content. (published 01-01-2009)
20120109643 - ADAPTIVE AUDIO TRANSCODING - A system and method provide an audio/video coding system for adaptively transcoding audio streams based on content characteristics of the audio streams. An audio stream metadata extraction module of the system is configured to extract metadata of a source audio stream. An audio stream classification module of the system is configured to classify the source audio stream into one of several audio content categories based on the metadata of the source audio stream. An adaptive audio encoder of the system is configured to determine one or more transcoding parameters, including target bitrate and sampling rate, based on the metadata and classification of the source audio stream. An adaptive audio transcoder of the system is configured to transcode the source audio stream into an output audio stream using the transcoding parameters. (published 05-03-2012)
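The classification-to-parameters step above is essentially a lookup from content category to (target bitrate, sampling rate). A minimal sketch, where both the category names and the numbers are illustrative defaults rather than values from the patent:

```python
def pick_transcoding_params(category):
    """Map an audio content category to (target bitrate in bits/s,
    sampling rate in Hz). Table values are illustrative assumptions."""
    table = {
        "speech": (32_000, 16_000),   # narrower band suffices for voice
        "music": (128_000, 44_100),   # full-band audio needs more bits
        "mixed": (64_000, 32_000),
    }
    return table.get(category, table["mixed"])  # fall back to the middle setting
```

A production system would likely refine these choices further using the extracted metadata (e.g. channel count or source bitrate), not the category alone.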
20100100372 - STEREO ENCODING DEVICE, STEREO DECODING DEVICE, AND THEIR METHOD - Disclosed is a stereo encoding device which can improve critical channel encoding accuracy without increasing the encoding information amount. The device includes: a monaural signal synthesis unit ( (published 04-22-2010)
20090287477 - System and method for providing network coordinated conversational services - A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service. (published 11-19-2009)
20110172992 - METHOD FOR EMOTION COMMUNICATION BETWEEN EMOTION SIGNAL SENSING DEVICE AND EMOTION SERVICE PROVIDING DEVICE - Provided is a method for emotion communication to share a user's emotions between an emotion signal sensing device and an emotion service providing device. The method for emotion communication includes: the emotion signal sensing device's sensing biological and environmental information of the user and generating an emotion signal and emotion information of the user based on the biological and environmental information; establishing an emotion communication connection with the emotion service providing device; transmitting the emotion signal and the emotion information to the emotion service providing device by the emotion communication connection establishment; and breaking the connection with the emotion service providing device. (published 07-14-2011)
20080243490 - MULTI-LANGUAGE TEXT FRAGMENT TRANSCODING AND FEATURIZATION - Embodiments of the present invention provide methods and apparatus for transcoding received text fragments and documents. A featurization configuration is produced to create token components for evaluating the content of the text fragment. Other embodiments may be described and claimed. (published 10-02-2008)
20080281585 - Voice, location finder, modulation format selectable Wi-Fi, cellular mobile systems - A voice and location finder analog-to-digital (A/D) converted signal is processed and provided to an Orthogonal Frequency Division Multiplexed (OFDM), Orthogonal Frequency Division Multiple Access (OFDMA), Time Division Multiple Access (TDMA), Global Mobile System (GSM), spread spectrum, and Wideband Code Division Multiple Access (WCDMA) baseband processor, filter, modulation format selectable modulator, and transmitter. A receiver and modulation format selectable demodulator handle location finder satellite and ground-based location finder, Wireless Local Area Network (WLAN), wireless fidelity (Wi-Fi), world wide web (www), and cellular network provided signals. A modulator and a transmitter for modulation and transmission of a selected or combined signal have two distinct transmitters operated in separate radio frequency (RF) bands. A receiver with two antennas for receiving the transmitted signal includes a demodulator, receive filter, and receiver processor for demodulation, filtering, and processing of the received TDMA signal. The receive filter for filtering of the TDMA signal is mismatched to the transmit TDMA baseband filter. The receive processor provides a received baseband mismatch-filtered cross-correlated in-phase and quadrature-phase TDMA signal. (published 11-13-2008)
20080288245TANDEM-FREE INTERSYSTEM VOICE COMMUNICATION - Techniques are presented herein to provide tandem-free operation between two wireless terminals through two otherwise incompatible wireless networks. Specifically, embodiments provide tandem-free operation between a wireless terminal communicating through a continuous transmission (CTX) wireless channel to a wireless terminal communicating through a discontinuous transmission (DTX) wireless channel. In a first aspect, inactive speech frames are translated between DTX and CTX formats. In a second aspect, each wireless terminal includes an active speech decoder that is compatible with the active speech encoder on the opposite end of the mobile-to-mobile connection.11-20-2008
20090265166BOUNDARY ESTIMATION APPARATUS AND METHOD - A boundary estimation apparatus includes a boundary estimation unit which estimates a first boundary separating a speech into first meaning units, a boundary estimation unit configured to estimate a second boundary separating a related speech into second meaning units related to the first meaning units, a pattern generating unit configured to generate a representative pattern showing a representative characteristic in the analysis interval, and a similarity calculation unit configured to calculate a similarity between the representative pattern and a characteristic pattern showing a feature in a calculation interval of the speech. The boundary estimation unit estimates the second boundary based on the calculation interval in which the similarity is higher than a threshold value or relatively high.10-22-2009
20090265165AUTOMATIC META-DATA TAGGING PICTURES AND VIDEO RECORDS - A method and apparatus for labeling an image recorded by a portable electronic device with descriptive tags is disclosed. Sounds in the vicinity of the portable electronic device are recorded. When the image is captured, the audio record of recorded sounds from a first predetermined period of time prior to the capture of the image until a second predetermined period of time after the capture of the image is retrieved. The retrieved audio record is processed to create a list of recognizable words in the retrieved audio record. The list of recognizable words is then stored in a metatag field associated with the captured image.10-22-2009
20110208514DATA EMBEDDING DEVICE AND DATA EXTRACTION DEVICE - A data embedding device for embedding data in a speech code obtained by encoding a speech in accordance with a speech encoding method based on the human voice generation process includes an embedding judgment unit which judges, for every speech code, whether or not data should be embedded in the speech code, and an embedding unit which embeds data in two or more parameter codes of the plurality of parameter codes constituting a speech code for which the embedding judgment unit has judged that the data should be embedded.08-25-2011
20100145683METHOD OF PROVIDING DYNAMIC SPEECH PROCESSING SERVICES DURING VARIABLE NETWORK CONNECTIVITY - A device for providing dynamic speech processing services during variable network connectivity with a network server includes a connection determiner that determines the level of network connectivity of the client device and the network server; and a simplified speech processor that processes speech data and is initiated based on the determination from the connection determiner that the network connectivity is impaired or unavailable. The device further includes a speech data storage that stores processed speech data from the simplified speech processor; and a transition unit that determines when to transmit the stored speech data and connects with the network server, based on the determination of the connection determiner.06-10-2010
20080235006Method and Apparatus for Decoding an Audio Signal - An apparatus for decoding an audio signal and method thereof are disclosed. The present invention includes receiving the audio signal and spatial information, identifying a type of modified spatial information, generating the modified spatial information using the spatial information, and decoding the audio signal using the modified spatial information, wherein the type of the modified spatial information includes at least one of partial spatial information, combined spatial information and expanded spatial information. Accordingly, an audio signal can be decoded into a configuration different from a configuration decided by an encoding apparatus. Even if the number of speakers is smaller or greater than that of the multi-channels before execution of downmixing, output channels equal in number to the speakers can be generated from a downmix audio signal.09-25-2008
20090164209DEVICE AND METHOD FOR CAPTURING AND FORWARDING VERBALIZED COMMENTS TO A REMOTE LOCATION - Disclosed is a device for sending a verbalized comment to a remote computer server. The device includes a processor that executes and operates the various software and hardware components. A microphone is utilized to record a comment. Temporary storage buffers the recorded comment. An auto-dialing application is utilized to automatically dial a telephone number associated with the remote computer server. The mode of transmission can include an RF module for automatically establishing a voice connection with an external mobile network, a WiFi module for automatically establishing a voice connection with an external IP network, and a plain old telephone service (POTS) interface module for automatically establishing a voice connection with an external legacy telephone network. The comment can be sent via email over an IP network, via MMS over a mobile network, or directly over a telephone connection including POTS, cellular (mobile), and VoIP.06-25-2009
20100153097MULTI-CHANNEL AUDIO CODING - A multi-channel audio encoder.06-17-2010
20090132240METHOD AND APPARATUS FOR MANAGING SPEECH DECODERS - A method and apparatus that manages speech decoders in a communication device is disclosed. The method may include detecting a change in transmission rate from a higher rate to a lower rate, decoding and shifting a first, second and third received first decoder set of frame parameters, generating a first decoder output audio frame from the previously shifted frame parameters, generating a first, second and third second decoder audio fill frame, the second decoder being a higher rate decoder than first decoder, outputting a first and second second decoder audio fill frame, combining the first decoder audio frame and the third second decoder audio fill frame with overlapping triangular windows, and outputting combined first decoder and second decoder frames to an audio buffer for subsequent transmission to a user of the communication device. In an alternative embodiment, another method may include detecting and processing a change in transmission rate from a lower rate to a higher rate.05-21-2009
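The overlap of a low-rate decoder's frame with a higher-rate decoder's fill frame "with overlapping triangular windows", as this abstract describes, is a standard linear crossfade. A minimal sketch in Python; the function name and the linear-ramp window shape are illustrative assumptions, not details from the patent:

```python
def crossfade(frame_a, frame_b):
    """Combine two equal-length audio frames with complementary
    triangular (linear-ramp) windows: frame_a fades out while
    frame_b fades in, hiding the seam between two decoders' outputs.
    Assumes frames of at least two samples."""
    n = len(frame_a)
    return [frame_a[i] * (1 - i / (n - 1)) + frame_b[i] * (i / (n - 1))
            for i in range(n)]
```

At the first sample only frame_a is heard and at the last only frame_b; because the two weights always sum to one, a signal common to both frames passes through unchanged.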
20090281793Time varying processing of repeated digital audio samples in accordance with a user defined effect - A programmed “Stutter Edit” creates, stores and triggers combinations of effects to be used on a repeated short sample (“slice”) of recorded audio. The combination of effects (“gesture”) act on the sample over a specified duration (“gesture length”), with the change in parameters for each effect over the gesture length being dictated by user-defined curves. Such a system affords wide manipulation of audio recorded on-the-fly, perfectly suited for live performance. These effects preferably include not only stuttering but also imposing an amplitude envelope on the slice being triggered, sample rate and bit rate manipulation, panning (interpolation between pre-defined spatial positions), high- and low-pass filters and compression. Destructive edits, such as reversing, pitch shifting, and fading may also alter the way the Stutter Edit is heard. More advanced techniques, including filters, FX processors, and other plug-ins, can increase the detail and uniqueness of a particular Stutter Edit effect.11-12-2009
20090006083Systems And Methods For Spoken Information - A communication system, according to various aspects of the present invention, communicates via a network with a person who operates a voice input/output device. The communication system provides collecting and reporting services for journals and surveys, all via spoken information. The communication system includes a meta-data database, a constructing engine, and a conversing engine. The constructing engine directs the collecting of meta-data and the constructing of a journal in accordance with the meta-data database. The conversing engine collects the meta-data from the person for storage in the meta-data database and collects data from the person for storage in the journal. The conversing engine collects further spoken information for a survey.01-01-2009
20130218556VOICE PROCESSING APPARATUS AND VOICE PROCESSING METHOD - A voice processing apparatus is provided in an ADPCM (Adaptive Differential Pulse Code Modulation) voice transmission system in which voice data that is differentially quantized through an ADPCM scheme is transmitted. The voice processing apparatus includes an error detector which detects whether or not an error occurs in a transmission frame containing voice data that indicates a differential value, and an error determiner which determines a level of the error detected by the error detector when the error detector detects the error. The voice processing apparatus also includes a voice processor which corrects the voice data with a correction value depending on the level of the error detected by the error detector and an ADPCM decoder which decodes the voice data corrected by the voice processor.08-22-2013
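The error-level-dependent correction this abstract describes can be sketched as scaling the differential values before the decoder's accumulator. The scale table, level numbering, and function names below are illustrative assumptions, not values from the patent:

```python
def correct_adpcm_frame(diffs, error_level):
    """Attenuate a frame of ADPCM differential values according to the
    detected error severity: level 0 passes the frame through unchanged,
    level 3 mutes it entirely. Scale factors are illustrative."""
    scale = {0: 1.0, 1: 0.5, 2: 0.25, 3: 0.0}[error_level]
    return [d * scale for d in diffs]


def adpcm_decode(diffs, start=0.0):
    """Reconstruct samples by accumulating the (corrected) differential
    values, as the integrator stage of an ADPCM decoder does."""
    out, acc = [], start
    for d in diffs:
        acc += d
        out.append(acc)
    return out
```

Because each output sample is a running sum of differences, attenuating a corrupted frame limits the error that would otherwise propagate into every subsequent sample.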
20110224974SPEECH RECOGNITION AND TRANSCRIPTION AMONG USERS HAVING HETEROGENEOUS PROTOCOLS - A system is disclosed for facilitating speech recognition and transcription among users employing incompatible protocols for generating, transcribing, and exchanging speech. The system includes a system transaction manager that receives a speech information request from at least one of the users. The speech information request includes formatted spoken text generated using a first protocol. The system also includes a speech recognition and transcription engine, which communicates with the system transaction manager. The speech recognition and transcription engine receives the speech information request from the system transaction manager and generates a transcribed response, which includes a formatted transcription of the formatted speech. The system transmits the response to the system transaction manager, which routes the response to one or more of the users. The latter users employ a second protocol to handle the response, which may be the same as or different than the first protocol. The system transaction manager utilizes a uniform system protocol for handling the speech information request and the response.09-15-2011
20090024386Multi-mode speech encoding system - A method comprises analyzing each frame of a plurality of frames of the speech signal to determine one or more speech parameters for the speech signal; deciding, for each frame of the plurality of frames of the speech signal, based on the one or more speech parameters of the speech signal, to select one of a plurality of encoding modes including a first encoding mode and a second encoding mode for encoding each frame of the plurality of frames of the speech signal; and encoding each frame of the plurality of frames of the speech signal according to the encoding mode selected for that frame in the deciding; wherein the first encoding mode supports a first encoding rate and the second encoding mode supports a second encoding rate, the first encoding rate being the same encoding rate as the second encoding rate.01-22-2009
20110144980SYSTEM AND METHOD FOR UPDATING INFORMATION IN ELECTRONIC CALENDARS - Systems and methods for updating electronic calendar information. Speech is received from a user at a vehicle telematics unit (VTU), wherein the speech is representative of information related to a particular vehicle trip. The received speech is recorded in the VTU as a voice memo, and data associated with the voice memo is communicated from the VTU to a computer running a calendaring application. The data is associated with a field of the calendaring application, and stored in association with the calendaring application field.06-16-2011
20110231184Correlation of transcribed text with corresponding audio - In one embodiment, a method includes receiving at a communication device an audio communication and a transcribed text created from the audio communication, and generating a mapping of the transcribed text to the audio communication independent of transcribing the audio. The mapping identifies locations of portions of the text in the audio communication. An apparatus for mapping the text to the audio is also disclosed.09-22-2011
20090248402VOICE MIXING METHOD AND MULTIPOINT CONFERENCE SERVER AND PROGRAM USING THE SAME METHOD - The voice mixing method includes a first step of selecting voice signals from a plurality of voice signals, a second step of adding up all the selected voice signals, a third step of obtaining, for each selected voice signal, a total of the selected voice signals other than that one voice signal, a fourth step of encoding the total obtained in the second step, a fifth step of encoding the voice signals obtained in the third step, and a sixth step of copying the encoded information obtained in the fourth step into the encoded information of the fifth step.10-01-2009
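The first three steps of this mixing method amount to the classic "sum minus one" conference mix: each selected participant receives the total of all selected signals except their own, so nobody hears their own voice echoed back. A minimal sketch; the loudness-based selection heuristic and all names are illustrative assumptions, not from the patent:

```python
def mix_conference(signals, num_selected=3):
    """signals: dict of participant id -> list of samples (equal length).

    Selects the num_selected loudest participants (by mean absolute
    amplitude), then returns for each selected participant the mix of
    the other selected signals (total minus their own)."""
    # First step: select voice signals (here, the loudest participants).
    loudness = {pid: sum(abs(s) for s in sig) / len(sig)
                for pid, sig in signals.items()}
    selected = sorted(loudness, key=loudness.get, reverse=True)[:num_selected]

    # Second step: add up all the selected signals.
    length = len(next(iter(signals.values())))
    total = [sum(signals[pid][i] for pid in selected) for i in range(length)]

    # Third step: per participant, total minus that participant's signal.
    return {pid: [total[i] - signals[pid][i] for i in range(length)]
            for pid in selected}
```

Computing the full sum once and subtracting each participant's own signal needs only one addition pass, instead of re-summing N-1 signals for each of the N selected participants.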
20100191522Apparatus and method for noise generation - The disclosure provides a method for noise generation, including: determining an initial value of a reconstructed parameter; determining a random value range based on the initial value of the reconstructed parameter; taking a value in the random value range randomly as a reconstructed noise parameter; and generating noise by using the reconstructed noise parameter. The disclosure also provides an apparatus for noise generation.07-29-2010
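The steps in this abstract (initial value, random range around it, random draw, noise synthesis) can be sketched directly. The spread factor, uniform amplitude model, and names below are illustrative assumptions, not values from the disclosure:

```python
import random


def reconstruct_noise_param(initial, spread=0.1, rng=None):
    """Take a reconstructed noise parameter at random from a range
    centred on the initial value (assumes a positive initial value)."""
    rng = rng or random.Random()
    lo, hi = initial * (1 - spread), initial * (1 + spread)
    return rng.uniform(lo, hi)


def generate_noise(param, n, rng=None):
    """Generate n comfort-noise samples whose amplitude is set by the
    reconstructed parameter (a simple uniform model)."""
    rng = rng or random.Random()
    return [rng.uniform(-param, param) for _ in range(n)]
```

Randomizing the parameter rather than reusing a fixed value avoids the static, "buzzy" quality that identical comfort-noise frames would otherwise produce.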
20120197633VOICE QUALITY MEASUREMENT DEVICE, METHOD AND COMPUTER READABLE MEDIUM - A voice quality measurement device that measures voice quality of a decoded voice signal outputted from a voice decoder unit. The voice quality measurement device includes a packet buffer unit and a voice information monitoring unit. The packet buffer unit accumulates voice packets that arrive non-periodically as voice information, and outputs the voice information to the voice decoder unit periodically. The voice information monitoring unit monitors continuity of the voice information inputted to the voice decoder unit, and calculates an index of voice quality of the decoded voice signal that reflects acceptability of this continuity.08-02-2012
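A continuity-based quality index like the one this abstract monitors can be sketched as follows; the specific weighting of loss rate versus interruptions is an illustrative stand-in for whatever metric the patent defines:

```python
def continuity_index(frames):
    """frames: list of booleans, True if a voice frame was available to
    the decoder on time, False if it was missing. Returns a 0..1 quality
    index that penalizes both raw loss and discontinuities (gaps)."""
    if not frames:
        return 1.0
    received = sum(frames) / len(frames)
    # Each transition from received to missing starts an audible gap.
    gaps = sum(1 for a, b in zip(frames, frames[1:]) if a and not b)
    return max(0.0, received - 0.1 * gaps)
```

Counting transitions separately from the loss rate reflects that many short interruptions sound worse than one burst of the same total length.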
20100241421LANGUAGE PROCESSOR - A language processor according to the present invention includes a probability calculating section.09-23-2010
20100241422SYNCHRONIZING A CHANNEL CODEC AND VOCODER OF A MOBILE STATION - In one embodiment, the present invention includes a method for maintaining a vocoder and channel codec in substantial synchronization. The method may include receiving a configuration message that includes rate information and an effective radio block identifier at a mobile station, coding a current radio block via a vocoder and channel codec, configuring an encoding portion of the vocoder and channel codec with the rate information after performing the coding, and then coding the effective radio block using the rate information. Other embodiments are described and claimed.09-23-2010
20110238414TELEPHONY SERVICE INTERACTION MANAGEMENT - A method for managing an interaction of a calling party to a communication partner is provided. The method includes automatically determining if the communication partner expects DTMF input. The method also includes translating speech input to one or more DTMF tones and communicating the one or more DTMF tones to the communication partner, if the communication partner expects DTMF input.09-29-2011
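Translating recognized speech into DTMF, as this abstract describes, comes down to synthesizing the standard dual-tone pair for each digit. The frequency table below comes from the DTMF standard (ITU-T Q.23); the function name, amplitudes, and defaults are illustrative assumptions:

```python
import math

# Standard DTMF row/column frequency pairs in Hz (per ITU-T Q.23).
DTMF = {'1': (697, 1209), '2': (697, 1336), '3': (697, 1477),
        '4': (770, 1209), '5': (770, 1336), '6': (770, 1477),
        '7': (852, 1209), '8': (852, 1336), '9': (852, 1477),
        '*': (941, 1209), '0': (941, 1336), '#': (941, 1477)}


def dtmf_tone(digit, duration=0.1, rate=8000):
    """Synthesize the two superimposed sinusoids for one digit."""
    lo, hi = DTMF[digit]
    n = int(duration * rate)
    return [0.5 * math.sin(2 * math.pi * lo * t / rate)
            + 0.5 * math.sin(2 * math.pi * hi * t / rate)
            for t in range(n)]
```

Sending the synthesized samples into the call path lets a caller answer an IVR's "press 5" prompt by saying "five".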
20100274556VECTOR QUANTIZER, VECTOR INVERSE QUANTIZER, AND METHODS THEREFOR - Disclosed is a vector quantizer in which, in multistage vector quantization, the vector quantization of the following stage can be performed adaptively to the result of the vector quantization of the preceding stage, improving quantization accuracy with less calculation and a lower bit rate. The quantizer comprises a product set circle calculating section.10-28-2010
20100274554SPEECH ANALYSIS SYSTEM - A speech analysis system, including a kurtosis module for processing a coded sound signal to generate kurtosis measure data; a wavelet module for processing the coded sound signal to generate wavelet coefficients; and a classification module for processing the wavelet coefficients and the kurtosis measure data to generate label data representing a classification for the coded sound signal. The sound signal is classified as environmental noise, silence, speech from a single speaker, speech from multiple speakers, speech from a single speaker plus environmental noise, or speech from multiple speakers plus environmental noise. Speech is further classified as voiced or unvoiced.10-28-2010
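The kurtosis measure such a classifier relies on is the standard fourth-moment statistic: voiced speech tends to be "peakier" (higher kurtosis) than diffuse environmental noise. A minimal sketch computing excess kurtosis, so a Gaussian signal scores near zero; this is the textbook statistic, not necessarily the exact measure the patent claims:

```python
def kurtosis(samples):
    """Excess kurtosis of a sample sequence: fourth central moment
    divided by the squared variance, minus 3 so that a Gaussian
    distribution scores approximately zero."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    if var == 0:
        return 0.0  # constant signal: treat as non-peaky
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / var ** 2 - 3.0
```

A classifier can then threshold this value per frame: sustained high kurtosis suggests a single voiced speaker, while values near or below zero suggest noise or silence.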
20100305943Method and node for the control of a connection in a communication network - A method, a control node and a program unit for controlling the establishment or modification of a connection for a subscriber having a subscription in a communication network are disclosed. The connection is to be established or modified between nodes that are adapted to employ a coding scheme selected from a plurality of supported coding schemes potentially affecting connection quality. In accordance with the invention a subscribed quality level indicating a target quality level for the subscriber associated with the subscription is determined, and a node controlling the connection checks the subscribed quality level when it selects a coding scheme to be employed for the connection.12-02-2010
20130138431SPEECH SIGNAL TRANSMISSION AND RECEPTION APPARATUSES AND SPEECH SIGNAL TRANSMISSION AND RECEPTION METHODS - A speech signal transmission apparatus includes an extractor to extract speech signals from speech source signals collected by a plurality of microphones, a power calculator to calculate powers of speech signals of multiple channels and set any one of the speech signals of the multiple channels as a reference speech signal, a synchronization adjustor to adjust synchronization of the other speech signals based on the reference speech signal, a signal generator to generate extraction signals by offsetting the reference speech signal from the other synchronization-adjusted speech signals, an encryptor to compress and encrypt the reference speech signal and the extraction signals, and a transmitter to transmit the compressed and encrypted reference speech signal and extraction signals.05-30-2013
20090076803WIRED AND MOBILE WI-FI NETWORKS, CELLULAR, GPS AND OTHER POSITION FINDING SYSTEMS - A voice signal is processed and connected by wire to a mobile wireless unit for further processing into time division multiple access (TDMA) and into spread spectrum signals. The wireless unit is processing a data signal into orthogonal frequency division multiplex (OFDM) signal. The wireless unit receives and processes a position finder signal from Global Positioning System (GPS) satellite and from land based transmitter and provides processed position finder signals. The wireless unit generates a processed touch screen control signal and processes the touch screen control signal with processed position finder, TDMA and spread spectrum signal or with processed OFDM signal and provides these processed signals to a transmitter for wireless signal transmission. The processed OFDM signal is used in a Wi-Fi wireless network and the TDMA or spectrum signal is used in a cellular system, wherein the wireless network and the cellular system are distinct. Processing of position finder signal incorporates step of receiving and processing a signal from Global Positioning System (GPS) satellite and from land based transmitter and for providing processed position finder signals received from GPS satellite and from land based transmitters for transmission. Processed TDMA or spread spectrum signals include code division multiple access (CDMA) and code selectable cross-correlated in-phase and quadrature-phase baseband signals. Modulation and amplification structures and methods of the hybrid wired and wireless systems include amplification by non-linearly amplified (NLA) and by linearly amplified transmitters.03-19-2009
20110010166MOBILE COMMUNICATION TERMINAL CONNECTABLE TO NETWORK - A mobile communication terminal includes a first communication unit connectable to a first communication apparatus via a first network using a first wireless communication protocol, a second communication unit connectable to a second communication apparatus via a second network using a second wireless communication protocol different from the first wireless communication protocol, a controller configured to control the first and second communication units, a microphone, and a speaker. The controller transmits first data, addressed to the second communication apparatus, to the second network, transmits second data, addressed to the first communication apparatus, to the first network, causes the speaker to convert into corresponding voice a second voice signal addressed to the mobile communication terminal, and transmits a first voice signal.01-13-2011
20110010167METHOD FOR GENERATING BACKGROUND NOISE AND NOISE PROCESSING APPARATUS - A method for generating background noise and a noise processing apparatus are provided in order to improve user experience. The method includes: if an obtained signal frame is a noise frame, a high band noise encoding parameter is obtained from the noise frame; weighting and/or smoothing is performed on the high band noise encoding parameter to obtain a second high band noise encoding parameter; and a high band background noise signal is generated according to the second high band noise encoding parameter. A noise processing apparatus is also provided.01-13-2011
20130191116LANGUAGE DICTATION RECOGNITION SYSTEMS AND METHODS FOR USING THE SAME - Language dictation recognition systems and methods for using the same. In at least one exemplary system for analyzing verbal records, the system comprises a database capable of receiving a plurality of verbal records, the verbal record comprising at least one identifier and at least one verbal feature and a processor operably coupled to the database, where the processor has and executes a software program. The processor being operational to identify a subset of the plurality of verbal records from the database, extract at least one verbal feature from the identified records, analyze the at least one verbal feature of the subset of the plurality of verbal records, process the subset of the plurality of records using the analyzed feature according to at least one reasoning approach, generate a processed verbal record using the processed subset of the plurality of records, and deliver the processed verbal record to a recipient.07-25-2013
20100145684Regeneration of wideband speech - A system and method for processing a narrowband speech signal comprising speech samples in a first range of frequencies. The method comprises: generating from the narrowband speech signal a highband speech signal in a second range of frequencies above the first range of frequencies; determining a pitch of the highband speech signal; using the pitch to generate a pitch-dependent tonality measure from samples of the highband speech signal; and filtering the speech samples using a gain factor derived from the tonality measure and selected to reduce the amplitude of harmonics in the highband speech signal.06-10-2010
20110246186INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM - There is provided an information processing device including a storage unit that stores music data for playing music and lyrics data indicating lyrics of the music, a display control unit that displays the lyrics of the music on a screen, a playback unit that plays the music and a user interface unit that detects a user input. The lyrics data includes a plurality of blocks each having lyrics of at least one character. The display control unit displays the lyrics of the music on the screen in such a way that each block included in the lyrics data is identifiable to a user while the music is played by the playback unit. The user interface unit detects timing corresponding to a boundary of each section of the music corresponding to each displayed block in response to a first user input.10-06-2011
20120035918METHOD AND ARRANGEMENT FOR PROVIDING A BACKWARDS COMPATIBLE PAYLOAD FORMAT - In a method of providing a backward and forward compatible speech codec payload format, the following steps are included.02-09-2012
20110082690SOUND MONITORING SYSTEM AND SPEECH COLLECTION SYSTEM - Monitoring accuracy degrades due to noise in an environment where there are many sound sources other than those to be monitored. Easy initialization is required for an environment where many apparatuses operate. A sound monitoring system includes a microphone array having multiple microphones and a location-based abnormal sound monitoring section as a processing section. The location-based abnormal sound monitoring section is supplied with an input signal from the microphone array via a waveform acquisition section and a network. Using the input signal, the location-based abnormal sound monitoring section detects a temporal change in a sound source direction histogram. Based on a detected change result, the location-based abnormal sound monitoring section checks for abnormality in a sound field and outputs a monitoring result. The processing section searches for a microphone array near the sound source to be monitored. The processing section selects a sound field monitoring function for the sound source to be monitored based on various data concerning a microphone belonging to the searched microphone array.04-07-2011
20110087487METHOD AND SYSTEM FOR MEMORY USAGE IN REAL-TIME AUDIO SYSTEMS - System and method for encoding, transmitting and decoding audio data. Audio bit stream syntax is re-organized to allow system optimizations that work well with memory latency and memory burst operations. Multiple small entropy coding tables are stored in RAM and loaded to on-chip memory as needed. Audio prediction is pipelined in the bitstream syntax. Intra frames, independent of other frames in the bitstream, are included in the bitstream for error recovery and channel change. New algorithms are implemented in legacy syntax by including the new information in the user data space of the audio frame. The new decoder can use projection to determine where the new information is and read ahead in the stream. Audio prediction from the immediately previous frame is restricted. Audio prediction is performed across channels within a single audio frame. A variable re-order function comprises storing channels of data to DRAM in the order they are decoded and reading them out in presentation order.04-14-2011
20100017196METHOD, SYSTEM, AND APPARATUS FOR COMPRESSION OR DECOMPRESSION OF DIGITAL SIGNALS - Embodiments of methods, apparatuses, devices and systems associated with compression and decompression of digital signals are disclosed.01-21-2010
20100057445System And Method For Automatically Adjusting Floor Controls For A Conversation - A system and method for automatically adjusting floor controls for a conversation is provided. Audio streams are received, which each originate from an audio source. Floor controls for a current configuration including at least a portion of the audio streams are maintained. Conversational characteristics shared by two or more of the audio sources are determined. Possible configurations for the audio streams are identified based on the conversational characteristics. An analysis of the current configuration and the possible configurations is performed. A change threshold is applied to the analysis. When the analysis satisfies the change threshold, the floor controls are automatically adjusted. The audio streams are mixed into one or more outputs based on the adjusted floor controls.03-04-2010
20100070267METHOD AND APPARATUS FOR QOS IMPROVEMENT WITH PACKET VOICE TRANSMISSION OVER WIRELESS LANS - A method for improving packetized speech transmitted over a wireless LAN is disclosed. Speech packets transmitted over the wireless LAN are monitored for errors. Any of the speech packets found to have errors are replaced with synthesized speech packets. The synthesized speech packets may be created from a vocal tract model generated from the received speech packets during periods of time when there are no errors.03-18-2010
20100057444METHOD AND SYSTEM OF EXTENDING BATTERY LIFE OF A WIRELESS MICROPHONE UNIT - A method of extending battery life of a wireless microphone unit includes muting the wireless microphone unit responsive to a mute signal from a base station unit, transmitting, by the wireless microphone unit, compressed muted audio data, wherein the compressed muted audio data is compressed via a first compression scheme, determining, by the wireless microphone unit, whether an unmute signal has been received from the base station unit, and responsive to a determination that the unmute signal has been received, unmuting the wireless microphone unit. The method further includes discontinuing transmission of the compressed muted audio data and transmitting compressed audio data via a second compression scheme, wherein the first transmitting step causes the wireless microphone unit to consume less power per unit of transmission time than the second transmitting step.03-04-2010
20110077938DATA REPRODUCTION METHOD AND DATA REPRODUCTION APPARATUS - A reproduction apparatus that reproduces compressed audio data recorded in a recording medium inserts dummy data between the data to be concatenated when performing a specific reproduction of data obtained by concatenating segments read discontinuously from the recording medium.03-31-2011
20120303360PRESERVING AUDIO DATA COLLECTION PRIVACY IN MOBILE DEVICES - Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.11-29-2012
20100063802Adaptive Frequency Prediction - In one embodiment, a method of transceiving an audio signal is disclosed. The method includes providing low band spectral information having a plurality of spectrum coefficients and predicting a high band extended spectral fine structure from the low band spectral information for at least one subband, where the high band extended spectral fine structure is made of a plurality of spectrum coefficients. The predicting includes preparing the spectrum coefficients of the low band spectral information, defining prediction parameters for the high band extended spectral fine structure and index ranges of the prediction parameters, and determining possible best indices of the prediction parameters, where determining includes minimizing a prediction error between a reference subband in the high band and a predicted subband that is selected and composed from the available low band. The possible best indices of the prediction parameters are transmitted.03-11-2010
20100004926APPARATUS AND METHOD FOR CLASSIFICATION AND SEGMENTATION OF AUDIO CONTENT, BASED ON THE AUDIO SIGNAL - An apparatus for classifying an input audio signal into audio contents of a first and second class, comprising an audio segmentation module adapted to segment said input audio signal into segments of a predetermined length; a feature computation module adapted to calculate for the segments features characterizing said audio input signal; a threshold comparison module adapted to generate a feature vector for each of said one or more segments based on a plurality of predetermined thresholds, the thresholds including for each of the audio contents of the first class and of the second class a substantially near certainty threshold, a substantially high certainty threshold, and a substantially low certainty threshold; and a classification module adapted to analyze the feature vector and classify each one of said one or more segments as audio contents of the first class, of the second class, or as non-decisive audio contents.01-07-2010
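The three-threshold scheme in this abstract can be illustrated with a toy classifier; a single scalar feature per segment and all threshold values below are assumptions for illustration, where the real apparatus uses multiple computed features.

```python
# Illustrative sketch of the threshold-comparison idea for two content
# classes ("speech" vs "music"), assuming one scalar speech-likeness
# feature per segment. All threshold values are arbitrary.

THRESHOLDS = {
    "speech": {"near_certain": 0.9, "high": 0.7, "low": 0.3},
    "music":  {"near_certain": 0.1, "high": 0.2, "low": 0.5},
}

def feature_vector(x):
    """Compare the segment feature against every threshold of every class."""
    vec = {}
    for cls, th in THRESHOLDS.items():
        for name, t in th.items():
            # assume speech-likeness grows with x, music-likeness shrinks
            vec[(cls, name)] = (x >= t) if cls == "speech" else (x <= t)
    return vec

def classify(x):
    """Classify a segment as speech, music, or non-decisive content."""
    vec = feature_vector(x)
    if vec[("speech", "near_certain")]:
        return "speech"
    if vec[("music", "near_certain")]:
        return "music"
    if vec[("speech", "high")] and not vec[("music", "low")]:
        return "speech"
    if vec[("music", "high")] and not vec[("speech", "low")]:
        return "music"
    return "non-decisive"
```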
20080255829Method and Test Signal for Measuring Speech Intelligibility - A method and apparatus for estimating speech intelligibility in a mobile communications network component handling two-way communication between two ends of a signal path. Test signals adapted for speech intelligibility measurements are inserted into the signal path to simulate two-way communication. Double-talk is detected during the communication, and speech intelligibility measurements are performed only during periods of double-talk. This enables the effect of echo to be taken into account while avoiding undesirable effects from non-linear processing, and comfort noise if present, in the signal path. Voice enhancement devices may then be adjusted in response to the estimated speech intelligibility.10-16-2008
20080255827Voice Conversion Training and Data Collection - It may be desirable to provide a way to collect high quality speech training data without undue burden to the user. Speech training data may be collected during normal usage of a device. In this way, the collection of speech training data may be effectively transparent to the user, without the need for a distinct collection mode from the user's point of view. For example, where the device is or includes a phone (such as a cellular phone), when the user makes or receives a phone call to/from another party, speech training data may be automatically collected from one or both of the parties during the phone call.10-16-2008
20110264446METHOD, SYSTEM, AND MEDIA GATEWAY FOR REPORTING MEDIA INSTANCE INFORMATION - A method, a system, and a media gateway (MG) for reporting media instance information are disclosed. The method for reporting media instance information includes: detecting, by an MG, received media data according to a set media instance detection (MID) event; and reporting, by the MG, the MID event when the media instance information is detected. With the present invention, the MG detects media instance information related to the media data and reports it to a media gateway controller (MGC) through the set MID event. In this way, the MGC can execute corresponding control operations according to the media instance information related to the media data, extending the applicable scope of media services.10-27-2011
20110257964Minimizing Speech Delay in Communication Devices - Methods and apparatus for coordinating audio data processing and network communication processing in a communication device. In an exemplary method lower and upper threshold values for use by a network communication processing circuit are set, the lower and upper threshold values defining a window of timing offsets relative to each of a series of periodic network communications frame boundaries. A series of encoded audio data frames are sent to the network communication processing circuit for transmission over the network communications link. The delivery of encoded audio data to the network communication processing circuit outside of the corresponding time window defined by the threshold values will trigger an event report. This event report is received from the network communication processing circuit by the audio data processing circuit, and, in response, timing is adjusted for the sending of one or more of the encoded audio data frames.10-20-2011
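The timing-window test above reduces to checking a delivery's offset relative to the preceding frame boundary; the frame period, thresholds, and event-report shape below are assumptions for illustration.

```python
# Minimal sketch of the delivery-window check: an encoded audio frame that
# arrives outside [LOWER_MS, UPPER_MS] past a frame boundary triggers an
# event report. All numeric values are illustrative.

FRAME_PERIOD_MS = 20.0   # spacing of periodic network frame boundaries (assumed)
LOWER_MS = 2.0           # lower threshold: earliest acceptable offset
UPPER_MS = 15.0          # upper threshold: latest acceptable offset

def delivery_offset(delivery_time_ms):
    """Offset of a delivery relative to the preceding frame boundary."""
    return delivery_time_ms % FRAME_PERIOD_MS

def check_delivery(delivery_time_ms):
    """Return an event report if the delivery misses the window, else None."""
    off = delivery_offset(delivery_time_ms)
    if off < LOWER_MS or off > UPPER_MS:
        return {"event": "timing_miss", "offset_ms": off}
    return None
```

On receiving such a report, the audio data processing circuit would adjust its send timing for subsequent frames.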
20110119053System And Method For Leaving And Transmitting Speech Messages - A system for leaving and transmitting speech messages automatically analyzes the input speech of at least one reminder, extracts a plurality of tag information items, and transmits a speech message to at least one message receiver according to the transmission criteria of the reminder. A command or message parser parses the tag information items, which include at least one reminder ID, at least one transmit command, and at least one speech message. The tag information items are sent to a message composer to be synthesized into a transmitted message. A transmitting controller controls a device switch according to the reminder ID and the transmit command, allowing the transmitted message to be sent to the message receiver via a transmitting device.05-19-2011
20110125488ADAPTIVE DATA TRANSMISSION FOR A DIGITAL IN-BAND MODEM OPERATING OVER A VOICE CHANNEL - In one example, a mobile device encodes a digital bitstream using a particular set of modulation parameters to generate an audio signal that has different audio tones selected to pass through a vocoder of the mobile device. The particular set of modulation parameters is optimized for a subset of a plurality of vocoding modes without a priori knowledge of which one of the vocoding modes is currently operated by the vocoder. The mobile device conducts transmissions over the wireless telecommunications network through the vocoder using the particular set of modulation parameters, and monitors these transmissions for errors. If the errors reach a threshold, then the vocoder may be using one of the vocoding modes that are not included in the subset for which the particular set of modulation parameters is optimized, and accordingly, the modulation device switches from the particular set of modulation parameters to a different set of modulation parameters.05-26-2011
20110082689VAMOS - DARP receiver switching for mobile receivers - Embodiments of the invention include apparatuses, systems, computer readable media, and methods for processing speech signals in a manner that enhances capacity, efficiency and hardware utilization of a communications network. A method, according to one embodiment, includes receiving speech signals, determining a subchannel power imbalance ratio of at least two subchannels, and selecting a receiver architecture for processing the speech signals in accordance with the determined subchannel power imbalance ratio.04-07-2011
20100324890Method and Apparatus For Selecting An Audio Stream - An active stream is selected from one of a plurality of audio streams generated in a common acoustic environment by obtaining, for each stream obtaining, at a series of measurement instants t12-23-2010
20100332220Conversation Recording with Real-Time Notification for Users of Communication Terminals - A recording device provides conversation recording with real-time notification between users of communication terminals engaged in a conversation. The recording device provides a recording start notification to a second communication terminal in response to receiving an initiate recording request from a first communication terminal, and initiates recording of the conversation. The recording device terminates recording of the conversation in response to receiving a terminate recording request from the first communication terminal, provides a recording stop notification to the second communication terminal, and saves the recorded conversation to a file in a file storage medium. The recording start and recording stop notifications can be either audible or electronic notifications. The first communication terminal may be muted prior to providing a notification, and un-muted subsequent to the notification. The recording device may obtain permission from the second communication terminal to record the conversation.12-30-2010
20120310634COMMUNICATION DEVICE WITH REDUCED NOISE SPEECH CODING - A communication device includes memory, an input interface, a processing module, and a transmitter. The processing module receives a digital signal from the input interface, wherein the digital signal includes a desired digital signal component and an undesired digital signal component. The processing module identifies one of a plurality of codebooks based on the undesired digital signal component. The processing module then identifies a codebook entry from the one of the plurality of codebooks based on the desired digital signal component to produce a selected codebook entry. The processing module then generates a coded signal based on the selected codebook entry, wherein the coded signal includes a substantially unattenuated representation of the desired digital signal component and an attenuated representation of the undesired digital signal component. The transmitter converts the coded signal into an outbound signal in accordance with a signaling protocol and transmits it.12-06-2012
20100082334SYSTEM AND METHOD FOR VOICE USER INTERFACE NAVIGATION - A Voice User Interface (VUI) or Interactive Voice Response (IVR) system utilizes three levels of navigation (e.g. Main Menu, Services, and Helper Commands) in presenting information units arranged in sets. The units are “spoken” by a system in a group to a human user and the group of information at each level is preceded by a tone that is unique to the level. When navigating the levels, the tones of the levels are in a musical progression, e.g. the three-note blues progression I, IV, V, for preceding the groups of information, respectively. The musical progression returns to the tonic of the musical key when the navigation returns to the level one of the first group of information.04-01-2010
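The I, IV, V cue tones above can be computed from a tonic frequency; the equal-temperament intervals (0, 5, and 7 semitones) and the choice of tonic are assumptions, since the abstract does not fix a tuning or key.

```python
# Sketch of assigning one cue tone per navigation level using the three-note
# blues progression I, IV, V. Equal-temperament tuning and A-440 tonic are
# illustrative assumptions.

TONIC_HZ = 440.0  # tonic of the musical key (assumed: A)
LEVEL_SEMITONES = {
    "main_menu": 0,        # I  (tonic)
    "services": 5,         # IV (perfect fourth)
    "helper_commands": 7,  # V  (perfect fifth)
}

def level_tone_hz(level):
    """Frequency of the cue tone preceding a group at the given level."""
    return TONIC_HZ * 2 ** (LEVEL_SEMITONES[level] / 12)
```

Returning navigation to the first level then plays the tonic again, closing the progression as the abstract describes.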
20090299733METHODS AND SYSTEM FOR CREATING AND EDITING AN XML-BASED SPEECH SYNTHESIS DOCUMENT - A method for creating and editing an XML-based speech synthesis document for input to a text-to-speech engine is provided. The method includes recording voice utterances of a user reading a pre-selected text and parsing the recorded voice utterances into individual words and periods of silence. The method also includes recording a synthesized speech output generated by a text-to-speech engine, the synthesized speech output being an audible rendering of the pre-selected text, and parsing the synthesized speech output into individual words and periods of silence. The method further includes annotating the XML-based speech synthesis document based upon a comparison of the recorded voice utterances and the recorded synthesized speech output.12-03-2009
20090299735Method for Transferring an Audio Stream Between a Plurality of Terminals - A method of transferring an audio stream between at least two terminals, comprising the following steps: a step of connecting a first device and at least a second device; and a step of transferring an audio stream from the first device to the second device; the method being characterized in that it further comprises the following steps: a determination step during which it is determined that the first device or a first network to which the first device is connected or a second network to which the second device is connected is adapted to produce or transfer an audio stream comprising N channels and that the second device includes an electroacoustic transducer adapted to receive an audio stream comprising P channels; and a conversion step during which an audio stream sent by the first device or transmitted by the first network or transmitted by the second network is converted into an audio stream comprising P channels. An associated transfer system is also disclosed. Application to telephone or videophone communication in IP networks.12-03-2009
20090299734STEREO AUDIO ENCODING DEVICE, STEREO AUDIO DECODING DEVICE, AND METHOD THEREOF - Disclosed is a stereo audio encoding device capable of improving a spatial image of a decoded audio in stereo audio encoding. In this device, an original cross correlation calculation unit (12-03-2009
20090292531SYSTEM FOR HANDLING A PLURALITY OF STREAMING VOICE SIGNALS FOR DETERMINATION OF RESPONSIVE ACTION THERETO - Streaming voice signals, such as might be received at a contact center or similar operation, are analyzed to detect the occurrence of one or more unprompted, predetermined utterances. The predetermined utterances preferably constitute a vocabulary of words and/or phrases having particular meaning within the context in which they are uttered. Detection of one or more of the predetermined utterances during a call causes a determination of response-determinative significance of the detected utterance(s). Based on the response-determinative significance of the detected utterance(s), a responsive action may be further determined. Additionally, long term storage of the call corresponding to the detected utterance may also be initiated. Conversely, calls in which no predetermined utterances are detected may be deleted from short term storage. In this manner, the present invention simplifies the storage requirements for contact centers and provides the opportunity to improve caller experiences by providing shorter reaction times to potentially problematic situations.11-26-2009
20130124196METHOD AND APPARATUS FOR GENERATING NOISES - A method and an apparatus for generating comfortable noises so as to improve user experience are disclosed. The method includes: if a received data frame is a noise frame, calculating a corresponding energy attenuation parameter based on the noise frame and a data frame received earlier than the noise frame; and attenuating noise energy based on the energy attenuation parameter to obtain a comfortable noise signal. An apparatus for generating comfortable noise is also provided.05-16-2013
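The two steps in this abstract, deriving an attenuation parameter from a noise frame and an earlier frame, then attenuating the noise energy, can be sketched as follows; the particular attenuation formula is an assumption, not the patented one.

```python
# Rough sketch of comfort-noise generation: compute an energy-ratio
# attenuation parameter from the current noise frame and an earlier frame,
# then scale the noise so its energy decays by that factor.

def energy(frame):
    """Mean squared amplitude of a frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def attenuation_parameter(noise_frame, earlier_frame):
    """Energy ratio, capped at 1.0 so noise never grows (assumed formula)."""
    e_now, e_prev = energy(noise_frame), energy(earlier_frame)
    return min(1.0, e_now / e_prev) if e_prev > 0 else 1.0

def comfort_noise(noise_frame, alpha):
    """Attenuate the frame's energy by alpha to obtain the comfort noise."""
    scale = alpha ** 0.5  # energy scales with the square of amplitude
    return [s * scale for s in noise_frame]
```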
20090319260METHOD AND SYSTEM FOR AUDIO TRANSMIT PROCESSING IN AN AUDIO CODEC - Methods and systems for audio transmit processing in an audio CODEC are disclosed and may comprise receiving one or more analog and/or digital audio signals, and simultaneously processing the received one or more analog audio and/or digital audio signals via a plurality of processing paths of the audio CODEC. The digital audio signals may be generated via a digital microphone, which may comprise a microelectromechanical (MEMS) microphone, and may be utilized for audio beamforming. The received analog and digital signals may be processed at one or more sampling rates, and may be filtered via decimation filters. The received analog signals may be converted to digital signals. The processing may comprise converting a sampling rate of the received digital signals and the converted analog signals. The processing may comprise filtering of the received digital signals and the converted analog signals via infinite impulse response (IIR) filters.12-24-2009
20120041760VOICE RECORDING EQUIPMENT AND METHOD - In a voice recording equipment and method, voice data from a speaker is received using a microphone. Threshold values T02-16-2012
20120041761VOICE DECODING APPARATUS AND VOICE DECODING METHOD - Disclosed is a voice decoding apparatus in which the processor may be continuously employed by other applications for a prescribed time but, in response to an urgent interrupt, can generate synthesized sound without interruption even while being used by other applications. In this apparatus, a packet receiving section (02-16-2012
20120041759Mobile Replacement-Dialogue Recording System - A mobile replacement-dialogue recording system enables the creation of replacement-dialogue items by mobile users not at a media recording studio. Studio-users prepare guide media video, audio and text data which are made available to mobile users through a media server. A mobile user's mobile replacement-dialogue recording device obtains guide media and allows the user to view the guide media in rehearsal mode. The mobile replacement-dialogue recording device then records the mobile user's dialogue performance while presenting the mobile user with synchronized guide media. The mobile user can review, delete, and rerecord the resulting potential replacement dialogue, as well as create feedback media characterizing the replacement dialogue. Selected replacement dialogue items can be transmitted to the media server. A studio-module can then obtain the selected replacement dialogue items and feedback media from the media server so that they may be used in media-replacement.02-16-2012
20120209596ENCODING DEVICE, DECODING DEVICE AND METHOD FOR BOTH - Disclosed are an encoding device and a decoding device which suppress the occurrence of pre-echo artifacts and post-echo artifacts caused by a high layer having a low temporal resolution, and which implement high subjective quality encoding and decoding. An encoding device (08-16-2012
20120046941DIGITAL VOICE COMMUNICATION CONTROL DEVICE AND METHOD - A digital audio communication control apparatus includes a first mixing unit that mixes a voice input from a voice input unit and uttered by a specific speaker with a voice input from a digital audio packet receiving unit and uttered by at least one speaker other than the specific speaker, and a second mixing unit that mixes the voices mixed by the first mixing unit with the voice of the specific speaker. The voices mixed by the second mixing unit are fed back to the specific speaker.02-23-2012
20120010877SYSTEM AND METHOD FOR PERFORMING SPEECH SYNTHESIS WITH A CACHE OF PHONEME SEQUENCES - Disclosed are systems, methods, and computer readable media for performing speech synthesis. The method embodiment comprises applying a first part of a speech synthesizer to a text corpus to obtain a plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences, for each of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize each of the plurality of respective phoneme sequences, and adding the identified joins to a cache for use in speech synthesis.01-12-2012
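The caching step in this abstract can be sketched directly: enumerate the joins each candidate phoneme sequence would need and compute each join once; the join-cost function below is a stand-in for the synthesizer's real calculation.

```python
# Sketch of the join cache: for every phoneme sequence the first-stage
# synthesizer identifies, record each adjacent-pair join and compute its
# cost only once. `join_cost` is a hypothetical stand-in.

def joins_for(sequence):
    """Adjacent phoneme pairs whose joins would need to be calculated."""
    return [(sequence[i], sequence[i + 1]) for i in range(len(sequence) - 1)]

def build_join_cache(phoneme_sequences, join_cost):
    """Precompute each distinct join exactly once for later synthesis."""
    cache = {}
    for seq in phoneme_sequences:
        for pair in joins_for(seq):
            if pair not in cache:
                cache[pair] = join_cost(*pair)
    return cache
```

At synthesis time, joins shared across sequences come from the cache rather than being recalculated.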
20120116752AUDIO DATA PROCESSING METHOD AND AUDIO DATA PROCESSING SYSTEM - An audio data processing method and an audio data processing system are described. The audio data processing system includes an audio collect module, a processing module, a virtual play module, a virtual collect module, and a buffer memory. The virtual play module and the virtual collect module are registered in an application interface layer of third-party software, which chooses the virtual play module and the virtual collect module. The virtual play module is configured to receive audio data processed by the processing module and store the processed audio data in the buffer memory. The virtual collect module is configured to collect the processed audio data from the buffer memory and transmit it to the third-party software. The invention provides a universal solution suitable for any chatting tool by installing the virtual speaker and the virtual microphone.05-10-2012
20120016666Audiovisual (AV) Device and Control Method Thereof - According to one embodiment, an AV device comprises a receiving section, a processing section, a storage section and a control section. The receiving section receives a digital voice signal. The processing section applies a predetermined signal processing operation to the received digital voice signal. The storage section stores information indicating the time required for the signal processing operation at the processing section and, when the voice has been set to a mute state, stores that information rewritten to a value that cannot occur in normal operation. The control section outputs the information stored in the storage section upon an external request. Other embodiments are also described.01-19-2012
20090157392PROVIDING SPEECH RECOGNITION DATA TO A SPEECH ENABLED DEVICE WHEN PROVIDING A NEW ENTRY THAT IS SELECTABLE VIA A SPEECH RECOGNITION INTERFACE OF THE DEVICE - The present invention discloses a solution for providing a phonetic representation for a content item along with a content item delivered to a speech enabled computing device. The phonetic representation can be specified in a manner that enables it to be added to a speech recognition grammar of the speech enabled computing device. Thus, the device can recognize speech commands involving the content item using the newly added phonetic representation. Current implementations of speech recognition systems of this type rely on internal generation of speech recognition data that is added to the speech recognition grammar. Generation of speech recognition data can, however, be resource intensive, which can be particularly problematic when the speech enabled device is resource limited. The disclosed solution offloads the task of providing the speech recognition data to an external device, such as a relatively resource rich server or a desktop device.06-18-2009
20110093260SIGNAL CLASSIFYING METHOD AND APPARATUS - A signal classifying method and apparatus are disclosed. The signal classifying method includes: obtaining a spectrum fluctuation parameter of a current signal frame determined as a foreground frame, and buffering the spectrum fluctuation parameter; obtaining a spectrum fluctuation variance of the current signal frame according to spectrum fluctuation parameters of all buffered signal frames, and buffering the spectrum fluctuation variance; and calculating a ratio of signal frames whose spectrum fluctuation variance is above or equal to a first threshold to all the buffered signal frames, and determining the current signal frame as a speech frame if the ratio is above or equal to a second threshold or determining the current signal frame as a music frame if the ratio is below the second threshold. In the embodiments of the present disclosure, the spectrum fluctuation variance of the signal is used as a parameter for classifying the signals, and a local statistical method is applied to decide the type of the signal. Therefore, the signals are classified with few parameters, simple logical relations and low complexity.04-21-2011
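The decision rule quoted in this abstract is compact enough to sketch directly; the two threshold values below are illustrative, and the buffered variances would come from the spectrum-fluctuation analysis the abstract describes.

```python
# Sketch of the local statistical decision: among buffered signal frames,
# count those whose spectrum-fluctuation variance reaches the first
# threshold; if that ratio reaches the second threshold, the current frame
# is speech, otherwise music. Threshold values are assumed.

VAR_THRESHOLD = 0.5     # "first threshold" on the fluctuation variance
RATIO_THRESHOLD = 0.6   # "second threshold" on the frame ratio

def classify_frame(buffered_variances):
    """Classify the current frame from buffered fluctuation variances."""
    hits = sum(1 for v in buffered_variances if v >= VAR_THRESHOLD)
    ratio = hits / len(buffered_variances)
    return "speech" if ratio >= RATIO_THRESHOLD else "music"
```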
20120072207DOWN-MIXING DEVICE, ENCODER, AND METHOD THEREFOR - Provided are a down-mixing method and an encoder, wherein a high quantization performance can be realized when a balance adjustment operation due to a balance weight coefficient and a removal operation of a main component are combined. In the encoder (03-22-2012
20120072206TERMINAL APPARATUS AND SPEECH PROCESSING PROGRAM - A terminal apparatus configured to obtain positional information indicating a position of another apparatus; to obtain positional information indicating a position of the terminal apparatus; to obtain a first direction, which is a direction to the obtained position of the another apparatus and calculated using the obtained position of the terminal apparatus; to obtain a second direction, which is a direction in which the terminal apparatus is oriented; to obtain inclination information indicating whether the terminal apparatus is inclined to the right or to the left; to switch an amount of correction for a relative angle between the first direction and the second direction in accordance with whether the obtained inclination information indicates an inclination to the right or an inclination to the left; and to determine an attribute of speech output from a speech output unit in accordance with the relative angle corrected by the amount of correction.03-22-2012
20130185061METHOD AND APPARATUS FOR MASKING SPEECH IN A PRIVATE ENVIRONMENT - A speech masking apparatus includes a microphone and a speaker. The microphone can detect a human voice. The speaker can output a masking language which can include phonemes resembling human speech. At least one component of the masking language can have a pitch, a volume, a theme, and/or a phonetic content substantially matching a pitch, a volume, a theme, and/or a phonetic content of the voice.07-18-2013
20120130709SYSTEM AND METHOD FOR BUILDING AND EVALUATING AUTOMATIC SPEECH RECOGNITION VIA AN APPLICATION PROGRAMMER INTERFACE - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for building an automatic speech recognition system through an Internet API. A network-based automatic speech recognition server configured to practice the method receives feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the server. The server processes the inputs to train an acoustic model and a language model, and transmits the acoustic model and the language model to the network client. The server can also generate a log describing the processing and transmit the log to the client. On the server side, a human expert can intervene to modify how the server processes the inputs. The inputs can include an additional feature stream generated from speech by algorithms in the client's proprietary feature extraction.05-24-2012
20120109644INFORMATION PROCESSING DEVICE AND MOBILE TERMINAL - There is a need to enable decompression of a speech signal even if no network synchronizing signal is output from a baseband processing portion. For this purpose, an information processing device includes a first serial interface. The first serial interface includes a notification signal generation circuit that generates a notification signal each time compressed data received from the baseband processing portion reaches a predetermined data quantity, and notifies a speech processing portion of this state using the notification signal. The speech processing portion includes a synchronizing signal generation circuit that generates a network synchronizing signal based on the notification signal. A clock signal for PCM communication is generated based on the network synchronizing signal. A speech signal can thus be decompressed even if no network synchronizing signal is output from the baseband processing portion.05-03-2012
20090076802WIDEBAND CODEC NEGOTIATION - The invention proposes several methods for codec handling. In specific, methods involving providing a supported codec list of a Call Control Server are described. A node receives information, whether a terminal supports a wideband codec, wherein the information is received in call set up signaling from the terminal of the subscriber. Furthermore, configuration information is retrieved, whether a Radio Access Node supports the wideband codec. Additionally, information is retrieved, whether a media gateway supports the wideband codec, wherein the information is either provided by the operator or retrieved from the media gateway (MGW1, MGW2, MGWx). The information is analyzed and in response to the analysis a supported codec list is provided. Furthermore, alternative embodiments and devices adapted for the methods are disclosed.03-19-2009
20120316868Methods And Systems For Changing A Communication Quality Of A Communication Session Based On A Meaning Of Speech Data - Methods and systems are described for changing a communication quality of a communication session based on a meaning of speech data. Speech data exchanged between clients participating in a communication session is parsed. A meaning of the parsed speech data is determined, and a communication quality for the communication session is determined from that meaning. An action is performed to change the communication quality of the communication session based on the meaning of the parsed speech data.12-13-2012
20100250243Service Oriented Speech Recognition for In-Vehicle Automated Interaction and In-Vehicle User Interfaces Requiring Minimal Cognitive Driver Processing for Same - A system and method for implementing a server-based speech recognition system for multi-modal automated interaction in a vehicle includes receiving, by a vehicle driver, audio prompts by an on-board human-to-machine interface and a response with speech to complete tasks such as creating and sending text messages, web browsing, navigation, etc. This service-oriented architecture is utilized to call upon specialized speech recognizers in an adaptive fashion. The human-to-machine interface enables completion of a text input task while driving a vehicle in a way that minimizes the frequency of the driver's visual and mechanical interactions with the interface, thereby eliminating unsafe distractions during driving conditions. After the initial prompting, the typing task is followed by a computerized verbalization of the text. Subsequent interface steps can be visual in nature, or involve only sound.09-30-2010
20120084079Integration of Embedded and Network Speech Recognizers - A method, computer program product, and system are provided for performing a voice command on a client device. The method can include translating, using a first speech recognizer located on the client device, an audio stream of a voice command to a first machine-readable voice command and generating a first query result using the first machine-readable voice command to query a client database. In addition, the audio stream can be transmitted to a remote server device that translates the audio stream to a second machine-readable voice command using a second speech recognizer. Further, the method can include receiving a second query result from the remote server device, where the second query result is generated by the remote server device using the second machine-readable voice command and displaying the first query result and the second query result on the client device.04-05-2012
20120084078Method And Apparatus For Voice Signature Authentication - A scalable voice signature authentication capability is provided herein. The scalable voice signature authentication capability enables authentication of varied services such as speaker identification (e.g., private banking and access to healthcare account records), voice signature as a password (e.g., secure access for remote services and document retrieval), the Internet and its various services (e.g., online shopping), and the like.04-05-2012
20120166184Selective Transmission of Voice Data - Systems and methods that provide for voice command devices that receive sound but do not transfer the voice data beyond the system unless certain voice-filtering criteria have been met are described herein. In addition, embodiments provide devices that support voice command operation while external voice data transmission is in mute operation mode. As such, devices according to embodiments may process voice data locally responsive to the voice data matching voice-filtering criteria. Furthermore, systems and methods are described herein involving voice command devices that capture sound and analyze it in real-time on a word-by-word basis and decide whether to handle the voice data locally, transmit it externally, or both.06-28-2012
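The routing decision in this abstract, handle locally, transmit externally, or both, can be sketched as below; the filtering criterion (an exact command-phrase match) and the mute flag are assumptions for illustration.

```python
# Sketch of selective voice-data routing: recognized device commands are
# handled locally even while external transmission is muted, while ordinary
# speech is never sent out in mute mode. The command set is invented.

LOCAL_COMMANDS = {"volume up", "volume down", "mute", "unmute"}

def route_voice(text, external_mute):
    """Decide whether captured speech is handled locally, sent out, or both."""
    is_command = text in LOCAL_COMMANDS
    if is_command and external_mute:
        return "local"     # voice command operation continues while muted
    if is_command:
        return "both"      # act locally and also pass the audio through
    if external_mute:
        return "drop"      # ordinary speech never leaves the device
    return "external"
```

A real device would match voice-filtering criteria on recognized speech word by word rather than on whole strings.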
20110184730MULTI-DIMENSIONAL DISAMBIGUATION OF VOICE COMMANDS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing voice commands. In one aspect, a method includes receiving an audio signal at a server, performing, by the server, speech recognition on the audio signal to identify one or more candidate terms that match one or more portions of the audio signal, identifying one or more possible intended actions for each candidate term, providing information for display on a client device, the information specifying the candidate terms and the actions for each candidate term, receiving from the client device an indication of an action selected by a user, where the action was selected from among the actions included in the provided information, and invoking the action selected by the user.07-28-2011
20100262420AUDIO ENCODER FOR ENCODING AN AUDIO SIGNAL HAVING AN IMPULSE-LIKE PORTION AND STATIONARY PORTION, ENCODING METHODS, DECODER, DECODING METHOD, AND ENCODING AUDIO SIGNAL - An audio encoder for encoding an audio signal includes an impulse extractor for extracting an impulse-like portion from the audio signal. This impulse-like portion is encoded and forwarded to an output interface. Furthermore, the audio encoder includes a signal encoder which encodes a residual signal derived from the original audio signal so that the impulse-like portion is reduced or eliminated in the residual audio signal. The output interface forwards both encoded signals, i.e., the encoded impulse signal and the encoded residual signal, for transmission or storage. On the decoder side, both signal portions are separately decoded and then combined to obtain a decoded audio signal.10-14-2010
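The impulse/residual split described in this abstract can be illustrated with a toy sketch. This is not the patented method: the median-based threshold rule and the function name are invented for illustration only.

```python
import numpy as np

def split_impulse_residual(signal, threshold=3.0):
    """Toy impulse extractor: samples whose magnitude exceeds
    `threshold` times the median absolute value are treated as the
    impulse-like portion; subtracting them leaves a residual in which
    the impulses are reduced or eliminated, as the abstract describes."""
    signal = np.asarray(signal, dtype=float)
    scale = np.median(np.abs(signal)) + 1e-12
    impulse_mask = np.abs(signal) > threshold * scale
    impulse_part = np.where(impulse_mask, signal, 0.0)
    residual = signal - impulse_part
    return impulse_part, residual

# A stationary tone with two large clicks:
t = np.arange(200)
x = 0.1 * np.sin(0.2 * t)
x[50] += 2.0
x[120] -= 2.0
imp, res = split_impulse_residual(x)
assert np.max(np.abs(res)) < 0.2   # residual is click-free
assert np.allclose(imp + res, x)   # the two parts reconstruct the input
```

In the patent, each part would then go to an encoder suited to its character; only the split itself is shown here.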
20100174530ELECTRONIC AUDIO PLAYING APPARATUS WITH AN INTERACTIVE FUNCTION AND METHOD THEREOF - An audio playing apparatus with an interactive function is provided. An interactive file stored in a data storage of the audio playing apparatus includes controlling data, a main audio, and at least one question audio. The controlling data controls the playing of the main audio and the question audios. After each question audio is played, the audio playing apparatus outputs a voice prompt to give the user a reference answer.07-08-2010
20100174531Speech coding - A method of encoding one or more parent blocks of values, the number of values being the length of each block, the method comprising for each parent block:07-08-2010
20100274555Audio Coding Apparatus and Method Thereof - An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to determine at least one characteristic of the audio signal; divide the audio signal into at least a low frequency portion and a high frequency portion, and generate from the high frequency portion a plurality of high frequency band signals dependent on the at least one characteristic of the audio signal; and determine for each of the plurality of high frequency band signals at least part of the low frequency portion which can represent the high frequency band signal.10-28-2010
20120179457CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.07-12-2012
20100268529VOICE COMMUNICATION APPARATUS - A technique is provided that is capable of transmitting a voice separated for each speaker when voice communications are conducted with a plurality of communication terminals connected in a cascade mode. When a conference is started, each participant using each terminal 10-21-2010
20120253794VOICE CONVERSION METHOD AND SYSTEM - A method of converting speech from the characteristics of a first voice to the characteristics of a second voice, the method comprising: 10-04-2012
20120253795AUDIO COMMENTING AND PUBLISHING SYSTEM - An audio commenting and publishing system including a storage database, media content and a computing device all coupled together via a network. The computing device comprises a processor and an application executed by the processor configured to input audio data that a user wishes to associate with the media content from an audio recording mechanism or a memory device. The application is then able to store the audio data on the storage database and use the network address of the audio data along with the network address of the media content to publish the audio data and the media content such that a viewer is able to hear and access them concurrently at a network-accessible location.10-04-2012
20120259623System and Method of Providing Generated Speech Via A Network - A system and method of operating an automatic speech recognition application over an Internet Protocol network is disclosed. The ASR application communicates over a packet network such as an Internet Protocol network or a wireless network. A grammar for recognizing received speech from a user over the IP network is selected from a plurality of grammars according to a user-selected application. A server receives information representing speech over the IP network, performs speech recognition using the selected grammar, and returns information based upon the recognized speech. Sub-grammars may be included within the grammar to recognize speech from sub-portions of a dialog with the user.10-11-2012
20120259622AUDIO ENCODING DEVICE AND AUDIO ENCODING METHOD - Disclosed is an audio encoding device which removes unnecessary inter-channel parameters from the subject to be encoded, thereby improving the encoding efficiency. In this audio encoding device, a principal component analysis unit (10-11-2012
20120265523FRAME ERASURE CONCEALMENT FOR A MULTI RATE SPEECH AND AUDIO CODEC - An audio coding terminal and method is provided. The terminal includes a coding mode setting unit to set an operation mode, from plural operation modes, for input audio coding by a codec, configured to code the input audio based on the set operation mode such that when the set operation mode is a high frame erasure rate (FER) mode the codec codes a current frame of the input audio according to a select frame erasure concealment (FEC) mode of one or more FEC modes. Upon the setting of the operation mode to be the High FER mode, the one FEC mode is selected, from the one or more FEC modes predetermined for the High FER mode, to control the codec by incorporating redundancy within the coding of the input audio or as separate redundancy information separate from the coded input audio according to the selected one FEC mode.10-18-2012
20120265522Time Scaling of Audio Frames to Adapt Audio Processing to Communications Network Timing - Methods and apparatus for coordinating audio data processing and network communication processing in a communication device by using time scaling for either inbound or outbound audio data processing, or both, in a communication device. In particular, time scaling of audio data is used to adapt timing for audio data processing to timing for modem processing, by dynamically adjusting a collection of audio samples to fit the container size required by the modem. Speech quality can be preserved while recovering and/or maintaining correct synchronizing between audio processing and communication processing circuits. In an example method, it is determined that a completion time for processing a first audio data frame falls outside a pre-determined timing window. Responsive to this determination, a subsequent audio data frame is time-scaled to control the completion time for processing the subsequent audio data frame.10-18-2012
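A minimal sketch of the core idea above: dynamically adjusting a collection of audio samples to fit a modem's container size. Plain linear interpolation stands in for a quality-preserving time-scaling algorithm (such as WSOLA), and the function name is invented.

```python
import numpy as np

def fit_to_container(samples, container_size):
    """Time-scale a block of audio to exactly `container_size` samples.
    Linear interpolation is a crude stand-in for speech-preserving time
    scaling; the container size is dictated by the modem's timing, as in
    the abstract."""
    samples = np.asarray(samples, dtype=float)
    if len(samples) == container_size:
        return samples.copy()
    positions = np.linspace(0.0, len(samples) - 1, num=container_size)
    return np.interp(positions, np.arange(len(samples)), samples)

# Stretch 4 samples into a 6-sample container:
stretched = fit_to_container([0.0, 1.0, 2.0, 3.0], 6)
assert len(stretched) == 6
assert stretched[0] == 0.0 and stretched[-1] == 3.0
```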
20110004466STEREO SIGNAL ENCODING DEVICE, STEREO SIGNAL DECODING DEVICE AND METHODS FOR THEM - A technique of improving the degree of freedom of controlling the accuracy of encoding a stereo signal. In a stereo signal encoding device (01-06-2011
20120239386METHOD AND DEVICE FOR DETERMINING A DECODING MODE OF IN-BAND SIGNALING - In the field of communications, a method and a device for determining a decoding mode of in-band signaling are provided, which improve accuracy of in-band signaling decoding. The method includes: calculating a probability of each decoding mode of in-band signaling of a received signal at a predetermined moment by using a posterior probability algorithm; and from the calculated probabilities of the decoding modes, selecting a decoding mode having a maximum probability value as a decoding mode of the in-band signaling of the received signal at the predetermined moment. The method and the device are mainly used in a process for determining a decoding mode of in-band signaling in a speech frame transmission process.09-20-2012
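The posterior-probability selection in this abstract amounts to Bayes' rule followed by an argmax. A minimal sketch, with invented mode names and probabilities:

```python
def mode_posteriors(likelihoods, priors):
    """P(mode | signal) is proportional to P(signal | mode) * P(mode),
    normalized over all decoding modes."""
    joint = {m: likelihoods[m] * priors[m] for m in likelihoods}
    total = sum(joint.values())
    return {m: v / total for m, v in joint.items()}

def select_decoding_mode(likelihoods, priors):
    """Pick the in-band-signaling decoding mode with the maximum
    posterior probability at the predetermined moment."""
    posteriors = mode_posteriors(likelihoods, priors)
    return max(posteriors, key=posteriors.get)

likelihoods = {"speech_frame": 0.2, "signaling_frame": 0.6}
priors = {"speech_frame": 0.5, "signaling_frame": 0.5}
assert select_decoding_mode(likelihoods, priors) == "signaling_frame"
```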
20110046945METHOD AND DEVICE OF BITRATE DISTRIBUTION/TRUNCATION FOR SCALABLE AUDIO CODING - Embodiments of the invention provide a method and device for assigning bitrates to a plurality of channels in a scalable audio encoding/truncation process. Different bitrates are assigned to different channels in the scalable audio encoding/truncation process.02-24-2011
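As a rough illustration of assigning different bitrates to different channels, here is an energy-proportional allocation. The proportional rule is an assumption for illustration; this abstract does not state the patent's actual criterion.

```python
def distribute_bitrate(total_bits, channel_energies):
    """Split a bit budget across channels in proportion to each
    channel's energy; leftover bits from integer rounding go to the
    most energetic channel so the budget is met exactly."""
    total_energy = sum(channel_energies)
    bits = [int(total_bits * e / total_energy) for e in channel_energies]
    loudest = channel_energies.index(max(channel_energies))
    bits[loudest] += total_bits - sum(bits)
    return bits

allocation = distribute_bitrate(100, [1.0, 2.0, 5.0])
assert sum(allocation) == 100
assert allocation[2] > allocation[1] > allocation[0]
```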
20120323568Source Code Adaption Based on Communication Link Quality and Source Coding Delay - Method and arrangement in a network node for adapting a property of source coding to the quality of a communication link in packet switched conversational services in a communication system. The method comprises obtaining (12-20-2012
20120323567Packet Loss Concealment for Speech Coding - A speech coding method of significantly reducing error propagation due to voice packet loss, while still greatly profiting from a pitch prediction or Long-Term Prediction (LTP), is achieved by limiting or reducing a pitch gain only for the first subframe or the first two subframes within a speech frame. The method is used for a voiced speech class; a pitch cycle length is compared to a subframe size to decide to reduce the pitch gain for the first subframe or the first two subframes within the frame. Speech coding quality loss due to the pitch gain reduction is compensated by increasing a bit rate of a second excitation component or adding one more stage of excitation component only for the first subframe or the first two subframes within the speech frame.12-20-2012
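The gain-limiting rule above can be sketched as follows. The comparison direction (a pitch cycle at least as long as a subframe lets a corrupted excitation reach the second subframe, so two gains are limited) and the cap value are assumptions for illustration, not the patent's exact rule.

```python
def limit_pitch_gains(pitch_gains, pitch_cycle, subframe_size, max_gain=0.5):
    """Cap the pitch (LTP) gain for only the first one or two subframes
    of a frame to curb error propagation after a lost packet, leaving
    the remaining subframes' gains untouched."""
    n_limited = 2 if pitch_cycle >= subframe_size else 1
    limited = list(pitch_gains)
    for i in range(min(n_limited, len(limited))):
        limited[i] = min(limited[i], max_gain)
    return limited

# Long pitch cycle: first two subframes are limited.
assert limit_pitch_gains([0.9, 0.8, 0.7, 0.6], pitch_cycle=80, subframe_size=64) == [0.5, 0.5, 0.7, 0.6]
# Short pitch cycle: only the first subframe is limited.
assert limit_pitch_gains([0.9, 0.8, 0.7, 0.6], pitch_cycle=40, subframe_size=64) == [0.5, 0.8, 0.7, 0.6]
```

The abstract compensates the resulting quality loss by spending extra bits on a second excitation component for those subframes; that side is not shown here.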
20110320193SPEECH ENCODING DEVICE, SPEECH DECODING DEVICE, SPEECH ENCODING METHOD, AND SPEECH DECODING METHOD - Provided is a speech encoding device that is capable of performing encoding in an extension encoder even when the core encoder and core decoder of each layer have been interchanged, and that is also capable of performing high precision encoding by using the appropriate codec for each situation. The speech encoding device (12-29-2011
20110320192GATEWAY APPARATUS AND METHOD AND COMMUNICATION SYSTEM - A gateway apparatus receives a call control signal and/or a packet with voice data stored therein in a predetermined protocol from a packet transfer apparatus on a mobile high-speed network and converts the received protocol into a circuit-switched protocol used when an RNC connects to a circuit switching equipment on a mobile circuit-switched network, for output to the circuit switching equipment. The gateway apparatus, on receipt of a call process signal and/or a voice signal from the circuit switching equipment, converts the received protocol for output to the packet transfer apparatus.12-29-2011
20130013299METHOD AND APPARATUS FOR DEVELOPMENT, DEPLOYMENT, AND MAINTENANCE OF A VOICE SOFTWARE APPLICATION FOR DISTRIBUTION TO ONE OR MORE CONSUMERS - A system for developing, deploying and maintaining a voice application over a communications network to one or more recipients has a voice application server connected to a data network for storing and serving voice applications, a network communications server connected to the data network and to the communications network for routing the voice applications to their intended recipients, a computer station connected to the data network having control access to at least the voice application server, and a software application running on the computer station for creating applications and managing their states. The system is characterized in that a developer operating the software application from the computer station creates voice applications through object modeling and linking, stores them for deployment in the application server, and manages deployment and state of deployed applications including scheduled deployment and repeat deployments in terms of intended recipients.01-10-2013
20130013298METHODS AND APPARATUS FOR GENERATING, UPDATING AND DISTRIBUTING SPEECH RECOGNITION MODELS - Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability. Voice dialing, telephone control and/or other services are provided by the speech processing facility in response to speech recognition results.01-10-2013
20130013297MESSAGE SERVICE METHOD USING SPEECH RECOGNITION - A message service method using speech recognition includes a message server recognizing a speech transmitted from a transmission terminal, generating and transmitting a recognition result of the speech and N-best results based on a confusion network to the transmission terminal; if a message is selected through the recognition result and the N-best results and an evaluation result according to the accuracy of the message is decided, the transmission terminal transmitting the message and the evaluation result to a reception terminal; and the reception terminal displaying the message and the evaluation result.01-10-2013
20130018654METHOD AND APPARATUS FOR ENABLING PLAYBACK OF AD HOC CONVERSATIONS (Toebes, John A.; Cary, NC, US) - In one embodiment, a method includes monitoring activity in an environment, and storing a snippet of the monitored activity. Monitoring the activity in the environment includes operating a device arranged to capture the activity between approximately a first time and approximately a second time. The snippet has a particular duration that is arranged to end at approximately the second time. The method also includes storing the snippet in a storage module and determining when a request to provide the snippet is obtained from a party. If it is determined that the request is obtained, the method includes accessing the storage module to obtain the snippet and providing the snippet to the party.01-17-2013
20110161075REAL-TIME VOICE RECOGNITION ON A HANDHELD DEVICE - A method and apparatus for implementation of real-time speech recognition using a handheld computing apparatus are provided. The handheld computing apparatus receives an audio signal, such as a user's voice. The handheld computing apparatus ultimately transmits the voice data to a remote or distal computing device with greater processing power and operating a speech recognition software application. The speech recognition software application processes the signal and outputs a set of instructions for implementation either by the computing device or the handheld apparatus. The instructions can include a variety of items including instructing the presentation of a textual representation of dictation, or a function or command to be executed by the handheld device (such as linking to a website, opening a file, cutting, pasting, saving, or other file menu type functionalities), or by the computing device itself.06-30-2011
20110161074REMOTE CONFERENCING CENTER - Certain embodiments disclosed herein relate to systems and methods for recording audio and video. In particular, in one embodiment, a method of recording audio signals is provided. The method includes recording audio signals with a plurality of distributed audio transducers to create multiple recordings of the audio signals and providing each of the multiple recordings of the audio signals to a computing device. The computing device combines each of the multiple recordings into a master recording and determines a source for each audio signal in the master recording. Additionally, the computing device stores each audio signal in separate audio files according to the determined source of each audio signal.06-30-2011
20130024189MOBILE TERMINAL AND DISPLAY METHOD THEREOF - A mobile terminal and a control method thereof are provided. The mobile terminal includes: an audio output module; a memory storing text; and a controller configured to convert at least a portion of the text into a speech and output the speech through the audio output module, wherein the controller stores, in the memory, at least a portion of speech data obtained by converting the at least a portion of the text into the speech, and outputs the speech based on the stored speech data to the audio output module when a speech output signal with respect to the at least portion of the text is obtained. When a speech output signal is obtained for a portion that has already been output as speech, the speech is output based on the stored speech data, thereby shortening the time required for outputting the speech.01-24-2013
20130024188Real-Time Encoding Technique - A system for encoding an audio signal includes an audio console configured to receive a voice audio signal contained within a first audio spectrum, encode the voice audio signal with a background audio signal contained within a second audio spectrum wider than the first audio spectrum, encode the voice audio signal with a monitoring code and output a combined signal including the voice audio signal encoded with the background audio signal and the monitoring code. The combined signal is contained within an audio spectrum including the first audio spectrum and the second audio spectrum.01-24-2013
20130173259Method and Apparatus for Processing Audio Frames to Transition Between Different Codecs07-04-2013
20130173260DEVICE FOR ASSESSING ACCURACY OF STATEMENTS AND METHOD OF OPERATION - A device receives voice and/or data from a speaker, such as a politician, and presents a signal indicative of the accuracy of the speaker's statements. The device may be a mobile device, such as a smart phone, or a fixed device, such as a TV set. The device compares a speaker segment, automatically selected from the speaker statement, with a factual segment, automatically selected from a database comprising stored facts, and presents the accuracy of the speaker statement to the user of the device. The device may be configured so that the user may manually select the speaker segment to be assessed by the device.07-04-2013
20080255828DATA COMMUNICATION VIA A VOICE CHANNEL OF A WIRELESS COMMUNICATION NETWORK USING DISCONTINUITIES - A system and method for data communication over a cellular communications network that allows the transmission of digital data over a voice channel using a vocoder that operates in different modes depending upon characteristics of the inputted signal it receives. To prepare the digital data for transmission, one or more carrier signals are encoded with the digital data using one of a number of modulation schemes that utilize differential phase shift keying to give the modulated carrier signal certain periodicity and energy characteristics that allow it to be transmitted by the vocoder at full rate. The modulation schemes include DPSK using either a single or multiple frequency carriers, combined FSK-DPSK modulation, combined ASK-DPSK, PSK with a phase tracker in the demodulator, as well as continuous signal modulation (ASK or FSK) with inserted discontinuities that can be independent of the digital data.10-16-2008
20080243491Modulation Device, Modulation Method, Demodulation Device, and Demodulation Method - A modulation device including: a modulation unit for modulating a carrier in an audible sound range by an encoded transmission signal to generate a modulated signal; a masker sound generation unit for generating a masker signal outputted as a masker sound for making the modulated signal harder to hear when transmitted with the modulated signal; and an acoustic signal generation unit for inserting the masker signal in the modulated signal to generate an acoustic signal.10-02-2008
20080243489Multiple stream decoder - A method is provided for decoding data streams in a voice communication system. The method includes: receiving two or more data streams having voice data encoded therein; decoding each data stream into a set of speech coding parameters; forming a set of combined speech coding parameters by combining the sets of decoded speech coding parameters, where speech coding parameters of a given type are combined with speech coding parameters of the same type; and inputting the set of combined speech coding parameters into a speech synthesizer.10-02-2008
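Combining decoded parameter sets type by type, rather than mixing synthesized waveforms, can be sketched as a per-key weighted average. The key names and the averaging rule are illustrative assumptions, not the patent's stated combination method.

```python
def combine_parameter_sets(param_sets, weights=None):
    """Combine speech-coding parameter sets so that parameters of the
    same type (same key, e.g. 'gain' or 'pitch') are merged together;
    the combined set would then feed a single speech synthesizer."""
    if weights is None:
        weights = [1.0 / len(param_sets)] * len(param_sets)
    return {
        key: sum(w * ps[key] for w, ps in zip(weights, param_sets))
        for key in param_sets[0]
    }

# Two decoded streams, averaged type by type:
streams = [{"gain": 1.0, "pitch": 100.0}, {"gain": 3.0, "pitch": 120.0}]
combined = combine_parameter_sets(streams)
assert combined == {"gain": 2.0, "pitch": 110.0}
```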
20080221876Method for processing audio data into a condensed version - Recorded audio data is compressed to obtain a condensed version, by first selecting a number of subsequent non-overlapping segments of the audio data, then reducing each segment by temporal compression and combining the reduced segments into a shortened version which can be output. The temporal compression may be made with a local compression factor which varies between the segments. The segmenting may be chosen based on an innovation signal derived from the audio data itself to indicate a content change rate in the audio data.09-11-2008
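The segment-and-compress pipeline above reads directly as code. This sketch uses interpolation as the temporal compression and takes the per-segment local factors as given; in the patent they would be derived from the innovation signal.

```python
import numpy as np

def condense(audio, segment_len, factors):
    """Cut the audio into non-overlapping segments, shrink each by its
    own local compression factor, and concatenate the reduced segments
    into the condensed (shortened) version."""
    pieces = []
    for i, factor in enumerate(factors):
        segment = np.asarray(audio[i * segment_len:(i + 1) * segment_len],
                             dtype=float)
        n_out = max(1, int(len(segment) / factor))
        positions = np.linspace(0.0, len(segment) - 1, num=n_out)
        pieces.append(np.interp(positions, np.arange(len(segment)), segment))
    return np.concatenate(pieces)

# 300 samples, three segments with local factors 1x, 2x, 4x:
condensed = condense(np.arange(300.0), segment_len=100, factors=[1, 2, 4])
assert len(condensed) == 100 + 50 + 25
```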
20130179156QR DATA PROXY AND PROTOCOL GATEWAY - A quick response (QR) proxy and protocol gateway for interfacing with a carrier network, a QR-equipped device, and a contact center and contact center database is disclosed. A data link is connected to a carrier network to receive QR codes and other data. Additional data links are connected to a contact center database and a QR-equipped device to obtain information used in determining routing and tagging instructions. A user interface is connected to the gateway to accept configurable conditions for determining routing instructions. There is a text conversion function and speech conversion function for each target enterprise contact center. Synchronization between stored user preferences to automated or semi-automated customer service routes is provided by a consumer preference template system.07-11-2013
20130179157COMPUTER, INTERNET AND TELECOMMUNICATIONS BASED NETWORK - A method and apparatus for a computer and telecommunication network which can receive, send and manage information from or to a subscriber of the network, based on the subscriber's configuration. The network is made up of at least one cluster containing voice servers which allow for telephony, speech recognition, text-to-speech and conferencing functions, and is accessible by the subscriber through standard telephone connections or through internet connections. The network also utilizes a database and file server allowing the subscriber to maintain and manage certain contact lists and administrative information. A web server is also connected to the cluster thereby allowing access to all functions through internet connections.07-11-2013
20130144610ACTION GENERATION BASED ON VOICE DATA - An automated technique is disclosed for processing audio data and generating one or more actions in response thereto. In particular embodiments, the audio data can be obtained during a phone conversation and post-call actions can be provided to the user with contextually relevant entry points for completion by an associated application. Audio transcription services available on a remote server can be leveraged. The entry points can be generated based on keyword recognition in the transcription and passed to the application in the form of parameters.06-06-2013
20130103392APPARATUS AND METHOD OF REPRODUCING AUDIO DATA USING LOW POWER - A method and apparatus for reproducing audio data using low power are provided. The apparatus may reproduce the audio data by determining a power mode based on a memory resource of an internal memory and the amount of memory required for reproducing the audio data, controlling a power based on the determined power mode, and decoding the audio data.04-25-2013
20130103393Multi-point sound mixing and distant view presentation method, apparatus and system - The disclosure provides a multi-point sound mixing and distant view presentation method, apparatus and system, wherein the multi-point sound mixing and distant view presentation method includes: receiving audio code streams from a plurality of meeting places, wherein each meeting place comprises one or more meeting sections, and each meeting section corresponds to one audio code stream; mixing the audio code streams of the meeting sections which have a corresponding relationship among the plurality of meeting places; and outputting mixed audio code streams to the meeting sections which have the corresponding relationship among the plurality of meeting places. Sounds in different sections of the distant view presentation conference system can be distinguished by technical solutions provided by the disclosure.04-25-2013
20130124197MULTI-LAYERED SPEECH RECOGNITION APPARATUS AND METHOD - A multi-layered speech recognition apparatus and method, the apparatus includes a client checking whether the client recognizes the speech using a characteristic of speech to be recognized and recognizing the speech or transmitting the characteristic of the speech according to a checked result; and first through N-th servers, wherein the first server checks whether the first server recognizes the speech using the characteristic of the speech transmitted from the client, and recognizes the speech or transmits the characteristic according to a checked result, and wherein an n-th (2≤n≤N) server checks whether the n-th server recognizes the speech using the characteristic of the speech transmitted from an (n−1)-th server, and recognizes the speech or transmits the characteristic according to a checked result.05-16-2013
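The client-then-servers escalation above is a chain of recognizers, each of which either answers or defers to the next layer. A minimal sketch; the layer interface (a callable returning a result or None) is an invented simplification.

```python
def layered_recognize(features, layers):
    """Walk the layers in order (client first, then servers 1..N).
    Each layer returns a recognition result if it can handle the
    speech, or None to pass the characteristic of the speech up to
    the next layer."""
    for recognize in layers:
        result = recognize(features)
        if result is not None:
            return result
    return None  # no layer could recognize the speech

client = lambda f: None        # client cannot recognize; defers upward
server_1 = lambda f: "hello"   # first server recognizes the speech
assert layered_recognize("speech-characteristic", [client, server_1]) == "hello"
```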
20130124198METHOD AND APPARATUS FOR PROCESSING AUDIO SIGNALS - A method of pre-processing an audio signal transmitted to a user terminal via a communication network and an apparatus using the method are provided. The method of pre-processing the audio signal may prevent deterioration of a sound quality of the audio signal transmitted to the user terminal by pre-processing the audio signal, and by enabling a codec module, encoding the audio signal, to determine the audio signal as a speech signal. Also, the method of pre-processing the audio signal may improve a probability that the codec module may determine a corresponding audio signal as a speech when the audio signal is transmitted via the communication network by pre-processing the audio signal using a speech codec.05-16-2013
20130132074METHOD AND SYSTEM FOR REPRODUCING AND DISTRIBUTING SOUND SOURCE OF ELECTRONIC TERMINAL - There is provided a method of reproducing and distributing a sound source of an electronic terminal. The method includes a step of starting to simultaneously reproduce, by a reproducing unit of the electronic terminal, a stream of an MR (Music Recorded) sound source file and a stream of an AR (All Recorded) sound source file in which a voice has been recorded over the MR sound source file, and outputting one of the streams through an output unit; and a step of controlling the reproducing unit, by a reproducing switch unit of the electronic terminal based on a selection of a user, to stop the output of the stream that is currently being output through the output unit and to output the other stream through the output unit, while the stream of the MR sound source file and the stream of the AR sound source file are being reproduced.05-23-2013
20100280822STEREO SOUND DECODING APPARATUS, STEREO SOUND ENCODING APPARATUS AND LOST-FRAME COMPENSATING METHOD - A stereo sound decoding apparatus wherein lost-frame compensation performance has been improved to enhance the quality of decoded sounds. In this stereo sound decoding apparatus, a sound decoding part (11-04-2010
20080208570Methods and Apparatus for Blind Separation of Multichannel Convolutive Mixtures in the Frequency Domain - A method and apparatus performing blind source separation using frequency-domain normalized multichannel blind deconvolution. Multichannel mixed signals are frames of N samples including r consecutive blocks of M samples. The frames are separated using separating filters in frequency domain in an overlap-save manner by discrete Fourier transform (DFT). The separated signals are then converted back into time domain using inverse DFT applied to a nonlinear function. Cross-power spectra between separated signals and nonlinear-transformed signals are computed and normalized by power spectra of both separated signals and nonlinear-transformed signals to have flat spectra. Time domain constraint is then applied to preserve first L cross-correlations. These alias-free normalized cross-power spectra are further constrained by nonholonomic constraints. Then, natural gradient is computed by convolving alias-free normalized cross-power spectra with separating filters. After the separating filters length is constrained to L, the separating filters are updated using the natural gradient and normalized to have unit norm. Terminating conditions are checked to determine if separating filters converged.08-28-2008
20080201136Apparatus and Method for Speech Recognition - A speech recognition apparatus includes a first storing unit configured to store a first acoustic model invariable regardless of speaker and environment, a second storing unit configured to store a classification model that has shared parameters and non-shared parameters with the first acoustic model to classify second acoustic models, a recognizing unit configured to calculate a first likelihood with regard to the input speech by applying the first acoustic model to the input speech and obtain the calculation result on the shared parameters and a plurality of candidate words that have relatively large values as the first likelihood, and a calculating unit configured to calculate a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model.08-21-2008
20080201135Spoken Dialog System and Method - A spoken dialog system stores a history of dialog states in a memory, outputs a system response in a current dialog state, inputs a user utterance, performs speech recognition of the user utterance, to obtain one or a plurality of recognition candidates of the user utterance and likelihoods thereof with respect to the user utterance, calculates a degree of state conformance of each of the current and the preceding dialog states stored in the memory with respect to the user utterance, selects one of the current and the preceding dialog states and one of the recognition candidates based on a combination of the degree of state conformance of each dialog state and the likelihood of each recognition candidate, and performs transition from the current dialog state to a new dialog state based on dialog state selected and recognition candidate selected.08-21-2008
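The joint selection over dialog states and recognition candidates above is an argmax over pairs, scoring each pair by combining state conformance with candidate likelihood. Multiplication is assumed here as the combination; the abstract only says "a combination".

```python
def select_state_and_candidate(conformance, likelihood):
    """Pick the (dialog state, recognition candidate) pair that
    maximizes conformance(state) * likelihood(candidate); the dialog
    then transitions based on the selected pair."""
    return max(
        ((s, c) for s in conformance for c in likelihood),
        key=lambda pair: conformance[pair[0]] * likelihood[pair[1]],
    )

conformance = {"ask_city": 0.8, "ask_date": 0.2}   # current + stored states
likelihood = {"paris": 0.6, "march": 0.9}          # recognition candidates
assert select_state_and_candidate(conformance, likelihood) == ("ask_city", "march")
```

Because preceding states stored in the history also compete, a strong candidate can pull the dialog back to an earlier state instead of staying in the current one.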
20100286980METHOD AND APPARATUS FOR SPEECH CODING - A method and apparatus for prediction in a speech-coding system extends a 111-11-2010
20110246187SPEECH SIGNAL PROCESSING - A speech signal processing system comprises an audio processor …10-06-2011
20130151243VOICE MODULATION APPARATUS AND VOICE MODULATION METHOD USING THE SAME - A voice modulation apparatus is provided. The voice modulation apparatus includes an audio signal input unit which receives an audio signal from an external source; an extraction unit which extracts property information relating to a voice from the audio signal; a storage unit which stores the extracted property information; a control unit which modulates a target voice based on the extracted property information; and an output unit which outputs the modulated target voice.06-13-2013
20130151242Method to Select Active Channels in Audio Mixing for Multi-Party Teleconferencing - An apparatus comprising an ingress port configured to receive a signal comprising a plurality of encoded audio signals corresponding to a plurality of sources; and a processor coupled to the ingress port and configured to calculate a parameter for each of the plurality of encoded audio signals, wherein each parameter is calculated without decoding any of the encoded audio signals, select some, but not all, of the plurality of encoded audio signals according to the parameter for each of the encoded audio signals, decode the selected signals to generate a plurality of decoded audio signals, and combine the plurality of decoded audio signals into a first audio signal.06-13-2013
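The selection step in the teleconferencing entry above can be sketched with a cheap proxy parameter computed without decoding; using encoded payload size as that parameter is an illustrative assumption (active talkers typically produce larger frames than silence/comfort-noise frames), not the patent's specific metric:

```python
def select_active(encoded, k=2):
    """Rank encoded audio channels by a parameter computed without
    decoding (here: payload size in bytes) and keep only the top-k
    channels; only those are then decoded and mixed."""
    ranked = sorted(encoded.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

packets = {
    "alice": b"\x01" * 120,   # talking: larger encoded frames
    "bob":   b"\x02" * 20,    # silence/comfort noise: tiny frames
    "carol": b"\x03" * 95,
    "dave":  b"\x04" * 18,
}
assert set(select_active(packets)) == {"alice", "carol"}
```

Skipping the decode for unselected channels is where the savings come from: only k decoders run per mixing interval instead of one per participant.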
20130158988GRACEFUL DEGRADATION FOR COMMUNICATION SERVICES OVER WIRED AND WIRELESS NETWORKS - A method for gracefully extending the range and/or capacity of voice communication systems is disclosed. The method involves the persistent storage of voice media on a communication device. When the usable bit rate on the network falls below that necessary for conducting a live conversation, voice media is transmitted and received by the communication device at whatever usable bit rate is available. Although latency may be introduced, the persistent storage of both transmitted and received media of a conversation extends the useful range of wireless networks beyond what live conversations require. In addition, for both wired and wireless communications, capacity is increased and robustness against external interference is improved.06-20-2013
20110295596DIGITAL VOICE RECORDING DEVICE WITH MARKING FUNCTION AND METHOD THEREOF - A digital voice recording device includes a storage unit, a display unit, and a processing unit. The processing unit includes a recording module, a storing module, a marking module, and a playing module. The recording module converts audio into digital signals and records them into an audio file. Each audio file is associated with a document containing its textual content. The storing module stores the audio file and the document. The display unit displays the document. The marking module creates a plurality of flags for the audio file; each flag is associated with a time point in the audio file and is assigned an identifier. In response to a user input, the playing module resolves a flag's identifier to acquire its time point and begins playing the audio file from that point.12-01-2011
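The marking and playing modules above reduce to a small identifier-to-time-point map; this sketch uses illustrative names and sequential integer identifiers (the patent does not specify the identifier scheme):

```python
class MarkedRecording:
    """Minimal sketch of the marking/playing modules: each flag maps
    an assigned identifier to a time point (seconds) in the audio file."""

    def __init__(self):
        self.flags = {}       # identifier -> time point
        self.next_id = 1

    def add_flag(self, time_point):
        """Marking module: create a flag at time_point, return its identifier."""
        flag_id = self.next_id
        self.flags[flag_id] = time_point
        self.next_id += 1
        return flag_id

    def playback_start(self, flag_id):
        """Playing module: resolve a flag identifier to its time point."""
        return self.flags[flag_id]

rec = MarkedRecording()
a = rec.add_flag(12.5)
b = rec.add_flag(47.0)
assert rec.playback_start(b) == 47.0
```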
20120029911METHOD AND SYSTEM FOR DISTRIBUTED AUDIO TRANSCODING IN PEER-TO-PEER SYSTEMS - A method for streaming audio data in a network, the audio data having a sequence of samples, includes encoding the sequence of samples into a plurality of coded base bitstreams, generating a plurality of enhancement streams, and transmitting the coded base bitstreams and the enhancement bitstreams to a receiver for decoding. Each of the enhancement bitstreams is generated from one of a plurality of non-overlapping portions of the sequence of samples.02-02-2012
20130197902SYSTEM, METHOD AND COMPUTER PROGRAM FOR SHARING AUDIBLE WORD TAGS - The invention provides a system, method and computer program for sharing audible word tags. The word may be an individual's name or information conveyed through a series of words. An audible word tag may be recorded by an individual, embedded in electronic correspondence and/or documents for sharing with others, or accessed dynamically via the internet and/or other applicable network connectivity on an as-required basis. The method includes generating a profile that associates one or more words with an audible word tag. An audio recording is made of the one or more words and linked to the profile. The audible word tag is linked to one or more items of electronic correspondence or printed documents, and is accessible by a receiver of the correspondence to initiate playback of the audio recording.08-01-2013
20130197903RECORDING SYSTEM, METHOD, AND DEVICE - An exemplary recording method receives the personal information of a speaker transmitted from an RFID tag through an RFID reader, then receives the speaker's voice through a microphone. The method next receives the speaker's personal information and the identifier of the audio input device, transmitted from that device, and associates the personal information with the received identifier. The method then receives the voice together with the identifier of the audio input device, converts the received voice to text, determines the personal information corresponding to the identifier received with the voice, and associates the converted text with that personal information to generate a record.08-01-2013
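The association step in the recording method above can be sketched as a device-identifier lookup; the data shapes and the stand-in `transcribe` callback are illustrative assumptions, not the patent's interfaces:

```python
def build_record(registrations, utterances, transcribe):
    """Sketch of the association step: registrations pair a speaker's
    personal information with a device identifier; utterances arrive
    tagged only with the device identifier, so the speaker is looked
    up and attached to the converted text."""
    device_to_person = {dev: person for person, dev in registrations}
    record = []
    for dev, voice in utterances:
        record.append((device_to_person[dev], transcribe(voice)))
    return record

regs = [("Alice", "mic-01"), ("Bob", "mic-02")]        # RFID info + device id
utts = [("mic-02", b"hi"), ("mic-01", b"ok")]          # device id + voice
rec = build_record(regs, utts, transcribe=lambda v: v.decode())
assert rec == [("Bob", "hi"), ("Alice", "ok")]
```

The one-time registration (RFID info paired with a device identifier) is what lets every later voice packet carry only the compact identifier rather than the personal information itself.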
