Application

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class	Description	Number of patent applications
704275000	Speech controlled system	636
704270100	Speech assisted network	167
704273000	Security system	55
704271000	Handicap aid	33
704276000	Pattern display	30
704272000	Novelty item	29
704278000	Sound editing	29
704274000	Warning/alarm system	19
704277000	Translation	17

Entries
Document	Title	Date
20080221895	Method and Apparatus for Processing Audio for Playback - A method and apparatus for processing audio for playback to provide a smooth transition between a beginning region of an audio track and an end region of a previous audio track is disclosed. A quantity representative of a chromagram is calculated for each of the audio tracks and the mixing points for the beginning and end regions of each audio track are identified. A quantity representative of a chromagram at the mixing point of the beginning region of the audio track and a quantity representative of a chromagram at the mixing point of the end region of the previous audio track are correlated to determine an order of audio tracks for playback and/or to determine the duration of the mix transition.	09-11-2008
20080228488	Computer-Implemented Voice Application Indexing WEB Site - A computer-implemented voice application indexing method and system for supplying voice applications that provide telephony services to users. The method and system include receiving voice application data over the network regarding the voice applications. The voice application data contains location data to indicate where the voice applications are located on the network. The voice application data are stored in a database in accordance with a predetermined voice application taxonomy. A request is received for a voice application based upon a user requesting a telephony service. The request includes search criteria for selecting a voice application from the database. The location data of at least one voice application (whose stored voice application information substantially satisfies the search criteria) is retrieved from the database. The voice application located at the retrieved location data is used to perform the user-requested telephony service.	09-18-2008
20080235026	VOICE ACTIVATED DISTANCE MEASURING DEVICE - A voice activated device for annunciating a message indicative of a distance of the device spaced from another location is disclosed. The device comprises a voice sensor for receiving a voice command requesting annunciation of a message indicative of the distance of the device spaced from the other location, converting circuitry coupled to the voice sensor for converting the received voice command to a corresponding electrical command, determining circuitry responsive to the electrical command for determining the distance of the device from the other location, and a speaker coupled to the determining circuitry for annunciating the message indicative of the determined distance of the device from the other location. The device may be used for informing a golfer of the golfer's distance from the pin.	09-25-2008
20080243512	Method of and System For Classification of an Audio Signal - The invention describes a method of classifying an audio input signal (	10-02-2008
20080243513	Apparatus And Method For Controlling Output Format Of Information - An apparatus for controlling the output format of information is provided. The apparatus includes a communications unit configured to receive information intended for at least one recipient. The apparatus also includes a selection unit, which is configured to automatically detect, based on the at least one recipient, an externally-specified indication of a preferred form of output selected from amongst a plurality of available forms of output. The selection unit causes the information to be outputted in the preferred form of output. A method and a computer program product are also provided for controlling the output format of information.	10-02-2008
20080243514	NATURAL ERROR HANDLING IN SPEECH RECOGNITION - A user interface, and associated techniques, that permit a fast and efficient way of correcting speech recognition errors, or of diminishing their impact. The user may correct mistakes in a natural way, essentially by repeating the information that was incorrectly recognized previously. Such a mechanism closely approximates what human-to-human dialogue would be in similar circumstances. Such a system fully takes advantage of all the information provided by the user, and on its own estimates the quality of the recognition in order to determine the correct sequence of words in the fewest number of steps.	10-02-2008
20080249777	Method And System For Control Of An Application - The invention describes a dialog management system and method for control of an application (A	10-09-2008
20080249778	Communications Using Different Modalities - Communications between users of different modalities are enabled by a single integrated platform that allows both the input of voice (from a telephone, for example) to be realized as text (such as an interactive text message) and allows the input of text (from the interactive text messaging application, for example) to be realized as voice (on the telephone). Real-time communication may be enabled between any permutation of any number of text devices (desktop, PDA, mobile telephone) and voice devices (mobile telephone, regular telephone, etc.). A call to a text device user may be initiated by a voice device user or vice versa.	10-09-2008
20080249779	SPEECH DIALOG SYSTEM - A speech dialog system includes a signal input unit that receives an acoustic input signal. A voice activity detector compares a portion of the received signal to a noise estimate to determine if the signal includes voice activity. A speech recognizer processes signals containing voice activity to determine if the signal contains speech. An output unit modifies signals when output of the system substantially coincides with the delivered speech.	10-09-2008
20080249780	VOICE GUIDANCE SYSTEM FOR VEHICLE - A voice guidance system for a vehicle includes a transmitter, a tuner, a touch sensor, a smart ECU, a D-seat speaker, and a P-seat speaker, which are all mounted in a vehicle. It is used for an in-vehicle system, such as a smart entry system, which performs intercommunication with a portable unit. In this guidance system, a smart ECU stores in a memory information indicating that a user has performed predetermined operation with the smart entry system. When it is determined that a user will use the smart entry system, the following processing is performed: voice guidance about the operation procedures for the system is outputted from a driver seat speaker or a passenger seat speaker when information indicating that the user has performed the predetermined operation in the past is not stored in the memory; and voice guidance is disabled when information indicating that the user has performed the predetermined operation is stored.	10-09-2008
20080255846	METHOD OF PROVIDING LANGUAGE OBJECTS BY INDENTIFYING AN OCCUPATION OF A USER OF A HANDHELD ELECTRONIC DEVICE AND A HANDHELD ELECTRONIC DEVICE INCORPORATING THE SAME - The disclosed and claimed concept relates generally to handheld electronic devices and, more particularly, to a method of providing language objects by identifying an occupation of a user of a handheld electronic device and a handheld electronic device incorporating the same. A method and apparatus of providing language objects by identifying an occupation of a user of a handheld electronic device includes the following steps: identifying the occupation of the user of the handheld electronic device from a number of occupations; detecting a text input; and displaying at least a portion of at least a first language object that is associated with the identified occupation and that corresponds to the text input.	10-16-2008
20080270141	VIRTUAL VOCAL DYNAMICS IN WRITTEN EXCHANGE - The illustrative embodiments described herein provide a computer implemented method and computer program product for providing context in an electronic text communication. A biometric gathering input device is associated with a sending data processing system. A first set of metrics is identified based on a sender interacting with the biometric gathering input device. A sending communications process on the sending data processing system is calibrated based on the first set of metrics. During the generation of the electronic text communication, a portion of the first set of metrics is identified based on the sender interacting with the biometric gathering input device to form a second set of metrics. The second set of metrics and the electronic text communication are sent from the sending data processing system to a recipient data processing system. The second set of metrics is represented at the recipient data processing system using criteria selected by a recipient of the electronic text communication.	10-30-2008
20080281598	METHOD AND SYSTEM FOR PROMPT CONSTRUCTION FOR SELECTION FROM A LIST OF ACOUSTICALLY CONFUSABLE ITEMS IN SPOKEN DIALOG SYSTEMS - A method (and system) of determining confusable list items and resolving this confusion in a spoken dialog system includes receiving user input, processing the user input and determining if a list of items needs to be played back to the user, retrieving the list to be played back to the user, identifying acoustic confusions between items on the list, changing the items on the list as necessary to remove the acoustic confusions, and playing unambiguous list items back to the user.	11-13-2008
20080281599	PROCESSING AUDIO DATA - A method of processing audio data including obtaining (	11-13-2008
20080300883	Projection Apparatus with Speech Indication and Control Method Thereof - A projection apparatus with speech indication and a control method thereof are provided. The projection apparatus comprises a storage unit, a transmission interface, a process unit, and an output unit. The storage unit is configured to store a plurality of speech data. The transmission interface is configured to connect to an external apparatus for accessing the storage unit. The process unit is configured to select at least one of the speech data according to the present state of the projection apparatus. The output unit is configured to output the selected speech datum to broadcast the speech indication.	12-04-2008
20080306739	SOUND SOURCE SEPARATION SYSTEM - A system capable of separating sound source signals with high precision while improving a convergence rate and convergence precision. A process of updating a current separation matrix W	12-11-2008
20080312932	ERROR MANAGEMENT IN AN AUDIO PROCESSING SYSTEM - An audio processing system includes a voice decoder and an audio processor. In one exemplary embodiment, the audio processing system is embedded in a headset unit that is wirelessly coupled to a game console. The voice decoder is used to decode a stream of incoming voice data packets carried over a wireless signal. The decoded voice data packets are used to drive an audio transducer of the headset unit. Upon detection of an error in the incoming stream, a decoded error-free voice data packet that has been stored in a replay buffer is used to generate an amplitude scaled audio signal. The voice decoder is disconnected from the audio transducer and the scaled audio signal is used to drive the audio transducer instead.	12-18-2008
20080319756	Electronic Device and Method for Determining a Mixing Parameter - The method of determining a parameter for mixing a first content item (X	12-25-2008
20090012794	System For Giving Intelligibility Feedback To A Speaker - System for giving intelligibility feedback to a speaker (	01-08-2009
20090018840	AUTOMATED SPEECH RECOGNITION (ASR) TILING - Techniques are described related to tiles of automatic speech recognition data. In an implementation, automated speech recognition (ASR) data is obtained. The ASR data is divided into a plurality of tiles based on an approximate amount of data to be included in each tile. Each of the tiles is a geographic partition of the ASR data.	01-15-2009
20090030693	AUTOMATED NEAR-END DISTORTION DETECTION FOR VOICE COMMUNICATION SYSTEMS - In one embodiment, a method for providing voice quality assurance is provided. The method determines voice information for an end point in a voice communication system. The voice information may be from an ingress microphone. The method determines if the voice quality is considered degraded based on an analysis of the voice information. For example, the voice information may indicate that it is distorted, too loud, too soft, is subject to an external noise, etc. Feedback information is determined if the voice quality is considered degraded, where the feedback information is designed to improve voice quality at an ingress point for a user speaking. The feedback information is then outputted at the end point to the user using the end point.	01-29-2009
20090043586	Detecting a Physiological State Based on Speech - A computer-implemented method identifies a spoken audio signal representing speech of a person and estimates a physiological state of the person based on the spoken audio signal. For example, the method may identify articulatory patterns (such as landmarks) in the speech and estimate the person's physiological state based on those articulatory patterns. The method may estimate, for example, the amount of time the person has been without sleep. The method may produce the physiological state estimate without performing speech recognition on the spoken audio signal. The method may produce the physiological state estimate in real-time.	02-12-2009
20090048845	APPARATUS, SYSTEM, AND METHOD FOR VOICE CHAT TRANSCRIPTION - An apparatus, system, and method to transcribe a voice chat session initiated from a text chat session. The system includes a chat server, a voice server, and a transcription engine. The chat server is configured to facilitate a text chat session between multiple instant messaging clients. The voice server is coupled to the chat server and configured to facilitate a transition from the text chat session to a voice chat session between the multiple instant messaging clients. The transcription engine is coupled to the voice server and configured to generate a voice transcription of the voice chat session. The voice transcription may be aggregated into a text chat history.	02-19-2009
20090055189	Automatic Replacement of Objectionable Audio Content From Audio Signals - A method, apparatus and system are provided for the automatic replacement of potentially objectionable audio content from an audio signal in real time. In one embodiment of the present invention, the selective filtering of objectionable audio content from an audio signal is accomplished by first marking objectionable audio content in the audio signal with filtering information that identifies the type of objectionable audio content (e.g., crude language, ethnic and racial slurs, cursing, strong profanity) and storing the filtering information and the corresponding location of the objectionable audio content for that particular audio signal. Objectionable audio content having filtering information corresponding to a stored replacement content code determined from a predetermined replacement setting is then automatically replaced with an audio clip corresponding to the replacement setting.	02-26-2009
20090055190	EMOTIVE ENGINE AND METHOD FOR GENERATING A SIMULATED EMOTION FOR AN INFORMATION SYSTEM - Information about a device may be emotively conveyed to a user of the device. Input indicative of an operating state of the device may be received. The input may be transformed into data representing a simulated emotional state. Data representing an avatar that expresses the simulated emotional state may be generated and displayed. A query from the user regarding the simulated emotional state expressed by the avatar may be received. The query may be responded to.	02-26-2009
20090063157	APPARATUS AND METHOD OF GENERATING INFORMATION ON RELATIONSHIP BETWEEN CHARACTERS IN CONTENT - A method of generating information on relationships between characters of a content includes dividing a text extracted from the content into one or more predetermined units, determining one or more dominant relationships between characters of the content by comparing the divided units with relationship keyword information in which keywords contained in categories are defined, wherein the categories represent one or more relationships between the characters, and generating information on the relationships between the characters in accordance with the determined dominant relationships.	03-05-2009
20090083038	MOBILE RADIO TERMINAL, SPEECH CONVERSION METHOD AND PROGRAM FOR THE SAME - The mobile radio terminal includes a speech input unit which inputs a speech signal obtained from speech of a speaking person, an estimating unit which estimates a speech style of the speaking person from the speech signal, and a converting unit which converts the speech signal into a converted speech signal in accordance with the estimated speech style.	03-26-2009
20090089062	PUBLIC SPEAKING SELF-EVALUATION TOOL - A public speaking self-evaluation tool that helps a user practice public speaking in terms of avoiding undesirable words or sounds, maintaining a desirable speech rhythm, and ensuring that the user is regularly glancing at the audience. The system provides a user interface through which the user is able to define the undesirable words or sounds that are to be avoided, as well as a maximum frequency of occurrence threshold to be used for providing warning signals based on detection of such filler or undesirable words or sounds. The user interface allows a user to define a speech rhythm, e.g. in terms of spoken syllables per minute, that is another maximum threshold for providing a visual warning indication. The disclosed system also provides a visual indication when the user fails to glance at the audience at least as often as defined by a predefined minimum threshold.	04-02-2009
20090089063	VOICE CONVERSION METHOD AND SYSTEM - A method, system and computer program product for voice conversion. The method includes performing speech analysis on the speech of a source speaker to achieve speech information; performing spectral conversion based on said speech information, to at least achieve a first spectrum similar to the speech of a target speaker; performing unit selection on the speech of said target speaker at least using said first spectrum as a target; replacing at least part of said first spectrum with the spectrum of the selected target speaker's speech unit; and performing speech reconstruction at least based on the replaced spectrum.	04-02-2009
20090099847	Template constrained posterior probability - Detailed herein is a technology which, among other things, reduces errors introduced in recording and transcription data. In one approach to this technology, a method of detecting audio transcription errors is utilized. This method includes selecting a focus unit, and selecting a context template corresponding to the focus unit. A hypothesis set is then determined, with reference to the context template and the focus unit. A probability is calculated corresponding to the focus unit, across the hypothesis set.	04-16-2009
20090112598	SYSTEM AND METHOD FOR APPLYING PROBABILITY DISTRIBUTION MODELS TO DIALOG SYSTEMS IN THE TROUBLESHOOTING DOMAIN - Disclosed herein are systems, methods, and computer-readable media for troubleshooting based on a probability distribution model. The method for troubleshooting based on a probability distribution model includes establishing a speech-based channel of interaction, establishing at least one non-speech-based channel of interaction, maintaining a probability distribution over time for each of a plurality of component variables describing the state of the product or service and state of the conversation, and troubleshooting a product or service by responding based on the probability distribution.	04-30-2009
20090112599	MULTI-STATE BARGE-IN MODELS FOR SPOKEN DIALOG SYSTEMS - Disclosed are systems, methods and computer readable media for applying a multi-state barge-in acoustic model in a spoken dialogue system comprising the steps of (1) presenting a prompt to a user from the spoken dialog system, (2) receiving an audio speech input from the user during the presentation of the prompt, (3) accumulating the audio speech input from the user, (4) applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, (5) applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, (6) determining whether the audio speech input is a barge-in-speech input from the user, and (7) if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.	04-30-2009
20090132254	DIAGNOSTIC REPORT BASED ON QUALITY OF USER'S REPORT DICTATION - A system and method are provided for automatically routing a diagnostic interpretation from diagnostic data received from a diagnostic source. The diagnostic interpretation is produced automatically using a voice recognition system. Along with the transcription of the interpretation, the voice recognition system returns a level of confidence of the voice recognition. Based on this level of confidence, the system automatically routes the transcribed interpretation to the appropriate destination for further processing.	05-21-2009
20090150158	Portable Networked Picting Device - A portable picting device automatically converts an audio signal from a microphone into a digital data stream, parses a series of words from the digital data stream, and detects any words that match tags in a tag/image database. An image corresponding to the matching tag(s) is then retrieved and transmitted to a display. The images may be stored on a remote network, such as the Internet. In the illustrative embodiment the display is integrated into an article of clothing such as a shirt.	06-11-2009
20090157410	Speech Translating System - Disclosed is a speech translating system for translating speech from a first language to a language selected from a set of second languages. The system includes an input unit, a processor, and an output unit. The input unit is capable of receiving the speech in the first language. The processor is operatively coupled to the input unit and is capable of converting the speech in the first language to the speech in the selected language. The output unit is operatively coupled to the processor. The output unit is capable of outputting the speech in the selected language.	06-18-2009
20090192798	METHOD AND SYSTEM FOR CAPABILITIES LEARNING - A method for task execution improvement, the method includes: generating a baseline model for executing a task; recording a user executing a task; comparing the baseline model to the user's execution of the task; and providing feedback to the user based on the differences in the user's execution and the baseline model.	07-30-2009
20090192799	Breathing Apparatus Speech Enhancement - Speech enhancement in a breathing apparatus is provided using a primary sensor mounted near a breathing mask user's mouth, at least one reference sensor mounted near a noise source, and a processor that combines the signals from these sensors to produce an output signal with an enhanced speech component. The reference sensor signal may be filtered and the result may be subtracted from the primary sensor signal to produce the output signal with an enhanced speech component. A method for detecting the exclusive presence of a low air alarm noise may be used to determine when to update the filter. A triple filter adaptive noise cancellation method may provide improved performance through reduction of filter maladaptation. The speech enhancement techniques may be employed as part of a communication system or a speech recognition system.	07-30-2009
20090204406	SYSTEM AND METHODS FOR DETECTING DECEPTION AS TO FLUENCY OR OTHER ABILITY IN A GIVEN LANGUAGE - The invention relates to a system and methods for detecting when a user is representing he is not fluent in a language in which he is fluent. The present system and methods are adapted to be used in conjunction with conventional and novel computer systems and methods and provides detection of concealment of language skills by a user.	08-13-2009
20090276221	Method and System for Processing Channel B Data for AMR and/or WAMR - A method and system for processing channel B data for AMR and/or WAMR may include generating one or more channel B data hypotheses for a present speech frame, if channel A data has a valid CRC and channel B data is unacceptable. Channel B data may be unacceptable, for example, due to high residual bit error rate and/or low Viterbi metric. Speech hypotheses may also be generated for the present speech frame, where each speech hypothesis may be based on a corresponding channel B data hypothesis and channel A data. A speech constraint metric may be assigned to each speech hypothesis that is compared to a previous frame speech data. The speech hypothesis that is closest to the previous frame speech data may be selected as a present speech data. The speech constraint metric may, for example, measure gain continuity and/or pitch continuity.	11-05-2009
20090276222	Method and system for incorporating one or more inputs and defining and applying a plurality of rules for the different stages of speech and video analytics systems - A method and system for improving the context and accuracy of speech and video analytics searches by incorporating one or more inputs and defining and applying a plurality of rules for the different stages of said speech and video analytics system searches.	11-05-2009
20090299748	MULTIPLE AUDIO FILE PROCESSING METHOD AND SYSTEM - An audio file generation method and system. A computing system receives a first audio file comprising first speech data associated with a first party. The computing system receives a second audio file comprising second speech data associated with a second party. The first audio file differs from the second audio file. The computing system generates a third audio file from the second audio file. The third audio file differs from the second audio file. The process to generate the third audio file includes identifying a first set of attributes missing from the second audio file and adding the first set of attributes to the second audio file. The process to generate the third audio file additionally includes removing a second set of attributes from the second audio file. The third audio file includes third speech data associated with the second party. The computing system broadcasts the third audio file.	12-03-2009
20090299749	PRE-PROCESSED ANNOTATION OF STREET GRAMMAR IN SPEECH ENABLED NAVIGATION SYSTEMS - Embodiments of the present invention address deficiencies of the art in respect to virtualization and provide a novel and non-obvious method, system and computer program product for annotation of street grammar in speech enabled navigation devices. In an embodiment of the invention, a pre-processing street grammar annotation system can be provided. The system can include an annotated street grammar storage that contains street root names wherein each street root name has more than one street suffix associated with said street root name, and a street annotation pre-processor wherein the street annotation pre-processor contains logic enabled to annotate a set of street suffixes to a street root name prior to processing a voice input in a speech enabled navigation device, wherein the street root name has more than one street suffix associated with said street root name.	12-03-2009
20090299750	Voice/Music Determining Apparatus, Voice/Music Determination Method, and Voice/Music Determination Program - According to one embodiment, various characteristic parameters for determining whether an input audio signal is a voice signal or a music signal are calculated and the calculated characteristic parameters are compared with a threshold value for voice determination and a threshold value for music determination. A voice characteristic score is provided to a characteristic parameter indicating voice and a music characteristic score is provided to a characteristic parameter indicating music. Then, based on a difference between a sum total of voice characteristic scores and a sum total of music characteristic scores, it is determined whether the input audio signal is a voice signal or a music signal.	12-03-2009
20090306989	VOICE INPUT SUPPORT DEVICE, METHOD THEREOF, PROGRAM THEREOF, RECORDING MEDIUM CONTAINING THE PROGRAM, AND NAVIGATION DEVICE - A setting-item selector calculates probability of a name of a setting item to match a voice based on a conversion database and an audio signal, and retrieves and notifies the setting item in a manner corresponding to the probability. The related-item selector retrieves setting-item information corresponding to the setting item inputted through an input operation by a user based on a setting-item database, and retrieves a name of a related other setting item based on coincidence of set-content information and operation-content information of the setting-item information. A notification controller notifies a combination of related setting items.	12-10-2009
20090326952	SPEECH PROCESSING METHOD, SPEECH PROCESSING PROGRAM, AND SPEECH PROCESSING DEVICE - [Problems] To convert a signal of non-audible murmur obtained through an in-vivo conduction microphone into a signal of a speech that is recognizable for (hardly misrecognized by) a receiving person with maximum accuracy.	12-31-2009
20100017211	Method for the construction of a cross-linked system - Disclosed is a method for constructing a cross-linked system whose topology of components creates a network, especially a method for creating predetermined functional units, such as cell types and tissues as well as biological and/or physical components that are based thereupon, by developing the cross-linking of the system in a self-organizing manner. The inventive method is characterized by the following steps: a) the network is represented by a graph; b) edges of the said graph are provided with markings which are formed such that the graph can be unambiguously assigned to a minimal automaton; c) the automaton is described by a formal grammar representing a system of equations whose solutions are defined in text form. The approach to obtain the solutions of the system of equations describes a way to construct the system, while transducers insert the components into the network in order to entirely construct the system.	01-21-2010
20100030562	SOUND DETERMINATION DEVICE, SOUND DETECTION DEVICE, AND SOUND DETERMINATION METHOD - A sound determination device (	02-04-2010
20100036667	VOICE ASSISTANT SYSTEM - Methods and apparatuses to assist a user in the performance of a plurality of tasks are provided. The method may comprise storing at least one care plan in a voice assistant, the care plan defining a plurality of tasks to be performed, capturing speech input from the user, determining, from the speech input, a selected interaction with a care plan, and in response to the selected interaction, providing a speech dialog with the user reflective of the care plan. Alternatively, the method may comprise capturing speech input from a user, determining from the speech input, a first weight associated with a resident, associating the first weight with a care plan in turn associated with the resident, comparing the first weight to a second weight associated with the resident and the care plan, and providing a speech dialog regarding reweighting the resident based on the comparison.	02-11-2010
20100036668	METHOD AND APPARATUS FOR IMPROVED DETECTION OF RATE ERRORS IN VARIABLE RATE RECEIVERS - A system and method for detection of rate determination algorithm errors in variable rate communications system receivers. The disclosed embodiments prevent rate determination algorithm errors from causing audible artifacts such as screeches or beeps. The disclosed system and method detects frames with incorrectly determined data rates and performs frame erasure processing and/or memory state clean up to prevent propagation of distortion across multiple frames. Frames with incorrectly determined data rates are detected by checking illegal rate transitions, reserved bits, validating unused filter type bit combinations and analyzing relationships between fixed code-book gains and linear prediction coefficient gains.	02-11-2010
20100042411	Automatic Creation of Audio Files - A method of building an audio description of a particular product of a class of products includes providing a plurality of human voice recordings, wherein each of the human voice recordings includes audio corresponding to an attribute value common to many of the products. The method also includes automatically obtaining attribute values of the particular product, wherein the attribute values reside electronically. The method also includes automatically applying a plurality of rules for selecting a subset of the human voice recordings that correspond to the obtained attribute values and automatically stitching the selected subset of human voice recordings together to provide a voiceover product description of the particular product. A similar method is used to build an audio description of a particular process.	02-18-2010
20100042412	SKIPPING RADIO/TELEVISION PROGRAM SEGMENTS - Techniques for notifying at least one entity of an occurrence of an event in an audio signal are provided. At least one preference is obtained from the at least one entity. An occurrence of an event in the audio signal is determined. The event is related to at least one of at least one speaker and at least one topic. The at least one entity is notified of the occurrence of the event in the audio signal, in accordance with the at least one preference.	02-18-2010
20100042413	Voice Activated Application Service Architecture and Delivery - A system and method for retrieving distributed content responsive to voice data are disclosed. Voice data is transmitted from a source client device to media server which applies a mixing table to route the voice data to one or more destinations described by the mixing table. The media server also analyzes the received voice data for one or more events. Responsive to detecting an event, the media server communicates with an application server, which modifies the mixing table so that subsequent data is also routed to a media generator which analyzes voice data received after detection of the event for a command. The media generator communicates with the application server to retrieve data from a user data source, such as a website, associated with a detected command. The media generator produces an audio representation of the retrieved data which is communicated to the source client device via the media server.	02-18-2010
20100049524	Method And Apparatus For Providing Search Capability And Targeted Advertising For Audio, Image And Video Content Over The Internet - The present invention provides an apparatus and method for extracting the content of a video, image, and/or audio file or podcast, analyzing the content, and then providing a targeted advertisement, search capability and/or other functionality based on the content of the file or podcast.	02-25-2010
20100094633	VOICE ANALYSIS DEVICE, VOICE ANALYSIS METHOD, VOICE ANALYSIS PROGRAM, AND SYSTEM INTEGRATION CIRCUIT - A sound analysis device comprises: a sound parameter calculation unit operable to acquire an audio signal and calculate a sound parameter for each of partial audio signals, the partial audio signals each being the acquired audio signal in a unit of time; a category determination unit operable to determine, from among a plurality of environmental sound categories, which environmental sound category each of the partial audio signals belongs to, based on a corresponding one of the calculated sound parameters; a section setting unit operable to sequentially set judgment target sections on a time axis as time elapses, each of the judgment target sections including two or more of the units of time, the two or more of the units of time being consecutive; and an environment judgment unit operable to judge, based on a number of partial audio signals in each environmental sound category determined in at least a most recent judgment target section, an environment that surrounds the sound analysis device in at least the most recent judgment target section.	04-15-2010
20100094634	METHOD AND APPARATUS FOR CREATING FACE CHARACTER BASED ON VOICE - An apparatus and method of creating a face character which corresponds to a voice of a user is provided. To create various facial expressions with fewer key models, a face character is divided into a plurality of areas and a voice sample is parameterized corresponding to pronunciation and emotion. If the user's voice is input, a face character image corresponding to divided face areas is synthesized using key models and data about parameters corresponding to the voice sample to synthesize an overall face character image using the synthesized face character image corresponding to the divided face areas.	04-15-2010
20100100386	Noise Variance Estimator for Speech Enhancement - A speech enhancement method operative for devices having limited available memory is described. The method is appropriate for very noisy environments and is capable of estimating the relative strengths of speech and noise components during both the presence as well as the absence of speech.	04-22-2010
20100100387	Method and Apparatus for Dynamic Voice Response Messages - A computing device implemented method, apparatus, and computer program product to generate dynamic voice response messages in a mobile computing device. In response to receiving an incoming call from a caller, the process displays a list of response messages in a set of response messages. In response to receiving a selection of a response message from the list of response messages, the process sends the selected response message to the caller.	04-22-2010
20100131277	Device, Method, and Program for Performing Interaction Between User and Machine - There is provided a device for performing interaction between a user and a machine. The device includes a plurality of domains corresponding to a plurality of stages in the interaction. Each of the domains has voice comprehension means which understands the content of the user's voice. The device includes: means for recognizing the user's voice; means for selecting a domain enabling the best voice comprehension result as the domain; means for referencing task knowledge of the domain and extracting a task	05-27-2010
20100131278	Stereo to Mono Conversion for Voice Conferencing - Stereo to mono voice conferencing conversion is performed during a voice conference. Conferencing equipment receives audio for right and left channels and filters each of the channels into a plurality of bands. For each band of each channel, the equipment determines an energy level and compares each energy level for each band of the right channel to each energy level for each corresponding band of the left channel. Based on the comparison, the equipment determines which channel has more audio resulting from speech. Based on the determination, the equipment adjusts delivery of the audio from the right and left channels to a mono channel for transmission to endpoints only capable of mono audio in the voice conference.	05-27-2010
20100145708	SYSTEM AND METHOD FOR IDENTIFYING ORIGINAL MUSIC - We disclose useful components of a method and system that allow identification of music from the song or sound using only the sound of the audio being played. A system built using the method and device components disclosed processes inputs sent from a mobile phone over a telephone or data connection, though inputs might be sent through any variety of computers, communications equipment, or consumer audio devices over any of their associated audio or data networks.	06-10-2010
20100211394	METHOD FOR DETERMINING A STRESS STATE OF A PERSON ACCORDING TO A VOICE AND A DEVICE FOR CARRYING OUT SAID METHOD - The invention relates to the field of methods and devices for analyzing psychophysiological reactions of a person to verbal tests. The invented device (	08-19-2010
20100211395	Method and System for Speech Intelligibility Measurement of an Audio Transmission System - Method and processing system for measuring the intelligibility of a degraded output signal (Y(t)) from an audio transmission system (	08-19-2010
20100217602	Combined Mirror and Presentation Medium Capable of Speech Recognition - The present invention relates, in general, to a combined mirror and presentation medium, which allows at least one of various presentation bodies to be inserted thereinto, acts as a mirror such that the inside thereof cannot be seen at normal times, and enables the display of an inserted presentation body to the outside at the time of illumination of the inside of the presentation medium. The object thereof is to provide a combined mirror and presentation medium capable of speech recognition, which provides the functions of a mirror and a picture frame using a reflection plate, thus maximizing the functionality thereof, which enables various types of control by automatically controlling a presentation medium, capable of displaying a presentation body, using a user's speech signals, thus facilitating switching between the function of a mirror and the function of a decoration or information transfer medium, and which allows database information to be represented in response to speech signals from the user, thus efficiently performing combined functions.	08-26-2010
20100235170	BIOFEEDBACK SYSTEM FOR CORRECTION OF NASALITY - A system is described for providing biofeedback to hearing-impaired persons as to the degree of nasalization of vowel-like sounds in their speech, in order to monitor their own nasality and thus correct inappropriate nasalization. In a preferred embodiment, this feedback uses tactile vibration, with the vibration amplitude reflecting the nasalance of the speech.	09-16-2010
20100286987	APPARATUS AND METHOD FOR GENERATING AVATAR BASED VIDEO MESSAGE - An apparatus and method for generating an avatar based video message are provided. The apparatus and method are capable of generating an avatar based video message based on speech of a user. The avatar based video message apparatus and method displays information that corresponds to input user speech. The avatar based video message apparatus and method edits the input user speech according to a user input signal with reference to the displayed information, generates avatar animation according to the edited speech, and generates an avatar based video message based on the edited speech and the avatar animation.	11-11-2010
20100318365	Method and Apparatus for Configuring Web-based data for Distribution to Users Accessing a Voice Portal System - In a system for developing and deploying a voice application using Web-based data as source data over a communications network to one or more recipients, a method for organizing, editing, and prioritizing the Web-based data before dialog creation is provided. The method includes harvesting the Web-based data source in the form of its original structure; generating an object tree representing the logical structure and content type of the harvested, Web-based data source; manipulating the object tree generated to a desired hierarchal structure and content; creating a voice application template in VXML and populating the template with the manipulated object tree; and creating a voice application capable of accessing the Web-based data source according to the constraints of the template.	12-16-2010
20100324908	Learning Playbot - An enhanced chatbot, programed to learn from human-computer conversational exchanges. The process of learning automatically creates an expanded and updated statement/response database from input provided by users engaged in interactions with the chatbot.	12-23-2010
20100324909	METHOD AND SYSTEM FOR PROCESSING MESSAGES WITHIN THE FRAMEWORK OF AN INTEGRATED MESSAGE SYSTEM - A method and system for processing messages within the framework of an integrated message system. Recipients of messages in an integrated messaging system are provided with an authentic impression of the received message. In a first step, a message received within the framework of an integrated messaging system is automatically translated. A language detection and dictation system is provided. The message contents of the incoming message as well as its segments and parameters are simultaneously utilized to generate additional information regarding the sender and the information, which is suitable to give the recipient an impression of the received message in the most authentic form possible.	12-23-2010
20100332233	Battery Management System And Method - A battery-management method is performed by a battery-operated device. The method includes allocating a first portion of a battery capacity to a first function and a second portion of the battery capacity to a second function. The method further includes simultaneously displaying a first indicator relating to the first portion of the battery capacity and a second indicator relating to the second portion of the battery capacity.	12-30-2010
20110022392	INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING METHOD - A framework is provided which performs location-based analysis using an individual feature such as a stress level obtained based on biological information. An information processing system includes an acquisition unit which acquires frequency power information of a voice inputted at a mobile terminal having a voice communication function, and position information of a base station device that relayed voice communication of the mobile terminal when the voice was inputted; a storage unit which stores the acquired frequency power information and the acquired position information in association with each other; an acceptance unit which accepts designation of an area; and an output unit which identifies the position information related to the designated area, acquires the frequency power information associated with the identified position information with reference to the storage unit, obtains a stress level of a user of the mobile terminal in the designated area based on frequency power information of a frequency greater than or equal to a threshold value within the acquired frequency power information, and outputs the stress level in association with the designated area.	01-27-2011
20110022393	MULTIMODE USER INTERFACE OF A DRIVER ASSISTANCE SYSTEM FOR INPUTTING AND PRESENTATION OF INFORMATION - In a method for multimode information input and/or adaptation of the display of a display and control device, input signals of different modality are detected which are supplied via the device to a voice recognition unit, thus initiating a desired function and/or display as an output signal, which are displayed on the device and/or output by voice output. Touch and/or gesture input signals are provided on or to the device for selection of an object intended for interaction and activation of the voice recognition unit and for the vocabulary which is provided for interaction to be restricted with the selection of the object and/or activation of the voice recognition unit as a function of the selected object, on the basis of which a voice command from the restricted vocabulary is added to the selected object as an information input and/or for adaptation of the display, via the voice recognition unit.	01-27-2011
20110022394	Visual similarity - Methods and apparatus, including computer program products, for visual similarity. A method includes receiving a stream of video content, generating interpretations of the received video content using speech/natural language processing (NLP), associating the interpretations of the received video content with images extracted from video content based on timeline, and using the interpretations to obtain interpretations of other images or other video content.	01-27-2011
20110022395	Machine for Emotion Detection (MED) in a communications device - A system and method monitors the emotional content of human voice signals after the signals have been compressed by standard telecommunication equipment. By analyzing voice signals after compression and decompression, less information is processed, saving power and reducing the amount of equipment used. During conversation, a user of the disclosed methodology may obtain information in various formats regarding the emotional state of the other party. The user may then view the veracity, composure, and stress level of the other party. The user may also view the emotional content of their own transmitted speech.	01-27-2011
20110029314	Food Processor with Recognition Ability of Emotion-Related Information and Emotional Signals - A food processor with recognition ability of emotion-related information and emotional signals is disclosed, which comprises an emotion recognition module and a food processing module. The emotion recognition module is capable of receiving sound signals so as to identify an emotion contained in the received sound signals. The food processing module is capable of producing food products with a taste corresponding to the emotion recognition result of the emotion recognition module.	02-03-2011
20110035224	SYSTEM AND METHOD FOR ADDRESS RECOGNITION AND CORRECTION - A system, method, and computer-readable medium for parcel address recognition. A method includes receiving an address input and producing candidate address results corresponding to the address input. The method includes receiving operational scheme knowledge describing the mode of operation of a parcel processing system, and receiving at least one operational rule corresponding to the operational scheme knowledge. The method includes applying the at least one operational rule to the candidate address results and producing and storing a finalized result according to the operational rule and the candidate address results.	02-10-2011
20110040563	Voice Control Device and Voice Control Method and Display Device - A voice control device for a display device includes a voice receiver for receiving a voice signal, a voice recognition unit coupled to the voice receiver for recognizing the voice signal to generate a recognition result, a function decision unit coupled to the voice recognition unit for selecting an operating function from a plurality of operating functions according to the recognition result, and an execution unit coupled to the function decision unit for controlling the display device to perform the operating function.	02-17-2011
20110046959	Substituting or Replacing Components in Sound Based on Steganographic Encoding - The present disclosure relates to various methods and systems to provide substitute sound (e.g., audio). One claim includes an apparatus comprising: electronic memory for storing identifying information obtained from steganographically encoded sound; an electronic processor programmed for: providing the identifying information to a remote computer, the remote computer including substitute sound corresponding to the identifying information; providing format information to the remote computer, the format information identifying a format in which the substitute sound should be formatted prior to communication of the substitute sound; and controlling receipt of substitute sound corresponding to the identifying information. Of course, other apparatus, methods and combinations are provided as well.	02-24-2011
20110046960	Multi-Channel Interactive Self-Help Application Platform and Method - An interactive voice response (IVR) platform running a voice application for use with a voice client is extended to support text messaging clients and other clients of other media types on other channels. An application-to-text messaging interface interfaces with text messaging clients via a text messaging protocol transport and interfaces with the IVR via an API. It includes a user/application manager to handle user and application accounts and a state/session manager to handle state information required by the text messaging operations and to handle sessions maintained by the IVR. Text modules are implemented having text synthesis and text recognition with a dictionary/grammar. These allow voice-specific application scripts to be interpreted in a text channel. The extended multi-channel platform supports an open source text messaging network and also, through transport gateways, other types of text messaging clients.	02-24-2011
20110054904	ELECTRONIC SHOPPING ASSISTANT WITH SUBVOCAL CAPABILITY - A mobile device suitable for use by a user in a store includes a subvocal message (SVM) module to detect an SVM from the user. The SVM includes data that indicates an item in the store. A transmitter transmits a request after detecting the SVM. The request includes information indicating the item. A receiver receives a reply. The reply includes information responsive to the request. An output device provides the responsive information to the user. The request may include a request for item position information, item price information, or item inventory information. The mobile device may detect the SVM via a subvocal sensor coupled to the user. The subvocal sensor may be in contact with the user in proximity to a vocal cord of the user. The subvocal sensor may be connected to the mobile device wirelessly or via a wire.	03-03-2011
20110054905	VOICE INTERACTIVE SERVICE SYSTEM AND METHOD FOR PROVIDING DIFFERENT SPEECH-BASED SERVICES - A voice interactive service system provides different speech-based services to a plurality of users. Using a communication terminal, the services are accessed via a telecommunication network through service-specific connectivity ports. The system comprises processing cores which have different configurations of speech processing resources for performing different services. For performing a requested service, a connection module establishes a connection between the respective connectivity port and a processing core having a configuration of speech processing resources suitable for performing the requested service. Because of the service-specific resourcing of cores, there is no need for requesting and allocating processing resources from external resource servers. Moreover, the port-dedicated resourcing of the cores ensures that a successful access to a connectivity port leads to a successful provision of the requested service.	03-03-2011
20110060591	ISSUING ALERTS TO CONTENTS OF INTEREST OF A CONFERENCE - A method, system, and computer program product for issuing an alert in response to detecting a content of interest in a conference. A listening logic comprising multiple conference engines monitors speakers, topics, and words spoken during a conference. A speech-to-text engine monitors the conference and records a transcription. A word emphasis engine monitors the transcription for key words. A voice identification engine monitors the live conversation and the recorded transcript, in real time, for a particular individual to begin speaking. An outline engine may create an outline of transcription. The listening device may issue an alert upon detecting a content of interest in the conference. The listening device may additionally display an outline or a selected portion of the transcript regarding a particular content of interest to inform a user of the listening device of a portion of content of the conference that may have been missed.	03-10-2011
20110071837	Audio Signal Correction Apparatus and Audio Signal Correction Method - According to one embodiment, an audio signal correction apparatus has a characteristic extraction module configured to determine whether an input audio signal is a monaural signal or a stereo signal, on the basis of channel information, and to extract a plurality of characteristic parameters for determining whether the input audio signal is a speech signal or a music signal, a signal type determination module configured to calculate a speech/music discrimination score which indicates whether the input audio signal is close to the speech signal or the music signal, on the basis of the plurality of characteristic parameters and a level calculation module configured to calculate, with use of the speech/music discrimination score, output levels of a degree of speech and a degree of music.	03-24-2011
20110071838	SYSTEM AND METHODS FOR RECOGNIZING SOUND AND MUSIC SIGNALS IN HIGH NOISE AND DISTORTION - A method for recognizing an audio sample locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample. The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.	03-24-2011
20110077946	DERIVING GEOGRAPHIC DISTRIBUTION OF PHYSIOLOGICAL OR PSYCHOLOGICAL CONDITIONS OF HUMAN SPEAKERS WHILE PRESERVING PERSONAL PRIVACY - A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.	03-31-2011
20110093273	System And Method For Determining The Active Talkers In A Video Conference - The present invention describes a method of determining the active talker for display on a video conferencing system, including the steps of: for each participant, capturing audio data using an audio capture sensor and video data using a video capture sensor; determining the probability of active speech (p	04-21-2011
20110099015	USER ATTRIBUTE DERIVATION AND UPDATE FOR NETWORK/PEER ASSISTED SPEECH CODING - Systems, methods and apparatuses are described for deriving and updating user attribute information about users of a communications system. A communications network is then used to transfer the user attribute information to communication terminals, which use the user attribute information to configure a speech codec to operate in a speaker-dependent manner during a communication session, thereby improving speech coding efficiency. In a network-assisted model, the user attribute information is stored on the communications network and selectively transmitted to the communication terminals while in a peer-assisted model, the user attribute information is derived by and transferred between communication terminals.	04-28-2011
20110099016	Multi-Tenant Self-Service VXML Portal - A multi-tenant voice extensible markup language (VXML) voice system includes a voice portal connected to at least one telephony network; a voice application server integrated with the voice portal; and a multi-tenant configuration application integrated with the voice application server, the configuration application accessible to the tenants from a data packet network.	04-28-2011
20110106539	Audio and Video Signal Processing - The present disclosure relates generally to audio and video signal processing. Various arrangements are disclosed. One method recites: (a) obtaining data representing audible portions of audio or representing picture portions of video; (b) using a programmed electronic processor, determining identifying information from the obtained data by computing a frequency transform to produce frequency transform data, and processing the frequency transform data to derive a pattern, and using the pattern as the identifying information for the audio or video; and (c) using the identifying data to facilitate purchase or license of the audio or video. Other arrangements are disclosed as well.	05-05-2011
20110125502	Method of putting identification codes in a document - A method of putting identification codes in a document is disclosed. The method adds a speech-purpose print code in a document such that an OID pen can emit sound after the OID pen reads the speech-purpose print code. The software program first acquires the position of each word in the document and then automatically puts a speech-purpose print code corresponding to each word in the position of each word so that a user can rapidly generate a document with speech-purpose codes.	05-26-2011
20110131047	Steganography in Digital Signal Encoders - In a method for embedding steganographic information into the signal information of a signal encoder, a solution is to be created, which enables steganographic information being embedded into the signal information of a signal encoder such that a reduction of the voice quality is largely avoided. This is achieved by means of providing data information, particularly voice information, selecting steganographic information from a quantity of steganographic information, generating a code word from a code book provided by means of the signal encoder on the basis of the code elements forming the code book such that with the use of the code word generated within the scope of a transmission standard associated with the code book the data information is encoded into signal information containing the code word and/or making reference to the code word; and by the code word generated having an additional feature that can be calculated on the basis of the code elements forming the code word, wherein the additional feature represents the steganographic information.	06-02-2011
20110131048	SYSTEM AND METHOD FOR AUTOMATICALLY GENERATING A DIALOG MANAGER - Disclosed herein are systems, methods, and computer-readable storage media for automatically generating a dialog manager for use in a spoken dialog system. A system practicing the method receives a set of user interactions having features, identifies an initial policy, evaluates all of the features in a linear evaluation step of the algorithm to identify a set of most important features, performs a cubic policy improvement step on the identified set of most important features, repeats the previous two steps one or more times, and generates a dialog manager for use in a spoken dialog system based on the resulting policy and/or set of most important features. Evaluating all of the features can include estimating a weight for each feature which indicates how much each feature contributes to at least one of the identified policies. The system can ignore features not in the set of most important features.	06-02-2011
20110131049	Method and Apparatus for Providing a Framework for Efficient Scanning and Session Establishment - A method of providing a framework for efficient scanning and session establishment may include receiving vocabulary independent property information indicative of a property request and corresponding setting information of an application associated with a device capable of communication with a network communication environment, determining capabilities of the network communication environment relative to the received property information, and enabling generation of a selected scan function having selected scan parameters based at least in part on the determined capabilities and the property information. A corresponding apparatus and computer program product are also provided.	06-02-2011
20110137656	SOUND CLASSIFICATION SYSTEM FOR HEARING AIDS - A hearing aid includes a sound classification module to classify environmental sound sensed by a microphone. The sound classification module executes an advanced sound classification algorithm. The hearing aid then processes the sound according to the classification.	06-09-2011
20110144998	EMBEDDER FOR EMBEDDING A WATERMARK INTO AN INFORMATION REPRESENTATION, DETECTOR FOR DETECTING A WATERMARK IN AN INFORMATION REPRESENTATION, METHOD AND COMPUTER PROGRAM - An embedder for embedding a watermark to be embedded into an input information representation comprises an embedding parameter determiner that is implemented to apply a derivation function once or several times to an initial value to obtain an embedding parameter for embedding the watermark into the input information representation. Further, the embedder comprises a watermark adder that is implemented to provide the input information representation with the watermark using the embedding parameter. The embedder is implemented to select how many times the derivation function is to be applied to the initial value.	06-16-2011
20110153331	Method for Generating Voice Signal in E-Books and an E-Book Reader - The present invention provides a method for generating a voice signal in electronic books (E-books). The method includes the steps of: receiving a voice signal in response to a triggering signal for placing a bookmark; and displaying a functional icon of the bookmark corresponding to the voice signal in a region of the E-book. The present invention also provides an E-book reader, including: a display unit, a receiver unit, and a processing unit, wherein the receiver unit receives a voice signal in response to a triggering signal for placing a bookmark, and the processing unit is used to display a functional icon of the bookmark corresponding to the voice signal in a region of the E-book.	06-23-2011
20110161086	Orchestrated Encoding and Decoding - Orchestrated encoding schemes facilitate encoding and decoding of data in content signals at several points in the distribution path of content items. Orchestrated encoding adheres to a set of encoding rules that enables multiple watermarks and corresponding applications to co-exist, avoids collisions among watermarks, and simplifies metadata and routing database infrastructure.	06-30-2011
20110166862	SYSTEM AND METHOD FOR VARIABLE AUTOMATED RESPONSE TO REMOTE VERBAL INPUT AT A MOBILE DEVICE - A method and system for altering an operational mode of evaluating and responding to verbal input from a user to a mobile device if conditions make such evaluation incompatible with a favorable user experience. Automated speech recognition (ASR) evaluation of verbal input may be performed on a mobile platform to continue a flow of the user experience. Evaluation of the verbal input may continue at a backend when conditions allow for transmission of recorded input to the backend.	07-07-2011
20110173003	SYSTEM AND METHOD FOR DETERMINING A PERSONAL SHG PROFILE BY VOICE ANALYSIS - According to one embodiment of the present invention a computerized voice-analysis device for determining an S,H,G profile is provided (as described herein, such an S,H,G profile relates to the strengths (e.g., relative strengths) of three human instinctive drives). Of note, the present invention may be used for one or more of the following: analyzing a previously recorded voice sample; real-time analysis of voice as it is being spoken; combination voice analysis—that is, a combination of: (a) previously recorded and/or real-time voice; and (b) answers to a questionnaire.	07-14-2011
20110178803	DETECTING EMOTION IN VOICE SIGNALS IN A CALL CENTER - A computer system monitors a conversation between an agent and a customer. The system extracts a voice signal from the conversation and analyzes the voice signal to detect a voice characteristic of the customer. The system identifies an emotion corresponding to the voice characteristic and initiates an action based on the emotion. The action may include communicating the emotion to an emergency response team, or communicating feedback to a manager of the agent, as examples.	07-21-2011
20110196681	TRANSMISSION SYSTEM - The present invention provides a transmission system comprising a transmission apparatus for transmitting multi-channel audio data and auxiliary data required for playback of the audio data, and a receiving apparatus for receiving the audio data and the auxiliary data which are transmitted by the transmission apparatus. A multiplexer of the transmission apparatus creates block data that is composed of 8 frames; the first byte of each frame is allocated to a header having Sync, OE and the like, the second byte is allocated to the auxiliary data including AUX data and copyright protection information, and the remaining bytes are used to transmit the audio data. An encryptor carries out an encryption process for the second and later bytes of each frame, and a communication means outputs encrypted data. A communication means of the receiving apparatus receives the encrypted data from the transmission apparatus, a decoder decodes the encrypted data, and a demultiplexer demultiplexes the audio data and the auxiliary data. Therefore, this transmission system transmits multi-channel audio data of the DVD-Audio or the like efficiently on a transmission line using fixed length frames according to the MOST method, and takes measures for copyright protection of audio data.	08-11-2011
20110196682	Common Scene Based Conference System - Conference bridge (	08-11-2011
20110202348	RHYTHM PROCESSING AND FREQUENCY TRACKING IN GRADIENT FREQUENCY NONLINEAR OSCILLATOR NETWORKS - A method for mimicking the auditory system's response to rhythm of an input signal having a time varying structure comprising the steps of receiving a time varying input signal x(t) to a network of n nonlinear oscillators, each oscillator having a different natural frequency of oscillation and obeying a dynamical equation of the form	08-18-2011
20110202349	ESTABLISHING A MULTIMODAL ADVERTISING PERSONALITY FOR A SPONSOR OF A MULTIMODAL APPLICATION - Establishing a multimodal advertising personality for a sponsor of a multimodal application, including associating one or more vocal demeanors with a sponsor of a multimodal application and presenting a speech portion of the multimodal application for the sponsor using at least one of the vocal demeanors associated with the sponsor.	08-18-2011
20110213617	AUDIO SOURCE SYSTEM AND METHOD - A system includes a computer having a device driver. The device driver includes a detection module to detect an audio input. The device driver includes a selection module to send the audio input to audio hardware after detection of the audio input. The device driver also includes an emulation module to send hardware emulation information to an operating system audio application to replace feedback data received at the device driver from the audio hardware and sent from the device driver to the operating system audio application.	09-01-2011
20110224988	INTRACARDIAC ELECTROGRAM TIME FREQUENCY NOISE DETECTION - Systems, methods, and apparatus for identifying and classifying noise of an intracardiac electrogram of a cardiac rhythm management device to prevent inaccurate detection of a cardiac episode are disclosed. In an example, three channels are analyzed to identify and determine whether an episode or noise has been detected.	09-15-2011
20110238422	METHOD FOR SONIC DOCUMENT CLASSIFICATION - A method to identify and classify a document (	09-29-2011
20110238423	SONIC DOCUMENT CLASSIFICATION - An apparatus for classifying documents (	09-29-2011
20110246202	METHODS AND APPARATUS FOR AUDIO WATERMARKING A SUBSTANTIALLY SILENT MEDIA CONTENT PRESENTATION - Methods and apparatus for audio watermarking a substantially silent media content presentation are disclosed. An example method to audio watermark a media content presentation disclosed herein comprises obtaining a watermarked noise signal comprising a watermark and a noise signal having energy substantially concentrated in an audible frequency band, the watermarked noise signal attenuated to be substantially inaudible without combining with a separate audio signal, associating the watermarked noise signal with a substantially silent content component of the media content presentation, the media content presentation comprising one or more media content components, and outputting the watermarked noise signal during presentation of the substantially silent content component.	10-06-2011
20110246203	Dynamic Interactive Voice Interface - A dynamic voice user interface system is provided. The dynamic voice user interface system interacts with a user at a first level of formality. The voice user interface system then monitors history of user interaction and adjusts the voice user interface to interact with the user with a second level of formality based on the history of user interaction.	10-06-2011
20110251845	VOICE ACTIVITY DETECTOR, VOICE ACTIVITY DETECTION PROGRAM, AND PARAMETER ADJUSTING METHOD - Judgment result deriving means	10-13-2011
20110276333	Methods and Systems for Synchronizing Media - Systems and methods of synchronizing media are provided. A client device may be used to capture a sample of a media stream being rendered by a media rendering source. The client device sends the sample to a position identification module to determine a time offset indicating a position in the media stream corresponding to the sampling time of the sample, and optionally a timescale ratio indicating a speed at which the media stream is being rendered by the media rendering source based on a reference speed of the media stream. The client device calculates a real-time offset using a present time, a timestamp of the media sample, the time offset, and optionally the timescale ratio. The client device then renders a second media stream at a position corresponding to the real-time offset to be in synchrony to the media stream being rendered by the media rendering source.	11-10-2011
20110276334	Methods and Systems for Synchronizing Media - Systems and methods of synchronizing media are provided. A client device may be used to capture a sample of a media stream being rendered by a media rendering source. The client device sends the sample to a position identification module to determine a time offset indicating a position in the media stream corresponding to the sampling time of the sample, and optionally a timescale ratio indicating a speed at which the media stream is being rendered by the media rendering source based on a reference speed of the media stream. The client device calculates a real-time offset using a present time, a timestamp of the media sample, the time offset, and optionally the timescale ratio. The client device then renders a second media stream at a position corresponding to the real-time offset to be in synchrony to the media stream being rendered by the media rendering source.	11-10-2011
20110282669	Estimating a Listener's Ability To Understand a Speaker, Based on Comparisons of Their Styles of Speech - An automated telecommunication system adjunct is described that “listens” to one or more participants' styles of speech, identifies specific characteristics that represent differences in their styles, notably the accent, but also one or more of pronunciation accuracy, speed, pitch, cadence, intonation, co-articulation, syllable emphasis, and syllable duration, and utilizes, for example, a mathematical model in which the independent measurable components of speech that can affect understandability by that specific listener are weighted appropriately and then combined into a single overall score that indicates the estimated ease with which the listener can understand what is being said, and presents real-time feedback to speakers based on the score. In addition, the system can provide recommendations to the speaker as to how to improve understandability.	11-17-2011
20110282670	System for Dynamic AD Selection and Placement Within a Voice Application Accessed Through an Electronic Information Page - A system for dynamic advertisement selection and presentment within a speech application is provided. The system includes a user operable network browsing interface in communication with a server on a data network; at least one voice link to a voice application interface, the link or links accessible to the user working within the browsing interface; a pool of at least one advertisement for presentment; and a selection engine accessible to the voice application interface for receiving criteria originated from the server for advertisement ranking and for selecting an advertisement from the pool of at least one advertisement for placement based on the received criteria.	11-17-2011
20110295607	System and Method for Recognizing Emotional State from a Speech Signal - A computerized method, software, and system for recognizing emotions from a speech signal, wherein statistical and MFCC features are extracted from the speech signal, the MFCC features are sorted to provide a basis for comparison between the speech signal and reference samples, the statistical and MFCC features are compared between the speech signal and reference samples, a scoring system is used to compare relative correlation to different emotions, a probable emotional state is assigned to the speech signal based on the scoring system, and the probable emotional state is communicated to a user.	12-01-2011
20110301956	Information Processing Apparatus, Information Processing Method, and Program - An information processing apparatus includes an image analysis unit that executes a process for analyzing an image captured by a camera, a speech analysis unit that executes a process for analyzing speech input from a microphone, and a data processing unit that receives a result of the analysis conducted by the image analysis unit and a result of the analysis conducted by the speech analysis unit and that executes output control of help information for a user. The data processing unit calculates a degree of difficulty of the user on the basis of at least either the result of the image analysis or the result of the speech analysis and, if the degree of difficulty that has been calculated is equal to or more than a predetermined threshold value, executes a process for outputting help information to the user.	12-08-2011
20110313773	SEARCH APPARATUS, SEARCH METHOD, AND PROGRAM - A search apparatus includes a sound recognition unit which recognizes input sound, a user information estimation unit which estimates at least one of a physical condition and emotional demeanor of a speaker of the input sound based on the input sound and outputs user information representing the estimation result, a matching unit which performs matching between a search result target pronunciation symbol string and a recognition result pronunciation symbol string for each of plural search result target word strings, and a generation unit which generates a search result word string as a search result for a word string corresponding to the input sound from the plural search result target word strings based on the matching result. At least one of the matching unit and the generation unit changes processing in accordance with the user information.	12-22-2011
20110320208	Page identification method for audio book - A page identification method for an audio book with a main housing, a plurality of pages, a plurality of light blocking panels, an audio record and playback electronic circuit including microphone, speaker, power source, record switch and playback switch, a microprocessor and a plurality of light sensing devices. The top surface of the main body has a plurality of apertures. Each light sensing device is located directly under a main body aperture. Each page has one or more apertures that are aligned with at least one of the main body apertures. A light blocking panel is interleaved between each pair of pages so that when the page is turned by the user the light blocking panel will slide over to cover or uncover the page aperture, causing the light sensing devices to send a signal to the microprocessor that tells the audio circuit which message to play for each page.	12-29-2011
20120004915	CONVERSATIONAL SPEECH ANALYSIS METHOD, AND CONVERSATIONAL SPEECH ANALYZER - The invention provides a conversational speech analyzer which analyzes whether utterances in a meeting are of interest or concern. Frames are calculated using sound signals obtained from a microphone and a sensor, sensor signals are cut out for each frame, and by calculating the correlation between sensor signals for each frame, an interest level which represents the concern of an audience regarding utterances is calculated, and the meeting is analyzed.	01-05-2012
20120004916	SPEECH SIGNAL PROCESSING DEVICE - A speech signal processing device is equipped with a power acquisition unit, a probability distribution acquisition unit, and a correspondence degree determination unit. The power acquisition unit accepts an inputted speech signal and, based on the accepted speech signal, acquires power representing the intensity of a speech sound represented by the speech signal. The probability distribution acquisition unit acquires a probability distribution using the intensity of the power acquired by the power acquisition unit as a random variable. The correspondence degree determination unit determines whether a correspondence degree representing a degree that power acquired by the power acquisition unit in a case that a predetermined reference speech signal is inputted into the power acquisition unit corresponds with predetermined reference power is higher than a predetermined reference correspondence degree, based on the probability distribution acquired by the probability distribution acquisition unit.	01-05-2012
20120010889	VOICE INTERACTION METHOD OF MOBILE TERMINAL BASED ON VOICEXML AND MOBILE TERMINAL - The present invention discloses a voice interaction method of a mobile terminal based on VoiceXML and a mobile terminal, which comprises: converting received voice information into a VoiceXML document, parsing the VoiceXML document according to a preset VoiceXML document framework, searching the information of the function which needs to be realized by the voice information corresponding to the VoiceXML document; mapping found function information to the function corresponding to the particular function of the man-machine interface, and informing the mapped function to the man-machine interface; performing VoiceXML response document conversion on the response information from the man-machine interface, and playing the conversion result via a corresponding voice information. According to the technical solution of the present invention, the advanced intelligence and complex voice interaction can be realized, and the transportability of voice interaction is improved.	01-12-2012
20120016676	SYSTEM AND METHOD FOR WRITING DIGITS IN WORDS AND PRONUNCIATION OF NUMBERS, FRACTIONS, AND UNITS - Disclosed is a system and method for converting a digital number to text and for pronouncing the digital number. The system includes a filtration system for determining whether the digital number has nonnumeric symbols and for generating a filtrated number, an analyzing system for analyzing the filtrated number, a composition system configured to collect words associated with ternary units of the filtrated number, a linking system configured to link the words, and a pronouncing system for pronouncing the linked words.	01-19-2012
20120016677	Method and device for audio signal classification - The present invention discloses a method and a device for audio signal classification, and relates to the field of communications technologies, which solve a problem of high complexity of type classification of audio signals in the prior art. In the present invention, after an audio signal to be classified is received, a tonal characteristic parameter of the audio signal to be classified, where the tonal characteristic parameter of the audio signal to be classified is in at least one sub-band, is obtained, and a type of the audio signal to be classified is determined according to the obtained characteristic parameter. The present invention is mainly applied to an audio signal classification scenario, and implements audio signal classification through a relatively simple method.	01-19-2012
20120059658	METHODS AND APPARATUS FOR PERFORMING AN INTERNET SEARCH - Embodiments of the present invention relate to searching for content on the Internet. A user may supply a search query to a device, and the device may issue the search query to a plurality of search engines, including at least one general purpose search engine and at least one site-specific search engine. In this way, the user need not separately issue search queries to each of the plurality of search engines.	03-08-2012
20120065981	TEXT PRESENTATION APPARATUS, TEXT PRESENTATION METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a text presentation apparatus presenting text for a speaker to read aloud for voice recording includes: a text storing unit for storing first text; a presenting unit for presenting the first text; a determination unit for determining whether or not the first text needs to be replaced, on the basis of a speaker's input for the first text presented; a preliminary text storing unit for storing preliminary text; a select unit configured to select, if it is determined that the first text needs to be replaced, second text to replace the first text from among the preliminary text, the selecting being performed on the basis of attribute information describing an attribute of the first text and on the basis of at least one of attribute information describing pronunciation of the first text and attribute information describing a stress type of the first text; and a control unit configured to control the presenting unit so that the presenting unit presents the second text.	03-15-2012
20120078634	VOICE DIALOGUE SYSTEM, METHOD, AND PROGRAM - A voice dialogue system executing an operation by a voice dialogue with a user, includes a history storage unit storing an operation name of the operation executed by the voice dialogue system and an operation history corresponding to a number of execution times of the executed operation; a voice storage unit storing voice data corresponding to the operation name; a detection unit detecting a voice skip signal indicating skipping a user's voice input; an acquisition unit acquiring the operation name of the operation having a high priority based on the number of execution times from said history storage unit, when said detection unit detects the voice skip signal; and a generation unit reading the voice data corresponding to the acquired operation name from said voice storage unit, and generating a voice signal corresponding to the read voice data.	03-29-2012
20120089403	Postal Processing Including Voice Feedback - System, methods, and computer-readable media. A method includes receiving a voice input, from an operator, corresponding to a mail item. The method includes performing a voice recognition process on the voice input to produce spoken data, and producing a system result corresponding to the spoken data. The method includes analyzing the system result to produce feedback information, and audibly sounding the feedback information to the operator.	04-12-2012
20120109657	Narrative Voice Files for GPS Devices - The invention consists of a compilation of unified audio tour files in a compressed format, i.e., MP3 or MP4, that provides pre-recorded spoken commentary to Global Positioning System (GPS) enabled devices. Using satellite technology, audio is triggered based on a user's location, providing relevant facts, geography, points of interest, history, and trivia of every city/town/area as it is being traveled throughout the World. These audio tour files will be provided in multiple languages. Upgrades shall be available via the Internet. The invention will narrate the entire World beginning with the large metropolitan areas of the United States of America, through to the smallest towns in Malta.	05-03-2012
20120109658	VOICE-CONTROLLED POWER DEVICE - A voice-controlled power device includes a voice-controlled circuit which includes a power module for getting an AC input voltage, a switch module having an electromagnetic relay and a driving circuit, and a control module having a microcontroller, a power unit regulating an output voltage of the power module and providing a work voltage for the microcontroller, a voice-detecting unit receiving voice signals and transforming the voice signals into electric signals, and a voice-amplifier unit amplifying the electric signals. The microcontroller receives and analyzes the amplified electric signals and sends out control signals to drive the driving circuit to control switch states of the electromagnetic relay and further control whether there is power output to an external electric appliance or not. The switch states of the electromagnetic relay depend on whether the output voltage of the power module is provided thereon or not under the control of the driving circuit.	05-03-2012
20120116770	SPEECH DATA RETRIEVING AND PRESENTING DEVICE - A speech data retrieving and presenting device applied with an electronic device through a network includes a data receiving unit, a processing unit and a speech presenting unit. The data receiving unit connected to the network receives data of the electronic device through the network. The processing unit coupled to the data receiving unit receives speech data and retrieves a speech presenting signal from the speech data. The speech presenting unit coupled to the processing unit receives the speech presenting signal and outputs a speech according to the speech data. This device can assist a user to obtain network information, and provide the user a more flexible application according to the property that the device can be operated independently by a simple motion.	05-10-2012
20120116771	Method and apparatus for serching a music database - A method for a user to buy a song from a remote music source, the method comprising the steps of:	05-10-2012
20120116772	Method and System for Providing Speech Therapy Outside of Clinic - A system and method for speech therapy is provided that includes a mobile device, a server and a web-client. The mobile device captures and processes voice signals analyzed locally and on the server and from which a speech therapy is coordinated and delivered. The web-client through interaction with the mobile device and through the server implements a speech therapy that can be monitored and managed thereon through specified clinical moderation. The web-client also provides an alternative method to capture and transmit voice signals to the server for analysis and from which a speech therapy is coordinated and delivered. Speech therapy management can implement therapy procedures, guidelines and one-to-one communication sessions between users and providers in a non-clinical setting in real-time or at scheduled times. Other embodiments are disclosed.	05-10-2012
20120116773	CONTENT FILTERING FOR A DIGITAL AUDIO SIGNAL - According to some embodiments, content filtering is provided for a digital audio signal.	05-10-2012
20120116774	SYSTEM FOR VOICE CONTROL OF A MEDICAL IMPLANT - An implantable system (	05-10-2012
20120123783	SYSTEMS AND METHODS FOR EDITING TELECOM WEB APPLICATIONS THROUGH A VOICE INTERFACE - Systems and associated methods for editing telecom web applications through a voice interface are described. Systems and methods provide for editing telecom web applications over a connection, as for example accessed via a standard phone, using speech and/or DTMF inputs. The voice based editing includes exposing an editing interface to a user for a telecom web application that is editable, dynamically generating a voice-based interface for a given user for accomplishing editing tasks, and modifying the telecom web application to reflect the editing commands entered by the user.	05-17-2012
20120123784	SEQUENCED MULTI-MEANING TACTILE SYMBOLS USEABLE TO PRODUCE SYNTHETIC PLURAL WORD MESSAGES INCLUDING WORDS, PHRASES AND SENTENCES - An embodiment of the present application is directed to a method including providing a keyboard, including a plurality of keys, at least some of the keys including polysemous symbols which provide distinctive tactile feedback to a user; and accessing a word, phoneme or plural word message, based upon sequentially selected ones of the polysemous symbols providing distinctive tactile feedback. Another embodiment of the present application is directed to a system, including a keyboard, including a plurality of keys, at least some of the keys including polysemous symbols which provide distinctive tactile feedback to a user; and a processor to access a word, phoneme or plural word message, based upon sequentially selected ones of the polysemous symbols providing distinctive tactile feedback.	05-17-2012
20120123785	VOICE APPLICATION NETWORK PLATFORM - A distributed voice applications system includes a voice applications rendering agent and at least one voice applications agent that is configured to provide voice applications to an individual user. A management system may control and direct the voice applications rendering agent to create voice applications that are personalized for individual users based on user characteristics, information about the environment in which the voice applications will be performed, prior user interactions and other information. The voice applications agent and components of customized voice applications may be resident on a local user device which includes a voice browser and speech recognition capabilities. The local device, voice applications rendering agent and management system may be interconnected via a communications network.	05-17-2012
20120150545	BRAIN-COMPUTER INTERFACE TEST BATTERY FOR THE PHYSIOLOGICAL ASSESSMENT OF NERVOUS SYSTEM HEALTH - A battery of three or more sensory and cognitive challenge tasks actively or dynamically challenge the brain to monitor its state for assessment of injury, disease, or compound effect, among others. The system analyzes and assesses a personalized biometric brain health signature by integrating the use of electroencephalography (EEG), somato-sensory, neuropsychological, and/or cognitive stimulation, and novel signal processing and display. The system also provides for early detection of dementia, including Alzheimer's disease (AD), vascular dementia (VAD), mixed dementia (AD and VAD), MCI, and other dementia-type disorders, as well as brain injury states such as mild Traumatic Brain Injury and can provide some or all of the following improvements over conventional systems and methods, including: (1) Increased sensitivity, specificity, and overall accuracy; (2) early detection of disease and injury; and (3) enhanced portability with remote data acquisition capability.	06-14-2012
20120166200	SYSTEM AND METHOD FOR INTEGRATING GESTURE AND SOUND FOR CONTROLLING DEVICE - Disclosed is a system for integrating gestures and sounds including: a gesture recognition unit that extracts gesture feature information corresponding to user commands from image information and acquires gesture recognition information from the gesture feature information; a background recognition unit acquiring background sound information using the predetermined background sound model from the sound information; a sound recognition unit that extracts the sound feature information corresponding to user commands from the sound information and extracts the sound feature information based on the background sound information and acquires the sound recognition information from the sound feature information; and an integration unit that generates integration information by integrating the gesture recognition information and the sound recognition information.	06-28-2012
20120179469	CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.	07-12-2012
20120179470	SIMULTANEOUS VOICE AND DATA SYSTEMS FOR SECURE CATALOG ORDERS - Systems and methods for providing a simultaneous voice and data user interface for secure catalog orders and in particular for providing a system and method for providing a distributed voice user interface for a remote device having a limited visual user interface simultaneously with a data stream for facilitating secure automated catalog orders for simultaneous electronic fulfillment applied to that device are described.	07-12-2012
20120185254	INTERACTIVE FIGURINE IN A COMMUNICATIONS SYSTEM INCORPORATING SELECTIVE CONTENT DELIVERY - In a system, an interactive figurine delivers messages to a user in one of a number of forms. A server operation system includes processing capability which may individually couple content or may customize messages to a particular user of the interactive figurines. The interactive figurine contains an embedded circuit consisting of a receiver comprising a detector circuit tuned to at least one preselected frequency, a decoder to provide information indicative of intelligence and signals sent to the receiver, and a decoder circuit to provide actionable output signals indicative of information transmitted to the receiver. The server operation system may include a subscriber database and administration routines for customizing of messages and for directing messages. A user station intermediate the interactive figurine and the server module may be used to provide parental control or other control.	07-19-2012
20120191458	HUMAN-MACHINE DIALOG SYSTEM - The invention relates to a human-machine dialog system comprising:	07-26-2012
20120191459	SKIPPING RADIO/TELEVISION PROGRAM SEGMENTS - Techniques for notifying at least one entity of an occurrence of an event in an audio signal are provided. At least one preference is obtained from the at least one entity. An occurrence of an event in the audio signal is determined. The event is related to at least one of at least one speaker and at least one topic. The at least one entity is notified of the occurrence of the event in the audio signal, in accordance with the at least one preference.	07-26-2012
20120203555	DEVICES FOR ENCODING AND DECODING A WATERMARKED SIGNAL - An electronic device configured for encoding a watermarked signal is described. The electronic device includes modeler circuitry. The modeler circuitry determines parameters based on a first signal and a first-pass coded signal. The electronic device also includes coder circuitry coupled to the modeler circuitry. The coder circuitry performs a first-pass coding on a second signal to obtain the first-pass coded signal and performs a second-pass coding based on the parameters to obtain a watermarked signal.	08-09-2012
20120203556	DEVICES FOR ENCODING AND DETECTING A WATERMARKED SIGNAL - A method for decoding a signal on an electronic device is described. The method includes receiving a signal. The method also includes extracting a bitstream from the signal. The method further includes performing watermark error checking on the bitstream for multiple frames. The method additionally includes determining whether watermark data is detected based on the watermark error checking. The method also includes decoding the bitstream to obtain a decoded second signal if the watermark data is not detected.	08-09-2012
20120209612	Extraction and Matching of Characteristic Fingerprints from Audio Signals - An audio fingerprint is extracted from an audio sample, where the fingerprint contains information that is characteristic of the content in the sample. The fingerprint may be generated by computing an energy spectrum for the audio sample, resampling the energy spectrum, transforming the resampled energy spectrum to produce a series of feature vectors, and computing the fingerprint using differential coding of the feature vectors. The generated fingerprint can be compared to a set of reference fingerprints in a database to identify the original audio content.	08-16-2012
20120215541	SIGNAL PROCESSING METHOD, DEVICE, AND SYSTEM - A signal identifying method includes obtaining signal characteristics of a current frame of input signals; deciding, according to the signal characteristics of the current frame and updated signal characteristics of a background signal frame before the current frame, whether the current frame is a background signal frame; detecting whether the current frame serving as a background signal frame is in a first type signal state; and adjusting a signal classification decision threshold according to whether the current frame serving as a background signal frame is in the first type signal state to enhance the speech signal identification capability.	08-23-2012
20120253817	Mobile speech attendant access - A system and method for connecting to a telephone extension listed in a telephone number database is disclosed. The method comprises recording an audio token on a mobile communication device. The audio token is associated with a telephone number included in the database. The audio token is transmitted from the mobile communication device to a server over a digital channel. The telephone number in the database that is associated with the audio token is selected using speech recognition. The mobile communication device is then connected with the telephone number.	10-04-2012
20120253818	INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - There is provided an information processing apparatus including an operation information transmitting unit transmitting operation information for operating respective appliances out of a plurality of appliances connected via a network, a character processing unit carrying out processing relating to characters, which correspond to the respective appliances and have individual personalities, and changes a content represented by the characters in accordance with the operation information for operating the appliances, and a display processing unit carrying out processing that displays the characters on a display unit.	10-04-2012
20120253819	LOCATION DETERMINATION SYSTEM AND MOBILE TERMINAL - A location determination system includes a first mobile terminal and a second mobile terminal. The first mobile terminal includes a first processor to acquire a first sound signal, analyze the first sound signal to obtain a first analysis result, and transmit the first analysis result. The second mobile terminal includes a second processor to acquire a second sound signal, analyze the second sound signal to obtain a second analysis result, receive the first analysis result from the first mobile terminal, compare the second analysis result with the first analysis result to obtain a comparison result, and determine whether the first mobile terminal is located in an area in which the second mobile terminal is located, based on the comparison result.	10-04-2012
20120259638	APPARATUS AND METHOD FOR DETERMINING RELEVANCE OF INPUT SPEECH - Audio or visual orientation cues can be used to determine the relevance of input speech. The presence of a user's face may be identified during speech during an interval of time. One or more facial orientation characteristics associated with the user's face during the interval of time may be determined. In some cases, orientation characteristics for input sound can be determined. A relevance of the user's speech during the interval of time may be characterized based on the one or more orientation characteristics.	10-11-2012
20120259639	CONTROLLING AUDIO VIDEO DISPLAY DEVICE (AVDD) TUNING USING CHANNEL NAME - A television, or other device with television tuner, can be controlled to directly tune to a specific channel name, such as a broadcaster's station name, by using EPG metadata to provide a correlation between a channel number and channel name.	10-11-2012
20120259640	VOICE CONTROL DEVICE AND VOICE CONTROL METHOD - A voice control unit controlling and outputting a first voice signal includes an analysis unit configured to calculate an average value of a gradient of spectrum at a high frequency of an inputted second voice signal as a voice characteristic, a determination unit configured to determine an amplification band and an amplification amount of a spectrum of the first voice signal based on the gradient, and an amplification unit configured to amplify the spectrum of the first voice signal to realize the determined amplification band and the determined amplification amount.	10-11-2012
20120265535	PERSONAL VOICE OPERATED REMINDER SYSTEM - A personal voice operated reminder system. In one embodiment, the system is worn as a device on the body in a form similar to a watch, bracelet or necklace. In another embodiment the system is a device normally held in a person's pocket or purse, and in another embodiment the system is a method added as an application to already existing devices such as PDAs or cellular telephones.	10-18-2012
20120265536	APPARATUS AND METHOD FOR PROCESSING VOICE COMMAND - Disclosed is a technique for processing voice commands. In particular, the disclosed technique increases a voice recognition rate without performing a process of inputting separate voice commands by updating a voice command table based on interaction with a user by storing similar commands input by the user once those commands have been confirmed by the user as similar commands.	10-18-2012
20120271636	VOICE INPUT DEVICE - A voice input device includes: a mastery level identifying device identifying a mastery level of a user with respect to voice input; and an input mode setting device switching a voice input mode between a guided input mode and an unguided input mode. In the guided input mode, preliminary registered contents of the voice input are presented to the user. The input mode setting device sets the voice input mode to the unguided input mode at a starting time when the voice input device starts to receive the voice input. The input mode setting device switches the voice input mode from the unguided input mode to the guided input mode at a switching time. The input mode setting device sets a time interval between the starting time and the switching time in proportion to the mastery level.	10-25-2012
20120271637	DERIVING GEOGRAPHIC DISTRIBUTION OF PHYSIOLOGICAL OR PSYCHOLOGICAL CONDITIONS OF HUMAN SPEAKERS WHILE PRESERVING PERSONAL PRIVACY - A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.	10-25-2012
20120284029	PHOTO-REALISTIC SYNTHESIS OF IMAGE SEQUENCES WITH LIP MOVEMENTS SYNCHRONIZED WITH SPEECH - Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.	11-08-2012
20120316882	SYSTEM FOR GENERATING CAPTIONS FOR LIVE VIDEO BROADCASTS - An adaptive workflow system can be used to implement captioning projects, such as projects for creating captions or subtitles for live and non-live broadcasts. Workers can repeat words spoken during a broadcast program or other program into a voice recognition system, which outputs text that may be used as captions or subtitles. The process of workers repeating these words to create such text can be referred to as respeaking. Respeaking can be used as an effective alternative to more expensive and hard-to-find stenographers for generating captions and subtitles.	12-13-2012
20120316883	SYSTEM AND METHOD FOR DETERMINING A PERSONAL SHG PROFILE BY VOICE ANALYSIS - According to one embodiment of the present invention a computerized voice-analysis device for determining an S, H, G profile is provided (as described herein, such an S, H, G profile relates to the strengths (e.g., relative strengths) of three human instinctive drives). Of note, the present invention may be used for one or more of the following: analyzing a previously recorded voice sample; real-time analysis of voice as it is being spoken; combination voice analysis—that is, a combination of: (a) previously recorded and/or real-time voice; and (b) answers to a questionnaire.	12-13-2012
20120323579	DYNAMIC ACCESS TO EXTERNAL MEDIA CONTENT BASED ON SPEAKER CONTENT - An audio conference is supplemented based on speaker content. Speaker content from at least one audio conference participant is monitored using a computer with a tangible non-transitory processor and memory. A set of words is selected from the speaker content. The selected set of words is determined to be associated with supplemental media content from at least one external source. The supplemental media content is made available to at least one audience member for the audio conference. The supplemental media content is selectively presented to the at least one audience member.	12-20-2012
20130013315	Multisensory Speech Detection - A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.	01-10-2013
20130013316	Multisensory Speech Detection - A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.	01-10-2013
20130024199	Method and Apparatus for Sharing Information Using a Handheld Device - A method and apparatus for sending information to a data processing apparatus for identifying a document to share with a recipient. A handheld device is capable of communicating with the data processing apparatus. Information is captured from the document and stored in the handheld device as document data. A communications path is established between the handheld device and the data processing apparatus. The document data is sent to the data processing apparatus through the communications path. Reference documents are provided. Each reference document has reference data stored in a memory. At least a portion of the received document data is extracted as scanning data. The reference data is retrieved from the memory. The scanning data is compared with the reference data. When the scanning data matches at least a portion of the reference data of one of the reference documents, the one reference document is selected as the identified document for forwarding to the recipient.	01-24-2013
20130030812	APPARATUS AND METHOD FOR GENERATING EMOTION INFORMATION, AND FUNCTION RECOMMENDATION APPARATUS BASED ON EMOTION INFORMATION - Provided is an emotion information generating apparatus that is capable of recognizing a user's emotional state for each function of a terminal. The emotion information generating apparatus detects a user's emotional state and maps the user's emotional state to a function of the terminal, thus creating emotion information.	01-31-2013
20130030813	QUALITY OF USER GENERATED AUDIO CONTENT IN VOICE APPLICATIONS - Methods and arrangements for improving quality of content in voice applications. A specification is provided for acceptable content for a voice application, and user generated audio content for the voice application is inputted. At least one test is applied to the user generated audio content, and it is thereupon determined as to whether the user generated audio content meets the provided specification.	01-31-2013
20130046542	Periodic Ambient Waveform Analysis for Enhanced Social Functions - Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.	02-21-2013
20130096922	METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR DETERMINING THE LOCATION OF A PLURALITY OF SPEECH SOURCES - The present invention discloses a method, apparatus and computer program product for determining the location of a plurality of speech sources in an area of interest, comprising performing an algorithm on a signal issued by either one of said plurality of speech sources in the area to iteratively recover data characteristic to said signal, wherein the algorithm is an iterative model-based sparse recovery algorithm, and wherein for each of a plurality of points in said area, the iteratively recovered data is indicative of a presence of a plurality of speech sources contributing to the signal received at each of a plurality of points in the area.	04-18-2013
20130124206	VIDEO GENERATION BASED ON TEXT - Techniques for generating a video sequence of a person based on a text sequence are disclosed herein. Based on the received text sequence, a processing device generates the video sequence of a person to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence. The emotional expressions in the visual portion of the video sequence are simulated based on a priori knowledge about the person. For instance, the a priori knowledge can include photos or videos of the person captured in real life.	05-16-2013
20130132088	APPARATUS AND METHOD FOR RECOGNIZING EMOTION BASED ON EMOTIONAL SEGMENTS - An apparatus and method to recognize a user's emotion based on emotional segments are provided. An emotion recognition apparatus includes a sampling unit configured to extract sampling data from input data for emotion recognition. The emotion recognition apparatus further includes a data segment creator configured to segment the sampling data into a plurality of data segments. The emotion recognition apparatus further includes an emotional segment creator configured to create a plurality of emotional segments that include a plurality of emotions corresponding to each of the respective data segments.	05-23-2013
20130132089	CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture. An indication of the availability of the remote speech recognition to perform speech recognition at a point in time may be provided to a user of the client device via a user interface of the client device.	05-23-2013
20130138442	Systems and Methods for Recognizing Sound and Music Signals in High Noise and Distortion - A method for recognizing an audio sample locates an audio file that closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample.	05-30-2013
20130144626	RAP MUSIC GENERATION - The preferred embodiments of this invention convert ordinary human speech into rap music. Computer programs change the timing intervals, amplitudes, and/or frequencies of the sound signals of ordinary speech to follow rap music beats. The resulting rap music can also be overlaid with background music and/or video images to achieve better effects.	06-06-2013
20130144627	VOICE CONTROL CIRCUIT FOR STARTING ELECTRONIC DEVICES - A control circuit employed in an electronic device includes a microphone, a level conversion circuit, and a voice processing circuit. The voice processing circuit includes a voice operated switch connected between the microphone and the level conversion circuit. The microphone picks up voice commands. The voice operated switch receives the voice commands from the microphone and outputs a high voltage signal when a volume of the voice commands is greater than or equal to a predetermined volume threshold or is within a predetermined volume range. The level conversion circuit converts the high voltage signal into a low voltage signal for turning on the electronic device.	06-06-2013
20130151257	APPARATUS AND METHOD FOR PROVIDING EMOTIONAL CONTEXT TO TEXTUAL ELECTRONIC COMMUNICATION - An apparatus and method for including emotional context in textual electronic communication transmissions. The emotional context is conveyed symbolically through standardized alterations in the manner in which the text is displayed without the inclusion of additional graphics, thereby increasing the communicative value of textual electronic communication. An important advantage of this method of embedding emotional context is that the recipient is made aware of the mental and emotional state of the originator while the textual electronic message is being received and interpreted and therefore is able to interpret the message in light of the emotional context.	06-13-2013
20130151258	Context Based Online Advertising - A software and/or hardware facility for inferring user context and delivering advertisements, such as coupons, using natural language and/or sentiment analysis is disclosed. The facility may infer context information based on a user's emotional state, attitude, needs, or intent from the user's interaction with or through a mobile device. The facility may then determine whether it is appropriate to deliver an advertisement to the user and select an advertisement for delivery. The facility may also determine an appropriate expiration time and/or discount amount for the advertisement.	06-13-2013
20130179171	SYSTEM AND METHOD FOR MULTI LEVEL TRANSCRIPT QUALITY CHECKING - Methods and systems for multi level quality checking of transcripts are disclosed. The method includes the steps of searching subsets of metadata associated with the transcripts, identifying a group of transcripts having at least one particular subset of metadata, selecting a number of transcripts from the group of identified transcripts corresponding to a predetermined percentage, identifying a group of correctionists having a proper set of characteristics to correct the selected transcripts by matching the identified subsets of metadata associated with the transcripts with characteristics of correctionists, providing the transcripts and any voice files from which the transcripts derive to the selected correctionists, and, following correction, updating the subsets of metadata associated with the transcripts to include subsets of metadata pertaining to the voice files from which the transcripts were derived, any transcriptionist who transcribed the transcripts, or any correctionist who corrected the transcripts.	07-11-2013
20130185076	MOTION ANALYZER, VOICE ACQUISITION APPARATUS, MOTION ANALYSIS SYSTEM, AND MOTION ANALYSIS METHOD - A motion analyzer includes a motion detection unit that detects motion of a part of a body of a subject, a speaking detection unit that detects speaking of the subject, and a determination unit that determines that the subject has performed predetermined motion when motion of a part of the body is detected by the motion detection unit and speaking of the subject is detected by the speaking detection unit.	07-18-2013
20130197913	EXTRACTION AND MATCHING OF CHARACTERISTIC FINGERPRINTS FROM AUDIO SIGNALS - An audio fingerprint is extracted from an audio sample, where the fingerprint contains information that is characteristic of the content in the sample. The fingerprint may be generated by computing an energy spectrum for the audio sample, resampling the energy spectrum logarithmically in the time dimension, transforming the resampled energy spectrum to produce a series of feature vectors, and computing the fingerprint using differential coding of the feature vectors. The generated fingerprint can be compared to a set of reference fingerprints in a database to identify the original audio content.	08-01-2013
20130204625	EXERCISE AND LEARNING CONCEPT USING AN EXERCISE AND LEARNING DEVICE FOR THE THERAPEUTIC TREATMENT OF PATIENTS IN THE MEDICAL DOMAIN - The invention relates to an exercise and learning concept and to a mobile exercise and learning device for the therapeutic treatment of patients, said concept being based on a network system. Said exercise and learning concept and exercise and learning device are used for the therapeutic treatment of patients in order to allow mobile and interactive learning. Said exercise and learning device comprises exercise and learning modules which are individually adapted to a patient who can also perform therapeutic exercises irrespective of the time and place.	08-08-2013
20130211840	SYSTEM AND METHOD FOR GENERATING AN ALTERNATIVE PRODUCT RECOMMENDATION - A method and system for automatically generating a naturally reading narrative product summary including assertions about a selected product. In one embodiment, the method includes the steps of determining at least one attribute associated with said specific product; selecting an alternative product based on said at least one attribute; and generating a naturally reading narrative including assertions about the specific product and a recommendation of the alternative product.	08-15-2013
20130218570	APPARATUS AND METHOD FOR CORRECTING SPEECH, AND NON-TRANSITORY COMPUTER READABLE MEDIUM THEREOF - According to one embodiment, in an apparatus for correcting a speech corresponding to a moving image, a separation unit separates at least one audio component from each audio frame of the speech. An estimation unit estimates a scene including a plurality of related image frames in the moving image, based on at least one of a feature of each image frame of the moving image and a feature of each audio frame. An analysis unit acquires attribute information of the plurality of image frames by analyzing each image frame. A correction unit determines a correction method of the audio component corresponding to the plurality of image frames, based on the attribute information, and corrects the audio component by the correction method.	08-22-2013
20130226585	Methods, Systems, and Products for Measuring Health - Methods, systems, and products measure health data related to a user. A spoken phrase is received and time-stamped. The user is identified from the spoken phrase. A window of time is determined from a semantic content of the spoken phrase. A sensor measurement is received and time-stamped. A difference in time between the time-stamped spoken phrase and the time-stamped sensor measurement is determined and compared to the window of time. When the difference in time is within the window of time, then the sensor measurement is associated with the user.	08-29-2013
20130231936	Computer-Implemented System And Method For Identifying And Masking Special Information Within Recorded Speech - A computer-implemented system and method for identifying and masking special information within recorded speech is provided. A field for entry of special information is identified. Movement of a pointer device along a trajectory towards the field is also identified. A correlation of the pointer device movement and entry of the special information is determined based on a location of the trajectory in relation to the field. A threshold is applied to the correlation. The special information is received as verbal speech. A recording of the special information is rendered unintelligible when the threshold is satisfied.	09-05-2013
20130282379	METHOD AND APPARATUS FOR ANALYZING ANIMAL VOCALIZATIONS, EXTRACTING IDENTIFICATION CHARACTERISTICS, AND USING DATABASES OF THESE CHARACTERISTICS FOR IDENTIFYING THE SPECIES OF VOCALIZING ANIMALS - A method for capturing and analyzing audio, in particular vocalizing animals, which uses the resulting analysis parameters to establish a database of identification characteristics for the vocalizations of known species. This database can then be compared against the parameters of unknown species to identify the species producing that vocalization type. The method uses a unique multi-stage method of analysis that includes first-stage analysis followed by segmentation of a vocalization into its structural components, such as Parts, Elements, and Sections. Further analysis of the individual Parts, Elements, Sections and other song structures produces a wide range of parameters which are then used to assign to a collection of identical, known species a diagnostic set of criteria. Subsequently, the vocalizations of unknown species can be similarly analyzed and the resulting parameters can be used to match the unknown data sample to the database of samples from a plurality of known species.	10-24-2013
20130297316	VOICE ENTRY OF SENSITIVE INFORMATION - A method, system, and computer program product for voice entry of information are provided in the illustrative embodiments. A conversion rule is applied to a voice input. An entry field input is generated, wherein the conversion rule allows the voice input to be distinct from the entry field input, and wherein the voice input obfuscates the entry field input. The entry field input is provided to an application, wherein the entry field is usable to populate a data entry field in the application.	11-07-2013
20130304475	SWITCHING BETWEEN ACOUSTIC PARAMETERS IN A CONVERTIBLE VEHICLE - A method of configuring an acoustics system of a convertible vehicle to receive speech from an occupant of the vehicle who is using hands-free technology. The position of the top of the convertible is first determined and, based upon whether the top is up or down, an audio reception configuration is selected. The audio reception configuration includes a set of tuning parameters and a microphone arrangement. The acoustics system is then configured based upon the determination of whether the top is up or down.	11-14-2013
20130304476	Audio User Interaction Recognition and Context Refinement - A system which performs social interaction analysis for a plurality of participants includes a processor. The processor is configured to determine a similarity between a first spatially filtered output and each of a plurality of second spatially filtered outputs. The processor is configured to determine the social interaction between the participants based on the similarities between the first spatially filtered output and each of the second spatially filtered outputs and display an output that is representative of the social interaction between the participants. The first spatially filtered output is received from a fixed microphone array, and the second spatially filtered outputs are received from a plurality of steerable microphone arrays each corresponding to a different participant.	11-14-2013
20130311190	METHOD AND APPARATUS OF SPEECH ANALYSIS FOR REAL-TIME MEASUREMENT OF STRESS, FATIGUE, AND UNCERTAINTY - The present invention utilizes speech analysis to provide real-time measurement of end-user stress, fatigue, and uncertainty in decision-making. The present invention monitors “technology-induced” stressors by increasing the inherent functionality of individual monitoring technologies, so as to perform multiple applications in a single setting. In addition to the continued use of speech recognition technology for computerized report transcription, the present invention simultaneously measures and analyzes occupational stress and fatigue in real-time, specific to the unique profile of each individual end-user and context of the task being performed. The derived user-specific stress/fatigue analytics may be used in the creation of a number of workflow and quality enhancing deliverables, including customizable intervention strategies for stress/fatigue reduction, creation of automated workflow templates, and targeted quality assurance and peer review.	11-21-2013
20130317825	DERIVING GEOGRAPHIC DISTRIBUTION OF PHYSIOLOGICAL OR PSYCHOLOGICAL CONDITIONS OF HUMAN SPEAKERS WHILE RESERVING PERSONAL PRIVACY - A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.	11-28-2013
20130346082	LOW-DIMENSIONAL STRUCTURE FROM HIGH-DIMENSIONAL DATA - Low-dimensional structure from high-dimensional data is described, for example, in the context of video foreground/background segmentation, speech signal background identification, document clustering and other applications where distortions in the observed data may exist. In various embodiments a first convex optimization process is used to find low-dimensional structure from observations such as video frames in a manner which is robust to distortions in the observations; a second convex optimization process is used for incremental observations, bringing computational efficiency whilst retaining robustness. In various embodiments error checks are made to decide when to move between the first and second optimization processes. In various examples, the second convex optimization process encourages similarity between the solution it produces and the solution of the first convex optimization process, for example, by using an objective function which is suitable for convex optimization.	12-26-2013
20140019139	BLOOD GLUCOSE METER WITH SIMPLIFIED PROGRAMMABLE VOICE FUNCTION - A blood glucose meter with a simplified programmable voice function, including: a microprocessor; a memory that is both programmable and re-programmable coupled to the microprocessor; and an audio output device coupled to the microprocessor and the memory; wherein a language algorithm and a plurality of language components specific to a language selected by a user are disposed within the memory; and wherein the language algorithm and the plurality of language components are utilized to provide an audio output through the audio output device in the language selected by the user. The language algorithm is operable for determining which language components are utilized to provide the audio output and in what order based on the language selected by the user. Optionally, the audio output is generated by the microprocessor and the memory using a pulse-width modulation scheme and/or the like.	01-16-2014
20140025385	Method, Apparatus and Computer Program Product for Emotion Detection - In accordance with an example embodiment, a method and apparatus are provided. The method comprises determining a value of at least one speech element associated with an audio stream. The value of the at least one speech element is compared with at least one threshold value of the speech element. Processing of a video stream is initiated based on the comparison of the value of the at least one speech element with the at least one threshold value. The video stream is associated with the audio stream. An emotional state is determined based on the processing of the video stream.	01-23-2014
20140032218	DYNAMIC ADJUSTMENT OF TEXT INPUT SYSTEM COMPONENTS - Dynamic adjustment of text input system components is provided. An indication of user activity with respect to a text input system of an electronic device is received. One or more activity indicators are determined based on at least the user activity. One or more components of the text input system are identified, each component providing a typing assistance functionality to a user and being associated with a set of parameters. For each of the one or more components, a determination is made whether the component should be adjusted based on the one or more activity indicators, and the component is dynamically adjusted when it is determined that the component should be adjusted based on the one or more activity indicators. Dynamically adjusting the component includes at least one of activating the component, deactivating the component or adjusting the set of parameters associated with the component.	01-30-2014
20140046668	CONTROL METHOD AND VIDEO-AUDIO PLAYING SYSTEM - A control method for a video-audio playing system receiving a video-audio streaming signal is provided. The video-audio streaming signal includes at least one item of channel-program information. The control method comprises receiving a speech signal and analyzing the speech signal to obtain an acoustic feature of the speech signal. According to the acoustic feature, speech recognition is performed to determine which item of the channel-program information corresponds to the acoustic feature. According to the determined channel-program information, the video-audio playing system executes an operation corresponding to the channel-program information.	02-13-2014
20140052448	SYSTEM AND METHOD FOR RECOGNIZING EMOTIONAL STATE FROM A SPEECH SIGNAL - A computerized method, software, and system for recognizing emotions from a speech signal, wherein statistical and MFCC features are extracted from the speech signal, the MFCC features are sorted to provide a basis for comparison between the speech signal and reference samples, the statistical and MFCC features are compared between the speech signal and reference samples, a scoring system is used to compare relative correlation to different emotions, a probable emotional state is assigned to the speech signal based on the scoring system and the probable emotional state is communicated to a user.	02-20-2014
20140074479	Biometric-Music Interaction Methods and Systems - A system and method for the automatic, procedural generation of musical content in relation to biometric data. The systems and methods use a user's device, such as a cell phone to capture image data of a body part, and derive a biometric signal from the image data. The biometric signal includes biometric parameters, which are used by a music generation engine to generate music. The music generation can also be based on user-specific data and quality data related to the biometric detection process.	03-13-2014
20140081643	SYSTEM AND METHOD FOR DETERMINING EXPERTISE THROUGH SPEECH ANALYTICS - Systems, methods, and non-transitory computer-readable storage media for determining expertise through speech analytics. The system associates speakers with respective segments of an audio conversation to yield associated speaker segments. The system also identifies a number of times a speaker has spoken about a topic in the audio conversation by searching the associated speaker segments for a term associated with the topic. The system then ranks the speaker as an expert in the topic when the number of times the speaker has spoken about the topic in the audio conversation exceeds a threshold. The audio conversation can include a compilation of a plurality of audio conversations. Moreover, the system can tag the associated speaker segments having the term with keyword tags and match a respective segment from the associated speaker segments with the speaker, the respective segment having a keyword tag.	03-20-2014
20140095166	DEEP TAGGING BACKGROUND NOISES - In a method for deep tagging a recording, a computer records audio comprising speech from one or more people. The computer detects a non-speech sound within the audio. The computer determines that the non-speech sound corresponds to a type of sound, and in response, associates a descriptive term with a time of occurrence of the non-speech sound within the recorded audio to form a searchable tag. The computer stores the searchable tag as metadata of the recorded audio.	04-03-2014
20140108016	PICTURES FROM SKETCHES - A graphical sketch can be received, the sketch including one or more representations of text. A query can be automatically generated from the sketch. The generation of the query can include automatically recognizing the text and automatically representing the text in the query. The query can be run to identify a picture in response to the query, with the text describing one or more non-textual features of the picture. The picture can be returned, such as in response to the receipt of the graphical sketch.	04-17-2014
20140114664	Active Participant History in a Video Conferencing System - Embodiments of methods and systems for dominant speaker identification in video conferencing are described. In one embodiment, the computer-implemented method includes identifying one or more dominant speakers in a video conference. The method may also include generating a list of the one or more dominant speakers. Additionally, the method may include communicating the list of one or more dominant speakers to clients in a video conferencing system. In a further embodiment, the method includes communicating the list of the one or more dominant speakers to a client in response to the client joining the video conference.	04-24-2014
20140142947	Sound Rate Modification - Sound rate modification techniques are described. In one or more implementations, an indication is received of an amount that a rate of output of sound data is to be modified. One or more sound rate rules are applied to the sound data that, along with the received indication, are usable to calculate different rates at which different portions of the sound data are to be modified, respectively. The sound data is then output such that the calculated rates are applied.	05-22-2014
20140149120	SYSTEM AND METHOD FOR FINGERPRINTING DATASETS - Systems and methods for the matching of datasets, such as input audio segments, with known datasets in a database are disclosed. In an illustrative embodiment, the use of the presently disclosed systems and methods is described in conjunction with recognizing known network message recordings encountered during an outbound telephone call. The methodologies include creation of a ternary fingerprint bitmap to make the comparison process more efficient. Also disclosed are automated methodologies for creating the database of known datasets from a larger collection of datasets.	05-29-2014
20140172429	LOCAL RECOGNITION OF CONTENT - Systems, methods, and computer-readable storage media for facilitating local recognition of audio content at a user device. In some embodiments, the method includes capturing, using a user device, audio data, at least some of which is processable to recognize the audio data. Thereafter, an audio fingerprint that uniquely represents perceptual information associated with the audio data is generated, and a local data store within the user device is referenced. Such a local data store can include reference audio fingerprints. Upon referencing the local data store, a determination can be made as to whether the generated audio fingerprint matches a reference audio fingerprint at least to an extent.	06-19-2014
20140244264	HUMAN EMOTION ASSESSMENT REPORTING TECHNOLOGY- SYSTEM AND METHOD - The present disclosure describes a novel method of analyzing and presenting results of human emotion during a session such as chat, video, audio and combination thereof in real time. The analysis is done using semiotic analysis and hierarchical slope clustering to give feedback for a session or historical sessions to the user or any professional. The method and system is useful for recognizing reaction for a particular session or detection of abnormal behavior. The method and system with unique algorithm is useful in getting instant feedback to stay the course or change in strategy for a desired result during the session.	08-28-2014
20140244265	METHODS, APPARATUSES, AND COMPUTER PROGRAM PRODUCTS FOR PROVIDING BROADBAND AUDIO SIGNALS ASSOCIATED WITH NAVIGATION INSTRUCTIONS - Methods, apparatuses, and computer program products are herein provided for providing broadband audio signal navigation instructions. A method may include determining a broadband audio signal associated with at least one navigation instruction. The method may further include causing the broadband audio signal to be provided to the user. The broadband audio signal may be a representation of an audio signal with components in all frequencies capable of being at least one of perceived or broadcast by a speaker. Corresponding apparatuses and computer program products are also provided.	08-28-2014
20140249823	STATE ESTIMATING APPARATUS, STATE ESTIMATING METHOD, AND STATE ESTIMATING COMPUTER PROGRAM - A state estimating apparatus includes: a spectrum calculating unit which calculates a power spectrum for each of a plurality of frequencies on a frame-by-frame basis from a voice signal containing voice of a first speaker and voice of a second speaker transmitted over a telephone line; a band power calculating unit which calculates power of a non-transmission band on a frame-by-frame basis, based on the power spectra of frequencies contained in the non-transmission band among the plurality of frequencies; a transmitted-voice judging unit which determines that any frame whose power in the non-transmission band is greater than a threshold value indicating the presence of voice carries the voice of the first speaker; and a state judging unit which judges whether the state of mind of the first speaker is normal or abnormal, based on the frame judged to carry the voice of the first speaker.	09-04-2014
20140249824	Detecting a Physiological State Based on Speech - A computer-implemented method identifies a spoken audio signal representing speech of a person and estimates a physiological state of the person based on the spoken audio signal. For example, the method may identify articulatory patterns (such as landmarks) in the speech and estimate the person's physiological state based on those articulatory patterns. The method may estimate, for example, the amount of time the person has been without sleep. The method may produce the physiological state estimate without performing speech recognition on the spoken audio signal. The method may produce the physiological state estimate in real-time.	09-04-2014
20140257820	METHOD AND APPARATUS FOR REAL TIME EMOTION DETECTION IN AUDIO INTERACTIONS - The subject matter discloses a computerized method for real time emotion detection in audio interactions comprising: receiving at a computer server a portion of an audio interaction between a customer and an organization representative, the portion of the audio interaction comprises a speech signal; extracting feature vectors from the speech signal; obtaining a statistical model; producing adapted statistical data by adapting the statistical model according to the speech signal using the feature vectors extracted from the speech signal; obtaining an emotion classification model; and producing an emotion score based on the adapted statistical data and the emotion classification model, said emotion score represents the probability that the speaker that produced the speech signal is in an emotional state.	09-11-2014
20140303980	SYSTEM AND METHOD FOR AUDIO KYMOGRAPHIC DIAGNOSTICS - A system and method for assisting in a determination of one or more maladies associated with a human voice anatomy utilizes voice information acquired over at least two temporally displaced acquisitions. Acquired voice samples, including plural vowel sounds, are digitized and passed through one or more bandpass filters to isolate one or more frequency ranges. Curve fitting of acquired data is completed in accordance with a plurality of parameter weights applied in either a time domain or frequency domain model of the voice. This process is repeated a second, later time, for the same human, and the same process is completed for the subsequently-acquired voice information. A difference between the curve information in the respective data sets is analyzed relative to the weights, and corresponding changes are correlated to maladies of various areas of the human voice anatomy.	10-09-2014
20140303981	CROSS-LINGUAL SEEDING OF SENTIMENT - A contact center system can receive messages from social media sites or centers. The messages may be in a foreign language. The system can review messages by identifying content in the social media messages with negative/positive sentiment and then identify a seed term in the messages. A seed term can be a word in another language, different from the message body. The seed term is then used to find one or more other words, in the foreign language, that are correlated with the seed term. The identification of the found words in other messages can then be used to determine sentiment in the foreign language.	10-09-2014
20140303982	PHONETIC CONVERSATION METHOD AND DEVICE USING WIRED AND WIRELESS COMMUNICATION - A phonetic conversation method using wired and wireless communication networks includes: receiving, by a voice input unit of a phonetic conversation device, a voice that is input by a user; receiving, by a wired and wireless communication unit of the phonetic conversation device, a voice that is input through the voice input unit and transmitting the voice to a mobile terminal; receiving, by the wired and wireless communication unit, an answer voice that is transmitted from the mobile terminal; and receiving and outputting, by a voice output unit of the phonetic conversation device, a voice from the wired and wireless communication unit.	10-09-2014
20140309998	PREVENTION OF UNINTENDED DISTRIBUTION OF AUDIO INFORMATION - Preventing unintended distribution of audio information may comprise analyzing audio data of a speaker's speech received by a microphone; determining automatically by a processor, from the analyzing whether the speaker's speech is intended to be distributed to an audience via the microphone; and in response to determining that the speaker's speech is not intended to be distributed to the audience via the microphone, performing one or more actions.	10-16-2014
20140309999	PREVENTION OF UNINTENDED DISTRIBUTION OF AUDIO INFORMATION - Preventing unintended distribution of audio information may comprise analyzing audio data of a speaker's speech received by a microphone; determining automatically by a processor, from the analyzing whether the speaker's speech is intended to be distributed to an audience via the microphone; and in response to determining that the speaker's speech is not intended to be distributed to the audience via the microphone, performing one or more actions.	10-16-2014
20140316787	Systems and Methods for Recognizing Sound and Music Signals in High Noise and Distortion - A method for recognizing an audio sample locates an audio file that closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample.	10-23-2014
20140324439	CONTENT SHARING METHOD, APPARATUS AND ELECTRONIC DEVICE - A content sharing method, an apparatus and an electronic device, which belong to the field of computer technology, are provided. The method includes: collecting a voice signal by a microphone after current content is displayed; detecting whether the voice signal is a blowing signal; displaying a sharing page corresponding to the current content in the case that the voice signal is the blowing signal, the sharing page containing content to be shared; and sharing the content to be shared in the sharing page. According to the present method, a sharing page is displayed directly to guide a user to complete content sharing upon detecting that the voice signal collected by the microphone is the blowing signal, thus avoiding the inconvenience of multiple clicks by the user during content sharing and greatly reducing the time for sharing.	10-30-2014
20140337034	SYSTEM AND METHOD FOR ANALYSIS OF POWER RELATIONSHIPS AND INTERACTIONAL DOMINANCE IN A CONVERSATION BASED ON SPEECH PATTERNS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for evaluating dominance of participants in a conversation or interaction. An example system configured to practice the method first receives interaction data involving a plurality of participants, and can identify a type of interaction based on the interaction data. The system can parse the interaction data to identify dialog turns, and extract, from the interaction data and dialog turns, a plurality of participant features, wherein the plurality of participant features is selected based on the type of interaction. Then the system can generate, for each of the plurality of participants, a power index based on the respective participant features.	11-13-2014
20140358549	SYSTEM AND METHOD FOR CONVERSATIONAL CONFIGURATION OF APPLICATIONS - A configuration-file generation system for generating a configuration-file to configure an application for an enterprise is provided. The configuration-file generation system includes an IVR module for enabling a user to verbally interact with the configuration-file generation system. The configuration-file generation system further includes an analyzing module for analyzing and querying any information missed by the user. The configuration-file generation system further includes a suggestion module for searching and suggesting possible options corresponding to the missing information with the help of semantic web technology and with an experience database. The configuration-file generation system further includes a configuration-file generation module for generating a configuration-file based on available information received from the user. The configuration-file may then be passed to an application configuration module for configuring the application as required.	12-04-2014
20140372124	HUMAN-MACHINE DIALOGUE DEVICE - The invention relates to a human-machine dialogue device (	12-18-2014
20140379351	SPEECH DETECTION BASED UPON FACIAL MOVEMENTS - Apparatus, computer-readable storage medium, and method associated with speech communication, including determining whether a user is speaking, are described. In embodiments, a computing device may include a camera, a microphone, and a speech sensing module. The speech sensing module may be configured to determine whether a user of the computing device is speaking. This determination may be based upon mouth movements of the user detected through images captured by the camera. As a result of the determination, the microphone may be muted or unmuted. Other embodiments may be described and/or claimed.	12-25-2014
20150032455	COMPUTERIZED INFORMATION AND DISPLAY APPARATUS AND METHODS - Computerized apparatus useful for obtaining and presenting information to users. In one embodiment, the computerized apparatus includes a display device and speech recognition apparatus configured to receive user speech input and enable performance of various tasks, such as obtaining desired information relating to an entity, maps or directions, weather, news, or any number of other topics. In one variant, a plurality of data is received by the computerized apparatus from a remote server via a network interface; the received data can then be searched or otherwise utilized to provide the user with the desired information.	01-29-2015
20150046165	Talking Medicine Bottle and Label and System and Method for Manufacturing the Same - A talking medicine label, bottle, system and method for their manufacture are described. The system and method include use of a recording device by speaking into a microphone and then affixing the talking label to the side of a conventional pill bottle to transform it into a talking pill bottle. The system and method alternatively may include a PC/POS terminal and a speech synthesis device for programming the label with a synthetic-speech recording.	02-12-2015
20150046166	METHODS AND SYSTEMS FOR MUSIC INFORMATION MANAGEMENT - Methods and systems for music information management are provided. When audio data is generated in an electronic device, a control module is notified to launch a specific application to perform a music recognition procedure for the audio data, thus to obtain music information corresponding to the audio data.	02-12-2015
20150051912	Method for Segmenting Videos and Audios into Clips Using Speaker Recognition - A method for segmenting video and audio into clips using speaker recognition is provided to segment audio according to speaker audio, and to make audio clips correspond to the audio and video signals to generate audio and video clips. The method instantly trains an independent speaker model by increasing an unknown speaker source audio signal, and the speaker recognition result is applied to determine the audio and video clips. Independent speaker clips of source audio are determined according to the speaker model and the speaker model is renewed according to the independent speaker clips of source audio. This method segments audio by the speaker model without waiting for complete speaker feature audio signals to be collected. The method is also able to segment the audio and video into clips based on the recognition result of speaker audio, and can be used to segment TV audio and video into clips.	02-19-2015
20150081308	VOICE ANALYSIS - Telecommunications services and systems utilising voice intonation analysis to provide additional information to users. Information on caller's moods may be obtained from the intonation analysis and stored for later retrieval with information on the calls, including audio data, to which the information relates. An interactive system may be provided to perform intonation analysis on a caller's reasons for calling and the results of that analysis may be provided to the recipient to assist them in deciding whether to accept the call.	03-19-2015
20150106103	CUSTOMIZABLE SYSTEM AND DEVICE FOR DEFINING VOICE DIMENSIONS AND METHODS OF USE - The disclosure provides a customizable system for modifying voice dimensions. The system comprises a program interface located on an electronic device. The program interface is used to manipulate user input from one or more individuals relating to voice parameters. Instructions are then created by the program interface that allow for one or more individuals to modify the voice dimensions of the one or more individuals by following the instructions.	04-16-2015
20150106104	DISPLAY DEVICE AND CONTROL METHOD THEREOF - Disclosed is a display device including: a signal receiver which receives a remote control signal from a remote controller that includes a plurality of input buttons; a storage which stores therein voice information corresponding to the plurality of input buttons of the remote controller; a voice output which outputs the voice information; and a voice output controller which identifies one of the plurality of input buttons that has been selected from the input buttons, based on the received remote control signal, and controls the voice output to output the stored voice information corresponding to the identified input button.	04-16-2015
20150112689	Acoustic Activity Detection Apparatus And Method - Streaming audio is received. The streaming audio includes a frame having a plurality of samples. An energy estimate is obtained for the plurality of samples. The energy estimate is compared to at least one threshold. In addition, a band pass estimate of the signal is determined. An energy estimate is obtained for the band-passed plurality of samples. The two energy estimates are compared to at least one threshold each. Based upon the comparison operation, a determination is made as to whether speech is detected.	04-23-2015
20150127351	Noise Dependent Signal Processing For In-Car Communication Systems With Multiple Acoustic Zones - A speech communication system includes a speech service compartment for holding one or more system users. The speech service compartment includes a plurality of acoustic zones having varying acoustic environments. At least one input microphone is located within the speech service compartment, for developing microphone input signals from the one or more system users. At least one loudspeaker is located within the service compartment. An in-car communication (ICC) system receives and processes the microphone input signals, forming loudspeaker output signals that are provided to one or more of the at least one output loudspeakers. The ICC system includes at least one of a speaker dedicated signal processing module and a listener specific signal processing module, that controls the processing of the microphone input signal and/or forming of the loudspeaker output signal based, at least in part, on at least one of an associated acoustic environment(s) and resulting psychoacoustic effect(s).	05-07-2015
20150127352	Methods, Systems, and Tools for Promoting Literacy - Computer-based literacy tools provide users with visual and aural examples of phonemes, associating graphemes with both a respective sound and organ or organs of articulation. The tools can compare a spoken syllable, word, or phrase to the correct phoneme(s) and express any dissonance between the two using iconophonological symbols. Works published in one dialect can be transliterated into another dialect, such as in response to a location or user preference. Iconophonological orthographies can be logically ordered, and can be used to simplify animation.	05-07-2015
20150142445	SIGNAL PROCESSING APPARATUS, SIGNAL PROCESSING METHOD, AND PROGRAM - Provided is a signal processing apparatus including a feature detection unit configured to detect, from an input signal, a detection signal including at least one of audience-generated-sound likelihood and music likelihood, and a vicinity-sound generation unit configured to generate vicinity sound based on the detection signal.	05-21-2015
20150142446	Credit Risk Decision Management System And Method Using Voice Analytics - A credit risk decision management system and method using voice analytics are disclosed. The voice analysis may be applied to speaker authentication and emotion detection. The system introduces use of voice analysis as a tool for credit assessment, fraud detection and a measure of customer satisfaction and return rate probability when lending to an individual or a group. Emotions in voice interactions during a credit granting process are shown to have high correlation with specific loan outcomes. This system may predict lending outcomes that determine if a customer might face financial difficulty in the near future and ascertain an affordable credit limit for such a customer. Information carrying features are extracted from the customer's voice files, and mathematical and logical transformations are performed on these features to get derived features. The data is then fed to a predictive model which captures the probability of default, intent to pay and fraudulent activity involved in a credit transaction. The voice prints can also be transcribed into text and text analytics can be performed on the data obtained to infer similar lending outcomes using Natural Language Processing and predictive modeling techniques.	05-21-2015
20150302866	SPEECH AFFECT ANALYZING AND TRAINING - We describe a method for the analysis of non-verbal audible expressions in speech, comprising the steps of: providing one or more input means for receiving audible expression from one or more users/participants; transferring said audible expression into a speech signal; inputting said speech signal into at least one computer system; processing, at the at least one computer system, said speech signal to determine and output data representing a degree of affective content of said speech signal; and providing a user interface adapted for allowing the analysis of said output data.	10-22-2015
20150302867	CONVERSATION DETECTION - Various embodiments relating to detecting a conversation during presentation of content on a computing device, and taking one or more actions in response to detecting the conversation, are disclosed. In one example, an audio data stream is received from one or more sensors, a conversation between a first user and a second user is detected based on the audio data stream, and presentation of a digital content item is modified by the computing device in response to detecting the conversation.	10-22-2015
20150317977	VOICE PROFILE MANAGEMENT AND SPEECH SIGNAL GENERATION - A device includes a receiver, a memory, and a processor. The receiver is configured to receive a remote voice profile. The memory is electrically coupled to the receiver. The memory is configured to store a local voice profile associated with a person. The processor is electrically coupled to the memory and the receiver. The processor is configured to determine that the remote voice profile is associated with the person based on speech content associated with the remote voice profile or an identifier associated with the remote voice profile. The processor is also configured to select the local voice profile for profile management based on the determination.	11-05-2015
20150317987	METHOD AND A DEVICE FOR REACTING TO WATERMARKS IN DIGITAL CONTENT - A device for reacting to watermarks embedded in digital content, comprising a capture module configured to capture digital content; a watermark extraction module configured to extract watermarks embedded in captured digital content; an interpreter module configured to interpret extracted watermarks and to send commands corresponding to interpretations of extracted watermarks; a storage module configured to store digital content; a rendering module configured to render digital content stored in the storage module in response to a command from the interpreter module to render digital content; a recorder module configured to record digital content captured by the capture module upon reception of a command from the interpreter; and an encoder module configured to encode digital content recorded by the recorder module and to store encoded digital content in the storage module.	11-05-2015
20150332708	METHOD AND APPARATUS FOR REPLACING TELEPHONE ON HOLD MUSIC AT A CALLER'S SIDE - A method of handling telephone on-hold music, provided by a second party to a first party on a telephone connection between the first and the second party, includes detecting on-hold music, and providing, while on-hold music is detected, and to the first party, content from a source that is independent from the second party. A length of the most recently received audio signal originating from the second party is continuously stored. When on-hold music is no longer detected, voice information is retrieved from the stored length of the most recently received audio signal, providing, to the first party, content from a source that is independent from the second party is stopped, and the retrieved voice information is reproduced to the first party, wherein a time offset that corresponds to a time period required for detection of on-hold music is present.	11-19-2015
20150348570	METHOD AND APPARATUS FOR SPEECH BEHAVIOR VISUALIZATION AND GAMIFICATION - In some example embodiments, a system is provided for real-time analysis of audio signals. First digital audio signals are retrieved from memory. First computed streamed signal information corresponding to each of the first digital audio signals is generated by computing first metrics data for the first digital audio signals, the first computed streamed signal information including the first metrics data. The computed first streamed signal information is stored in the memory. The first computed streamed signal information is transmitted to one or more computing devices. Transmitting the first computed streamed signal information to the one or more computing devices causes the first computed streamed signal information to be displayed at the one or more computing devices.	12-03-2015
20150356970	METHOD AND SYSTEM FOR EXTENDING DIALOG SYSTEMS TO PROCESS COMPLEX ACTIVITIES FOR APPLICATIONS - A dialog system that includes a dialog manager to manage a conversation between the dialog system and a user, and to associate the conversation with a complex activity, and a plan engine to execute a plan script in connection with the complex activity, the plan script including a set of atomic dialog activities and logic to control a data and sequence flow of the atomic dialog activities, the set of atomic dialog activities being sub-activities of the complex activity, the complex activity being specified via a declarative activity specification language that connects the atomic dialog activities with a process.	12-10-2015
20150371661	Conveying Audio Messages to Mobile Display Devices - An apparatus and method includes a graphics station, an internet server and a production device, such that, at the graphics station, a character data file is created; a speech animation loop is generated having lips control; at the production device, the character data file is obtained along with the speech animation loop from the internet server; local audio is received to produce associated audio data and a control signal to animate the lips, a primary animation data file is constructed with lip movement; at each mobile display device, the character data is received; the primary animation data file and the associated audio data are also accepted; the character data file and the primary animation data file produce primary rendered video data, and the primary rendered video data is played with the associated audio data, such that the movement of the lips is in synchronism with the audio being played.	12-24-2015
20160035371	Method and Apparatus to Determine and Use Audience Affinity and Aptitude - One embodiment is a method of presenting an audio or audio-visual work which includes: (a) detecting media work content properties in an audio portion of the audio or audio-visual work using a media work content properties detection apparatus; (b) associating a presentation rate of the audio of the audio portion of the audio or audio-visual work with the detected media work content properties; and (c) presenting the portion of the audio or audio-visual work using the media work content properties detection apparatus so that the audio is presented at the presentation rate; wherein the media work content properties comprise one or more indicia of words of interest; and wherein the audio or audio-visual work includes conversations.	02-04-2016
20160064033	PERSONALIZED AUDIO AND/OR VIDEO SHOWS - One or more techniques and/or systems are provided for providing personalized audio shows and/or video shows. For example, content corresponding to an interest of a user may be identified (e.g., a videogame article, a home renovation blog, etc.). One or more actor templates within a natural language template set may be applied to portions of the content to create audio snippets. For example, text-to-speech synthesis functionality may use a first actor template to convert the videogame article into a videogame snippet and may use a second actor template to convert the home renovation blog into a home renovation snippet. The videogame snippet and the home renovation snippet may be used to generate an audio show (e.g., a dialogue between a first actor persona, defined within the first actor template, reading the videogame snippet and a second actor persona, defined within the second actor template, reading the home renovation snippet).	03-03-2016
20160078883	ACTION ANALYSIS DEVICE, ACTION ANALYSIS METHOD, AND ACTION ANALYSIS PROGRAM - An action analysis device includes: an acoustic analysis unit	03-17-2016
20160093315	ELECTRONIC DEVICE, METHOD AND STORAGE MEDIUM - According to one embodiment, an electronic device includes circuitry configured to display, during recording, a first mark indicative of a sound waveform collected from a microphone and a second mark indicative of a section of voice collected from the microphone, after processing to detect the section of voice.	03-31-2016
20160117684	EVALUATION OF VOICE COMMUNICATIONS - One-to-many comparisons of callers' words and/or voice prints with known words and/or voice prints to identify any substantial matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract different words, such as words of anger. The system may also segment at least a portion of the customer's voice to create a tone profile, and it formats the segmented words and tone profiles for network transmission to a server. The server compares the customer's words and/or tone profiles with multiple known words and/or tone profiles stored on a database to determine any substantial matches. The identification of any matches may be used for a variety of purposes, such as providing representative feedback or customer follow-up.	04-28-2016
20160125889	METHODS AND SYSTEMS FOR DECREASING LATENCY OF CONTENT RECOGNITION - Aspects of the present invention relate to systems, methods and apparatus for identifying a reference audio content in an audio stream.	05-05-2016
20160125891	ENVIRONMENT-BASED COMPLEXITY REDUCTION FOR AUDIO PROCESSING - Audio processing complexity is reduced based on an environment. In one example, a current environment of a mobile device is determined. A profile is selected based on the current environment. An audio processing pipeline is configured based on the selected profile and audio received at the mobile device is processed through the configured audio processing pipeline.	05-05-2016
20160135736	MONITORING TREATMENT COMPLIANCE USING SPEECH PATTERNS CAPTURED DURING USE OF A COMMUNICATION SYSTEM - Methods and systems for monitoring compliance of a patient with a prescribed treatment regimen are described. Patient speech is detected during use of a communication system such as a mobile telephone and analyzed to determine compliance with a treatment for a brain-related disorder, for example. Speech data representing one or more patient speech patterns and an identity signal containing information used to determine presence/identity of the patient are transmitted from a circuitry-based system at the patient location to a monitoring location. Identity of the patient as user of the communication system is determined through, e.g., biometric or authentication techniques. Speech data is analyzed to determine whether a patient speech pattern matches one or more characteristic speech patterns. Outcome of the analysis is reported to a medical caregiver or other party, for example.	05-19-2016
20160135737	DETERMINING TREATMENT COMPLIANCE USING SPEECH PATTERNS CAPTURED DURING USE OF A COMMUNICATION SYSTEM - Methods and systems for monitoring compliance of a patient with a prescribed treatment regimen are described. Compliance is determined based upon analysis of the patient's speech detected during use of a communication system such as a mobile telephone by the patient. Methods and systems as described herein can be used for monitoring patient compliance with a treatment for a brain-related disorder, for example. Identity of the patient as user of the communication system is determined through the use of e.g., biometric or authentication techniques. Speech data indicative of whether the patient has complied with the prescribed treatment regimen is transmitted from a circuitry-based system at the patient location to a monitoring location, where it may be reviewed by a medical caregiver or other party, for example. Patient speech may be analyzed at the patient location and/or subjected to analysis at the monitoring location.	05-19-2016
20160140971	PERIODIC AMBIENT WAVEFORM ANALYSIS FOR ENHANCED SOCIAL FUNCTIONS - In particular embodiments, one or more computer-readable non-transitory storage media embody software that is operable when executed to receive an audio waveform fingerprint and a client-determined location from a client device. The received audio waveform fingerprint may be compared to a database of stored audio waveform fingerprints, each stored audio waveform fingerprint associated with an object in an object database. One or more matching audio waveform fingerprints may be found from a comparison set of audio waveform fingerprints obtained from the audio waveform fingerprint database. Location information associated with a location of the client device may be determined, and the location information may be sent to the client device. The client device may be operable to update the client-determined location based at least in part on the location information.	05-19-2016
20160148620	INDEXING BASED ON TIME-VARIANT TRANSFORMS OF AN AUDIO SIGNAL'S SPECTROGRAM - An audio identification system generates audio fingerprints and indexes associated with the audio fingerprints based on discrete and overlapping frames within a sample of an audio signal. The system applies a time-to-frequency domain transform to a time-sequence of frames, which may be filtered. The audio identification system then applies a time-variant transformation (e.g., a Discrete Cosine Transform) to the transformed frames and generates an audio fingerprint and index by selecting sets of coefficients of the time-variant transformation. The system selects coefficients that are less sensitive to possible noise and/or distortions in the underlying signal, such as low-frequency coefficients. The time-variant transformation provides sufficient sampling among the indexes by incorporating the phase information of the frames into the indexes. The system stores the audio fingerprint and other identifying information by index for efficient retrieval and matching of the retrieved fingerprints.	05-26-2016
20160196837	EMOTIONAL SURVEY ACCORDING TO VOICE CATEGORIZATION	07-07-2016
20160203831	MULTIPLE LIVE VOICE DISCUSSIONS STATUS	07-14-2016
20160379667	ROBUST FEATURE EXTRACTION USING DIFFERENTIAL ZERO-CROSSING COUNTS - A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal and compared to a sound parameter reference stored locally with the sound recognition sensor to detect when the signature sound is received in the analog signal. A portion of the sparse sound parameter information is differential zero crossing (ZC) counts. Differential ZC rate may be determined by measuring a number of times the analog signal crosses a threshold value during each of a sequence of time frames to form a sequence of ZC counts and taking a difference between selected pairs of ZC counts to form a sequence of differential ZC counts.	12-29-2016
20180025742	DYNAMICALLY PROVIDING TO A PERSON FEEDBACK PERTAINING TO UTTERANCES SPOKEN OR SUNG BY THE PERSON	01-25-2018
20190147866	SUPPORT SYSTEM, SUPPORT METHOD, AND MEMORY MEDIUM	05-16-2019
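
Entry 20160379667 above describes differential zero-crossing (ZC) counts: the signal's threshold crossings are counted in each of a sequence of time frames, and differences are taken between selected pairs of counts. The following is a minimal sketch of that idea only, not the patented implementation. It assumes an already-sampled signal (the abstract describes an analog one), non-overlapping frames, and pairs chosen a fixed lag apart; the names zc_counts, differential_zc, frame_len and lag are hypothetical.

```python
from typing import List, Sequence


def zc_counts(samples: Sequence[float], frame_len: int,
              threshold: float = 0.0) -> List[int]:
    """Count threshold crossings inside each non-overlapping frame.

    Assumption: crossings that straddle a frame boundary are ignored,
    and a trailing partial frame is dropped.
    """
    counts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        crossings = 0
        for prev, cur in zip(frame, frame[1:]):
            # A crossing occurs when consecutive samples lie on
            # opposite sides of the threshold.
            if (prev < threshold) != (cur < threshold):
                crossings += 1
        counts.append(crossings)
    return counts


def differential_zc(counts: Sequence[int], lag: int = 1) -> List[int]:
    """Difference between ZC counts of frames that are `lag` frames apart."""
    return [counts[i] - counts[i - lag] for i in range(lag, len(counts))]


if __name__ == "__main__":
    signal = [1, -1, 1, -1, 1, 1, 1, 1, 1, -1, -1, 1]
    counts = zc_counts(signal, frame_len=4)
    print(counts)                   # per-frame ZC counts
    print(differential_zc(counts))  # differential ZC counts
```

Taking differences rather than raw counts makes the feature depend on how the crossing rate changes from frame to frame, which is one plausible reading of why the abstract calls the extraction "robust"; the abstract itself does not state the rationale.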

