Application

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
704275000 | Speech controlled system | 282
704270100 | Speech assisted network | 103
704273000 | Security system | 28
704278000 | Sound editing | 23
704272000 | Novelty item | 22
704271000 | Handicap aid | 21
704276000 | Pattern display | 18
704277000 | Translation | 10
704274000 | Warning/alarm system | 6
Entries
Document | Title | Date
20130030813QUALITY OF USER GENERATED AUDIO CONTENT IN VOICE APPLICATIONS - Methods and arrangements for improving quality of content in voice applications. A specification is provided for acceptable content for a voice application, and user generated audio content for the voice application is inputted. At least one test is applied to the user generated audio content, and it is thereupon determined as to whether the user generated audio content meets the provided specification.01-31-2013
20130030812APPARATUS AND METHOD FOR GENERATING EMOTION INFORMATION, AND FUNCTION RECOMMENDATION APPARATUS BASED ON EMOTION INFORMATION - Provided is an emotion information generating apparatus that is capable of recognizing a user's emotional state for each function of a terminal. The emotion information generating apparatus detects a user's emotional state and maps the user's emotional state to a function of the terminal, thus creating emotion information.01-31-2013
20130211840SYSTEM AND METHOD FOR GENERATING AN ALTERNATIVE PRODUCT RECOMMENDATION - A method and system for automatically generating a naturally reading narrative product summary including assertions about a selected product. In one embodiment, the method includes the steps of determining at least one attribute associated with said specific product; selecting an alternative product based on said at least one attribute; and generating a naturally reading narrative including assertions about the specific product and a recommendation of the alternative product.08-15-2013
20090018840AUTOMATED SPEECH RECOGNITION (ASR) TILING - Techniques are described related to tiles of automatic speech recognition data. In an implementation, automated speech recognition (ASR) data is obtained. The ASR data is divided into a plurality of tiles based on an approximate amount of data to be included in each tile. Each of the tiles is a geographic partition of the ASR data.01-15-2009
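The tiling step this abstract describes, dividing ASR data into tiles based on an approximate amount of data per tile, can be sketched as a greedy packer. This is a minimal illustration, not the patent's actual (geographic) partitioning; the `tile_records` helper and its `(id, size)` record layout are assumptions.

```python
def tile_records(records, max_tile_bytes):
    """Greedily pack (id, size) records into tiles of roughly max_tile_bytes each."""
    tiles, current, current_size = [], [], 0
    for rec_id, size in records:
        # Start a new tile when adding this record would exceed the byte budget.
        if current and current_size + size > max_tile_bytes:
            tiles.append(current)
            current, current_size = [], 0
        current.append(rec_id)
        current_size += size
    if current:
        tiles.append(current)
    return tiles
```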
20090192798METHOD AND SYSTEM FOR CAPABILITIES LEARNING - A method for task execution improvement, the method includes: generating a baseline model for executing a task; recording a user executing a task; comparing the baseline model to the user's execution of the task; and providing feedback to the user based on the differences in the user's execution and the baseline model.07-30-2009
20090192799Breathing Apparatus Speech Enhancement - Speech enhancement in a breathing apparatus is provided using a primary sensor mounted near a breathing mask user's mouth, at least one reference sensor mounted near a noise source, and a processor that combines the signals from these sensors to produce an output signal with an enhanced speech component. The reference sensor signal may be filtered and the result may be subtracted from the primary sensor signal to produce the output signal with an enhanced speech component. A method for detecting the exclusive presence of a low air alarm noise may be used to determine when to update the filter. A triple filter adaptive noise cancellation method may provide improved performance through reduction of filter maladaptation. The speech enhancement techniques may be employed as part of a communication system or a speech recognition system.07-30-2009
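The "filter the reference sensor signal and subtract it from the primary" step described above is the classic adaptive-noise-cancellation structure. A minimal LMS sketch follows; the tap count, step size, and function name are illustrative assumptions, not details from the patent.

```python
import numpy as np

def lms_cancel(primary, reference, taps=8, mu=0.01):
    """Adaptively filter the reference signal and subtract it from the primary."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        x = np.asarray(reference[max(0, n - taps + 1):n + 1])[::-1]  # newest first
        x = np.pad(x, (0, taps - len(x)))
        e = primary[n] - w @ x     # error = enhanced-speech estimate
        out[n] = e
        w += 2 * mu * e * x        # LMS weight update
    return out
```

With a noise-only primary that correlates with the reference, the output energy drops as the filter adapts.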
20110196682Common Scene Based Conference System - Conference bridge (08-11-2011
20110196681 TRANSMISSION SYSTEM - The present invention provides a transmission system comprising a transmission apparatus for transmitting multi-channel audio data and the auxiliary data required for its playback, and a receiving apparatus for receiving the audio data and auxiliary data transmitted by the transmission apparatus. A multiplexer of the transmission apparatus creates block data composed of 8 frames; the first byte of each frame is allocated to a header carrying Sync, OE and the like, the second byte is allocated to auxiliary data including AUX data and copyright-protection information, and the remaining bytes carry the audio data. An encryptor encrypts the second and later bytes of each frame, and a communication means outputs the encrypted data. A communication means of the receiving apparatus receives the encrypted data, a decoder decodes it, and a demultiplexer separates the audio data from the auxiliary data. The transmission system thus transmits multi-channel audio data such as DVD-Audio efficiently over a transmission line using fixed-length frames according to the MOST method, while providing copyright protection for the audio data. 08-11-2011
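The block layout this abstract describes, 8 frames per block with byte 0 as header, byte 1 as auxiliary data, and the remaining bytes as audio, can be sketched as below. The frame length and the header bit pattern are assumptions for illustration, not values from the patent.

```python
def build_block(audio_chunks, aux_bytes, frame_len=16):
    """Pack 8 frames: byte 0 = header, byte 1 = aux data, rest = audio payload."""
    assert len(audio_chunks) == 8 and len(aux_bytes) == 8
    frames = []
    for i in range(8):
        header = bytes([0x80 | i])  # assumed layout: sync bit plus frame index
        payload = audio_chunks[i][:frame_len - 2].ljust(frame_len - 2, b"\x00")
        frames.append(header + bytes([aux_bytes[i]]) + payload)
    return b"".join(frames)
```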
20110202349ESTABLISHING A MULTIMODAL ADVERTISING PERSONALITY FOR A SPONSOR OF A MULTIMODAL APPLICATION - Establishing a multimodal advertising personality for a sponsor of a multimodal application, including associating one or more vocal demeanors with a sponsor of a multimodal application and presenting a speech portion of the multimodal application for the sponsor using at least one of the vocal demeanors associated with the sponsor.08-18-2011
20110202348RHYTHM PROCESSING AND FREQUENCY TRACKING IN GRADIENT FREQUENCY NONLINEAR OSCILLATOR NETWORKS - A method for mimicking the auditory system's response to rhythm of an input signal having a time varying structure comprising the steps of receiving a time varying input signal x(t) to a network of n nonlinear oscillators, each oscillator having a different natural frequency of oscillation and obeying a dynamical equation of the form08-18-2011
20100042412SKIPPING RADIO/TELEVISION PROGRAM SEGMENTS - Techniques for notifying at least one entity of an occurrence of an event in an audio signal are provided. At least one preference is obtained from the at least one entity. An occurrence of an event in the audio signal is determined. The event is related to at least one of at least one speaker and at least one topic. The at least one entity is notified of the occurrence of the event in the audio signal, in accordance with the at least one preference.02-18-2010
20100036668METHOD AND APPARATUS FOR IMPROVED DETECTION OF RATE ERRORS IN VARIABLE RATE RECEIVERS - A system and method for detection of rate determination algorithm errors in variable rate communications system receivers. The disclosed embodiments prevent rate determination algorithm errors from causing audible artifacts such as screeches or beeps. The disclosed system and method detects frames with incorrectly determined data rates and performs frame erasure processing and/or memory state clean up to prevent propagation of distortion across multiple frames. Frames with incorrectly determined data rates are detected by checking illegal rate transitions, reserved bits, validating unused filter type bit combinations and analyzing relationships between fixed code-book gains and linear prediction coefficient gains.02-11-2010
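One of the checks mentioned above, flagging illegal rate transitions, amounts to a lookup in a table of allowed transitions. The table below is hypothetical; a real variable-rate codec defines its own legal transition sets.

```python
# Hypothetical table of legal rate transitions; a real codec defines its own.
LEGAL = {
    "full":    {"full", "half", "eighth"},
    "half":    {"full", "half", "quarter"},
    "quarter": {"half", "quarter", "eighth"},
    "eighth":  {"quarter", "eighth", "full"},
}

def illegal_frames(rates):
    """Return indices of frames whose rate transition is illegal (erasure candidates)."""
    return [i for i in range(1, len(rates))
            if rates[i] not in LEGAL[rates[i - 1]]]
```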
20100036667VOICE ASSISTANT SYSTEM - Methods and apparatuses to assist a user in the performance of a plurality of tasks are provided. The method may comprise storing at least one care plan in a voice assistant, the care plan defining a plurality of tasks to be performed, capturing speech input from the user, determining, from the speech input, a selected interaction with a care plan, and in response to the selected interaction, providing a speech dialog with the user reflective of the care plan. Alternatively, the method may comprise capturing speech input from a user, determining from the speech input, a first weight associated with a resident, associating the first weight with a care plan in turn associated with the resident, comparing the first weight to a second weight associated with the resident and the care plan, and providing a speech dialog regarding reweighting the resident based on the comparison.02-11-2010
20100042413Voice Activated Application Service Architecture and Delivery - A system and method for retrieving distributed content responsive to voice data are disclosed. Voice data is transmitted from a source client device to media server which applies a mixing table to route the voice data to one or more destinations described by the mixing table. The media server also analyzes the received voice data for one or more events. Responsive to detecting an event, the media server communicates with an application server, which modifies the mixing table so that subsequent data is also routed to a media generator which analyzes voice data received after detection of the event for a command. The media generator communicates with the application server to retrieve data from a user data source, such as a website, associated with a detected command. The media generator produces an audio representation of the retrieved data which is communicated to the source client device via the media server.02-18-2010
20100042411Automatic Creation of Audio Files - A method of building an audio description of a particular product of a class of products includes providing a plurality of human voice recordings, wherein each of the human voice recordings includes audio corresponding to an attribute value common to many of the products. The method also includes automatically obtaining attribute values of the particular product, wherein the attribute values reside electronically. The method also includes automatically applying a plurality of rules for selecting a subset of the human voice recordings that correspond to the obtained attribute values and automatically stitching the selected subset of human voice recordings together to provide a voiceover product description of the particular product. A similar method is used to build an audio description of a particular process.02-18-2010
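The "apply rules to select recordings, then stitch them together" flow described above can be sketched as a lookup over clips keyed by (attribute, value). The data layout and the `build_voiceover` helper are assumptions for illustration.

```python
def build_voiceover(attributes, recordings, rule_order):
    """Select one clip per attribute in rule order and concatenate the audio."""
    clips = []
    for key in rule_order:
        clip = recordings.get((key, attributes.get(key)))
        if clip is not None:  # skip attributes with no matching recording
            clips.append(clip)
    return b"".join(clips)
```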
20090157410Speech Translating System - Disclosed is a speech translating system for translating speech from a first language to a language selected from a set of second languages. The system includes an input unit, a processor, and an output unit. The input unit is capable of receiving the speech in the first language. The processor is operatively coupled to the input unit and is capable of converting the speech in the first language to the speech in the selected language. The output unit is operatively coupled to the processor. The output unit is capable of outputting the speech in the selected language.06-18-2009
20120185254INTERACTIVE FIGURINE IN A COMMUNICATIONS SYSTEM INCORPORATING SELECTIVE CONTENT DELIVERY - In a system, an interactive figurine delivers messages to a user in one of a number of forms. A server operation system includes processing capability which may individually couple content or may customize messages to a particular user of the interactive figurines. The interactive figurine contains an embedded circuit consisting of a receiver comprising a detector circuit tuned to at least one preselected frequency, a decoder to provide information indicative of intelligence and signals sent to the receiver, and a decoder circuit to provide actionable output signals indicative of information transmitted to the receiver. The server operation system may include a subscriber database and administration routines for customizing of messages and for directing messages. A user station intermediate the interactive figurine and the server module may be used to provide parental control or other control.07-19-2012
20120166200 SYSTEM AND METHOD FOR INTEGRATING GESTURE AND SOUND FOR CONTROLLING DEVICE - Disclosed is a system for integrating gestures and sounds, including: a gesture recognition unit that extracts gesture feature information corresponding to user commands from image information and acquires gesture recognition information from it; a background recognition unit that acquires background sound information from the sound information using a predetermined background sound model; a sound recognition unit that extracts sound feature information corresponding to user commands from the sound information based on the background sound information, and acquires sound recognition information from that feature information; and an integration unit that generates integration information by combining the gesture recognition information and the sound recognition information. 06-28-2012
20130046542Periodic Ambient Waveform Analysis for Enhanced Social Functions - Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.02-21-2013
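A minimal sketch of the capture-fingerprint-match loop described above, using a toy one-bit-per-frame energy-trend fingerprint. Real fingerprinting systems use far more robust spectral features; the frame size and scoring rule here are illustrative assumptions.

```python
import numpy as np

def fingerprint(signal, frame=256):
    """Toy fingerprint: one bit per frame, set when frame energy rises."""
    n = len(signal) // frame
    energy = [float(np.sum(np.square(signal[i * frame:(i + 1) * frame])))
              for i in range(n)]
    return [int(energy[i] > energy[i - 1]) for i in range(1, n)]

def match_score(fp_a, fp_b):
    """Fraction of agreeing bits; near 1.0 suggests the same ambient audio."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / min(len(fp_a), len(fp_b))
```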
20090043586Detecting a Physiological State Based on Speech - A computer-implemented method identifies a spoken audio signal representing speech of a person and estimates a physiological state of the person based on the spoken audio signal. For example, the method may identify articulatory patterns (such as landmarks) in the speech and estimate the person's physiological state based on those articulatory patterns. The method may estimate, for example, the amount of time the person has been without sleep. The method may produce the physiological state estimate without performing speech recognition on the spoken audio signal. The method may produce the physiological state estimate in real-time.02-12-2009
20090306989 VOICE INPUT SUPPORT DEVICE, METHOD THEREOF, PROGRAM THEREOF, RECORDING MEDIUM CONTAINING THE PROGRAM, AND NAVIGATION DEVICE - A setting-item selector calculates, based on a conversion database and an audio signal, the probability that a setting item's name matches a voice input, and retrieves and announces the setting item in a manner corresponding to that probability. A related-item selector retrieves, based on a setting-item database, the setting-item information for the setting item entered through a user's input operation, and retrieves the name of a related setting item based on the coincidence of the set-content and operation-content information within the setting-item information. A notification controller announces the combination of related setting items. 12-10-2009
20120191459SKIPPING RADIO/TELEVISION PROGRAM SEGMENTS - Techniques for notifying at least one entity of an occurrence of an event in an audio signal are provided. At least one preference is obtained from the at least one entity. An occurrence of an event in the audio signal is determined. The event is related to at least one of at least one speaker and at least one topic. The at least one entity is notified of the occurrence of the event in the audio signal, in accordance with the at least one preference.07-26-2012
20120191458HUMAN-MACHINE DIALOG SYSTEM - The invention relates to a human-machine dialog system comprising: 07-26-2012
20090030693AUTOMATED NEAR-END DISTORTION DETECTION FOR VOICE COMMUNICATION SYSTEMS - In one embodiment, a method for providing voice quality assurance is provided. The method determines voice information for an end point in a voice communication system. The voice information may be from an ingress microphone. The method determines if the voice quality is considered degraded based on an analysis of the voice information. For example, the voice information may indicate that it is distorted, too loud, too soft, is subject to an external noise, etc. Feedback information is determined if the voice quality is considered degraded where the feedback information designed to improve voice quality at an ingress point for a user speaking. The feedback information is then outputted at the end point to the user using the end point.01-29-2009
20090012794System For Giving Intelligibility Feedback To A Speaker - System for giving intelligibility feedback to a speaker (01-08-2009
20080306739SOUND SOURCE SEPARATION SYSTEM - A system capable of separating sound source signals with high precision while improving a convergence rate and convergence precision. A process of updating a current separation matrix W12-11-2008
20110035224SYSTEM AND METHOD FOR ADDRESS RECOGNITION AND CORRECTION - A system, method, and computer-readable medium for parcel address recognition. A method includes receiving an address input and producing candidate address results corresponding to the address input. The method includes receiving operational scheme knowledge describing the mode of operation of a parcel processing system, and receiving at least one operational rule corresponding to the operational scheme knowledge. The method includes applying the at least one operational rule to the candidate address results and producing and storing a finalized result according to the operational rule and the candidate address results.02-10-2011
20110022395Machine for Emotion Detection (MED) in a communications device - A system and method monitors the emotional content of human voice signals after the signals have been compressed by standard telecommunication equipment. By analyzing voice signals after compression and decompression, less information is processed, saving power and reducing the amount of equipment used. During conversation, a user of the disclosed methodology may obtain information in various formats regarding the emotional state of the other party. The user may then view the veracity, composure, and stress level of the other party. The user may also view the emotional content of their own transmitted speech.01-27-2011
20120059658METHODS AND APPARATUS FOR PERFORMING AN INTERNET SEARCH - Embodiments of the present invention relate to searching for content on the Internet. A user may supply a search query to a device, and the device may issue the search query to a plurality of search engines, including at least one general purpose search engine and at least one site-specific search engine. In this way, the user need not separately issue search queries to each of the plurality of search engines.03-08-2012
20120215541SIGNAL PROCESSING METHOD, DEVICE, AND SYSTEM - A signal identifying method includes obtaining signal characteristics of a current frame of input signals; deciding, according to the signal characteristics of the current frame and updated signal characteristics of a background signal frame before the current frame, whether the current frame is a background signal frame; detecting whether the current frame serving as a background signal frame is in a first type signal state; and adjusting a signal classification decision threshold according to whether the current frame serving as a background signal frame is in the first type signal state to enhance the speech signal identification capability.08-23-2012
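The decide-then-update loop described above, classifying a frame as background by comparing its characteristics to a tracked background estimate, can be sketched with frame energies and a smoothed background level. The ratio and smoothing constants are illustrative assumptions, not values from the patent.

```python
def classify_frames(energies, init_bg=1.0, ratio=2.0, alpha=0.95):
    """Label each frame background/speech while tracking a smoothed background level."""
    bg = init_bg
    labels = []
    for e in energies:
        if e < ratio * bg:
            labels.append("background")
            bg = alpha * bg + (1 - alpha) * e  # update only on background frames
        else:
            labels.append("speech")
    return labels
```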
20110282669 Estimating a Listener's Ability To Understand a Speaker, Based on Comparisons of Their Styles of Speech - An automated telecommunication system adjunct is described that “listens” to one or more participants' styles of speech and identifies specific characteristics that represent differences in their styles, notably accent, but also one or more of pronunciation accuracy, speed, pitch, cadence, intonation, co-articulation, syllable emphasis, and syllable duration. It utilizes, for example, a mathematical model in which the independent measurable components of speech that can affect understandability by a specific listener are weighted appropriately and then combined into a single overall score indicating the estimated ease with which the listener can understand what is being said, and presents real-time feedback to speakers based on the score. In addition, the system can provide recommendations to the speaker as to how to improve understandability. 11-17-2011
20110282670System for Dynamic AD Selection and Placement Within a Voice Application Accessed Through an Electronic Information Page - A system for dynamic advertisement selection and presentment within a speech application is provided. The system includes a user operable network browsing interface in communication with a server on a data network; at least one voice link to a voice application interface, the link or links accessible to the user working within the browsing interface; a pool of at least one advertisement for presentment; and a selection engine accessible to the voice application interface for receiving criteria originated from the server for advertisement ranking and for selecting an advertisement from the pool of at least one advertisement for placement based on the received criteria.11-17-2011
20090299748MULTIPLE AUDIO FILE PROCESSING METHOD AND SYSTEM - An audio file generation method and system. A computing system receives a first audio file comprising first speech data associated with a first party. The computing system receives a second audio file comprising second speech data associated with a second party. The first audio file differs from the second audio file. The computing system generates a third audio file from the second audio file. The third audio file differs from the second audio file. The process to generate the third audio file includes identifying a first set of attributes missing from the second audio file and adding the first set of attributes to the second audio file. The process to generate the third audio file additionally includes removing a second set of attributes from the second audio file. The third audio file includes third speech data associated with the second party. The computing system broadcasts the third audio file.12-03-2009
20110301956Information Processing Apparatus, Information Processing Method, and Program - An information processing apparatus includes an image analysis unit that executes a process for analyzing an image captured by a camera, a speech analysis unit that executes a process for analyzing speech input from a microphone, and a data processing unit that receives a result of the analysis conducted by the image analysis unit and a result of the analysis conducted by the speech analysis unit and that executes output control of help information for a user. The data processing unit calculates a degree of difficulty of the user on the basis of at least either the result of the image analysis or the result of the speech analysis and, if the degree of difficulty that has been calculated is equal to or more than a predetermined threshold value, executes a process for outputting help information to the user.12-08-2011
20110295607System and Method for Recognizing Emotional State from a Speech Signal - A computerized method, software, and system for recognizing emotions from a speech signal, wherein statistical and MFCC features are extracted from the speech signal, the MFCC features are sorted to provide a basis for comparison between the speech signal and reference samples, the statistical and MFCC features are compared between the speech signal and reference samples, a scoring system is used to compare relative correlation to different emotions, a probable emotional state is assigned to the speech signal based on the scoring system, and the probable emotional state is communicated to a user.12-01-2011
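The scoring step described above, comparing extracted features against per-emotion reference samples and assigning the most correlated emotion, can be sketched as below. The templates and feature vectors are placeholders, and the MFCC extraction itself is omitted.

```python
import numpy as np

def classify_emotion(features, references):
    """Correlate the feature vector with each emotion template; return the best match."""
    scores = {emotion: float(np.corrcoef(features, ref)[0, 1])
              for emotion, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores
```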
20090299749PRE-PROCESSED ANNOTATION OF STREET GRAMMAR IN SPEECH ENABLED NAVIGATION SYSTEMS - Embodiments of the present invention address deficiencies of the art in respect to virtualization and provide a novel and non-obvious method, system and computer program product for annotation of street grammar in speech enabled navigation devices. In an embodiment of the invention, a pre-processing street grammar annotation system can be provided. The system can include an annotated street grammar storage that contains street root names wherein each street root name has more than one street suffix associated with said street root name, and a street annotation pre-processor wherein the street annotation pre-processor contains logic enabled to annotate a set of street suffixes to a street root name prior to processing a voice input in a speech enabled navigation device, wherein the street root name has more than one street suffix associated with said street root name.12-03-2009
20110144998EMBEDDER FOR EMBEDDING A WATERMARK INTO AN INFORMATION REPRESENTATION, DETECTOR FOR DETECTING A WATERMARK IN AN INFORMATION REPRESENTATION, METHOD AND COMPUTER PROGRAM - An embedder for embedding a watermark to be embedded into an input information representation comprises an embedding parameter determiner that is implemented to apply a derivation function once or several times to an initial value to obtain an embedding parameter for embedding the watermark into the input information representation. Further, the embedder comprises a watermark adder that is implemented to provide the input information representation with the watermark using the embedding parameter. The embedder is implemented to select how many times the derivation function is to be applied to the initial value.06-16-2011
20090150158Portable Networked Picting Device - A portable picting device automatically converts an audio signal from a microphone into a digital data stream, parses a series of words from the digital data stream, and detects any words that match tags in a tag/image database. An image corresponding to the matching tag(s) is then retrieved and transmitted to a display. The images may be stored on a remote network, such as the Internet. In the illustrative embodiment the display is integrated into an article of clothing such as a shirt.06-11-2009
20120109658 VOICE-CONTROLLED POWER DEVICE - A voice-controlled power device includes a voice-controlled circuit comprising a power module that receives an AC input voltage; a switch module having an electromagnetic relay and a driving circuit; and a control module having a microcontroller, a power unit that regulates the power module's output voltage and supplies a working voltage to the microcontroller, a voice-detecting unit that receives voice signals and transforms them into electric signals, and a voice-amplifier unit that amplifies the electric signals. The microcontroller receives and analyzes the amplified electric signals and sends out control signals that drive the driving circuit to switch the electromagnetic relay, thereby controlling whether power is output to an external electric appliance. The switch state of the electromagnetic relay depends on whether the driving circuit applies the power module's output voltage to it. 05-03-2012
20100100386Noise Variance Estimator for Speech Enhancement - A speech enhancement method operative for devices having limited available memory is described. The method is appropriate for very noisy environments and is capable of estimating the relative strengths of speech and noise components during both the presence as well as the absence of speech.04-22-2010
20080281598METHOD AND SYSTEM FOR PROMPT CONSTRUCTION FOR SELECTION FROM A LIST OF ACOUSTICALLY CONFUSABLE ITEMS IN SPOKEN DIALOG SYSTEMS - A method (and system) of determining confusable list items and resolving this confusion in a spoken dialog system includes receiving user input, processing the user input and determining if a list of items needs to be played back to the user, retrieving the list to be played back to the user, identifying acoustic confusions between items on the list, changing the items on the list as necessary to remove the acoustic confusions, and playing unambiguous list items back to the user.11-13-2008
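Identifying acoustically confusable list items, as described above, can be approximated crudely with string similarity; the sketch below uses `difflib` as a stand-in for true phonetic comparison, and the similarity threshold is an assumption.

```python
import difflib

def confusable_pairs(items, threshold=0.8):
    """Flag pairs whose spellings are similar enough to risk ASR confusion."""
    pairs = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            ratio = difflib.SequenceMatcher(
                None, items[i].lower(), items[j].lower()).ratio()
            if ratio >= threshold:
                pairs.append((items[i], items[j]))
    return pairs
```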
20090089063VOICE CONVERSION METHOD AND SYSTEM - A method, system and computer program product for voice conversion. The method includes performing speech analysis on the speech of a source speaker to achieve speech information; performing spectral conversion based on said speech information, to at least achieve a first spectrum similar to the speech of a target speaker; performing unit selection on the speech of said target speaker at least using said first spectrum as a target; replacing at least part of said first spectrum with the spectrum of the selected target speaker's speech unit; and performing speech reconstruction at least based on the replaced spectrum.04-02-2009
20090089062PUBLIC SPEAKING SELF-EVALUATION TOOL - A public speaking self-evaluation tool that helps a user practice public speaking in terms of avoiding undesirable words or sounds, maintaining a desirable speech rhythm, and ensuring that the user is regularly glancing at the audience. The system provides a user interface through which the user is able to define the undesirable words or sounds that are to be avoided, as well as a maximum frequency of occurrence threshold to be used for providing warning signals based on detection of such filler or undesirable words or sounds. The user interface allows a user to define a speech rhythm, e.g. in terms of spoken syllables per minute, that is another maximum threshold for providing a visual warning indication. The disclosed system also provides a visual indication when the user fails to glance at the audience at least as often as defined by a predefined minimum threshold.04-02-2009
20110173003SYSTEM AND METHOD FOR DETERMINING A PERSONAL SHG PROFILE BY VOICE ANALYSIS - According to one embodiment of the present invention a computerized voice-analysis device for determining an S,H,G profile is provided (as described herein, such an S,H,G profile relates to the strengths (e.g., relative strengths) of three human instinctive drives). Of note, the present invention may be used for one or more of the following: analyzing a previously recorded voice sample; real-time analysis of voice as it is being spoken; combination voice analysis—that is, a combination of: (a) previously recorded and/or real-time voice; and (b) answers to a questionnaire.07-14-2011
20090276221Method and System for Processing Channel B Data for AMR and/or WAMR - A method and system for processing channel B data for AMR and/or WAMR may include generating one or more channel B data hypotheses for a present speech frame, if channel A data has a valid CRC and channel B data is unacceptable. Channel B data may be unacceptable, for example, due to high residual bit error rate and/or low Viterbi metric. Speech hypotheses may also be generated for the present speech frame, where each speech hypothesis may be based on a corresponding channel B data hypothesis and channel A data. A speech constraint metric may be assigned to each speech hypothesis that is compared to a previous frame speech data. The speech hypothesis that is closest to the previous frame speech data may be selected as a present speech data. The speech constraint metric may, for example, measure gain continuity and/or pitch continuity.11-05-2009
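The selection step described above, choosing the speech hypothesis closest to the previous frame's speech data, can be sketched with a simple continuity metric over gain and pitch. The (gain, pitch) representation and the weighting are illustrative assumptions.

```python
def select_hypothesis(hypotheses, prev_gain, prev_pitch, pitch_weight=0.1):
    """Pick the (gain, pitch) hypothesis minimizing distance to the previous frame."""
    def continuity(h):
        gain, pitch = h
        # Smaller value = better gain and pitch continuity with the previous frame.
        return abs(gain - prev_gain) + pitch_weight * abs(pitch - prev_pitch)
    return min(hypotheses, key=continuity)
```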
20080249777Method And System For Control Of An Application - The invention describes a dialog management system and method for control of an application (A10-09-2008
20080249778Communications Using Different Modalities - Communications between users of different modalities are enabled by a single integrated platform that allows both the input of voice (from a telephone, for example) to be realized as text (such as an interactive text message) and allows the input of text (from the interactive text messaging application, for example) to be realized as voice (on the telephone). Real-time communication may be enabled between any permutation of any number of text devices (desktop, PDA, mobile telephone) and voice devices (mobile telephone, regular telephone, etc.). A call to a text device user may be initiated by a voice device user or vice versa.10-09-2008
20100131278Stereo to Mono Conversion for Voice Conferencing - Stereo to mono voice conferencing conversion is performed during a voice conference. Conferencing equipment receives audio for right and left channels and filters each of the channels into a plurality of bands. For each band of each channel, the equipment determines an energy level and compares each energy level for each band of the right channel to each energy level for each corresponding band of the left channel. Based on the comparison, the equipment determines which channel has more audio resulting from speech. Based on the determination, the equipment adjusts delivery of the audio from the right and left channels to a mono channel for transmission to endpoints only capable of mono audio in the voice conference.05-27-2010
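The per-band energy comparison sketched in this entry can be illustrated as follows. This is a simplified sketch: band energies are summed across bands rather than voted band by band, the band edges are assumptions rather than values from the filing, and a crude direct DFT probe stands in for the filter bank:

```python
import math

def band_energy(samples, rate, lo, hi, step=100):
    """Crude band energy: sum of squared DFT magnitudes probed every `step` Hz."""
    total = 0.0
    f = lo
    while f < hi:
        re = sum(s * math.cos(2 * math.pi * f * i / rate) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * f * i / rate) for i, s in enumerate(samples))
        total += re * re + im * im
        f += step
    return total

def speech_channel(left, right, rate, bands=((300, 1000), (1000, 3000))):
    """Pick the channel with more energy in assumed speech-dominant bands."""
    left_e = sum(band_energy(left, rate, lo, hi) for lo, hi in bands)
    right_e = sum(band_energy(right, rate, lo, hi) for lo, hi in bands)
    return "left" if left_e > right_e else "right"
```

The winning channel would then be favored when the stereo pair is folded down to the mono channel.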
20080228488Computer-Implemented Voice Application Indexing WEB Site - A computer-implemented voice application indexing method and system for supplying voice applications that provide telephony services to users. The method and system include receiving voice application data over the network regarding the voice applications. The voice application data contains location data to indicate where the voice applications are located on the network. The voice application data are stored in a database in accordance with a predetermined voice application taxonomy. A request is received for a voice application based upon a user requesting a telephony service. The request includes search criteria for selecting a voice application from the database. The location data of at least one voice application (whose stored voice application information substantially satisfies the search criteria) is retrieved from the database. The voice application located at the retrieved location data is used to perform the user-requested telephony service.09-18-2008
20080312932ERROR MANAGEMENT IN AN AUDIO PROCESSING SYSTEM - An audio processing system includes a voice decoder and an audio processor. In one exemplary embodiment, the audio processing system is embedded in a headset unit that is wirelessly coupled to a game console. The voice decoder is used to decode a stream of incoming voice data packets carried over a wireless signal. The decoded voice data packets are used to drive an audio transducer of the headset unit. Upon detection of an error in the incoming stream, a decoded error-free voice data packet that has been stored in a replay buffer is used to generate an amplitude scaled audio signal. The voice decoder is disconnected from the audio transducer and the scaled audio signal is used to drive the audio transducer instead.12-18-2008
20080270141VIRTUAL VOCAL DYNAMICS IN WRITTEN EXCHANGE - The illustrative embodiments described herein provide a computer implemented method and computer program product for providing context in an electronic text communication. A biometric gathering input device is associated with a sending data processing system. A first set of metrics is identified based on a sender interacting with the biometric gathering input device. A sending communications process on the sending data processing system is calibrated based on the first set of metrics. During the generation of the electronic text communication, a portion of the first set of metrics is identified based on the sender interacting with the biometric gathering input device to form a second set of metrics. The second set of metrics and the electronic text communication are sent from the sending data processing system to a recipient data processing system. The second set of metrics is represented at the recipient data processing system using criteria selected by a recipient of the electronic text communication.10-30-2008
20130024199Method and Apparatus for Sharing Information Using a Handheld Device - A method and apparatus for sending information to a data processing apparatus for identifying a document to share with a recipient. A handheld device is capable of communicating with the data processing apparatus. Information is captured from the document and stored in the handheld device as document data. A communications path is established between the handheld device and the data processing apparatus. The document data is sent to the data processing apparatus through the communications path. Reference documents are provided. Each reference document has reference data stored in a memory. At least a portion of the received document data is extracted as scanning data. The reference data is retrieved from the memory. The scanning data is compared with the reference data. When the scanning data matches at least a portion of the reference data of one of the reference documents, the one reference document is selected as the identified document for forwarding to the recipient.01-24-2013
20090055190EMOTIVE ENGINE AND METHOD FOR GENERATING A SIMULATED EMOTION FOR AN INFORMATION SYSTEM - Information about a device may be emotively conveyed to a user of the device. Input indicative of an operating state of the device may be received. The input may be transformed into data representing a simulated emotional state. Data representing an avatar that expresses the simulated emotional state may be generated and displayed. A query from the user regarding the simulated emotional state expressed by the avatar may be received. The query may be responded to.02-26-2009
20130218570APPARATUS AND METHOD FOR CORRECTING SPEECH, AND NON-TRANSITORY COMPUTER READABLE MEDIUM THEREOF - According to one embodiment, in an apparatus for correcting speech corresponding to a moving image, a separation unit separates at least one audio component from each audio frame of the speech. An estimation unit estimates a scene comprising a plurality of related image frames in the moving image, based on at least one of a feature of each image frame of the moving image and a feature of each audio frame. An analysis unit acquires attribute information of the plurality of image frames by analyzing each image frame. A correction unit determines a correction method for the audio component corresponding to the plurality of image frames, based on the attribute information, and corrects the audio component by that method.08-22-2013
20090326952SPEECH PROCESSING METHOD, SPEECH PROCESSING PROGRAM, AND SPEECH PROCESSING DEVICE - [Problems] To convert, with maximum accuracy, a signal of non-audible murmur obtained through an in-vivo conduction microphone into a speech signal that a receiving person can recognize and is unlikely to misrecognize.12-31-2009
20110224988INTRACARDIAC ELECTROGRAM TIME FREQUENCY NOISE DETECTION - Systems, methods, and apparatus for identifying and classifying noise of an intracardiac electrogram of a cardiac rhythm management device to prevent inaccurate detection of a cardiac episode are disclosed. In an example, three channels are analyzed to identify and determine whether an episode or noise has been detected.09-15-2011
20090048845APPARATUS, SYSTEM, AND METHOD FOR VOICE CHAT TRANSCRIPTION - An apparatus, system, and method to transcribe a voice chat session initiated from a text chat session. The system includes a chat server, a voice server, and a transcription engine. The chat server is configured to facilitate a text chat session between multiple instant messaging clients. The voice server is coupled to the chat server and configured to facilitate a transition from the text chat session to a voice chat session between the multiple instant messaging clients. The transcription engine is coupled to the voice server and configured to generate a voice transcription of the voice chat session. The voice transcription may be aggregated into a text chat history.02-19-2009
20090055189Automatic Replacement of Objectionable Audio Content From Audio Signals - A method, apparatus and system are provided for the automatic replacement of potentially objectionable audio content from an audio signal in real time. In one embodiment of the present invention, the selective filtering of objectionable audio content from an audio signal is accomplished by first marking objectionable audio content in the audio signal with filtering information that identifies the type of objectionable audio content (e.g., crude language, ethnic and racial slurs, cursing, strong profanity) and storing the filtering information and the corresponding location of the objectionable audio content for that particular audio signal. Objectionable audio content having filtering information corresponding to a stored replacement content code determined from a predetermined replacement setting is then automatically replaced with an audio clip corresponding to the replacement setting.02-26-2009
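The replace-by-code step in this entry amounts to splicing a replacement clip over each marked region of the signal. A minimal sketch with hypothetical names, where the replacement setting maps each objectionable-content category to a clip, or to None to leave that category audible:

```python
def apply_replacements(samples, marks, setting):
    """samples: PCM sample list; marks: (start, end, category) filtering info;
    setting: category -> replacement clip (list of samples) or None."""
    out = list(samples)
    for start, end, category in marks:
        clip = setting.get(category)
        if clip is None:
            continue  # this category is not filtered under the current setting
        length = end - start
        # loop the clip to cover the marked span, then trim it to fit exactly
        looped = (clip * (length // len(clip) + 1))[:length]
        out[start:end] = looped
    return out
```

In a real-time system the marks would arrive as embedded filtering information ahead of playback rather than as a precomputed list.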
20090083038MOBILE RADIO TERMINAL, SPEECH CONVERSION METHOD AND PROGRAM FOR THE SAME - The mobile radio terminal includes a speech input unit which inputs a speech signal obtained from speech of a speaking person, an estimating unit which estimates a speech style of the speaking person from the speech signal, and a converting unit which converts the speech signal into a converted speech signal in accordance with the estimated speech style.03-26-2009
20120078634VOICE DIALOGUE SYSTEM, METHOD, AND PROGRAM - A voice dialogue system executing an operation by a voice dialogue with a user includes a history storage unit storing an operation name of the operation executed by the voice dialogue system and an operation history corresponding to a number of execution times of the executed operation; a voice storage unit storing voice data corresponding to the operation name; a detection unit detecting a voice skip signal indicating skipping a user's voice input; an acquisition unit acquiring the operation name of the operation having a high priority based on the number of execution times from said history storage unit, when said detection unit detects the voice skip signal; and a generation unit reading the voice data corresponding to the acquired operation name from said voice storage unit, and generating a voice signal corresponding to the read voice data.03-29-2012
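The skip behaviour in this entry comes down to keeping per-operation execution counts and, when a skip signal is detected, choosing the most-executed operation as the high-priority one. A minimal sketch (the class and method names are hypothetical):

```python
class OperationHistory:
    """Tracks how often each named operation has been executed by voice dialogue."""

    def __init__(self):
        self.counts = {}

    def record(self, operation_name):
        """Note one more execution of the named operation."""
        self.counts[operation_name] = self.counts.get(operation_name, 0) + 1

    def on_skip(self):
        """Return the highest-priority (most-executed) operation name, or None."""
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)
```

The returned name would then be used to look up the stored voice data and synthesize the confirmation prompt.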
20090099847Template constrained posterior probability - Detailed herein is a technology which, among other things, reduces errors introduced in recording and transcription data. In one approach to this technology, a method of detecting audio transcription errors is utilized. This method includes selecting a focus unit and selecting a context template corresponding to the focus unit. A hypothesis set is then determined with reference to the context template and the focus unit. A probability corresponding to the focus unit is then calculated across the hypothesis set.04-16-2009
20090204406SYSTEM AND METHODS FOR DETECTING DECEPTION AS TO FLUENCY OR OTHER ABILITY IN A GIVEN LANGUAGE - The invention relates to a system and methods for detecting when a user is representing that he is not fluent in a language in which he is fluent. The present system and methods are adapted to be used in conjunction with conventional and novel computer systems and methods, and provide detection of concealment of language skills by a user.08-13-2009
20080319756Electronic Device and Method for Determining a Mixing Parameter - A method of determining a parameter for mixing a first content item is disclosed.12-25-2008
20080281599PROCESSING AUDIO DATA - A method of processing audio data is disclosed.11-13-2008
20100211394METHOD FOR DETERMINING A STRESS STATE OF A PERSON ACCORDING TO A VOICE AND A DEVICE FOR CARRYING OUT SAID METHOD - The invention relates to the field of methods and devices for analyzing psychophysiological reactions of a person to verbal tests.08-19-2010
20100145708SYSTEM AND METHOD FOR IDENTIFYING ORIGINAL MUSIC - We disclose useful components of a method and system that allow a song to be identified using only the sound of the audio being played. A system built using the disclosed method and device components processes inputs sent from a mobile phone over a telephone or data connection, though inputs might be sent through any of a variety of computers, communications equipment, or consumer audio devices over any of their associated audio or data networks.06-10-2010
20100217602Combined Mirror and Presentation Medium Capable of Speech Recognition - The present invention relates, in general, to a combined mirror and presentation medium that allows at least one of various presentation bodies to be inserted into it, acts as a mirror so that its inside cannot be seen at normal times, and displays an inserted presentation body to the outside when the inside of the presentation medium is illuminated. The object is to provide a combined mirror and presentation medium capable of speech recognition that provides the functions of a mirror and a picture frame using a reflection plate, maximizing its functionality; that enables various types of control by automatically controlling, through a user's speech signals, a presentation medium capable of displaying a presentation body, facilitating switching between the function of a mirror and the function of a decoration or information transfer medium; and that allows database information to be presented in response to speech signals from the user, efficiently performing the combined functions.08-26-2010
20100211395Method and System for Speech Intelligibility Measurement of an Audio Transmission System - Method and processing system for measuring the intelligibility of a degraded output signal (Y(t)) from an audio transmission system.08-19-2010
20100100387Method and Apparatus for Dynamic Voice Response Messages - A computing device implemented method, apparatus, and computer program product to generate dynamic voice response messages in a mobile computing device. In response to receiving an incoming call from a caller, the process displays a list of response messages in a set of response messages. In response to receiving a selection of a response message from the list of response messages, the process sends the selected response message to the caller.04-22-2010
20110238422METHOD FOR SONIC DOCUMENT CLASSIFICATION - A method to identify and classify a document is disclosed.09-29-2011
20090276222Method and system for incorporating one or more inputs and defining and applying a plurality of rules for the different stages of speech and video analytics systems - A method and system for improving the context and accuracy of speech and video analytics searches by incorporating one or more inputs and defining and applying a plurality of rules for the different stages of said speech and video analytics system searches.11-05-2009
20090112598SYSTEM AND METHOD FOR APPLYING PROBABILITY DISTRIBUTION MODELS TO DIALOG SYSTEMS IN THE TROUBLESHOOTING DOMAIN - Disclosed herein are systems, methods, and computer-readable media for troubleshooting based on a probability distribution model. The method for troubleshooting based on a probability distribution model includes establishing a speech-based channel of interaction, establishing at least one non-speech-based channel of interaction, maintaining a probability distribution over time for each of a plurality of component variables describing the state of the product or service and state of the conversation, and troubleshooting a product or service by responding based on the probability distribution.04-30-2009
20090112599MULTI-STATE BARGE-IN MODELS FOR SPOKEN DIALOG SYSTEMS - Disclosed are systems, methods and computer readable media for applying a multi-state barge-in acoustic model in a spoken dialogue system comprising the steps of (1) presenting a prompt to a user from the spoken dialog system, (2) receiving an audio speech input from the user during the presentation of the prompt, (3) accumulating the audio speech input from the user, (4) applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, (5) applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, (6) determining whether the audio speech input is a barge-in-speech input from the user, and (7) if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.04-30-2009
20110040563Voice Control Device and Voice Control Method and Display Device - A voice control device for a display device includes a voice receiver for receiving a voice signal, a voice recognition unit coupled to the voice receiver for recognizing the voice signal to generate a recognition result, a function decision unit coupled to the voice recognition unit for selecting an operating function from a plurality of operating functions according to the recognition result, and an execution unit coupled to the function decision unit for controlling the display device to perform the operating function.02-17-2011
20130132089CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture. An indication of the availability of the remote speech recognition to perform speech recognition at a point in time may be provided to a user of the client device via a user interface of the client device.05-23-2013
20130138442Systems and Methods for Recognizing Sound and Music Signals in High Noise and Distortion - A method for recognizing an audio sample locates an audio file that closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample.05-30-2013
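The linear-relation test in this entry can be sketched as a vote over time offsets: matching fingerprints whose landmark times differ by one constant offset all fall in a single histogram bin, and a large bin identifies the file. All names below are hypothetical and landmarks are reduced to integer timepoints:

```python
from collections import Counter

def match_score(sample_pairs, file_pairs):
    """pairs are (landmark_time, fingerprint). The score is the size of the
    largest set of matches sharing one time offset (a slope-1 linear relation)."""
    # index the candidate file's fingerprints for lookup
    index = {}
    for t, fp in file_pairs:
        index.setdefault(fp, []).append(t)
    # histogram the file-minus-sample time offsets of all fingerprint matches
    offsets = Counter()
    for t, fp in sample_pairs:
        for file_t in index.get(fp, ()):
            offsets[file_t - t] += 1
    return max(offsets.values()) if offsets else 0
```

Recognition would score every file with matching fingerprints this way and report the highest scorer above some confidence floor.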
20130144627VOICE CONTROL CIRCUIT FOR STARTING ELECTRONIC DEVICES - A control circuit employed in an electronic device includes a microphone, a level conversion circuit, and a voice processing circuit. The voice processing circuit includes a voice operated switch connected between the microphone and the level conversion circuit. The microphone picks up voice commands, the voice operated switch receives the voice commands from the microphone, and outputs a high voltage signal when a volume of the voice commands is greater than or equal to a predetermined volume threshold or is within a predetermined volume range, the level conversion circuit converts the high voltage signal into a low voltage signal for turning on the electronic device.06-06-2013
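The switch behaviour in this entry, a threshold or window comparison on the picked-up volume producing a high or low output, can be sketched directly. The names and the volume scale are hypothetical, and the two return values stand in for the circuit's high and low voltage signals:

```python
def voice_operated_switch(volume, threshold=0.6, volume_range=None):
    """Return 'high' when the volume reaches the threshold, or falls inside an
    optional (lo, hi) range; return 'low' otherwise."""
    if volume_range is not None:
        lo, hi = volume_range
        return "high" if lo <= volume <= hi else "low"
    return "high" if volume >= threshold else "low"
```

In the described circuit the 'high' output would then pass through the level conversion stage to turn the device on.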
20100235170BIOFEEDBACK SYSTEM FOR CORRECTION OF NASALITY - A system is described for providing biofeedback to hearing-impaired persons as to the degree of nasalization of vowel-like sounds in their speech, in order to monitor their own nasality and thus correct inappropriate nasalization. In a preferred embodiment, this feedback uses tactile vibration, with the vibration amplitude reflecting the nasalance of the speech.09-16-2010
20110029314Food Processor with Recognition Ability of Emotion-Related Information and Emotional Signals - A food processor with recognition ability of emotion-related information and emotional signals is disclosed, which comprises: an emotion recognition module and a food processing module. The emotion recognition module is capable of receiving sound signals so as to identify an emotion contained in the received sound signals. The food processing module is capable of producing food products with a taste corresponding to the emotion recognition result of the emotion recognition module.02-03-2011
20090063157APPARATUS AND METHOD OF GENERATING INFORMATION ON RELATIONSHIP BETWEEN CHARACTERS IN CONTENT - A method of generating information on relationships between characters of content includes dividing text extracted from the content into one or more predetermined units, determining one or more dominant relationships between characters of the content by comparing the divided units with relationship keyword information in which keywords contained in categories are defined, wherein the categories represent one or more relationships between the characters, and generating information on the relationships between the characters in accordance with the determined dominant relationships.03-05-2009
20110246203Dynamic Interactive Voice Interface - A dynamic voice user interface system is provided. The dynamic voice user interface system interacts with a user at a first level of formality. The voice user interface system then monitors history of user interaction and adjusts the voice user interface to interact with the user with a second level of formality based on the history of user interaction.10-06-2011
20110125502Method of putting identification codes in a document - A method of putting identification codes in a document is disclosed. The method adds a speech-purpose print code to a document so that an OID pen can emit sound after reading the code. A software program first acquires the position of each word in the document and then automatically puts a speech-purpose print code corresponding to each word in that word's position, so that a user can rapidly generate a document with speech-purpose codes.05-26-2011
20110246202METHODS AND APPARATUS FOR AUDIO WATERMARKING A SUBSTANTIALLY SILENT MEDIA CONTENT PRESENTATION - Methods and apparatus for audio watermarking a substantially silent media content presentation are disclosed. An example method to audio watermark a media content presentation disclosed herein comprises obtaining a watermarked noise signal comprising a watermark and a noise signal having energy substantially concentrated in an audible frequency band, the watermarked noise signal attenuated to be substantially inaudible without combining with a separate audio signal, associating the watermarked noise signal with a substantially silent content component of the media content presentation, the media content presentation comprising one or more media content components, and outputting the watermarked noise signal during presentation of the substantially silent content component.10-06-2011
20110153331Method for Generating Voice Signal in E-Books and an E-Book Reader - The present invention provides a method for generating voice signal in electronic books (E-books). The method includes the steps of: receiving a voice signal in response to a triggering signal for placing a bookmark; and displaying a functional icon of the bookmark corresponding to the voice signal in a region of the E-book. The present invention also provides an E-book reader, including: a display unit, a receiver unit, and a processing unit, wherein the receiver unit receives a voice signal in response to a triggering signal for placing a bookmark, and the processing unit is used to display a functional icon of the bookmark corresponding to the voice signal in a region of the E-book.06-23-2011
20110178803DETECTING EMOTION IN VOICE SIGNALS IN A CALL CENTER - A computer system monitors a conversation between an agent and a customer. The system extracts a voice signal from the conversation and analyzes the voice signal to detect a voice characteristic of the customer. The system identifies an emotion corresponding to the voice characteristic and initiates an action based on the emotion. The action may include communicating the emotion to an emergency response team, or communicating feedback to a manager of the agent, as examples.07-21-2011
20110099015USER ATTRIBUTE DERIVATION AND UPDATE FOR NETWORK/PEER ASSISTED SPEECH CODING - Systems, methods and apparatuses are described for deriving and updating user attribute information about users of a communications system. A communications network is then used to transfer the user attribute information to communication terminals, which use the user attribute information to configure a speech codec to operate in a speaker-dependent manner during a communication session, thereby improving speech coding efficiency. In a network-assisted model, the user attribute information is stored on the communications network and selectively transmitted to the communication terminals while in a peer-assisted model, the user attribute information is derived by and transferred between communication terminals.04-28-2011
20110099016Multi-Tenant Self-Service VXML Portal - A multi-tenant voice extensible markup language (VXML) voice system includes a voice portal connected to at least one telephony network; a voice application server integrated with the voice portal; and a multi-tenant configuration application integrated with the voice application server, the configuration application accessible to the tenants from a data packet network.04-28-2011
20110077946DERIVING GEOGRAPHIC DISTRIBUTION OF PHYSIOLOGICAL OR PSYCHOLOGICAL CONDITIONS OF HUMAN SPEAKERS WHILE PRESERVING PERSONAL PRIVACY - A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.03-31-2011
20110071838SYSTEM AND METHODS FOR RECOGNIZING SOUND AND MUSIC SIGNALS IN HIGH NOISE AND DISTORTION - A method for recognizing an audio sample locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample. The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.03-24-2011
20110071837Audio Signal Correction Apparatus and Audio Signal Correction Method - According to one embodiment, an audio signal correction apparatus has a characteristic extraction module configured to determine whether an input audio signal is a monaural signal or a stereo signal, on the basis of channel information, and to extract a plurality of characteristic parameters for determining whether the input audio signal is a speech signal or a music signal, a signal type determination module configured to calculate a speech/music discrimination score which indicates whether the input audio signal is close to the speech signal or the music signal, on the basis of the plurality of characteristic parameters and a level calculation module configured to calculate, with use of the speech/music discrimination score, output levels of a degree of speech and a degree of music.03-24-2011
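The score-and-level steps in this entry can be sketched as a linear discrimination score mapped to complementary output levels. The feature names, weights, clamping limit, and the sign convention (positive leans speech) are all assumptions for illustration, not values from the filing:

```python
def discrimination_score(features, weights, bias=0.0):
    """Linear speech/music score from characteristic parameters; positive
    values lean toward speech, negative toward music (assumed convention)."""
    return bias + sum(weights[name] * value
                      for name, value in features.items() if name in weights)

def output_levels(score, limit=5.0):
    """Map the clamped score to a (speech_level, music_level) pair in [0, 1],
    so the degrees of speech and music always sum to one."""
    s = max(-limit, min(limit, score))
    speech = (s + limit) / (2 * limit)
    return speech, 1.0 - speech
```

Downstream correction could then blend speech-oriented and music-oriented processing in proportion to the two levels.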
20110060591ISSUING ALERTS TO CONTENTS OF INTEREST OF A CONFERENCE - A method, system, and computer program product for issuing an alert in response to detecting a content of interest in a conference. A listening logic comprising multiple conference engines monitors speakers, topics, and words spoken during a conference. A speech-to-text engine monitors the conference and records a transcription. A word emphasis engine monitors the transcription for key words. A voice identification engine monitors the live conversation and the recorded transcript, in real time, for a particular individual to begin speaking. An outline engine may create an outline of the transcription. The listening device may issue an alert upon detecting a content of interest in the conference. The listening device may additionally display an outline or a selected portion of the transcript regarding a particular content of interest to inform a user of the listening device of a portion of content of the conference that may have been missed.03-10-2011
20110213617AUDIO SOURCE SYSTEM AND METHOD - A system includes a computer having a device driver. The device driver includes a detection module to detect an audio input. The device driver includes a selection module to send the audio input to audio hardware after detection of the audio input. The device driver also includes an emulation module to send hardware emulation information to an operating system audio application to replace feedback data received at the device driver from the audio hardware and sent from the device driver to the operating system audio application.09-01-2011
20120203556DEVICES FOR ENCODING AND DETECTING A WATERMARKED SIGNAL - A method for decoding a signal on an electronic device is described. The method includes receiving a signal. The method also includes extracting a bitstream from the signal. The method further includes performing watermark error checking on the bitstream for multiple frames. The method additionally includes determining whether watermark data is detected based on the watermark error checking. The method also includes decoding the bitstream to obtain a decoded second signal if the watermark data is not detected.08-09-2012
20110251845VOICE ACTIVITY DETECTOR, VOICE ACTIVITY DETECTION PROGRAM, AND PARAMETER ADJUSTING METHOD - Judgment result deriving means 10-13-2011
20120203555DEVICES FOR ENCODING AND DECODING A WATERMARKED SIGNAL - An electronic device configured for encoding a watermarked signal is described. The electronic device includes modeler circuitry. The modeler circuitry determines parameters based on a first signal and a first-pass coded signal. The electronic device also includes coder circuitry coupled to the modeler circuitry. The coder circuitry performs a first-pass coding on a second signal to obtain the first-pass coded signal and performs a second-pass coding based on the parameters to obtain a watermarked signal.08-09-2012
20080243514NATURAL ERROR HANDLING IN SPEECH RECOGNITION - A user interface, and associated techniques, that permit a fast and efficient way of correcting speech recognition errors, or of diminishing their impact. The user may correct mistakes in a natural way, essentially by repeating the information that was incorrectly recognized previously. Such a mechanism closely approximates what human-to-human dialogue would be in similar circumstances. Such a system fully takes advantage of all the information provided by the user, and on its own estimates the quality of the recognition in order to determine the correct sequence of words in the fewest number of steps.10-02-2008
20080243513Apparatus And Method For Controlling Output Format Of Information - An apparatus for controlling the output format of information is provided. The apparatus includes a communications unit configured to receive information intended for at least one recipient. The apparatus also includes a selection unit, which is configured to automatically detect, based on the at least one recipient, an externally-specified indication of a preferred form of output selected from amongst a plurality of available forms of output. The selection unit causes the information to be outputted in the preferred form of output. A method and a computer program product are also provided for controlling the output format of information.10-02-2008
20080243512Method of and System For Classification of an Audio Signal - The invention describes a method of and a system for classifying an audio input signal.10-02-2008
20110054905VOICE INTERACTIVE SERVICE SYSTEM AND METHOD FOR PROVIDING DIFFERENT SPEECH-BASED SERVICES - A voice interactive service system provides different speech-based services to a plurality of users. Using a communication terminal, the services are accessed via a telecommunication network through service-specific connectivity ports. The system comprises processing cores which have different configurations of speech processing resources for performing different services. For performing a requested service, a connection module establishes a connection between the respective connectivity port and a processing core having a configuration of speech processing resources suitable for performing the requested service. Because of the service-specific resourcing of cores, there is no need for requesting and allocating processing resources from external resource servers. Moreover, the port-dedicated resourcing of the cores ensures that a successful access to a connectivity port leads to a successful provision of the requested service.03-03-2011
20110054904ELECTRONIC SHOPPING ASSISTANT WITH SUBVOCAL CAPABILITY - A mobile device suitable for use by a user in a store includes a subvocal message (SVM) module to detect an SVM from the user. The SVM includes data that indicates an item in the store. A transmitter transmits a request after detecting the SVM. The request includes information indicating the item. A receiver receives a reply. The reply includes information responsive to the request. An output device provides the responsive information to the user. The request may include a request for item position information, item price information, or item inventory information. The mobile device may detect the SVM via a subvocal sensor coupled to the user. The subvocal sensor may be in contact with the user in proximity to a vocal cord of the user. The subvocal sensor may be connected to the mobile device wirelessly or via a wire.03-03-2011
20110022394Visual similarity - Methods and apparatus, including computer program products, for visual similarity. A method includes receiving a stream of video content, generating interpretations of the received video content using speech/natural language processing (NLP), associating the interpretations of the received video content with images extracted from the video content based on a timeline, and using the interpretations to obtain interpretations of other images or other video content.01-27-2011
20110022393MULTIMODE USER INTERFACE OF A DRIVER ASSISTANCE SYSTEM FOR INPUTTING AND PRESENTATION OF INFORMATION - In a method for multimode information input and/or adaptation of the display of a display and control device, input signals of different modalities are detected and supplied via the device to a voice recognition unit, thus initiating a desired function and/or display as an output signal, which is displayed on the device and/or output by voice. Touch and/or gesture input signals on or to the device select an object intended for interaction and activate the voice recognition unit; with the selection of the object and/or activation of the voice recognition unit, the vocabulary provided for interaction is restricted as a function of the selected object. On the basis of this restricted vocabulary, a voice command is added to the selected object, via the voice recognition unit, as an information input and/or for adaptation of the display.01-27-2011
20110137656SOUND CLASSIFICATION SYSTEM FOR HEARING AIDS - A hearing aid includes a sound classification module to classify environmental sound sensed by a microphone. The sound classification module executes an advanced sound classification algorithm. The hearing aid then processes the sound according to the classification.06-09-2011
20100324909METHOD AND SYSTEM FOR PROCESSING MESSAGES WITHIN THE FRAMEWORK OF AN INTEGRATED MESSAGE SYSTEM - A method and system for processing messages within the framework of an integrated message system. Recipients of messages in an integrated messaging system are provided with an authentic impression of the received message. In a first step, a message received within the framework of an integrated messaging system is automatically translated. A language detection and dictation system is provided. The message contents of the incoming message, as well as its segments and parameters, are simultaneously utilized to generate additional information regarding the sender and the message, which is suitable to give the recipient an impression of the received message in the most authentic form possible.12-23-2010
20100324908Learning Playbot - An enhanced chatbot, programmed to learn from human-computer conversational exchanges. The process of learning automatically creates an expanded and updated statement/response database from input provided by users engaged in interactions with the chatbot.12-23-2010
20100332233Battery Management System And Method - A battery-management method is performed by a battery-operated device. The method includes allocating a first portion of a battery capacity to a first function and a second portion of the battery capacity to a second function. The method further includes simultaneously displaying a first indicator relating to the first portion of the battery capacity and a second indicator relating to the second portion of the battery capacity.12-30-2010
20090299750Voice/Music Determining Apparatus, Voice/Music Determination Method, and Voice/Music Determination Program - According to one embodiment, various characteristic parameters for determining whether an input audio signal is a voice signal or a music signal are calculated and the calculated characteristic parameters are compared with a threshold value for voice determination and a threshold value for music determination. A voice characteristic score is provided to a characteristic parameter indicating voice and a music characteristic score is provided to a characteristic parameter indicating music. Then, based on a difference between a sum total of voice characteristic scores and a sum total of music characteristic scores, it is determined whether the input audio signal is a voice signal or a music signal.12-03-2009
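The scoring scheme described in entry 20090299750 (per-parameter thresholds, then a decision by the difference of score sums) can be sketched as follows. The parameter names and threshold values here are illustrative assumptions, not values from the application:

```python
def classify_voice_music(params, voice_thresholds, music_thresholds):
    """Decide 'voice' vs 'music' from characteristic parameters.

    Each parameter that crosses its voice threshold contributes one voice
    characteristic score; likewise for music.  The decision follows the
    sign of the difference between the two score sums, as the abstract
    describes.  Parameter names and thresholds are hypothetical."""
    voice_score = sum(
        1 for name, value in params.items()
        if name in voice_thresholds and value >= voice_thresholds[name])
    music_score = sum(
        1 for name, value in params.items()
        if name in music_thresholds and value >= music_thresholds[name])
    return "voice" if voice_score - music_score > 0 else "music"

# Hypothetical parameters: a high zero-crossing rate suggesting speech,
# sustained tonal power suggesting music.
decision = classify_voice_music(
    {"zero_crossing_rate": 0.31, "tonal_power": 0.12},
    voice_thresholds={"zero_crossing_rate": 0.25},
    music_thresholds={"tonal_power": 0.40})
# → "voice"
```

The actual application would use many more parameters; the point of the structure is that each parameter votes independently and only the aggregate difference decides.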
20100030562SOUND DETERMINATION DEVICE, SOUND DETECTION DEVICE, AND SOUND DETERMINATION METHOD - A sound determination device, a sound detection device, and a sound determination method are described.02-04-2010
20100017211Method for the construction of a cross-linked system - Disclosed is a method for constructing a cross-linked system whose topology of components creates a network, especially a method for creating predetermined functional units, such as cell types and tissues as well as biological and/or physical components that are based thereupon, by developing the cross-linking of the system in a self-organizing manner. The inventive method is characterized by the following steps: a) the network is represented by a graph; b) edges of said graph are provided with markings which are formed such that the graph can be unambiguously assigned to a minimal automaton; c) the automaton is described by a formal grammar representing a system of equations whose solutions are defined in text form. The approach to obtaining the solutions of the system of equations describes a way to construct the system, while transducers insert the components into the network in order to construct the system entirely.01-21-2010
20110022392INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING METHOD - A framework is provided which performs location-based analysis using an individual feature such as a stress level obtained based on biological information. An information processing system includes an acquisition unit which acquires frequency power information of a voice inputted at a mobile terminal having a voice communication function, and position information of a base station device that relayed voice communication of the mobile terminal when the voice was inputted; a storage unit which stores the acquired frequency power information and the acquired position information in association with each other; an acceptance unit which accepts designation of an area; and an output unit which identifies the position information related to the designated area, acquires the frequency power information associated with the identified position information with reference to the storage unit, obtains a stress level of a user of the mobile terminal in the designated area based on frequency power information of a frequency greater than or equal to a threshold value within the acquired frequency power information, and outputs the stress level in association with the designated area.01-27-2011
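Entry 20110022392 derives a stress level from frequency power at or above a threshold frequency. One plausible reading, assumed here purely for illustration, is the share of total spectral power that lies in that high band:

```python
def stress_level(freq_power, threshold_hz):
    """Fraction of total spectral power at frequencies >= threshold_hz.

    freq_power is a list of (frequency_hz, power) pairs.  Treating the
    high-band share as the stress level is an assumption for this
    sketch; the application does not publish its exact mapping."""
    total = sum(power for _, power in freq_power)
    if total == 0:
        return 0.0
    high = sum(power for freq, power in freq_power if freq >= threshold_hz)
    return high / total

# A voice frame whose power sits mostly above 1 kHz scores high.
level = stress_level([(250, 1.0), (500, 1.0), (2000, 6.0)], threshold_hz=1000)
# → 0.75
```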
20110161086Orchestrated Encoding and Decoding - Orchestrated encoding schemes facilitate encoding and decoding of data in content signals at several points in the distribution path of content items. Orchestrated encoding adheres to a set of encoding rules that enables multiple watermarks and corresponding applications to co-exist, avoids collisions among watermarks, and simplifies metadata and routing database infrastructure.06-30-2011
20120004916SPEECH SIGNAL PROCESSING DEVICE - A speech signal processing device is equipped with a power acquisition unit, a probability distribution acquisition unit, and a correspondence degree determination unit. The power acquisition unit accepts an inputted speech signal and, based on the accepted speech signal, acquires power representing the intensity of a speech sound represented by the speech signal. The probability distribution acquisition unit acquires a probability distribution using the intensity of the power acquired by the power acquisition unit as a random variable. The correspondence degree determination unit determines whether a correspondence degree representing a degree that power acquired by the power acquisition unit in a case that a predetermined reference speech signal is inputted into the power acquisition unit corresponds with predetermined reference power is higher than a predetermined reference correspondence degree, based on the probability distribution acquired by the probability distribution acquisition unit.01-05-2012
20120004915CONVERSATIONAL SPEECH ANALYSIS METHOD, AND CONVERSATIONAL SPEECH ANALYZER - The invention provides a conversational speech analyzer which analyzes whether utterances in a meeting are of interest or concern. Frames are calculated using sound signals obtained from a microphone and a sensor, sensor signals are cut out for each frame, and by calculating the correlation between sensor signals for each frame, an interest level which represents the concern of an audience regarding utterances is calculated, and the meeting is analyzed.01-05-2012
20120209612Extraction and Matching of Characteristic Fingerprints from Audio Signals - An audio fingerprint is extracted from an audio sample, where the fingerprint contains information that is characteristic of the content in the sample. The fingerprint may be generated by computing an energy spectrum for the audio sample, resampling the energy spectrum, transforming the resampled energy spectrum to produce a series of feature vectors, and computing the fingerprint using differential coding of the feature vectors. The generated fingerprint can be compared to a set of reference fingerprints in a database to identify the original audio content.08-16-2012
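The pipeline in entry 20120209612 (energy spectrum, resampling, feature vectors, differential coding) can be sketched as below. The naive DFT, frame size, and band count are assumptions chosen for brevity; a real implementation would use an FFT and the application's actual transform and resampling steps:

```python
import cmath

def band_energies(frame, bands):
    """Energy spectrum of one frame via a naive DFT, pooled
    ('resampled') into a fixed number of frequency bands."""
    n = len(frame)
    spectrum = [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n))) ** 2
                for k in range(n // 2)]
    per_band = len(spectrum) // bands
    return [sum(spectrum[b * per_band:(b + 1) * per_band])
            for b in range(bands)]

def fingerprint(samples, frame_size=8, bands=4):
    """Differentially code band energies of adjacent frames into bits:
    bit = 1 where a band's energy rose from one frame to the next."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    feats = [band_energies(f, bands) for f in frames]
    return [1 if cur > prev else 0
            for a, b in zip(feats, feats[1:])
            for prev, cur in zip(a, b)]
```

Matching against a reference database then reduces to comparing such bit vectors, e.g. by Hamming distance.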
20120010889VOICE INTERACTION METHOD OF MOBILE TERMINAL BASED ON VOICEXML AND MOBILE TERMINAL - The present invention discloses a voice interaction method of a mobile terminal based on VoiceXML, and a mobile terminal, which comprises: converting received voice information into a VoiceXML document, parsing the VoiceXML document according to a preset VoiceXML document framework, and searching for the information on the function to be realized by the voice information corresponding to the VoiceXML document; mapping the found function information to the corresponding function of the man-machine interface, and notifying the man-machine interface of the mapped function; and performing VoiceXML response document conversion on the response information from the man-machine interface, and playing the conversion result as corresponding voice output. According to the technical solution of the present invention, advanced, intelligent, and complex voice interaction can be realized, and the portability of voice interaction is improved.01-12-2012
20110166862SYSTEM AND METHOD FOR VARIABLE AUTOMATED RESPONSE TO REMOTE VERBAL INPUT AT A MOBILE DEVICE - A method and system for altering an operational mode of evaluating and responding to verbal input from a user to a mobile device if conditions make such evaluation incompatible with a favorable user experience. Automated speech recognition (ASR) evaluation of verbal input may be performed on a mobile platform to continue a flow of the user experience. Evaluation of the verbal input may continue at a backend when conditions allow for transmission of recorded input to the backend.07-07-2011
20120116774SYSTEM FOR VOICE CONTROL OF A MEDICAL IMPLANT - An implantable system for voice control of a medical implant is described.05-10-2012
20120116773CONTENT FILTERING FOR A DIGITAL AUDIO SIGNAL - According to some embodiments, content filtering is provided for a digital audio signal.05-10-2012
20120016677Method and device for audio signal classification - The present invention discloses a method and a device for audio signal classification, in the field of communications technologies, which solve the problem of the high complexity of audio signal type classification in the prior art. In the present invention, after an audio signal to be classified is received, a tonal characteristic parameter of the audio signal in at least one sub-band is obtained, and the type of the audio signal is determined according to the obtained characteristic parameter. The present invention is mainly applied to audio signal classification scenarios, and implements audio signal classification through a relatively simple method.01-19-2012
20120016676SYSTEM AND METHOD FOR WRITING DIGITS IN WORDS AND PRONUNCIATION OF NUMBERS, FRACTIONS, AND UNITS - Disclosed is a system and method for converting a digital number to text and for pronouncing the digital number. The system includes a filtration system for determining whether the digital number has nonnumeric symbols and for generating a filtrated number, an analyzing system for analyzing the filtrated number, a composition system configured to collect words associated with ternary units of the filtrated number, a linking system configured to link the words, and a pronouncing system for pronouncing the linked words.01-19-2012
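The "ternary units" in entry 20120016676 are three-digit groups (ones, thousands, millions, ...). A minimal sketch of the filtration, composition, and linking steps for English numbers, with illustrative word tables standing in for the application's own:

```python
ONES = ("zero one two three four five six seven eight nine ten eleven "
        "twelve thirteen fourteen fifteen sixteen seventeen eighteen "
        "nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()
SCALES = ["", "thousand", "million", "billion"]

def filtrate(text):
    """Filtration step: strip nonnumeric symbols before analysis."""
    return int("".join(ch for ch in text if ch.isdigit()))

def triple_to_words(n):
    """Composition step: words for one three-digit (ternary) unit."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10 - 2])
        n %= 10
    if n:
        words.append(ONES[n])
    return words

def number_to_words(n):
    """Analyze into ternary units, compose each, then link the words."""
    if n == 0:
        return "zero"
    groups = []
    while n:
        groups.append(n % 1000)
        n //= 1000
    words = []
    for i in range(len(groups) - 1, -1, -1):
        if groups[i]:
            words += triple_to_words(groups[i])
            if SCALES[i]:
                words.append(SCALES[i])
    return " ".join(words)
```

The linked word string can then feed a text-to-speech engine for the pronouncing step; handling fractions and units, as the entry claims, would extend the same grouping idea.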
20120116772Method and System for Providing Speech Therapy Outside of Clinic - A system and method for speech therapy is provided that includes a mobile device, a server and a web-client. The mobile device captures and processes voice signals analyzed locally and on the server and from which a speech therapy is coordinated and delivered. The web-client through interaction with the mobile device and through the server implements a speech therapy that can be monitored and managed thereon through specified clinical moderation. The web-client also provides an alternative method to capture and transmit voice signals to the server for analysis and from which a speech therapy is coordinated and delivered. Speech therapy management can implement therapy procedures, guidelines and one-to-one communication sessions between users and providers in a non-clinical setting in real-time or at scheduled times. Other embodiments are disclosed.05-10-2012
20120116771Method and apparatus for searching a music database - A method for a user to buy a song from a remote music source is described.05-10-2012
20120116770SPEECH DATA RETRIEVING AND PRESENTING DEVICE - A speech data retrieving and presenting device, used with an electronic device through a network, includes a data receiving unit, a processing unit and a speech presenting unit. The data receiving unit, connected to the network, receives data from the electronic device through the network. The processing unit, coupled to the data receiving unit, receives speech data and retrieves a speech presenting signal from the speech data. The speech presenting unit, coupled to the processing unit, receives the speech presenting signal and outputs speech according to the speech data. This device can assist a user in obtaining network information, and provide the user with a more flexible application, since the device can be operated independently with a simple motion.05-10-2012
20110093273System And Method For Determining The Active Talkers In A Video Conference - The present invention describes a method of determining the active talker for display on a video conferencing system, including the steps of: for each participant, capturing audio data using an audio capture sensor and video data using a video capture sensor; and determining the probability of active speech.04-21-2011
20120065981TEXT PRESENTATION APPARATUS, TEXT PRESENTATION METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a text presentation apparatus presenting text for a speaker to read aloud for voice recording includes: a text storing unit for storing first text; a presenting unit for presenting the first text; a determination unit for determining whether or not the first text needs to be replaced, on the basis of a speaker's input for the first text presented; a preliminary text storing unit for storing preliminary text; a select unit configured to select, if it is determined that the first text needs to be replaced, second text to replace the first text from among the preliminary text, the selecting being performed on the basis of attribute information describing an attribute of the first text and on the basis of at least one of attribute information describing pronunciation of the first text and attribute information describing a stress type of the first text; and a control unit configured to control the presenting unit so that the presenting unit presents the second text.03-15-2012
20120109657Narrative Voice Files for GPS Devices - The invention consists of a compilation of unified audio tour files in a compressed format, e.g., MP3 or MP4, that provides pre-recorded spoken commentary to Global Positioning System (GPS) enabled devices. Using satellite technology, audio is triggered based on a user's location, providing relevant facts, geography, points of interest, history, and trivia of every city, town, and area as it is traveled throughout the world. These audio tour files will be provided in multiple languages. Upgrades shall be available via the Internet. The invention will narrate the entire world, beginning with the large metropolitan areas of the United States of America, through to the smallest towns in Malta.05-03-2012
20110106539Audio and Video Signal Processing - The present disclosure relates generally to audio and video signal processing. Various arrangements are disclosed. One method recites: (a) obtaining data representing audible portions of audio or representing picture portions of video; (b) using a programmed electronic processor, determining identifying information from the obtained data by computing a frequency transform to produce frequency transform data, and processing the frequency transform data to derive a pattern, and using the pattern as the identifying information for the audio or video; and (c) using the identifying data to facilitate purchase or license of the audio or video. Other arrangements are disclosed as well.05-05-2011
20100094634METHOD AND APPARATUS FOR CREATING FACE CHARACTER BASED ON VOICE - An apparatus and method of creating a face character which corresponds to a voice of a user are provided. To create various facial expressions with fewer key models, a face character is divided into a plurality of areas and a voice sample is parameterized corresponding to pronunciation and emotion. If the user's voice is input, face character images corresponding to the divided face areas are synthesized using key models and parameter data corresponding to the voice sample, and an overall face character image is then composed from the synthesized per-area images.04-15-2010
20100094633VOICE ANALYSIS DEVICE, VOICE ANALYSIS METHOD, VOICE ANALYSIS PROGRAM, AND SYSTEM INTEGRATION CIRCUIT - A sound analysis device comprises: a sound parameter calculation unit operable to acquire an audio signal and calculate a sound parameter for each of partial audio signals, the partial audio signals each being the acquired audio signal in a unit of time; a category determination unit operable to determine, from among a plurality of environmental sound categories, which environmental sound category each of the partial audio signals belongs to, based on a corresponding one of the calculated sound parameters; a section setting unit operable to sequentially set judgement target sections on a time axis as time elapses, each of the judgment target sections including two or more of the units of time, the two or more of the units of time being consecutive; and an environment judgment unit operable to judge, based on a number of partial audio signals in each environmental sound category determined in at least a most recent judgment target section, an environment that surrounds the sound analysis device in at least the most recent judgment target section.04-15-2010
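The per-unit categorization plus judgment-target-section scheme in entry 20100094633 can be sketched as a threshold classifier followed by a vote count over the most recent section. The categories and thresholds below are hypothetical, and a single scalar parameter stands in for the real sound parameter vector:

```python
from collections import Counter

def categorize(sound_param):
    """Map one per-unit-time sound parameter (here, a hypothetical
    normalized loudness) to an environmental sound category."""
    if sound_param > 0.8:
        return "traffic"
    if sound_param > 0.4:
        return "speech"
    return "quiet"

def judge_environment(sound_params, section_len):
    """Judge the surrounding environment from the most recent judgment
    target section by counting per-unit category determinations."""
    categories = [categorize(p) for p in sound_params]
    recent = categories[-section_len:]
    return Counter(recent).most_common(1)[0][0]
```

As time elapses, the section slides forward, so the judgment tracks the current environment rather than the whole recording.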
20120123785VOICE APPLICATION NETWORK PLATFORM - A distributed voice applications system includes a voice applications rendering agent and at least one voice applications agent that is configured to provide voice applications to an individual user. A management system may control and direct the voice applications rendering agent to create voice applications that are personalized for individual users based on user characteristics, information about the environment in which the voice applications will be performed, prior user interactions and other information. The voice applications agent and components of customized voice applications may be resident on a local user device which includes a voice browser and speech recognition capabilities. The local device, voice applications rendering agent and management system may be interconnected via a communications network.05-17-2012
20120123784SEQUENCED MULTI-MEANING TACTILE SYMBOLS USEABLE TO PRODUCE SYNTHETIC PLURAL WORD MESSAGES INCLUDING WORDS, PHRASES AND SENTENCES - An embodiment of the present application is directed to a method including providing a keyboard, including a plurality of keys, at least some of the keys including polysemous symbols which provide distinctive tactile feedback to a user; and accessing a word, phoneme or plural word message, based upon sequentially selected ones of the polysemous symbols providing distinctive tactile feedback. Another embodiment of the present application is directed to a system, including a keyboard, including a plurality of keys, at least some of the keys including polysemous symbols which provide distinctive tactile feedback to a user; and a processor to access a word, phoneme or plural word message, based upon sequentially selected ones of the polysemous symbols providing distinctive tactile feedback.05-17-2012
20120123783SYSTEMS AND METHODS FOR EDITING TELECOM WEB APPLICATIONS THROUGH A VOICE INTERFACE - Systems and associated methods for editing telecom web applications through a voice interface are described. Systems and methods provide for editing telecom web applications over a connection, as for example accessed via a standard phone, using speech and/or DTMF inputs. The voice based editing includes exposing an editing interface to a user for a telecom web application that is editable, dynamically generating a voice-based interface for a given user for accomplishing editing tasks, and modifying the telecom web application to reflect the editing commands entered by the user.05-17-2012
20090132254DIAGNOSTIC REPORT BASED ON QUALITY OF USER'S REPORT DICTATION - A system and method are provided for automatically routing a diagnostic interpretation from diagnostic data received from a diagnostic source. The diagnostic interpretation is produced automatically using a voice recognition system. Along with the transcription of the interpretation, the voice recognition system returns a level of confidence of the voice recognition. Based on this level of confidence, the system automatically routes the transcribed interpretation to the appropriate destination for further processing.05-21-2009
20120253818INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - There is provided an information processing apparatus including an operation information transmitting unit transmitting operation information for operating respective appliances out of a plurality of appliances connected via a network; a character processing unit carrying out processing relating to characters, which correspond to the respective appliances and have individual personalities, and changing a content represented by the characters in accordance with the operation information for operating the appliances; and a display processing unit carrying out processing that displays the characters on a display unit.10-04-2012
20120253817Mobile speech attendant access - A system and method for connecting to a telephone extension listed in a telephone number database is disclosed. The method comprises recording an audio token on a mobile communication device. The audio token is associated with a telephone number included in the database. The audio token is transmitted from the mobile communication device to a server over a digital channel. The telephone number in the database that is associated with the audio token is selected using speech recognition. The mobile communication device is then connected with the telephone number.10-04-2012
20120316882SYSTEM FOR GENERATING CAPTIONS FOR LIVE VIDEO BROADCASTS - An adaptive workflow system can be used to implement captioning projects, such as projects for creating captions or subtitles for live and non-live broadcasts. Workers can repeat words spoken during a broadcast program or other program into a voice recognition system, which outputs text that may be used as captions or subtitles. The process of workers repeating these words to create such text can be referred to as respeaking. Respeaking can be used as an effective alternative to more expensive and hard-to-find stenographers for generating captions and subtitles.12-13-2012
20100049524Method And Apparatus For Providing Search Capability And Targeted Advertising For Audio, Image And Video Content Over The Internet - The present invention provides an apparatus and method for extracting the content of a video, image, and/or audio file or podcast, analyzing the content, and then providing a targeted advertisement, search capability and/or other functionality based on the content of the file or podcast.02-25-2010
20120179470SIMULTANEOUS VOICE AND DATA SYSTEMS FOR SECURE CATALOG ORDERS - Systems and methods are described for providing a simultaneous voice and data user interface for secure catalog orders, and in particular for providing a distributed voice user interface, simultaneously with a data stream, for a remote device having a limited visual user interface, to facilitate secure automated catalog orders with simultaneous electronic fulfillment applied to that device.07-12-2012
20120179469CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.07-12-2012
20120253819LOCATION DETERMINATION SYSTEM AND MOBILE TERMINAL - A location determination system includes a first mobile terminal and a second mobile terminal. The first mobile terminal includes a first processor to acquire a first sound signal, analyze the first sound signal to obtain a first analysis result, and transmit the first analysis result. The second mobile terminal includes a second processor to acquire a second sound signal, analyze the second sound signal to obtain a second analysis result, receive the first analysis result from the first mobile terminal, compare the second analysis result with the first analysis result to obtain a comparison result, and determine whether the first mobile terminal locates in an area in which the second mobile terminal locates, based on the comparison result.10-04-2012
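One way to realize the comparison step in entry 20120253819, assumed here purely for illustration: each terminal summarizes its ambient sound as a band-energy vector, and the second terminal declares a shared area when the cosine similarity of the two vectors exceeds a threshold:

```python
def same_area(result_a, result_b, threshold=0.9):
    """Compare two terminals' sound-analysis results (here, band-energy
    vectors) by cosine similarity; similar ambient sound is taken to
    mean the terminals are located in the same area.  The vector
    representation and threshold are assumptions for this sketch."""
    dot = sum(a * b for a, b in zip(result_a, result_b))
    norm_a = sum(a * a for a in result_a) ** 0.5
    norm_b = sum(b * b for b in result_b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return False
    return dot / (norm_a * norm_b) >= threshold
```

Exchanging compact analysis results instead of raw audio keeps the transmission small, which is presumably why the entry has the first terminal send an analysis result rather than the sound signal itself.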
20120316883SYSTEM AND METHOD FOR DETERMINING A PERSONAL SHG PROFILE BY VOICE ANALYSIS - According to one embodiment of the present invention a computerized voice-analysis device for determining an S, H, G profile is provided (as described herein, such an S, H, G profile relates to the strengths (e.g., relative strengths) of three human instinctive drives). Of note, the present invention may be used for one or more of the following: analyzing a previously recorded voice sample; real-time analysis of voice as it is being spoken; combination voice analysis—that is, a combination of: (a) previously recorded and/or real-time voice; and (b) answers to a questionnaire.12-13-2012
20120259639CONTROLLING AUDIO VIDEO DISPLAY DEVICE (AVDD) TUNING USING CHANNEL NAME - A television, or other device with television tuner, can be controlled to directly tune to a specific channel name, such as a broadcaster's station name, by using EPG metadata to provide a correlation between a channel number and channel name.10-11-2012
20120259640VOICE CONTROL DEVICE AND VOICE CONTROL METHOD - A voice control unit controlling and outputting a first voice signal includes an analysis unit configured to calculate an average value of a gradient of spectrum at a high frequency of an inputted second voice signal as a voice characteristic, a determination unit configured to determine an amplification band and an amplification amount of a spectrum of the first voice signal based on the gradient, and an amplification unit configured to amplify the spectrum of the first voice signal to realize the determined amplification band and the determined amplification amount.10-11-2012
20120259638APPARATUS AND METHOD FOR DETERMINING RELEVANCE OF INPUT SPEECH - Audio or visual orientation cues can be used to determine the relevance of input speech. The presence of a user's face may be identified during speech during an interval of time. One or more facial orientation characteristics associated with the user's face during the interval of time may be determined. In some cases, orientation characteristics for input sound can be determined. A relevance of the user's speech during the interval of time may be characterized based on the one or more orientation characteristics.10-11-2012
20080300883Projection Apparatus with Speech Indication and Control Method Thereof - A projection apparatus with speech indication and a control method thereof are provided. The projection apparatus comprises a storage unit, a transmission interface, a process unit, and an output unit. The storage unit is configured to store a plurality of speech data. The transmission interface is configured to connect to an external apparatus for accessing the storage unit. The process unit is configured to select at least one of the speech data according to the present state of the projection apparatus. The output unit is configured to output the selected speech datum to broadcast the speech indication.12-04-2008
20110131049Method and Apparatus for Providing a Framework for Efficient Scanning and Session Establishment - A method of providing a framework for efficient scanning and session establishment may include receiving vocabulary independent property information indicative of a property request and corresponding setting information of an application associated with a device capable of communication with a network communication environment, determining capabilities of the network communication environment relative to the received property information, and enabling generation of a selected scan function having selected scan parameters based at least in part on the determined capabilities and the property information. A corresponding apparatus and computer program product are also provided.06-02-2011
20110131048SYSTEM AND METHOD FOR AUTOMATICALLY GENERATING A DIALOG MANAGER - Disclosed herein are systems, methods, and computer-readable storage media for automatically generating a dialog manager for use in a spoken dialog system. A system practicing the method receives a set of user interactions having features, identifies an initial policy, evaluates all of the features in a linear evaluation step of the algorithm to identify a set of most important features, performs a cubic policy improvement step on the identified set of most important features, repeats the previous two steps one or more times, and generates a dialog manager for use in a spoken dialog system based on the resulting policy and/or set of most important features. Evaluating all of the features can include estimating a weight for each feature which indicates how much each feature contributes to at least one of the identified policies. The system can ignore features not in the set of most important features.06-02-2011
20110131047Steganography in Digital Signal Encoders - In a method for embedding steganographic information into the signal information of a signal encoder, a solution is to be created which enables steganographic information to be embedded into the signal information of a signal encoder such that a reduction of the voice quality is largely avoided. This is achieved by means of providing data information, particularly voice information, selecting steganographic information from a quantity of steganographic information, generating a code word from a code book provided by means of the signal encoder on the basis of the code elements forming the code book such that, with the use of the code word generated within the scope of a transmission standard associated with the code book, the data information is encoded into signal information containing the code word and/or making reference to the code word; and by the generated code word having an additional feature that can be calculated on the basis of the code elements forming the code word, wherein the additional feature represents the steganographic information.06-02-2011
20120265536APPARATUS AND METHOD FOR PROCESSING VOICE COMMAND - Disclosed is a technique for processing voice commands. In particular, the disclosed technique increases the voice recognition rate without requiring a separate voice-command input process: the voice command table is updated through interaction with the user, storing similar commands input by the user once the user has confirmed them as similar commands.10-18-2012
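The table-update step described above can be sketched with a plain dictionary mapping each canonical command to the set of user-confirmed variants; the structure and function names are invented for illustration, not the patent's data format:

```python
def update_command_table(table, recognized, confirmed_action):
    """Record a user-confirmed similar command.

    table: dict mapping a canonical command to a set of accepted variants.
    After the user confirms that `recognized` means `confirmed_action`,
    the variant is stored so it is matched directly next time, without
    another confirmation round-trip.
    """
    table.setdefault(confirmed_action, set()).add(recognized)
    return table

table = {"open camera": {"open camera"}}
update_command_table(table, "start camera", "open camera")
print(sorted(table["open camera"]))  # ['open camera', 'start camera']
```

On later utterances, a lookup across the variant sets replaces re-confirmation, which is the recognition-rate gain the abstract claims.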
20120265535PERSONAL VOICE OPERATED REMINDER SYSTEM - A personal voice operated reminder system. In one embodiment, the system is worn as a device on the body in a form similar to a watch, bracelet or necklace. In another embodiment the system is a device normally held in a person's pocket or purse, and in another embodiment the system is a method added as an application to already existing devices such as PDAs or cellular telephones.10-18-2012
20110046960Multi-Channel Interactive Self-Help Application Platform and Method - An interactive voice response (IVR) platform running a voice application for use with a voice client is extended to support text messaging clients and other clients of other media types on other channels. An application-to-text messaging interface interfaces with text messaging clients via a text messaging protocol transport and interfaces with the IVR via an API. It includes a user/application manager to handle user and application accounts and a state/session manager to handle state information required by the text messaging operations and to handle sessions maintained by the IVR. Text modules are implemented having text synthesis and text recognition with a dictionary/grammar. These allow voice-specific application scripts to be interpreted in a text channel. The extended multi-channel platform supports an open source text messaging network and also, through transport gateways, other types of text messaging clients.02-24-2011
20110046959Substituting or Replacing Components in Sound Based on Steganographic Encoding - The present disclosure relates to various methods and systems to provide substitute sound (e.g., audio). One claim includes an apparatus comprising: electronic memory for storing identifying information obtained from steganographically encoded sound; an electronic processor programmed for: providing the identifying information to a remote computer, the remote computer including substitute sound corresponding to the identifying information; providing format information to the remote computer, the format information identifying a format in which the substitute sound should be formatted prior to communication of the substitute sound; and controlling receipt of substitute sound corresponding to the identifying information. Of course, other apparatus, methods and combinations are provided as well.02-24-2011
20120323579DYNAMIC ACCESS TO EXTERNAL MEDIA CONTENT BASED ON SPEAKER CONTENT - An audio conference is supplemented based on speaker content. Speaker content from at least one audio conference participant is monitored using a computer with a tangible non-transitory processor and memory. A set of words is selected from the speaker content. The selected set of words is determined to be associated with supplemental media content from at least one external source. The supplemental media content is made available to at least one audience member for the audio conference. The supplemental media content is selectively presented to the at least one audience member.12-20-2012
20120271636VOICE INPUT DEVICE - A voice input device includes: a mastery level identifying device identifying a mastery level of a user with respect to voice input; and an input mode setting device switching a voice input mode between a guided input mode and an unguided input mode. In the guided input mode, preliminary registered contents of the voice input are presented to the user. The input mode setting device sets the voice input mode to the unguided input mode at a starting time when the voice input device starts to receive the voice input. The input mode setting device switches the voice input mode from the unguided input mode to the guided input mode at a switching time. The input mode setting device sets a time interval between the starting time and the switching time in proportion to the mastery level.10-25-2012
20120271637DERIVING GEOGRAPHIC DISTRIBUTION OF PHYSIOLOGICAL OR PSYCHOLOGICAL CONDITIONS OF HUMAN SPEAKERS WHILE PRESERVING PERSONAL PRIVACY - A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.10-25-2012
20110238423SONIC DOCUMENT CLASSIFICATION - An apparatus for classifying documents.09-29-2011
20120089403Postal Processing Including Voice Feedback - System, methods, and computer-readable media. A method includes receiving a voice input, from an operator, corresponding to a mail item. The method includes performing a voice recognition process on the voice input to produce spoken data, and producing a system result corresponding to the spoken data. The method includes analyzing the system result to produce feedback information, and audibly sounding the feedback information to the operator.04-12-2012
20110276334Methods and Systems for Synchronizing Media - Systems and methods of synchronizing media are provided. A client device may be used to capture a sample of a media stream being rendered by a media rendering source. The client device sends the sample to a position identification module to determine a time offset indicating a position in the media stream corresponding to the sampling time of the sample, and optionally a timescale ratio indicating a speed at which the media stream is being rendered by the media rendering source based on a reference speed of the media stream. The client device calculates a real-time offset using a present time, a timestamp of the media sample, the time offset, and optionally the timescale ratio. The client device then renders a second media stream at a position corresponding to the real-time offset to be in synchrony to the media stream being rendered by the media rendering source.11-10-2011
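The real-time offset calculation this abstract describes reduces to a short formula: the matched position in the stream, advanced by the time elapsed since the sample was captured, scaled by the rendering speed. The exact equation below is a plausible reading of the abstract, not the patent's stated formula:

```python
def real_time_offset(present_time, sample_timestamp, time_offset,
                     timescale_ratio=1.0):
    """Estimate the media stream's current position.

    time_offset: position in the stream identified for the captured sample.
    timescale_ratio: rendering speed relative to the reference speed
    (1.0 = normal speed). The stream has advanced by the elapsed wall-clock
    time since the sample, scaled by that ratio.
    """
    return time_offset + (present_time - sample_timestamp) * timescale_ratio

# A sample captured at t=100 s matched position 42.0 s in the stream;
# 5 s later, rendered at normal speed, the stream is at 47.0 s.
print(real_time_offset(105.0, 100.0, 42.0))       # 47.0
# At double speed, 10 s of wall clock advances the stream by 20 s.
print(real_time_offset(110.0, 100.0, 42.0, 2.0))  # 62.0
```

The client then starts the second media stream at this position to render it in synchrony with the first.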
20110276333Methods and Systems for Synchronizing Media - Systems and methods of synchronizing media are provided. A client device may be used to capture a sample of a media stream being rendered by a media rendering source. The client device sends the sample to a position identification module to determine a time offset indicating a position in the media stream corresponding to the sampling time of the sample, and optionally a timescale ratio indicating a speed at which the media stream is being rendered by the media rendering source based on a reference speed of the media stream. The client device calculates a real-time offset using a present time, a timestamp of the media sample, the time offset, and optionally the timescale ratio. The client device then renders a second media stream at a position corresponding to the real-time offset to be in synchrony to the media stream being rendered by the media rendering source.11-10-2011
20110320208Page identification method for audio book - A page identification method for an audio book with a main housing, a plurality of pages, a plurality of light blocking panels, an audio record and playback electronic circuit including microphone, speaker, power source, record switch and playback switch, a microprocessor, and a plurality of light sensing devices. The top surface of the main body has a plurality of apertures. Each light sensing device is located directly under a main body aperture. Each page has one or more apertures that are aligned with at least one of the main body apertures. Each light blocking panel is interleaved between pages so that when a page is turned by the user, the light blocking panel slides over to cover or uncover the page aperture, causing the light sensing devices to send a signal to the microprocessor that tells the audio circuit which message to play for each page.12-29-2011
20130013316Multisensory Speech Detection - A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.01-10-2013
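The orientation-to-mode mapping this abstract describes can be sketched as a simple rule table; the mode names, thresholds, and sensor inputs below are invented to illustrate the kind of rule involved, not the patent's actual parameters:

```python
def operating_mode(pitch_deg, proximity_near):
    """Guess an operating mode from orientation and proximity sensing.

    A device held near the face and tilted (like a handset at the ear)
    suggests a telephone posture; lying flat suggests speakerphone use;
    anything else falls back to an explicit push-to-talk mode. Speech
    detection start/end parameters would then be chosen per mode.
    """
    if proximity_near and pitch_deg > 30:
        return "telephone"
    if pitch_deg < 10:
        return "speakerphone"
    return "push-to-talk"

print(operating_mode(60, True))   # telephone
print(operating_mode(5, False))   # speakerphone
print(operating_mode(20, False))  # push-to-talk
```

In the "telephone" posture, for example, speech detection might begin immediately, whereas "push-to-talk" would wait for a button press.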
20130013315Multisensory Speech Detection - A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.01-10-2013
20100131277Device, Method, and Program for Performing Interaction Between User and Machine - There is provided a device for performing interaction between a user and a machine. The device includes a plurality of domains corresponding to a plurality of stages in the interaction. Each of the domains has voice comprehension means which understands the content of the user's voice. The device includes: means for recognizing the user's voice; means for selecting a domain enabling the best voice comprehension result as the domain; means for referencing task knowledge of the domain and extracting a task05-27-2010
20120150545BRAIN-COMPUTER INTERFACE TEST BATTERY FOR THE PHYSIOLOGICAL ASSESSMENT OF NERVOUS SYSTEM HEALTH - A battery of three or more sensory and cognitive challenge tasks actively or dynamically challenge the brain to monitor its state for assessment of injury, disease, or compound effect, among others. The system analyzes and assesses a personalized biometric brain health signature by integrating the use of electroencephalography (EEG), somato-sensory, neuropsychological, and/or cognitive stimulation, and novel signal processing and display. The system also provides for early detection of dementia, including Alzheimer's disease (AD), vascular dementia (VAD), mixed dementia (AD and VAD), MCI, and other dementia-type disorders, as well as brain injury states such as mild Traumatic Brain Injury and can provide some or all of the following improvements over conventional systems and methods, including: (1) Increased sensitivity, specificity, and overall accuracy; (2) early detection of disease and injury; and (3) enhanced portability with remote data acquisition capability.06-14-2012
20080255846METHOD OF PROVIDING LANGUAGE OBJECTS BY INDENTIFYING AN OCCUPATION OF A USER OF A HANDHELD ELECTRONIC DEVICE AND A HANDHELD ELECTRONIC DEVICE INCORPORATING THE SAME - The disclosed and claimed concept relates generally to handheld electronic devices and, more particularly, to a method of providing language objects by identifying an occupation of a user of a handheld electronic device and a handheld electronic device incorporating the same. A method and apparatus of providing language objects by identifying an occupation of a user of a handheld electronic device includes the following steps: identifying the occupation of the user of the handheld electronic device from a number of occupations; detecting a text input; and displaying at least a portion of at least a first language object that is associated with the identified occupation and that corresponds to the text input.10-16-2008
20080249780VOICE GUIDANCE SYSTEM FOR VEHICLE - A voice guidance system for a vehicle includes a transmitter, a tuner, a touch sensor, a smart ECU, a D-seat speaker, and a P-seat speaker, which are all mounted in a vehicle. It is used for an in-vehicle system, such as a smart entry system, which performs intercommunication with a portable unit. In this guidance system, a smart ECU stores in a memory information indicating that a user has performed predetermined operation with the smart entry system. When it is determined that a user will use the smart entry system, the following processing is performed: voice guidance about the operation procedures for the system is outputted from a driver seat speaker or a passenger seat speaker when information indicating that the user has performed the predetermined operation in the past is not stored in the memory; and voice guidance is disabled when information indicating that the user has performed the predetermined operation is stored.10-09-2008
20080249779SPEECH DIALOG SYSTEM - A speech dialog system includes a signal input unit that receives an acoustic input signal. A voice activity detector compares a portion of the received signal to a noise estimate to determine if the signal includes voice activity. A speech recognizer processes signals containing voice activity to determine if the signal contains speech. An output unit modifies signals when output of the system substantially coincides with the delivered speech.10-09-2008
20080235026VOICE ACTIVATED DISTANCE MEASURING DEVICE - A voice activated device for annunciating a message indicative of a distance of the device spaced from another location is disclosed. The device comprises a voice sensor for receiving a voice command requesting annunciation of a message indicative of the distance of the device spaced from the other location, converting circuitry coupled to the voice sensor for converting the received voice command to a corresponding electrical command, determining circuitry responsive to the electrical command for determining the distance of the device from the other location, and a speaker coupled to the determining circuitry for annunciating the message indicative of the determined distance of the device from the other location. The device may be used for informing a golfer of the golfer's distance from the pin.09-25-2008
20080221895Method and Apparatus for Processing Audio for Playback - A method and apparatus for processing audio for playback to provide a smooth transition between a beginning region of an audio track and an end region of a previous audio track is disclosed. A quantity representative of a chromagram is calculated for each of the audio tracks and the mixing points for the beginning and end regions of each audio track are identified. A quantity representative of a chromagram at the mixing point of the beginning region of the audio track and a quantity representative of a chromagram at the mixing point of the end region of the previous audio track are correlated to determine an order of audio tracks for playback and/or to determine the duration of the mix transition.09-11-2008
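The correlation step above (matching the end-region chromagram of one track against the beginning-region chromagram of the next) can be sketched by scoring candidate mixing points. Real chromagrams use 12 pitch-class bins; the 3-bin vectors and dot-product score here are a toy stand-in, not the patent's correlation measure:

```python
def best_transition(end_chroma_candidates, begin_chroma):
    """Pick the end-region mixing point whose chroma vector best matches
    the next track's beginning-region chroma, scored by dot product.
    Returns the index of the best candidate.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return max(range(len(end_chroma_candidates)),
               key=lambda i: dot(end_chroma_candidates[i], begin_chroma))

# Three candidate mixing points at the end of the outgoing track,
# each summarized by a (toy) 3-bin chroma vector.
candidates = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1), (0.2, 0.2, 0.6)]
print(best_transition(candidates, (0.0, 0.9, 0.1)))  # 1: harmonically closest
```

The same score, computed across pairs of tracks, could also drive the playback-order decision the abstract mentions.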
20130179171SYSTEM AND METHOD FOR MULTI LEVEL TRANSCRIPT QUALITY CHECKING - Methods and systems for multi level quality checking of transcripts are disclosed. The method includes the steps of searching subsets of metadata associated with the transcripts, identifying a group of transcripts having at least one particular subset of metadata, selecting a number of transcripts from the group of identified transcripts corresponding to a predetermined percentage, identifying a group of correctionists having a proper set of characteristics to correct the selected transcripts by matching the identified subsets of metadata associated with the transcripts with characteristics of correctionists, providing the transcripts and any voice files from which the transcripts derive to the selected correctionists, and, following correction, updating the subsets of metadata associated with the transcripts to include subsets of metadata pertaining to the voice files from which the transcripts were derived, any transcriptionist who transcribed the transcripts, or any correctionist who corrected the transcripts.07-11-2013
20130096922METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR DETERMINING THE LOCATION OF A PLURALITY OF SPEECH SOURCES - The present invention discloses a method, apparatus and computer program product for determining the location of a plurality of speech sources in an area of interest, comprising performing an algorithm on a signal issued by either one of said plurality of speech sources in the area to for iteratively recover data characteristic to said signal, wherein the algorithm is an iterative model-based sparse recovery algorithm, and wherein for each of a plurality of points in said area, the iteratively recovered data is indicative of a presence of a plurality of speech sources contributing to the signal received at each of a plurality of points in the area.04-18-2013
20110313773SEARCH APPARATUS, SEARCH METHOD, AND PROGRAM - A search apparatus includes a sound recognition unit which recognizes input sound, a user information estimation unit which estimates at least one of a physical condition and emotional demeanor of a speaker of the input sound based on the input sound and outputs user information representing the estimation result, a matching unit which performs matching between a search result target pronunciation symbol string and a recognition result pronunciation symbol string for each of plural search result target word strings, and a generation unit which generates a search result word string as a search result for a word string corresponding to the input sound from the plural search result target word strings based on the matching result. At least one of the matching unit and the generation unit changes processing in accordance with the user information.12-22-2011
20130124206VIDEO GENERATION BASED ON TEXT - Techniques for generating a video sequence of a person based on a text sequence, are disclosed herein. Based on the received text sequence, a processing device generates the video sequence of a person to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence. The emotional expressions in the visual portion of the video sequence are simulated based a priori knowledge about the person. For instance, the a priori knowledge can include photos or videos of the person captured in real life.05-16-2013
20130132088APPARATUS AND METHOD FOR RECOGNIZING EMOTION BASED ON EMOTIONAL SEGMENTS - An apparatus and method to recognize a user's emotion based on emotional segments are provided. An emotion recognition apparatus includes a sampling unit configured to extract sampling data from input data for emotion recognition. The emotion recognition apparatus further includes a data segment creator configured to segment the sampling data into a plurality of data segments. The emotion recognition apparatus further includes an emotional segment creator configured to create a plurality of emotional segments that include a plurality of emotions corresponding to each of the respective data segments.05-23-2013
20130144626RAP MUSIC GENERATION - The preferred embodiments of this invention convert ordinary human speech into rap music. Computer programs change the timing intervals, amplitudes, and/or frequencies of the sound signals of ordinary speech to follow rap music beats. The resulting rap music can also overlap with background music and/or video images to achieve better effects.06-06-2013
20100286987APPARATUS AND METHOD FOR GENERATING AVATAR BASED VIDEO MESSAGE - An apparatus and method for generating an avatar based video message are provided. The apparatus and method are capable of generating an avatar based video message based on speech of a user. The avatar based video message apparatus and method displays information that corresponds to input user speech. The avatar based video message apparatus and method edits the input user speech according to a user input signal with reference to the displayed information, generates avatar animation according to the edited speech, and generates an avatar based video message based on the edited speech and the avatar animation.11-11-2010
20130151257APPARATUS AND METHOD FOR PROVIDING EMOTIONAL CONTEXT TO TEXTUAL ELECTRONIC COMMUNICATION - An apparatus and method for including emotional context in textual electronic communication transmissions. The emotional context is conveyed symbolically through standardized alterations in the manner in which the text is displayed, without the inclusion of additional graphics, thereby increasing the communicative value of textual electronic communication. An important advantage of this method of embedding emotional context is that the recipient is made aware of the mental and emotional state of the originator while the textual electronic message is being received and interpreted, and therefore is able to interpret the message in light of the emotional context.06-13-2013
20130151258Context Based Online Advertising - A software and/or hardware facility for inferring user context and delivering advertisements, such as coupons, using natural language and/or sentiment analysis is disclosed. The facility may infer context information based on a user's emotional state, attitude, needs, or intent from the user's interaction with or through a mobile device. The facility may then determine whether it is appropriate to deliver an advertisement to the user and select an advertisement for delivery. The facility may also determine an appropriate expiration time and/or discount amount for the advertisement.06-13-2013
20130204625EXERCISE AND LEARNING CONCEPT USING AN EXERCISE AND LEARNING DEVICE FOR THE THERAPEUTIC TREATMENT OF PATIENTS IN THE MEDICAL DOMAIN - The invention relates to an exercise and learning concept and to a mobile exercise and learning device for the therapeutic treatment of patients, said concept being based on a network system. Said exercise and learning concept and exercise and learning device are used for the therapeutic treatment of patients in order to allow mobile and interactive learning. Said exercise and learning device comprises exercise and learning modules which are individually adapted to a patient who can also perform therapeutic exercises irrespective of the time and place.08-08-2013
20100318365Method and Apparatus for Configuring Web-based data for Distribution to Users Accessing a Voice Portal System - In a system for developing and deploying a voice application using Web-based data as source data over a communications network to one or more recipients, a method for organizing, editing, and prioritizing the Web-based data before dialog creation is provided. The method includes harvesting the Web-based data source in the form of its original structure; generating an object tree representing the logical structure and content type of the harvested, Web-based data source; manipulating the object tree generated to a desired hierarchal structure and content; creating a voice application template in VXML and populating the template with the manipulated object tree; and creating a voice application capable of accessing the Web-based data source according to the constraints of the template.12-16-2010
20120284029PHOTO-REALISTIC SYNTHESIS OF IMAGE SEQUENCES WITH LIP MOVEMENTS SYNCHRONIZED WITH SPEECH - Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.11-08-2012
20130185076MOTION ANALYZER, VOICE ACQUISITION APPARATUS, MOTION ANALYSIS SYSTEM, AND MOTION ANALYSIS METHOD - A motion analyzer includes a motion detection unit that detects motion of a part of a body of a subject, a speaking detection unit that detects speaking of the subject, and a determination unit that determines that the subject has performed predetermined motion when motion of a part of the body is detected by the motion detection unit and speaking of the subject is detected by the speaking detection unit.07-18-2013
20130197913EXTRACTION AND MATCHING OF CHARACTERISTIC FINGERPRINTS FROM AUDIO SIGNALS - An audio fingerprint is extracted from an audio sample, where the fingerprint contains information that is characteristic of the content in the sample. The fingerprint may be generated by computing an energy spectrum for the audio sample, resampling the energy spectrum logarithmically in the time dimension, transforming the resampled energy spectrum to produce a series of feature vectors, and computing the fingerprint using differential coding of the feature vectors. The generated fingerprint can be compared to a set of reference fingerprints in a database to identify the original audio content.08-01-2013
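The final differential-coding step this abstract describes can be illustrated on toy feature vectors: each fingerprint bit records whether a coefficient rose or fell between consecutive frames, which makes the code robust to overall level changes. The function below is a simplified sketch of that one step, not the patent's full pipeline (no energy spectrum or logarithmic resampling):

```python
def differential_fingerprint(feature_vectors):
    """Differentially code consecutive feature vectors into bits:
    1 where a coefficient increased from the previous frame, else 0.
    Returns one bit-tuple per frame transition.
    """
    bits = []
    for prev, cur in zip(feature_vectors, feature_vectors[1:]):
        bits.append(tuple(1 if c > p else 0 for p, c in zip(prev, cur)))
    return bits

frames = [(0.1, 0.5, 0.2), (0.3, 0.4, 0.2), (0.2, 0.6, 0.9)]
print(differential_fingerprint(frames))  # [(1, 0, 0), (0, 1, 1)]
```

Matching against a reference database then becomes a Hamming-distance comparison over these bit sequences rather than a comparison of raw audio.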