Speech to image

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

  704200000 - SPEECH SIGNAL PROCESSING

    704231000 - Recognition

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Entries
Document - Title - Date
20130030807 - WIRELESS SPEECH RECOGNITION TOOL - 01-31-2013
The wireless voice recognition system for data retrieval comprises a server, a database and an input/output device, operably connected to the server. When the user speaks, the voice transmission is converted into a data stream using a specialized user interface. The input/output device and the server exchange the data stream. The server uses a programming interface having an engine to match and compare the stream of audible data to a data element of selected searchable information. A data element of recognized information is generated and transferred to the input/output device for user verification.
20130030806 - TRANSCRIPTION SUPPORT SYSTEM AND TRANSCRIPTION SUPPORT METHOD - 01-31-2013
In an embodiment, a transcription support system includes a first storage, a playback unit, a second storage, a text creating unit, an estimating unit, and a setting unit. The first storage stores the voice data; the playback unit plays back the voice data; and the second storage stores voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, where the voice positional information indicates a temporal position in the voice data and corresponds to the character string. The text creating unit creates text; the estimating unit estimates already-transcribed voice positional information based on the voice indices; and the setting unit sets a playback starting position, which indicates the position in the voice data at which playback starts, based on the already-transcribed voice positional information.
20130030805 - TRANSCRIPTION SUPPORT SYSTEM AND TRANSCRIPTION SUPPORT METHOD - 01-31-2013
According to one embodiment, a transcription support system supports transcription work to convert voice data to text. The system includes a first storage unit configured to store therein the voice data; a playback unit configured to play back the voice data; a second storage unit configured to store therein voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, for which the voice positional information is indicative of a temporal position in the voice data and corresponds to the character string; a text creating unit that creates the text in response to an operation input of a user; and an estimation unit configured to estimate already-transcribed voice positional information indicative of a position at which the creation of the text is completed in the voice data based on the voice indices.
20130030804 - SYSTEMS AND METHODS FOR IMPROVING THE ACCURACY OF A TRANSCRIPTION USING AUXILIARY DATA SUCH AS PERSONAL DATA - 01-31-2013
A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription is received of an audio recording. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data is received that is associated with the transcribed word. A replacement word from the personal vocabulary is identified, which is used to re-score the transcription and replace the transcribed word.
20090204399 - SPEECH DATA SUMMARIZING AND REPRODUCING APPARATUS, SPEECH DATA SUMMARIZING AND REPRODUCING METHOD, AND SPEECH DATA SUMMARIZING AND REPRODUCING PROGRAM - 08-13-2009
Necessary portions of stored speech data representing conference content are summarized and reproduced in a predetermined time. Conference speech is summarized and reproduced using a speech data summarizing and reproducing apparatus comprising a speech data divider for dividing and structuring conference speech data into several utterance unit data based on utterers, distributed documents, the occurrence frequency of words in speech recognition results, and pauses; an importance level calculator for determining important utterance unit data based on the occurrence frequency of keywords, the information of utterers, and data specified by the user; a summarizer for extracting important utterance unit data and summarizing them within a specified time; and a speech data reproducer for reproducing the summarized speech data in chronological order or an order of importance levels with auxiliary information added thereto.
20100153106 - CONVERSATION MAPPING - 06-17-2010
A method may include receiving communications associated with a communication session. The communication session may correspond to a telephone conversation, text-based conversation or a multimedia conversation. The method may also include identifying portions of the communication session and storing the identified portions. The method may further include receiving a request to retrieve information associated with the communication session and providing to a display, information associated with the identified portions.
20100057457 - SPEECH RECOGNITION SYSTEM AND PROGRAM THEREFOR - 03-04-2010
An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of the word that has been registered in a speech recognition dictionary is additionally registered in the speech recognition dictionary, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section
20090192797 - Talk text - 07-30-2009
The patent that I am requesting is for an existing product on the market. It is for voice transcription, or voice-activated texting: rather than typing a text message, the user can simply speak into their cell phone and the message will be typed into the text message.
20130211834 - AUTOMATED INTERPRETATION OF CLINICAL ENCOUNTERS WITH CULTURAL CUES - 08-15-2013
A method, a system, and a computer program product for automated interpretation and/or translation are disclosed. Automated interpretation and/or translation occurs by receiving language-based content from a user. The received language-based content is processed to interpret and/or translate it into a target language. Also, the presence of a cultural sensitivity in the received language-based content is detected. Further, appropriate guidance for dealing with the detected cultural sensitivity is provided.
20130211833 - TECHNIQUES FOR OVERLAYING A CUSTOM INTERFACE ONTO AN EXISTING KIOSK INTERFACE - 08-15-2013
Techniques for overlaying a custom interface onto an existing kiosk interface are provided. An event is detected that triggers a kiosk to process an agent that overlays, and without modifying, the kiosk's existing interface. The agent alters screen features and visual presentation of the existing interface and provides additional alternative operations for navigating and executing features defined in the existing interface. In an embodiment, the agent provides a custom interface overlaid onto the existing interface to provide a customer-facing interface for individuals that are sight impaired.
20120179466 - SPEECH TO TEXT CONVERTING DEVICE AND METHOD - 07-12-2012
A speech to text converting device includes a display, a voice receiving module, a voice recognition module, an identity recognition module, and a control module. The voice receiving module receives a voice signal. The voice recognition module converts the voice signal to voice data and produces text data corresponding to the voice data. The identity recognition module receives the voice signal and establishes identity data corresponding to the voice signal. The control module displays the text data and the identity data together on the display.
20100076762 - Coarticulation Method for Audio-Visual Text-to-Speech Synthesis - 03-25-2010
A method for generating animated sequences of talking heads in text-to-speech applications, wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
20100076760 - DIALOG FILTERING FOR FILLING OUT A FORM - 03-25-2010
The invention discloses a system and method for filling out a form from a dialog between a caller and a call center agent. The caller and the call center agent can have the dialog in the form of a telephone conversation, instant messaging chat or email exchange. The system and method provide a list of named entities specific to the call center operation and use a translation and transcription monitor to filter relevant elements from the dialog between the caller and the call center agent. The relevant elements filtered from the dialog are subsequently displayed on the call center agent's computer screen to fill out application forms automatically or through drag and drop operations by the call center agent.
20100076761 - Decoding-Time Prediction of Non-Verbalized Tokens - 03-25-2010
Non-verbalized tokens, such as punctuation, are automatically predicted and inserted into a transcription of speech in which the tokens were not explicitly verbalized. Token prediction may be integrated with speech decoding, rather than performed as a post-process to speech decoding.
20120245938 - METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR MANAGING AUDIO AND/OR VIDEO INFORMATION VIA A WEB BROADCAST - 09-27-2012
Web broadcast information is managed by annotation markers. At least one marker is received that annotates the audio and/or video information, and the annotated audio and/or video information is saved in an electronically searchable file.
20120245937 - Voice Rendering Of E-mail With Tags For Improved User Experience - 09-27-2012
Tags, such as XML tags, are inserted into email to separate email content from signature blocks, privacy notices and confidentiality notices, and to separate original email messages from replies and replies from further replies. The tags are detected by a system that renders email as speech, such as a voice command platform or network-based virtual assistant or message center. The system can render an original email message in one voice mode and the reply in a different voice mode. The tags can be inserted to identify a voice memo in which a user responds to a particular portion of an email message.
20120245935 - ELECTRONIC DEVICE AND SERVER FOR PROCESSING VOICE MESSAGE - 09-27-2012
An electronic device includes a voice processing unit, a wireless communication unit, and a combining unit. The voice processing unit receives speech signals. The wireless communication unit sends the speech signals to a server. The server converts the speech signals into a text message. The wireless communication unit receives the text message from the server. The combining unit combines the text message and the speech signals into a combined message. The wireless communication unit further sends the combined message to a recipient. A related server is also provided.
20130085754 - Interactive Text Editing - 04-04-2013
A method for providing suggestions includes capturing audio that includes speech and receiving textual content from a speech recognition engine. The speech recognition engine performs speech recognition on the audio signal to obtain the textual content, which includes one or more passages. The method also includes receiving a selection of a portion of a first word in a passage in the textual content, wherein the passage includes multiple words, and retrieving a set of suggestions that can potentially replace the first word. At least one suggestion from the set of suggestions provides a multi-word suggestion for potentially replacing the first word. The method further includes displaying, on a display device, the set of suggestions, and highlighting a portion of the textual content, as displayed on the display device, for potentially changing to one of the suggestions from the set of suggestions.
20130080164 - Selective Feedback For Text Recognition Systems - 03-28-2013
This specification describes technologies relating to recognition of text in various media. In general, one aspect of the subject matter described in this specification can be embodied in methods that include receiving an input signal including data representing one or more words and passing the input signal to a text recognition system that generates a recognized text string based on the input signal. The methods may further include receiving the recognized text string from the text recognition system. The methods may further include presenting the recognized text string to a user and receiving a corrected text string based on input from the user. The methods may further include checking if an edit distance between the corrected text string and the recognized text string is below a threshold. If the edit distance is below the threshold, the corrected text string may be passed to the text recognition system for training purposes.
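The edit-distance gate described in the entry above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not the patented implementation; the function names and the threshold value are assumptions made for the sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def accept_for_training(recognized, corrected, threshold=3):
    """Feed the correction back for training only when the edit is small,
    on the assumption that small edits reflect recognition errors rather
    than the user rewriting the content."""
    return edit_distance(recognized, corrected) < threshold
```

A large distance suggests the user changed what they meant to say, so the pair is discarded rather than used to adapt the recognizer.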
20130080163 - INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND COMPUTER PROGRAM PRODUCT - 03-28-2013
According to an embodiment, an information processing apparatus includes a storage unit, a detector, an acquisition unit, and a search unit. The storage unit is configured to store voice indices, each of which associates a character string included in voice text data obtained from a voice recognition process with voice positional information, the voice positional information indicating a temporal position in the voice data and corresponding to the character string. The acquisition unit acquires reading information that is at least a part of a character string representing a reading of a phrase to be transcribed from the voice data played back. The search unit specifies, as search targets, character strings whose associated voice positional information is included in the played-back section information among the character strings included in the voice indices, and retrieves a character string including the reading represented by the reading information from among the specified character strings.
20130080162 - User Query History Expansion for Improving Language Model Adaptation - 03-28-2013
Query history expansion may be provided. Upon receiving a spoken query from a user, an adapted language model may be applied to convert the spoken query to text. The adapted language model may comprise a plurality of queries interpolated from the user's previous queries and queries associated with other users. The spoken query may be executed and the results of the spoken query may be provided to the user.
20130085755 - Systems And Methods For Continual Speech Recognition And Detection In Mobile Computing Devices - 04-04-2013
The present application describes systems, articles of manufacture, and methods for continuous speech recognition for mobile computing devices. One embodiment includes determining whether a mobile computing device is receiving operating power from an external power source or a battery power source, and activating a trigger word detection subroutine in response to determining that the mobile computing device is receiving power from the external power source. In some embodiments, the trigger word detection subroutine operates continually while the mobile computing device is receiving power from the external power source. The trigger word detection subroutine includes determining whether a plurality of spoken words received via a microphone includes one or more trigger words, and in response to determining that the plurality of spoken words includes at least one trigger word, launching an application corresponding to the at least one trigger word included in the plurality of spoken words.
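The power-gated trigger-word logic in the entry above can be sketched as follows. The trigger-word table, function names, and application identifiers are invented for the illustration; the point is the two-step structure of the claim (gate on power source, then map trigger words to applications):

```python
# Hypothetical mapping from trigger words to application identifiers.
TRIGGER_WORDS = {"camera": "camera_app", "navigate": "maps_app"}

def should_listen_continually(on_external_power):
    """Run the continual detection subroutine only on external power,
    so that always-on listening does not drain the battery."""
    return bool(on_external_power)

def detect_trigger(spoken_words):
    """Return the application mapped to the first trigger word heard,
    or None if no trigger word is present."""
    for word in spoken_words:
        app = TRIGGER_WORDS.get(word.lower())
        if app is not None:
            return app
    return None
```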
20130035936 - LANGUAGE TRANSCRIPTION - 02-07-2013
A transcription system is applicable to transcription for a language in which there is limited pronunciation and/or acoustic data. A transcription station is configured using pronunciation data and acoustic data for use with the language. The pronunciation data and/or the acoustic data is initially from another dialect of a language, another language from a language group, or is universal (e.g., not specific to any particular language). A partial transcription of the audio recording is accepted via the transcription station (e.g., from a transcriptionist). One or more repetitions of one or more portions of the partial transcription are identified in the audio recording, and can be accepted during transcription. The pronunciation data and/or the acoustic data is updated in a bootstrapping manner during transcription, thereby improving the efficiency of the transcription process.
20130138438 - SYSTEMS AND METHODS FOR CAPTURING, PUBLISHING, AND UTILIZING METADATA THAT ARE ASSOCIATED WITH MEDIA FILES - 05-30-2013
Systems for recording, searching for, and obtaining metadata that are relevant to a plurality of media files are disclosed. The systems generally include a server that is configured to receive, index, and store a plurality of media files, which are received by the server from a plurality of sources, within at least one database in communication with the server. In addition, the server is configured to make one or more of the media files accessible to, and searchable by, one or more persons other than the original sources of such media files. Still further, the server is configured to display metadata that are associated with each media file. Such metadata may include links to one or more profile pages that are published within one or more social networks, with each of such profile pages being correlated with a unique voice signature that is detected within each media file. In addition, these metadata may include a geographical area from which each media file is provided to the server; a date on which each media file was created; a popularity index that is assigned to each media file; one or more theme categories that are assigned to each media file; or combinations of the above.
20130041664 - Method and Apparatus for Annotating Video Content With Metadata Generated Using Speech Recognition Technology - 02-14-2013
A method and apparatus is provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device. A segment of speech is received from a user such that the speech segment annotates a portion of the video content currently being rendered. The speech segment is converted to a text-segment and the text-segment is associated with the rendered portion of the video content. The text segment is stored in a selectively retrievable manner so that it is associated with the rendered portion of the video content.
20130041662 - SYSTEM AND METHOD OF CONTROLLING SERVICES ON A DEVICE USING VOICE DATA - 02-14-2013
A device and method to control applications using voice data. In one embodiment, a method includes detecting voice data from a user, converting the voice data to text data, matching the text data to an identifier, the identifier being associated with a list of identifiers for controlling operation of the application, and controlling the application based on the identifier matched with the text data. In another embodiment, voice data may be received from a control device.
20130035937 - System And Method For Efficiently Transcribing Verbal Messages To Text - 02-07-2013
A system and method for efficiently transcribing verbal messages to text is provided. Verbal messages are received and at least one of the verbal messages is divided into segments. Automatically recognized text is determined for each of the segments by performing speech recognition and a confidence rating is assigned to the automatically recognized text for each segment. A threshold is applied to the confidence ratings and those segments with confidence ratings that fall below the threshold are identified. The segments that fall below the threshold are assigned to one or more human agents starting with those segments that have the lowest confidence ratings. Transcription from the human agents is received for the segments assigned to that agent. The transcription is assembled with the automatically recognized text of the segments not assigned to the human agents as a text message for the at least one verbal message.
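The confidence-threshold routing step described in the entry above can be sketched as follows. The data shapes (index, text, confidence triples) and the threshold value are assumptions made for the sketch, not details from the patent:

```python
def route_segments(segments, threshold=0.80):
    """Split segments into auto-accepted and human-review queues.
    segments: list of (index, text, confidence) tuples.
    Segments below the threshold go to human agents, lowest
    confidence first."""
    automatic = [s for s in segments if s[2] >= threshold]
    manual = sorted((s for s in segments if s[2] < threshold),
                    key=lambda s: s[2])
    return automatic, manual

def assemble(automatic, corrected):
    """Merge auto-recognized text with human corrections, restoring
    the original segment order. corrected: {index: text}."""
    merged = {i: text for i, text, _ in automatic}
    merged.update(corrected)
    return " ".join(merged[i] for i in sorted(merged))
```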
20100324895 - SYNCHRONIZATION FOR DOCUMENT NARRATION - 12-23-2010
Disclosed are techniques and systems for synchronizing an audio file with a sequence of words displayed on a user interface.
20100106500 - METHOD AND SYSTEM FOR ENHANCING VERBAL COMMUNICATION SESSIONS - 04-29-2010
An approach is provided for enhancing verbal communication sessions. A verbal component of a communication session is converted into textual information. The converted textual information is scanned for a text string to trigger an application. The application is invoked to provide supplemental information about the textual information or to perform an action in response to the textual information for or on behalf of a party of the communication session. The supplemental information or a confirmation of the action is transmitted to the party.
20100106498 - SYSTEM AND METHOD FOR TARGETED ADVERTISING - 04-29-2010
Disclosed herein are systems, methods, and computer-readable media for targeted advertising, the method including receiving an audio stream containing user speech from a first device, generating text based on the speech contained in the audio stream, identifying at least one key phrase in the text, receiving from an advertiser an advertisement related to the identified at least one key phrase, and displaying the advertisement. In one aspect, the method further includes receiving from an advertiser a set of rules associated with the received advertisement and displaying the advertisement in accordance with the associated set of rules. The first device can be a converged voice and data communications device connected to a network. The communications device can generate text based on the speech. In one aspect, the method displays the advertisement on one or both of a converged voice and data communications device and a second communications device. A central server can generate text based on the speech. At least one key phrase in the text can be identified based on a confidence score threshold. In another aspect, the method further includes receiving multiple audio streams containing speech from a same user and generating text based on the speech contained in the multiple audio streams. The advertisement can be displayed after the audio stream terminates.
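The key-phrase matching with a confidence score threshold mentioned in the entry above can be sketched as follows. The ad inventory, phrase list, and threshold are invented for the illustration:

```python
# Hypothetical inventory mapping key phrases to advertisements.
AD_INVENTORY = {"new car": "Auto dealership ad", "pizza": "Pizza delivery ad"}

def match_ads(phrases_with_scores, confidence_threshold=0.7):
    """phrases_with_scores: list of (phrase, confidence) pairs from the
    ASR output. Keep only phrases whose recognition confidence clears
    the threshold, then look up any matching advertisements."""
    ads = []
    for phrase, score in phrases_with_scores:
        if score >= confidence_threshold and phrase in AD_INVENTORY:
            ads.append(AD_INVENTORY[phrase])
    return ads
```

Thresholding on confidence keeps misrecognized phrases from triggering irrelevant ads.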
20100106499 - METHODS AND APPARATUS FOR LANGUAGE IDENTIFICATION - 04-29-2010
In a multi-lingual environment, a method and apparatus for determining a language spoken in a speech utterance. The method and apparatus test acoustic feature vectors extracted from the utterances against acoustic models associated with one or more of the languages. Speech to text is then performed for the language indicated by the acoustic testing, followed by textual verification of the resulting text. During verification, the resulting text is processed by language-specific NLP and verified against textual models associated with the language. The system is self-learning, i.e., once a language is verified or rejected, the relevant feature vectors are used for enhancing one or more acoustic models associated with one or more languages, so that acoustic determination may improve.
20120215534 - System and Method for Automatic Storage and Retrieval of Emergency Information - 08-23-2012
A vehicle communication system includes a computer processor in communication with a memory circuit, a transceiver in communication with the processor and operable to communicate with one or more wireless devices, and one or more storage locations storing one or more pieces of emergency contact information. In this illustrative system, the processor is operable to establish communication with a first wireless device through the transceiver. Upon detection of an emergency event by at least one vehicle based sensor system, the vehicle communication system is operable to contact an emergency operator. The vehicle communication system is further operable to display one or more of the one or more pieces of emergency contact information in a selectable manner. Upon selection of one of the one or more pieces of emergency contact information, the vehicle computing system places a call to a phone number associated with the selected emergency contact.
20120185250 - Distributed Dictation/Transcription System - 07-19-2012
A distributed dictation/transcription system is provided. The system provides a client station, dictation manager, and dictation server networked such that the dictation manager selects a dictation server to transcribe audio from the client station. The dictation manager selects one of a plurality of dictation servers based on conventional load balancing and on a determination of whether the user profile is already uploaded to a dictation server. While a dictation server is being selected or a profile uploaded, the client may begin dictating; the audio is stored in a buffer of the dictation manager until a dictation server is selected or available. The user may receive, in real time or near real time, a display of the textual data, which may be corrected by the user to update the user profile.
20120185249 - METHOD AND SYSTEM FOR SPEECH BASED DOCUMENT HISTORY TRACKING - 07-19-2012
A method and a system of history tracking corrections in a speech based document. The speech based document comprises one or more sections of text recognized or transcribed from sections of speech, wherein the sections of speech are dictated by a user and processed by a speech recognizer in a speech recognition system into corresponding sections of text of the speech based document. The method comprises associating at least one speech attribute to each section of text in the speech based document, said speech attribute comprising information related to said section of text, respectively; presenting said speech based document on a presenting unit; detecting an action being performed within any of said sections of text; and updating information of said speech attributes related to the kind of action detected on one of said sections of text for updating said speech based document.
20130041661 - AUDIO COMMUNICATION ASSESSMENT - 02-14-2013
A device may include a communication interface configured to receive audio signals associated with audible communications from a user; an output device; and logic. The logic may be configured to determine one or more audio qualities associated with the audio signals, map the one or more audio qualities to at least one value, generate audio-related information based on the mapping, and provide, via the output device during the audible communications, the audio-related information to the user.
20120166193 - METHOD AND SYSTEM FOR AUTOMATIC TRANSCRIPTION PRIORITIZATION - 06-28-2012
A visual toolkit for prioritizing speech transcription is provided. The toolkit can include a logger (
20130046538 - VISUALIZATION INTERFACE OF CONTINUOUS WAVEFORM MULTI-SPEAKER IDENTIFICATION - 02-21-2013
A method implemented in a computer infrastructure having computer executable code having programming instructions tangibly embodied on a computer readable storage medium. The programming instructions are operable to receive a current waveform of a communication between a plurality of participants. Additionally, the programming instructions are operable to create a voiceprint from the current waveform if the current waveform is of a human voice. Furthermore, the programming instructions are operable to determine one of whether a match exists between the voiceprint and one library waveform of one or more library waveforms, whether a correlation exists between the voiceprint and a number of library waveforms of the one or more library waveforms and whether the voiceprint is unique. Additionally, the programming instructions are operable to transcribe the current waveform into text and provide a match indication display (MID) indicating an association between the current waveform and the one or more library waveforms based on the determining.
20130046537 - Systems and Methods for Providing an Electronic Dictation Interface - 02-21-2013
Some embodiments disclosed herein store a target application and a dictation application. The target application may be configured to receive input from a user. The dictation application interface may include a full overlay mode option, where in response to selection of the full overlay mode option, the dictation application interface is automatically sized and positioned over the target application interface to fully cover a text area of the target application interface to appear as if the dictation application interface is part of the target application interface. The dictation application may be further configured to receive an audio dictation from the user, convert the audio dictation into text, provide the text in the dictation application interface and, in response to receiving a first user command to complete the dictation, automatically copy the text from the dictation application interface and insert the text into the target application interface.
20090306981 - Systems and methods for conversation enhancement - 12-10-2009
This invention description details systems and methods for improving human conversations by enhancing conversation participants' ability to:
- Distill out and record core ideas of conversations.
- Classify and prioritize these key concepts.
- Recollect commitments and issues and take appropriate action.
- Analyze and uncover new insight from the linkage of these ideas with those from other conversations.
20090306979 - Data processing system for autonomously building speech identification and tagging data - 12-10-2009
A method, system, and computer program product for autonomously transcribing and building tagging data of a conversation. A corpus processing agent monitors a conversation and utilizes a speech recognition agent to identify the spoken languages, speakers, and emotional patterns of speakers of the conversation. While monitoring the conversation, the corpus processing agent determines emotional patterns by monitoring voice modulation of the speakers and evaluating the context of the conversation. When the conversation is complete, the corpus processing agent determines synonyms and paraphrases of spoken words and phrases of the conversation taking into consideration any localized dialect of the speakers. Additionally, metadata of the conversation is created and stored in a link database, for comparison with other processed conversations. A corpus, a transcription of the conversation containing metadata links, is then created. The corpus processing agent also determines the frequency of spoken keywords and phrases and compiles a popularity index.
20120191452 - REPRESENTING GROUP INTERACTIONS - 07-26-2012
Disclosed is a system for generating a representation of a group interaction, the system comprising: a transcription module adapted to generate a transcript of the group interaction from audio source data representing the group interaction, the transcript comprising a sequence of lines of text, each line corresponding to an audible utterance in the audio source data; and a labeling module adapted to generate a conversation path from the transcript by labeling each transcript line with an identifier identifying the speaker of the corresponding utterance in the audio source data, and generate the representation of the group interaction by associating the conversation path with a plurality of voice profiles, each voice profile corresponding to an identified speaker in the conversation path.
20120191451STORAGE AND ACCESS OF DIGITAL CONTENT - In one embodiment, the invention provides a method, comprising providing a first communications channel to transmit digital content to a notes-access application for storage against a particular user, the first communications channel being selected from the group consisting of an SMS channel, an MMS channel, a fax channel, an e-mail channel, and an IM channel; responsive to receiving digital content from said user via the first communications channel, storing said digital content in the database associated with said notes-access application; and providing a second communications channel to the notes-access application whereby the digital content stored by the notes-access application against said user is provided to said user, the second communications channel being selected from the group consisting of an SMS channel, an MMS channel, a fax channel, an e-mail channel, and an IM channel.07-26-2012
20110015927SYSTEM AND METHOD FOR EFFICIENT LASER PROCESSING OF A MOVING WEB-BASED MATERIAL - An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.01-20-2011
20110015926WORD DETECTION FUNCTIONALITY OF A MOBILE COMMUNICATION TERMINAL - Voice communication is enabled in a voice communication session between a first device, through which a first user communicates, and a second device, through which a second user communicates. Words spoken in the voice communication session between the first device and the second device are monitored. The presence of one or more key words, as a subset of less than all of the monitored words spoken in the voice communication session, is determined from the monitored words. The one or more key words are displayed on a display screen.01-20-2011
20130060568OBSERVATION PLATFORM FOR PERFORMING STRUCTURED COMMUNICATIONS - Using structured communications within an organization or retail environment, the users establish a fabric of communications that allows external users of devices or applications to integrate in a way that is non-disruptive, measured and structured. An observation platform may be used for performing structured communications. A signal is received from a first communication device at a second communication device associated with a computer system, wherein the computer system is associated with an organization, wherein a first characteristic of the signal corresponds to an audible source and a second characteristic of the signal corresponds to information indicative of a geographic position of the first communication device.03-07-2013
20120310643METHODS AND APPARATUS FOR PROOFING OF A TEXT INPUT - Techniques for presenting data input as a plurality of data chunks including a first data chunk and a second data chunk. The techniques include converting the plurality of data chunks to a textual representation comprising a plurality of text chunks including a first text chunk corresponding to the first data chunk and a second text chunk corresponding to the second data chunk, respectively, and providing a presentation of at least part of the textual representation such that the first text chunk is presented differently than the second text chunk to, when presented, assist a user in proofing the textual representation.12-06-2012
20120310642AUTOMATICALLY CREATING A MAPPING BETWEEN TEXT DATA AND AUDIO DATA - Techniques are provided for creating a mapping that maps locations in audio data (e.g., an audio book) to corresponding locations in text data (e.g., an e-book). Techniques are provided for using a mapping between audio data and text data, whether or not the mapping is created automatically or manually. A mapping may be used for bookmark switching where a bookmark established in one version of a digital work is used to identify a corresponding location with another version of the digital work. Alternatively, the mapping may be used to play audio that corresponds to text selected by a user. Alternatively, the mapping may be used to automatically highlight text in response to audio that corresponds to the text being played. Alternatively, the mapping may be used to determine where an annotation created in one media context (e.g., audio) will be consumed in another media context (e.g., text).12-06-2012
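The bookmark-switching use of such a mapping can be sketched as a lookup over anchor pairs that tie audio timestamps to text offsets. This is an illustrative assumption, not the patent's actual data structure: the sample anchors, the function name, and the nearest-preceding-anchor rule are all hypothetical.

```python
import bisect

# Hypothetical anchor pairs: (audio_time_seconds, text_character_offset),
# sorted by audio time. A real mapping would be far denser.
ANCHORS = [(0.0, 0), (5.0, 120), (12.5, 300), (20.0, 480)]

def audio_to_text_offset(anchors, t):
    """Map an audio bookmark time to the text offset of the nearest
    preceding anchor, so a bookmark set in the audio book can be
    resumed at the corresponding location in the e-book."""
    times = [a for a, _ in anchors]
    i = max(bisect.bisect_right(times, t) - 1, 0)
    return anchors[i][1]
```

Under these sample anchors, a bookmark at 13.0 seconds resolves to text offset 300.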
20110066431HAND-HELD INPUT APPARATUS AND INPUT METHOD FOR INPUTTING DATA TO A REMOTE RECEIVING DEVICE - A hand-held input apparatus includes an input unit, a translator and a wireless transmitter. The input unit generates an input signal. The translator receives the input signal from the input unit, converts the input signal to a meaningful text and translates the meaningful text to a translated signal according to a protocol used in a remote receiving device. The wireless transmitter wirelessly transmits the translated signal to the remote receiving device.03-17-2011
20090271194Speech recognition and transcription among users having heterogeneous protocols - A system is disclosed for facilitating speech recognition and transcription among users employing incompatible protocols for generating, transcribing, and exchanging speech. The system includes a system transaction manager that receives a speech information request from at least one of the users. The speech information request includes formatted spoken text generated using a first protocol. The system also includes a speech recognition and transcription engine, which communicates with the system transaction manager. The speech recognition and transcription engine receives the speech information request from the system transaction manager and generates a transcribed response, which includes a formatted transcription of the formatted speech. The system transmits the response to the system transaction manager, which routes the response to one or more of the users. The latter users employ a second protocol to handle the response, which may be the same as or different than the first protocol. The system transaction manager utilizes a uniform system protocol for handling the speech information request and the response.10-29-2009
20090271193SUPPORT DEVICE, PROGRAM AND SUPPORT METHOD - A support device, program and support method for supporting generation of text from speech data. The support device includes a confirmed rate calculator, a candidate obtaining unit and a selector. The confirmed rate calculator calculates a confirmed utterance rate which is an utterance rate of a confirmed part having already-confirmed text in the speech data. The candidate obtaining unit obtains multiple candidate character strings resulting from a speech recognition of an unconfirmed part having unconfirmed text in the speech data. The selector preferentially selects, from among the plurality of candidate character strings, a candidate character string whose utterance time consumed in uttering the candidate character string at the confirmed utterance rate is closest to an utterance time of the unconfirmed part of the speech data.10-29-2009
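The selection rule in this abstract — prefer the candidate whose predicted utterance time at the confirmed rate best matches the unconfirmed segment's duration — can be sketched as follows. The characters-per-second rate model and all names are assumptions for illustration only.

```python
def select_candidate(candidates, confirmed_rate_cps, unconfirmed_duration_s):
    """Return the candidate character string whose predicted utterance
    time, at the confirmed utterance rate (characters per second), is
    closest to the observed duration of the unconfirmed speech segment."""
    def predicted_time(s):
        return len(s) / confirmed_rate_cps
    return min(candidates,
               key=lambda s: abs(predicted_time(s) - unconfirmed_duration_s))
```

For example, at a confirmed rate of 10 characters per second and a 1.2-second unconfirmed segment, a 12-character candidate wins over shorter alternatives.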
20090271192METHOD AND SYSTEMS FOR MEASURING USER PERFORMANCE WITH SPEECH-TO-TEXT CONVERSION FOR DICTATION SYSTEMS - A computer-implemented system and method for evaluating the performance of a user using a dictation system is provided. The system and method include receiving a text or transcription file generated from user audio. A performance metric, such as words per minute or an error count, is generated based on the transcription file. The performance metric is provided to an administrator so the administrator can evaluate the performance of the user using the dictation system.10-29-2009
20090271191METHOD AND SYSTEMS FOR SIMPLIFYING COPYING AND PASTING TRANSCRIPTIONS GENERATED FROM A DICTATION BASED SPEECH-TO-TEXT SYSTEM - A computer-implemented method for simplifying the pasting of textual transcriptions from a transcription engine into an application is described. An audio file is sent to a transcription engine. A textual transcription file of the audio file is received from the transcription engine. The textual transcription file is automatically loaded into a copy buffer. The textual transcription file is pasted from the copy buffer into an application.10-29-2009
20120226499SCRIPTING SUPPORT FOR DATA IDENTIFIERS, VOICE RECOGNITION AND SPEECH IN A TELNET SESSION - Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted into the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user may also provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is inserted into the corresponding fields.09-06-2012
20110046950Wireless Dictaphone Features and Interface - A system and method for a method for integrating a communications system with a dictation system on a mobile device includes displaying a first graphical user interface screen on a display of the mobile device, the first graphical user interface screen including a first plurality of selections, when selected by a user, enable the user to dictate and create one or more voice files for sending to a receiving server; and automatically displaying a second graphical user interface screen on the display of the mobile device when the communications system receives an incoming call, said second graphical user interface screen indicating suspension of dictation functionality and enabling telephone functionality.02-24-2011
20090030682System and method for publishing media files - A method for publishing a digital media file. The method includes the steps of receiving the digital media file containing speech, converting the speech to text, identifying a keyword in the text, retrieving, for the keyword, a corresponding URL from a database, inserting into the text a hyperlink linking the keyword with the corresponding URL, and making the media file and the text available to a subscriber.01-29-2009
20130066630AUDIO TRANSCRIPTION GENERATOR AND EDITOR - A system for correcting errors in automatically generated audio transcriptions includes an audio recorder, a computerized transcription generator, a voice recording, a collection of link data, transcription text, an audio player, a system of cross linking, and a text editor including a text display with a cursor. The system permits a user to correct transcription errors using techniques of jump to position; show position; and track playback.03-14-2013
20120116761Minimum Converted Trajectory Error (MCTE) Audio-to-Video Engine - Embodiments of an audio-to-video engine are disclosed. In operation, the audio-to-video engine generates facial movement (e.g., a virtual talking head) based on an input speech. The audio-to-video engine receives the input speech and recognizes the input speech as a source feature vector. The audio-to-video engine then determines a Maximum A Posterior (MAP) mixture sequence based on the source feature vector. The MAP mixture sequence may be a function of a refined Gaussian Mixture Model (GMM). The audio-to-video engine may then use the MAP mixture sequence to estimate video feature parameters. The video feature parameters are then interpreted as facial movement. The facial movement may be stored as data to a storage module and/or it may be displayed as video to a display device.05-10-2012
20090012788SIGN LANGUAGE TRANSLATION SYSTEM - The translation system of a preferred embodiment includes an input element that receives an input language as audio information, an output element that displays an output language as visual information, and a remote server coupled to the input element and the output element, the remote server including a database of sign language images; and a processor that receives the input language from the input element, translates the input language into the output language, and transmits the output language to the output element, wherein the output language is a series of the sign language images that correspond to the input language and that are coupled to one another with substantially seamless continuity, such that the ending position of a first image is blended into the starting position of a second image.01-08-2009
20090012787DIALOG PROCESSING SYSTEM, DIALOG PROCESSING METHOD AND COMPUTER PROGRAM - A dialog processing system includes: a target expression data extraction unit for extracting, from a plurality of utterance data inputted by an utterance data input unit and obtained by converting the contents of a plurality of conversations in one field, a plurality of target expression data each including a pattern matching portion that matches an utterance pattern, the utterance pattern being inputted by an utterance pattern input unit and being an utterance structure derived from the contents of field-independent general conversations; a feature extraction unit for retrieving the pattern matching portions from the extracted target expression data and extracting a feature quantity common to the pattern matching portions; and a mandatory data extraction unit for extracting mandatory data in the one field included in the plurality of utterance data by use of the extracted feature quantities.01-08-2009
20130166293AUTOMATIC DISCLOSURE DETECTION - A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication.06-27-2013
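A minimal sketch of the phrase-matching and rating step described in this abstract, assuming exact substring matching (a deployed system would more likely use fuzzy or phonetic comparison); the function name and scoring rule are illustrative assumptions:

```python
def rate_compliance(transcript, required_phrases):
    """Score a recipient (e.g., an agent on a monitored call) by the
    fraction of required disclosure phrases present in the transcript
    of the communication."""
    text = transcript.lower()
    hits = sum(1 for phrase in required_phrases if phrase.lower() in text)
    return hits / len(required_phrases)
```

A transcript containing two of three required disclosures would score 2/3 under this rule.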
20090018829Speech Recognition Dialog Management - Described is a speech recognition dialog management system that allows more open-ended conversations between virtual agents and people than are possible using just agent-directed dialogs. The system uses both novel dialog context switching and learning algorithms based on spoken interactions with people. The context switching is performed through processing multiple dialog goals in a last-in-first-out (LIFO) pattern. The recognition accuracy for these new flexible conversations is improved through automated learning from processing errors and addition of new grammars.01-15-2009
20130166292Accessing Content Using a Source-Specific Content-Adaptable Dialogue - A system for accessing content maintains a set of content selections associated with a first user. The system receives first original content from a first content source associated with a first one of the content selections associated with the first user. The system applies, to the first original content, a first rule (such as a parsing rule) that is specific to the first one of the content selections, to produce first derived content. The system changes the state of at least one component of a human-machine dialogue system (such as a text-to-act engine, a dialogue manager, or an act-to-text engine) based on the first derived content. The system may apply a second rule (such as a dialogue rule) to the first derived content to produce rule output and change the state of the human-machine dialogue system based on the rule output.06-27-2013
20080294434Live Media Captioning Subscription Framework for Mobile Devices - A subscription-based system provides transcribed audio information to one or more mobile devices. Some techniques feature a system for providing subscription services for currently-generated (e.g., not stored) information (e.g., caption information, transcribed audio) for one or more mobile devices for a live/current audio event. There can be a communication network for communicating to the one or more mobile devices, a transcriber configured for transcribing the event to generate information (e.g., caption information, transcribed audio). Caption data includes transcribed data and control code data. The system includes a subscription gateway configured for live/current transfer of the transcribed data to the one or more mobile devices. The subscription gateway is configured to provide access for the transcribed data to the one or more mobile devices. User preferences for subscribers can be set and/or updated by mobile device users and/or GPS-capable mobile devices to receive feeds for the live/current audio event.11-27-2008
20110035217SPEECH-DRIVEN SELECTION OF AN AUDIO FILE - A system and method for detecting a refrain in an audio file having vocal components. The method and system includes generating a phonetic transcription of a portion of the audio file, analyzing the phonetic transcription and identifying a vocal segment in the generated phonetic transcription that is repeated frequently. The method and system further relate to speech-driven selection of an audio file based on the similarity between the detected refrain and the user input.02-10-2011
20120316874RADIOLOGY VERIFICATION SYSTEM AND METHOD - A system and method of radiology verification is provided. The verification may be implemented as a standalone software utility, as part of a radiology imaging graphical user interface, or within a more complex computing system configured for generating radiology reports.12-13-2012
20120173236SPEECH TO TEXT CONVERTING DEVICE AND METHOD - A speech to text converting device includes a display, a voice receiving module, a voice recognition module, an input module, and a control module. The voice receiving module receives a speech within a certain period of time. The voice recognition module converts the speech to voice data. The control module establishes text data corresponding to the voice data and displays the text data, any inputted words, and the relevant time period.07-05-2012
20110276325Training A Transcription System - According to certain embodiments, training a transcription system includes accessing recorded voice data of a user from one or more sources. The recorded voice data comprises voice samples. A transcript of the recorded voice data is accessed. The transcript comprises text representing one or more words of each voice sample. The transcript and the recorded voice data are provided to a transcription system to generate a voice profile for the user. The voice profile comprises information used to convert a voice sample to corresponding text.11-10-2011
20110282664METHOD AND SYSTEM FOR ASSISTING INPUT OF TEXT INFORMATION FROM VOICE DATA - Methods and systems for providing services and/or computing resources are provided. A method may include converting voice data into text data and tagging at least one portion of the text data in the text conversion file with at least one tag, the at least one tag indicating that the at least one portion of the text data includes a particular type of data. The method may also include displaying the text data on a display such that the at least one portion of text data is displayed with at least one associated graphical element indicating that the at least one portion of text data is associated with the at least one tag. The at least one portion of text data may be a selectable item on the display allowing a user interfacing with the display to select the at least one portion of text data in order to apply the at least one portion of text data to an application.11-17-2011
20090037170METHOD AND APPARATUS FOR VOICE COMMUNICATION USING ABBREVIATED TEXT MESSAGES - The present invention relates generally to an apparatus and method for capturing and producing voice using abbreviated text messages, specifically, to translate voice into an abbreviated message text format for transmission in a communication system and to translate abbreviated message text received from a communication system to voice.02-05-2009
20130041663COMMUNICATION APPLICATION FOR CONDUCTING CONVERSATIONS INCLUDING MULTIPLE MEDIA TYPES IN EITHER A REAL-TIME MODE OR A TIME-SHIFTED MODE - A communication application configured to support a conversation among participants over a communication network. The communication application is configured to (i) support one or more media types within the context of the conversation, (ii) interleave the one or more media types in a time-indexed order within the context of the conversation, (iii) enable the participants to render the conversation including the interleaved one or more media types in either a real-time rendering mode or time-shifted rendering mode, and (iv) seamlessly transition the conversation between the two modes so that the conversation may take place substantially live when in the real-time rendering mode or asynchronously when in the time-shifted rendering mode.02-14-2013
20110276327VOICE-TO-EXPRESSIVE TEXT - A method including receiving a vocal input including words spoken by a user; determining vocal characteristics associated with the vocal input; mapping the vocal characteristics to textual characteristics; and generating a voice-to-expressive text that includes, in addition to text corresponding to the words spoken by the user, a textual representation of the vocal characteristics based on the mapping.11-10-2011
20110301951ELECTRONIC QUESTIONNAIRE - A questionnaire is presented to a user in a more efficient manner in which the user is more likely to participate. The questionnaire is sent electronically to the user's vehicle and presented audibly to the user. The user responds audibly to the questions in the questionnaire. The user's responses are converted to text and sent back to the provider server for tallying.12-08-2011
20110301952SPEECH RECOGNITION PROCESSING SYSTEM AND SPEECH RECOGNITION PROCESSING METHOD - The present invention provides a speech recognition processing system in which speech recognition processing is executed in parallel by plural speech recognizing units. Before text data as the speech recognition result is output from each of the speech recognizing units, information indicating each speaker is displayed in parallel on a display in the order in which the utterances were spoken. When the text data is output from each of the speech recognizing units, the text data is associated with the information indicating each speaker and the text data is displayed.12-08-2011
20110288863VOICE STREAM AUGMENTED NOTE TAKING - Voice stream augmented note taking may be provided. An audio stream associated with at least one speaker may be recorded and converted into text chunks. A text entry may be received from a user, such as in an electronic document. The text entry may be compared to the text chunks to identify matches, and the matching text chunks may be displayed to the user for selection.11-24-2011
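The comparison step — matching a typed note against transcribed chunks of the audio stream to find candidates to display — might look like the following naive word-overlap sketch; the function and the matching rule are illustrative assumptions, not the patent's method.

```python
def matching_chunks(text_entry, transcript_chunks):
    """Return the transcript chunks that share at least one word with
    the user's typed note, as candidate augmentations to display for
    user selection."""
    entry_words = set(text_entry.lower().split())
    return [chunk for chunk in transcript_chunks
            if entry_words & set(chunk.lower().split())]
```

A note reading "budget review" would match chunks mentioning either word while skipping unrelated ones.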
20110288862Methods and Systems for Performing Synchronization of Audio with Corresponding Textual Transcriptions and Determining Confidence Values of the Synchronization - Methods and systems for performing audio synchronization with corresponding textual transcription and determining confidence values of the timing-synchronization are provided. Audio and a corresponding text (e.g., transcript) may be synchronized in a forward and reverse direction using speech recognition to output a time-annotated audio-lyrics synchronized data. Metrics can be computed to quantify and/or qualify a confidence of the synchronization. Based on the metrics, example embodiments describe methods for enhancing an automated synchronization process to possibly adapted Hidden Markov Models (HMMs) to the synchronized audio for use during the speech recognition. Other examples describe methods for selecting an appropriate HMM for use.11-24-2011
20110288861Audio Synchronization For Document Narration with User-Selected Playback - Disclosed are techniques and systems to provide a narration of a text. In some aspects, the techniques and systems described herein include generating a timing file that includes elapsed time information for expected portions of text that provides an elapsed time period from a reference time in an audio recording to each portion of text in recognized portions of text.11-24-2011
20120065969System and Method for Contextual Social Network Communications During Phone Conversation - An embodiment of the invention includes methods and systems for contextual social network communications during a phone conversation. A telephone conversation between a first user and at least one second user is monitored. More specifically, a monitor identifies terms spoken by the first user and the second user during the telephone conversation. The terms spoken are translated into textual keywords by a translating module. One or more of the second user's web applications are searched by a processor for portion(s) of the second user's web applications that include at least one of the keywords. The processor also searches one or more of the first user's web applications for portion(s) of the first user's web applications that include at least one of the keywords. The portion(s) of the second user's web applications and the portion(s) of the first user's web applications are displayed to the first user during the telephone conversation.03-15-2012
20090326940AUTOMATED VOICE-OPERATED USER SUPPORT - An information device for voice-operated support of a user includes a storage medium, a knowledge database, a processing unit, an input device, a recording component, a transcription component, and an ontological analysis component. A signal is detected by the input device and stored by the recording component via the processing unit in the storage medium. The signal is transformed into a corresponding text by the transcription component via the processing unit and stored in the storage medium. The ontological analysis component categorizes the text via the processing unit using the knowledge database and processes the text using the categorization and the knowledge database via the processing unit.12-31-2009
20090030680Method and System of Indexing Speech Data - A method and system of indexing speech data. The method includes indexing word transcripts including a timestamp for a word occurrence; and indexing sub-word transcripts including a timestamp for a sub-word occurrence. A timestamp in the index indicates the time and duration of occurrence of the word or sub-word in the speech data, and word and sub-word occurrences can be correlated using the timestamps. A method of searching speech transcripts is also provided in which a search query in the form of a phrase to be searched includes at least one in-vocabulary word and at least one out-of-vocabulary word. The method of searching includes extracting the search terms from the phrase, retrieving a list of occurrences of words for an in-vocabulary search term from an index of words having timestamps, retrieving a list of occurrences of sub-words for an out-of-vocabulary search term from an index of sub-words having timestamps, and merging the retrieved lists of occurrences of words and sub-words according to their timestamps.01-29-2009
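The final merging step — combining timestamped word hits and sub-word hits into one time-ordered result list — can be sketched like this. The data shapes are assumptions for illustration; a real index would also carry durations and document identifiers.

```python
import heapq

def merge_by_timestamp(word_hits, subword_hits):
    """Merge two occurrence lists, each a time-sorted list of
    (timestamp_seconds, term) tuples, into a single time-ordered list,
    as in the search method's merging step."""
    return list(heapq.merge(word_hits, subword_hits))
```

Because both inputs are already sorted by timestamp, `heapq.merge` produces the combined ordering in a single linear pass.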
20120065971VOICE CONTROL OF MULTIMEDIA AND COMMUNICATIONS DEVICES - A method for operating a communications device can include receiving a plurality of spoken commands uttered by a user, the plurality of spoken commands comprising a custom written communication message to be displayed. The method can also include executing a speech recognition engine to recognize and convert each of the spoken commands into corresponding electronic signals that selectively enable and operatively control each of a plurality of multimedia units and at least one light array, wherein the electronic signals are configured to cause multiple light units of the light array to be selectively activated and display the custom written communication message. The method can further include transmitting audio signals received from different ones of the plurality of multimedia units to a radio via a preset open radio channel for broadcasting the audio signals through at least one speaker connected to the radio.03-15-2012
20100070275SPEECH TO MESSAGE PROCESSING - Voice message processors are configured to produce text representations of voice messages. The text representations can be compacted based on one or more abbreviation libraries or rule libraries. Abbreviation processing can be applied to produce a compact text representation based on display properties of a destination device or to enhance user perception. Text representation length can be reduced based on abbreviations in a standard abbreviation list, a user specific abbreviation list, or a combination of standard and custom lists. In some examples, text length is shortened based on stored rules. Mobile stations are configured to receive text representations of voice messages and request delivery of the associated voice messages based on message identifiers or message availability indicators that are presented on a mobile station display. Network elements comprise a voice message processor and are configured to produce text representations and deliver text representations or voice messages as requested by message recipients.03-18-2010
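The abbreviation-based compaction could be sketched as repeated dictionary substitution until the text fits the destination display. The dictionary contents, function name, and stopping rule here are illustrative assumptions rather than the patent's actual libraries.

```python
def compact(message, abbreviations, max_len):
    """Apply abbreviation substitutions, in order, until the message
    fits the destination device's display length (or the abbreviation
    list is exhausted)."""
    for full, abbr in abbreviations.items():
        if len(message) <= max_len:
            break
        message = message.replace(full, abbr)
    return message
```

This stops substituting as soon as the text fits, so messages are abbreviated no more than the display constraint requires.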
20100036662JOURNALING DEVICE AND INFORMATION MANAGEMENT SYSTEM - A portable journaling device is useful for thought and behavior tracking, documenting, optimizing, teaching, and training. The device is arranged and configured to receive and store voice data, and to transmit the received voice data to a server. The server transcribes the voice data, if possible, and stores the transcribed data. If the server is unable to transcribe the voice data, the voice data is automatically forwarded to a data transcription service for manual transcription. The manually transcribed data is then received by the server where the manually transcribed data is stored.02-11-2010
20100114571INFORMATION RETRIEVAL SYSTEM, INFORMATION RETRIEVAL METHOD, AND INFORMATION RETRIEVAL PROGRAM - An information retrieval system comprises: a speech input unit for inputting speech; an information storage unit for storing information with which speech information, of a length for which a degree of textual similarity is computable, is associated as a retrieval tag; and an information selection unit for comparing a feature of each spoken content item extracted from each item of said speech information with a feature of the spoken content extracted from said input speech, to select information with which speech information similar to the input speech is associated. The system further comprises an output unit for outputting the information selected by said information selection unit, as information associated with the input speech.05-06-2010
20090313014MOBILE TERMINAL AND METHOD FOR RECOGNIZING VOICE THEREOF - A method for detecting a character or a word emphasized by a user in a voice input to a mobile terminal, to be used as meaningful information for voice recognition, or for emphatically displaying the user-emphasized character or word in a pre-set format when the inputted voice is converted into text, and a mobile terminal implementing the same are disclosed. The mobile terminal includes: a microphone to receive a voice of the user; a controller to convert the received voice into corresponding text and detect a character or a word emphatically pronounced by the user from the voice; and a display unit to emphatically display the detected character or word in a pre-set format when the converted text is displayed.12-17-2009
20090089055Method and apparatus for identification of conference call participants - A system including a conferencing telephone coupled to or in communication with an identification service. The identification service is configured to poll user devices of conference participants to determine or confirm identities. In response, the user devices transmit audio electronic business cards, which can include user voice samples and/or preprocessed voice recognition data. The identification service stores the resulting audio electronic business card data. When the corresponding participant speaks during the conference, the identification service identifies the speaker.04-02-2009
20090150147RECORDING AUDIO METADATA FOR STORED IMAGES - A method of processing audio signals recorded during display of image data from a media file on a display device to produce semantic understanding data and associating such data with the original media file, includes: separating a desired audio signal from the aggregate mixture of audio signals; analyzing the separated signal for purposes of gaining semantic understanding; and associating the semantic information obtained from the audio signals recorded during image display with the original media file.06-11-2009
20090099845Methods and system for capturing voice files and rendering them searchable by keyword or phrase - A system for capturing voice files and rendering them searchable, comprising one or more devices capable of capturing audio speech electronically, a recorder coupled to the devices for retrieving audio speech, a controller coupled to the recorder, a recognition engine adapted to transcribe audio speech into text, and a database system is disclosed. In the system, the controller causes the recorder to capture audio speech from at least one of the devices, the recorder stores the audio speech as data in the database system, and the recognition engine subsequently retrieves the audio speech data, transcribes the audio speech data into text, and stores the text and data associating the text data with at least the audio speech data in the database system for subsequent retrieval by a search application.04-16-2009
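The capture-transcribe-store-search pipeline the abstract above describes can be sketched minimally once transcription has already produced text for each audio file. The class and method names (`TranscriptIndex`, `add`, `search`) and the example call identifiers are illustrative assumptions, not taken from the patent:

```python
# Minimal sketch of a keyword-searchable transcript store, assuming
# transcription has already turned each audio file into text.
import re
from collections import defaultdict

class TranscriptIndex:
    def __init__(self):
        self._texts = {}                # audio_id -> transcript text
        self._index = defaultdict(set)  # word -> set of audio_ids

    def add(self, audio_id, transcript):
        """Store a transcript and index each word back to its audio file."""
        self._texts[audio_id] = transcript
        for word in re.findall(r"[a-z']+", transcript.lower()):
            self._index[word].add(audio_id)

    def search(self, keyword):
        """Return ids of audio files whose transcript contains the keyword."""
        return sorted(self._index.get(keyword.lower(), set()))

idx = TranscriptIndex()
idx.add("call-001", "Please ship the order by Friday")
idx.add("call-002", "The order was cancelled yesterday")
print(idx.search("order"))   # both calls mention "order"
print(idx.search("Friday"))  # only the first
```

A production system would keep the association between text and audio in a database, as the claim describes, so a keyword hit can retrieve the original recording as well as the transcript.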
20110264451METHODS AND SYSTEMS FOR TRAINING DICTATION-BASED SPEECH-TO-TEXT SYSTEMS USING RECORDED SAMPLES - A method and apparatus useful for training speech recognition engines is provided. Many of today's speech recognition engines require training to particular individuals to accurately convert speech to text. For certain applications, this training requires significant resources. To reduce these resources, a trainer is provided with the text transcription and the audio file. The trainer updates the text based on the audio file. The changes are provided to the speech recognition engine to train it and update the user profile. In certain aspects, the training is reversible, since it is possible to overtrain the system such that the trained system is actually less proficient.10-27-2011
20080319744METHOD AND SYSTEM FOR RAPID TRANSCRIPTION - A method and system for producing and working with transcripts according to the invention eliminates the foregoing time inefficiencies. By dispersing a source recording to a transcription team in small segments, so that team members transcribe segments in parallel, a rapid transcription process delivers a fully edited transcript within minutes. Clients can view accurate, grammatically correct, proofread and fact-checked documents that shadow live proceedings by mere minutes. The rapid transcript includes time coding, speaker identification and summary. A viewer application allows a client to view a video recording side-by-side with a transcript. Clicking on a word in the transcript locates the corresponding recorded content; advancing a recording to a particular point locates and displays the corresponding spot in the transcript. The recording is viewed using common video features, and may be downloaded. The client can edit the transcript and insert comments. Any number of colleagues can view and edit simultaneously.12-25-2008
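The segment-and-reassemble workflow described above, in which team members transcribe small pieces in parallel and the pieces are stitched back in recording order with time codes, can be sketched as follows. The tuple layout and timestamp format are assumptions for illustration; the patent does not specify them:

```python
# Toy reassembly step: transcribed segments arrive in whatever order the
# transcribers finish; sorting by segment index restores recording order.
def assemble(segments):
    """segments: list of (segment_index, start_sec, transcribed_text)."""
    ordered = sorted(segments)  # tuples sort by segment_index first
    lines = [f"[{start:06.1f}] {text}" for _, start, text in ordered]
    return "\n".join(lines)

done = [(2, 60.0, "and that concludes the motion."),
        (0, 0.0, "The committee will come to order."),
        (1, 30.0, "First item: the budget report,")]
print(assemble(done))
```

The real system adds editing, speaker identification, and fact-checking stages on top of this merge; only the ordering-by-segment idea is shown here.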
20080312919Method and System for Speech Based Document History Tracking - A method and a system of history tracking corrections in a speech based document are disclosed. The speech based document comprises one or more sections of text recognized or transcribed from sections of speech, wherein the sections of speech are dictated by a user and processed by a speech recognizer in a speech recognition system into corresponding sections of text of the speech based document. The method comprises associating at least one speech attribute (12-18-2008
20120239395Selection of Text Prediction Results by an Accessory - A method for entering text in a text input field using a non-keyboard type accessory includes selecting a character for entry into the text field presented by a portable computing device. The portable computing device determines whether text suggestions are available based on the character. If text suggestions are available, the portable computing device can determine the text suggestions and send them to the accessory, which in turn can display the suggestions on a display. A user operating the accessory can select one of the text suggestions, expressly reject the text suggestions, or ignore the text suggestions. If a text suggestion is selected, the accessory can send the selected text to the portable computing device for populating the text field.09-20-2012
20120109648Speech Morphing Communication System - A communication system is described. The communication system includes an automatic speech recognizer configured to receive a speech signal and convert it into a text sequence. The communication system also includes a speech analyzer configured to receive the speech signal and extract paralinguistic characteristics from it. In addition, the communication system includes a speech output device, coupled with the automatic speech recognizer and the speech analyzer, configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics.05-03-2012
20120035926Platform for Enabling Voice Commands to Resolve Phoneme Based Domain Name Registrations - A method, apparatus, and system are directed towards employing machine representations of phonemes to generate and manage domain names and/or messaging addresses. A user of a computing device may provide an audio input signal, such as one obtained from human language sounds. The audio input signal is received at a phoneme encoder that converts the sounds into machine representations of the sounds using a phoneme representation viewable as a sequence of alpha-numeric values. The sequence of alpha-numeric values may then be combined with a host name, or the like, to generate a URI, a message address, or the like. The generated URI, message address, or the like, may then be used to communicate over a network.02-09-2012
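The encoding step above (phoneme sequence to alpha-numeric label to URI) can be sketched in a few lines. The phoneme-to-code table below is an invented stand-in for whatever the phoneme encoder actually emits; only the combine-with-host-name idea comes from the abstract:

```python
# Sketch: map each machine phoneme code to an alpha-numeric value, join
# the values into a label, and combine the label with a host name.
# ARPAbet-style symbols HH EH L OW spell "hello"; the codes are made up.
PHONEME_CODES = {"HH": "h1", "EH": "e2", "L": "l3", "OW": "o4"}

def phonemes_to_uri(phonemes, host):
    label = "".join(PHONEME_CODES[p] for p in phonemes)
    return f"http://{label}.{host}/"

print(phonemes_to_uri(["HH", "EH", "L", "OW"], "example.com"))
```

Because the label is deterministic for a given phoneme sequence, the same spoken word always resolves to the same URI or message address.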
20120035925Population of Lists and Tasks from Captured Voice and Audio Content - Automatic capture and population of task and list items in an electronic task or list surface via voice or audio input through an audio recording-capable mobile computing device is provided. A voice or audio task or list item may be captured for entry into a task application interface or into a list authoring surface interface for subsequent use as task items, reminders, “to do” items, list items, agenda items, work organization outlines, and the like. Captured voice or audio content may be transcribed locally or remotely, and transcribed content may be populated into a task or list authoring surface user interface that may be displayed on the capturing device (e.g., mobile telephone), or that may be stored remotely and subsequently displayed in association with a number of applications on a number of different computing devices.02-09-2012
20100121638SYSTEM AND METHOD FOR AUTOMATIC SPEECH TO TEXT CONVERSION - Speech recognition is performed in near-real-time and improved by exploiting events and event sequences, employing machine learning techniques including boosted classifiers, ensembles, detectors and cascades and using perceptual clusters. Speech recognition is also improved using tandem processing. An automatic punctuator injects punctuation into recognized text streams.05-13-2010
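An automatic punctuator of the kind mentioned above can work from recognizer timing alone. The sketch below inserts punctuation into a recognized word stream using inter-word pause lengths; the 0.7 s and 0.3 s thresholds and the function name are assumptions for illustration, not values from the patent:

```python
# Toy pause-based punctuator for a recognized word stream.
def punctuate(words):
    """words: list of (word, start_sec, end_sec) from a recognizer."""
    out = []
    for i, (word, start, end) in enumerate(words):
        out.append(word)
        if i + 1 < len(words):
            pause = words[i + 1][1] - end
            if pause >= 0.7:
                out[-1] += "."   # long pause -> sentence break
            elif pause >= 0.3:
                out[-1] += ","   # short pause -> clause break
        else:
            out[-1] += "."       # close the final sentence
    text = " ".join(out)
    # Capitalize the first word of each sentence.
    parts = text.split(". ")
    return ". ".join(p[:1].upper() + p[1:] for p in parts)

words = [("hello", 0.0, 0.4), ("world", 0.5, 0.9),
         ("this", 1.8, 2.0), ("works", 2.1, 2.5)]
print(punctuate(words))
```

A real punctuator would combine timing with language-model evidence (the boosted classifiers and ensembles the abstract names); pauses alone are a deliberately crude proxy.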
20090094027Method, Apparatus and Computer Program Product for Providing Improved Voice Conversion - An apparatus for providing improved voice conversion includes a sub-feature generator and a transformation element. The sub-feature generator may be configured to define sub-feature units with respect to a feature of source speech. The transformation element may be configured to perform voice conversion of the source speech to target speech based on the conversion of the sub-feature units to corresponding target speech sub-feature units using a conversion model trained with respect to converting training source speech sub-feature units to training target speech sub-feature units.04-09-2009
20100268534TRANSCRIPTION, ARCHIVING AND THREADING OF VOICE COMMUNICATIONS - Described is a technology that provides highly accurate speech-recognized text transcripts of conversations, particularly telephone or meeting conversations. Speech is received for recognition when it is at a high quality and separate for each user, that is, independent of any transmission. Moreover, because the speech is received separately, a personalized recognition model adapted to each user's voice and vocabulary may be used. The separately recognized text is then merged into a transcript of the communication. The transcript may be labeled with the identity of each user that spoke the corresponding speech. The output of the transcript may be dynamic as the conversation takes place, or may occur later, such as contingent upon each user agreeing to release his or her text. The transcript may be incorporated into the text or data of another program, such as to insert it as a thread in a larger email conversation or the like.10-21-2010
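The merge step above, combining separately recognized per-speaker text into one speaker-labeled transcript, reduces to a sort on segment start times when each side supplies timestamped text. The data layout and names below are illustrative assumptions:

```python
# Sketch: merge per-speaker recognized segments into a labeled transcript.
def merge_transcript(streams):
    """streams: mapping of speaker name -> list of (start_sec, text)."""
    tagged = [(ts, speaker, text)
              for speaker, segs in streams.items() for ts, text in segs]
    return [f"{speaker}: {text}" for ts, speaker, text in sorted(tagged)]

streams = {
    "Ana": [(0.0, "Hi, can you hear me?"), (6.2, "Great, let's begin.")],
    "Ben": [(3.1, "Yes, loud and clear.")],
}
for line in merge_transcript(streams):
    print(line)
```

Because each stream was recognized independently with that speaker's personalized model, the merge never has to separate overlapping voices; it only interleaves already-labeled text.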
20100088096Hand held speech recognition device - A hand held device is used for interactively converting speech into text with at least one speaker. The device includes: a screen for displaying text; at least one voice input source for receiving speech from a single speaker; a sound processor operably connected to the voice input source; a storage device capable of storing an operating system, a speech recognition engine, speech-to-text applications and data files; a power source; a navigation system; and a control system operably connected to the screen, each voice input source, the storage device, the power source and the navigation system.04-08-2010
20090094028SYSTEMS AND METHODS FOR MAINTENANCE KNOWLEDGE MANAGEMENT - Knowledge-based information can be captured and processed to create a library of such knowledge. A maintenance worker performing a task for an asset can record audio and/or video information during the performance, and can upload the recording to a maintenance system. The system processes the recording to produce a text file corresponding to any speech during the recording, and generates a search index allowing the text file to be searched by a user. If the task is performed in the context of a work order, for example, information from the work order can be associated with the text file so that a user can search by text search, keyword, task, or other such information. A user then can locate and access the text file and/or the corresponding recording for playback.04-09-2009
20090228274USE OF INTERMEDIATE SPEECH TRANSCRIPTION RESULTS IN EDITING FINAL SPEECH TRANSCRIPTION RESULTS - A communication system includes at least one transmitting device and at least one receiving device, one or more network systems for connecting the transmitting device to the receiving device, and an automatic speech recognition (“ASR”) system, including an ASR engine. A user speaks an utterance into the transmitting device, and the recorded speech audio is sent to the ASR engine. The ASR engine returns intermediate transcription results to the transmitting device, which displays the intermediate transcription results in real-time to the user. The intermediate transcription results are also correlated by utterance fragment to final transcription results and displayed to the user. The user may use the information thus presented to make decisions as to whether to edit the final transcription results or to speak the utterance again, thereby repeating the process. The intermediate transcription results may also be used by the user to edit the final transcription results.09-10-2009
20090187403INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING PROGRAM AND RECORDING MEDIUM - A copyright managing information processing apparatus includes a storage module for storing copyrighted content including audio data; a first topic module for recognizing audio data in content opened to the public by a to-be-opened information processing apparatus, converting the audio data into text data, extracting keywords from the text data, and conducting topic processing using the keywords to create topic information; a second topic module for recognizing audio data in content stored in the storage module, converting the audio data into text data, extracting keywords from the text data, and conducting topic processing using the keywords to create topic information; and a similarity determining module for comparing the topic information generated by the first topic module with that created by the second topic module, thereby determining presence or absence of similarity therebetween.07-23-2009
20100094627AUTOMATIC IDENTIFICATION OF TAGS FOR USER GENERATED CONTENT - A method and system for automatically identifying tags for a media item. An audio track associated with a media item is analyzed. References to individuals in the audio track are compared to known acquaintances of a user. Matches are identified as potential tags. Duplicate matches can be presented to the user for resolution.04-15-2010
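The matching step above (names heard in the audio track compared against a user's known acquaintances, with duplicates handed back to the user) can be sketched as a dictionary lookup. All names and the function signature below are invented for the example:

```python
# Sketch: turn name mentions in a transcript into candidate tags.
def candidate_tags(transcript, acquaintances):
    """acquaintances: mapping of first name -> list of full names."""
    tags, ambiguous = [], []
    for word in transcript.replace(",", "").split():
        matches = acquaintances.get(word)
        if not matches:
            continue
        if len(matches) == 1:
            tags.append(matches[0])           # unambiguous: tag directly
        else:
            ambiguous.append((word, matches))  # user resolves duplicates
    return tags, ambiguous

friends = {"Alice": ["Alice Kim"], "Sam": ["Sam Lee", "Sam Ortiz"]}
tags, ambiguous = candidate_tags("Here is Alice with Sam at the lake", friends)
print(tags)       # unambiguous match becomes a tag
print(ambiguous)  # duplicate match is presented to the user
```

The split into automatic tags and user-resolved ambiguities mirrors the abstract's "duplicate matches can be presented to the user for resolution."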
20090248410LIGHTED TEXT DISPLAY FOR MOTOR VEHICLES - A system and method are disclosed enabling drivers of a vehicle to enter, edit and post messages on a graphical display attached to the interior or exterior of a vehicle. Voice recognition software allows the driver of the vehicle to input, and the system to display, a message without the need for diverting his or her attention from the road or otherwise manually interacting with the system.10-01-2009
20120143607MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION - The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.06-07-2012
20100088095METHODS AND SYSTEM TO GENERATE DATA ASSOCIATED WITH A MEDICAL REPORT USING VOICE INPUTS - Methods and system to generate data associated with a medical report using voice inputs are described herein. In one example implementation, a computer-implemented method to automatically generate data associated with a medical report using voice inputs received during a first encounter includes receiving a voice input from a source and determining an identity of the source. Additionally, the method includes performing a speech-to-text conversion on the voice input to generate a text string representing the voice input and associating the text string with the identity of the source. Further, the example method includes identifying and selecting one or more keywords from the text string. The one or more keywords are associated with one or more data fields. Further still, the method includes populating the one or more data fields with the identified keywords according to values associated with the identified keywords and the identity of the source.04-08-2010
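The keyword-to-field population step described above can be sketched with a small keyword table: scan the recognized text string, and when a field's keyword is followed by a value, fill that field. The field names, keyword table, and value pattern are all invented for illustration:

```python
# Sketch: populate report data fields from keywords in recognized text.
import re

FIELD_KEYWORDS = {
    "blood_pressure": ["pressure"],
    "heart_rate": ["rate", "pulse"],
}

def populate_fields(text, source_id):
    """Fill fields whose keyword appears next to a numeric value,
    tagging the record with the identified source of the voice input."""
    record = {"source": source_id}
    tokens = text.lower().split()
    for field, keywords in FIELD_KEYWORDS.items():
        for i, tok in enumerate(tokens):
            if tok in keywords and i + 1 < len(tokens):
                value = re.match(r"[\d/]+", tokens[i + 1])
                if value:
                    record[field] = value.group()
    return record

rec = populate_fields("Blood pressure 120/80 and pulse 72 today", "Dr. A")
print(rec)
```

Keeping the source identity on the record corresponds to the abstract's requirement that field values depend on both the keywords and the identity of the speaker.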
20090182559CONTEXT SENSITIVE MULTI-STAGE SPEECH RECOGNITION - A system enables devices to recognize and process speech. The system includes a database that retains one or more lexical lists. A speech input detects a verbal utterance and generates a speech signal corresponding to the detected verbal utterance. A processor generates a phonetic representation of the speech signal that is designated a first recognition result. The processor generates variants of the phonetic representation based on context information provided by the phonetic representation. One or more of the variants of the phonetic representation selected by the processor are designated as a second recognition result. The processor matches the second recognition result with stored phonetic representations of one or more of the stored lexical lists.07-16-2009
20100100376VISUALIZATION INTERFACE OF CONTINUOUS WAVEFORM MULTI-SPEAKER IDENTIFICATION - A method implemented in a computer infrastructure having computer executable code having programming instructions tangibly embodied on a computer readable storage medium. The programming instructions are operable to receive a current waveform of a communication between a plurality of participants. Additionally, the programming instructions are operable to create a voiceprint from the current waveform if the current waveform is of a human voice. Furthermore, the programming instructions are operable to determine one of whether a match exists between the voiceprint and one library waveform of one or more library waveforms, whether a correlation exists between the voiceprint and a number of library waveforms of the one or more library waveforms and whether the voiceprint is unique. Additionally, the programming instructions are operable to transcribe the current waveform into text and provide a match indication display (MID) indicating an association between the current waveform and the one or more library waveforms based on the determining.04-22-2010
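The three-way determination above (match one library waveform, correlate with several, or be unique) can be sketched once voiceprints are reduced to fixed-length feature vectors. Cosine similarity and the 0.95 threshold are illustrative choices, not from the patent:

```python
# Sketch: classify a new voiceprint against a library of known prints.
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def classify(voiceprint, library, threshold=0.95):
    """Return ('match', name), ('correlates', names), or ('unique', None)."""
    sims = {name: sum(a * b for a, b in zip(voiceprint, vec))
                  / (norm(voiceprint) * norm(vec))
            for name, vec in library.items()}
    close = sorted(n for n, s in sims.items() if s >= threshold)
    if len(close) == 1:
        return ("match", close[0])
    if close:
        return ("correlates", close)  # several library prints are plausible
    return ("unique", None)           # new speaker; could join the library

lib = {"ann": [1.0, 0.0], "bob": [0.0, 1.0]}
print(classify([0.99, 0.05], lib))
print(classify([0.7, 0.7], lib))
```

The match indication display described in the claim would render whichever of the three outcomes this classification returns, alongside the transcribed text.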
20090287487Systems and Methods for a Visual Indicator to Track Medical Report Dictation Progress - Certain embodiments of the present invention provide a system for medical report dictation including a database component, a voice recognition component, and a user interface component. The database component is adapted to store a plurality of available templates. Each of the plurality of available templates is associated with a template cue. Each template cue includes a list of elements. The voice recognition component is adapted to convert a voice data input to a transcription data output. The user interface component is adapted to receive voice data from a user related to an image and the user interface component is adapted to present a visual indicator to the user. The visual indicator is based on a template cue associated with a template selected from the plurality of available templates. The user interface utilizes the voice recognition component to update the visual indicator.11-19-2009
20100100377GENERATING AND PROCESSING FORMS FOR RECEIVING SPEECH DATA - A system and method for dynamically generating and processing forms for receiving data, such as text-based data or speech data provided over a telephone, mobile device, via a computer and microphone, etc. is disclosed. A form developer can use a toolkit provided by the system to create forms that end-users connect to and complete. The system provides a user-friendly interface for the form developer to create various input fields for the form and impose parameters on the data that may be used to complete or populate those fields. These fields may be included to receive specific information, such as the name of the person filling out the form, or may be free-form, allowing a user to provide a continuous stream of information. Furthermore, the system allows a form developer to establish means for providing access to the form and set access limits on the form. Other aspects are disclosed herein.04-22-2010
20100100378METHOD OF AND SYSTEM FOR IMPROVING ACCURACY IN A SPEECH RECOGNITION SYSTEM - A method for transcribing an audio response includes: 04-22-2010
20090276214METHOD FOR DUAL CHANNEL MONITORING ON A RADIO DEVICE - A method for dual channel monitoring on a radio device is provided that enables efficient use of communication network resources. The method includes receiving at the radio device a first speech signal over a first channel, while simultaneously receiving at the radio device a second speech signal over a second channel. The first speech signal is then processed at the radio device to generate a text transcription of the first speech signal, and the text transcription of the first speech signal is displayed on a display screen of the radio device. An audible voice signal is then produced from a speaker that is operatively connected to the radio device simultaneously with displaying the text transcription of the first speech signal.11-05-2009
20090287486Methods and Apparatus to Generate a Speech Recognition Library - Methods and apparatus to generate a speech recognition library for use by a speech recognition system are disclosed. An example method comprises identifying a plurality of video segments having closed caption data corresponding to a phrase, the plurality of video segments associated with respective ones of a plurality of audio data segments, computing a plurality of difference metrics between a baseline audio data segment associated with the phrase and respective ones of the plurality of audio data segments, selecting a set of the plurality of audio data segments based on the plurality of difference metrics, identifying a first one of the audio data segments in the set as a representative audio data segment, determining a first phonetic transcription of the representative audio data segment, and adding the first phonetic transcription to a speech recognition library when the first phonetic transcription differs from a second phonetic transcription associated with the phrase in the speech recognition library.11-19-2009
20090287488TEXT DISPLAY, TEXT DISPLAY METHOD, AND PROGRAM - A text display in which speech information can be effectively conveyed in a text to the user. The text display comprises a speech input section (11-19-2009
20120296647INFORMATION PROCESSING APPARATUS - In an embodiment, an information processing apparatus includes: a converting unit; a selecting unit; a dividing unit; a generating unit; and a display processing unit. The converting unit recognizes a voice input from a user into a character string. The selecting unit selects characters from the character string according to designation of the user. The dividing unit converts the selected characters into phonetic characters and divides the phonetic characters into phonetic characters of sound units. The generating unit extracts similar character candidates corresponding to each of the divided phonetic characters of the sound units, from a similar character dictionary storing a plurality of phonetic characters of sound units similar in sound as the similar character candidates in association with each other, and generates correction character candidates for the selected characters. The display processing unit makes a display unit display the generated correction character candidates selectable by the user.11-22-2012
20110208523VOICE-TO-DACTYLOLOGY CONVERSION METHOD AND SYSTEM - A voice-to-dactylology conversion method and system comprise primarily a transmitting communication device operating in collaboration with a receiving communication device. An ordinary user can use the transmitting communication device to send a voice message to the receiving communication device. The receiving communication device then converts the voice message into a corresponding dactylology image message and displays it as motion pictures on its screen, allowing a deaf-mute user to understand the message expressed by the other party. Conversely, the deaf-mute user can use the receiving communication device to select the images to be expressed, arrange and combine them, and convert them into a voice message to be sent to the transmitting communication device. As a result, communication for deaf-mute users can be improved significantly.08-25-2011
20110208522METHOD AND APPARATUS FOR DETECTION OF SENTIMENT IN AUTOMATED TRANSCRIPTIONS - A method and apparatus for automatically detecting sentiment in interactions. The method and apparatus include training, in which a model is generated from features extracted from training interactions and tagging information, and run-time, in which the model is used to detect sentiment in further interactions.08-25-2011
20120296646MULTI-MODE TEXT INPUT - Concepts and technologies are described herein for multi-mode text input. In accordance with the concepts and technologies disclosed herein, content is received. The content can include one or more input indicators. The input indicators can indicate that user input can be used in conjunction with consumption or use of the content. The application is configured to analyze the content to determine context associated with the content and/or the client device executing the application. The application also is configured to determine, based upon the content and/or the contextual information, which input device to use to obtain input associated with use or consumption of the content. Input captured with the input device can be converted to text and used during use or consumption of the content.11-22-2012
20080275702System and method for providing digital dictation capabilities over a wireless device - A system and method for providing digital dictation capabilities over a wireless device. The system and method enables digital dictations to be recorded on a wireless device, such as a BlackBerry smartphone, and then uploaded wirelessly to a remote location, such as a server, for transcription. Features of the wireless device, such as the display and trackball, can be used to control the dictation.11-06-2008
20080275701SYSTEM AND METHOD FOR RETRIEVING DATA BASED ON TOPICS OF CONVERSATION - A method includes performing computerized monitoring with a computer of at least one side of a telephone conversation, which includes spoken words, between a first person and a second person, automatically identifying at least one topic of the conversation, automatically performing a search for information related to the at least one topic, and outputting a result of the search. Also a system for performing the method.11-06-2008
20080275700Method of and System for Modifying Messages - The invention describes a method of and a system for modifying an input message (IM) containing audio content, which method comprises the steps of converting the audio content (A) of the input message (IM) into elements of a text representation (TR), segmenting the audio content (A) of the input message (IM) into constituent phonetic elements (A11-06-2008
20090055174Method and apparatus for automatically completing text input using speech recognition - Provided are a method and apparatus for automatically completing a text input using speech recognition. The method includes: receiving a first part of a text from a user through a text input device; recognizing a speech of the user, which corresponds to the text; and completing a remaining part of the text based on the first part of the text and the recognized speech. Therefore, accuracy of the text input and convenience of the speech recognition can be ensured, and a non-input part of the text can be easily input based on the input part of the text and the recognized speech at a high speed.02-26-2009
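The completion idea above, combining a typed first part of a text with recognized speech of the whole text, can be sketched as a prefix filter over the recognizer's candidate phrases. The function name and sample hypotheses are illustrative assumptions:

```python
# Sketch: complete a typed prefix using speech recognition candidates.
def complete(prefix, recognized_candidates):
    """Return the typed prefix plus the remainder of the best candidate.
    Candidates are assumed to arrive in recognizer rank order."""
    for cand in recognized_candidates:
        if cand.lower().startswith(prefix.lower()):
            return prefix + cand[len(prefix):]
    return prefix  # no candidate agrees with what was typed

hyps = ["schedule a meeting", "schedule of fees", "shed a tear"]
print(complete("sched", hyps))
print(complete("shed", hyps))
```

The typed prefix both constrains the recognizer's ambiguity and survives verbatim in the output, which is how the abstract achieves "accuracy of the text input and convenience of the speech recognition" at the same time.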
20080281592Method and Apparatus for Annotating Video Content With Metadata Generated Using Speech Recognition Technology - A method and apparatus is provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device. A segment of speech is received from a user such that the speech segment annotates a portion of the video content currently being rendered. The speech segment is converted to a text-segment and the text-segment is associated with the rendered portion of the video content. The text segment is stored in a selectively retrievable manner so that it is associated with the rendered portion of the video content.11-13-2008
20080288251Tracking Time Using Portable Recorders and Speech Recognition - In general, the present invention converts speech, preferably recorded on a portable recorder, to text, analyzes the text, and determines voice commands and times when the voice commands occurred. Task names are associated with voice commands and time segments. These time segments and tasks may be packaged as time increments and stored (e.g., in a file or database) for further processing. Preferably, phrase grammar rules are used when analyzing the text, as this helps to determine voice commands. Using phrase grammar rules also allows the text to contain a variety of topics, only some of which are pertinent to tracking time.11-20-2008
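The scan described above, applying phrase grammar rules to transcribed text to pick out voice commands among unrelated speech and pair them into time increments, can be sketched with a single regular-expression rule. The grammar, task names, and event layout are assumptions for illustration:

```python
# Sketch: find "start/stop task <name>" commands in timestamped transcript
# sentences and pair them into (task, start, stop) time increments.
import re

COMMAND_RE = re.compile(r"(start|stop) task (\w+)")

def time_increments(events):
    """events: list of (timestamp_sec, transcribed_sentence)."""
    open_tasks, increments = {}, []
    for ts, text in events:
        m = COMMAND_RE.search(text.lower())
        if not m:
            continue  # sentence is about some other topic; ignore it
        verb, task = m.groups()
        if verb == "start":
            open_tasks[task] = ts
        elif task in open_tasks:
            increments.append((task, open_tasks.pop(task), ts))
    return increments

log = [(0, "start task billing"), (30, "note to self buy milk"),
       (600, "stop task billing")]
print(time_increments(log))
```

The "note to self" line is skipped without error, which is the point of grammar rules in the abstract: the recording may contain many topics, and only command phrases count toward tracked time.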
20080288249Method and System for Dynamic Creation of Contexts - A method and a system for a speech recognition system (11-20-2008
20080312920SPEECH-TO-SPEECH GENERATION SYSTEM AND METHOD - An expressive speech-to-speech generation system which can generate expressive speech output by using expressive parameters extracted from the original speech signal to drive the standard TTS system. The system comprises: speech recognition means, machine translation means, text-to-speech generation means, expressive parameter detection means for extracting expressive parameters from the speech of language A, and expressive parameter mapping means for mapping the expressive parameters extracted by the expressive parameter detection means from language A to language B, and driving the text-to-speech generation means by the mapping results to synthesize expressive speech.12-18-2008
20090326937USING PERSONALIZED HEALTH INFORMATION TO IMPROVE SPEECH RECOGNITION - The claimed subject matter provides systems and/or methods that improve speech recognition in the medical context. The system includes mechanisms that access personal health records associated with patients and/or analyze the personal health records for current diseases and/or past ailments. The system thereafter acquires attributes associated with the diseases or ailments and dynamically populates a speech model with these attributes. The speech model utilizes the attributes associated with the diseases or ailments to more accurately transcribe a voice pattern into text that can be projected on a visual display or persisted to a storage device.12-31-2009
20090326936VOICE RECOGNITION DEVICE, VOICE RECOGNITION METHOD, AND VOICE RECOGNITION PROGRAM - A voice recognition device, method, and program for operating a plurality of control objects by recognizing a plurality of user-provided verbal commands. The voice recognition device determines a control object and control content from predefined types of control objects and contents, based on a recognition result of the input verbal command. A voice recognition unit converts input verbal commands into a text expressed as a series of words, a first parsing unit performs an identification process of a first control candidate group as a control candidate for the control object and control content, a second parsing unit performs an identification process of a second control candidate group as a control candidate for the control object and control content, and a control candidate identification unit identifies a final control candidate group for determining the control object and control content from the first control candidate group and the second control candidate group.12-31-2009
20080288250REAL-TIME TRANSCRIPTION SYSTEM - A transcription system and method that includes a transcription terminal for recording electronically generated text as units of transcribed text, and a conversion unit for translating the units of transcribed text into a generally accurate transcript of the electronically generated text and converting said transcript into a signal to be transmitted to an authorized receiving unit over a communication link. The system and method optionally includes any of a presentation object to be transmitted to the authorized receiving unit, a wireless access point for transmitting serial data representing the transcript, and suppression of an automatic network identifier.11-20-2008
20080270128Text Input System and Method Based on Voice Recognition - Provided is a text input system and method based on voice recognition. The system includes: an input unit for receiving part of the text, i.e., partial text; a voice input unit for receiving, by voice, the entire text containing the partial text; a voice recognition preprocessing unit for analyzing the voice input through the voice input unit and transmitting the partial text input through the input unit together with voice analysis information; a voice recognizing unit for creating a list of recognition candidates using the partial text transmitted from the voice recognition preprocessing unit, performing voice recognition, and selecting a text from among the recognition candidates; and an output unit for outputting the finally recognized text.10-30-2008
20100138221DEDICATED HARDWARE/SOFTWARE VOICE-TO-TEXT SYSTEM - A text preparation system has a first and a second CPU, with the first dedicated to conventional voice-to-text software and the second to all other functions, including voice-to-text correction software. Voice commands enable the user to alternately initiate the first and the second voice-to-text software and their associated lexicons, the second software and lexicon providing a correction mode for errors made by the first voice-to-text software.06-03-2010
20100145694REPLYING TO TEXT MESSAGES VIA AUTOMATED VOICE SEARCH TECHNIQUES - An automated “Voice Search Message Service” provides a voice-based user interface for generating text messages from an arbitrary speech input. Specifically, the Voice Search Message Service provides a voice-search information retrieval process that evaluates user speech inputs to select one or more probabilistic matches from a database of pre-defined or user-defined text messages. These probabilistic matches are also optionally sorted in terms of relevancy. A single text message from the probabilistic matches is then selected and automatically transmitted to one or more intended recipients. Optionally, one or more of the probabilistic matches are presented to the user for confirmation or selection prior to transmission. Correction or recovery of speech recognition errors is avoided since the probabilistic matches are intended to paraphrase the user speech input rather than exactly reproduce that speech, though exact matches are possible. Consequently, potential distractions to the user are significantly reduced relative to conventional speech recognition techniques.06-10-2010
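The retrieval step described above, scoring stored messages against the recognized speech and returning the best matches sorted by relevance, can be sketched with a simple bag-of-words scorer. The scoring function here is an illustrative assumption; the abstract does not specify one.

```python
# Sketch (not the patented method): match recognized speech against a
# database of stored text messages by word overlap, returning the
# highest-scoring candidates sorted by relevance.

def match_messages(recognized_words, messages, top_n=2):
    """recognized_words: tokens from the speech recognizer.
    messages: list of stored text-message strings.
    Returns up to top_n messages ranked by word overlap."""
    rec = set(recognized_words)
    scored = []
    for m in messages:
        overlap = len(rec & set(m.lower().split()))
        if overlap:
            scored.append((overlap, m))
    scored.sort(key=lambda t: -t[0])  # most overlapping words first
    return [m for _, m in scored[:top_n]]

msgs = ["running late be there soon", "call you back later", "on my way home"]
print(match_messages(["i", "am", "running", "a", "bit", "late"], msgs))
```

A production system would use probabilistic retrieval rather than raw overlap, but the shape of the pipeline, recognize then retrieve then confirm, is the same.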
20080235014Method and System for Processing Dictated Information - A method and a system for processing dictated information into a dynamic form are disclosed. The method comprises presenting an image (09-25-2008
20110270609REAL-TIME SPEECH-TO-TEXT CONVERSION IN AN AUDIO CONFERENCE SESSION - Various embodiments of systems, methods, and computer programs are disclosed for providing real-time resources to participants in an audio conference session. One embodiment is a method for providing real-time resources to participants in an audio conference session via a communication network. One such method comprises: a conferencing system establishing an audio conference session between a plurality of computing devices via a communication network, each computing device generating a corresponding audio stream comprising a speech signal; and in real-time during the audio conference session, a server: receiving and processing the audio streams to determine the speech signals; extracting words from the speech signals; analyzing the extracted words to determine a relevant keyword being discussed in the audio conference session; identifying a resource related to the relevant keyword; and providing the resource to one or more of the computing devices.11-03-2011
20090119101Transcript Alignment - An approach to alignment of transcripts with recorded audio is tolerant of moderate transcript inaccuracies, untranscribed speech, and significant non-speech noise. In one aspect, a number of search terms are formed from the transcript such that each search term is associated with a location within the transcript. Possible locations of the search terms are then determined in the audio recording. The audio recording and the transcript are then aligned using the possible locations of the search terms. In another aspect a search expression is accepted, and then a search is performed for spoken occurrences of the search expression in an audio recording. This search includes searching for text occurrences of the search expression in a text transcript of the audio recording, and searching for spoken occurrences of the search expression in the audio recording.05-07-2009
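The anchor-based alignment idea above can be sketched as follows: given audio times found for a few distinctive transcript terms, times for the remaining words are interpolated between those anchors. This is a minimal illustration under assumed inputs, not the patented algorithm, and linear interpolation is an assumption.

```python
# Sketch: align a transcript to audio using anchor terms. Assume a
# keyword search has returned candidate times (seconds) for a few
# distinctive transcript words; the remaining word times are
# interpolated linearly between the surrounding anchors.

def align_transcript(words, anchors):
    """words: list of transcript tokens.
    anchors: dict {word_index: audio_time_sec} from a keyword search.
    Returns a list of (word, estimated_time) pairs."""
    idxs = sorted(anchors)
    out = []
    for i, w in enumerate(words):
        if i in anchors:
            t = anchors[i]
        else:
            # find the nearest anchors on each side and interpolate
            left = max((j for j in idxs if j < i), default=None)
            right = min((j for j in idxs if j > i), default=None)
            if left is None:
                t = anchors[right]
            elif right is None:
                t = anchors[left]
            else:
                frac = (i - left) / (right - left)
                t = anchors[left] + frac * (anchors[right] - anchors[left])
        out.append((w, round(t, 2)))
    return out

words = "the quick brown fox jumps".split()
print(align_transcript(words, {0: 0.0, 4: 2.0}))
```

Because only the anchors need to be found acoustically, this style of alignment tolerates untranscribed stretches and noise between anchors, which is the robustness property the abstract emphasizes.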
20090119100ASSOCIATING ANNOTATION RECORDING WITH A CELL PHONE NUMBER - A method, system and computer program product for creating voice annotations during a mobile phone call. During the phone call, a user engages a trigger on the communication device, prompting the phone first to mute the user's device and then to record an audible message. The audible message, or voice annotation, is automatically linked to the current call information. The voice annotation may be transcribed and stored as a textual annotation. The voice or textual annotation may be retrieved utilizing a graphical user interface (GUI).05-07-2009
20090138262SYSTEMS AND METHODS TO INDEX AND SEARCH VOICE SITES - A method comprises crawling and indexing voice sites and storing results in an index; receiving a search request in voice from a user via a telephone; performing speech recognition on the voice search request and converting the request from voice to text; parsing the query; and performing a search on the index and ranking the search results. Search results may be filtered based on attributes such as location and context. Filtered search results may be presented to the user in categories to enable easy voice browsing of the search results by the user. Computer program code and systems are also provided.05-28-2009
20090048831Scripting support for data identifiers, voice recognition and speech in a telnet session - Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on the data identifiers. The scripts run only on the telnet client, without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user may also provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is inserted into the corresponding fields.02-19-2009
20090006090IMAGE COMMUNICATION APPARATUS AND CONTROL METHOD OF THE SAME - An image communication apparatus includes: an image pickup unit which picks up a user image of a user and processes the user image into a user image signal; an audio input unit which receives a user audio signal of the user; an encoder which encodes the user image signal processed by the image pickup unit and the user audio signal; a communication unit which receives an encoded image signal and an encoded audio signal from outside and transmits the user image signal and the user audio signal which are encoded by the encoder; a decoder which decodes the encoded image signal and the encoded audio signal which are received through the communication unit; and a controller which converts at least one of the user audio signal, the user image signal, the decoded image signal and the decoded audio signal into a data file which is stored.01-01-2009
20090177469SYSTEM FOR RECORDING AND ANALYSING MEETINGS - A system for producing a transcript of a meeting having n attendees, the attendees being identified as ID07-09-2009
20090177470DISTRIBUTED DICTATION/TRANSCRIPTION SYSTEM - A distributed dictation/transcription system is provided. The system provides a client station, dictation manager, and dictation server networked such that the dictation manager can select a dictation server to transcribe audio from the client station. The dictation manager selects one of a plurality of dictation servers based on conventional load balancing as well as on a determination of which of the dictation servers may already have a user profile uploaded. Moreover, while a dictation server is being selected and/or a profile uploaded, the user or client at the client station may begin dictating; the audio is stored in a buffer of the dictation manager until a dictation server is selected and/or available. The user receives, in real time or near real time, a display of the textual data, which may be corrected by the user. The corrective textual data may be transmitted back to the dictation manager to update the user profile.07-09-2009
20080306737SYSTEMS AND METHODS FOR CLASSIFYING AND REPRESENTING GESTURAL INPUTS - Gesture and handwriting recognition agents provide possible interpretations of electronic ink. Recognition is performed on both individual strokes and combinations of strokes in the input ink lattice. The interpretations of electronic ink are classified and encoded as symbol complexes where symbols convey specific attributes of the contents of the stroke. The use of symbol complexes to represent strokes in the input ink lattice facilitates reference to sets of entities of a specific type.12-11-2008
20090326939SYSTEM AND METHOD FOR TRANSCRIBING AND DISPLAYING SPEECH DURING A TELEPHONE CALL - A system and method for providing speech transcription to a user during a telephone call may include a receiver configured to receive a telecommunications signal forming a telephone call. The telecommunications signal communicates speech data representative of words spoken by a telephone call participant. A processing unit may be in communication with the receiver and be configured to transcribe the speech data representative of words into text. A display unit may be in communication with the processing unit and be configured to display the text for a user during the telephone call.12-31-2009
20090326941SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS - A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors. A controller controls the lexical tree processors to process lexical trees identified in the results memory by performing parallel processing on a plurality of said lexical tree data structures.12-31-2009
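The lexical tree data structure described above, in which words sharing a common prefix share model nodes and fan out only where they differ, is essentially a trie over word components. A minimal sketch, with illustrative names not taken from the patent:

```python
# Sketch of the lexical-tree idea: words with a common prefix share
# nodes, so "car", "card", "cart", and "cat" all share the "c" -> "a"
# path and branch only afterwards. Characters stand in for the
# patent's prefix components.

def build_lexical_tree(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["#"] = w  # word-end marker storing the full word
    return root

tree = build_lexical_tree(["car", "card", "cart", "cat"])
# the node reached via "c" -> "a" branches to "r" (car/card/cart) and "t" (cat)
print(sorted(tree["c"]["a"].keys()))
```

Sharing prefixes this way is what makes it natural to hand disjoint subtrees to parallel processors, since each tree can be scored independently against the incoming speech parameters.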
20080319743ASR-Aided Transcription with Segmented Feedback Training - An ASR-aided transcription system with segmented feedback training is provided, the system including a transcription process manager configured to extract a first segment and a second segment from an audio input of speech uttered by a speaker, and an ASR engine configured to operate in a first speech recognition mode to convert the first speech segment into a first text transcript using a speaker-independent acoustic model and a speaker-independent language model, operate in a first training mode to create a speaker-specific acoustic model and a speaker-specific language model by adapting the speaker-independent acoustic model and the speaker-independent language model using either of the first segment and a corrected version of the first text transcript, and operate in a second speech recognition mode to convert the second speech segment into a second text transcript using the speaker-specific acoustic model and the speaker-specific language model.12-25-2008
20090024389Text oriented, user-friendly editing of a voicemail message - A system in one embodiment includes a server associated with a unified messaging system (UMS). The server records speech of a user as an audio data file, translates the audio data file into a text data file, and maps each word within the text data file to a corresponding segment of audio data in the audio data file. A graphical user interface (GUI) of a message editor running on an endpoint associated with the user displays the text data file on the endpoint and allows the user to identify a portion of the text data file for replacement. The server is further operable to record new speech of the user as new audio data and to replace one or more segments of the audio data file corresponding to the portion of the text with the new audio data.01-22-2009
20090083032METHODS AND SYSTEMS FOR DYNAMICALLY UPDATING WEB SERVICE PROFILE INFORMATION BY PARSING TRANSCRIBED MESSAGE STRINGS - Systems, methods, and software for parsing and/or filtering message strings of text messages and/or instant messages in order to identify keywords, phrases, or fragments as a function of which user preferences of user profiles are dynamically updated are disclosed. Such systems, methods, and software are utilized in the context of a communication system including text messaging, instant messaging, or both. Furthermore, such communication system preferably includes an automatic speech recognition (ASR) system. Additionally, ad impressions are selected and delivered to users based, at least in part, on the parsing and/or filtering and/or data maintained in user profiles as dynamically updated from time to time. The ad impression preferably is delivered within a text message or within an instant message conversation and is generally unobtrusive. Revenues preferably may be generated from the delivering of the ad impressions, whereby a provider of instant messaging or text messaging may further derive monetary benefit from providing such service and whereby users of such service may be provided with contextually relevant information in an unobtrusive manner.03-26-2009
20090198493System and Method for Unsupervised and Active Learning for Automatic Speech Recognition - A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.08-06-2009
20110144989SYSTEM AND METHOD FOR AUDIBLE TEXT CENTER SUBSYSTEM - Disclosed herein are systems, methods, and computer-readable storage media for sending a spoken message as a text message. The method includes initiating a connection with a first subscriber, receiving from the first subscriber a spoken message and spoken information associated with at least one recipient address. The method further includes converting the spoken message to text via an audible text center subsystem (ATCS), and delivering the text to the recipient address. The method can also include verifying a subscription status of the first subscriber, or delivering the text to the recipient address based on retrieved preferences of the first subscriber. The preferences can be retrieved from a consolidated network repository or embedded within the spoken message. Text and the spoken message can be delivered to the same or different recipient addresses. The method can include updating recipient addresses based on a received oral command from the first subscriber.06-16-2011
20110144990RATING SPEECH NATURALNESS OF SPEECH UTTERANCES BASED ON A PLURALITY OF HUMAN TESTERS - A method that includes: generating an utterance-specific scoring model for each one of a plurality of obtained speech utterances, each scoring model usable to estimate a level of speech naturalness for a respective one of the obtained speech utterances; presenting a plurality of human testers with some of the obtained speech utterances; receiving, for each presented speech utterance, a plurality of human-tester-generated speech utterances, each being at least one human repetition of the presented speech utterance; updating the scoring model for each presented speech utterance based on the respective human-tester-generated speech utterances; and obtaining a speech naturalness score for each presented speech utterance by respectively applying the updated utterance-specific scoring model to each presented speech utterance.06-16-2011
20090055175CONTINUOUS SPEECH TRANSCRIPTION PERFORMANCE INDICATION - A method of providing speech transcription performance indication includes receiving, at a user device, data representing text transcribed from an audio stream by an ASR system and data representing a metric associated with the audio stream; displaying, via the user device, said text; and, via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and, via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device.02-26-2009
20110224981DYNAMIC SPEECH RECOGNITION AND TRANSCRIPTION AMONG USERS HAVING HETEROGENEOUS PROTOCOLS - A system is disclosed for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous native (legacy) protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes at least one system transaction manager having a “system protocol,” to receive a verified, streamed speech information request from at least one authorized user employing a first legacy user protocol. The speech information request which includes spoken text and system commands is generated using a user interface capable of bi-directional communication with the system transaction manager and supporting dictation applications, including prompts to direct user dictation in response to user system protocol commands and systems transaction manager commands. A speech recognition and/or transcription engine (ASR), in communication with the systems transaction manager, receives the speech information request from the system transaction manager, generates a transcribed response, which can include a formatted transcription, and transmits the response to the system transaction manager. The system transaction manager routes the response to one or more of the users employing a second protocol, which may be the same as or different than the first protocol. In another embodiment, the system employs a virtual sound driver for streaming free form dictation to any ASR, regardless of the ASR's ability to recognize and/or transcribe spoken text from any input source such as, for example, a live microphone or line input. In another embodiment, the system employs a buffer to facilitate the system's use of ASRs requiring input data to be in batches, while providing the user with an uninterrupted, seamless dictating experience.09-15-2011
20120143605CONFERENCE TRANSCRIPTION BASED ON CONFERENCE DATA - In one implementation, a collaboration server is a conference bridge or other network device configured to host an audio and/or video conference among a plurality of conference participants. The collaboration server sends conference data and a media stream including speech to a speech recognition engine. The conference data may include the conference roster or text extracted from documents or other files shared in the conference. The speech recognition engine updates a default language model according to the conference data and transcribes the speech in the media stream based on the updated language model. In one example, the performance of default language model, the updated language model, or both may be tested using a confidence interval or submitted for approval of the conference participant.06-07-2012
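The language-model update step above can be illustrated by biasing a unigram model toward words drawn from the conference data (roster names, text from shared documents). The multiplicative boost and renormalization below are assumptions for illustration; the abstract does not specify the adaptation method.

```python
# Illustrative sketch: raise the probability of words seen in the
# conference data, then renormalize so the model still sums to 1.

def boost_language_model(unigram_probs, conference_words, boost=5.0):
    """unigram_probs: dict {word: probability}.
    conference_words: set of words from roster / shared documents.
    Returns a renormalized model favoring the conference words."""
    boosted = {w: p * (boost if w in conference_words else 1.0)
               for w, p in unigram_probs.items()}
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}

lm = {"meeting": 0.02, "agenda": 0.01, "banana": 0.01, "the": 0.96}
new_lm = boost_language_model(lm, {"agenda"})
print(new_lm["agenda"] > lm["agenda"])  # boosted word gains probability
```

The abstract's testing step would then compare recognition confidence under the default and updated models before committing to the update.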
20120078629MEETING SUPPORT APPARATUS, METHOD AND PROGRAM - According to one embodiment, a meeting support apparatus includes a storage unit, a determination unit, and a generation unit. The storage unit is configured to store storage information for each of a plurality of words, the storage information indicating a word of the words, pronunciation information on the word, and a pronunciation recognition frequency. The determination unit is configured to generate emphasis determination information including an emphasis level that represents whether a first word should be highlighted and represents a degree of highlighting determined in accordance with the pronunciation recognition frequency of a second word when the first word is highlighted, based on whether the storage information includes a second set corresponding to a first set and, when the second set is included, based on the pronunciation recognition frequency of the second word. The generation unit is configured to generate an emphasis character string based on the emphasis determination information when the first word is highlighted.03-29-2012
20120078628HEAD-MOUNTED TEXT DISPLAY SYSTEM AND METHOD FOR THE HEARING IMPAIRED - The head-mounted text display system for the hearing impaired is a speech-to-text system, in which spoken words are converted into a visual textual display and displayed to the user in passages containing a selected number of words. The system includes a head-mounted visual display, such as eyeglass-type dual liquid crystal displays or the like, and a controller. The controller includes an audio receiver, such as a microphone or the like, for receiving spoken language and converting the spoken language into electrical signals. The controller further includes a speech-to-text module for converting the electrical signals representative of the spoken language to a textual data signal representative of individual words. A transmitter associated with the controller transmits the textual data signal to a receiver associated with the head-mounted display. The textual data is then displayed to the user in passages containing a selected number of individual words.03-29-2012
20120078627ELECTRONIC DEVICE WITH TEXT ERROR CORRECTION BASED ON VOICE RECOGNITION DATA - During operation of an electronic device such as a cellular telephone with a touch screen display or other electronic equipment, a voice recognition engine may gather data on spoken words. Data on the spoken words that are recognized may be maintained in a spoken word database maintained by an input processor with an autocorrection engine. A user may supply text input that contains mistyped words to the electronic device using the touch screen or a keyboard. The input processor may use the autocorrection engine to automatically replace mistyped words with corrected versions of the mistyped words. The corrected words may be displayed in real time as the user supplies the text input. The autocorrection engine may make word correction decisions based at least partly on information in the spoken word database.03-29-2012
20120078626SYSTEMS AND METHODS FOR CONVERTING SPEECH IN MULTIMEDIA CONTENT TO TEXT - Methods and systems for converting speech to text are disclosed. One method includes analyzing multimedia content to determine the presence of closed captioning data. The method includes, upon detecting closed captioning data, indexing the closed captioning data as associated with the multimedia content. The method also includes, upon failure to detect closed captioning data in the multimedia content, extracting audio data from multimedia content, the audio data including speech data, performing a plurality of speech to text conversions on the speech data to create a plurality of transcripts of the speech data, selecting text from one or more of the plurality of transcripts to form an amalgamated transcript, and indexing the amalgamated transcript as associated with the multimedia content.03-29-2012
20090006089METHOD AND APPARATUS FOR STORING REAL TIME INFORMATION ON A MOBILE COMMUNICATION DEVICE - A method and apparatus that stores information on a mobile communication device is disclosed. The method may include receiving a first signal from a user, initiating a recording of information spoken by at least one of the user, a voice mail recording, a recorded message, and a party engaged in the telephone call with the user based on the received first signal, receiving a second signal from the user, stopping the recording of the information based on the second signal being received, converting the recorded information to text, and storing the converted text to a designated location.01-01-2009
20090210225SUPPORTING ELECTRONIC TASK MANAGEMENT SYSTEMS VIA TELEPHONE - The disclosed personal information management (PIM) system supports tasks and reminders via an audio user interface. The user creates a task object via a telephone call to the server. The task object may include an audio recording of the user's voice received during the telephone call. The system may convert the user's speech to text and may store the text in the task object. The system may include other structured data further defining the task, such as calling party number, due date, start date, priority, status, percentage complete, categories, or the like. As stored by the system, the task may appear with the user's other tasks in the user's client. The PIM system may provide outbound telephone calls to the user as reminders associated with the user's tasks. The user receiving the reminder call may hear voice prompts, computer generated speech, and/or the audio recording associated with the task.08-20-2009
20080319745METHOD AND DEVICE FOR PROVIDING SPEECH-TO-TEXT ENCODING AND TELEPHONY SERVICE - A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.12-25-2008
20090281806SYSTEM AND METHOD FOR SPELLING RECOGNITION USING SPEECH AND NON-SPEECH INPUT - A system and method for non-speech input or keypad-aided word and spelling recognition is disclosed. The method includes generating an unweighted grammar, selecting a database of words, generating a weighted grammar using the unweighted grammar and a statistical letter model trained on the database of words, receiving speech from a user after receiving the non-speech input and after generating the weighted grammar, and performing automatic speech recognition on the speech and non-speech input using the weighted grammar. If a confidence is below a predetermined level, then the method includes receiving non-speech input from the user, disambiguating possible spellings by generating a letter lattice based on a user input modality, and constraining the letter lattice and generating a new letter string of possible word spellings until a letter string is correctly recognized.11-12-2009
20100153105SYSTEM AND METHOD FOR REFERRING TO ENTITIES IN A DISCOURSE DOMAIN - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for referring to entities. The method includes receiving domain-specific training data of sentences describing a target entity in a context, extracting a speaker history and a visual context from the training data, selecting attributes of the target entity based on at least one of the speaker history, the visual context, and speaker preferences, generating a text expression referring to the target entity based on at least one of the selected attributes, the speaker history, and the context, and outputting the generated text expression. The weighted finite-state automaton can represent partial orderings of word pairs in the domain-specific training data. The weighted finite-state automaton can be speaker specific or speaker independent. The weighted finite-state automaton can include a set of weighted partial orderings of the training data for each possible realization.06-17-2010
20100179811IDENTIFYING KEYWORD OCCURRENCES IN AUDIO DATA - Occurrences of one or more keywords in audio data are identified using a speech recognizer employing a language model to derive a transcript of the keywords. The transcript is converted into a phoneme sequence. The phonemes of the phoneme sequence are mapped to the audio data to derive a time-aligned phoneme sequence that is searched for occurrences of keyword phoneme sequences corresponding to the phonemes of the keywords. Searching includes computing a confusion matrix. The language model used by the speech recognizer is adapted to the keywords by increasing the likelihoods of the keywords in the language model. For each potential occurrence of the keywords detected, a corresponding subset of the audio data may be played back to an operator to confirm whether the potential occurrence corresponds to an actual occurrence of the keywords.07-15-2010
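The search over the time-aligned phoneme sequence can be sketched as a scan for the keyword's phoneme string. A real system would use the confusion matrix mentioned above to score near matches; this illustration counts only exact matches, and the ARPAbet-style phoneme labels are assumptions.

```python
# Sketch: scan a time-aligned phoneme sequence for a keyword's phoneme
# string and report the start time of each hit. Exact matching only;
# the patented search also scores confusable phonemes.

def find_keyword(aligned, keyword_phones):
    """aligned: list of (phoneme, start_time_sec) tuples.
    keyword_phones: list of phonemes for the keyword.
    Returns start times where the keyword phoneme sequence occurs."""
    phones = [p for p, _ in aligned]
    k = len(keyword_phones)
    hits = []
    for i in range(len(phones) - k + 1):
        if phones[i:i + k] == keyword_phones:
            hits.append(aligned[i][1])
    return hits

seq = [("HH", 0.0), ("AH", 0.1), ("L", 0.2), ("OW", 0.3),
       ("L", 0.5), ("OW", 0.6)]
print(find_keyword(seq, ["L", "OW"]))
```

Each returned start time identifies the audio subset that could be played back to an operator for confirmation, as the abstract describes.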
20090306980MOBILE TERMINAL AND TEXT CORRECTING METHOD IN THE SAME - A mobile terminal including a voice receiving unit configured to receive input voice, a controller configured to convert the received input voice to text, a display configured to display the converted text, and an input unit configured to select a word included in the displayed converted text. Further, the controller is configured to control the display to display a plurality of possible candidate words corresponding to the selected word in an arrangement in which each candidate word is displayed at a proximity to the selected word based on how similar that candidate word is to the selected word.12-10-2009
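Arranging candidates by similarity, as described above, can be illustrated by ranking candidate words by Levenshtein edit distance from the selected word, so the closest candidate can be drawn nearest on screen. The distance metric is an assumption; the abstract does not define its similarity measure.

```python
# Sketch: rank correction candidates by edit distance from the
# selected word (Wagner-Fischer dynamic programming, two-row form).

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def rank_candidates(selected, candidates):
    """Closest candidates first; a UI could place them nearest."""
    return sorted(candidates, key=lambda c: edit_distance(selected, c))

print(rank_candidates("recive", ["receive", "recipe", "recline"]))
```

Candidates tied on distance keep their original order, since Python's sort is stable.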
20120197640System and Method for Unsupervised and Active Learning for Automatic Speech Recognition - A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.08-02-2012
20100185443System and Method for Processing Speech - Systems and methods for processing speech are provided. A system may include a speech recognition interface and a processor. The processor may convert speech received from a call at the speech recognition interface to at least one word string. The processor may parse each word string of the at least one word string into first objects and first actions. The processor may access a synonym table to determine second objects and second actions based on the first objects and the first actions. The processor may also select a preferred object and a preferred action from the second objects and the second actions.07-22-2010
20110112837METHOD AND DEVICE FOR CONVERTING SPEECH - Electronic device and method for a speech-to-text conversion procedure, wherein the overall conversion result may include smaller portions with multiple conversion options that are audibly, and optionally visually or tactilely, reproduced for user confirmation, thereby resulting in enhanced conversion accuracy with minimal additional effort by the user.05-12-2011
20110112836METHOD AND DEVICE FOR CONVERTING SPEECH - Electronic device and method for obtaining a digital speech signal and a control command relating to the digital speech signal while obtaining the digital speech signal, and for temporally associating the control command with a substantially corresponding time instant in the digital speech signal to which the control command was directed, wherein the control command determines one or more punctuation marks or other, optionally symbolic, elements to be at least logically positioned at a text location corresponding to the command instant relative to the digital speech signal, so as to refine the speech-to-text conversion procedure.05-12-2011
20110112835COMMENT RECORDING APPARATUS, METHOD, PROGRAM, AND STORAGE MEDIUM - A comment recording apparatus, including a voice input device and a voice output device for recording and playing back comment voice, includes a voice obtaining unit, a voice recognition unit, a morphological analysis unit, and a display generation unit. The voice obtaining unit obtains comment voice as voice data, and registers the obtained voice data to a voice database for each topic specified by a topic specification device and each comment-delivering participant identified from the voice data. The voice recognition unit conducts a voice recognition process on the voice data to obtain text information. The morphological analysis unit conducts a morphological analysis on the text information, and registers a keyword extracted from the words obtained by the analysis to a keyword database, together with the topic, the comment-delivering participant, and the corresponding voice. The display generation unit displays the keywords in a matrix, relating each keyword to a topic and a comment-delivering participant.05-12-2011
20100217591VOWEL RECOGNITION SYSTEM AND METHOD IN SPEECH TO TEXT APPLICATIONS - The present invention provides systems, software and methods for accurate vowel detection in speech to text conversion, the method including the steps of applying a voice recognition algorithm to a first user speech input so as to detect known words and residual undetected words; and detecting at least one undetected vowel from the residual undetected words by applying a user-fitted vowel recognition algorithm to vowels from the known words, so as to accurately detect the vowels in the undetected words in the speech input and enhance conversion of voice to text.08-26-2010
20090076816ASSISTIVE LISTENING SYSTEM WITH DISPLAY AND SELECTIVE VISUAL INDICATORS FOR SOUND SOURCES - A portable assistive listening system for enhancing sound for hearing impaired individuals includes a functional hearing aid and a separate handheld digital signal processing (DSP) device. The invention focuses on a handheld DSP device that provides a visual cue to the user representing the source of an intermittent incoming sound. It is known that it is easier to distinguish and recognize sounds when the user has knowledge of the sound source. The system provides for various wired and/or wireless audio inputs from, for example, a television, a wireless microphone on a person, a doorbell, a telephone, a smoke alarm, etc. The wireless audio sources are linked to the DSP and can be identified as a particular type of source. For example, the telephone input is associated with a graphical image of a telephone, and the smoke alarm is associated with a graphical image of a smoke alarm. The DSP is configured and arranged to monitor the audio sources and will visually display the graphical image of the input source when sound input is detected from the input. Accordingly, when the telephone rings, the DSP device will display the image of the phone as a visual cue to the user that the phone is ringing. Additionally, the DSP will turn on backlight of the display as an added visual cue that there is an incoming audio signal.03-19-2009
20100228547METHOD AND APPARATUS FOR ANALYZING DISCUSSION REGARDING MEDIA PROGRAMS - A system that incorporates teachings of the present disclosure may include, for example, a device, such as a set-top box, including a controller to detect a plurality of users engaging in a voice conference to discuss a presentation of a media program, convert speech dialog detected in the voice conference to textual dialog, detect from the textual dialog a behavioral profile of at least one of the plurality of users, and identify at least one of advertisement content and marketable media content based on the behavioral profile of the at least one user. Other embodiments are disclosed.09-09-2010
20100228546SYSTEM AND METHODS FOR PROVIDING VOICE TRANSCRIPTION - A system and methods are provided for SIP-based voice transcription services. A computer implemented method includes: transcribing a Session Initiation Protocol (SIP) based conversation between one or more users from voice to text transcription; identifying each of the one or more users that are speaking using a device SIP_ID of the one or more users; marking the identity of the one or more users that are speaking in the text transcription; and providing the text transcription of the speaking user to non-speaking users.09-09-2010
20100241429Systems And Methods For Punctuating Voicemail Transcriptions - A system, method and software product punctuates voicemail transcription text. A transcription text of the voicemail message is generated and the pauses between words of the transcribed text are determined. Ellipses are inserted into the transcription text at the position of “er” and “ahh” type words and pauses between words of the transcribed text.09-23-2010
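The punctuation rule in the entry above, inserting ellipses at filler words and at long pauses between words, can be sketched directly. The word/timestamp format and the 0.7-second pause threshold are assumptions for illustration.

```python
# Minimal sketch: replace filler words ("er", "ahh") and long inter-word
# pauses with ellipses in a voicemail transcription. Values are illustrative.

FILLERS = {"er", "ahh", "um", "uh"}

def punctuate(words, pause_threshold=0.7):
    """words: list of (word, start_time, end_time) from the transcriber."""
    out = []
    prev_end = None
    for word, start, end in words:
        if prev_end is not None and start - prev_end > pause_threshold:
            out.append("...")  # long silence -> ellipsis
        if word.lower() in FILLERS:
            out.append("...")  # hesitation word -> ellipsis
        else:
            out.append(word)
        prev_end = end
    return " ".join(out)

words = [("call", 0.0, 0.3), ("me", 0.35, 0.5), ("er", 0.6, 1.85),
         ("back", 1.9, 2.2)]
print(punctuate(words))  # call me ... back
```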
20100161327SYSTEM-EFFECTED METHODS FOR ANALYZING, PREDICTING, AND/OR MODIFYING ACOUSTIC UNITS OF HUMAN UTTERANCES FOR USE IN SPEECH SYNTHESIS AND RECOGNITION - A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.06-24-2010
20100223056VARIOUS APPARATUS AND METHODS FOR A SPEECH RECOGNITION SYSTEM - A method, apparatus, and system are described for a continuous speech recognition engine that includes a fine speech recognizer model, a coarse sound representation generator, and a coarse match generator. The fine speech recognizer model receives a time coded sequence of sound feature frames, applies a speech recognition process to the sound feature frames and determines at least a best guess at each recognizable word that corresponds to the sound feature frames. The coarse sound representation generator generates a coarse sound representation of the recognized word. The coarse match generator determines a likelihood of the coarse sound representation actually being the recognized word based on comparing the coarse sound representation of the recognized word to a database containing the known sound of that recognized word and assigns the likelihood as a robust confidence level parameter to that recognized word.09-02-2010
20100036661Methods and Systems for Providing Grammar Services - A computing system, comprising: an I/O platform for interfacing with a user; and a processing entity configured to implement a dialog with the user via the I/O platform. The processing entity is further configured for: identifying a grammar template and an instantiation context associated with a current point in the dialog; causing creation of an instantiated grammar model from the grammar template and the instantiation context; storing the instantiated grammar model in a memory; and interpreting user input received via the I/O platform in accordance with the instantiated grammar model. Also, a grammar authoring environment supporting a variety of grammar development tools is disclosed.02-11-2010
20120245936Device to Capture and Temporally Synchronize Aspects of a Conversation and Method and System Thereof - A system, device, and method for capturing and temporally synchronizing different aspects of a conversation is presented. The method includes receiving an audible statement, receiving a note temporally corresponding to an utterance in the audible statement, creating a first temporal marker comprising temporal information related to the note, transcribing the utterance into a transcribed text, creating a second temporal marker comprising temporal information related to the transcribed text, and temporally synchronizing the audible statement, the note, and the transcribed text. Temporally synchronizing comprises associating a time point in the audible statement with the note using the first temporal marker, associating the time point in the audible statement with the transcribed text using the second temporal marker, and associating the note with the transcribed text using the first temporal marker and second temporal marker.09-27-2012
20090276215METHODS AND SYSTEMS FOR CORRECTING TRANSCRIBED AUDIO FILES - Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data.11-05-2009
20090319266MULTIMODAL INPUT USING SCRATCHPAD GRAPHICAL USER INTERFACE TO EDIT SPEECH TEXT INPUT WITH KEYBOARD INPUT - A system and method for multimodal input into an application program. The method may include performing speech recognition on speech audio input to thereby produce recognized speech text input for insertion into a document of an application program, the document having keyboard focus. The method may also include identifying the document as being text service framework unaware. The method may further include displaying the recognized speech text input in a scratchpad graphical user interface for editing the recognized speech text input. The method may further include reflecting keyboard input bound for the document to the scratchpad graphical user interface, while preserving the keyboard focus of the document. The method may also include displaying the reflected keyboard input on the scratchpad graphical user interface, to thereby effect edits in the recognized speech text input.12-24-2009
20090144057Method, Apparatus, and Program for Certifying a Voice Profile When Transmitting Text Messages for Synthesized Speech - A mechanism is provided for authenticating and using a personal voice profile. The voice profile may be issued by a trusted third party, such as a certification authority. The personal voice profile may include information for generating a digest or digital signature for text messages. A speech synthesis system may speak the text message using the voice characteristics, such as prosodic characteristics, only if the voice profile is authenticated and the text message is valid and free of tampering.06-04-2009
20080294433Automatic Text-Speech Mapping Tool - A text-speech mapping method. Silence segments for incoming speech data are obtained. Incoming transcript data is preprocessed. The incoming transcript data comprises a written document of the speech data. Possible candidate sentence endpoints based on the silence segments are found. A best match sentence endpoint is selected based on a forced alignment score. The next sentence is set to begin immediately after the current sentence endpoint, and the process of finding candidate sentence endpoints, selecting the best match sentence endpoint, and setting the next sentence is repeated until all sentences for the incoming speech data are mapped. The process is repeated for each mapped sentence to provide word level mapping.11-27-2008
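The mapping loop in the entry above, scoring candidate endpoints drawn from silence segments and advancing sentence by sentence, can be sketched as follows. The forced-alignment scorer here is a stand-in toy function; a real implementation would score with an acoustic model.

```python
# Hedged sketch of the text-speech mapping loop: for each sentence, score
# candidate endpoints (silence segment times) and pick the best, then start
# the next sentence immediately after it. The scorer is an assumption.

def map_sentences(sentences, silences, score_fn):
    """silences: candidate endpoint times in seconds; score_fn(sentence,
    start, end) returns a forced-alignment score (higher is better)."""
    mapping = []
    start = 0.0
    for sentence in sentences:
        candidates = [t for t in silences if t > start]
        best = max(candidates, key=lambda t: score_fn(sentence, start, t))
        mapping.append((sentence, start, best))
        start = best  # next sentence begins right after this endpoint
    return mapping

# Toy scorer: prefers roughly 0.4 s of audio per word (illustrative only).
def toy_score(sentence, start, end):
    expected = 0.4 * len(sentence.split())
    return -abs((end - start) - expected)

silences = [1.0, 2.1, 3.5, 5.2]
print(map_sentences(["hello there", "how are you today"], silences, toy_score))
```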
20090319267METHOD, A SYSTEM AND A DEVICE FOR CONVERTING SPEECH - An arrangement for converting speech into text comprises a mobile device (12-24-2009
20120143606METHOD AND SYSTEM FOR TESTING CLOSED CAPTION CONTENT OF VIDEO ASSETS - A method and system for monitoring video assets provided by a multimedia content distribution network includes testing closed captions provided in output video signals. A video and audio portion of a video signal are acquired during a time period that a closed caption occurs. A first text string is extracted from a text portion of a video image, while a second text string is extracted from speech content in the audio portion. A degree of matching between the strings is evaluated based on a threshold to determine when a caption error occurs. Various operations may be performed when the caption error occurs, including logging caption error data and sending notifications of the caption error.06-07-2012
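The caption check described in the entry above, comparing an on-screen text string with a speech-derived string against a threshold, can be sketched with a string-similarity ratio. The 0.8 threshold is an assumption for illustration.

```python
# Sketch: flag a caption error when the OCR'd caption text and the
# speech-recognized text disagree beyond a similarity threshold.
import difflib

def caption_error(ocr_text, speech_text, threshold=0.8):
    """True when the degree of matching falls below the threshold."""
    ratio = difflib.SequenceMatcher(None, ocr_text.lower(),
                                    speech_text.lower()).ratio()
    return ratio < threshold

print(caption_error("we will be right back", "we will be right back"))
print(caption_error("we will be right back", "totally different words"))
```

When an error is flagged, the system described above would log the caption error data and send notifications rather than simply printing.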
20110112834Communication method and terminal - A communication method and terminal assist hearing and speech impaired persons. The communication method includes generating a text by combining at least one character input in a text call mode. The text is converted to speech, and the converted speech is transmitted to a counterparty terminal. The communication terminal of the present invention is capable of converting the text input by the user to a speech signal for the counterparty terminal and converting the speech signal received from the counterparty terminal to a text, such that the user can communicate with the counterparty regardless of whether the counterparty is hearing or speech impaired or whether the counterparty's terminal supports the text call service, resulting in reduced text call service implementation complexity and improved user convenience.05-12-2011
20110112832AUTO-TRANSCRIPTION BY CROSS-REFERENCING SYNCHRONIZED MEDIA RESOURCES - A media archive comprising a plurality of media resources associated with events that occurred during a time interval are processed to synchronize the media resources. Sequences of patterns are identified in each media resource of the media archive. Elements of the sequences associated with different media resources are correlated such that a set of correlated elements is associated with the same event that occurred in the given time interval. The synchronization information of the processed media resources is represented in a flexible and extensible data format. The synchronization information is used for correction of errors occurring in the media resources of a media archive, for enhancing processes identifying information in media resources, for example by transcription of audio resources or by optical character recognition of images.05-12-2011
20080228479Data transcription and management system and method - What is disclosed is a data gathering, storage and management system, which includes a database in which data files are stored. The database includes a series of selected keywords each associated with one or more content files, the content files comprising advertisements, information, on-screen control buttons for performing a series of functions, and links for access to websites and other sources of information. The system accepts audio data files and identifies keywords that may be heard in the audio file. In one embodiment, the audio data file is transcribed and keywords are searched for in the transcribed text. The identified keywords from the audio data file are compared with the selected keywords, and at least one content file is selected for display for each retrieved keyword in the list which matches a selected keyword. The content file is displayed to the user so that the displayed content is relevant to the produced audio, which may be a recording of a conversation, a speech, or issued voice commands. What is disclosed is a method and system for providing relevant content to a user based upon speech.09-18-2008
20080228480SPEECH RECOGNITION METHOD, SPEECH RECOGNITION SYSTEM, AND SERVER THEREOF - A speech recognition method comprises a model selection step, which selects a recognition model based on characteristic information of the input speech, and a speech recognition step, which translates the input speech into text data based on the selected recognition model.09-18-2008
20130132080SYSTEM AND METHOD FOR CROWD-SOURCED DATA LABELING - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until at least one of an accuracy threshold is reached and m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses.05-23-2013
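The stopping rule in the crowd-sourcing entry above, collecting responses until an accuracy threshold is reached or m responses arrive, can be sketched with majority agreement as the accuracy proxy. Using agreement as the threshold characteristic is our assumption; the abstract leaves the exact measure open.

```python
# Illustrative sketch: receive crowd-worker labels incrementally and stop
# when agreement reaches a threshold or m responses have been collected.
from collections import Counter

def collect_labels(responses, m=5, accuracy_threshold=0.8):
    """responses: iterable of labels arriving one at a time."""
    received = []
    for label in responses:
        received.append(label)
        top, count = Counter(received).most_common(1)[0]
        # Agreement-based stop (needs at least two responses to be meaningful).
        if len(received) >= 2 and count / len(received) >= accuracy_threshold:
            return top, received
        if len(received) >= m:  # hard cap of m responses
            return top, received
    top, _ = Counter(received).most_common(1)[0]
    return top, received

print(collect_labels(["cat", "cat", "dog", "cat", "cat"]))
```

Here two agreeing workers already satisfy the threshold, so the remaining queued responses are never requested, which is the cost saving the approach is after.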
20130132081CONTENTS PROVIDING SCHEME USING SPEECH INFORMATION - An apparatus for providing contents based on speech information is provided. The apparatus includes a speech information reception unit configured to receive speech information from a first device, a device identification unit configured to receive device information of the first device from the first device and identify the first device based on the received device information, a speech information translation unit configured to translate the speech information into text information according to the received device information, and a contents provision unit configured to search for contents based on the translated text information, and provide the searched contents to a second device.05-23-2013
20090070109Speech-to-Text Transcription for Personal Communication Devices - A speech-to-text transcription system for a personal communication device (PCD) is housed in a communications server that is communicatively coupled to one or more PCDs. A user of the PCD dictates an e-mail, for example, into the PCD. The PCD converts the user's voice into a speech signal that is transmitted to the speech-to-text transcription system located in the server. The speech-to-text transcription system transcribes the speech signal into a text message. The text message is then transmitted by the server to the PCD. Upon receiving the text message, the user carries out corrections on erroneously transcribed words before using the text message in various applications.03-12-2009
20110112833REAL-TIME TRANSCRIPTION OF CONFERENCE CALLS - Described herein are embodiments of systems, methods and computer program products for real-time transcription of conference calls that employ voice activity detection, audio snippet capture, and multiple transcription instances to deliver practical real-time or near real-time conference call transcription.05-12-2011
20100332226MOBILE TERMINAL AND CONTROLLING METHOD THEREOF - A mobile terminal and controlling method thereof are disclosed, by which a specific content and another content associated with the specific content can be quickly searched using a user's voice. The present invention includes inputting a voice for a search for a specific content provided to the mobile terminal via a microphone, analyzing a meaning of the inputted voice, searching a memory for at least one content to which a voice name having a meaning associated with the analyzed voice is tagged, and displaying the searched at least one content.12-30-2010
20100332225TRANSCRIPT ALIGNMENT - Some general aspects relate to systems and methods for media processing. One aspect, for example, relates to a method for aligning multimedia recording with a transcript. A group of search terms are formed from the transcript, with each search term being associated with a location within the transcript. Putative locations of the search terms are determined in a time interval of the multimedia recording. For each search term, zero or more putative locations are determined and, for at least some of the search terms, multiple putative locations are determined in the time interval of the multimedia recording. According to a first sequencing constraint, a first representation of a group of sequences each of a subset of the putative locations of the search terms is formed. A second representation of a group of sequences each of a subset of the search terms is formed. Using the first and the second representations, the time interval of the multimedia recording is partially aligned with the transcript.12-30-2010
20110010174MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION - The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.01-13-2011
20110246196INTEGRATED VOICE BIOMETRICS CLOUD SECURITY GATEWAY - A method and system for triple-factor authentication in a single step are disclosed. According to one embodiment, an Integrated Voice Biometrics Cloud Security Gateway (IVCS Gateway) intercepts an access request to a resource server from a user using a user device. IVCS Gateway then authenticates the user by placing a call to the user device and sending a challenge message prompting the user to respond by voice. After receiving the voice sample of the user, the voice sample is compared against a stored voice biometrics record for the user. The voice sample is also converted into a text phrase and compared against a stored secret text phrase. In an alternative embodiment, an IVCS Gateway that is capable of making non-binary access decisions and associating multiple levels of access with a single user or group is described.10-06-2011
20110010173System for Analyzing Interactions and Reporting Analytic Results to Human-Operated and System Interfaces in Real Time - A computerized system for advising one communicant in electronic communication between two or more communicants has apparatus monitoring and recording interaction between the communicants, software executing from a machine-readable medium and providing analytics, the software functions including rendering speech into text, and analyzing the rendered text for topics, performing communicant verification, and detecting changes in communicant emotion. Advice is offered to the one communicant during the interaction, based on results of the analytics.01-13-2011
20110035218Live Media Captioning Subscription Framework for Mobile Devices - A subscription-based system provides transcribed audio information to one or more mobile devices. Some techniques feature a system for providing subscription services for currently-generated (e.g., not stored) information (e.g., caption information, transcribed audio) for one or more mobile devices for a live/current audio event. There can be a communication network for communicating to the one or more mobile devices, a transcriber configured for transcribing the event to generate information (e.g., caption information, transcribed audio). Caption data includes transcribed data and control code data. The system includes a subscription gateway configured for live/current transfer of the transcribed data to the one or more mobile devices. The subscription gateway is configured to provide access for the transcribed data to the one or more mobile devices. User preferences for subscribers can be set and/or updated by mobile device users and/or GPS-capable mobile devices to receive feeds for the live/current audio event.02-10-2011
20100250248COMBINED SPEECH AND TOUCH INPUT FOR OBSERVATION SYMBOL MAPPINGS - The invention relates to systems and or methodologies for enabling combined speech and touch inputs for observation symbol mappings. More particularly, the current innovation leverages the commonality of touch screen display text entry and speech recognition based text entry to increase the speed and accuracy of text entry via mobile devices. Touch screen devices often contain small and closely grouped keypads that can make it difficult for a user to press the intended character, by combining touch screen based text entry with speech recognition based text entry the aforementioned limitation can be overcome efficiently and conveniently.09-30-2010
20110054895UTILIZING USER TRANSMITTED TEXT TO IMPROVE LANGUAGE MODEL IN MOBILE DICTATION APPLICATION - In embodiments of the present invention improved capabilities are described for utilizing user transmitted text to improve language modeling in converting voice to text on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility; transmitting the text results from the speech recognition facility to the mobile communications facility; entering the text results into a text field on the mobile communications facility; monitoring for a user selected transmission of the entered text results through a communications application on the mobile communications facility; and receiving the user selected transmitted text at the speech recognition facility and using it to improve the performance of the speech recognition facility.03-03-2011
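The feedback loop in the entry above, feeding user-confirmed transmitted text back to the language model, can be sketched with a simple bigram count model. The model form (raw bigram counts) is our assumption; production recognizers use smoothed n-gram or neural models.

```python
# Sketch: user-transmitted (confirmed) text raises bigram counts in a toy
# language model, so frequently confirmed phrasings score higher next time.
from collections import Counter

class BigramLM:
    def __init__(self):
        self.counts = Counter()

    def update(self, sentence):
        """Fold one confirmed, transmitted sentence into the model."""
        words = sentence.lower().split()
        self.counts.update(zip(words, words[1:]))

    def score(self, w1, w2):
        """Raw count for the bigram (w1, w2)."""
        return self.counts[(w1, w2)]

lm = BigramLM()
lm.update("see you at noon")
lm.update("see you soon")
print(lm.score("see", "you"))  # 2
```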
20110246197METHOD, APPARATUS, AND PROGRAM FOR CERTIFYING A VOICE PROFILE WHEN TRANSMITTING TEXT MESSAGES FOR SYNTHESIZED SPEECH - A mechanism is provided for authenticating and using a personal voice profile. The voice profile may be issued by a trusted third party, such as a certification authority. The personal voice profile may include information for generating a digest or digital signature for text messages. A speech synthesis system may speak the text message using the voice characteristics, such as prosodic characteristics, only if the voice profile is authenticated and the text message is valid and free of tampering.10-06-2011
20110246195HIERARCHICAL QUICK NOTE TO ALLOW DICTATED CODE PHRASES TO BE TRANSCRIBED TO STANDARD CLAUSES - A dictation system that allows using trainable code phrases is provided. The dictation system operates by receiving audio and recognizing the audio as text. The text/audio may contain code phrases that are identified by a comparator that matches the text/audio and replaces the code phrase with a standard clause that is associated with the code phrase. The database or memory containing the code phrases is loaded with matched standard clauses that may be identified to provide a hierarchal system such that certain code phrases may have multiple meanings depending on the user.10-06-2011
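The hierarchical code-phrase idea in the entry above, where the same dictated phrase can expand to different standard clauses depending on the user or profile, can be sketched as a per-profile lookup table. The profiles, phrases, and clauses below are invented examples, not the patent's data.

```python
# Minimal sketch: recognized text is scanned for trained code phrases and
# each match is replaced by the standard clause registered for the active
# profile, so one phrase can have multiple meanings. All entries invented.

CLAUSES = {
    "radiology": {"normal chest": "The lungs are clear. Heart size is normal."},
    "legal":     {"normal chest": "No abnormality was noted on examination."},
}

def expand(text, profile):
    """Replace every code phrase in text with the profile's standard clause."""
    for phrase, clause in CLAUSES.get(profile, {}).items():
        text = text.replace(phrase, clause)
    return text

print(expand("Findings: normal chest", "radiology"))
```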
20110246194Indicia to indicate a dictation application is capable of receiving audio - A client station having access to an application is provided. The application has at least one indicia having a first configuration and a second configuration different from the first configuration. The second configuration indicating the application is able to accept input.10-06-2011
20110087491Method and system for efficient management of speech transcribers - A method and system for improving the efficiency of speech transcription by automating the management of a varying pool of human and machine transcribers having diverse qualifications, skills, and reliability for a fluctuating load of speech transcription tasks of diverse requirements such as accuracy, promptness, privacy, and security, from sources of diverse characteristics such as language, dialect, accent, speech style, voice type, vocabulary, audio quality, and duration.04-14-2011
20090048833Automated Extraction of Semantic Content and Generation of a Structured Document from Speech - Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document.02-19-2009
20090048834Audio Signal De-Identification - Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio.02-19-2009
20090048829Differential Dynamic Content Delivery With Text Display In Dependence Upon Sound Level - Differential dynamic content delivery including providing a session document for a presentation, where the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming speech to the user from one or more users participating in the presentation; converting the speech to text; detecting a total sound level for the user; and determining whether to display the text in dependence upon the total sound level for the user.02-19-2009
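The display decision in the entry above reduces to a threshold test on the user's total sound level: show the converted text only when the environment is too loud to rely on the streamed speech. The 70 dB threshold is an illustrative assumption.

```python
# Sketch: decide whether to display converted text based on the total
# sound level detected for the user. Threshold value is an assumption.
def should_display_text(total_sound_level_db, threshold_db=70.0):
    return total_sound_level_db >= threshold_db

print(should_display_text(82.0))  # noisy environment -> display the text
print(should_display_text(45.0))  # quiet environment -> speech suffices
```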
20090048830Conceptual analysis driven data-mining and dictation system and method - A new approach to speech recognition reacts to concepts conveyed through speech, shifting the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach that determines and addresses the conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds a list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation, which can be used to produce an adequate response. Applications include improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command/response scenarios.02-19-2009
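The phoneme-stream analysis stage described above could be sketched like this: scan the recognized phoneme sequence for spans that match dictionary pronunciations, producing position-anchored candidate words for the later permutation stage. The toy pronunciation dictionary and all names are assumptions for illustration only:

```python
# Sketch of phoneme stream analysis: every span of the recognized
# phoneme sequence that matches a dictionary pronunciation yields a
# candidate word anchored at that position.

PRONUNCIATIONS = {                       # toy dictionary (assumption)
    ("R", "EH", "D"): ["red", "read"],
    ("AY", "S"): ["ice"],
    ("R", "EH", "D", "AY", "S"): ["redeyes"],  # hypothetical entry
}

def candidate_words(phonemes):
    candidates = []
    for i in range(len(phonemes)):
        for j in range(i + 1, len(phonemes) + 1):
            span = tuple(phonemes[i:j])
            for word in PRONUNCIATIONS.get(span, []):
                candidates.append((i, j, word))  # (start, end, word)
    return candidates

cands = candidate_words(["R", "EH", "D", "AY", "S"])
```

Overlapping candidates such as these are exactly what the subsequent permutation analysis would prune into syntactically plausible word sequences.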
20090313013SIGN LANGUAGE CAPABLE MOBILE PHONE - A mobile phone includes a display, a data capturing module, and a sign language translating system. The data capturing module is configured to capture images of a user communicating with sign language. The sign language translating system is connected to the data capturing module. The sign language translating system includes a data input module and a data output module. The data input module is configured to receive and transform text data or speech data into a corresponding sign language image, and display the sign language image on the display. The data output module is configured to receive and transform a sign language image captured by the data capturing module into corresponding text data or speech data.12-17-2009
20120245934SPEECH RECOGNITION DEPENDENT ON TEXT MESSAGE CONTENT - A method of automatic speech recognition. An utterance is received from a user in reply to a text message, via a microphone that converts the reply utterance into a speech signal. The speech signal is processed using at least one processor to extract acoustic data from the speech signal. An acoustic model is identified from a plurality of acoustic models, using a conversational context associated with the text message, to decode the acoustic data. The acoustic data is decoded using the identified acoustic model to produce a plurality of hypotheses for the reply utterance.09-27-2012
20100057455Method and System for 3D Lip-Synch Generation with Data-Faithful Machine Learning - A method for generating three-dimensional speech animation is provided using data-driven and machine learning approaches. It utilizes the most relevant part of the captured utterances for the synthesis of input phoneme sequences. If highly relevant data are missing or lacking, then it utilizes less relevant (but more abundant) data and relies more heavily on machine learning for the lip-synch generation.03-04-2010
20120035923IN-VEHICLE TEXT MESSAGING EXPERIENCE ENGINE - The disclosed invention provides a system and apparatus for providing a telematics system user with an improved texting experience. A messaging experience engine database enables voice avatar/personality selection, acronym conversion, shorthand conversion, and custom audio and video mapping. As an interpreter of the messaging content that is passed through the telematics system, the system eliminates the need for a user to manually manipulate a texting device, or to read such a device. The system recognizes functional content and executes actions based on the identified functional content.02-09-2012
20100057458IMAGE PROCESSING APPARATUS, IMAGE PROCESSING PROGRAM AND IMAGE PROCESSING METHOD - For audio data related to document data, an image processing apparatus pertaining to the present invention generates text data in advance by using a speech recognition technology, and determines corresponding delimiter positions in the text data and the audio data. In a keyword search, if a keyword is detected in the text data, the image processing apparatus plays the audio data from the delimiter that is immediately before the keyword.03-04-2010
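The "play from the delimiter immediately before the keyword" lookup described above reduces to a binary search over the stored delimiter positions. A minimal sketch, assuming delimiters are stored as sorted (text offset, audio time) pairs; all names are illustrative:

```python
import bisect

# Sketch: given delimiter positions recorded in both the text and
# audio domains, find the audio position of the delimiter that is
# immediately before a keyword hit in the text.

def playback_start(keyword_char_pos, delimiters):
    """delimiters: sorted list of (text_offset, audio_seconds)."""
    offsets = [d[0] for d in delimiters]
    i = bisect.bisect_right(offsets, keyword_char_pos) - 1
    return delimiters[max(i, 0)][1]

delims = [(0, 0.0), (120, 8.5), (260, 17.2)]
start = playback_start(200, delims)  # keyword found at character 200
```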
20100057459VOICE RECOGNITION SYSTEM FOR INTERACTIVELY GATHERING INFORMATION TO GENERATE DOCUMENTS - A voice recognition system for interactively gathering information to generate a document, form, or application. A user establishes a connection with the voice recognition system and provides verbal responses to a plurality of verbal questions generated by the voice recognition system to compile a document, form or application. The voice recognition system converts the user's verbal responses into text responses.03-04-2010
20090216532Automatic Extraction and Dissemination of Audio Impression - A method of creating a voice message is described. A dictated audio input is converted by automatic speech recognition to produce a structured text report that includes report fields with report field data extracted from the dictated audio input. A report message is created for transmission over an electronic communication system to a message recipient. The report message has message fields with message field data based on corresponding report field data. A message audio extract is automatically extracted from a portion of the dictated audio input and attached to the report message. The report message with the message audio extract attachment is then forwarded over the electronic communication system to the message recipient.08-27-2009
20090216531PROVIDING TEXT INPUT USING SPEECH DATA AND NON-SPEECH DATA - Systems, methods, and computer readable media providing a speech input interface. The interface can receive speech input and non-speech input from a user through a user interface. The speech input can be converted to text data and the text data can be combined with the non-speech input for presentation to a user.08-27-2009
20100057456VOICE RESPONSE UNIT MAPPING - A system, method and program product for mapping voice response units (VRUs). A system is provided that includes: an interrogation system for interrogating a VRU and gathering a hierarchical set of options associated with the VRU; a map building system for converting the hierarchical set of options into a VRU map suitable for display; and a user interface for displaying the VRU map to an end user.03-04-2010
20100057460VERBAL LABELS FOR ELECTRONIC MESSAGES - Verbal labels for electronic messages, as well as systems and methods for making and using such labels, are disclosed. A verbal label is a label containing audio data (such as a digital audio file of a user's voice and/or a speaker template thereof) that is associated with one or more electronic messages. Verbal labels permit a user to more efficiently manipulate e-mail and other electronic messages by voice. For example, a user can add such labels verbally to an e-mail or to a group of e-mails, thereby permitting these messages to be sorted and retrieved more easily.03-04-2010
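The label-to-message association described above is essentially an inverted index keyed by the matched speaker template. A minimal sketch, in which the audio/template matching is reduced to a string key (an assumption; real matching would compare speaker templates acoustically):

```python
# Sketch: associate a spoken label (reduced here to a matched
# template string) with message ids, then retrieve messages by label.

class VerbalLabels:
    def __init__(self):
        self._index = {}          # label template -> set of message ids

    def add(self, label_template, message_id):
        self._index.setdefault(label_template, set()).add(message_id)

    def messages(self, label_template):
        return sorted(self._index.get(label_template, set()))

labels = VerbalLabels()
labels.add("invoices", 101)
labels.add("invoices", 102)
found = labels.messages("invoices")
```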
20100063814APPARATUS, METHOD AND COMPUTER PROGRAM PRODUCT FOR RECOGNIZING SPEECH - A speech recognition apparatus includes a document input unit configured to input a document including a reference term which a user refers to; a vocabulary storage unit configured to store a vocabulary list including a group of notation information, reading information and part of speech; a hypernym hyponym relation storage unit configured to store a hypernym hyponym relation tree on a concept between terms; a hypernym acquisition unit configured to search a hypernym of the reference term from the hypernym hyponym relation tree and to acquire the notation information and the part of speech of the hypernym from the vocabulary list; a correspondence storage unit configured to store a correspondence list showing correspondence between the hypernym and the reference term; a display unit configured to display the hypernym; a speech input unit configured to input speech, including the hypernym of the reference term, which the user speaks from the display unit; a speech recognition unit configured to convert the speech into text information by using the vocabulary list; a replacing unit configured to replace the hypernym, which is included in the text information, with the reference term; and an output unit configured to output the text information replaced by the replacing unit.03-11-2010
20110099011Detecting And Communicating Biometrics Of Recorded Voice During Transcription Process - A method and system for determining and communicating biometrics of a recorded speaker in a voice transcription process. An interactive voice response system receives a request from a user for a transcription of a voice file. A profile associated with the requesting user is obtained, wherein the profile comprises biometric parameters and preferences defined by the user. The requested voice file is analyzed for biometric elements according to the parameters specified in the user's profile. Responsive to detecting biometric elements in the voice file that conform to the parameters specified in the user's profile, a transcription output of the voice file is modified according to the preferences specified in the user's profile for the detected biometric elements to form a modified transcription output file. The modified transcription output file may then be provided to the requesting user.04-28-2011
20110153323METHOD AND SYSTEM FOR CONTROLLING EXTERNAL OUTPUT OF A MOBILE DEVICE - A method and system are provided that control an external output function of a mobile device according to control interactions received via the microphone. The method includes activating a microphone according to preset optional information when the mobile device enters an external output mode, performing an external output operation in the external output mode, detecting an interaction based on sound information in the external output mode, and controlling the external output according to the interaction.06-23-2011
20110153322DIALOG MANAGEMENT SYSTEM AND METHOD FOR PROCESSING INFORMATION-SEEKING DIALOGUE - A dialog management apparatus and method for processing an information-seeking dialogue with a user and providing a service to the user by prompting the user for a task-oriented dialogue may be provided. A hierarchical topic plan in which pieces of information are organized in a hierarchy according to topics corresponding to services may be used to prompt the user to change an information-seeking dialogue to a task-oriented dialogue, and the user may be provided with a service.06-23-2011
20110071827GENERATION AND SELECTION OF SPEECH RECOGNITION GRAMMARS FOR CONDUCTING SEARCHES - Various processes are disclosed for generating and selecting speech recognition grammars for conducting searches by voice. In one such process, search queries are selected from a search query log for incorporation into speech recognition grammar. The search query log may include or consist of search queries specified by users without the use of voice. Another disclosed process enables a user to efficiently submit a search query by partially spelling the search query (e.g., on a telephone keypad or via voice utterances) and uttering the full search query. The user's partial spelling is used to select a particular speech recognition grammar for interpreting the utterance of the full search query.03-24-2011
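The two processes described above (building a grammar from logged typed queries, then narrowing it with a partial spelling) can be sketched as follows; the frequency cutoff and prefix matching are simplifying assumptions:

```python
from collections import Counter

# Sketch: build a voice-search grammar from the most frequent logged
# (typed) queries, then narrow it using a partially spelled query.

def build_grammar(query_log, top_n=1000):
    return [q for q, _ in Counter(query_log).most_common(top_n)]

def select_grammar(grammar, partial_spelling):
    """Keep only grammar entries consistent with the spelled prefix."""
    return [q for q in grammar if q.startswith(partial_spelling)]

log = ["harry potter", "weather", "harry potter", "hardware"]
grammar = build_grammar(log)
subset = select_grammar(grammar, "har")
```

The narrowed grammar is what the recognizer would then use to interpret the full spoken query, which is the efficiency gain the abstract describes.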
20110077941Enabling Spoken Tags - Techniques for assigning a spoken tag in a telecom web platform are provided. The techniques include receiving a spoken tag, comparing the spoken tag to a set of one or more template tags, if the spoken tag is a match to a template tag, assigning the spoken tag and updating frequency of the tag in the set of one or more template tags, and if the spoken tag is not a match to a template tag, assigning the spoken tag and registering the spoken tag as a new tag in the set of one or more template tags.03-31-2011
20100121637Semi-Automatic Speech Transcription - A semi-automatic speech transcription system of the invention leverages the complementary capabilities of human and machine, building a system which combines automatic and manual approaches. With the invention, collected audio data is automatically distilled into speech segments, using signal processing and pattern recognition algorithms. The detected speech segments are presented to a human transcriber using a transcription tool with a streamlined transcription interface, requiring the transcriber to simply “listen and type”. This eliminates the need to manually navigate the audio, coupling the human effort to the amount of speech, rather than the amount of audio. Errors produced by the automatic system can be quickly identified by the human transcriber and are used to improve the automatic system's performance. The automatic system is tuned to maximize the human transcriber's efficiency. The result is a system which takes considerably less time than purely manual transcription approaches to produce a complete transcription.05-13-2010
20110060587COMMAND AND CONTROL UTILIZING ANCILLARY INFORMATION IN A MOBILE VOICE-TO-SPEECH APPLICATION - In embodiments of the present invention improved capabilities are described for controlling a mobile communication facility utilizing ancillary information comprising accepting speech presented by a user using a resident capture facility on the mobile communication facility while the user engages an interface that enables a command mode for the mobile communications facility; processing the speech using a resident speech recognition facility to recognize command elements and content elements; transmitting at least a portion of the speech through a wireless communication facility to a remote speech recognition facility; transmitting information from the mobile communication facility to the remote speech recognition facility, wherein the information includes information about a command recognizable by the resident speech recognition facility and at least one of language, location, display type, model, identifier, network provider, and phone number associated with the mobile communication facility; generating speech-to-text results utilizing the remote speech recognition facility based at least in part on the speech and on the information related to the mobile communication facility; and transmitting the text results for use on the mobile communications facility.03-10-2011
20110071826METHOD AND APPARATUS FOR ORDERING RESULTS OF A QUERY - A method and apparatus for ordering results from a query are provided herein. During operation, a spoken query is received and converted to a textual representation, such as a word lattice. Search strings are then created from the word lattice. For example, a set of search strings may be created from the N-grams, such as unigrams and bigrams, of the word lattice. The search strings may be ordered and truncated based on confidence values assigned to the n-grams by the speech recognition system. The set of search strings are sent to at least one search engine, and search results are obtained. The search results are then re-arranged or reordered based on a semantic similarity between the search results and the word lattice.03-24-2011
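The "n-grams ordered by confidence, then truncated" step above can be sketched for a single recognized word sequence (a full lattice would yield additional paths). Averaging bigram confidences and the cutoff of three are assumptions for illustration:

```python
# Sketch: derive unigram and bigram search strings from recognized
# words with per-word confidences, order by confidence, truncate.

def search_strings(words, confs, limit=3):
    grams = [([w], c) for w, c in zip(words, confs)]            # unigrams
    grams += [([a, b], (ca + cb) / 2)                           # bigrams
              for (a, ca), (b, cb) in zip(zip(words, confs),
                                          zip(words[1:], confs[1:]))]
    grams.sort(key=lambda g: -g[1])
    return [" ".join(ws) for ws, _ in grams[:limit]]

qs = search_strings(["pizza", "near", "me"], [0.9, 0.6, 0.8])
```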
20120303368NUMBER-ASSISTANT VOICE INPUT SYSTEM, NUMBER-ASSISTANT VOICE INPUT METHOD FOR VOICE INPUT SYSTEM AND NUMBER-ASSISTANT VOICE CORRECTING METHOD FOR VOICE INPUT SYSTEM - The present invention discloses a number-assistant voice input system, a number-assistant voice input method for a voice input system and a number-assistant voice correcting method for a voice input system, which use software to drive a voice input system of an electronic device to provide a voice input logic circuit module. The voice input logic circuit module defines the pronunciations of the numbers 1 to 26 as the paths to input the letters A to Z, respectively, in the voice input system, and allows users to selectively input or correct a letter by reading a number from 1 to 26 instead of a letter from A to Z.11-29-2012
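The 1-to-26 mapping described above is straightforward to express; spoken digit names are far less acoustically confusable than letter names such as B/D/E. A minimal sketch (the recognizer that yields the numbers is assumed):

```python
# Sketch of the number-assistant mapping: a spoken number 1..26
# stands in for the corresponding letter A..Z.

def number_to_letter(n):
    if not 1 <= n <= 26:
        raise ValueError("expected a number from 1 to 26")
    return chr(ord("A") + n - 1)

word = "".join(number_to_letter(n) for n in [3, 1, 2])  # spells "CAB"
```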
20130191125TRANSCRIPTION SUPPORTING SYSTEM AND TRANSCRIPTION SUPPORTING METHOD - A transcription supporting system for the conversion of voice data to text data includes a first storage module, a playing module, a voice recognition module, an index generating module, a second storage module, a text forming module, and an estimation module. The first storage module stores the voice data. The playing module plays the voice data. The voice recognition module executes the voice recognition processing on the voice data. The index generating module generates a voice index that makes the plural text strings generated in the voice recognition processing correspond to voice position data. The second storage module stores the voice index. The text forming module forms text corresponding to input of a user correcting or editing the generated text strings. The estimation module estimates the voice position indicating the last position in the voice data that the user has corrected or confirmed.07-25-2013
20100305945REPRESENTING GROUP INTERACTIONS - Disclosed is a system for generating a representation of a group interaction, the system comprising: a transcription module adapted to generate a transcript of the group interaction from audio source data representing the group interaction, the transcript comprising a sequence of lines of text, each line corresponding to an audible utterance in the audio source data; and a labeling module adapted to generate a conversation path from the transcript by labeling each transcript line with an identifier identifying the speaker of the corresponding utterance in the audio source data; and generate the representation of the group interaction by associating the conversation path with a plurality of voice profiles, each voice profile corresponding to an identified speaker in the conversation path.12-02-2010
20110153325Multi-Modal Input on an Electronic Device - A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application.06-23-2011
20110251843COMPENSATION OF INTRA-SPEAKER VARIABILITY IN SPEAKER DIARIZATION - A method, system, and computer program product for compensation of intra-speaker variability in speaker diarization are provided. The method includes: dividing a speech session into segments of duration less than an average duration between speaker change; parameterizing each segment by a time dependent probability density function supervector, for example, using a Gaussian Mixture Model; computing a difference between successive segment supervectors; and computing a scatter measure such as a covariance matrix of the difference as an estimate of intra-speaker variability. The method further includes compensating the speech session for intra-speaker variability using the estimate of intra-speaker variability.10-13-2011
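The scatter estimate described above (covariance of differences between successive segment supervectors) can be sketched with tiny 2-dimensional vectors standing in for real GMM supervectors, which typically have thousands of dimensions. This is a sketch of the statistic only, not the full diarization compensation:

```python
# Sketch: estimate intra-speaker variability as the covariance of
# differences between successive segment supervectors.

def successive_differences(supervectors):
    return [[b - a for a, b in zip(u, v)]
            for u, v in zip(supervectors, supervectors[1:])]

def covariance(diffs):
    n, d = len(diffs), len(diffs[0])
    mean = [sum(col) / n for col in zip(*diffs)]
    return [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in diffs) / n
             for j in range(d)] for i in range(d)]

segs = [(1.0, 2.0), (1.5, 2.0), (2.5, 3.0)]
scatter = covariance(successive_differences(segs))
```

Because the segments are shorter than the typical gap between speaker changes, successive differences mostly reflect within-speaker variation, which is why their covariance serves as the intra-speaker estimate.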
20120150537FILTERING CONFIDENTIAL INFORMATION IN VOICE AND IMAGE DATA - Confidential information included in image and voice data is filtered in an apparatus that includes an extraction unit for extracting a character string from an image frame, and a conversion unit for converting audio data to a character string. The apparatus also includes a determination unit for determining, in response to contents of a database, whether at least one of the image frame and the audio data include confidential information. The apparatus also includes a masking unit for concealing contents of the image frame by masking the image frame in response to determining that the image frame includes confidential information, and for making the audio data inaudible by masking the audio data in response to determining that the audio data includes confidential information. The apparatus also includes a playback unit for playing back the image frame and the audio data.06-14-2012
20120203551AUTOMATED FOLLOW UP FOR E-MEETINGS - Embodiments of the present invention provide a method, system and computer program product for automated follow-up for e-meetings. In an embodiment of the invention, a method for automated follow-up for e-meetings is provided. The method includes monitoring content provided to an e-meeting managed by an e-meeting server executing in memory of a host computer. The method also includes applying a rule in a rules base to the monitored content. Finally, the method includes triggering generation of a follow up item in response to applying the rule to the monitored content.08-09-2012
20120203552CONTROLLING A SET-TOP BOX VIA REMOTE SPEECH RECOGNITION - A device may receive over a network a digitized speech signal from a remote control that accepts speech. In addition, the device may convert the digitized speech signal into text, use the text to obtain command information applicable to a set-top box, and send the command information to the set-top box to control presentation of multimedia content on a television in accordance with the command information.08-09-2012
20110153324Language Model Selection for Speech-to-Text Conversion - Methods, computer program products and systems are described for converting speech to text. Sound information is received at a computer server system from an electronic device, where the sound information is from a user of the electronic device. A context identifier indicates a context within which the user provided the sound information. The context identifier is used to select, from among multiple language models, a language model appropriate for the context. Speech in the sound information is converted to text using the selected language model. The text is provided for use by the electronic device.06-23-2011
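The context-to-model selection described above amounts to a lookup keyed by the context identifier, with a fallback when no specific model exists. The model names and context identifiers below are illustrative assumptions:

```python
# Sketch: map the context identifier sent with the audio to a
# language model; fall back to a general model when unknown.

LANGUAGE_MODELS = {
    "sms_reply": "lm-conversational",
    "search_box": "lm-web-queries",
    "navigation": "lm-street-addresses",
}

def select_language_model(context_id):
    return LANGUAGE_MODELS.get(context_id, "lm-general")

model = select_language_model("sms_reply")
```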
20120065970SYSTEM AND METHOD FOR PROVIDING GROUP DISCUSSIONS - A system and method for providing a discussion, including receiving by a processor text related to a discussion; converting by the processor the text to voice; storing by the processor in a memory the converted voice; receiving by the processor voice related to the discussion; storing by the processor in the memory the received voice; receiving by the processor a request to play voice related to at least part of the discussion; and transmitting by the processor audio containing the voice identified by the request related to the at least part of the discussion.03-15-2012
20090240497METHOD AND SYSTEM FOR MESSAGE ALERT AND DELIVERY USING AN EARPIECE - A method for an earpiece to manage a delivery of a message can include receiving a notice that a message is available at a communication device, parsing the notice for header information that identifies at least a portion of the message, and requesting a subsequent delivery of at least a portion of the message from the communication device if at least one keyword in the header information is in an acceptance list. Other embodiments are disclosed.09-24-2009
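The delivery decision described above (parse the notice header, request the message body only if a keyword is on the acceptance list) can be sketched as a simple set intersection; the acceptance list and header format are illustrative assumptions:

```python
# Sketch: decide whether the earpiece should request delivery of a
# message body, based on keywords parsed from the notice header.

ACCEPTANCE_LIST = {"urgent", "boss", "family"}   # user-configured

def should_request_delivery(header):
    keywords = {w.strip(".,!:").lower() for w in header.split()}
    return bool(keywords & ACCEPTANCE_LIST)

request = should_request_delivery("Urgent: server down")
```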
20080243501Location-Based Responses to Telephone Requests - A method for receiving processed information at a remote device is described. The method includes transmitting from the remote device a verbal request to a first information provider and receiving a digital message from the first information provider in response to the transmitted verbal request. The digital message includes a symbolic representation indicator associated with a symbolic representation of the verbal request and data used to control an application. The method also includes transmitting, using the application, the symbolic representation indicator to a second information provider for generating results to be displayed on the remote device.10-02-2008
20090164214System, method and software program for enabling communications between customer service agents and users of communication devices - The present invention provides a system, method and software application for enabling a customer service agent to efficiently communicate with users of a communication device. When a user enters speech input into his communication device, the speech is converted to text, and the text is displayed to the customer service agent on the agent's computer screen. Alternately, the user's speech input is provided to the customer service agent in the form of an audio file. The agent types a response, and the agent's response is provided to the user on the user's communication device. The agent's response may be converted to speech and played to the user, and/or the agent's response may be displayed as text on the display screen of the user's communication device.06-25-2009
20090182560USING A PHYSICAL PHENOMENON DETECTOR TO CONTROL OPERATION OF A SPEECH RECOGNITION ENGINE - A transmission device such as a cell phone or other mobile communication device includes a physical phenomenon detection device to perform a “push to talk” function by detecting the occurrence of a particular physical phenomenon and using such detection to start and stop recording an utterance for subsequent analysis by a speech recognition engine. A method of controlling operation of a speech recognition engine in response to detection of a physical phenomenon includes detecting or sensing, via a physical phenomenon detection unit, a predetermined physical phenomenon representative of an intent to invoke operation of a speech recognition engine. In response to the detection or sensing of the predetermined physical phenomenon, a signal is transmitted to a control unit in a communication device. In response to the receipt of the transmitted signal, the utterance received from a user via the communication device is recorded, and the recorded utterance is provided to a speech recognition engine for operation thereon. The user may thus effectuate operation of the speech recognition engine upon the utterance by causing the physical phenomenon to occur.07-16-2009
20110257972SYSTEM AND METHOD FOR LOCATION TRACKING USING AUDIO INPUT - An electronic device and method of location tracking adapted to enhance a user's ability in recalling or returning to a former location. The electronic device may record audio, such as the user's speech and/or speech from others. The location at which the speech is recorded is determined and stored. The speech may be converted to text, which is associated with the determined location. The converted text may be indexed for searching. A user may perform a text-based search for words that the user may recall speaking and/or hearing at the location. Returned search results may remind the user of the location and provide directions for returning to the location.10-20-2011
20100324894Voice to Text to Voice Processing - Technologies are generally described for voice to text to voice processing. An audio signal can be preprocessed and translated into text prior to being processed in the textual domain. The text domain processing or subsequent text to voice regeneration can seek to improve clarity, correct grammar, adjust vocabulary level, remove profanity, correct slang, alter dialect, alter accent, or provide other modifications of various oral communication characteristics. The processed text may be translated back into the audio domain for delivery to a listener. The processing at each stage may be driven by a set of objectives and constraints set by the speaker, the listener, a third party, or any combination of explicit or implicit participants. The voice processing may translate the voice content from a specific human language to the same human language with various improvements. The processing may also involve translation into one or more other languages.12-23-2010
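The text-domain stage described above can be sketched as listener-configurable rewrites applied between transcription and regeneration; the word lists are toy assumptions standing in for real slang and profanity dictionaries:

```python
# Sketch of the text-domain processing stage: expand slang and mask
# profanity in the transcript before regenerating audio from it.

SLANG = {"gonna": "going to", "wanna": "want to"}
PROFANITY = {"darn"}

def clean_text(text):
    words = []
    for w in text.split():
        if w.lower() in PROFANITY:
            words.append("*" * len(w))
        else:
            words.append(SLANG.get(w.lower(), w))
    return " ".join(words)

out = clean_text("I am gonna skip that darn meeting")
```

In the full pipeline this output would be fed to a text-to-speech stage, optionally after translation into another language as the abstract notes.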
20080255838AUDIBLE PRESENTATION AND VERBAL INTERACTION OF HTML-LIKE FORM CONSTRUCTS - A method of synchronizing an audio and visual presentation in a multi-modal browser. A form having at least one field requiring user-supplied information is transmitted over a network to a multi-modal browser. Blank fields within the form are filled in by the user, who provides verbal interaction, tactile interaction, or a combination of the two. The browser moves to the next field requiring user-provided input. Finally, the form exits after the user has supplied input for all required fields. The method also provides a synchronized verbal and visual presentation by the browser, audibly presenting the headings of the fields to be filled out and typing in what the user says.10-16-2008
20110054900HYBRID COMMAND AND CONTROL BETWEEN RESIDENT AND REMOTE SPEECH RECOGNITION FACILITIES IN A MOBILE VOICE-TO-SPEECH APPLICATION - In embodiments of the present invention improved capabilities are described for hybrid command and control between resident and remote speech recognition facilities in controlling a mobile communication facility comprising accepting speech presented by a user using a resident capture facility on the mobile communication facility while the user engages an interface that enables a command mode for the mobile communications facility; processing the speech using a resident speech recognition facility to recognize command elements and content elements; transmitting at least a portion of the speech through a wireless communication facility to a remote speech recognition facility; transmitting information from the mobile communication facility to the remote speech recognition facility, wherein the information includes information about a command recognizable by the resident speech recognition facility; generating speech-to-text results utilizing a hybrid of the resident speech recognition facility and the remote speech recognition facility based at least in part on the speech and on the information related to the mobile communication facility; and transmitting the text results for use on the mobile communications facility.03-03-2011
20110054899COMMAND AND CONTROL UTILIZING CONTENT INFORMATION IN A MOBILE VOICE-TO-SPEECH APPLICATION - In embodiments of the present invention improved capabilities are described for controlling a mobile communication facility utilizing content information comprising accepting speech presented by a user using a resident capture facility on the mobile communication facility while the user engages an interface that enables a command mode for the mobile communications facility; processing the speech using a resident speech recognition facility to recognize command elements and content elements; transmitting at least a portion of the speech through a wireless communication facility to a remote speech recognition facility; transmitting information from the mobile communication facility to the remote speech recognition facility, wherein the information includes information about a command recognized by the resident speech recognition facility and information about the content recognized by the resident speech recognition facility; generating speech-to-text results utilizing the remote speech recognition facility based at least in part on the speech and on the information related to the mobile communication facility; and transmitting the text results for use on the mobile communications facility.03-03-2011
20110054898MULTIPLE WEB-BASED CONTENT SEARCH USER INTERFACE IN MOBILE SEARCH APPLICATION - In embodiments of the present invention improved capabilities are described for a multiple web-based content search user interface in searching for web content on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results utilizing the speech recognition facility; and transmitting text from the speech-to-text results along with URL usage information configured to enable a user to conduct a search on the mobile communication facility.03-03-2011
20110054893SYSTEM AND METHOD FOR GENERATING USER MODELS FROM TRANSCRIBED DIALOGS - Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for generating personalized user models. The method includes receiving automatic speech recognition (ASR) output of speech interactions with a user, receiving an ASR transcription error model characterizing how ASR transcription errors are made, generating guesses of a true transcription and a user model via an expectation maximization (EM) algorithm based on the error model and the respective ASR output where the guesses will converge to a personalized user model which maximizes the likelihood of the ASR output. The ASR output can be unlabeled. The method can include casting speech interactions as a dynamic Bayesian network with four variables: (s), (u), (r), (m), and encoding relationships between (s), (u), (r), (m) as conditional probability tables. At each dialog turn (r) and (m) are known and (s) and (u) are hidden.03-03-2011
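The abstract above describes inferring a true transcription by combining an ASR error model with a user model. As a toy illustration (not the patent's actual EM procedure), a single E-step-like computation can score candidate true words by multiplying the error model's likelihood of the observed ASR output by the user model's prior, then normalizing. All words and probabilities below are invented.

```python
# Toy sketch: P(true | observed) ∝ P(observed | true) * P(true),
# combining an ASR error model with a user model. Illustrative only.

def posterior_true_words(observed, error_model, user_model):
    """Return a normalized posterior over candidate true words."""
    scores = {
        true_word: error_model.get((true_word, observed), 0.0) * p_true
        for true_word, p_true in user_model.items()
    }
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()} if total else scores

# error_model[(true, observed)] = P(observed | true); invented values.
error_model = {("weather", "whether"): 0.3, ("weather", "weather"): 0.7,
               ("whether", "whether"): 0.8, ("whether", "weather"): 0.2}
# user_model[word] = P(word) for this user; invented values.
user_model = {"weather": 0.6, "whether": 0.4}

post = posterior_true_words("whether", error_model, user_model)
```

Iterating such guesses while re-estimating the user model from them is the EM-style refinement the abstract alludes to.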
20110054897TRANSMITTING SIGNAL QUALITY INFORMATION IN MOBILE DICTATION APPLICATION - In embodiments of the present invention improved capabilities are described for transmitting signal quality information when converting voice to text on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility; transmitting the text results from the speech recognition facility to the mobile communications facility, including text from the speech-to-text results and information on signal quality, such information including at least one of signal-to-noise ratio, clipping, and energy; and entering the text results into a text field on the mobile communications facility.03-03-2011
20110054896SENDING A COMMUNICATIONS HEADER WITH VOICE RECORDING TO SEND METADATA FOR USE IN SPEECH RECOGNITION AND FORMATTING IN MOBILE DICTATION APPLICATION - In embodiments of the present invention improved capabilities are described for sending a communications header with voice recording to send metadata for use in speech recognition and formatting when converting voice to text on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting a communications header to a speech recognition facility from the mobile communication facility via a wireless communications facility, wherein the communications header includes at least one of device name, network provider, network type, audio source, a display parameter for the wireless communications facility, geographic location, and phone number information; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility based at least in part on the communications header; transmitting the text results from the speech recognition facility to the mobile communications facility; and entering the text results into a text field on the mobile communication facility.03-03-2011
20110054894SPEECH RECOGNITION THROUGH THE COLLECTION OF CONTACT INFORMATION IN MOBILE DICTATION APPLICATION - In embodiments of the present invention improved capabilities are described for improving speech recognition through the collection of contact information on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; prompting the user to manage contacts associated with usage of the mobile communication facility; transmitting contact names to the speech recognition facility, wherein the contact names are used by the speech recognition facility to at least one of tune, enhance, and improve the speech recognition of the speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility based at least in part on at least one of a contact name; transmitting the text results from the speech recognition facility to the mobile communications facility; and entering the text results into a text field on the mobile communications facility.03-03-2011
20110022387CORRECTING TRANSCRIBED AUDIO FILES WITH AN EMAIL-CLIENT INTERFACE - Methods and systems for requesting a transcription of audio data. One method includes displaying a send-for-transcription button within an email-client interface on a computer-controlled display, and automatically sending a selected email message and associated audio data to a transcription server as a request for a transcription of the associated audio data when a user selects the send-for-transcription button.01-27-2011
20110022386SPEECH RECOGNITION TUNING TOOL - Systems and methods for tuning a dictionary of a speech recognition system include accessing a voice mail record of a user, accessing a recorded audio file of a name of the user in the voice mail record spoken by the user, providing the audio file to a speech recognition system, processing the audio file in the speech recognition system and obtaining a text result, determining whether a confidence score of the text result is below a predetermined threshold, and adding, at least, the name of the user to a list of low confidence names. Alternate spellings for the low confidence names can then be added to the dictionary.01-27-2011
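The tuning flow above — collect names whose recognition confidence falls below a threshold, then add alternate spellings for them to the dictionary — can be sketched as follows. The recognizer is stubbed out, and the threshold, names, and alternate spellings are invented for illustration.

```python
# Sketch of the dictionary-tuning flow: low-confidence names are
# collected and alternate spellings are added for them. Illustrative.

def collect_low_confidence(results, threshold=0.8):
    """results: list of (name, confidence) pairs from a recognizer."""
    return [name for name, conf in results if conf < threshold]

def extend_dictionary(dictionary, low_confidence, alternates):
    """Append alternate spellings for each low-confidence name."""
    for name in low_confidence:
        dictionary.setdefault(name, []).extend(alternates.get(name, []))
    return dictionary

# Invented recognizer output for three recorded names.
results = [("Smith", 0.95), ("Nguyen", 0.55), ("Xiao", 0.60)]
low = collect_low_confidence(results)
dictionary = extend_dictionary({}, low,
                               {"Nguyen": ["Nuyen", "Win"],
                                "Xiao": ["Shao"]})
```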
20110119058METHOD AND SYSTEM FOR THE CREATION OF A PERSONALIZED VIDEO - A method and system for creating a personalized video destined for an intended recipient, comprising gathering personal information about the intended recipient, selecting a non-personalized video, retrieving the selected non-personalized video along with associated customizable elements, setting the customizable elements according to the personal information of the intended recipient, and assembling the non-personalized video and the set customizable elements to create the personalized video.05-19-2011
20090171659METHODS AND APPARATUS FOR IMPLEMENTING DISTRIBUTED MULTI-MODAL APPLICATIONS - Embodiments include methods and apparatus for synchronizing data and focus between visual and voice views associated with distributed multi-modal applications. An embodiment includes a client device adapted to render a visual display that includes at least one multi-modal display element for which input data is receivable though a visual modality and a voice modality. When the client detects a user utterance via the voice modality, the client sends uplink audio data representing the utterance to a speech recognizer. An application server receives a speech recognition result generated by the speech recognizer, and sends a voice event response to the client. The voice event response is sent as a response to an asynchronous HTTP voice event request previously sent to the application server by the client. The client may then send another voice event request to the application server in response to receiving the voice event response.07-02-2009
20120310645INTEGRATION OF EMBEDDED AND NETWORK SPEECH RECOGNIZERS - A method, computer program product, and system are provided for performing a voice command on a client device. The method can include translating, using a first speech recognizer located on the client device, an audio stream of a voice command to a first machine-readable voice command and generating a first query result using the first machine-readable voice command to query a client database. In addition, the audio stream can be transmitted to a remote server device that translates the audio stream to a second machine-readable voice command using a second speech recognizer. Further, the method can include receiving a second query result from the remote server device, where the second query result is generated by the remote server device using the second machine-readable voice command and displaying the first query result and the second query result on the client device.12-06-2012
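The hybrid scheme above — a fast embedded recognizer answers locally while the audio also goes to a server recognizer, and both query results are shown — can be sketched as below. Both recognizers and both databases are stand-ins invented for illustration.

```python
# Sketch of the embedded + network recognizer flow: the same audio is
# recognized on-device and server-side, and each result queries its
# own database. All names and data here are illustrative stubs.

def local_recognize(audio):
    # Stand-in for a small-vocabulary embedded recognizer.
    return "call mom" if b"mom" in audio else "unknown"

def server_recognize(audio):
    # Stand-in for a larger server-side recognizer.
    return "call mom mobile"

def hybrid_query(audio, client_db, server_db):
    local_cmd = local_recognize(audio)     # fast, on-device
    remote_cmd = server_recognize(audio)   # slower, more accurate
    return client_db.get(local_cmd), server_db.get(remote_cmd)

local_result, remote_result = hybrid_query(
    b"...mom...",
    client_db={"call mom": "Contact: Mom (home)"},
    server_db={"call mom mobile": "Contact: Mom (mobile)"},
)
```

In a real client the local result would be displayed immediately and the remote result appended when it arrives, which is the latency benefit the abstract implies.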
20120310644INSERTION OF STANDARD TEXT IN TRANSCRIPTION - A computer program product, for automatically editing a medical record transcription, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain a first medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the first medical transcription for presence of a first trigger phrase associated with a first standard text block, determine that the first trigger phrase is present in the first medical transcription if an actual phrase in the first medical transcription corresponds with the first trigger phrase, and insert the first standard text block into the first medical transcription.12-06-2012
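The trigger-phrase mechanism above can be sketched simply: scan the transcription for each trigger phrase and, when one is present, insert the associated standard text block. The trigger table and medical text below are invented examples, not content from the patent.

```python
# Sketch of trigger-phrase expansion in a medical transcription.
# The trigger table is an invented example.

STANDARD_BLOCKS = {
    "normal physical exam": (
        "Heart regular rate and rhythm. Lungs clear to auscultation."
    ),
}

def expand_triggers(transcription, blocks=STANDARD_BLOCKS):
    """Append the standard block when its trigger phrase is present."""
    for trigger, block in blocks.items():
        if trigger in transcription.lower():
            transcription = transcription + " " + block
    return transcription

out = expand_triggers("Patient seen today. Normal physical exam.")
```

A production system would also handle the abstract's "actual phrase corresponds with the trigger phrase" step, i.e. fuzzy rather than exact matching.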
20100211389System of communication employing both voice and text - The disclosed invention comprises a method of communication that integrates both speech to text technology and text to speech technology. In its simplest form, one user employs a communication device having means for converting vocal signals into text; this converted text is then sent to the other user. This recipient is presented with the sender's text and to respond, he can enter text which is then output to the first user as speech sounds. This system creates an opportunity for two users to carry on a conversation, one using his voice (and hearing a synthesized voice in response) and the other using text (and receiving speech rendered as text): the first user has a voice conversation; the second user has a text based conversation. This system allows a user to select his preferred method of communication, regardless of the selection of his communication partner.08-19-2010
20100223055MOBILE WIRELESS COMMUNICATIONS DEVICE WITH SPEECH TO TEXT CONVERSION AND RELATED METHODS - A mobile wireless communications device may include a housing and a wireless transceiver carried by the housing. The mobile wireless communications device may also include an audio transducer carried by the housing, and a controller cooperating with the wireless transceiver to perform at least one wireless communications function. The controller may also cooperate with the audio transducer to convert speech input through the audio transducer to converted text, determine a proposed modification for the converted text, and output from the audio transducer the proposed modification for the converted text.09-02-2010
20090299743METHOD AND SYSTEM FOR TRANSCRIBING TELEPHONE CONVERSATION TO TEXT - Methods and systems for transcribing portions of a telephone conversation to text enables users to request transcription such as by pressing a button on a mobile device, with the request transmitted to a server including transcription software. The server transcribes some or all of the telephone conversation to text, and transmits the text to the mobile device. The text data may be scanned for selected information, and only the selected information transmitted to the mobile device. The selected information may be automatically stored in memory of the mobile device, such as in an address book.12-03-2009
20090292539SYSTEM AND METHOD FOR THE SECURE, REAL-TIME, HIGH ACCURACY CONVERSION OF GENERAL QUALITY SPEECH INTO TEXT - Described is a speech-to-text conversion system and method that provides secure, real-time and high-accuracy conversion of general-quality speech into text. The system is designed to interface with external devices and services, providing a simple and convenient manner to transcribe audio that may be stored elsewhere such as a wireless phone's voice mail, or occurring between two or more parties such as a conference call. The first step in the system's process ensures secure and private transcription by separating an audio stream into many audio shreds, each of which has a duration of only a few seconds and cannot reveal the context of the conversation. A workforce of geographically distributed transcription agents who transcribe the audio shreds is able to generate transcription in real time, with many agents working in parallel on a single conversation. No one agent (or group of agents) receives a sufficient number of audio shreds to reconstruct the context of any conversation. The use of human transcribers allows the system to overcome limitations typical of computer-based speech recognition and permits accurate transcription of general-quality speech even in acoustically hostile environments.11-26-2009
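The shredding idea above can be sketched as three steps: cut the audio into short slices, deal the slices out across agents so no one agent sees contiguous context, and reassemble the returned transcript fragments by their original index. Slice length and the round-robin assignment policy are illustrative assumptions, not the patent's specified parameters.

```python
# Sketch of secure audio shredding: slice, distribute, reassemble.
# Slice length and round-robin assignment are illustrative choices.

def shred(samples, slice_len):
    """Cut a sample sequence into fixed-length slices."""
    return [samples[i:i + slice_len]
            for i in range(0, len(samples), slice_len)]

def assign_round_robin(shreds, n_agents):
    """Deal shreds to agents; keep each shred's index for reassembly."""
    assignments = {a: [] for a in range(n_agents)}
    for i, s in enumerate(shreds):
        assignments[i % n_agents].append((i, s))
    return assignments

def reassemble(transcripts):
    """transcripts: {agent: [(index, text), ...]} from the workforce."""
    indexed = sorted(t for parts in transcripts.values() for t in parts)
    return " ".join(text for _, text in indexed)

shreds = shred(list(range(10)), 2)   # 5 slices of 2 "samples" each
work = assign_round_robin(shreds, 3)
```

Because consecutive slices go to different agents, no single agent can reconstruct the conversation — the privacy property the abstract claims.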
20100030557Voice and text communication system, method and apparatus - The disclosure relates to systems, methods and apparatus to convert speech to text and vice versa. One apparatus comprises a vocoder, a speech to text conversion engine, a text to speech conversion engine, and a user interface. The vocoder is operable to convert speech signals into packets and convert packets into speech signals. The speech to text conversion engine is operable to convert speech to text. The text to speech conversion engine is operable to convert text to speech. The user interface is operable to receive a user selection of a mode from among a plurality of modes, wherein a first mode enables the speech to text conversion engine, a second mode enables the text to speech conversion engine, and a third mode enables the speech to text conversion engine and the text to speech conversion engine.02-04-2010
20120209606METHOD AND APPARATUS FOR INFORMATION EXTRACTION FROM INTERACTIONS - Obtaining information from audio interactions associated with an organization. The information may comprise entities, relations or events. The method comprises: receiving a corpus comprising audio interactions; performing audio analysis on audio interactions of the corpus to obtain text documents; performing linguistic analysis of the text documents; matching the text documents with one or more rules to obtain one or more matches; and unifying or filtering the matches.08-16-2012
20090030681CONTROLLING A SET-TOP BOX VIA REMOTE SPEECH RECOGNITION - A device may receive over a network a digitized speech signal from a remote control that accepts speech. In addition, the device may convert the digitized speech signal into text, use the text to obtain command information applicable to a set-top box, and send the command information to the set-top box to control presentation of multimedia content on a television in accordance with the command information.01-29-2009
20120004910System and method for speech processing and speech to text - Systems and methods for processing speech from a user are disclosed. In the system of the present invention, the user's speech is received as an input audio stream. The input audio stream is converted to text that corresponds to the input audio stream. The converted text is converted to an echo audio stream. Then, the echo audio stream is sent to the user. This process is performed in real time. Accordingly, the user is able to determine whether or not the speech-to-text conversion was correct, that is, whether his or her speech was correctly converted to text. If the conversion was incorrect, the user is able to correct the conversion process by using editing commands. The corresponding text is then analyzed to determine the operation which it demands. Then, the operation is performed on the corresponding text.01-05-2012
20120209605METHOD AND APPARATUS FOR DATA EXPLORATION OF INTERACTIONS - Retrieving data from audio interactions associated with an organization. Retrieving the data comprises: receiving a corpus containing interactions; performing natural language processing on a text document representing an interaction from the corpus; extracting at least one keyphrase from the text document; assigning a rank to the at least one keyphrase; modeling relations between at least two keyphrases using the rank; and identifying topics relevant for the organization from the relations.08-16-2012
20100169092VOICE INTERFACE OCX - A medical dictation workflow system can be customized from the selection of available user application programs. A voice interface OCX can interface speech technologies with the selected user application programs of the medical dictation workflow system. The medical dictation workflow system may be directed to generating reports through filling out defined fields. The fields can be generated through a tracking system subscribing to a core reporting system and requesting certain information be captured or through a user. The voice interface OCX can provide macros so a user can customize the fields, navigate among the fields, or fill in the fields with data through a voice recognition engine or a wave player control. The data entered into the fields can be automatically entered into corresponding database elements of a database.07-01-2010
20120209607METHOD AND APPARATUS FOR SCROLLING TEXT DISPLAY OF VOICE CALL OR MESSAGE DURING VIDEO DISPLAY SESSION - A method and communication device are disclosed that include displaying a video on a display, converting voice audio data to textual data by applying voice-to-text conversion, and displaying the textual data as scrolling text displayed along with the video on the display and either above, below or across the video. The method may further include receiving a voice call indication from a network, providing the voice call indication to a user interface where the voice call indication corresponds to an incoming voice call; and receiving a user input for receiving the voice call and displaying the voice call as scrolling text. In another embodiment, a method includes displaying application related data on a display; converting voice audio data to textual data by applying voice-to-text conversion; converting the textual data to a video format; and displaying the textual data as scrolling text over the application related data on the display.08-16-2012
20120010883TRANSCRIPTION DATA EXTRACTION - A computer program product, for performing data determination from medical record transcriptions, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain a medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the transcription for an indicating phrase associated with a type of data desired to be determined from the transcription, the type of desired data being relevant to medical records, determine whether data indicated by text disposed proximately to the indicating phrase is of the desired type, and store an indication of the data if the data is of the desired type.01-12-2012
20120016671Tool and method for enhanced human machine collaboration for rapid and accurate transcriptions - A system and methods for transcribing text from audio and video files including a set of transcription hosts and an automatic speech recognition system. ASR word-lattices are dynamically selected from either a text box or word-lattice graph wherein the most probable text sequences are presented to the transcriptionist. Secure transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices for transcription by a plurality of transcriptionists. No one transcriptionist is aware of the final transcribed text, only small portions of transcribed text. Secure and high quality transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices, sending them serially to a set of transcriptionists and updating the acoustic and language models at each step to improve the word-lattice accuracy.01-19-2012
20120022868Word-Level Correction of Speech Input - The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.01-26-2012
20120022867Speech to Text Conversion - Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device.01-26-2012
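The interpolation described above — each base language model assigns a probability, and the interpolated model is a weighted combination whose weights reflect the contextual metadata — can be sketched with toy unigram models. The models, vocabularies, and weights below are invented for illustration.

```python
# Sketch of language-model interpolation: P(word | history) as a
# convex combination of base models, weighted by context. Toy values.

def interpolate(base_models, weights, word, history=()):
    """Weighted sum of base-model probabilities for one word."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * model(word, history)
               for w, model in zip(weights, base_models))

# Two toy unigram "models": one tuned to SMS text, one to news text.
sms_model = lambda w, h: {"lol": 0.05, "the": 0.03}.get(w, 0.001)
news_model = lambda w, h: {"lol": 0.0001, "the": 0.06}.get(w, 0.001)

# Contextual metadata says the user is texting, so SMS gets weight 0.8.
p_lol = interpolate([sms_model, news_model], [0.8, 0.2], "lol")
```

Shifting the weights toward the news model would make "lol" far less probable, which is exactly how the contextual metadata steers the conversion.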
20120022866Language Model Selection for Speech-to-Text Conversion - Methods, computer program products and systems are described for converting speech to text. Sound information is received at a computer server system from an electronic device, where the sound information is from a user of the electronic device. A context identifier indicates a context within which the user provided the sound information. The context identifier is used to select, from among multiple language models, a language model appropriate for the context. Speech in the sound information is converted to text using the selected language model. The text is provided for use by the electronic device.01-26-2012
20120022865System and Method for Efficiently Reducing Transcription Error Using Hybrid Voice Transcription - A system and method for efficiently reducing transcription error using hybrid voice transcription is provided. A voice stream is parsed from a call into utterances. An initial transcribed value and corresponding recognition score are assigned to each utterance. A transcribed message is generated for the call and includes the initial transcribed values. A threshold is applied to the recognition scores to identify those utterances with recognition scores below the threshold as questionable utterances. At least one questionable utterance is compared to other questionable utterances from other calls and a group of similar questionable utterances is formed. One or more of the similar questionable utterances is selected from the group. A common manual transcription value is received for the selected similar questionable utterances. The common manual transcription value is assigned to the remaining similar questionable utterances in the group.01-26-2012
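The hybrid pipeline above can be sketched as: flag utterances whose recognition score falls below a threshold, group similar flagged utterances, manually transcribe one, and propagate that manual value to the rest of the group. For simplicity the sketch treats "similar" as an exact match on the initial transcribed value, which is an assumption; the threshold and utterances are also invented.

```python
# Sketch of hybrid voice transcription: threshold, group, propagate.
# Similarity here is exact-match on the initial value (an assumption).

def questionable(utterances, threshold=0.7):
    """Utterances whose recognition score falls below the threshold."""
    return [u for u in utterances if u["score"] < threshold]

def propagate_manual_value(group, manual_value):
    """Assign one manual transcription to every utterance in a group."""
    for u in group:
        u["value"] = manual_value
    return group

utterances = [
    {"value": "acount number", "score": 0.55},
    {"value": "account number", "score": 0.92},
    {"value": "acount number", "score": 0.60},
]
low = questionable(utterances)
group = [u for u in low if u["value"] == low[0]["value"]]
propagate_manual_value(group, "account number")
```

The payoff claimed by the abstract is that one manual transcription corrects many calls at once.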
20090037171Real-time voice transcription system - The real-time voice transcription system provides a speech recognition system and method that includes use of speech and spatial-temporal acoustic data to enhance speech recognition probabilities while simultaneously identifying the speaker. Real-time edit capability is provided enabling a user to train the system during a transcription session. The system may be connected to user computers via local network and/or wide area network means.02-05-2009
20120059652METHODS AND SYSTEMS FOR OBTAINING LANGUAGE MODELS FOR TRANSCRIBING COMMUNICATIONS - A method for transcribing a spoken communication includes acts of receiving a spoken first communication from a first sender to a first recipient, obtaining information relating to a second communication, which is different from the first communication, from a second sender to a second recipient, using the obtained information to obtain a language model, and using the language model to transcribe the spoken first communication.03-08-2012
20120059651MOBILE COMMUNICATION DEVICE FOR TRANSCRIBING A MULTI-PARTY CONVERSATION - A mobile communications device includes a network interface for communicating over a wide-area network, an input/output interface for communicating over a PAN and a display. The communication device also includes one or more processors for executing machine-executable instructions and one or more machine-readable storage media for storing the machine-executable instructions. The instructions, when executed by the one or more processors, implement a voice proximity component, a speech-to-text component and a user interface. The voice proximity component is configured to select a first user's voice from among a plurality of user voices. The first user voice belongs to a user who is in closest proximity to the mobile communication device. The speech-to-text component is configured to convert speech received from the first user, but not the other users, to text in real time. The user interface is arranged for displaying the text on the display as it is received over the PAN from the other mobile communication devices.03-08-2012
20100131271SYSTEMS AND METHODS TO REDIRECT AUDIO BETWEEN CALLERS AND VOICE APPLICATIONS - A call center environment is provided that allows a customer service representative to populate a workstation display screen with data using either keystrokes or voice input. The voice input is provided to the workstation using a voice overlay and voice platform to convert audio into data usable by the workstation to populate the screen.05-27-2010
20110093264Providing Information Services Related to Multimodal Inputs - A system and method provides information services related to multimodal inputs. Several different types of data used as multimodal inputs are described. Also described are various methods involving the generation of contexts using multimodal inputs, synthesizing context-information service mappings and identifying and providing information services.04-21-2011
20110093263Automated Video Captioning - An automated closed captioning or subtitle generation system that automatically generates the captioning text from the audio signal in a submitted online video, then allows the user to type in any corrections, after which it adds the captioning text to the video, allowing users to enable the captioning as needed. The user text review and correction step allows the text prediction model to accumulate additional corrected data with each use, thereby improving the accuracy of the text generation over time and use of the system.04-21-2011
20120130714SYSTEM AND METHOD FOR GENERATING CHALLENGE UTTERANCES FOR SPEAKER VERIFICATION - Disclosed herein are systems, methods, and non-transitory computer-readable storage media relating to speaker verification. In one aspect, a system receives a first user identity from a second user, and, based on the identity, accesses voice characteristics. The system randomly generates a challenge sentence according to a rule and/or grammar, based on the voice characteristics, and prompts the second user to speak the challenge sentence. The system verifies that the second user is the first user if the spoken challenge sentence matches the voice characteristics. In an enrollment aspect, the system constructs an enrollment phrase that covers a minimum threshold of unique speech sounds based on speaker-distinctive phonemes, phoneme clusters, and prosody. The user then utters the enrollment phrase, and the system extracts voice characteristics for the user from the uttered enrollment phrase. The system generates a user profile, based on the voice characteristics, for generating random challenge sentences according to a grammar.05-24-2012
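The random challenge generation above can be sketched with a tiny slot grammar: each slot maps to a word list (which, in the patent's scheme, would be chosen to cover sounds distinctive for the claimed speaker), and a sentence is drawn at random. The grammar below is entirely invented.

```python
# Sketch of random challenge-sentence generation from a slot grammar.
# The grammar and word lists are invented for illustration.
import random

GRAMMAR = {
    "subject": ["the red fox", "my old friend"],
    "verb": ["jumped over", "walked past"],
    "object": ["the lazy dog", "the tall fence"],
}

def challenge_sentence(grammar, rng=random):
    """Draw one option per slot, in slot order, to form a sentence."""
    return " ".join(rng.choice(grammar[slot])
                    for slot in ("subject", "verb", "object"))

rng = random.Random(42)   # seeded so the draw is reproducible
sentence = challenge_sentence(GRAMMAR, rng)
```

Because the sentence is freshly generated per attempt, a replayed recording of an earlier session will not match the new challenge — the anti-replay property the patent targets.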
20120158405SYNCHRONISE AN AUDIO CURSOR AND A TEXT CURSOR DURING EDITING - A speech recognition device (06-21-2012
20110066432CONTENT FILTERING FOR A DIGITAL AUDIO SIGNAL - According to some embodiments, content filtering is provided for a digital audio signal.03-17-2011
20110106534Voice Actions on Computing Devices - A computer-implemented method includes receiving spoken input at a computing device from a user of the computing device, the spoken input including a carrier phrase and a subject to which the carrier phrase is directed, providing at least a portion of the spoken input to a server system in audio form for speech-to-text conversion by the server system, the portion including the subject to which the carrier phrase is directed, receiving from the server system instructions for automatically performing an operation on the computing device, the operation including an action defined by the carrier phrase using parameters defined by the subject, and automatically performing the operation on the computing device.05-05-2011
20110106535Caption presentation method and apparatus using same - A caption presentation method and an apparatus using the method, by which caption and information related to the caption can be provided together in a broadcast receiver or in an image reproducer that displays the caption in a closed caption method. The method includes detecting subject information from a caption signal; obtaining visual information with respect to the caption, based on the detected caption subject information; and displaying the visual information and the caption signal together.05-05-2011
20100094628System and Method for Latency Reduction for Automatic Speech Recognition Using Partial Multi-Pass Results - A system and method is provided for reducing latency for automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text.04-15-2010
20120123778Security Control for SMS and MMS Support Using Unified Messaging System - A method and apparatus for providing security control of short messaging service (SMS) messages and multimedia messaging service (MMS) messages in a unified messaging (UM) system are disclosed. An SMS or MMS message directed to a recipient mailbox in a UM system is received. It is determined that the recipient mailbox is a secondary mailbox associated with a primary mailbox in the UM system. The message is audited according to an audit policy associated with the recipient mailbox.05-17-2012
20120123779MOBILE DEVICES, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR ENHANCING SOCIAL INTERACTIONS WITH RELEVANT SOCIAL NETWORKING INFORMATION - Devices, methods, and computer program products are provided for facilitating enhanced social interactions using a mobile device. A method for facilitating an enhanced social interaction using a mobile device includes receiving an audio input at the mobile device, determining a salient portion of the audio input, receiving relevant information associated with the salient portion, and presenting the relevant information via the mobile device.05-17-2012
20100250249Communication control apparatus, communication control method, and computer-readable medium storing a communication control program - A communication control apparatus for communicating sound and image with another communication control apparatus via a network, includes a sound input device that acquires sound data from a sound of a user's speech, a level measuring device that measures a volume level of sound data input from the sound input device, a first determining device that determines whether the volume level measured by the level measuring device is smaller than a predetermined standard volume value, a sound recognizing device that executes sound recognition of the sound data so as to create text data when the first determining device determines that the volume level is smaller than the standard volume value, and a transmitting device that transmits the text data created by the sound recognizing device to the another communication control apparatus.09-30-2010
20100042409AUTOMATED VOICE SYSTEM AND METHOD - A voice browser for use in a customer interaction management system for interacting with a customer, based on data contained in one or more generic knowledge trees. The system can traverse the knowledge tree based on customer responses to presented questions until a leaf node of the knowledge tree is reached. The information contained in the leaf node is then presented to the customer.02-18-2010
20120084086SYSTEM AND METHOD FOR OPEN SPEECH RECOGNITION - Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination.04-05-2012
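The recognize, score, select, and combine flow described in this abstract can be sketched roughly as follows (a minimal illustration under assumed data shapes, not the patented implementation; `RecognizerOutput`, the threshold value, and the pick-the-best combination rule are all hypothetical):

```python
from dataclasses import dataclass

@dataclass
class RecognizerOutput:
    """Output of one domain-specific recognizer (hypothetical shape)."""
    text: str
    confidence: float  # 0.0 .. 1.0

def select_candidates(outputs, threshold=0.5):
    """Keep outputs whose confidence clears the threshold, best first."""
    kept = [o for o in outputs if o.confidence >= threshold]
    return sorted(kept, key=lambda o: o.confidence, reverse=True)

def combine(outputs, threshold=0.5):
    """Naive combination: return the highest-confidence candidate's text."""
    candidates = select_candidates(outputs, threshold)
    return candidates[0].text if candidates else ""
```

A real system would likely merge candidates word by word rather than picking one whole hypothesis, but the confidence-gated selection step is the same.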
20120221332SYSTEM AND METHOD FOR REFERRING TO ENTITIES IN A DISCOURSE DOMAIN - Systems, methods, and non-transitory computer-readable media for referring to entities. The method includes receiving domain-specific training data of sentences describing a target entity in a context, extracting a speaker history and a visual context from the training data, selecting attributes of the target entity based on at least one of the speaker history, the visual context, and speaker preferences, generating a text expression referring to the target entity based on at least one of the selected attributes, the speaker history, and the context, and outputting the generated text expression. The weighted finite-state automaton can represent partial orderings of word pairs in the domain-specific training data. The weighted finite-state automaton can be speaker specific or speaker independent. The weighted finite-state automaton can include a set of weighted partial orderings of the training data for each possible realization.08-30-2012
20120166192PROVIDING TEXT INPUT USING SPEECH DATA AND NON-SPEECH DATA - Systems, methods, and computer readable media providing a speech input interface. The interface can receive speech input and non-speech input from a user through a user interface. The speech input can be converted to text data and the text data can be combined with the non-speech input for presentation to a user.06-28-2012
20120166191ELECTRONIC BOOK WITH VOICE EMULATION FEATURES - A method and system for providing text-to-audio conversion of an electronic book displayed on a viewer. A user selects a portion of displayed text and converts it into audio. The text-to-audio conversion may be performed via a header file and pre-recorded audio for each electronic book, via text-to-speech conversion, or other available means. The user may select manual or automatic text-to-audio conversion. The automatic text-to-audio conversion may be performed by automatically turning the pages of the electronic book or by the user manually turning the pages. The user may also select to convert the entire electronic book, or portions of it, into audio. The user may also select an option to receive an audio definition of a particular word in the electronic book. The present invention allows a user to control the system by selecting options from a screen or by entering voice commands.06-28-2012
20120215533Method of and System for Error Correction in Multiple Input Modality Search Engines - A method of and system for error correction in multiple input modality search engines is presented. A method of processing input information based on an information type of the input information includes receiving input information for performing a search for identifying at least one item desired by a user and determining an information type associated with the input information. The method also includes forming a query input for identifying the at least one item desired by the user based on the input information and on the information type. The method further includes submitting the query input to at least one search engine system.08-23-2012
20100204989APPARATUS AND METHOD FOR QUEUING JOBS IN A DISTRIBUTED DICTATION /TRANSCRIPTION SYSTEM - A distributed dictation/transcription system is provided. The system provides a client station, dictation manager, and dictation server connected such that the dictation manager can select a dictation server to transcribe audio from the client station. A job queue at the dictation manager queues the audio to be provided to the dictation servers. The dictation manager reviews all jobs in the job queue and sends audio with a user profile matching a user profile already uploaded to the dictation server, regardless of whether the matching audio is next in the job queue. If alternative audio has been pending over a predetermined amount of time or has a higher priority, the alternative audio is sent to the dictation server.08-12-2010
20100063815REAL-TIME TRANSCRIPTION - A computing system accepts audio from one or more sources, parses the audio into chunks, and transcribes the chunks in substantially real time. Some transcription is performed automatically, while other transcription is performed by humans who listen to the audio and enter the words spoken and/or the intent of the caller (such as directions given to the system). The system provides for participants a user interface that is updated in substantially real time with the transcribed text from the audio stream(s). A single audio line can be used for simple transcription, and multiple audio lines are used to provide a real-time transcript of a conference call, deposition, or the like. A pool of analysts creates, checks, and/or corrects transcription, and callers/observers can even assist in the correction process through their respective user interfaces. Ads derived from the transcript are displayed together with the text in substantially real time.03-11-2010
20120215532HEARING ASSISTANCE SYSTEM FOR PROVIDING CONSISTENT HUMAN SPEECH - Broadly speaking, the embodiments disclosed herein describe an apparatus, system, and method that allows a user of a hearing assistance system to perceive consistent human speech. The consistent human speech can be based upon user specific preferences.08-23-2012
20100174543SYSTEMS FOR DISPLAYING CONVERSIONS OF TEXT EQUIVALENTS - Embodiments of the invention include a system for displaying an audit diagram. The system includes a monitor capable of electronically displaying the audit diagram. The monitor includes a text equivalent constructed from an input text, and a conversion representation including an operator indicator, a result arrow, and a rule arrow.07-08-2010
20120253804VOICE PROCESSOR AND VOICE PROCESSING METHOD - According to one embodiment, a voice processor includes: a storage module; a converter; a character string converter; a similarity calculator; and an output module. The storage module stores therein first character string information and a first phoneme symbol corresponding thereto in association with each other. The converter converts an input voice into a second phoneme symbol. The character string converter converts the second phoneme symbol into second character string information in which content of the voice is described in a natural language. The similarity calculator calculates similarity between the input voice and a portion of the first character string information stored in the storage module using at least one of the second phoneme symbol converted by the converter and the second character string information converted by the character string converter. The output module outputs the first character string information based on the similarity calculated by the similarity calculator.10-04-2012
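The similarity computation described here, comparing the phoneme sequence of input voice against stored character strings' phoneme symbols, can be approximated with a normalized edit distance (a sketch only; the ARPAbet-style phoneme labels and the normalization scheme are assumptions, not the patent's method):

```python
def edit_distance(a, b):
    """Classic Levenshtein dynamic program over phoneme sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def similarity(query_phonemes, entry_phonemes):
    """Normalized similarity in [0, 1]: 1.0 means identical sequences."""
    dist = edit_distance(query_phonemes, entry_phonemes)
    longest = max(len(query_phonemes), len(entry_phonemes), 1)
    return 1.0 - dist / longest

def best_match(query, lexicon):
    """lexicon maps stored character strings to phoneme lists; return
    the stored string most similar to the query phonemes."""
    return max(lexicon, key=lambda text: similarity(query, lexicon[text]))
```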
20120173235Offline Generation of Subtitles - One embodiment described herein may take the form of a system or method for generating subtitles (also known as “closed captioning”) of an audio component of a multimedia presentation automatically for one or more stored presentations. In general, the system or method may access one or more multimedia programs stored on a storage medium, either as an entire program or in portions. Upon retrieval, the system or method may perform an analysis of the audio component of the program and generate a subtitle text file that corresponds to the audio component. In one embodiment, the system or method may perform a speech recognition analysis on the audio component to generate the subtitle text file.07-05-2012
20120253803VOICE RECOGNITION DEVICE AND VOICE RECOGNITION METHOD - According to embodiments, a voice inputting unit converts voice into a digital signal. The state detecting unit includes an acceleration sensor, and detects movement and/or a state of an equipment main body. The holding unit stores movement or state pattern models of predetermined movement or a state of the equipment main body and predetermined voice recognition process patterns corresponding to the models. The pattern detecting unit detects whether or not movement and/or a state of the equipment main body from the state detecting unit matches the movement or state pattern models stored in the holding unit, and detects a voice recognition process pattern corresponding to the matched model. The voice recognition process executing unit executes the voice recognition process on the digital signal output from the voice inputting unit according to the detected voice recognition process pattern.10-04-2012
20120221331Method and Apparatus for Automatically Building Conversational Systems - A system and method provides a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then submitted to a database search and a response is generated. Information contained on the responsive webpage is extracted and converted to speech via a text-to-speech engine and communicated to the person.08-30-2012
20120221330LEVERAGING SPEECH RECOGNIZER FEEDBACK FOR VOICE ACTIVITY DETECTION - A voice activity detection (VAD) module analyzes a media file, such as an audio file or a video file, to determine whether one or more frames of the media file include speech. A speech recognizer generates feedback relating to an accuracy of the VAD determination. The VAD module leverages the feedback to improve subsequent VAD determinations. The VAD module also utilizes a look-ahead window associated with the media file to adjust estimated probabilities or VAD decisions for previously processed frames.08-30-2012
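The look-ahead adjustment mentioned at the end of this abstract can be illustrated with a simple smoothing rule over per-frame speech probabilities (the window size and threshold are hypothetical, and the recognizer-feedback loop itself is not modeled):

```python
def smooth_vad(frame_probs, lookahead=2, threshold=0.5):
    """Revise each frame's speech decision using a look-ahead window:
    a frame counts as speech if the mean probability over the frame
    and the next `lookahead` frames clears the threshold."""
    decisions = []
    for i in range(len(frame_probs)):
        window = frame_probs[i:i + lookahead + 1]
        decisions.append(sum(window) / len(window) >= threshold)
    return decisions
```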
20120316873METHOD OF PROVIDING INFORMATION AND MOBILE TELECOMMUNICATION TERMINAL THEREOF - A method of providing information of a mobile communication terminal, and a mobile communication terminal for performing the method, are provided. The method includes determining whether a search command event has been generated during a call with a counterpart terminal, converting a voice signal received from a microphone into a text when the generation of search command event is determined to have been generated, identifying information matching the text in a memory, and sending the information to the counterpart terminal.12-13-2012
20100299147SPEECH-TO-SPEECH TRANSLATION - Systems and methods for facilitating communication including recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating the first text representation to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. In some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal.11-25-2010
20100299146Speech Capabilities Of A Multimodal Application - Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.11-25-2010
20100250250SYSTEMS AND METHODS FOR GENERATING A HYBRID TEXT STRING FROM TWO OR MORE TEXT STRINGS GENERATED BY MULTIPLE AUTOMATED SPEECH RECOGNITION SYSTEMS - A hybrid text generator is disclosed that generates a hybrid text string from multiple text strings that are produced from an audio input by multiple automated speech recognition systems. The hybrid text generator receives metadata that describes a time-location that each word from the multiple text strings is located in the audio input. The hybrid text generator matches words between the multiple text strings using the metadata and generates a hybrid text string that includes the matched words. The hybrid text generator utilizes confidence scores associated with words that do not match between the multiple text strings to determine whether to add an unmatched word to the hybrid text string.09-30-2010
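The match-by-time-metadata behavior described above, falling back to confidence scores for unmatched words, can be sketched like so (the data layout, rounding granularity, and tie-breaking are all assumptions for illustration):

```python
from collections import defaultdict

def hybrid_text(streams):
    """streams: one list per ASR system of (word, start_time, confidence)
    tuples. Group words by rounded start time; within a group, a word
    that matches across streams is kept outright, otherwise the
    highest-confidence word wins."""
    by_time = defaultdict(list)
    for stream in streams:
        for word, start, conf in stream:
            by_time[round(start, 1)].append((word, conf))
    result = []
    for t in sorted(by_time):
        cands = by_time[t]
        words = [w for w, _ in cands]
        for w in words:
            if words.count(w) > 1:  # matched between streams
                result.append(w)
                break
        else:  # no match: use confidence to pick
            result.append(max(cands, key=lambda wc: wc[1])[0])
    return " ".join(result)
```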
20120179465REAL TIME GENERATION OF AUDIO CONTENT SUMMARIES - Audio content is converted to text using speech recognition software. The text is then associated with a distinct voice or a generic placeholder label if no distinction can be made. From the text and voice information, a word cloud is generated based on key words and key speakers. A visualization of the cloud displays as it is being created. Words grow in size in relation to their dominance. When it is determined that the predominant words or speakers have changed, the word cloud is complete. That word cloud continues to be displayed statically and a new word cloud display begins based upon a new set of predominant words or a new predominant speaker or set of speakers. This process may continue until the meeting is concluded. At the end of the meeting, the completed visualization may be saved to a storage device, sent to selected individuals, removed, or any combination of the preceding.07-12-2012
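The grow-then-freeze word-cloud behavior can be illustrated with a frequency counter plus a crude topic-shift heuristic (the stopword list, thresholds, and shift test are hypothetical, not the patent's mechanism):

```python
from collections import Counter

def update_cloud(cloud, words, stopwords=("the", "a", "and", "of")):
    """Grow word weights as transcript text streams in; in a real
    display, the weight would drive the rendered font size."""
    for w in words:
        if w.lower() not in stopwords:
            cloud[w.lower()] += 1
    return cloud

def topic_shift(old_cloud, new_words, top_n=3, overlap=1):
    """Declare the current cloud complete when the latest words share
    too little overlap with its dominant terms."""
    dominant = {w for w, _ in old_cloud.most_common(top_n)}
    fresh = {w.lower() for w in new_words}
    return len(dominant & fresh) < overlap
```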
20120232897Locating Products in Stores Using Voice Search From a Communication Device - A user can locate products by dialing a number from any phone and accessing an automatic voice recognition system. Reply is made to the user with information locating the product using a store's product location data converted to automatic voice responses. Smart phone and mobile web access to a product database is enabled using voice-to-text and text search. A taxonomy enables product search requests by product descriptions and/or product brand names, and enable synonyms and phonetic enhancements to the system. Search results are related to products and product categories with concise organization. Relevant advertisements, promotional offers and coupons are delivered based upon search and taxonomy elements. Search requests generate dynamic interior maps of a products location inside the shoppers' location, assisting a shopper to efficiently shop the location for listed items. Business intelligence of product categories enable rapid scaling across retail segments.09-13-2012
20120232898SYSTEM AND METHOD OF PROVIDING AN AUTOMATED DATA-COLLECTION IN SPOKEN DIALOG SYSTEMS - The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system.09-13-2012
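The accept, re-prompt, or transfer-to-human decision described in this abstract amounts to two-threshold routing on the classifier's confidence (the threshold values here are illustrative):

```python
def route_utterance(classifier_score, accept=0.8, reject=0.3):
    """Two-threshold routing: accept confident classifications,
    re-prompt in the uncertain middle band, and hand off to a human
    below the rejection threshold (a likely task-specific utterance)."""
    if classifier_score >= accept:
        return "accept"
    if classifier_score >= reject:
        return "re-prompt"
    return "transfer-to-human"
```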
20090326938MULTIWORD TEXT CORRECTION - A method including detecting a selection of a plurality of erroneous words in text presented on a display of a device, in an automatic speech recognition system, receiving sequentially dictated corrections for the selected erroneous words in a single, continuous operation where each dictated correction corresponds to at least one of the selected erroneous words, and replacing the plurality of erroneous words with one or more corresponding words of the dictated corrections where each erroneous word is matched with the one or more corresponding words of the dictated corrections in an order the erroneous words appear according to a reading direction of the text.12-31-2009
20120316875HOSTED SPEECH HANDLING - Embodiments of the invention provide systems and methods for speech signal handling. Speech handling according to one embodiment of the present invention can be performed via a hosted architecture. Electrical signal representing human speech can be analyzed with an Automatic Speech Recognizer (ASR) hosted on a different server from a media server or other server hosting a service utilizing speech input. Neither server need be located at the same location as the user. The spoken sounds can be accepted as input to and handled with a media server which identifies parts of the electrical signal that contain a representation of speech. This architecture can serve any user who has a web-browser and Internet access, either on a PC, PDA, cell phone, tablet, or any other computing device.12-13-2012
20120259636METHOD AND APPARATUS FOR PROCESSING SPOKEN SEARCH QUERIES - Some embodiments relate to a method of performing a search for content on the Internet, in which a user may speak a search query and speech recognition may be performed on the spoken query to generate a text search query to be provided to a plurality of search engines. This enables a user to speak the search query rather than having to type it, and also allows the user to provide the search query only once, rather than having to provide it separately to multiple different search engines.10-11-2012
20120259633AUDIO-INTERACTIVE MESSAGE EXCHANGE - A completely hands free exchange of messages, especially in portable devices, is provided through a combination of speech recognition, text-to-speech (TTS), and detection algorithms. An incoming message may be read aloud to a user and the user enabled to respond to the sender with a reply message through audio input upon determining whether the audio interaction mode is proper. Users may also be provided with options for responding in a different communication mode (e.g., a call) or perform other actions. Users may further be enabled to initiate a message exchange using natural language.10-11-2012
20090048832SPEECH-TO-TEXT SYSTEM, SPEECH-TO-TEXT METHOD, AND SPEECH-TO-TEXT PROGRAM - [Problems] To provide a speech-to-text system and the like capable of matching speech data with edit result text, whether that text was acquired by editing recognition result text or is newly written text information.02-19-2009
20090018830SPEECH CONTROL OF COMPUTING DEVICES - The invention relates to techniques of controlling a computing device via speech. A method realization of the proposed techniques comprises the steps of transforming speech input into a text string comprising one or more input words; performing a context-related mapping of the input words to one or more functions for controlling the computing device; and preparing an execution of the identified function. Another realization is related to a remote speech control of computing devices.01-15-2009
20080319742SYSTEM AND METHOD FOR POSTING TO A BLOG OR WIKI USING A TELEPHONE - The present invention discloses a system and method for creating, editing, and posting a BLOG or a WIKI using a telephone. In the invention, a voice-based, real-time telephone communication can be established between a user and a voice response system. User speech can be received over the communication. The user speech can be speech-to-text converted to produce text. The text can be added to a BLOG or a WIKI, which can be posted to a server. The telephone communication can be terminated. The newly posted BLOG or WIKI can be served by the server to clients.12-25-2008
20120265529SYSTEMS AND METHODS FOR OBTAINING AND DISPLAYING AN X-RAY IMAGE - A method and system for transcription of spoken language into continuous text for a user comprising the steps of inputting spoken language of at least one user or of a communication partner of the at least one user into a mobile device of the respective user, wherein the input spoken language of the user is transported within a corresponding stream of voice over IP data packets to a transcription server; transforming the spoken language transported within the respective stream of voice over IP data packets into continuous text by means of a speech recognition algorithm run by said transcription server, wherein said speech recognition algorithm is selected depending on a natural language or dialect spoken in the area of the current position of said mobile device; and outputting said transformed continuous text forwarded by said transcription server to said mobile device of the respective user or to a user terminal of the respective user in real time.10-18-2012
20080300872SCALABLE SUMMARIES OF AUDIO OR VISUAL CONTENT - Providing for browsing a summary of content formed of keywords that can scale to a user-defined level of detail is disclosed herein. Components of a system can include a summarization component that extracts keywords related to the content and associates the keywords with portions thereof, and a zooming component that displays a number of keywords based on a keyword/keyphrase relevance rank and a zoom factor. Additionally, a speech to text component can translate speech associated with the content into text, wherein the keywords are extracted from the translated text. Consequently, the claimed subject matter can present a variable hierarchy of keywords to form a scalable summary of such recorded content.12-04-2008
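The zoom-factor behavior, showing more keywords as the user requests more detail, can be sketched as rank-then-truncate (the linear scaling of count by zoom is an assumption for illustration):

```python
def scaled_summary(keywords, zoom):
    """keywords: list of (keyword, relevance) pairs; zoom in (0, 1].
    Returns the top fraction of keywords by relevance rank, so a
    larger zoom factor reveals a more detailed summary."""
    ranked = sorted(keywords, key=lambda kw: kw[1], reverse=True)
    count = max(1, int(len(ranked) * zoom))
    return [kw for kw, _ in ranked[:count]]
```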
20080300874SPEECH SKILLS ASSESSMENT - An approach to evaluating a person's speech skills includes automatically processing speech of a person and text some or all of which corresponds to the speech. In some examples, a job application procedure includes collecting speech from an applicant, and using text corresponding to the collected speech to automatically assess speech skills of the applicant. The text may include text that is presented to the applicant and the speech collected from the applicant can include the applicant reading the presented text.12-04-2008
20080300873Systems And Methods For Securely Transcribing Voicemail Messages - A system or method for securely transcribing voicemail messages includes answering a call within a secure communication provider, the secure communication provider recording audio of the call, sending the audio to a voicemail transcription service via a secure communication link, transcribing the audio into text, and sending the text to the secure communication provider via the secure communication link, the audio and the text not being permanently stored and not being available for interpretation by humans during this transcription method.12-04-2008
20120239396MULTIMODAL REMOTE CONTROL - A method and system for operating a remotely controlled device may use multimodal remote control commands that include a gesture command and a speech command. The gesture command may be interpreted from a gesture performed by a user, while the speech command may be interpreted from speech utterances made by the user. The gesture and speech utterances may be simultaneously received by the remotely controlled device in response to displaying a user interface configured to receive multimodal commands.09-20-2012
20120239397Digital Ink Database Searching Using Handwriting Feature Synthesis - A method of searching a digital ink database is disclosed. The digital ink database is associated with a specific author. The method starts by receiving a computer text query from an input device. The computer text query is then mapped to a set of feature vectors using a handwriting model of that specific author. As a result, the set of feature vectors approximates features that would have been extracted had that specific author written the computer query text by hand. Finally, the set of feature vectors is used to search the digital ink database.09-20-2012
20110131041Systems And Methods For Synthesis Of Motion For Animation Of Virtual Heads/Characters Via Voice Processing In Portable Devices - Systems and methods consistent with the innovations herein relate to communication using a virtual humanoid animated during call processing. According to one exemplary implementation, the animation may be performed using a system of recognition of spoken vowels for animation of the lips, which may also be associated with the recognition of DTMF tones for animation of head movements and facial features. The innovations herein may be generally implemented in portable devices such as PDAs, cell phones and Smart Phones that have access to mobile telephony.06-02-2011
20120265527INTERACTIVE VOICE RECOGNITION ELECTRONIC DEVICE AND METHOD - An interactive voice recognition electronic device converts a received voice signal to a text, and searches a voice databases to find a matched voice text of the converted text. The matched voice text is taken as a recognized voice text of the voice signal if the matched voice text exists in the voice database. The electronic device obtains a predetermined number of similar voice texts if no matched voice text exists in the voice database. The electronic device converts the predetermined number of similar voice texts to the voice signals, outputs the converted voice signals in turn, and selects one of the similar voice texts as the recognized voice text according to the selection of the user. The electronic device obtains the associated answer text of the recognized voice text in the voice database and converts the answer text to voice signals.10-18-2012
20120265528Using Context Information To Facilitate Processing Of Commands In A Virtual Assistant - A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.10-18-2012
20120323572Document Extension in Dictation-Based Document Generation Workflow - An automatic speech recognizer is used to produce a structured document representing the contents of human speech. A best practice is applied to the structured document to produce a conclusion, such as a conclusion that required information is missing from the structured document. Content is inserted into the structured document based on the conclusion, thereby producing a modified document. The inserted content may be obtained by prompting a human user for the content and receiving input representing the content from the human user.12-20-2012
20120278074MULTISENSORY SPEECH DETECTION - A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.11-01-2012
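The orientation-to-operating-mode-to-parameters chain in this abstract can be sketched with plain lookup tables (every mode name and parameter value below is hypothetical):

```python
def detection_params(orientation):
    """Map a device orientation to an operating mode, then to speech
    detection parameters that govern when detection begins or ends."""
    mode = {"to-ear": "phone", "flat": "speakerphone"}.get(orientation, "idle")
    params = {
        "phone": {"start_on": "proximity", "endpoint_ms": 500},
        "speakerphone": {"start_on": "energy", "endpoint_ms": 1000},
        "idle": {"start_on": "button", "endpoint_ms": 800},
    }
    return mode, params[mode]
```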
20120278071TRANSCRIPTION SYSTEM - A transcription system automates the control of the playback of the audio to accommodate the user's ability to transcribe the words spoken. In some examples, a delay between playback and typed input is estimated by processing the typed words using a wordspotting approach. The estimated delay is used as in input to an automated speed control, for example, to maintain a target or maximum delay between playback and typed input.11-01-2012
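The automated speed control, slowing playback as the typist's estimated delay grows, can be sketched as a clamped proportional controller (the gain, target delay, and rate bounds are illustrative; the wordspotting-based delay estimate itself is not modeled):

```python
def playback_rate(delay_sec, target_delay=5.0, min_rate=0.5, max_rate=1.5):
    """Proportional control sketch: slow playback down as the typist
    falls behind the target delay, speed it up when they are ahead,
    and clamp the rate to a sensible range."""
    rate = 1.0 - 0.1 * (delay_sec - target_delay)
    return max(min_rate, min(max_rate, rate))
```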
20120278073MOBILE SYSTEMS AND METHODS OF SUPPORTING NATURAL LANGUAGE HUMAN-MACHINE INTERACTIONS - A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network.11-01-2012
20120278072REMOTE HEALTHCARE SYSTEM AND HEALTHCARE METHOD USING THE SAME - A remote healthcare system includes a healthcare staff terminal which includes an input part configured to input text to be transmitted to a patient by a healthcare staff member, and a first transmitter-receiver part configured to transmit the text and a qualifier of the healthcare staff member; a server which includes a second transmitter-receiver part configured to receive the text and the qualifier of the healthcare staff member transmitted from the healthcare staff terminal, an acoustic source database having an acoustic source of the healthcare staff member stored therein, and a converter configured to change the text into voice using the stored acoustic source of the healthcare staff member; and a patient terminal which includes a third transmitter-receiver part configured to receive the voice converted from the text and the text transmitted by the second transmitter-receiver part of the server, and an output part configured to output the voice to the patient who is managed by the healthcare staff member.11-01-2012
20120089395SYSTEM AND METHOD FOR NEAR REAL-TIME IDENTIFICATION AND DEFINITION QUERY - A method of operating a communication system includes generating a transcript of at least a portion of a conversation between a plurality of users. The transcript includes a plurality of subsets of characters. The method further includes displaying the transcript on a plurality of communication devices, identifying an occurrence of at least one selected subset of characters from the plurality of subsets of characters, and querying a definition source for at least one definition for the selected subset of characters. The definition for the selected subset of characters is displayed on the plurality of communication devices.04-12-2012
20120089394Visual Display of Semantic Information - Techniques involving visual display of information related to matching user utterances against graph patterns are described. In one or more implementations, an utterance of a user is obtained that has been indicated as corresponding to a graph pattern through linguistic analysis. The utterance is displayed in a user interface as a representation of the graph pattern.04-12-2012
20110276328APPLICATION SERVER FOR REDUCING AMBIANCE NOISE IN AN AUSCULTATION SIGNAL, AND FOR RECORDING COMMENTS WHILE AUSCULTATING A PATIENT WITH AN ELECTRONIC STETHOSCOPE - An application server for reducing ambiance noise in an auscultation signal, and for recording comments while auscultating a patient with an electronic stethoscope. This application server (AS) comprises: means (SPH) for receiving samples of a raw auscultation signal representing auscultation sounds mixed with ambiance sounds, this raw auscultation signal being transmitted by a first microphone (M11-10-2011
20110276326METHOD AND SYSTEM FOR OPERATIONAL IMPROVEMENTS IN DISPATCH CONSOLE SYSTEMS IN A MULTI-SOURCE ENVIRONMENT - A method and system for operational improvements in a dispatch console in a multi-source environment includes receiving (11-10-2011
20120330659INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM - An information processing device includes a display data creating unit configured to create display data including characters representing the content of an utterance based on a sound and a symbol surrounding the characters and indicating a first direction, and an image combining unit configured to determine the position of the display data based on a display position of an image representing a sound source of the utterance, and to combine the display data and the image of the sound source so that an orientation in which the sound is radiated is matched with the first direction.12-27-2012
20120330658SYSTEMS AND METHODS TO PRESENT VOICE MESSAGE INFORMATION TO A USER OF A COMPUTING DEVICE - Systems and methods to process and/or present information relating to voice messages for a user that are received from other persons. In one embodiment, a method implemented in a data processing system includes: receiving first data associated with prior communications or activities for a first user on a mobile device; receiving a voice message for the first user; transcribing the voice message using the first data to provide a transcribed message; and sending the transcribed message to the mobile device for display to the user.12-27-2012
20120330661Electronic Devices with Voice Command and Contextual Data Processing Capabilities - An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.12-27-2012
20120330660Detecting and Communicating Biometrics of Recorded Voice During Transcription Process - A method and system for determining and communicating biometrics of a recorded speaker in a voice transcription process. An interactive voice response system receives a request from a user for a transcription of a voice file. A profile associated with the requesting user is obtained, wherein the profile comprises biometric parameters and preferences defined by the user. The requested voice file is analyzed for biometric elements according to the parameters specified in the user's profile. Responsive to detecting biometric elements in the voice file that conform to the parameters specified in the user's profile, a transcription output of the voice file is modified according to the preferences specified in the user's profile for the detected biometric elements to form a modified transcription output file. The modified transcription output file may then be provided to the requesting user.12-27-2012
20100100379VOICE RECOGNITION CORRELATION RULE LEARNING SYSTEM, VOICE RECOGNITION CORRELATION RULE LEARNING PROGRAM, AND VOICE RECOGNITION CORRELATION RULE LEARNING METHOD - A speech recognition rule learning device is connected to a speech recognition device that uses conversion rules for conversion between a first-type character string expressing a sound and a second-type character string for forming a recognition result. The character string recording unit records a first-type character string and a corresponding second-type character string. The extraction unit extracts second-type learned character string candidates. The rule learning unit extracts, from the second-type learned character string candidates, a second-type learned character string that matches at least part of the second-type character string in the character string recording unit; extracts a first-type learned character string from the first-type character string in the character string recording unit; and adds the correspondence relationship between the first-type learned character string and the second-type learned character string to the conversion rules.04-22-2010
20110307255System and Method for Conversion of Speech to Displayed Media Data - A method for instantaneous, real-time conversion of sound into media data, with the ability to project, print, copy, or manipulate such media data. The invention relates to a method for converting speech to a text string, recognizing the text string, and then displaying the media data that corresponds with the text string.12-15-2011
20110307254SPEECH RECOGNITION INVOLVING A MOBILE DEVICE - A system and method of speech recognition involving a mobile device. Speech input is received (12-15-2011
20110320197METHOD FOR INDEXING MULTIMEDIA INFORMATION - It comprises analyzing audio content of multimedia files and performing a speech to text transcription thereof automatically by means of an ASR process, and selecting acoustic and language models adapted for the ASR process at least before the latter processes the multimedia file, i.e. “a priori”.12-29-2011
20110320198INTERACTIVE ENVIRONMENT FOR PERFORMING ARTS SCRIPTS - One or more embodiments present a script to a user in an interactive script environment. A digital representation of a manuscript is analyzed. This digital representation includes a set of roles and a set of information associated with each role in the set of roles. An active role in the set of roles that is associated with a given user is identified based on the analyzing. At least a portion of the manuscript is presented to the given user via a user interface. The portion includes at least a subset of information in the set of information. Information within the set of information that is associated with the active role is presented in a visually different manner than information within the set of information that is associated with a non-active role, which is a role that is associated with a user other than the given user.12-29-2011
20120290299Translating Between Spoken and Written Language - Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.11-15-2012
20120290300APPARATUS AND METHOD FOR FOREIGN LANGUAGE STUDY - The apparatus for foreign language study includes: a voice recognition device configured to recognize a speech entered by a user and convert the speech into a speech text; a speech intent recognition device configured to extract a user speech intent for the speech text using skill level information of the user and dialogue context information; and a feedback processing device configured to extract a different expression depending on the user speech intent and a speech situation of the user. According to the present invention, the intent of a learner's speech may be determined even though the learner's skill is low, and customized expressions for various situations may be provided to the learner.11-15-2012
20120290301METHOD AND SYSTEM OF ENABLING INTELLIGENT AND LIGHTWEIGHT SPEECH TO TEXT TRANSCRIPTION THROUGH DISTRIBUTED ENVIRONMENT - A system includes at least one wireless client device, a service manager, and a plurality of voice transcription servers. The service manager includes a resource management service and a profile management service. The client device communicates the presence of a voice transcription task to the resource management service. The resource management service surveys the plurality of voice transcription servers and selects one voice transcription server based on a set of predefined criteria. The resource management service then communicated an address of the selected server to the profile management service, which then transmits a trained voice profile or default profile to the selected server. The address of the selected server is then sent to the client device, which then transmits an audio stream to the server. Finally, the selected server transcribes the audio stream to a textual format.11-15-2012
20100198595SYSTEMS AND METHODS FOR INTERACTIVELY ACCESSING HOSTED SERVICES USING VOICE COMMUNICATIONS - In a system comprising a voice recognition module, a session manager, and a voice generator module, a method for providing a service to a user comprises receiving an utterance via the voice recognition module; converting the utterance into one or more structures using lexicon tied to an ontology; identifying concepts in the utterance using the structures; provided the utterance includes sufficient information, selecting a service based on the concepts; generating a text message based on the selected service; and converting the text message to a voice message using the voice generator.08-05-2010
20100198594MOBILE PHONE COMMUNICATION GAP RECOVERY - Mobile phone signals may be corrupted by noise, fading, interference with other signals, and low-strength field coverage of a transmitting and/or receiving mobile phone as they pass through the communication network (e.g., free space). Because of this corruption, a voice conversation between a caller and a receiver may be interrupted, and there may be gaps in the received oral communication from one or more participants, forcing either or both the caller and the receiver to repeat the conversation. Transmitting a transcript of the oral communication along with a voice signal comprising the oral communication can help ensure that the voice conversation is not interrupted due to a corrupted voice signal. The transcript of the oral communication can be used to retrieve parts of the oral communication lost in transmission (e.g., by fading, etc.) to make the conversation more fluid.08-05-2010
20100198596MESSAGE TRANSCRIPTION, VOICE QUERY AND QUERY DELIVERY SYSTEM - A message transmission system accepts a telephone call from a user who wishes to send an e-mail message, send an SMS message, perform an Internet query or retrieve his or her electronic mail. The voice call is transcribed and the message is sent, or the question in the voice call is transcribed and answered by an agent. Any number of agents connect to a central site over an Internet connection and transcribe messages or answer queries in an assembly line like fashion. In addition, a Web query delivery system accepts a query or statement from a user; the query is transcribed, classified, and then broadcast over any medium to any number of experts or web sites that desire to answer the particular type of query received. The entire query is delivered to an expert or web site who provides a full answer to the user.08-05-2010
20100169091DEVICE, SYSTEM AND METHOD FOR PROVIDING TARGETED ADVERTISEMENTS AND CONTENT - An aspect of the present invention is drawn to an audio data processing device for use by a user to control a system and for use with a microphone, a user demographic profiles database and a content/ad database. The microphone may be operable to detect speech and to generate speech data based on the detected speech. The user demographic profiles database may be capable of having demographic data stored therein. The content/ad database may be capable of having at least one of content data and advertisement data stored therein. The audio data processing device includes a voice recognition portion, a voice analysis portion and a speech to text portion. The voice recognition portion may be operable to process user instructions based on the speech data. The voice analysis portion may be operable to determine characteristics of the user based on the speech data. The speech to text portion may be operable to determine interests of the user.07-01-2010
20130013306TRANSCRIPTION DATA EXTRACTION - A computer program product, for performing data determination from medical record transcriptions, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain a medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the transcription for an indicating phrase associated with a type of data desired to be determined from the transcription, the type of desired data being relevant to medical records, determine whether data indicated by text disposed proximately to the indicating phrase is of the desired type, and store an indication of the data if the data is of the desired type.01-10-2013
20130013305METHOD AND SUBSYSTEM FOR SEARCHING MEDIA CONTENT WITHIN A CONTENT-SEARCH SERVICE SYSTEM - Various embodiments of the present invention include concept-service components of content-search-service systems which employ ontologies and vocabularies prepared for particular categories of content at particular times in order to score transcripts prepared from content items to enable a search-service component of a content-search-service system to assign estimates of the relatedness of portions of a content item to search criteria in order to render search results to clients of the content-search-service system. The concept-service component processes a search request to generate lists of related terms, and then employs the lists of related terms to process transcripts in order to score transcripts based on information contained in the ontologies.01-10-2013
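The transcript-scoring step described in the entry above can be sketched minimally: a related-term list (here a flat weight table; in the patent it is derived from an ontology) is applied to the words of a transcript portion to produce a relatedness score. All terms and weights below are invented for illustration.

```python
# Illustrative related-term weight table, standing in for the ontology-derived
# term lists produced by the concept-service component.
RELATED = {"cardiac": 1.0, "heart": 0.9, "myocardial": 0.8}

def score(transcript_words, related=RELATED):
    """Score a transcript portion by summing weights of matched related terms."""
    return sum(related.get(w.lower(), 0.0) for w in transcript_words)

s = score("The heart rate and cardiac rhythm were normal".split())
```

Portions of a content item with higher scores would then be ranked as more related to the search criteria.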
20130013307DIFFERENTIAL DYNAMIC CONTENT DELIVERY WITH TEXT DISPLAY IN DEPENDENCE UPON SIMULTANEOUS SPEECH - Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.01-10-2013
20120150538VOICE MESSAGE CONVERTER - A textual representation of a voice message is provided to a communication device, such as a mobile phone, for example, when the mobile phone is operating in a silent mode. The voice message is input by a caller and converted to phonemes. A text representation of the voice message is transmitted to the mobile phone. The representation includes characters based on the phonemes, with well-known words represented in an easily understood shorthand format.06-14-2012
20130018655CONTINUOUS SPEECH TRANSCRIPTION PERFORMANCE INDICATION - A method of providing speech transcription performance indication includes receiving, at a user device data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device.01-17-2013
20130018656FILTERING TRANSCRIPTIONS OF UTTERANCES - A method for facilitating mobile phone messaging, such as text messaging and instant messaging, includes receiving audio data communicated from the mobile communication device, the audio data representing an utterance that is intended to be at least a portion of the text of the message that is to be sent from the mobile phone to a recipient; transcribing the utterance to text based on the received audio data to generate a transcription; and applying a filter to the transcribed text to generate a filtered transcription, the text of which is intended to mimic language patterns of mobile device messaging that is performed manually by users. The method may also be applied to the audio data of a voicemail, with the filtered, transcribed text being communicated to a mobile phone as, for example, an SMS text message.01-17-2013
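The filtering step above, which rewrites transcribed text to mimic manual messaging patterns, can be sketched as a simple substitution pass. The shorthand table is invented for illustration, not taken from the patent.

```python
# Illustrative shorthand substitutions mimicking manual SMS/IM language patterns.
SHORTHAND = {"are": "r", "you": "u", "to": "2", "for": "4", "tonight": "2nite"}

def filter_transcription(text):
    """Rewrite each transcribed word via the shorthand table, if present."""
    return " ".join(SHORTHAND.get(w.lower(), w) for w in text.split())

msg = filter_transcription("Are you free tonight")
```

The filtered text would then be sent on, e.g., as an SMS message.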
20110161080Speech to Text Conversion - Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device.06-30-2011
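The interpolated language model described above can be sketched, for the unigram case, as a weighted mixture of base-model probabilities, with the weights chosen from contextual metadata. The corpora, words, and weights below are illustrative assumptions.

```python
def interpolate(base_models, weights, word):
    """Weighted mixture probability of `word` across several base models."""
    return sum(w * m.get(word, 0.0) for m, w in zip(base_models, weights))

# Two toy base models built from distinct corpora (illustrative values).
email_lm = {"meeting": 0.4, "tomorrow": 0.3, "street": 0.05}
nav_lm   = {"street": 0.5, "turn": 0.3, "meeting": 0.02}

# Contextual metadata (say, the user is in a navigation app) favors nav_lm.
weights = [0.2, 0.8]

p = interpolate([email_lm, nav_lm], weights, "street")
```

With a different context, the same word would receive a different probability simply by changing the weight vector, which is the core idea of the interpolated model.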
20110161079Grammar and Template-Based Speech Recognition of Spoken Utterances - The present invention relates to a communication system, comprising a database including classes of speech templates, in particular, classified according to a predetermined grammar; an input configured to receive and to digitize speech signals corresponding to a spoken utterance; a speech recognizer configured to receive and recognize the digitized speech signals; and wherein the speech recognizer is configured to recognize the digitized speech signals based on speech templates stored in the database and a predetermined grammatical structure.06-30-2011
20130024195CORRECTIVE FEEDBACK LOOP FOR AUTOMATED SPEECH RECOGNITION - A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier.01-24-2013
20080255837Method for locating an audio segment within an audio file - A method for locating an audio segment within an audio file comprising (i) providing a first transcribed text file associated with the audio file; (ii) providing a second transcribed text file associated with the audio file; (iii) receiving a user input defining a text segment corresponding to the audio segment to be located; (iv) searching for the text segment in the first transcribed text file; and (v) displaying only those occurrences of the text segment within the first transcribed text file that are also a match to occurrences of the text segment within the second transcribed text file.10-16-2008
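The cross-validation step in the entry above, keeping only occurrences of the search phrase that appear in both independent transcripts, can be sketched as follows. Matching by nearby character offset, and the tolerance value, are illustrative assumptions.

```python
def find_all(text, phrase):
    """Return the start offsets of every occurrence of `phrase` in `text`."""
    hits, start = [], 0
    while (i := text.find(phrase, start)) != -1:
        hits.append(i)
        start = i + 1
    return hits

def confirmed_occurrences(t1, t2, phrase, tolerance=10):
    """Keep occurrences in the first transcript that the second transcript
    also contains at a nearby offset (illustrative matching criterion)."""
    second = find_all(t2, phrase)
    return [i for i in find_all(t1, phrase)
            if any(abs(i - j) <= tolerance for j in second)]

t1 = "the patient reports chest pain and chest tightness"
t2 = "the patient reports chest pain and mild tightness"
hits = confirmed_occurrences(t1, t2, "chest")
```

Only the first "chest" survives, since the second transcript does not confirm the later occurrence; each surviving offset could then be mapped back to a position in the audio file.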
20080243500Automatic Editing Using Probabilistic Word Substitution Models - An input sequence of unstructured speech recognition text is transformed into output structured document text. A probabilistic word substitution model is provided which establishes association probabilities indicative of target structured document text correlating with source unstructured speech recognition text. The input sequence of unstructured speech recognition text is looked up in the word substitution model to determine likelihoods of the represented structured document text corresponding to the text in the input sequence. Then, a most likely sequence of structured document text is generated as an output.10-02-2008
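A word-substitution model of the kind described above can be sketched as a per-token lookup that picks the highest-probability structured form for each raw recognition token (a real system would score whole sequences; the table and probabilities here are illustrative).

```python
# Illustrative association probabilities from raw recognition tokens to
# structured document text.
SUBSTITUTIONS = {
    "period":        {".": 0.9, "period": 0.1},
    "new paragraph": {"\n\n": 0.95, "new paragraph": 0.05},
    "two":           {"2": 0.6, "two": 0.4},
}

def structure(tokens):
    """Replace each token with its most probable structured form."""
    out = []
    for tok in tokens:
        candidates = SUBSTITUTIONS.get(tok, {tok: 1.0})
        out.append(max(candidates, key=candidates.get))
    return out

result = structure(["the", "patient", "is", "two", "period"])
```

Note the model keeps the option of leaving a token unchanged ("two" as a word rather than a digit), which is why probabilities rather than fixed rules are used.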
20080221883HANDS FREE CONTACT DATABASE INFORMATION ENTRY AT A COMMUNICATION DEVICE - A method, system, and program provides for hands free contact database information entry at a communication device. A recording system at a communication device detects a user initiation to record. Responsive to detecting the user initiation to record, the recording system records the ongoing conversation supported between the communication device and a second remote communication device. The recording system converts the recording of the conversation into text. Next, the recording system extracts contact information from the text. Then, the recording system stores the extracted contact information in an entry of the contact database, such that contact information is added to the contact database of the communication device without manual entry of the contact information by the user.09-11-2008
20080221882SYSTEM FOR EXCLUDING UNWANTED DATA FROM A VOICE RECORDING - An apparatus and method for the preparation of a censored recording of an audio source according to a procedure whereby no tangible, durable version of the original audio data is created in the course of preparing the censored record. Further, a method is provided for identifying target speech elements in a primary speech text by iteratively using portions of already identified target elements to locate further target elements that contain identical portions. The target speech elements, once identified, are removed from the primary speech text or rendered unintelligible to produce a censored record of the primary speech text. Copies of such censored primary speech text elements may be transmitted and stored with reduced security precautions.09-11-2008
20080221881Recognition of Speech in Editable Audio Streams - A speech processing system divides a spoken audio stream into partial audio streams, referred to as “snippets.” The system may divide a portion of the audio stream into two snippets at a position at which the speaker performed an editing operation, such as pausing and then resuming recording, or rewinding and then resuming recording. The snippets may be transmitted sequentially to a consumer, such as an automatic speech recognizer or a playback device, as the snippets are generated. The consumer may process (e.g., recognize or play back) the snippets as they are received. The consumer may modify its output in response to editing operations reflected in the snippets. The consumer may process the audio stream while it is being created and transmitted even if the audio stream includes editing operations that invalidate previously-transmitted partial audio streams, thereby enabling shorter turnaround time between dictation and consumption of the complete audio stream.09-11-2008
20080221880Mobile music environment speech processing facility - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a music software application resident on a mobile communication facility, where speech may be recorded using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the music software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. In addition, the speech recognition facility may be adapted based on usage.09-11-2008
20080221879MOBILE ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a software application resident on a mobile communication facility, where recorded speech may be presented by the user using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility, and may be accompanied by information related to the software application. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the software application and the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the software application. In addition, the speech recognition facility may be adapted based on usage.09-11-2008
20130179165DYNAMIC PRESENTATION AID - Performing operations for dynamic display element management. The operations include receiving a verbal input. The operations also include automatically obtaining a display element from an element repository. The display element is a graphical representation of at least a portion of the verbal input. The display element includes a graphical image having a plurality of characteristics. The operations also include evaluating at least one of the plurality of characteristics relative to a present state of a display. The operations also include sending the display element to the display based on the evaluation of the present state of the display.07-11-2013
20130179166VOICE CONVERSION DEVICE, PORTABLE TELEPHONE TERMINAL, VOICE CONVERSION METHOD, AND RECORD MEDIUM - A portable-telephone terminal frees the user from repeatedly performing a correction process. A voice-conversion device includes a voice-recognition unit accepting a voice and converting the voice into a character string; a display unit displaying the character string; a correction unit accepting a correction command that causes a word or a phrase being a part of a character string displayed on the display unit to be corrected and correcting the word or phrase corresponding to the correction command; a storage unit storing a word or a phrase corrected by the correction unit; and a control unit generating a selection candidate corresponding to the corrected word or phrase of the character string and displaying the selection candidate as a recognition-result candidate of the voice on the display unit if the corrected word or phrase has been stored in the storage unit when the voice-recognition unit converts the voice into the character string.07-11-2013
20130173265Speech-to-online-text system - Speech-to-text software, sometimes known as dictation software, is software that lets you talk to the computer in some form and have the computer react appropriately to what you are saying. This is totally different from text-to-speech software, which is software that can read out text already in the computer. Speech-to-online-text software allows you to speak words into the webpage of an Internet-capable device. Speech-to-online-text software will also support the capabilities provided by speech-to-text software. The hardware required to support this technology is an Internet-capable device and a compatible microphone. This capability will be especially useful for communicating in different languages and dialects around the world.07-04-2013
20120253802Location-Based Conversational Understanding - Location-based conversational understanding may be provided. Upon receiving a query from a user, an environmental context associated with the query may be generated. The query may be interpreted according to the environmental context. The interpreted query may be executed and at least one result associated with the query may be provided to the user.10-04-2012
20120253801AUTOMATIC DETERMINATION OF AND RESPONSE TO A TOPIC OF A CONVERSATION - A system, computer-readable medium, and method for automatically determining a topic of a conversation and responding to the topic determination are provided. In the method, an active topic is defined as a first topic in response to execution of an application. The first topic includes first text defining a plurality of phrases, a probability of occurrence associated with each of the plurality of phrases, and a response associated with each of the plurality of phrases. Speech text recognized from a recorded audio signal is received. Recognition of the speech text is based at least partially on the probability of occurrence associated with each of the plurality of phrases of the first topic. A phrase of the plurality of phrases associated with the received speech text is identified. The response associated with the identified phrase is performed by the computing device. The response includes instructions defining an action triggered by occurrence of the received speech text, wherein the action includes defining the active topic as a second topic.10-04-2012
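The topic structure in the entry above, phrases with per-phrase occurrence probabilities (used to bias recognition) and an associated response action, can be sketched as a small lookup table. All phrases, probabilities, and action names are invented for illustration.

```python
# Illustrative active topic: phrase -> (occurrence probability, response action).
topic_weather = {
    "what is the forecast": {"prob": 0.6, "response": "show_forecast"},
    "will it rain":         {"prob": 0.4, "response": "show_rain_chance"},
}

def respond(active_topic, recognized_text):
    """Return the response action for a recognized phrase, or None."""
    entry = active_topic.get(recognized_text)
    return entry["response"] if entry else None

action = respond(topic_weather, "will it rain")
```

Switching the active topic to a second table changes both the recognition bias and the set of available responses, which matches the entry's notion of redefining the active topic.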
20080215321Pitch model for noise estimation - Pitch is tracked for individual samples, which are taken much more frequently than an analysis frame. Speech is identified based on the tracked pitch and the speech components of the signal are removed with a time-varying filter, leaving only an estimate of a time-varying speech signal. This estimate is then used to generate a time-varying noise model which, in turn, can be used to enhance speech related systems.09-04-2008
20130090924DEVICE, SYSTEM AND METHOD FOR ENABLING SPEECH RECOGNITION ON A PORTABLE DATA DEVICE - Devices, systems and methods for converting analog data to digital data or digital data to analog data for enabling speech recognition processing on a portable data device are provided. The system includes at least one portable data device including an input module configured to receive analog audio signals; a processing module configured to convert the analog audio signals to digital audio data; a communication module configured to transmit the digital audio data to a remote processor and to receive digital text data from the remote processor; and a display module for displaying the received digital text data; the remote processor configured for receiving digital audio data, converting the digital audio data to digital text data and transmitting the converted digital text data to the at least one portable data device; and a communications network for coupling the remote processor to the at least one portable data device.04-11-2013
20130096916MULTICHANNEL DEVICE UTILIZING A CENTRALIZED OUT-OF-BAND AUTHENTICATION SYSTEM (COBAS) - A multichannel security system is disclosed, which system is for granting and denying access to a host computer in response to a demand from an access-seeking individual and computer. The access-seeker has a peripheral device operative within an authentication channel to communicate with the security system. The access-seeker initially presents identification and password data over an access channel which is intercepted and transmitted to the security computer. The security computer then communicates with the access-seeker. A biometric analyzer—a voice or fingerprint recognition device—operates upon instructions from the authentication program to analyze the monitored parameter of the individual. In the security computer, a comparator matches the biometric sample with stored data, and, upon obtaining a match, provides authentication. The security computer instructs the host computer to grant access and communicates the same to the access-seeker, whereupon access is initiated over the access channel.04-18-2013
20130103399DETERMINING AND CONVEYING CONTEXTUAL INFORMATION FOR REAL TIME TEXT - Aspects relate to machine recognition of human voices in live or recorded audio content, and delivering text derived from such live or recorded content as real time text, with contextual information derived from characteristics of the audio. For example, volume information can be encoded as larger and smaller font sizes. Speaker changes can be detected and indicated through text additions, or color changes to the font. A variety of other context information can be detected and encoded in graphical rendition commands available through RTT, or by extending the information provided with RTT packets, and processing that extended information accordingly for modifying the display of the RTT text content.04-25-2013
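The abstract above encodes audio context (volume, speaker changes) as text-rendering changes. A toy sketch of one such mapping; the dB thresholds, font sizes, and markup are purely illustrative assumptions, not taken from the patent or from any RTT profile:

```python
def font_size_for_volume(volume_db, base_size=12):
    """Map a measured volume (dBFS; more negative = quieter) to a font
    size. Thresholds and sizes are illustrative only."""
    if volume_db > -10:          # loud / shouting
        return base_size + 6
    if volume_db > -25:          # normal speech
        return base_size
    return base_size - 2         # quiet speech

def render_rtt_word(word, volume_db, speaker_changed):
    """Return a toy markup string encoding volume and speaker changes."""
    size = font_size_for_volume(volume_db)
    prefix = "\n[new speaker] " if speaker_changed else ""
    return f'{prefix}<span size="{size}">{word}</span>'
```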
20130103400Document Transcription System Training - A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.04-25-2013
20130103401METHOD AND SYSTEM FOR SPEECH BASED DOCUMENT HISTORY TRACKING - A method and a system of history tracking corrections in a speech based document are disclosed. The speech based document comprises one or more sections of text recognized or transcribed from sections of speech, wherein the sections of speech are dictated by a user and processed by a speech recognizer in a speech recognition system into corresponding sections of text of the speech based document. The method comprises associating at least one speech attribute (04-25-2013
20110313764System and Method for Latency Reduction for Automatic Speech Recognition Using Partial Multi-Pass Results - A system and method is provided for reducing latency for automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text.12-22-2011
20130124204Displaying Sound Indications On A Wearable Computing System - Example methods and systems for displaying one or more indications that indicate (i) the direction of a source of sound and (ii) the intensity level of the sound are disclosed. A method may involve receiving audio data corresponding to sound detected by a wearable computing system. Further, the method may involve analyzing the audio data to determine both (i) a direction from the wearable computing system of a source of the sound and (ii) an intensity level of the sound. Still further, the method may involve causing the wearable computing system to display one or more indications that indicate (i) the direction of the source of the sound and (ii) the intensity level of the sound.05-16-2013
20130124202METHOD AND APPARATUS FOR PROCESSING SCRIPTS AND RELATED DATA - Provided in some embodiments is a method including receiving ordered script words indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that include matching consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including corresponding sub-sets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices. The alignment of the sub-matrices includes: matching script and dialogue words of the sub-sets, assigning timecodes for matched ordered script words, and interpolating timecodes for the unmatched script words based on the timecodes of the matched script words.05-16-2013
20130124203Aligning Scripts To Dialogues For Unmatched Portions Based On Matched Portions - Provided in some embodiments is a computer implemented method that includes providing script data including script words indicative of dialogue words to be spoken, providing recorded dialogue audio data corresponding to at least a portion of the dialogue words to be spoken, wherein the recorded dialogue audio data includes timecodes associated with recorded audio dialogue words, matching at least some of the script words to corresponding recorded audio dialogue words to determine alignment points, determining that a set of unmatched script words are accurate based on the matching of at least some of the script words matched to corresponding recorded audio dialogue words, generating time-aligned script data including the script words and their corresponding timecodes and the set of unmatched script words determined to be accurate based on the matching of at least some of the script words matched to corresponding recorded audio dialogue words.05-16-2013
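Both alignment entries above (20130124202 and 20130124203) share a core operation: match script words against timecoded dialogue words, then interpolate timecodes for unmatched script words between the matched anchors. A minimal sketch of that idea using Python's `difflib` for the matching step; the data layout is an assumption, not taken from the patents:

```python
import difflib

def align_script(script_words, dialogue):
    """Align script words to timecoded dialogue words and interpolate
    timecodes for script words with no match.

    dialogue: list of (word, timecode_seconds) pairs in spoken order.
    Returns a list of (script_word, timecode) pairs.
    """
    spoken = [w for w, _ in dialogue]
    times = {}  # script index -> matched timecode (the anchor points)
    matcher = difflib.SequenceMatcher(a=script_words, b=spoken, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.a + k] = dialogue[block.b + k][1]
    matched = sorted(times)
    result = []
    for i, word in enumerate(script_words):
        if i in times:
            result.append((word, times[i]))
            continue
        before = [j for j in matched if j < i]
        after = [j for j in matched if j > i]
        if before and after:        # interpolate between anchor timecodes
            lo, hi = before[-1], after[0]
            frac = (i - lo) / (hi - lo)
            t = times[lo] + frac * (times[hi] - times[lo])
        elif before:
            t = times[before[-1]]
        elif after:
            t = times[after[0]]
        else:
            t = 0.0                 # nothing matched at all
        result.append((word, t))
    return result
```

With script `["hello", "there", "world"]` and dialogue `[("hello", 0.0), ("world", 2.0)]`, the unmatched word "there" receives the interpolated timecode 1.0.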
20130132079INTERACTIVE SPEECH RECOGNITION - A first plurality of audio features associated with a first utterance may be obtained. A first text result associated with a first speech-to-text translation of the first utterance may be obtained based on an audio signal analysis associated with the audio features, the first text result including at least one first word. A first set of audio features correlated with at least a first portion of the first speech-to-text translation associated with the at least one first word may be obtained. A display of at least a portion of the first text result that includes the at least one first word may be initiated. A selection indication may be received, indicating an error in the first speech-to-text translation, the error associated with the at least one first word.05-23-2013
20130144619ENHANCED VOICE CONFERENCING - Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to enhance voice conferencing among multiple speakers. In one embodiment, the AEFS receives data that represents utterances of multiple speakers who are engaging in a voice conference with one another. The AEFS then determines speaker-related information, such as by identifying a current speaker, locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS then informs a user of the speaker-related information, such as by presenting the speaker-related information on a display of a conferencing device associated with the user.06-06-2013
20080201143SYSTEM AND METHOD FOR MULTI-MODAL AUDIO MINING OF TELEPHONE CONVERSATIONS - A system and method for the automated monitoring of inmate telephone calls as well as multi-modal search, retrieval and playback capabilities for said calls. A general term for such capabilities is multi-modal audio mining. The invention is designed to provide an efficient means for organizations such as correctional facilities to identify and monitor the contents of telephone conversations and to provide evidence of possible inappropriate conduct and/or criminal activity of inmates by analyzing monitored telephone conversations for events, including, but not limited to, the addition of third parties, the discussion of particular topics, and the mention of certain entities.08-21-2008
20080201142METHOD AND APPARATUS FOR AUTOMATIC CREATION OF AN INTERACTIVE LOG BASED ON REAL-TIME CONTENT08-21-2008
20110213613Automatic Language Model Update - A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model, includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.09-01-2011
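The first method in the abstract above revises a baseline model's word probabilities using recent search-query usage. One common way to do this is linear interpolation between the baseline distribution and the query-derived distribution; the sketch below illustrates that for a unigram model, with an interpolation weight that is an illustrative choice, not a value from the patent:

```python
from collections import Counter

def update_unigram_model(baseline, queries, weight=0.2):
    """Blend a baseline unigram model with word frequencies observed in
    recent search queries: p = (1 - weight) * p_base + weight * p_query.

    baseline: dict word -> probability; queries: list of query strings.
    """
    counts = Counter(w for q in queries for w in q.lower().split())
    total = sum(counts.values())
    vocab = set(baseline) | set(counts)
    updated = {}
    for w in vocab:
        p_base = baseline.get(w, 0.0)
        p_query = counts[w] / total if total else 0.0
        updated[w] = (1 - weight) * p_base + weight * p_query
    return updated
```

Because both input distributions sum to one, the interpolated distribution does as well.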
20130151250HYBRID SPEECH RECOGNITION - Described is a technology by which speech is locally and remotely recognized in a hybrid way. Speech is input and recognized locally, with remote recognition invoked if locally recognized speech data was not confidently recognized. The part of the speech that was not confidently recognized is sent to the remote recognizer, along with any confidently recognized text, which the remote recognizer may use as context data in interpreting the part of the speech data that was sent. Alternative text candidates may be sent instead of corresponding speech to the remote recognizer.06-13-2013
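The control flow the hybrid-recognition abstract describes (recognize locally, ship only low-confidence segments to the remote recognizer along with the confidently recognized text as context) can be sketched as below. The function signatures, confidence threshold, and context format are assumptions for illustration:

```python
def hybrid_recognize(segments, local_asr, remote_asr, threshold=0.8):
    """Recognize each audio segment locally; fall back to a remote
    recognizer for low-confidence segments, passing the confidently
    recognized neighbors along as context text.

    local_asr(segment) -> (text, confidence)
    remote_asr(segment, context_text) -> text
    """
    local = [local_asr(seg) for seg in segments]
    out = []
    for i, (text, conf) in enumerate(local):
        if conf >= threshold:
            out.append(text)
        else:
            context = " ".join(t for j, (t, c) in enumerate(local)
                               if c >= threshold and j != i)
            out.append(remote_asr(segments[i], context))
    return " ".join(out)
```

In a real system the "segments" would be audio buffers; here any opaque objects the two recognizer callables understand will do.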
20100286982System and Method for Automatic Merging of Multiple Time-Stamped Transcriptions - A system for automatically merging multiple time-stamped transcriptions is provided. The system includes a transcription server for receiving a signal having time-stamp information, a splitter, a merging utility, and a text output. A method for automatic merging of multiple time-stamped transcriptions comprises the following steps: transferring a signal having timestamp information encoded therein to a splitter which yields a mixed audio output having resultant corresponding audio channels, transferring the mixed audio output to a transcriber server which thereby yields one or more text outputs, and the text outputs being merged by a merging utility with the timestamps included in the signal thereby providing a single text file.11-11-2010
20110257973VEHICLE USER INTERFACE SYSTEMS AND METHODS - A control system for mounting in a vehicle and for providing information to a portable electronic device for processing by the portable electronic device is shown and described. The control system includes a first interface for communicating with the portable electronic device and a memory device. The control system also includes a processing circuit communicably coupled to the first interface and the memory device, the processing circuit configured to extract information from the memory device and to provide the information to the first interface so that the first interface communicates the information to the portable electronic device. The processing circuit is further configured to determine the capabilities of the portable electronic device based on data received from the portable electronic device via the first interface and to determine whether or not to communicate the information to the portable electronic device based on the determined capabilities.10-20-2011
20130151251AUTOMATIC DIALOG REPLACEMENT BY REAL-TIME ANALYTIC PROCESSING - An automated method and apparatus for automatic dialog replacement having an optional I/O interface converts an A/V stream into a format suitable for automated processing. The I/O interface feeds the A/V stream to a dubbing engine for generating new dubbed dialog from said A/V stream. A dubber/slicer replaces the original dialog with the new dubbed dialog in the A/V stream. The I/O interface then transmits the A/V stream that is enhanced with a new dubbed dialog.06-13-2013
20120259635Document Certification and Security System - A system for the storing of client information in an independent repository is disclosed. Client data may be uploaded by client or those authorized by client or collected and stored by the repository. Data about the client file such as, for example, the time of upload and modifications are stored in a metadata file associated with the client file.10-11-2012
20120259634MUSIC PLAYBACK DEVICE, MUSIC PLAYBACK METHOD, PROGRAM, AND DATA CREATION DEVICE - There is provided a music playback device comprising a playback unit configured to playback music, an analysis unit configured to analyze lyrics of the music and extract a word or a phrase included in the lyrics, an acquisition unit configured to acquire an image using the word or the phrase extracted by the analysis unit, and a display control unit configured to, during playback of the music, cause a display device to display the image acquired by the acquisition unit.10-11-2012
20130204618Methods and Systems for Dictation and Transcription - Automated delivery and filing of transcribed material prepared from dictated audio files into a central record-keeping system are presented. A user dictates information from any location, uploads that audio file to a transcriptionist to be transcribed, and the transcribed material is automatically delivered into a central record keeping system, filed with the appropriate client or matter file, and the data stored in the designated appropriate fields within those client or matter files. Also described is the recordation of meetings from multiple sources using mobile devices and the detection of the active or most prominent speaker at given intervals in the meeting. Further, text boxes on websites are completed using an audio recording application and offsite transcription.08-08-2013
20130204619SYSTEMS AND METHODS FOR VOICE-GUIDED OPERATIONS - A method includes transforming textual material data into a multimodal data structure including a plurality of classes selected from the group consisting of output, procedural information, and contextual information to produce transformed textual data, storing the transformed textual data on a memory device, retrieving, in response to a user request via a multimodal interface, requested transformed textual data and presenting the retrieved transformed textual data to the user via the multimodal interface.08-08-2013
20120284024Text Interface Device and Method in Voice Communication - A computerized communication device has a display screen, a mechanism for a user to select words or phrases displayed on the display screen, and software executing from a non-transitory physical medium, the software providing a function for providing audio signal output in a connected voice-telephone call from the text words or phrases selected by a user.11-08-2012
20130158993Audio User Interface With Audio Cursor - An audio user interface is provided in which items are represented in an audio field by corresponding synthesized sound sources from where sounds related to the items appear to emanate. An audio cursor, in the form of a synthesized sound source from which a distinctive cursor sound emanates, is movable in the audio field under user control. Upon the cursor being moved close to an item-representing sound source, a related audible indication is generated by modifying the sounds emanating from at least one of that item-representing sound source and the cursor. In one embodiment, this audible indication also indicates the current distance between the cursor and item-representing sound source and also the direction of the latter from the cursor.06-20-2013
20130158995METHODS AND APPARATUSES RELATED TO TEXT CAPTION ERROR CORRECTION - Systems and methods related to providing error correction in a text caption are disclosed. A method may comprise displaying a text caption including one or more blocks of text on each of a first device and a second device remote from the first device. The method may also include generating another block of text and replacing a block of text of the text caption with the another block of text. Furthermore, the method may include displaying the text caption on the second device having the block of text of the first text caption replaced by the another block of text.06-20-2013
20130158991METHODS AND SYSTEMS FOR COMMUNICATING AUDIO CAPTURED ONBOARD AN AIRCRAFT - Methods and systems are provided for communicating information from an aircraft to a computer system at a ground location. One exemplary method involves obtaining an audio input from an audio input device onboard the aircraft, generating text data comprising a textual representation of the one or more words of the audio input, and communicating the text data to the computer system at the ground location.06-20-2013
20130158992SPEECH PROCESSING SYSTEM AND METHOD - An exemplary speech processing method includes extracting voice features from stored audio files. Next, the method extracts the speech of a speaker from one or more audio files that contain voice features matching a selected voice model, to form a single audio file, implements a speech-to-text algorithm to create a textual file based on the single audio file, and further records time point(s). The method then associates each of the words in the converted text with a corresponding recorded time point. Next, the method searches for an input keyword in the converted textual file. The method further obtains the time point associated with the first word in the textual file that matches the keyword, and controls an audio play device to play the single audio file at the determined time point.06-20-2013
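The word-to-time-point lookup at the heart of the abstract above is a simple index from transcript words to their first playback times. A minimal sketch under the assumption that the transcript is available as (word, seconds) pairs:

```python
def build_word_index(words_with_times):
    """Map each word to the time point of its first appearance.

    words_with_times: list of (word, seconds) pairs in transcript order.
    """
    index = {}
    for word, t in words_with_times:
        index.setdefault(word.lower(), t)   # keep first occurrence only
    return index

def seek_time_for_keyword(index, keyword):
    """Return the playback start time for a keyword, or None if absent."""
    return index.get(keyword.lower())
```

The returned time point is what would be handed to the audio play device as the seek position.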
20130158994RETRIEVAL AND PRESENTATION OF NETWORK SERVICE RESULTS FOR MOBILE DEVICE USING A MULTIMODAL BROWSER - A method of obtaining information using a mobile device can include receiving a request including speech data from the mobile device, and querying a network service using query information extracted from the speech data, whereby search results are received from the network service. The search results can be formatted for presentation on a display of the mobile device. The search results further can be sent, along with a voice grammar generated from the search results, to the mobile device. The mobile device then can render the search results.06-20-2013
20130185069AMUSEMENT SYSTEM - A technique for allowing a virtual experience of more realistic live performance. A main apparatus reproduces music data and audience video data recording a video image of audience. A user holds a microphone and makes a live performance for the audience displayed on a monitor. The microphone sends voice data and motion information of the microphone to the main apparatus. The main apparatus determines that the user makes a live performance when the user calls on the audience with a specific phrase and performs an action corresponding to the specific phrase. The main apparatus reproduces reaction data recording a video image and sound indicating a reaction of the audience to the live performance.07-18-2013
20110320199METHOD AND APPARATUS FOR FUSING VOICED PHONEME UNITS IN TEXT-TO-SPEECH - According to one embodiment, an apparatus for fusing voiced phoneme units in Text-To-Speech, includes a reference unit selection module configured to select a reference unit from the plurality of units based on pitch cycle information of each unit and the number of pitch cycles of the target segment. The apparatus includes a template creation module configured to create a template based on the reference unit selected by the reference unit selection module and the number of pitch cycles of the target segment, wherein the number of pitch cycles of the template is the same as that of the target segment. The apparatus includes a pitch cycle alignment module configured to align pitch cycles of each unit of the plurality of units except the reference unit with pitch cycles of the template by using a dynamic programming algorithm.12-29-2011
20120290298SYSTEM AND METHOD FOR OPTIMIZING SPEECH RECOGNITION AND NATURAL LANGUAGE PARAMETERS WITH USER FEEDBACK - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for assigning saliency weights to words of an ASR model. The saliency values assigned to words within an ASR model are based on human perception judgments of previous transcripts. These saliency values are applied as weights to modify an ASR model such that the results of the weighted ASR model in converting a spoken document to a transcript provide a more accurate and useful transcription to the user.11-15-2012
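One way to picture the saliency weighting the abstract above describes is a transcript scoring function in which errors on perceptually important words cost more than errors on filler words. The sketch below is a simple bag-of-words stand-in for that weighting idea, not the patented method:

```python
def weighted_error_score(reference, hypothesis, saliency):
    """Score a hypothesis transcript against a reference, charging each
    missed or spurious word its saliency weight (default 1.0).

    saliency: dict word -> weight; higher = more perceptually important.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    score = 0.0
    for w in ref:
        if w not in hyp:                 # missed reference word
            score += saliency.get(w, 1.0)
    for w in hyp:
        if w not in ref:                 # spurious hypothesis word
            score += saliency.get(w, 1.0)
    return score
```

Dropping a high-saliency word such as "not" is penalized far more than dropping a low-weight word, which mirrors the human-perception judgments the abstract mentions.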
20120004911Method and Apparatus for Identifying Video Program Material or Content via Nonlinear Transformations - A system for identification of video content in a video signal is provided via a sound track audio signal. The audio signal is processed with filtering and nonlinear transformations to extract voice signals from the sound track channel. The extracted voice signals are coupled to a speech recognition system to provide, in text form, the words of the video content, which are later compared with a reference library of words or dialog from known video programs or movies. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and/or rendered video or pictures.01-05-2012
20120029918SYSTEMS AND METHODS FOR RECORDING, SEARCHING, AND SHARING SPOKEN CONTENT IN MEDIA FILES - Systems for recording, searching for, and sharing media files among a plurality of users are disclosed. The systems include a server that is configured to receive, index, and store a plurality of media files, which are received by the server from a plurality of sources, within at least one database in communication with the server. In addition, the server is configured to make one or more of the media files accessible to one or more persons—other than the original sources of such media files. Still further, the server is configured to transcribe the media files into text; receive and publish comments associated with the media files within a graphical user interface of a website; and allow users to query and play back excerpted portions of such media files.02-02-2012
20120029917APPARATUS AND METHOD FOR PROVIDING MESSAGES IN A SOCIAL NETWORK - A system that incorporates teachings of the present disclosure may include, for example, a server including a controller to receive audio signals and content identification information from a media processor, generate text representing a voice message based on the audio signals, determine an identity of media content based on the content identification information, generate an enhanced message having text and additional content where the additional content is obtained by the controller based on the identity of the media content, and transmit the enhanced message to the media processor for presentation on the display device, where the enhanced message is accessible by one or more communication devices that are associated with a social network and remote from the media processor. Other embodiments are disclosed.02-02-2012
20130197908Speech Processing in Telecommunication Networks - Systems and methods for speech processing in telecommunication networks are described. In some embodiments, a method may include receiving speech transmitted over a network, causing the speech to be converted to text, and identifying the speech as predetermined speech in response to the text matching a stored text associated with the predetermined speech. The stored text may have been obtained, for example, by subjecting the predetermined speech to a network impairment condition. The method may further include identifying terms within the text that match terms within the stored text (e.g., despite not being identical to each other), calculating a score between the text and the stored text, and determining that the text matches the stored text in response to the score meeting a threshold value. In some cases, the method may also identify one of a plurality of speeches based on a selected one of a plurality of stored texts.08-01-2013
20130197909METHODS AND SYSTEMS FOR CORRECTING TRANSCRIBED AUDIO FILES - Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data.08-01-2013
20130197910System and Method for Audible Text Center Subsystem - A system, method, and computer-readable storage device for sending a spoken message as a text message. The method includes initiating a connection with a first subscriber, receiving from the first subscriber a spoken message and spoken disambiguating information associated with at least one recipient address. The method further includes converting the spoken message to text via an audible text center subsystem (ATCS), and delivering the text to the recipient address. The method can also include verifying a subscription status of the first subscriber, or delivering the text to the recipient address based on retrieved preferences of the first subscriber. The preferences can be retrieved from a consolidated network repository or embedded within the spoken message. Text and the spoken message can be delivered to the same or different recipient addresses. The method can include updating recipient addresses based on a received oral command from the first subscriber.08-01-2013
20130197911Method and System For Endpoint Automatic Detection of Audio Record - A method and system for endpoint automatic detection of audio record is provided. The method comprises the following steps: acquiring an audio record text and determining the text endpoint acoustic model for the audio record text; acquiring the audio record data frame by frame, starting from the audio record start frame in the audio record data; determining the characteristics acoustic model of the decoding optimal path for the acquired current frame of the audio record data; comparing the characteristics acoustic model of the decoding optimal path acquired from the current frame of the audio record data with the endpoint acoustic model to determine whether they are the same; if so, updating a mute duration threshold with a second time threshold, wherein the second time threshold is less than a first time threshold. This method can improve the efficiency of recognizing the audio record endpoint.08-01-2013
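The threshold-switching loop the abstract above describes can be sketched as follows: silence accumulates frame by frame against a long threshold, and once the decoder's best path reaches the endpoint acoustic model for the prompt text, a shorter silence threshold takes over. The per-frame encoding and the threshold values below are illustrative assumptions:

```python
def detect_endpoint(frames, long_silence=60, short_silence=20):
    """Return the index of the frame at which recording stops, or None.

    frames: list of (is_silence, at_text_end) flags per audio frame,
    where at_text_end means the decoding optimal path has matched the
    endpoint acoustic model of the prompt text. Thresholds are in frames.
    """
    threshold = long_silence
    silent_run = 0
    for i, (is_silence, at_text_end) in enumerate(frames):
        if at_text_end:
            threshold = short_silence   # end of prompt reached: cut sooner
        silent_run = silent_run + 1 if is_silence else 0
        if silent_run >= threshold:
            return i
    return None
```

The early cutoff once the expected text has been fully decoded is what yields the improved endpoint-recognition efficiency the abstract claims.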
20120035924DISAMBIGUATING INPUT BASED ON CONTEXT - In one implementation, a computer-implemented method includes receiving, at a mobile computing device, ambiguous user input that indicates more than one of a plurality of commands; and determining a current context associated with the mobile computing device that indicates where the mobile computing device is currently located. The method can further include disambiguating the ambiguous user input by selecting a command from the plurality of commands based on the current context associated with the mobile computing device; and causing output associated with performance of the selected command to be provided by the mobile computing device.02-09-2012
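At its simplest, the context-based disambiguation described above is a two-level lookup: the ambiguous utterance selects a set of candidate commands, and the device's current context picks one of them. A toy sketch; the table contents and the "*" default convention are illustrative assumptions:

```python
def disambiguate(utterance, context, command_table):
    """Pick a command for an ambiguous utterance using the device's
    current context (e.g. where it is located).

    command_table: {utterance: {context: command, "*": default_command}}
    """
    candidates = command_table.get(utterance, {})
    return candidates.get(context, candidates.get("*"))
```

For example, "play" might map to the radio in a car but to music playback elsewhere.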

Patent applications in class Speech to image