Nuance Communications, Inc. Patent applications |
Patent application number | Title | Published |
20160140957 | Speech Recognition Semantic Classification Training - An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data substantially without manually transcribed in-domain training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels from at least one source of already collected language data. When the adaptation training data satisfies a pre-determined adaptation criterion, the semantic classification system is automatically retrained based on the adaptation training data. | 05-19-2016 |
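The collect-then-retrain loop described in this abstract can be sketched as follows. The minimum-count criterion, threshold value, and function names are illustrative assumptions, not taken from the filing:

```python
# Sketch of the retraining trigger: accumulate labeled adaptation
# utterances, and retrain the classifier once a pre-determined
# criterion (here: a minimum count) is satisfied.
# The threshold value and all names are illustrative assumptions.

MIN_ADAPTATION_UTTERANCES = 500  # hypothetical criterion

def should_retrain(adaptation_data):
    """Return True when the adaptation data satisfies the criterion."""
    return len(adaptation_data) >= MIN_ADAPTATION_UTTERANCES

def collect_and_maybe_retrain(classifier, labeled_utterances, retrain_fn):
    """Accumulate labeled utterances; retrain when the criterion is met."""
    classifier.setdefault("adaptation_data", []).extend(labeled_utterances)
    if should_retrain(classifier["adaptation_data"]):
        retrain_fn(classifier, classifier["adaptation_data"])
        classifier["adaptation_data"] = []  # start a fresh collection cycle
        return True
    return False
```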
20160098393 | NATURAL LANGUAGE UNDERSTANDING (NLU) PROCESSING BASED ON USER-SPECIFIED INTERESTS - Methods and apparatus for natural language understanding (NLU) processing based on user-specified interests. Information specifying a weight for each of a plurality of domains is received via a user interface. The plurality of domains each relates to a potential area of interest for the user, and the weight for a domain from among the plurality of domains indicates a level of interest for the user in the domain. A ranking classifier used to rank NLU hypotheses generated by an NLU engine is trained using training data from which features are, at least in part, based on the information specifying a weight for each of the plurality of domains. | 04-07-2016 |
20160077792 | METHODS AND APPARATUS FOR UNSUPERVISED WAKEUP - Methods and apparatus for unsupervised wakeup of a device including receiving a first acoustic event at a first time and a second acoustic event at a second time, wherein scores of the first and second acoustic events are above a first threshold identifying the first and second acoustic events as wakeup candidates for a wakeup phrase for an unsupervised wakeup of a device. It is determined that the first acoustic event is above a second threshold, which is higher than the first threshold, and that the second acoustic event is above a third threshold, which is higher than the first threshold. Occurrence of a wakeup event can be determined based upon acoustic similarity of the events. | 03-17-2016 |
20160077574 | Methods and Apparatus for Unsupervised Wakeup with Time-Correlated Acoustic Events - Methods and apparatus for unsupervised wakeup of a device including receiving a first acoustic event at a first time and a second acoustic event at a second time, wherein the first and second acoustic events have scores above a first threshold identifying the first and second acoustic events as wakeup candidates for a wakeup phrase for an unsupervised wakeup of a device. It can be determined that the first acoustic event score is below a second threshold, which is higher than the first threshold, and whether a difference between the first and second times is within a range to check for correlation in time between the first and second acoustic events. Occurrence of a wakeup event can be determined based upon the first and second times. | 03-17-2016 |
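The two wakeup filings above share a common decision structure: events scoring above a first threshold become wakeup candidates, and either a single high-scoring candidate or two candidates correlated in time can confirm a wakeup. A minimal sketch follows; all threshold and window values are assumptions for illustration:

```python
# Illustrative two-stage wakeup check: candidate gating on a first
# threshold, confirmation either by a single confident event or by
# two candidates that are close together in time.
# All numeric values are hypothetical.

FIRST_THRESHOLD = 0.5    # candidate gate
SECOND_THRESHOLD = 0.9   # single-event confirmation
TIME_WINDOW = 2.0        # seconds; correlation window for two candidates

def is_wakeup(score1, time1, score2, time2):
    """Decide whether two scored acoustic events constitute a wakeup."""
    # Both events must first qualify as candidates.
    if score1 <= FIRST_THRESHOLD or score2 <= FIRST_THRESHOLD:
        return False
    # A sufficiently confident single event wakes the device directly.
    if score1 > SECOND_THRESHOLD or score2 > SECOND_THRESHOLD:
        return True
    # Otherwise require the two candidates to be correlated in time.
    return abs(time2 - time1) <= TIME_WINDOW
```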
20160059775 | METHODS AND APPARATUS FOR PROVIDING DIRECTION CUES TO A DRIVER - Methods and apparatus for generating, for a GPS system-directed route, a visual cue signal to activate a visual indicator at a location in a vehicle passenger compartment corresponding to a direction of an upcoming event in the route and generating an audio signal to activate a sound source at a location in the vehicle passenger compartment corresponding to the direction of the upcoming event for providing spatial information to a user. In one embodiment, a validation signal can be generated to activate a confirmation indicator upon receiving information that the user has navigated the event in accordance with the route. | 03-03-2016 |
20160035370 | Formant Dependent Speech Signal Enhancement - An arrangement is described for speech signal processing. An input microphone signal is received that includes a speech signal component and a noise component. The microphone signal is transformed into a frequency domain set of short-term spectra signals. Then speech formant components within the spectra signals are estimated based on detecting regions of high energy density in the spectra signals. One or more dynamically adjusted gain factors are applied to the spectra signals to enhance the speech formant components. | 02-04-2016 |
20160035348 | Speech-Based Search Using Descriptive Features of Surrounding Objects - A natural language query arrangement is described for a mobile environment. An automatic speech recognition (ASR) engine can process an unknown speech input from a user to produce corresponding recognition text. A natural language understanding module can extract natural language concept information from the recognition text. A query classifier uses the recognition text and the natural language concept information to assign to the speech input a query intent related to one or more objects in the mobile environment. An environment database contains information descriptive of objects in the mobile environment. A query search engine searches the environment database based on the query intent, the natural language concept information, and the recognition text to determine corresponding search results, which can be presented to the user. | 02-04-2016 |
20160026608 | Method and Apparatus for Generating Multimodal Dialog Applications by Analyzing Annotated Examples of Human-System Conversations - Designing a dialog application is a difficult task that typically requires a complete understanding of the dialog framework and a high level of expertise to map system requirements to the actual implementations. In contrast, determining the logic of the dialog application via sample interaction is typically very simple and efficient. A developer can describe via speech or text what the operations of the application are, effectively writing dialog samples. Methods described herein reverse the way dialog applications are designed by obtaining annotated dialog samples and defined concepts related to a requested dialog application; analyzing the annotated dialog samples, defined concepts, and one or more relationships between or among the defined concepts; and generating an executable dialog application based on the analysis of the annotated dialog samples and the defined concepts. | 01-28-2016 |
20160019907 | System For Automatic Speech Recognition And Audio Entertainment - In one aspect, the present application is directed to a device for providing different levels of sound quality in an audio entertainment system. The device includes a speech enhancement system with a reference signal modification unit and a plurality of acoustic echo cancellation filters. Each acoustic echo cancellation filter is coupled to a playback channel. The device includes an audio playback system with loudspeakers. Each loudspeaker is coupled to a playback channel. At least one of the speech enhancement system and the audio playback system operates according to a full sound quality mode and a reduced sound quality mode. In the full sound quality mode, all of the playback channels contain non-zero output signals. In the reduced sound quality mode, a first subset of the playback channels contains non-zero output signals and a second subset of the playback channels contains zero output signals. | 01-21-2016 |
20160005086 | METHOD OF COMPENSATING A PROVIDER FOR ADVERTISEMENTS DISPLAYED ON A MOBILE PHONE - A method and apparatus for advertising on a mobile phone. In one embodiment the method includes the steps of downloading an advertisement to the mobile phone using an advertisement server; selecting the downloaded advertisement on the mobile phone by a user of the mobile phone; providing by a server additional information in response to the user selection; and tracking the selection and additional information by the server. In another embodiment the compensation provided is in response to the display screen of said advertisement. In another embodiment the step of providing additional information includes the step of using space reserved, in the user interface of the mobile phone, for advertisements. In another aspect, the invention relates to a system for displaying advertisements on a mobile phone. In one embodiment the system includes a server; and a mobile phone in communication with said server. | 01-07-2016 |
20160005076 | METHOD OF COMPENSATING A PROVIDER FOR ADVERTISEMENTS DISPLAYED ON A MOBILE PHONE - A method and apparatus for advertising on a mobile phone. In one embodiment the method includes the steps of downloading an advertisement to the mobile phone using an advertisement server; selecting the downloaded advertisement on the mobile phone by a user of the mobile phone; providing by a server additional information in response to the user selection; and tracking the selection and additional information by the server. In another embodiment the compensation provided is in response to the display screen of said advertisement. In another embodiment the step of providing additional information includes the step of using space reserved, in the user interface of the mobile phone, for advertisements. In another aspect, the invention relates to a system for displaying advertisements on a mobile phone. In one embodiment the system includes a server; and a mobile phone in communication with said server. | 01-07-2016 |
20150356647 | USER AND ENGINE CODE HANDLING IN MEDICAL CODING SYSTEM - Techniques for medical coding include applying a natural language understanding (NLU) engine to a free-form text documenting a clinical patient encounter, to derive a first set of one or more medical billing codes for the clinical patient encounter and a link between each code in the first set and a corresponding portion of the free-form text. The first set of codes may be compared to a second set of one or more medical billing codes approved by one or more human users for the patient encounter, to identify at least one code in the first set that overlaps with at least one code in the second set. The code in the second set approved by the one or more human users may be retained instead of the overlapping code in the first set derived by the NLU engine. | 12-10-2015 |
20150356646 | MEDICAL CODING SYSTEM WITH INTEGRATED CODEBOOK INTERFACE - Techniques for use in medical coding include applying a natural language understanding engine to a free-form text documenting at least one clinical patient encounter to generate a set of one or more medical billing codes for the patient encounter. A user interface may be provided, configured to allow one or more human users to review and correct the generated set of medical billing codes. Within the user interface, in response to user selection of a first medical billing code of the generated set of medical billing codes, at least a portion of a government-authorized codebook for the first medical billing code may be displayed, and a position of the first medical billing code may be indicated in the displayed portion of the codebook. | 12-10-2015 |
20150356260 | NLU TRAINING WITH USER CORRECTIONS TO ENGINE ANNOTATIONS - Techniques for training a natural language understanding (NLU) engine may include generating a first annotation of free-form text documenting a healthcare patient encounter and a link between the first annotation and a corresponding portion of the text, using the NLU engine. A second annotation of the text and a link between the second annotation and a corresponding portion of the text may be received from a human user. The first annotation and its corresponding link may be merged with the second annotation and its corresponding link. Training data may be provided to the engine in the form of the text and the merged annotations and links. | 12-10-2015 |
20150356246 | MEDICAL CODING SYSTEM WITH CDI CLARIFICATION REQUEST NOTIFICATION - Techniques are provided whereby a clarification request may be generated with a clinical documentation improvement (CDI) system for resolution by a clinician, and notification of the clarification request may be transmitted to a medical coding system. At a medical coding system, notification may be received of a clarification request generated at a CDI system for resolution by a clinician. In some embodiments, the medical coding system may be a computer-assisted coding (CAC) system. | 12-10-2015 |
20150356198 | RICH FORMATTING OF ANNOTATED CLINICAL DOCUMENTATION, AND RELATED METHODS AND APPARATUS - Systems and methods for producing and presenting annotations of clinical documents in a rich format are described, for instance for use with medical billing procedures. An initial XHTML document documenting a medical patient encounter and having rich formatting is used to generate a plain text document. A clinical language understanding system generates annotations, such as medical codes, which are used to annotate the XHTML document. The annotated XHTML document is then presented to a user, thus displaying for the user the annotations while retaining the rich formatting of the initial XHTML document. | 12-10-2015 |
20150356057 | NLU TRAINING WITH MERGED ENGINE AND USER ANNOTATIONS - Techniques for training a natural language understanding (NLU) engine may include generating a first annotation of free-form text documenting a healthcare patient encounter and a link between the first annotation and a corresponding portion of the text, using the NLU engine. A second annotation of the text and a link between the second annotation and a corresponding portion of the text may be received from a human user. The first annotation and its corresponding link may be merged with the second annotation and its corresponding link. Training data may be provided to the engine in the form of the text and the merged annotations and links. | 12-10-2015 |
20150347067 | VOICE AND TOUCH BASED MOBILE PRINT AND SCAN FRAMEWORK - Initiating document management in an enterprise network from outside of the network can challenge information technology (IT) specialists because many solutions require altering security of the enterprise network. In an embodiment, a method includes polling, from an automated agent in an agent-side network, a server in a cloud-side network for a request to access a document management resource of the agent-side network via an interface between the agent-side network and cloud-side network. The method further includes, responsive to the polling, downloading the request via the interface between the agent-side network and the cloud-side network. The method additionally includes issuing the request to the document management resource to cause the document management resource to access a document stored on a device of the agent-side network and perform an operation associated with the request. The method, therefore, enables a user to access the document management resource from outside of an enterprise network. | 12-03-2015 |
20150340042 | METHODS AND APPARATUS FOR DETECTING A VOICE COMMAND - According to some aspects, a method of monitoring an acoustic environment of a mobile device, at least one computer readable medium encoded with instructions that, when executed, perform such a method and/or a mobile device configured to perform such a method is provided. The method comprises receiving, by the mobile device, acoustic input from the environment of the mobile device, detecting whether the acoustic input includes a voice command from a user without requiring receipt of an explicit trigger from the user, and initiating responding to the detected voice command. | 11-26-2015 |
20150332673 | REVISING LANGUAGE MODEL SCORES BASED ON SEMANTIC CLASS HYPOTHESES - Techniques for improved speech recognition disclosed herein include applying a statistical language model to a free-text input utterance to obtain a plurality of candidate word sequences for automatic speech recognition of the input utterance, each of the plurality of candidate word sequences having a corresponding initial score generated by the statistical language model. For one or more of the plurality of candidate word sequences, each of the one or more candidate word sequences may be analyzed to generate one or more hypotheses for a semantic class of at least one token in the respective candidate word sequence. The initial scores generated by the statistical language model for at least the one or more candidate word sequences may be revised based at least in part on the one or more hypotheses for the semantic class of the at least one token in each of the one or more candidate word sequences. | 11-19-2015 |
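The rescoring idea in the abstract above can be sketched as follows: each candidate word sequence keeps its statistical-language-model score, which is then revised by a bonus when a token is hypothesized to belong to a semantic class. The class lexicon and bonus weight below are illustrative assumptions:

```python
# Sketch of semantic-class rescoring: revise each candidate's initial
# language-model score by a bonus per token that receives a semantic
# class hypothesis, then re-rank. Lexicon and weight are hypothetical.

SEMANTIC_CLASSES = {"boston": "CITY", "denver": "CITY", "monday": "DAY"}
CLASS_BONUS = 0.5  # hypothetical log-score bonus per classified token

def revise_scores(candidates):
    """candidates: list of (word_sequence, initial_score) pairs.

    Returns the list re-sorted by the revised scores, best first.
    """
    revised = []
    for words, score in candidates:
        hypotheses = [SEMANTIC_CLASSES[w] for w in words if w in SEMANTIC_CLASSES]
        revised.append((words, score + CLASS_BONUS * len(hypotheses)))
    return sorted(revised, key=lambda pair: pair[1], reverse=True)
```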
20150310863 | METHOD AND APPARATUS FOR SPEAKER DIARIZATION - A method and apparatus records at a first mobile device, separately, each of an upstream component and a downstream component of speech data associated with users of the first mobile device and a second mobile device in a full-duplex communication system. Speech endpointing is performed on each recorded component to delimit speech chunks in each component using timing information common to both components. The speech chunks are converted to text chunks using at least one automatic speech recognition process and the text chunks are displayed, based on the timing information, in chronological order on a graphical user interface of the first mobile device as diarized text. | 10-29-2015 |
20150309984 | LEARNING LANGUAGE MODELS FROM SCRATCH BASED ON CROWD-SOURCED USER TEXT INPUT - Technology is described for developing a language model for a language recognition system from scratch based on aggregating and analyzing text input from multiple users of the language. The technology allows a user to select a language, and if no existing language model is available for the selected language, provides a new language model for the selected language, monitors and collects information about the use of words in the selected language, combines information collected from multiple users of the selected language, and updates the user's language model based on the combined information from multiple users of the selected language. | 10-29-2015 |
20150304502 | SYSTEM AND METHOD FOR AUDIO CONFERENCING - The present disclosure is directed towards an audio conferencing method. Some embodiments may include receiving, at a first mixing device, a first audio stream from one or more participant conferencing devices. The method may further include generating a top-N voice stream at the first mixing device, wherein the top-N voice stream corresponds with at least one top-N talker and wherein the identification of the at least one top-N talker is based upon, at least in part, an activity ranking. The method may also include receiving the top-N voice stream at a centralized mixing device and generating at least one mixed audio stream at the centralized mixing device. | 10-22-2015 |
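The first-stage mixing described above can be sketched as follows: the mixer keeps only the streams of the N most active talkers, ranked by an activity score, and sums them for forwarding to the centralized mixer. The value of N and the activity metric are illustrative assumptions:

```python
# Sketch of top-N mixing: rank talkers by activity, keep the N most
# active, and sum their sample streams. N and the scoring are
# hypothetical choices for illustration.

TOP_N = 2

def select_top_n(streams):
    """streams: dict mapping talker id -> (activity_score, samples).

    Returns the talker ids of the top-N most active talkers.
    """
    ranked = sorted(streams, key=lambda t: streams[t][0], reverse=True)
    return ranked[:TOP_N]

def mix(streams):
    """Sum the sample lists of the top-N talkers into one mixed stream."""
    top = select_top_n(streams)
    length = max(len(streams[t][1]) for t in top)
    return [sum(streams[t][1][i] for t in top if i < len(streams[t][1]))
            for i in range(length)]
```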
20150302865 | SYSTEM AND METHOD FOR AUDIO CONFERENCING - The present disclosure is directed towards an audio conferencing method. Some embodiments may include receiving, at a first mixing device, an audio signal from a first user associated with an audio conference. Embodiments may further include processing the audio signal at the first mixing device to generate a processed audio signal and transmitting the processed audio signal to a second mixing device, wherein the first mixing device and the second mixing device are distributed over a network in a cascaded configuration. Embodiments may also include receiving, at the second mixing device, a third audio signal from a second user associated with the audio conference and processing the third audio signal at the second mixing device to generate a second processed audio signal. | 10-22-2015 |
20150295946 | SYSTEM AND METHOD FOR HANDLING ROGUE DATA PACKETS - The present disclosure is directed towards a system and method for handling rogue data packets. The method may include receiving, using one or more processors, a first data packet having header information associated therewith. The method may further include obtaining, from the header information, sequence number, timestamp and synchronization source identifier information. The method may also include detecting one or more rogue data packets, based upon, at least in part, at least one of the sequence number, timestamp and synchronization source identifier information. | 10-15-2015 |
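One plausible reading of the rogue-packet check above is that a packet is flagged when its RTP-style header fields are inconsistent with the established stream: a wrong synchronization source (SSRC), a sequence number far outside the expected window, or a timestamp that jumps backwards. This sketch and its window size are assumptions, not the filing's actual logic:

```python
# Sketch of rogue-packet detection from header fields: flag a packet
# whose SSRC, sequence number, or timestamp is inconsistent with the
# established stream. The tolerance value is a hypothetical choice.

MAX_SEQ_JUMP = 100  # hypothetical tolerance for reordering/loss

def is_rogue(packet, expected_ssrc, last_seq, last_timestamp):
    """packet: dict with 'ssrc', 'seq', and 'timestamp' header fields."""
    if packet["ssrc"] != expected_ssrc:
        return True  # packet belongs to a different source
    if abs(packet["seq"] - last_seq) > MAX_SEQ_JUMP:
        return True  # sequence number far outside the expected window
    if packet["timestamp"] < last_timestamp:
        return True  # timestamp moved backwards
    return False
```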
20150287401 | PRIVACY-SENSITIVE SPEECH MODEL CREATION VIA AGGREGATION OF MULTIPLE USER MODELS - Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data is devoid of any information that could be human-readable or machine readable such as to enable reconstruction of audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users. | 10-08-2015 |
20150281456 | METHOD OF PROVIDING VOICEMAILS TO A WIRELESS INFORMATION DEVICE - Voicemail is received at a voicemail server and converted to an audio file format; it is then sent or streamed over a wide area network to a voice to text transcription system comprising a network of computers. One of the networked computers plays back the voice message to an operator and the operator intelligently transcribes the actual message from the original voice message by entering the corresponding text message (actually a succinct version of the original voice message, not a verbose word-for-word conversion) into the computer to generate a transcribed text message. The transcribed text message is then sent to the wireless information device from the computer. | 10-01-2015 |
20150279352 | HYBRID CONTROLLER FOR ASR - A mobile device is described which is adapted for automatic speech recognition (ASR). A speech input receives an unknown speech input signal from a user. A local controller determines if a remote ASR processing condition is met, transforms the speech input signal into a selected one of multiple different speech representation types, and sends the transformed speech input signal to a remote server for remote ASR processing. A local ASR arrangement performs local ASR processing of the speech input including processing any speech recognition results received from the remote server. | 10-01-2015 |
20150277752 | PROVIDING FOR TEXT ENTRY BY A USER OF A COMPUTING DEVICE - A method and system for receiving text input via a computing device generates a graphical text element interface showing text elements arranged to provide for efficient selection by a user. Text elements show a single character, a group of characters, words, or phrases. By selecting a text element, the user submits text to the computing device. The system may identify text elements to display based at least in part on a previous selection of a text element by a user. | 10-01-2015 |
20150262580 | Text Formatter with Intuitive Customization - A computer implemented method and system of formatting text output from a speech recognition system is provided. The method includes determining if a user correction to a text output from a speech recognition system can be accomplished by changing a formatting setting associated with the speech recognition system. The formatting setting is changed based on an inferential indication that the change to the formatting setting is acceptable to the user and/or an explicit confirmation from the user that the change to the formatting setting is acceptable. | 09-17-2015 |
20150262575 | META-DATA INPUTS TO FRONT END PROCESSING FOR AUTOMATIC SPEECH RECOGNITION - A computer-implemented method is described for front end speech processing for automatic speech recognition. A sequence of speech features which characterize an unknown speech input provided on an audio input channel and associated meta-data data which characterize the audio input channel are received. The speech features are transformed with a computer process that uses a trained mapping function controlled by the meta-data, and automatic speech recognition is performed of the transformed speech features. | 09-17-2015 |
20150248883 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for the domain. In some embodiments, one or more of the recognition results may be evaluated to determine whether the result(s) include one or more words or phrases that, when included in a result, would change a meaning of the result in a manner that would be significant for the domain. | 09-03-2015 |
20150248882 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in a domain to which speech input relates. | 09-03-2015 |
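The error-flagging idea shared by the two filings above can be sketched as comparing the top recognition result against each alternative and flagging alternatives that differ in a domain-meaningful way. The list of meaningful pairs below (a few medically flavored confusions) is an illustrative assumption:

```python
# Sketch of significant-error detection: flag an alternative result
# when its word-level difference from the top result matches a
# domain-meaningful confusion. The pair list is hypothetical.

MEANINGFUL_DIFFERENCES = {frozenset({"hypertension", "hypotension"}),
                          frozenset({"15", "50"})}

def flag_significant_errors(top_result, alternatives):
    """Return the alternatives whose difference from the top result
    is significant for the domain."""
    flagged = []
    top_words = set(top_result.split())
    for alt in alternatives:
        # Symmetric difference: words present in only one of the two results.
        diff = top_words ^ set(alt.split())
        if frozenset(diff) in MEANINGFUL_DIFFERENCES:
            flagged.append(alt)
    return flagged
```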
20150242387 | AUTOMATED TEXT ANNOTATION FOR CONSTRUCTION OF NATURAL LANGUAGE UNDERSTANDING GRAMMARS - Aspects described herein provide various approaches to annotating text samples in order to construct natural language grammars. A text sample may be selected for annotation. A set of annotation candidates may be generated based on the text sample. A classifier may be used to score the set of annotation candidates in order to obtain a set of annotation scores. One of the annotation candidates may be selected as a suggested annotation for the text sample based on the set of annotation scores. A grammar rule may be derived based on the suggested annotation, and a grammar may be configured to include the annotation-derived grammar rule. | 08-27-2015 |
20150220853 | TECHNIQUES FOR EVALUATION, BUILDING AND/OR RETRAINING OF A CLASSIFICATION MODEL - Techniques for evaluation and/or retraining of a classification model built using labeled training data. In some aspects, a classification model having a first set of weights is retrained by using unlabeled input to reweight the labeled training data to have a second set of weights, and by retraining the classification model using the labeled training data weighted according to the second set of weights. In some aspects, a classification model is evaluated by building a similarity model that represents similarities between unlabeled input and the labeled training data and using the similarity model to evaluate the labeled training data to identify a subset of the plurality of items of labeled training data that is more similar to the unlabeled input than a remainder of the labeled training data. | 08-06-2015 |
20150207931 | Automated Task Definitions - Task analysis of an interactive communication system (ICS) can be performed manually. Manual task analysis is a costly and time-consuming process. In an embodiment, a method of defining tasks within an ICS includes identifying hub node(s) to be marked as unavailable from consideration as nodes within a task. The at least one hub node can be within a directed graph representing flows through an ICS. The method further includes, from available nodes, automatically identifying a connected subgraph that corresponds to nodes representing an area of functionality defining a task within the ICS. The method additionally includes repeating the identifying of the connected subgraph at least one time. The method also includes outputting an indicator of the at least one hub node identified and the connected subgraphs that represent corresponding areas of functionality defining respective tasks. Therefore, task analysis is improved by extracting task definitions from the graph data automatically. | 07-23-2015 |
20150206536 | DIFFERENTIAL DYNAMIC CONTENT DELIVERY WITH TEXT DISPLAY - Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users. | 07-23-2015 |
20150206527 | FEATURE NORMALIZATION INPUTS TO FRONT END PROCESSING FOR AUTOMATIC SPEECH RECOGNITION - A computer-implemented method is described for front end speech processing for automatic speech recognition. A sequence of speech features which characterize an unknown speech input is received with a computer process. A first subset of the speech features is normalized with a computer process using a first feature normalizing function. A second subset of the speech features is normalized with a computer process using a second feature normalizing function different from the first feature normalizing function. The normalized speech features in the first and second subsets are combined with a computer process to produce a sequence of mixed normalized speech features for automatic speech recognition. | 07-23-2015 |
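The mixed normalization described above can be sketched as follows: one subset of the speech features is normalized with one function, the remainder with a different one, and the results are recombined in their original order. The choice of normalizing functions and of the subset split is an illustrative assumption:

```python
# Sketch of mixed feature normalization: apply different normalizing
# functions to two subsets of a feature sequence and recombine them.
# Both functions here are hypothetical stand-ins.

def mean_normalize(values):
    """Subtract the mean so the subset is zero-centered."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def max_normalize(values):
    """Scale by the peak magnitude so values lie in [-1, 1]."""
    peak = max(abs(v) for v in values) or 1.0
    return [v / peak for v in values]

def mix_normalize(features, first_subset):
    """features: list of feature values; first_subset: set of indices
    normalized with the first function (the rest use the second)."""
    first = [features[i] for i in sorted(first_subset)]
    second = [features[i] for i in range(len(features)) if i not in first_subset]
    norm_first = iter(mean_normalize(first))
    norm_second = iter(max_normalize(second))
    # Reassemble the mixed normalized sequence in the original order.
    return [next(norm_first) if i in first_subset else next(norm_second)
            for i in range(len(features))]
```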
20150194148 | Methodology for Enhanced Voice Search Experience - Arrangements are described for reducing response latency in intelligent personal assistant applications. While receiving a user request, preemptive responses are automatically prepared for a received portion of the user request. Partial classification word candidates are generated for words in the received portion of the user request, and then a predictive component is applied to generate extended classification word candidates that include the partial classification word candidates and additional classification word candidates. A preliminary search is performed of the extended classification word candidates to prepare the preemptive responses. While the input request continues, the preemptive responses are updated, and when the input request ends, the prepared preemptive responses are used to respond to the user request. | 07-09-2015 |
20150180809 | SELECTION OF A LINK IN A RECEIVED MESSAGE FOR SPEAKING REPLY, WHICH IS CONVERTED INTO TEXT FORM FOR DELIVERY - A link, called an X-Link™, is placed in a message (SMS, MMS, email, etc.) that is sent to a user and displayed on their device (e.g. a mobile telephone). When the link is selected by the user, it connects the user's device to a conversion system, enabling the user to speak a reply which is then converted to a text-based reply message; the reply message is then sent to the original message sender (and/or another appropriate recipient). This approach enables a text message to be responded to by voice: it is an example of asymmetric communication. There are many circumstances where this approach is very useful, for example if the message is an SMS and the recipient does not know how to respond using SMS, or is in an environment where typing is difficult (perhaps when walking or driving). | 06-25-2015 |
20150179165 | SYSTEM AND METHOD FOR CALLER INTENT LABELING OF THE CALL-CENTER CONVERSATIONS - Labeling a call, for instance by identifying the intent of the caller (i.e., the reason why the caller has called into the call center) in a conversation between the caller and an agent, is a useful task for efficient customer relationship management (CRM). In an embodiment, a method of labeling sentences for presentation to a human can include selecting an intent-bearing excerpt from sentences, presenting the intent-bearing excerpt to the human, and enabling the human to apply a label to each sentence based on the presentation of the intent-bearing excerpt. The method can reduce a manual labeling budget while increasing the accuracy of labeling models based on manual labeling. | 06-25-2015 |
20150179164 | PATTERN PROCESSING SYSTEM SPECIFIC TO A USER GROUP - Methods and apparatus for identifying a user group in connection with user group-based speech recognition. An exemplary method comprises receiving, from a user, a user group identifier that identifies a user group to which the user was previously assigned based on training data. The user group comprises a plurality of individuals including the user. The method further comprises using the user group identifier, identifying a pattern processing data set corresponding to the user group, and receiving speech input from the user to be recognized using the pattern processing data set. | 06-25-2015 |
20150161995 | LEARNING FRONT-END SPEECH RECOGNITION PARAMETERS WITHIN NEURAL NETWORK TRAINING - Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure. | 06-11-2015 |
20150161994 | Method and Apparatus for Speech Recognition Using Neural Networks with Speaker Adaptation - In a speech recognition system, deep neural networks (DNNs) are employed in phoneme recognition. While DNNs typically provide better phoneme recognition performance than other techniques, such as Gaussian mixture models (GMM), adapting a DNN to a particular speaker is a real challenge. According to at least one example embodiment, speech data and corresponding speaker data are both applied as input to a DNN. In response, the DNN generates a prediction of a phoneme based on the input speech data and the corresponding speaker data. The speaker data may be generated from the corresponding speech data. | 06-11-2015 |
20150156587 | Wind Noise Detection For In-Car Communication Systems With Multiple Acoustic Zones - An in-car communication (ICC) system has multiple acoustic zones having varying acoustic environments. At least one input microphone within at least one acoustic zone develops a corresponding microphone signal from one or more system users. At least one loudspeaker within at least one acoustic zone provides acoustic audio to the system users. A wind noise module makes a determination of when wind noise is present in the microphone signal and modifies the microphone signal based on the determination. | 06-04-2015 |
20150154981 | Voice Activity Detection (VAD) for a Coded Speech Bitstream without Decoding - A system, method and computer program product are described for voice activity detection (VAD) within a digitally encoded bitstream. A parameter extraction module is configured to extract parameters from a sequence of coded frames from a digitally encoded bitstream containing speech. A VAD classifier is configured to operate with input of the digitally encoded bitstream to evaluate each coded frame based on bitstream coding parameter classification features to output a VAD decision indicative of whether or not speech is present in one or more of the coded frames. | 06-04-2015 |
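The bitstream-domain VAD idea above can be illustrated with a toy classifier: per-frame coding parameters drive a speech/non-speech decision without decoding the bitstream to audio samples. The `"energy"` parameter name, the threshold, and the hangover length are invented for illustration; this is a sketch of the general technique, not the claimed classifier.

```python
# Toy bitstream-domain VAD: classify each coded frame from its coding
# parameters alone (no decode to PCM). The "energy" key, threshold, and
# hangover length are illustrative assumptions.

def vad_with_hangover(frames, threshold=0.3, hangover=2):
    """Per-frame speech decision with a hangover that keeps the decision
    'speech' for a few frames after the parameter drops, smoothing gaps."""
    decisions, hang = [], 0
    for frame in frames:
        if frame["energy"] > threshold:
            hang = hangover          # reset hangover on an active frame
            decisions.append(True)
        elif hang > 0:
            hang -= 1                # coast through a brief low-energy gap
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

A real system would of course use the codec's actual coding parameters (e.g. gains or spectral features) rather than a single scalar.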
20150149174 | DIFFERENTIAL ACOUSTIC MODEL REPRESENTATION AND LINEAR TRANSFORM-BASED ADAPTATION FOR EFFICIENT USER PROFILE UPDATE TECHNIQUES IN AUTOMATIC SPEECH RECOGNITION - A computer-implemented method is described for speaker adaptation in automatic speech recognition. Speech recognition data from a particular speaker is used for adaptation of an initial speech recognition acoustic model to produce a speaker adapted acoustic model. A speaker dependent differential acoustic model is determined that represents differences between the initial speech recognition acoustic model and the speaker adapted acoustic model. In addition, an approach is also disclosed to estimate speaker-specific feature or model transforms over multiple sessions. This is achieved by updating the previously estimated transform using only adaptation statistics of the current session. | 05-28-2015 |
20150134335 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated using one or more sets of words and/or phrases, such as pairs of words/phrases that may include words/phrases that are acoustically similar to one another and/or that, when included in a result, would change a meaning of the result in a manner that would be significant for a domain. The recognition results may be evaluated using the set(s) of words/phrases to determine, when the top result includes a word/phrase from a set of words/phrases, whether any of the alternative recognition results includes any of the other, corresponding words/phrases from the set. | 05-14-2015 |
20150127374 | METHOD TO ASSIGN WORD CLASS INFORMATION - An assignment device ( | 05-07-2015 |
20150127351 | Noise Dependent Signal Processing For In-Car Communication Systems With Multiple Acoustic Zones - A speech communication system includes a speech service compartment for holding one or more system users. The speech service compartment includes a plurality of acoustic zones having varying acoustic environments. At least one input microphone is located within the speech service compartment for developing microphone input signals from the one or more system users. At least one loudspeaker is located within the service compartment. An in-car communication (ICC) system receives and processes the microphone input signals, forming loudspeaker output signals that are provided to one or more of the loudspeakers. The ICC system includes at least one of a speaker-dedicated signal processing module and a listener-specific signal processing module that controls the processing of the microphone input signals and/or the forming of the loudspeaker output signals based, at least in part, on at least one of an associated acoustic environment(s) and resulting psychoacoustic effect(s). | 05-07-2015 |
20150120305 | SPEECH COMMUNICATION SYSTEM FOR COMBINED VOICE RECOGNITION, HANDS-FREE TELEPHONY AND IN-CAR COMMUNICATION - A multi-mode speech communication system is described that has different operating modes for different speech applications. A speech service compartment contains multiple system users, multiple input microphones that develop microphone input signals from the system users to the system, and multiple output loudspeakers that develop loudspeaker output signals from the system to the system users. A signal processing module is in communication with the speech applications and includes an input processing module and an output processing module. The input processing module processes the microphone input signals to produce a set of user input signals for each speech application that are limited to currently active system users for that speech application. The output processing module processes application output communications from the speech applications to produce loudspeaker output signals to the system users, wherein for each different speech application, the loudspeaker output signals are directed only to system users currently active in that speech application. The signal processing module dynamically controls the processing of the microphone input signals and the loudspeaker output signals to respond to changes in currently active system users for each application. | 04-30-2015 |
20150120275 | TRANSLATING BETWEEN SPOKEN AND WRITTEN LANGUAGE - Techniques for converting spoken language into written language are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from the input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance. | 04-30-2015 |
20150112896 | Cross-Channel Content Translation Engine - An embodiment according to the present invention addresses reusability and alignment of content across channels in a multi-channel virtual assistant, by allowing users to define content on one channel and then have the content fully or partially translated for the other channels using a mix of pre-defined static rules, dynamic rules or machine learning. Content translation is provided based on communications channels, and content translation is performed from one to many formats, optionally in real time. Performing content translation using machine learning provides an advantage that as users work, content translation becomes more precise and covers more elements. | 04-23-2015 |
20150106101 | METHOD AND APPARATUS FOR PROVIDING SPEECH OUTPUT FOR SPEECH-ENABLED APPLICATIONS - Techniques for providing speech output for speech-enabled applications. A synthesis system receives from a speech-enabled application a text input including a text transcription of a desired speech output. The synthesis system selects one or more audio recordings corresponding to one or more portions of the text input. In one aspect, the synthesis system selects from audio recordings provided by a developer of the speech-enabled application. In another aspect, the synthesis system selects an audio recording of a speaker speaking a plurality of words. The synthesis system forms a speech output including the one or more selected audio recordings and provides the speech output for the speech-enabled application. | 04-16-2015 |
20150088955 | MOBILE APPLICATION DAILY USER ENGAGEMENT SCORES AND USER PROFILES - A system logs application usage data on a mobile device, processes the data on an analysis system, and outputs a current and predicted score to, e.g., third parties. The system logs application-related usage data, which is collected via, e.g., a keyboard application running in the background on the mobile device. The system then evaluates the logged usage data and the events corresponding to a particular application. The events can be analyzed to score the user's engagement level with the application; e.g., the more events recorded for a given application per day, the more engaged a user is with that application. The engagement level can further be predicted based on historical usage log data, from which a score decay model can be generated. | 03-26-2015 |
20150088760 | AUTOMATIC INJECTION OF SECURITY CONFIRMATION - Technology is described to monitor incoming channels or messages for a security confirmation code, capture a received confirmation code, recognize a designated field or other destination opportunity to enter a security confirmation code, and automatically inject the captured code to the recognized destination. Various other aspects of the technology are described herein. | 03-26-2015 |
20150088519 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered. | 03-26-2015 |
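The substitution-and-rescoring loop described in the abstract above can be sketched as follows. The confusable sets, the toy bigram "language model," and the alert threshold are all invented for illustration; a real system would use a trained language model over the target domain.

```python
# Sketch of confusable-phrase checking: swap each word from a confusable set
# for its alternatives and score the resulting string with a (toy) language
# model. CONFUSABLE_SETS, BIGRAM_SCORE, and the threshold are assumptions.

CONFUSABLE_SETS = [
    {"hypertension", "hypotension"},   # acoustically similar, medically distinct
    {"fifteen", "fifty"},
]

# Toy bigram scores standing in for a real language model.
BIGRAM_SCORE = {
    ("has", "hypertension"): 0.9,
    ("has", "hypotension"): 0.7,
    ("take", "fifteen"): 0.2,
    ("take", "fifty"): 0.8,
}

def lm_score(words):
    """Average bigram score of a word sequence (toy stand-in for an LM)."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(BIGRAM_SCORE.get(p, 0.1) for p in pairs) / len(pairs)

def potential_errors(result_words, threshold=0.5):
    """Return (word, alternative) pairs where the substituted string is also
    plausible under the LM, i.e. a possible significant misrecognition."""
    alerts = []
    for i, word in enumerate(result_words):
        for conf_set in CONFUSABLE_SETS:
            if word in conf_set:
                for alt in conf_set - {word}:
                    candidate = result_words[:i] + [alt] + result_words[i + 1:]
                    if lm_score(candidate) >= threshold:
                        alerts.append((word, alt))
    return alerts
```

If the substituted string is likely enough in the domain, the recognizer's choice may be an error worth flagging, which is exactly the alert condition the abstract describes.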
20150088517 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence. | 03-26-2015 |
20150088516 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence. | 03-26-2015 |
20150088507 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence. | 03-26-2015 |
20150088500 | WEARABLE COMMUNICATION ENHANCEMENT DEVICE - Embodiments disclosed herein may include a wearable apparatus including a frame having a memory and processor associated therewith. The apparatus may include a camera associated with the frame and in communication with the processor, the camera configured to track an eye of a wearer. The apparatus may also include at least one microphone associated with the frame. The at least one microphone may be configured to receive a directional instruction from the processor. The directional instruction may be based upon an adaptive beamforming analysis performed in response to a detected eye movement from the camera. The apparatus may also include a speaker associated with the frame configured to provide an audio signal received at the at least one microphone to the wearer. | 03-26-2015 |
20150073785 | METHOD FOR VOICEMAIL QUALITY DETECTION - A system and method for speech quality detection is included. The method may include receiving, at a computing device, a first speech signal associated with a particular user. The method may include extracting one or more short-term features from the first speech signal wherein extracting short-term features includes extracting a time frame of between 10-50 ms. The method may also include determining one or more statistics of each of the one or more short-term features from the first speech signal. The method may further include classifying the one or more statistics as belonging to one of a set of quality classes. | 03-12-2015 |
20150073780 | METHOD FOR NON-INTRUSIVE ACOUSTIC PARAMETER ESTIMATION - A system and method for non-intrusive acoustic parameter estimation is included. The method may include receiving, at a computing device, a first speech signal associated with a particular user. The method may include extracting one or more short-term features from the first speech signal. The method may also include determining one or more statistics of each of the one or more short-term features from the first speech signal. The method may further include classifying the one or more statistics as belonging to one or more acoustic parameter classes. | 03-12-2015 |
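The short-term-feature pipeline shared by the two abstracts above (frame the signal into 10-50 ms windows, extract per-frame features, summarize with statistics, classify) might look roughly like this. The energy feature, the 25 ms frame length, and the quality threshold are illustrative assumptions, not the patented feature set.

```python
# Minimal short-term-feature sketch: frame a signal, compute a per-frame
# feature, reduce to a statistic, and map to a toy quality class.

def frame_signal(samples, sample_rate, frame_ms=25):
    """Split a 1-D sample list into non-overlapping frames of `frame_ms` ms
    (25 ms falls inside the 10-50 ms range the abstract mentions)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def classify_quality(samples, sample_rate, energy_floor=1e-4):
    """Summarize frame energies with their mean and threshold-classify into
    a hypothetical quality class ('low' vs 'ok')."""
    energies = [short_term_energy(f) for f in frame_signal(samples, sample_rate)]
    mean_e = sum(energies) / len(energies)
    return "ok" if mean_e >= energy_floor else "low"
```

A real classifier would use several statistics (variance, percentiles) of several features and a trained model rather than a single threshold.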
20150066512 | Method and Apparatus for Detecting Synthesized Speech - Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection result in enhanced security for computer systems employing speaker verification. | 03-05-2015 |
20150066485 | Method and System for Dictionary Noise Removal - A method and system of removing noise from a dictionary using a weighted graph is presented. The method can include mapping, by a noise reducing agent executing on a processor, a plurality of dictionaries to a plurality of vertices of a graphical representation, wherein the plurality of vertices is connected by weighted edges representing noise. The plurality of dictionaries may further comprise a plurality of entries, wherein each entry further comprises a plurality of tokens. The method can include selecting a subset of the weighted edges, constructing an acyclic graphical representation from the selected subset of weighted edges, and determining an ordering based on the acyclic graphical representation. The selected subset of weighted edges may approximate a solution to the Maximum Acyclic Subgraph problem. The method can include removing noise from the plurality of dictionaries according to the determined ordering. | 03-05-2015 |
20150058018 | MULTIPLE PASS AUTOMATIC SPEECH RECOGNITION METHODS AND APPARATUS - In some aspects, a method of recognizing speech that comprises natural language and at least one word specified in at least one domain-specific vocabulary is provided. The method comprises performing a first speech processing pass comprising identifying, in the speech, a first portion including the natural language and a second portion including the at least one word specified in the at least one domain-specific vocabulary, and recognizing the first portion including the natural language. The method further comprises performing a second speech processing pass comprising recognizing the second portion including the at least one word specified in the at least one domain-specific vocabulary. | 02-26-2015 |
20150051910 | Unsupervised Clustering of Dialogs Extracted from Released Application Logs - A natural language understanding system performs automatic unsupervised clustering of dialog data from a natural language dialog application. A log parser automatically extracts structured dialog data from application logs. A dialog generalizing module generalizes the extracted dialog data to generalization identifier vectors. A data clustering module automatically clusters the dialog data based on the generalization identifier vectors using an unsupervised density-based clustering algorithm without a predefined number of clusters and without a predefined distance threshold in an iterative approach based on a hierarchical ordering of the generalization. | 02-19-2015 |
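A bare-bones version of density-based clustering with no predefined number of clusters, in the spirit of the abstract above, could look like the sketch below. The abstract also avoids a predefined distance threshold via an iterative approach; this simplification uses fixed `eps`/`min_pts`, and the Hamming distance over identifier vectors is an assumption.

```python
# DBSCAN-style clustering of generalization identifier vectors: clusters
# emerge from density, noise points get label -1, and no cluster count is
# supplied in advance. eps/min_pts and the distance are illustrative.

def hamming(a, b):
    """Number of differing positions between two equal-length vectors."""
    return sum(x != y for x, y in zip(a, b))

def density_cluster(vectors, eps=1, min_pts=2):
    """Assign each vector a cluster id; -1 marks noise."""
    labels = [None] * len(vectors)
    cluster = 0
    for i, v in enumerate(vectors):
        if labels[i] is not None:
            continue
        neighbors = [j for j, w in enumerate(vectors) if hamming(v, w) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1           # not dense enough: provisional noise
            continue
        labels[i] = cluster          # core point: grow a new cluster
        stack = list(neighbors)
        while stack:
            j = stack.pop()
            if labels[j] in (None, -1):
                labels[j] = cluster
                more = [k for k, w in enumerate(vectors)
                        if hamming(vectors[j], w) <= eps]
                if len(more) >= min_pts:
                    stack.extend(k for k in more if labels[k] is None)
        cluster += 1
    return labels
```

Dialogs whose generalized identifier vectors fall in dense regions end up grouped without anyone specifying how many dialog types exist.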
20150046168 | Method and Apparatus for a Multi I/O Modality Language Independent User-Interaction Platform - Automated user-machine interaction is gaining traction in many applications and services. However, implementing and offering smart automated user-machine interaction services still present technical challenges. According to at least one example embodiment, a dialogue manager is configured to handle multiple dialogue applications independent of the language, the input modalities, or output modalities used. The dialogue manager employs a generic semantic representation of user-input data. At a step of a dialogue, the dialogue manager determines whether the user-input data is indicative of a new request or a refinement request based on the generic semantic representation and at least one of a maintained state of the dialogue, general knowledge data representing one or more concepts, and data representing the history of the dialogue. The dialogue manager then responds to the determined user request with multi-facet output data to a client dialogue application indicating action(s) to be performed. | 02-12-2015 |
20150046162 | DEVICE, SYSTEM, AND METHOD OF LIVENESS DETECTION UTILIZING VOICE BIOMETRICS - Device, system, and method of liveness detection using voice biometrics. For example, a method comprises: generating a first matching score based on a comparison between: (a) a voice-print from a first text-dependent audio sample received at an enrollment stage, and (b) a second text-dependent audio sample received at an authentication stage; generating a second matching score based on a text-independent audio sample; and generating a liveness score by taking into account at least the first matching score and the second matching score. | 02-12-2015 |
20150046157 | User Dedicated Automatic Speech Recognition - A multi-mode voice controlled user interface is described. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering. The user interface switches listening modes in response to one or more switching cues. | 02-12-2015 |
20150039854 | VECTORIZED LOOKUP OF FLOATING POINT VALUES - Systems and techniques disclosed herein include methods for de-quantization of feature vectors used in automatic speech recognition. A SIMD vector processor is used in one embodiment for efficient vectorized lookup of floating point values in conjunction with fMPE processing for increasing the discriminative power of input signals. These techniques exploit parallelism to effectively reduce the latency of speech recognition in a system operating in a high dimensional feature space. In one embodiment, a bytewise integer lookup operation effectively performs a floating point or a multiple byte lookup. | 02-05-2015 |
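The de-quantization lookup described above reduces to indexing a precomputed float table with each feature byte; a SIMD processor performs many such lookups per instruction, while this plain-Python sketch shows only the logic. The table contents are illustrative.

```python
# Bytewise de-quantization via table lookup: every quantized feature byte
# maps to a precomputed float, so no floating-point arithmetic is needed at
# lookup time. The linear table below is a hypothetical quantization scheme.

# 256-entry table mapping byte value -> float in [0.0, 1.0].
DEQUANT_TABLE = [b / 255.0 for b in range(256)]

def dequantize(feature_bytes):
    """Replace every quantized byte with its float value via table lookup."""
    return [DEQUANT_TABLE[b] for b in feature_bytes]
```

On a SIMD vector unit the same table lookup is applied to a whole vector of bytes at once, which is where the latency reduction the abstract mentions comes from.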
20150032449 | Method and Apparatus for Using Convolutional Neural Networks in Speech Recognition - Speech recognition techniques are employed in a variety of applications and services serving large numbers of users. As such, there is an increasing demand for speech recognition systems with enhanced performance. Specifically, enhanced performance in large vocabulary continuous speech recognition (LVCSR) systems is a market demand. Herein, convolutional neural networks (CNNs) are explored as an alternative speech recognition approach and different CNN architectures are tested. According to at least one example embodiment, a method and corresponding apparatus for performing speech recognition comprise employing a CNN with at least two convolutional layers and at least two fully-connected layers in speech recognition. Using the CNN, a textual representation of the input audio data may be provided based on the CNN's output data. | 01-29-2015 |
20150032442 | METHOD AND APPARATUS FOR SELECTING AMONG COMPETING MODELS IN A TOOL FOR BUILDING NATURAL LANGUAGE UNDERSTANDING MODELS - Selecting a grammar for use in a machine question-answering system, such as a Natural Language Understanding System, can be difficult for non-experts in such grammars. A tool, according to an example embodiment, can compare annotations of sample sentences, performed correctly by a human, the annotations having intents and mentions, against annotations performed by multiple grammars. Each grammar can be scored, and the system can select the best scored grammar for the user. In one embodiment, a method of selecting a grammar includes comparing manually-generated annotations against machine-generated annotations as a function of a given grammar among multiple grammars. The method can further include applying scores to the machine-generated annotations that are a function of weightings of the intents and mentions. The method can additionally include recommending whether to employ the given grammar based on the scores. | 01-29-2015 |
20150032441 | Initializing a Workspace for Building a Natural Language Understanding System - Designing a natural language understanding (NLU) model for an application from scratch can be difficult for non-experts. A system can simplify the design process by providing an interface allowing a designer to input example usage sentences and build an NLU model based on presented matches to those example sentences. In one embodiment, a method for initializing a workspace for building an NLU system includes parsing a sample sentence to select at least one candidate stub grammar from among multiple candidate stub grammars. The method can include presenting, to a user, respective representations of the candidate stub grammars selected by the parsing of the sample sentence. The method can include enabling the user to choose one of the respective representations of the candidate stub grammars. The method can include adding to the workspace a stub grammar corresponding to the representation of the candidate stub grammar chosen by the user. | 01-29-2015 |
20150025891 | METHOD AND SYSTEM FOR TEXT-TO-SPEECH SYNTHESIS WITH PERSONALIZED VOICE - A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input ( | 01-22-2015 |
20150025888 | SPEAKER RECOGNITION AND VOICE TAGGING FOR IMPROVED SERVICE - A method of enabling speaker identification, the method comprising receiving an identifier, the identifier having a limited number of potential speakers associated with it, processing speech data received from a speaker, and, when the speaker is recognized, tagging the speaker and displaying a speaker identity. The method further comprises, when the speaker is not recognized, prompting an associate to identify the speaker. | 01-22-2015 |
20140348308 | Method And System For Speaker Verification - In many scenarios, speaker verification systems can be given a single-channel audio with recordings of multiple speakers. To perform accurate speaker verification, a system can isolate the speech of a speaker. In one embodiment, a method, and corresponding system, of speaker verification includes extracting a target speaker's speech, using a known speaker voiceprint, from an audio recording that includes the target speaker's speech and the known speaker's speech. The known speaker voiceprint can correspond to the known speaker. Extracting the target speaker's speech can include determining portions of the audio recording where the known speaker voiceprint matches the known speaker's speech above a particular threshold, and extracting the target speaker's speech from other portions of the audio recording. In this manner, speaker verification is performed on the target speaker's speech without interference from the known speaker's speech and allows for a more accurate verification. | 11-27-2014 |
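The extraction step in the abstract above can be sketched as: score each frame embedding against the known speaker's voiceprint and keep only the frames below the match threshold, leaving the target speaker's speech. Cosine similarity and the 0.8 threshold are assumptions for illustration, not the patented scoring method.

```python
# Toy speaker-extraction sketch: frames matching the known voiceprint above
# a threshold are attributed to the known speaker and dropped; the rest are
# kept as the target speaker's speech for verification.

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def extract_target_frames(frame_embeddings, known_voiceprint, threshold=0.8):
    """Keep only frames that do NOT match the known speaker's voiceprint."""
    return [f for f in frame_embeddings
            if cosine(f, known_voiceprint) < threshold]
```

Verification then runs on the surviving frames only, so the known speaker's speech no longer interferes with the target speaker's score.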
20140343948 | SYSTEM AND METHOD FOR PROVIDING NETWORK COORDINATED CONVERSATIONAL SERVICES - A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service. | 11-20-2014 |
20140337016 | Speech Signal Enhancement Using Visual Information - Visual information is used to alter or set an operating parameter of an audio signal processor, other than a beamformer. A digital camera captures visual information about a scene that includes a human speaker and/or a listener. The visual information is analyzed to ascertain information about acoustics of a room. A distance between the speaker and a microphone may be estimated, and this distance estimate may be used to adjust an overall gain of the system. Distances among, and locations of, the speaker, the listener, the microphone, a loudspeaker and/or a sound-reflecting surface may be estimated. These estimates may be used to estimate reverberations within the room and adjust aggressiveness of an anti-reverberation filter, based on an estimated ratio of direct to indirect (reverberated) sound energy expected to reach the microphone. In addition, orientation of the speaker or the listener, relative to the microphone or the loudspeaker, can also be estimated, and this estimate may be used to adjust frequency-dependent filter weights to compensate for uneven frequency propagation of acoustic signals from a mouth, or to a human ear, about a human head. | 11-13-2014 |
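The distance-to-gain adjustment mentioned above has a simple back-of-envelope form under a free-field assumption: direct sound level falls roughly 6 dB per doubling of distance, so a compensating gain can be computed from the estimated speaker-to-microphone distance relative to a reference distance. Both the free-field model and the 0.5 m reference below are illustrative assumptions, not the patented estimator.

```python
# Free-field gain compensation: restore the level expected at a reference
# distance from a camera-estimated speaker-to-microphone distance.

import math

def compensating_gain_db(distance_m, reference_m=0.5):
    """Gain (dB) that offsets the ~6 dB-per-doubling free-field level drop,
    e.g. a speaker at 1.0 m needs about +6 dB relative to 0.5 m."""
    return 20.0 * math.log10(distance_m / reference_m)
```

A fuller model would also fold in the estimated direct-to-reverberant energy ratio before setting the anti-reverberation filter's aggressiveness, as the abstract describes.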
20140324477 | METHOD AND SYSTEM FOR GENERATING A MEDICAL REPORT AND COMPUTER PROGRAM PRODUCT THEREFOR - A method and a system for generating, with the assistance of a computer system, a medical report. | 10-30-2014 |
20140324434 | SYSTEMS AND METHODS FOR PROVIDING METADATA-DEPENDENT LANGUAGE MODELS - Techniques for generating language models. The techniques include: obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data; identifying, by processing the language data using at least one processor, a set of one or more of the metadata attributes to use for clustering the instances of training data into a plurality of clusters; clustering the training data instances based on their respective values for the identified set of metadata attributes into the plurality of clusters; and generating a language model for each of the plurality of clusters. | 10-30-2014 |
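The metadata-dependent language-model abstract above describes clustering training instances by their metadata attribute values and building one model per cluster. A minimal sketch of that pipeline, using toy unigram models and illustrative attribute names not taken from the patent, could be:

```python
from collections import Counter, defaultdict

def cluster_by_metadata(instances, attrs):
    """Group training-data instances by their values for the chosen
    metadata attributes; one cluster per distinct value tuple."""
    clusters = defaultdict(list)
    for inst in instances:
        key = tuple(inst["meta"][a] for a in attrs)
        clusters[key].append(inst["text"])
    return clusters

def unigram_lm(texts):
    """Toy per-cluster language model: unigram relative frequencies."""
    counts = Counter(w for t in texts for w in t.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

instances = [
    {"text": "play some jazz", "meta": {"app": "music", "locale": "en-US"}},
    {"text": "pause the song", "meta": {"app": "music", "locale": "en-US"}},
    {"text": "navigate home", "meta": {"app": "nav", "locale": "en-US"}},
]
clusters = cluster_by_metadata(instances, ["app"])
models = {key: unigram_lm(texts) for key, texts in clusters.items()}
```

A production system would substitute n-gram or neural models for the unigram counts, but the clustering step is the same.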
20140316785 | SPEECH RECOGNITION SYSTEM INTERACTIVE AGENT - A speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. A number of different speech models for different languages are used to support and detect a language spoken by a user. In some implementations an interactive electronic agent responds in the user's language to facilitate a real-time, human like dialogue. | 10-23-2014 |
20140309993 | SYSTEM AND METHOD FOR DETERMINING QUERY INTENT - A method for training a system is provided. The method may include storing one or more backend communication logs, each of the one or more backend communication logs including a user query and a corresponding backend query. The method may further include parsing the one or more backend communication logs to extract statistical information and generating a mapping between each user query and a corresponding set of language tags. The method may also include sorting the one or more backend communication logs based upon, at least in part, the extracted statistical information. | 10-16-2014 |
20140307883 | METHOD FOR DETERMINING A SET OF FILTER COEFFICIENTS FOR AN ACOUSTIC ECHO COMPENSATOR - Methods and apparatus for beamforming and performing echo compensation for the beamformed signal with an echo canceller including calculating a set of filter coefficients as an estimate for a new steering direction without a complete adaptation of the echo canceller. | 10-16-2014 |
20140303976 | METHOD AND SYSTEM FOR DYNAMIC CREATION OF CONTEXTS - A method and a system for a speech recognition system in which an electronic speech-based document is associated with a document template and comprises one or more sections of text recognized or transcribed from sections of speech. The sections of speech are transcribed by the speech recognition system into corresponding sections of text of the electronic speech-based document. The method includes the steps of dynamically creating sub-contexts and associating the sub-contexts with sections of text of the document template. | 10-09-2014 |
20140297283 | Concept Cloud in Smart Phone Applications - An automated arrangement is described for conducting natural language interactions with a human user. A user interface is provided for user communication in a given active natural language interaction with a natural language application during an automated dialog session. An automatic speech recognition (ASR) engine processes unknown user speech inputs from the user interface to produce corresponding speech recognition results. A natural language concept module processes the speech recognition results to develop corresponding natural language concept items. A concept item storage holds selected concept items for reuse in a subsequent natural language interaction with the user during the automated dialog session. | 10-02-2014 |
20140297282 | Auto-Generation of Parsing Grammars from a Concept Ontology - An ontology stores information about a domain of an automatic speech recognition (ASR) application program. The ontology is augmented with information that enables subsequent automatic generation of a speech understanding grammar for use by the ASR application program. The information includes hints about how a human might talk about objects in the domain, such as preludes (phrases that introduce an identification of the object) and postludes (phrases that follow an identification of the object). | 10-02-2014 |
20140297279 | SYSTEM AND METHOD USING FEEDBACK SPEECH ANALYSIS FOR IMPROVING SPEAKING ABILITY - A speech analysis system and method for analyzing speech. The system includes: a voice recognition system for converting inputted speech to text; an analytics system for generating feedback information by analyzing the inputted speech and text; and a feedback system for outputting the feedback information. | 10-02-2014 |
20140297278 | METHODS AND APPARATUS FOR LINKING EXTRACTED CLINICAL FACTS TO TEXT - A plurality of clinical facts may be extracted from a free-form narration of a patient encounter provided by a clinician. The plurality of clinical facts may include a first fact and a second fact. The first fact may be extracted from a first portion of the free-form narration, and the second fact may be extracted from a second portion of the free-form narration. A first indicator that indicates a first linkage between the first fact and the first portion of the free-form narration may be provided to a user. A second indicator, different from the first indicator, that indicates a second linkage between the second fact and the second portion of the free-form narration may also be provided to the user. | 10-02-2014 |
20140288974 | METHODS AND APPARATUS FOR ANALYZING SPECIFICITY IN CLINICAL DOCUMENTATION - A set of one or more clinical facts may be collected from a clinician's encounter with a patient. From the set of facts, it may be determined that an additional fact that provides additional specificity to the set of facts may possibly be ascertained from the patient encounter. A user may be alerted that the additional fact may possibly be ascertained from the patient encounter. | 09-25-2014 |
20140288973 | CATEGORIZATION OF INFORMATION USING NATURAL LANGUAGE PROCESSING AND PREDEFINED TEMPLATES - A computer-implemented method for generating a report that includes latent information, comprising receiving an input data stream that includes latent information, performing one of normalization, validation, and extraction of the input data stream, processing the input data stream to identify latent information within the data stream that is required for generation of a particular report, wherein said processing of the input data stream to identify latent information comprises identifying a relevant portion of the input data stream, bounding the relevant portion of the input data stream, classifying and normalizing the bounded data, activating a relevant report template based on said identified latent information, populating said template with template-specified data, and processing the template-specified data to generate a report. | 09-25-2014 |
20140280353 | METHODS AND APPARATUS FOR ENTITY DETECTION - Techniques for entity detection include matching a token from at least a portion of a text string with a matching concept in an ontology. A first concept may be identified as being hierarchically related to the matching concept within the ontology, and a second concept may be identified as being hierarchically related to the first concept within the ontology. The first and second concepts may be included in a set of features of the token. Based at least in part on the set of features of the token, a measure related to a likelihood that the at least a portion of the text string corresponds to a particular entity type may be determined. | 09-18-2014 |
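The entity-detection abstract above builds token features from concepts hierarchically related to the matched concept. A minimal sketch, assuming a toy child-to-parent ontology (the concept names are illustrative, not from the patent), might collect ancestors as features like this:

```python
# Toy ontology: child concept -> parent concept. These hierarchical
# links stand in for the ontology relations the abstract describes.
PARENT = {
    "lisinopril": "ace_inhibitor",
    "ace_inhibitor": "antihypertensive",
    "antihypertensive": "medication",
}

def hierarchical_features(token, levels=2):
    """Match the token to a concept, then walk up the hierarchy,
    collecting the first `levels` ancestors as features for a
    downstream entity-type classifier."""
    features = []
    concept = token.lower()
    for _ in range(levels):
        concept = PARENT.get(concept)
        if concept is None:
            break
        features.append(concept)
    return features
```

The ancestor features let a classifier generalize: any token whose second-level ancestor is "antihypertensive" gives evidence for a medication entity even if the surface form was never seen in training.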
20140280169 | Method And Apparatus For A Frequently-Asked Questions Portal Workflow - In FAQ based systems, associating questions with answers can be a time consuming task if performed manually. In one embodiment, a method of building a frequently-asked questions (FAQ) portal can include creating cluster labels. The labels can include predefined universal semantic labels and application-specific labels. The method can further include applying the cluster labels to clusters of queries within an FAQ application. The method can additionally include adjusting the application-specific labels to support combined and newly created clusters of queries based on application-specific queries within the FAQ application on an ongoing basis and reapplying the universal semantic labels and the adjusted application-specific labels to the combined and newly created clusters of queries. The method and system proposed herein allow for automated clustering of queries and their association with applicable answers, leading to greater efficiency and faster response times for users. | 09-18-2014 |
20140279514 | PRO-ACTIVE IDENTITY VERIFICATION FOR AUTHENTICATION OF TRANSACTION INITIATED VIA NON-VOICE CHANNEL - A method of using biometric verification comprises identifying a validation requirement during the execution of a non-voice channel interaction, and initiating a contact to the user, at a pre-registered device. The method further comprises executing a biometric verification of the user's identity and possession of the device, via a user interaction at the pre-registered device, and providing the validation when the user is successfully identified. | 09-18-2014 |
20140278448 | SYSTEMS AND METHODS FOR IDENTIFYING ERRORS AND/OR CRITICAL RESULTS IN MEDICAL REPORTS - Systems and methods for analyzing a medical report to determine whether the medical report includes at least one instance of at least one category selected from a group consisting of: gender error, laterality error, and critical finding. In some embodiments, one or more portions of text are identified from the medical report. Contextual information associated with the medical report is used to determine whether the identified one or more portions of text comprise at least one instance of at least one category selected from the group. | 09-18-2014 |
20140278435 | METHODS AND APPARATUS FOR DETECTING A VOICE COMMAND - Some aspects include a method of monitoring an acoustic environment of a mobile device operating in a low power mode, the mobile device having a first and second processor, the method comprises receiving acoustic input while the mobile device is operating in the low power mode, performing at least one first processing stage on the acoustic input using the first processor, prior to engaging the second processor, to evaluate whether the acoustic input includes a voice command, performing at least one second processing stage on the acoustic input using the second processor to evaluate whether the acoustic input includes a voice command if further processing is needed to determine whether the acoustic input includes a voice command, and initiating responding to the voice command when either the at least one first processing stage or the at least one second processing stage determines that the acoustic input includes a voice command. | 09-18-2014 |
20140278426 | DATA SHREDDING FOR SPEECH RECOGNITION ACOUSTIC MODEL TRAINING UNDER DATA RETENTION RESTRICTIONS - Training speech recognizers, e.g., their language or acoustic models, using actual user data is useful, but retaining personally identifiable information may be restricted in certain environments due to regulations. Accordingly, a method or system is provided for enabling training of an acoustic model which includes dynamically shredding a speech corpus to produce text segments and depersonalized audio features corresponding to the text segments. The method further includes enabling a system to train an acoustic model using the text segments and the depersonalized audio features. Because the data is depersonalized, actual data may be used, enabling speech recognizers to keep up-to-date with user trends in speech and usage, among other benefits. | 09-18-2014 |
20140278422 | INDEXING DIGITIZED SPEECH WITH WORDS REPRESENTED IN THE DIGITIZED SPEECH - Indexing digitized speech with words represented in the digitized speech, with a multimodal digital audio editor operating on a multimodal device supporting modes of user interaction, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an ASR engine, including providing by the multimodal digital audio editor to the ASR engine digitized speech for recognition; receiving in the multimodal digital audio editor from the ASR engine recognized user speech including a recognized word, also including information indicating where, in the digitized speech, representation of the recognized word begins; and inserting by the multimodal digital audio editor the recognized word, in association with the information indicating where, in the digitized speech, representation of the recognized word begins, into a speech recognition grammar, the speech recognition grammar voice enabling user interface commands of the multimodal digital audio editor. | 09-18-2014 |
20140278410 | TEXT PROCESSING USING NATURAL LANGUAGE UNDERSTANDING - Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance. | 09-18-2014 |
20140278374 | SYSTEM AND METHOD FOR IMPROVING TEXT INPUT IN A SHORTHAND-ON-KEYBOARD INTERFACE - A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word. | 09-18-2014 |
20140274211 | METHODS AND APPARATUS FOR DETECTING A VOICE COMMAND - According to some aspects, a method of monitoring an acoustic environment of a mobile device, at least one computer readable medium encoded with instructions that, when executed, perform such a method and/or a mobile device configured to perform such a method is provided. The method comprises receiving, by the mobile device, acoustic input from the environment of the mobile device, detecting whether the acoustic input includes a voice command from a user without requiring receipt of an explicit trigger from the user, and initiating responding to the detected voice command. | 09-18-2014 |
20140274203 | METHODS AND APPARATUS FOR DETECTING A VOICE COMMAND - According to some aspects, a method of monitoring an acoustic environment of a mobile device, at least one computer readable medium encoded with instructions that, when executed, perform such a method and/or a mobile device configured to perform such a method is provided. The method comprises receiving acoustic input from the environment of the mobile device while the mobile device is operating in a low power mode, detecting whether the acoustic input includes a voice command based on performing a plurality of processing stages on the acoustic input, wherein at least one of the plurality of processing stages is performed while the mobile device is operating in the low power mode, and using at least one contextual cue to assist in detecting whether the acoustic input includes a voice command. | 09-18-2014 |
20140259126 | BIOMETRIC AUTHORIZATION FOR REAL TIME ACCESS CONTROL - A method of providing biometric authorization comprising enabling a user to log into an account, and determining whether there is a hold on the account. When there is a hold on the account, the user is informed of the hold and enabled to respond to the transaction that caused the hold. The method, in one embodiment, further comprises prompting the user to enter a biometric authentication in conjunction with the response, and processing the unblock request in real time upon receiving and validating the biometric authentication. | 09-11-2014 |
20140258878 | METHOD OF AND SYSTEM FOR DYNAMICALLY CONTROLLING DURING RUN TIME A MULTIFUNCTION PERIPHERAL (MFP) TOUCH PANEL USER INTERFACE (UI) FROM AN EXTERNAL REMOTE NETWORK-CONNECTED COMPUTER - In a system for dynamically and remotely providing user interface (UI) display and processing information to a touch panel embedded within a multifunction peripheral (MFP) such as a digital copier having an internal computer for controlling the touch panel, a method that comprises linking the internal computer to an external data communication network having an external remote computer on the network; and upon the inputting of desired selections by a user at the UI and communicating the same over the network to the external computer, providing information from the external computer via the network back to the internal computer that enables dynamically changing or updating the UI display and behavior during run time of the MFP. | 09-11-2014 |
20140258857 | TASK ASSISTANT HAVING MULTIPLE STATES - A method of providing a task assistant to provide an interface to an application, the method comprising activating the task assistant, the activation having an associated visual display. The method in one embodiment includes receiving input from a user through multimodal input including a plurality of speech input, typing input, and touch input, interpreting the input, and providing a formatted query to the application, receiving data from the application in response to the query, and providing a response to the user through multimodal output including a plurality of: speech output, text output, non-speech audio output, haptic output, and visual non-text output, wherein the task assistant has a plurality of active states, each of the active states having an associated visual display. | 09-11-2014 |
20140258856 | TASK ASSISTANT INCLUDING NAVIGATION CONTROL - A method of providing a task assistant to provide an interface to an application is described. The method includes, in one embodiment, identifying a user and determining whether there is a push notification to be shown to the user. When there is a push notification to be shown to the user, the task assistant displays the push notification, such that when the user acknowledges it, the user is directed to a push destination. When there is no push notification, the method includes receiving input from the user through multimodal input including a plurality of speech input, typing input, and touch input, interpreting the input, and providing a formatted query to the application, receiving data from the application in response to the query, and providing a response to the user through multimodal output including a plurality of: speech output, text output, non-speech audio output, haptic output, and visual non-text output. | 09-11-2014 |
20140258855 | TASK ASSISTANT INCLUDING IMPROVED NAVIGATION - A method of providing a task assistant to provide an interface to an application is described. The method includes, in one embodiment, receiving input from a user through multimodal input including a plurality of speech input, typing input, and touch input, interpreting the input, and providing a formatted query to the application, and receiving data from the application in response to the query. The method further includes identifying a destination providing a response to the query, identifying an anchor pointing to a specific location in the application associated with the response to the query, and directing the user to the specific location, using the anchor, the specific location providing a response to the user. | 09-11-2014 |
20140258324 | TASK ASSISTANT UTILIZING CONTEXT FOR IMPROVED INTERACTION - A method of providing a task assistant is described. The task assistant is designed to receive input from a user through multimodal input including a plurality of speech input, typing input, and touch input, determine the meaning of the input, and determine whether there is a context based on prior interactions with the user. The method further generates an interpreted input based on a combination of the input and the context, and provides a formatted query to an application. The method further receives data from the application in response to the formatted query, and provides a response to the user through multimodal output including a plurality of: speech output, text output, non-speech audio output, haptic output, and visual non-text output. The method further updates the context based on the interpreted input. | 09-11-2014 |
20140258323 | TASK ASSISTANT - A method of providing a task assistant to provide an interface to an application is described. The method comprises receiving input from a user through multimodal input including a plurality of speech input, typing input, and touch input, interpreting the input, and providing a formatted query to the application, receiving data from the application in response to the query, and providing a response to the user through multimodal output including a plurality of: speech output, text output, non-speech audio output, haptic output, and visual non-text output. | 09-11-2014 |
20140257811 | METHOD FOR REFINING A SEARCH - A method for refining a search is provided. Embodiments may include receiving a first speech signal corresponding to a first utterance and receiving a second speech signal corresponding to a second utterance, wherein the second utterance is a refinement to the first utterance. Embodiments may also include identifying information associated with the first speech signal as first speech signal information and identifying information associated with the second speech signal as second speech signal information. Embodiments may also include determining a first quantity of search results based upon the first speech signal information and determining a second quantity of search results based upon the second speech signal information. Embodiments may also include comparing at least one of the first quantity of search results and the second quantity of search results with a quantity of search results from a combination of information of the first and second signals and determining an information gain from the comparison. | 09-11-2014 |
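The search-refinement abstract above compares result counts before and after a refinement to determine an information gain. One plausible reading of that comparison (an assumption for illustration, not the patent's stated formula) treats the reduction in result count as bits of information contributed by the second utterance:

```python
import math

def information_gain(n_first, n_combined):
    """Bits of information a refinement adds, read as the log-ratio of
    the first query's result count to the combined query's count.
    This scoring is an illustrative assumption, not the patent's formula."""
    if n_first <= 0 or n_combined <= 0:
        return 0.0
    # A refinement that fails to narrow the results contributes nothing.
    return max(0.0, math.log2(n_first / n_combined))
```

Halving the result set yields one bit; narrowing 1000 results down to 125 yields three.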
20140257807 | SPEECH RECOGNITION AND INTERPRETATION SYSTEM - A method of providing a task assistant comprising starting to receive speech input from a user, and identifying a format associated with a destination for speech input based on a flag associated with the destination field. When the format comprises dictation, converting the speech to text, and inserting it into the destination location, and when the format comprises an intent, determining a meaning of the input, and sending a formatted query to an application. The method further comprises receiving data from the application in response to the intent and providing a response to the user through multimodal output. | 09-11-2014 |
20140257806 | FLEXIBLE ANIMATION FRAMEWORK FOR CONTEXTUAL ANIMATION DISPLAY - A method of providing a custom response in conjunction with an application providing speech interaction is described. In one embodiment, the method comprises determining a context of a current interaction with the user, and identifying an associated custom animation, when the associated custom animation exists. The method further comprises displaying the custom animation by overlaying it over a native response of the application. In one embodiment, when no custom animation exists, the method determines whether there is a default animation, and displays the default animation as part of a state change of the application. | 09-11-2014 |
20140257794 | Semantic Re-Ranking of NLU Results in Conversational Dialogue Applications - A human-machine dialogue system is described which has multiple computer-implemented dialogue components. A user client delivers output prompts to a human user and receives dialogue inputs from the human user including speech inputs. An automatic speech recognition (ASR) engine processes the speech inputs to determine corresponding sequences of representative text words. A natural language understanding (NLU) engine processes the text words to determine corresponding NLU-ranked semantic interpretations. A semantic re-ranking module re-ranks the NLU-ranked semantic interpretations based on at least one of dialogue context information and world knowledge information. A dialogue manager responds to the re-ranked semantic interpretations and generates the output prompts so as to manage a dialogue process with the human user. | 09-11-2014 |
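The semantic re-ranking abstract above blends NLU scores with dialogue context. A minimal sketch, assuming an additive context bonus (the scoring scheme, domain names, and weights are illustrative, not the patent's), could be:

```python
def rerank(hypotheses, context_domains, context_boost=0.3):
    """Re-rank NLU hypotheses: add a bonus when a hypothesis's domain
    matches the current dialogue context, then sort best-first."""
    def score(h):
        bonus = context_boost if h["domain"] in context_domains else 0.0
        return h["nlu_score"] + bonus
    return sorted(hypotheses, key=score, reverse=True)

hyps = [
    {"intent": "play_track", "domain": "music", "nlu_score": 0.60},
    {"intent": "call_contact", "domain": "phone", "nlu_score": 0.75},
]
# In a music-playback context, the lower-scored NLU hypothesis wins.
reranked = rerank(hyps, context_domains={"music"})
```

The same mechanism could weigh world-knowledge signals by adding further bonus terms to `score`.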
20140257793 | Communicating Context Across Different Components of Multi-Modal Dialog Applications - A human-machine dialogue system is described which has multiple computer-implemented dialogue components. A user client delivers output prompts to a human user and receives dialogue inputs including speech inputs from the human user. An automatic speech recognition (ASR) engine processes the speech inputs to determine corresponding sequences of representative text words. A natural language understanding (NLU) engine processes the text words to determine corresponding semantic interpretations. A dialogue manager (DM) generates the output prompts and responds to the semantic interpretations so as to manage a dialogue process with the human user. The dialogue components share context information with each other using a common context sharing mechanism such that the operation of each dialogue component reflects available context information. | 09-11-2014 |
20140257792 | Anaphora Resolution Using Linguistic Cues, Dialogue Context, and General Knowledge - An automatic conversational system has multiple computer-implemented dialogue components for conducting an automated dialogue process with a human user. A user client delivers dialogue output prompts to the human user and receives dialogue input responses from the human user including speech inputs. An automatic speech recognition engine processes the speech inputs to determine corresponding sequences of representative text words. A natural language understanding (NLU) processing arrangement processes the dialogue input responses and the text words to determine corresponding semantic interpretations. The NLU processing arrangement includes an anaphora processor that accesses different information sources characterizing dialogue context, linguistic features, and NLU features to identify unresolved anaphora in the text words needing resolution in order to determine a semantic interpretation. A dialogue manager manages the dialogue process with the human user based on the semantic interpretations. | 09-11-2014 |
20140253455 | TASK ASSISTANT PROVIDING CONTEXTUAL SUGGESTIONS - A method of providing a task assistant to provide an interface to an application is described. The method comprises, in one embodiment, providing a task assistant to provide an interface to an application, to receive input from a user through multimodal input including a plurality of speech input, typing input, and touch input. The method further comprises receiving a request from a user for a suggestion, determining a current transaction state, based on interactions with the user during a current session, and selecting suggestions relevant to a transaction type associated with the current transaction state. | 09-11-2014 |
20140249831 | VIRTUAL MEDICAL ASSISTANT METHODS AND APPARATUS - In some aspects, a method of using a virtual medical assistant to assist a medical professional, the virtual medical assistant implemented, at least in part, by at least one processor of a host device capable of connecting to at least one network is provided. The method comprises receiving free-form instruction from the medical professional, providing the free-form instruction for processing to assist in identifying from the free-form instruction at least one medical task to be performed, obtaining identification of at least one impediment to performing the at least one medical task, and inferring at least some information needed to overcome the at least one impediment. | 09-04-2014 |
20140249830 | VIRTUAL MEDICAL ASSISTANT METHODS AND APPARATUS - In some aspects, a method of using a virtual medical assistant to assist a medical professional, the virtual medical assistant implemented, at least in part, by at least one processor of a host device capable of connecting to at least one network is provided. The method comprises receiving free-form instruction from the medical professional; providing the free-form instruction for processing to assist in identifying from the free-form instruction at least one medical task to be performed, providing at least one first response to the medical professional regarding the free-form instruction prior to the at least one medical task being performed, and receiving first information from the medical professional responsive to the at least one first response. | 09-04-2014 |
20140249819 | VERIFYING A USER USING SPEAKER VERIFICATION AND A MULTIMODAL WEB-BASED INTERFACE - A method of verifying a user identity using a Web-based multimodal interface can include sending, to a remote computing device, a multimodal markup language document that, when rendered by the remote computing device, queries a user for a user identifier and causes audio of the user's voice to be sent to a multimodal, Web-based application. The user identifier and the audio can be received at about a same time from the client device. The audio can be compared with a voice print associated with the user identifier. The user at the remote computing device can be selectively granted access to the system according to a result obtained from the comparing step. | 09-04-2014 |
20140249816 | METHODS, APPARATUS AND COMPUTER PROGRAMS FOR AUTOMATIC SPEECH RECOGNITION - An automatic speech recognition (ASR) system includes a speech-responsive application and a recognition engine. The ASR system generates user prompts to elicit certain spoken inputs, and the speech-responsive application performs operations when the spoken inputs are recognised. The recognition engine compares sounds within an input audio signal with phones within an acoustic model, to identify candidate matching phones. A recognition confidence score is calculated for each candidate matching phone, and the confidence scores are used to help identify one or more likely sequences of matching phones that appear to match a word within the grammar of the speech-responsive application. The per-phone confidence scores are evaluated against predefined confidence score criteria (for example, identifying scores below a ‘low confidence’ threshold) and the results of the evaluation are used to influence subsequent selection of user prompts. One such system uses confidence scores to select prompts for targeted recognition training—encouraging input of sounds identified as having low confidence scores. Another system selects prompts to discourage input of sounds that were not easily recognised. | 09-04-2014 |
20140247953 | SPEAKER LOCALIZATION - Methods and apparatus for determining phase shift information between first and second microphone signals for a sound signal, and determining an angle of incidence of the sound relative to the first and second positions of the first and second microphones from the phase shift information of a band-limited test signal received by the two microphones for a frequency range of interest. | 09-04-2014 |
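The phase shift between two microphones maps to an arrival angle under the standard far-field model that such localization systems rely on: sin(theta) = c * delay / d, where c is the speed of sound, delay is the inter-microphone time difference, and d is the microphone spacing. A minimal sketch (the constants are the usual textbook values, not from the patent):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def angle_of_incidence(delay_s, mic_spacing_m):
    """Far-field model: the time-difference-of-arrival between the two
    microphone signals gives the arrival angle via
    sin(theta) = c * delay / d. Returns degrees from broadside."""
    s = SPEED_OF_SOUND * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp against measurement noise
    return math.degrees(math.asin(s))
```

In practice the delay itself is estimated from the phase shift of the band-limited test signal at each frequency of interest: delay = phase_shift / (2 * pi * frequency).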
20140244260 | METHOD AND APPARATUS FOR RECOGNIZING AND REACTING TO USER PERSONALITY IN ACCORDANCE WITH SPEECH RECOGNITION SYSTEM - Techniques are disclosed for recognizing user personality in accordance with a speech recognition system. For example, a technique for recognizing a personality trait associated with a user interacting with a speech recognition system includes the following steps/operations. One or more decoded spoken utterances of the user are obtained. The one or more decoded spoken utterances are generated by the speech recognition system. The one or more decoded spoken utterances are analyzed to determine one or more linguistic attributes (morphological and syntactic filters) that are associated with the one or more decoded spoken utterances. The personality trait associated with the user is then determined based on the analyzing step/operation. | 08-28-2014 |
20140244257 | Method and Apparatus for Automated Speaker Parameters Adaptation in a Deployed Speaker Verification System - Typical speaker verification systems usually employ speakers' audio data collected during an enrollment phase when users enroll with the system and provide respective voice samples. Due to technical, business, or other constraints, the enrollment data may not be large enough or rich enough to encompass different inter-speaker and intra-speaker variations. According to at least one embodiment, a method and apparatus employing classifier adaptation based on field data in a deployed voice-based interactive system comprise: collecting representations of voice characteristics, in association with corresponding speakers, the representations being generated by the deployed voice-based interactive system; updating parameters of the classifier, used in speaker recognition, based on the representations collected; and employing the classifier, with the corresponding parameters updated, in performing speaker recognition. | 08-28-2014 |
20140244249 | System and Method for Identification of Intent Segment(s) in Caller-Agent Conversations - Identification of an intent of a conversation can be useful for real-time or post-processing purposes. According to example embodiments, a method and corresponding apparatus for identifying at least one intent-bearing utterance in a conversation comprise: determining at least one feature for each utterance among a subset of utterances of the conversation; classifying each utterance among the subset of utterances, using a classifier, as an intent classification or a non-intent classification based at least in part on a subset of the at least one determined feature; and selecting at least one utterance, with intent classification, as an intent-bearing utterance based at least in part on classification results by the classifier. Through identification of an intent-bearing utterance, a call center, for example, can provide improved service for callers through more effective directing of a call to a live agent. | 08-28-2014 |
20140242957 | MEASURING END USER ACTIVITY OF SOFTWARE ON A MOBILE OR DISCONNECTED DEVICE - A hardware and/or software facility measures end user activity associated with a software application or service on a mobile phone or other mobile device. The facility tracks and stores usage data associated with a mobile user's use of the application or service. When the mobile user initiates transmission of the usage data, the facility retrieves from the mobile phone or other mobile device a usage code representing the usage data. The facility relies on user transcription, text input-buffer insertion, or other indirect means of data transport to deliver the usage code from the mobile phone or other mobile device to an application developer, service provider, or another entity. The recipient extracts the usage data contained in the usage code, and may perform various data mining and analysis techniques on the usage data in order to evaluate how the application or service is used. | 08-28-2014 |
20140241514 | PERFORMING ACTIONS FOR USERS BASED ON SPOKEN INFORMATION - Techniques are described for performing actions for users based at least in part on spoken information, such as spoken voice-based information received from the users during telephone calls. The described techniques include categorizing spoken information obtained from a user in one or more ways, and performing actions on behalf of the user related to the categorized information. For example, in some situations, spoken information obtained from a user is analyzed to identify one or more spoken information items (e.g., words, phrases, sentences, etc.) supplied by the user, and to generate corresponding textual representations (e.g., via automated speech-to-text techniques). One or more actions may then be taken regarding the identified information items, including to categorize the items by adding textual representations of the spoken information items to one or more of multiple predefined lists or other collections of information that are specific to or otherwise available to the user. | 08-28-2014 |
20140241513 | Method and Apparatus for Providing Enhanced Communications - Call service centers typically require a user to confirm their identity by manually entering identification. An embodiment of the present invention uses a client device's identification to identify a user securely without requiring the user to enter identification manually. The client device's automatic numbering identification (ANI) number is an example of identification of the user, and the client device's media access control address is an example of information used, as an encryption key, to verify that the ANI number is not spoofed. In one embodiment, the client device provides the ANI number to a data server prior to a call, and the data server provides information to enable enhancement of the call either by sending the information to the call center directly or via the client device. Benefits of embodiments of the present invention include a reduced load on call service centers, reduced call time, and increased user satisfaction. | 08-28-2014 |
20140236596 | EMOTION DETECTION IN VOICEMAIL - Methods and apparatus for processing a voicemail message to generate a textual representation of at least a portion of the voicemail message. At least one emotion expressed in the voicemail message is determined by applying at least one emotion classifier to the voicemail message and/or the textual representation. An indication of the determined at least one emotion is provided in a manner associated with the textual representation of the at least a portion of the voicemail message. | 08-21-2014 |
20140233757 | Noisy Environment Communication Enhancement System - A communication system enhances communications in a noisy environment. Multiple input arrays convert a voiced or unvoiced signal into an analog signal. A converter receives the analog signal and generates digital signals. A digital signal processor determines temporal and spatial information from the digital signals. The processed signals are then converted to audible sound. | 08-21-2014 |
20140223310 | Correction Menu Enrichment with Alternate Choices and Generation of Choice Lists in Multi-Pass Recognition Systems - A method is described for user correction of speech recognition results. A speech recognition result for a given unknown speech input is displayed to a user. A user selection is received of a portion of the recognition result needing to be corrected. For each of multiple different recognition data sources, a ranked list of alternate recognition choices is determined which correspond to the selected portion. The alternate recognition choices are concatenated or interleaved together and duplicate choices removed to form a single ranked output list of alternate recognition choices, which is displayed to the user. The method may be adaptive over time to derive preferences that can then be leveraged in the ordering of one choice list or across choice lists. | 08-07-2014 |
20140223011 | Method and Apparatus For Supporting Scalable Multi-Modal Dialog Application Sessions - Embodiments disclosed herein enable scaling up and making advanced natural language understanding (NLU) applications more robust. According to one embodiment, state(s) associated with a dialog session may be recorded to a fixed medium. The dialog session may be suspended after a given period of inactivity and later automatically awakened based on a unique client, session, or device identifier, or any combination thereof. Memory and resources associated with the suspended session may be reclaimed, the memory and resources being otherwise held by the session during the period of inactivity, enabling higher density (e.g., a larger number of sessions supported). Embodiments disclosed herein obviate the need for sticky dialog sessions, enabling higher density, and may further provide failover protection and fault tolerance for the dialog sessions. | 08-07-2014 |
20140222428 | Method and Apparatus for Efficient I-Vector Extraction - Most speaker recognition systems use i-vectors, which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computations and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of a linear operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition. Computing the voice characteristics, by using the determined representations, results in significant reduction in memory usage and possible increase in execution speed. | 08-07-2014 |
20140222423 | Method and Apparatus for Efficient I-Vector Extraction - Most speaker recognition systems use i-vectors, which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computations and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of an orthogonal operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition. Computing the voice characteristics, by using the determined representations, results in significant reduction in memory usage and substantial increase in execution speed. | 08-07-2014 |
20140215505 | SYSTEMS AND METHODS FOR SUPPLEMENTING CONTENT WITH AUDIENCE-REQUESTED INFORMATION - A system and method are described for delivering to a member of an audience supplemental information related to presented media content. Media content is associated with media metadata that identifies active content elements in the media content and supported intents associated with those content elements. A member of an audience may submit input related to an active content element. The audience input is compared to media metadata to determine whether supplemental information can be identified that would be appropriate to deliver to the audience member based on that person's input. In some implementations, audience input includes audio data of an audience's spoken input regarding the media content. | 07-31-2014 |
20140213223 | ENHANCED SIGNALING FOR MOBILE COMMUNICATION DEVICES - A data communication system routes calls from a mobile communication device associated with a calling party to a mobile communication device associated with a called party. If calls are not answered by the called party, a call is routed to a voicemail server. A data channel is opened between the mobile communication device associated with the calling party and the voicemail server over which higher fidelity audio signals are transmitted. The transmitted audio is supplied to the voicemail server for conversion from an audio to a text format. In accordance with another aspect, a data channel is opened with a call placed by the calling party. The data channel is used to append information that is sent with a call to give the called party information regarding the call or the caller without having to answer the call. | 07-31-2014 |
20140208210 | DISPLAYING SPEECH COMMAND INPUT STATE INFORMATION IN A MULTIMODAL BROWSER - Methods, systems, and products are disclosed for displaying speech command input state information in a multimodal browser, including displaying an icon representing a speech command type and displaying an icon representing the input state of the speech command. In typical embodiments, the icon representing a speech command type and the icon representing the input state of the speech command are attributes of a single icon. Typical embodiments include accepting from a user a speech command of the speech command type, changing the input state of the speech command, and displaying another icon representing the changed input state of the speech command. Typical embodiments also include displaying the text of the speech command in association with the icon representing the speech command type. | 07-24-2014 |
20140207491 | TRANSCRIPTION DATA SECURITY - A computer program product for use with dictated medical patient information resides on a computer-readable medium and comprises computer-readable instructions for causing a computer to analyze the dictated information, identify likely confidential information in the dictated medical patient information, and treat the likely confidential information disparately from likely non-confidential information in the dictated medical patient information. | 07-24-2014 |
20140207451 | Method and Apparatus of Adaptive Textual Prediction of Voice Data - Typical textual prediction of voice data employs a predefined implementation arrangement of a single or multiple prediction sources. Using a predefined implementation arrangement of the prediction sources may not provide good prediction performance in a consistent manner with variations in voice data quality. Prediction performance may be improved by employing adaptive textual prediction. According to at least one embodiment, a configuration of a plurality of prediction sources, used for textual interpretation of the voice data, is determined based at least in part on one or more features associated with the voice data or one or more a-priori interpretations of the voice data. A textual output prediction of the voice data is then generated using the plurality of prediction sources according to the determined configuration. Employing an adaptive configuration of the text prediction sources facilitates providing more accurate text transcripts of the voice data. | 07-24-2014 |
20140207442 | Protection of Private Information in a Client/Server Automatic Speech Recognition System - A mobile device is adapted for protecting private information on the mobile device in a hybrid automatic speech recognition arrangement. The mobile device includes a speech input component for receiving a speech input signal from a user. Additionally, the mobile device includes a local ASR arrangement for performing local ASR processing of the speech input signal and determining if private information is included within the speech input signal. A control unit on the mobile device obscures private information in the speech input signal if the local ASR arrangement identifies information within a speech recognition result as private information. The control unit releases the speech input signal with the obscured private information for transmission to a remote server for further ASR processing. | 07-24-2014 |
20140205077 | System and Method for Biometric Identification of a Call Originator - An embodiment according to the invention provides automatic discovery, via Automatic Speech Recognition (ASR) and Voice Biometrics, of the identification of a caller, when the caller is making a phone call from, for example, a residential line. The caller may, for example, initiate a phone call by voice request to a computer or other device. The device initiates the call, but rather than using the conventional technique of determining Calling Name via lookup to the Transaction Capabilities Application Part (TCAP) database, the embodiment uses a technique of ASR in tandem with voice or other biometrics to recognize who within the residence is making the call, and to use the name associated with the requesting caller's voiceprint for determining the Calling Name to display to the called party. Other forms of biometrics, such as image biometrics (e.g., facial or iris biometrics), may alternatively be employed. | 07-24-2014 |
20140204031 | DATA ENTRY SYSTEM AND METHOD OF ENTERING DATA - A method and apparatus for entering words into a computer system. Letters contained in a desired word are entered by giving approximate location and directional information relative to any specified keyboard layout. The inputs need not correspond to specific keys on the keyboard; a sequence of ambiguous key entries corresponding to individual words can be used to retrieve a word from the dictionary. The system tracks directional information of movement relative to the specified keyboard layout, reducing it to predetermined primary directions, and translates this seemingly ambiguous information into accurate words from the dictionary. The system may also capture the user's intention (with regard to text entry) by observing the movements on the keyboard. | 07-24-2014 |
20140201729 | Method and Apparatus for Supporting Multi-Modal Dialog Applications - An embodiment of the present invention includes a method for creating a dialog system that provides a framework for creating a multi-modal dialog application and includes a runtime application package (RAP) enabling runtime media (grammars, prompts, classifiers, and so forth) to be separate from a multi-modal dialog application that utilizes the RAP. Embodiments disclosed herein enable newly trained runtime media supporting the multi-modal dialog application to be deployed with ease, and to do so while a dialog service is in operation. Embodiments disclosed herein enable the multi-modal dialog application to be created, deployed, and maintained in an easy and flexible manner, saving an end-user that may be providing the multi-modal dialog application to customers both time and cost. | 07-17-2014 |
20140198048 | REDUCING ERROR RATES FOR TOUCH BASED KEYBOARDS - The present technology provides systems and methods for reducing error rates for data input to a keyboard, such as a touch screen keyboard. In one example, an input bias model dynamically changes the keyboard functionality such that the keyboard will not necessarily produce the same result for an identical tap coordinate. Rather, the keyboard functionality is adapted to account for key offset bias that occurs when the user has a tendency to select a tap coordinate that would otherwise return an unintended key. Additionally, the present technology provides a language feedback model that may provide a probability for a next tap coordinate and may augment the key corresponding to the most probable next tap coordinate, thereby allowing the user to more easily select the correct key. Further details are provided herein. | 07-17-2014 |
20140198047 | REDUCING ERROR RATES FOR TOUCH BASED KEYBOARDS - The present technology provides systems and methods for reducing error rates for data input to a keyboard, such as a touch screen keyboard. In one example, an input bias model dynamically changes the keyboard functionality such that the keyboard will not necessarily produce the same result for an identical tap coordinate. Rather, the keyboard functionality is adapted to account for key offset bias that occurs when the user has a tendency to select a tap coordinate that would otherwise return an unintended key. Additionally, the present technology provides a language feedback model that may provide a probability for a next tap coordinate and may augment the key corresponding to the most probable next tap coordinate, thereby allowing the user to more easily select the correct key. Further details are provided herein. | 07-17-2014 |
20140195267 | TECHNIQUES TO IMPROVE ACCURACY OF A MEDICAL REPORT RELATING TO A MEDICAL IMAGING STUDY - Techniques for improving the accuracy of medical reports prepared by medical professionals. One technique may entail providing a system to automatically pre-populate a medical report by extracting data from a medical imaging study and pre-populating fields in the report with the extracted data. Another technique may entail providing a system to analyze data values from an imaging study during preparation of a medical report, and to generate alerts if at least one analyzed data value is outside of a normal range. The data extracted for pre-populating and/or the data values analyzed for alert generation may correspond to measurement data and/or calculation data from the imaging study. After pre-populating and/or generating alerts in the medical report, the system may present the report to a medical professional to include his/her impressions of the imaging study. Such techniques may enable medical professionals to more accurately and efficiently prepare medical reports for patients. | 07-10-2014 |
20140195266 | TECHNIQUES TO IMPROVE ACCURACY OF A MEDICAL REPORT RELATING TO MEDICAL IMAGING STUDY - Techniques for improving the accuracy of medical reports prepared by medical professionals. One technique may entail providing a system to automatically pre-populate a medical report by extracting data from a medical imaging study and pre-populating fields in the report with the extracted data. Another technique may entail providing a system to analyze data values from an imaging study during preparation of a medical report, and to generate alerts if at least one analyzed data value is outside of a normal range. The data extracted for pre-populating and/or the data values analyzed for alert generation may correspond to measurement data and/or calculation data from the imaging study. After pre-populating and/or generating alerts in the medical report, the system may present the report to a medical professional to include his/her impressions of the imaging study. Such techniques may enable medical professionals to more accurately and efficiently prepare medical reports for patients. | 07-10-2014 |
20140188881 | System and Method To Label Unlabeled Data - In accordance with an embodiment of the invention, there is provided a technique for permitting a machine to discover classes and topics that data contains and to annotate data objects with those identified classes. The technique enables machines to group and annotate data objects in ways that are meaningful and intuitive for a user of the data objects. An interactive method uses clustering, along with feedback from a user on the clustering output, to discover a set of classes. The feedback from the user is used to guide the clustering process in the later stages, resulting in progressively better discovery of classes and annotation as more human feedback is provided. A method can be used to produce labeled data that involves discovering classes and annotating a given dataset with the discovered class labels. This is advantageous for building a classifier that has wide applications, such as call routing and intent discovery. | 07-03-2014 |
20140188472 | ADAPTIVE VOICE PRINT FOR CONVERSATIONAL BIOMETRIC ENGINE - A computer-implemented method, system and/or program product update voice prints over time. A receiving computer receives an initial voice print. A determining period of time is calculated for that initial voice print. This determining period of time is a length of time during which an expected degree of change in subsequent voice prints, in comparison to the initial voice print and according to a speaker's subsequent age, is predicted to occur. A new voice print is received after the determining period of time has passed, and the new voice print is compared with the initial voice print. In response to a change to the new voice print falling within the expected degree of change in comparison to the initial voice print, a voice print store is updated with the new voice print. | 07-03-2014 |
20140188469 | DIFFERENTIAL DYNAMIC CONTENT DELIVERY WITH TEXT DISPLAY IN DEPENDENCE UPON SIMULTANEOUS SPEECH - Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users. | 07-03-2014 |
20140180692 | INTENT MINING VIA ANALYSIS OF UTTERANCES - According to example configurations, a speech processing system can include a syntactic parser, a word extractor, word extraction rules, and an analyzer. The syntactic parser of the speech processing system parses the utterance to identify syntactic relationships amongst words in the utterance. The word extractor utilizes word extraction rules to identify groupings of related words in the utterance that most likely represent an intended meaning of the utterance. The analyzer in the speech processing system maps each set of the sets of words produced by the word extractor to a respective candidate intent value to produce a list of candidate intent values for the utterance. The analyzer is configured to select, from the list of candidate intent values (i.e., possible intended meanings) of the utterance, a particular candidate intent value as being representative of the intent (i.e., intended meaning) of the utterance. | 06-26-2014 |
20140176351 | MULTIPLE PREDICTIONS IN A REDUCED KEYBOARD DISAMBIGUATING SYSTEM - A computer receives user entry of a sequence of keypresses, representing an intended series of letters collectively spelling out some or all of a desired textual object. Resolution of the intended series of letters and the desired textual object is ambiguous, however, because some or all of the keypresses individually represent multiple letters. The computer interprets the keypresses utilizing concurrent, competing strategies, including one-keypress-per-letter and multi-tap interpretations. The computer displays a combined output of proposed interpretations and completions from both strategies. | 06-26-2014 |
20140172757 | System and Method For Learning Answers To Frequently Asked Questions From a Semi-Structured Data Source - A frequently-asked-question (FAQ)-based system receives question(s) from a user and generates answer(s) based on data about the question(s). In one embodiment, a method includes retrieving, from a memory, a global structure and candidate answers therein. The method can include computing a first, second, and third probability of a candidate answer based on a local structure of the candidate answer within the global structure, content of the candidate answer given content of a query, and context of the candidate answer given the content of the query, respectively. The method can include providing a combined probability of the candidate answer based on the first probability, second probability, and third probability. The method can improve efficiency of a FAQ-based system by automating organization of semi-structured data in a database. Therefore, a human user does not need to manually generate the database when it is already generated in semi-structured form, such as a semi-structured HTML document. | 06-19-2014 |
20140164953 | SYSTEMS AND METHODS FOR INVOKING VIRTUAL AGENT - Systems, methods, and apparatus for use in connection with at least one virtual agent. In some embodiments, at least one processor is programmed to: intercept user input to a messaging application executing on the at least one processor and facilitating a multiparty conversation; and in response to detecting a trigger in the user input, inject the at least one virtual agent into the multiparty conversation facilitated by the messaging application. | 06-12-2014 |
20140164597 | METHOD AND APPARATUS FOR DETECTING USER ID CHANGES - In many speech-enabled applications, adaptation of speech recognition and language understanding tools for different users is employed. With such adaptation, identifying the particular user precedes applying the speech recognition and language understanding tools. According to at least one example embodiment, a method and corresponding apparatus of identifying a user include comparing personal information data received from a user network device against personal information accessible by the server; and identifying a speech profile specific to the user based on the results of comparing the personal information data retrieved from the first user network device against the personal information accessible by the server. The identified speech profile is used in processing a speech of the user. Through use of the method or corresponding apparatus, a user can proceed directly to the use of speech recognition or other applications, bypassing a login sequence. | 06-12-2014 |
20140164533 | SYSTEMS AND METHODS FOR USER INTERFACE PRESENTATION OF VIRTUAL AGENT - Systems, methods, and apparatus for use in connection with at least one virtual agent. In some embodiments, at least one processor is programmed to present the at least one virtual agent as a participant in a multiparty conversation taking place via a messaging application. | 06-12-2014 |
20140164532 | SYSTEMS AND METHODS FOR VIRTUAL AGENT PARTICIPATION IN MULTIPARTY CONVERSATION - Systems, methods, and apparatus for use with at least one virtual agent. In some embodiments, the at least one virtual agent is programmed to: analyze first input provided by a first user during a multiparty conversation; analyze second input provided by a second user during the multiparty conversation, the second user being different from the first user; and use the first and second inputs to formulate at least one task to be performed by the virtual agent. | 06-12-2014 |
20140164508 | SYSTEMS AND METHODS FOR SHARING INFORMATION BETWEEN VIRTUAL AGENTS - Systems, methods, and apparatus for use with at least one first virtual agent executing on a first device. In some embodiments, the at least one first virtual agent is programmed to: share information with at least one second virtual agent executing on at least one second device different from the first device, wherein the at least one first virtual agent is associated with a first user and the at least one second virtual agent is associated with a second user; and use the information shared between the at least one first virtual agent and the at least one second virtual agent to make a joint recommendation for the first and second users. | 06-12-2014 |
20140164317 | SYSTEMS AND METHODS FOR STORING RECORD OF VIRTUAL AGENT INTERACTION - Systems, methods, and apparatus for use with at least one virtual agent. In some embodiments, at least one processor is programmed to store a receipt for an interaction between the at least one virtual agent and one or more users, wherein the receipt comprises at least some information provided by the one or more users to the at least one virtual agent during the interaction. | 06-12-2014 |
20140164312 | SYSTEMS AND METHODS FOR INFORMING VIRTUAL AGENT RECOMMENDATION - Systems, methods, and apparatus for use in connection with at least one virtual agent. In some embodiments, at least one virtual agent is programmed to: identify a relationship between at least two persons; and make a recommendation for the at least two persons based at least in part on the relationship between the at least two persons. | 06-12-2014 |
20140164305 | SYSTEMS AND METHODS FOR VIRTUAL AGENT RECOMMENDATION FOR MULTIPLE PERSONS - Systems, methods, and apparatus for implementing at least one virtual agent. In some embodiments, the at least one virtual agent is programmed to analyze first information regarding a first person; analyze second information regarding a second person different from the first person; and make a joint recommendation for a plurality of persons based at least in part on the first and second information, wherein the plurality of persons comprises the first person and the second person. | 06-12-2014 |
20140164023 | METHODS AND APPARATUS FOR APPLYING USER CORRECTIONS TO MEDICAL FACT EXTRACTION - Techniques for applying user corrections to medical fact extraction may include extracting a first set of one or more medical facts from a first portion of text documenting a patient encounter. A correction to the first set of medical facts may be received from a user. The correction may identify a fact that should be associated with the first portion of the text. A second set of one or more medical facts may be extracted from a second portion of the text based at least in part on the user's correction to the first set of medical facts. Extracting the second set of facts may include extracting one or more facts similar to the identified fact from the second portion of the text. | 06-12-2014 |
20140163982 | Human Transcriptionist Directed Posterior Audio Source Separation - A graphical user interface is described for human guided audio source separation in a multi-speaker automated transcription system receiving audio signals representing speakers participating together in a speech session. A speaker avatar for each speaker is distributed about a user interface display to suggest speaker positions relative to each other during the speech session. There also is a speaker highlight element on the interface display for visually highlighting a specific speaker avatar corresponding to an active speaker in the speech session to aid a human transcriptionist listening to the speech session to identify the active speaker. A speech signal processor performs signal processing of the audio signals to isolate an audio signal corresponding to the highlighted speaker avatar. A session transcription processor performs automatic speech recognition (ASR) of the signal processed audio signal for the speech session as supervised by the human transcriptionist and reflecting position of the speaker highlight element. | 06-12-2014 |
20140163981 | Combining Re-Speaking, Partial Agent Transcription and ASR for Improved Accuracy / Human Guided ASR - A speech transcription system is described for producing a representative transcription text from one or more different audio signals representing one or more different speakers participating in a speech session. A preliminary transcription module develops a preliminary transcription of the speech session using automatic speech recognition having a preliminary recognition accuracy performance. A speech selection module enables user selection of one or more portions of the preliminary transcription to receive higher accuracy transcription processing. A final transcription module is responsive to the user selection for developing a final transcription output for the speech session having a final recognition accuracy performance for the selected one or more portions which is higher than the preliminary recognition accuracy performance. | 06-12-2014 |
20140163959 | Multi-Domain Natural Language Processing Architecture - An arrangement and corresponding method are described for multi-domain natural language processing. Multiple parallel domain pipelines are used for processing a natural language input. Each domain pipeline represents a different specific subject domain of related concepts. Each domain pipeline includes a mention module that processes the natural language input using natural language understanding (NLU) to determine a corresponding list of mentions, and an interpretation generator that receives the list of mentions and produces a rank-ordered domain output set of sentence-level interpretation candidates. A global evidence ranker receives the domain output sets from the domain pipelines and produces an overall rank-ordered final output set of sentence-level interpretations. | 06-12-2014 |
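A purely illustrative sketch of the multi-domain architecture this abstract describes: parallel per-domain pipelines each score a natural language input, and a global ranker merges their outputs. The domains, keyword rules, and scoring here are invented stand-ins for the application's trained NLU components.

```python
# Toy parallel domain pipelines feeding a global evidence ranker.
def domain_pipeline(domain, text, keywords):
    """Find keyword 'mentions' for one domain and score the interpretation."""
    mentions = [w for w in text.lower().split() if w in keywords]
    score = len(mentions) / max(len(text.split()), 1)
    return {"domain": domain, "mentions": mentions, "score": score}

def global_ranker(domain_outputs):
    """Merge per-domain output sets into one rank-ordered final set."""
    return sorted(domain_outputs, key=lambda o: o["score"], reverse=True)

text = "play some jazz music in the kitchen"
outputs = [
    domain_pipeline("music", text, {"play", "jazz", "music"}),
    domain_pipeline("home", text, {"kitchen", "lights"}),
]
ranked = global_ranker(outputs)
```

In a real system each pipeline would run full NLU rather than keyword matching, but the fan-out/rank-merge shape is the same.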
20140156575 | Method and Apparatus of Processing Data Using Deep Belief Networks Employing Low-Rank Matrix Factorization - Deep belief networks are usually associated with a large number of parameters and high computational complexity. The large number of parameters results in a long and computationally consuming training phase. According to at least one example embodiment, low-rank matrix factorization is used to approximate at least a first set of parameters, associated with an output layer, with a second and a third set of parameters. The total number of parameters in the second and third sets is smaller than the number of parameters in the first set. An architecture of a resulting artificial neural network, when employing low-rank matrix factorization, may be characterized with a low-rank layer, not employing activation function(s), and defined by a relatively small number of nodes and the second set of parameters. By using low-rank matrix factorization, training is faster, leading to rapid deployment of the respective system. | 06-05-2014 |
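A minimal NumPy sketch of the parameter-count argument in this abstract: an output-layer weight matrix of m×n parameters is replaced by two factors of r·(m+n) parameters with r much smaller than m and n. Truncated SVD stands in here for whatever factorization the application actually uses, and the sizes are invented.

```python
import numpy as np

def low_rank_approx(W, r):
    """Best rank-r approximation of W via truncated SVD: W ~= A @ B."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]   # m x r -- the "low-rank layer" (no activation)
    B = Vt[:r, :]          # r x n
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 2048))      # original output-layer weights
A, B = low_rank_approx(W, 128)
full_params = W.size                      # 512 * 2048
factored_params = A.size + B.size         # 128 * (512 + 2048)
```

The factored form trades a small approximation error for roughly a 3x parameter reduction at these sizes, which is the source of the faster training the abstract claims.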
20140156265 | METHOD AND SYSTEM FOR CONVEYING AN EXAMPLE IN A NATURAL LANGUAGE UNDERSTANDING APPLICATION | 06-05-2014 |
20140153740 | BEAMFORMING PRE-PROCESSING FOR SPEAKER LOCALIZATION - Methods and apparatus to beamform a first plurality of microphone signals using at least one beamforming weight to obtain a first beamformed signal, beamform a second plurality of microphone signals using the at least one beamforming weight to obtain a second beamformed signal, and adjust the at least one beamforming weight so that the power density of at least one perturbation component present in the first or the second plurality of microphone signals is reduced. | 06-05-2014 |
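A toy sketch of the structure described above: two sets of microphone signals are beamformed with one shared weight vector, and the weights are adjusted when a perturbation component's beamformed power is too high. The power-limit rule below is an invented, deliberately crude stand-in for the application's adaptation method.

```python
import numpy as np

def beamform(signals, weights):
    """Weighted sum of per-microphone signals (rows of `signals`)."""
    return weights @ signals

def adjust_weights(weights, perturbation, limit=0.5):
    """Illustrative rule only: scale the shared weights down when the
    beamformed perturbation power exceeds `limit`."""
    power = float(np.mean(beamform(perturbation, weights) ** 2))
    if power > limit:
        weights = weights * np.sqrt(limit / power)
    return weights

rng = np.random.default_rng(1)
mics_a = rng.standard_normal((4, 100))   # first plurality of mic signals
mics_b = rng.standard_normal((4, 100))   # second plurality
w = np.full(4, 0.25)                     # one weight vector shared by both
out_a = beamform(mics_a, w)
out_b = beamform(mics_b, w)
perturbation = np.ones((4, 100))         # stand-in perturbation component
w_adj = adjust_weights(w, perturbation)
```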
20140153709 | System And Method For Automatically Generating Adaptive Interaction Logs From Customer Interaction Text - A system and method for providing an adaptive Interaction Logging functionality to help agents reduce the time spent documenting contact center interactions. In a preferred embodiment the system uses a pipeline comprising audio capture of a telephone conversation, automatic speech transcription, text normalization, transcript generation and candidate call log generation based on Real-time and Global Models. The contact center agent edits the candidate call log to create the final call log. The models are updated based on analysis of user feedback in the form of the editing of the candidate call log done by the contact center agents or supervisors. The pipeline yields a candidate call log which the agents can edit in less time than it would take them to generate a call log manually. | 06-05-2014 |
20140146959 | MOBILE DEVICE APPLICATIONS FOR COMPUTER-TELEPHONY SYSTEMS - On a mobile telecommunications device, computer-executable code executes to facilitate interactions between the user of the mobile telecommunications device and a call center or other computer-telephony integration equipment. The computer-executable code includes instructions that request at least one operation to be performed at a call center, where the call center includes a call center controller, an interactive voice response system component, and at least one agent. At least in part, a wireless network transmits the request from the mobile telecommunications device to the call center controller. | 05-29-2014 |
20140143533 | SECURING SPEECH RECOGNITION DATA - Methods and apparatus for reducing security vulnerabilities in a client/server speech recognition system including one or more client computers and one or more server computers connected via a network. Decryption of sensitive information, such as medical dictation information, is performed on designated servers to limit the attack surface of unencrypted data. Management of encryption and decryption keys to restrict the storage and/or use of decryption keys on the server side of the client/server speech recognition system, while maintaining encrypted data on the server side is also described. | 05-22-2014 |
20140136331 | USING WIRELESS DEVICE CALL LOGS FOR SOLICITING SERVICES - A processor-based method for composing an electronic service solicitation using mobile device call logs. The service, when delivered to a mobile device, can be a promotion of a third-party mobile device application, the presentment of a coupon, or the administration of a caller experience survey. The service is delivered in connection with the user making a call to a particular number on the mobile device. The method for composing the solicitation includes selecting a business number, selecting a group of wireless subscribers, sending an information request to the mobile devices of the selected wireless subscribers, receiving affirmative responses from the selected wireless subscribers who recently called the selected business number, counting the affirmative responses, and using the count for composing the solicitation. | 05-15-2014 |
20140136183 | Distributed NLU/NLP - An arrangement and corresponding method are described for distributed natural language processing. A set of local data sources is stored on a mobile device. A local natural language understanding (NLU) match module on the mobile device performs natural language processing of a natural language input with respect to the local data sources to determine one or more local interpretation candidates. A local NLU ranking module on the mobile device processes the local interpretation candidates and one or more remote interpretation candidates from a remote NLU server to determine a final output interpretation corresponding to the natural language input. | 05-15-2014 |
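The ranking step this abstract describes, merging local interpretation candidates with ones returned by a remote NLU server, can be sketched as follows. The confidence fields and the small bias toward on-device results are invented for illustration; the application's actual ranking criteria are not specified in the abstract.

```python
def rank_candidates(local, remote, local_bias=0.05):
    """Pick the best interpretation across local and remote candidate
    lists, with a small (illustrative) preference for local ones."""
    scored = [(c["conf"] + local_bias, c) for c in local]
    scored += [(c["conf"], c) for c in remote]
    return max(scored, key=lambda sc: sc[0])[1]

local = [{"intent": "call_contact", "conf": 0.72}]   # from on-device NLU
remote = [{"intent": "web_search", "conf": 0.70}]    # from remote NLU server
best = rank_candidates(local, remote)
```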
20140134988 | ENHANCING INFORMATION DELIVERY TO A CALLED PARTY - A facilitation and enhancement of interactions between a caller and a called party, such as a call center. The enhancement of interactions between a mobile device caller and a call center is by the communication of the mobile device caller's location and/or preferences to the call center. The communication can be in the form of a data structure stored on a computer-readable storage device, which may be transmitted from the mobile device caller to a call center controller. The data structure contains a caller identification entry, which allows the call center to match the data structure with a caller. In addition, a caller context entry associates additional data to the caller identification entry. The additional data includes at least one preference and/or location of the caller. Processor-based methods are disclosed to create, transmit, and/or utilize the data structure. | 05-15-2014 |
20140129230 | METHOD AND APPARATUS FOR GENERATING SYNTHETIC SPEECH WITH CONTRASTIVE STRESS - Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings. | 05-08-2014 |
20140126716 | System and Method for Granting Delayed Priority Access To A Resource - A user service center facilitates communication between a user and an agent at the user service center. At peak use times, the user service center is connected to a surplus of users with respect to its number of agents, and the user service center has the users wait to communicate with an agent. In one embodiment, the system and method described herein give the user the option to reconnect with the user service center at a later, off-peak, time. In exchange, the user is granted an identifier indicating priority access to an agent at the off-peak time. In this way, the user does not have to wait to communicate with an agent during the subsequent communication during the off-peak time. | 05-08-2014 |
20140122091 | ESTABLISHING A MULTIMODAL PERSONALITY FOR A MULTIMODAL APPLICATION IN DEPENDENCE UPON ATTRIBUTES OF USER INTERACTION - Establishing a multimodal personality for a multimodal application, including evaluating, by the multimodal application, attributes of a user's interaction with the multimodal application; selecting, by the multimodal application, a vocal demeanor in dependence upon the values of the attributes of the user's interaction with the multimodal application; and incorporating, by the multimodal application, the vocal demeanor into the multimodal application. | 05-01-2014 |
20140122084 | Data Search Service - In an embodiment, speech may be acquired from a user. A concept, that may be associated with the user, may be identified from the acquired speech. The concept may be identified by fuzzy matching one or more words in the acquired speech with data contained in a data store. The data store may be associated with the user. An action may be performed based on the identified concept. | 05-01-2014 |
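A minimal sketch of the fuzzy-matching step in this abstract, using the standard library's `difflib` as a stand-in for whatever matcher the application employs. The data-store contents and cutoff are invented.

```python
import difflib

# Hypothetical user-associated data store.
DATA_STORE = ["dentist appointment", "grocery list", "mom birthday"]

def identify_concept(words, cutoff=0.6):
    """Fuzzy-match recognized words against the user's data store."""
    phrase = " ".join(words).lower()
    matches = difflib.get_close_matches(phrase, DATA_STORE, n=1, cutoff=cutoff)
    return matches[0] if matches else None

concept = identify_concept(["dentist", "apointment"])  # misspelled input
```

Because the match is fuzzy, a recognition or spelling error in the acquired speech ("apointment") still resolves to the stored concept, after which an action could be performed on it.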
20140109081 | SYSTEM AND METHOD TO USE TEXT-TO-SPEECH TO PROMPT WHETHER TEXT-TO-SPEECH OUTPUT SHOULD BE ADDED DURING INSTALLATION OF A PROGRAM ON A COMPUTER SYSTEM NORMALLY CONTROLLED THROUGH A USER INTERACTIVE DISPLAY - An auditory user interactive interface to an application program being installed in a computer controlled system. A routine in an object within the application program provides an auditory user interface to the program, in combination with auditory means for offering the user of the computer controlled system the auditory user interface during installation of the application program; responsive to selection of the auditory interface, the routine provides the auditory user interface during said installation of the application program. | 04-17-2014 |
20140108460 | DATA STORE ORGANIZING DATA USING SEMANTIC CLASSIFICATION - Data stores that store content units and annotations regarding the content units derived through a semantic interpretation of the content units. When annotations are stored in a database, different parts of an annotation may be stored in different tables of the database. For example, one or more tables of the database may store all semantic classifications for the annotations, while one or more other tables may store content of all of the annotations. A user may be permitted to provide natural language queries for searching the database. A natural language query may be semantically interpreted to determine one or more annotations from the query. The semantic interpretation of the query may be performed using the same annotation model used to determine annotations stored in the database. Semantic classifications and format of the annotations for a query may be the same as one or more annotations stored in the database. | 04-17-2014 |
20140108424 | DATA STORE ORGANIZING DATA USING SEMANTIC CLASSIFICATION - Data stores that store content units and annotations regarding the content units derived through a semantic interpretation of the content units. When annotations are stored in a database, different parts of an annotation may be stored in different tables of the database. For example, one or more tables of the database may store all semantic classifications for the annotations, while one or more other tables may store content of all of the annotations. A user may be permitted to provide natural language queries for searching the database. A natural language query may be semantically interpreted to determine one or more annotations from the query. The semantic interpretation of the query may be performed using the same annotation model used to determine annotations stored in the database. Semantic classifications and format of the annotations for a query may be the same as one or more annotations stored in the database. | 04-17-2014 |
20140108423 | DATA STORE ORGANIZING DATA USING SEMANTIC CLASSIFICATION - Data stores that store content units and annotations regarding the content units derived through a semantic interpretation of the content units. When annotations are stored in a database, different parts of an annotation may be stored in different tables of the database. For example, one or more tables of the database may store all semantic classifications for the annotations, while one or more other tables may store content of all of the annotations. A user may be permitted to provide natural language queries for searching the database. A natural language query may be semantically interpreted to determine one or more annotations from the query. The semantic interpretation of the query may be performed using the same annotation model used to determine annotations stored in the database. Semantic classifications and format of the annotations for a query may be the same as one or more annotations stored in the database. | 04-17-2014 |
20140108150 | METHOD OF COMPENSATING A PROVIDER FOR ADVERTISEMENTS DISPLAYED ON A MOBILE PHONE - A method and apparatus for advertising on a mobile phone. In one embodiment the method includes the steps of downloading an advertisement to the mobile phone using an advertisement server; selecting the downloaded advertisement on the mobile phone by a user of the mobile phone; providing by a server additional information in response to the user selection; and tracking the selection and additional information by the server. In another embodiment the compensation is provided in response to the display of said advertisement on the display screen. In another embodiment the step of providing additional information includes the step of using space reserved, in the user interface of the mobile phone, for advertisements. Another aspect of the invention relates to a system for displaying advertisements on a mobile phone. In one embodiment the system includes a server; and a mobile phone in communication with said server. | 04-17-2014 |
20140108018 | SUBSCRIPTION UPDATES IN MULTIPLE DEVICE LANGUAGE MODELS - Systems and methods for intelligent language models that can be used across multiple devices are provided. Some embodiments provide for a client-server system for integrating change events from each device running a local language processing system into a master language model. The change events can be integrated, not only into the master model, but also into each of the other local language models. As a result, some embodiments enable restoration to new devices as well as synchronization of usage across multiple devices. In addition, real-time messaging can be used on selected messages to ensure that high priority change events are updated quickly across all active devices. Using a subscription model driven by a server infrastructure, utilization logic on the client side can also drive selective language model updates. | 04-17-2014 |
20140108004 | TEXT/CHARACTER INPUT SYSTEM, SUCH AS FOR USE WITH TOUCH SCREENS ON MOBILE PHONES - A system and method for receiving character input from a user includes a programmed processor that receives inputs from the user and disambiguates the inputs to present character sequence choices corresponding to the input characters. In one embodiment, a first character input is received and a corresponding first recognized character is stored in a temporary storage buffer and displayed to the user for editing. After a predetermined number of subsequent input characters and/or predetermined amount of time without being edited, the system determines that the first recognized character is the intended character input by the user and removes the first recognized character from the buffer, thereby inhibiting future editing. | 04-17-2014 |
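The commit rule this abstract describes, a recognized character stays editable until enough subsequent characters arrive without an edit, can be sketched with a small buffer. The window size is invented, and the time-based commit path mentioned in the abstract is omitted for brevity.

```python
class InputBuffer:
    """Illustrative buffer: a character is committed (no longer editable)
    once `window` later characters have been entered without editing it."""

    def __init__(self, window=3):
        self.window = window
        self.pending = []      # still-editable recognized characters
        self.committed = []    # characters removed from the edit buffer

    def add(self, ch):
        self.pending.append(ch)
        while len(self.pending) > self.window:
            # Oldest pending character has survived `window` inputs unedited.
            self.committed.append(self.pending.pop(0))

buf = InputBuffer()
for ch in "hello":
    buf.add(ch)
```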
20140108003 | MULTIPLE DEVICE INTELLIGENT LANGUAGE MODEL SYNCHRONIZATION - Systems and methods for intelligent language models that can be used across multiple devices are provided. Some embodiments provide for a client-server system for integrating change events from each device running a local language processing system into a master language model. The change events can be integrated, not only into the master model, but also into each of the other local language models. As a result, some embodiments enable restoration to new devices as well as synchronization of usage across multiple devices. In addition, real-time messaging can be used on selected messages to ensure that high priority change events are updated quickly across all active devices. Using a subscription model driven by a server infrastructure, utilization logic on the client side can also drive selective language model updates. | 04-17-2014 |
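A toy sketch of the synchronization flow in this abstract: a change event from one device is integrated into the master language model and fanned out to every other device's local model. Word-count models and the event structure are invented placeholders for the application's actual language models and messaging.

```python
from collections import Counter

def sync(master, local_models, event):
    """Integrate a (device, word) change event into the master model and
    propagate it to all other local models."""
    device, word = event
    master[word] += 1
    for name, model in local_models.items():
        if name != device:          # originating device already has it
            model[word] += 1

master = Counter()
devices = {"phone": Counter(), "tablet": Counter()}
sync(master, devices, ("phone", "brb"))
```

This is also the mechanism that lets a brand-new device be restored: replaying the master model's state populates its fresh local model.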
20140105338 | LOW-DELAY FILTERING - A method of frequency-domain filtering is provided that includes a plurality of filters, the plurality of filters including at least one constrained filter. | 04-17-2014 |
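The abstract is terse, so here is only the transform-multiply-inverse core that any frequency-domain filter shares: filtering one block by multiplying FFTs. Real low-delay designs add partitioned overlap-add/overlap-save processing and the constrained filter updates the title alludes to; none of that is shown here.

```python
import numpy as np

def freq_filter(x, h):
    """Linear convolution of x with filter h via the frequency domain
    (FFT length padded so circular convolution equals linear)."""
    n = len(x) + len(h) - 1
    X = np.fft.rfft(x, n)
    H = np.fft.rfft(h, n)
    return np.fft.irfft(X * H, n)

x = np.array([1.0, 2.0, 3.0])   # input block
h = np.array([0.5, 0.5])        # two-tap averaging filter
y = freq_filter(x, h)
```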
20140101543 | Text Browsing, Editing And Correction Methods For Automotive Applications - An automotive text display arrangement is described which includes a driver text display positioned directly in front of an automobile driver and displaying a limited amount of text to the driver without impairing forward visual attention of the driver. The arrangement may include a boundary insertion mode wherein when the active text position is an active text boundary, new text is inserted between the text items separated by the active text boundary, and when the active text position is an active text item, new text replaces the active text item. In addition or alternatively, there may be a multifunctional text control knob offering multiple different user movements, each performing an associated text processing function. | 04-10-2014 |
20140100851 | METHOD FOR CUSTOMER FEEDBACK MEASUREMENT IN PUBLIC PLACES UTILIZING SPEECH RECOGNITION TECHNOLOGY - A method, a system and a computer program product for enabling a customer response speech recognition unit to dynamically receive customer feedback. The customer response speech recognition unit is positioned at a customer location. The speech recognition unit is automatically initialized when one or more spoken words are detected. The response statements of customers are dynamically received by the customer response speech recognition unit at the customer location, in real time. The customer response speech recognition unit determines when the one or more spoken words of the customer response statement are associated with a score in a database. An analysis of the words is performed to generate a score that reflects the evaluation of the subject by the customer. The score is dynamically updated as new evaluations are received, and the score is displayed within a graphical user interface (GUI) to be viewed by one or more potential customers. | 04-10-2014 |
20140095173 | SYSTEMS AND METHODS FOR PROVIDING A VOICE AGENT USER INTERFACE - Some embodiments provide techniques performed by at least one voice agent. The techniques include receiving voice input; identifying at least one application program as relating to the received voice input; and displaying at least one selectable visual representation that, when selected, causes focus of the computing device to be directed to the at least one application program identified as relating to the received voice input. | 04-03-2014 |
20140095172 | SYSTEMS AND METHODS FOR PROVIDING A VOICE AGENT USER INTERFACE - Some embodiments provide techniques performed by at least one voice agent. The techniques include receiving voice input from a user at least partially specifying a requested action to be performed at least in part by an application program, wherein the requested action requires a plurality of inputs to be fully specified; and in response to receiving the voice input, making the application program accessible to the user prior to completion of performance of the requested action, so as to enable the user to provide and/or edit at least one input of the plurality of inputs by directly interacting with the application program. | 04-03-2014 |
20140095171 | SYSTEMS AND METHODS FOR PROVIDING A VOICE AGENT USER INTERFACE - Some embodiments provide techniques performed by at least one voice agent. The techniques include receiving voice input specifying a requested action; and identifying a subject of the requested action from the voice input and information relating to a prior action invoked by the at least one voice agent, wherein the information identifies a subject of the prior action. | 04-03-2014 |
20140095168 | SYSTEMS AND METHODS FOR PROVIDING A VOICE AGENT USER INTERFACE - Some embodiments provide techniques performed by at least one voice agent. The techniques include receiving voice input; accessing contextual information related to an application program that has focus of the computing device when the voice input is received; and using the contextual information to interpret the received voice input. | 04-03-2014 |
20140095165 | SYSTEM AND METHOD FOR SYNCHRONIZING SOUND AND MANUALLY TRANSCRIBED TEXT - A method for synchronizing sound data and text data, said text data being obtained by manual transcription of said sound data during playback of the latter. The proposed method comprises the steps of repeatedly querying said sound data and said text data to obtain a current time position corresponding to a currently played sound datum and a currently transcribed text datum, respectively, correcting said current time position by applying a time correction value in accordance with a transcription delay, and generating at least one association datum indicative of a synchronization association between said corrected time position and said currently transcribed text datum. Thus, the proposed method achieves cost-effective synchronization of sound and text in connection with the manual transcription of sound data. | 04-03-2014 |
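The synchronization step this abstract describes can be sketched in a few lines: the playback position captured when a word is typed is shifted back by an assumed transcription delay, and the corrected time is stored as the association datum. The delay value is invented; in practice it would be estimated or configured per transcriptionist.

```python
def associate(events, transcription_delay=2.0):
    """events: (playback_time_sec, typed_word) pairs captured while the
    transcriptionist types. Returns association data linking each word
    to its delay-corrected position in the sound data."""
    return [{"time": max(t - transcription_delay, 0.0), "text": w}
            for t, w in events]

links = associate([(3.1, "hello"), (4.6, "world")])
```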
20140095162 | HIERARCHICAL METHODS AND APPARATUS FOR EXTRACTING USER INTENT FROM SPOKEN UTTERANCES - Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. The multi-stage intent extraction approach may have more than two iterations. By way of example only, the user intent extracting step may further determine a sub-class of the sub-class of the first class after a third iteration, such that the first class, the sub-class of the first class, and the sub-class of the sub-class of the first class are hierarchically indicative of the intent of the user. | 04-03-2014 |
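A toy version of the two-iteration extraction described above: a first pass selects a class (the target) and a second pass a sub-class within it. The taxonomy and keyword rules are invented stand-ins for the trained classifiers a real system would use at each stage.

```python
# Hypothetical two-level intent taxonomy: class -> sub-class -> keywords.
TAXONOMY = {
    "banking": {"balance": {"balance", "much"}, "transfer": {"transfer", "send"}},
    "travel": {"booking": {"book", "flight"}, "status": {"status", "delayed"}},
}

def extract_intent(utterance):
    """Return a (class, sub-class) pair hierarchically indicating intent,
    or None when no class matches."""
    words = set(utterance.lower().split())
    for cls, subs in TAXONOMY.items():          # first iteration: class
        for sub, kws in subs.items():           # second iteration: sub-class
            if words & kws:
                return cls, sub
    return None

intent = extract_intent("how much is in my account")
```

A third iteration would simply recurse one level deeper in the taxonomy, as the abstract's sub-sub-class example suggests.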
20140095147 | Situation Aware NLU/NLP - An arrangement and corresponding method are described for natural language processing. A natural language understanding (NLU) arrangement processes a natural language input to determine a corresponding sentence-level interpretation. A user state component maintains user context data that characterizes an operating context of the NLU arrangement. Operation of the NLU arrangement is biased by the user context data. | 04-03-2014 |
20140089783 | METHOD AND APPARATUS FOR COUPLING A VISUAL BROWSER TO A VOICE BROWSER - A method and apparatus for concurrently accessing network-based electronic content in a Voice Browser and a Visual Browser can include retrieving a network-based document formatted for display in the Visual Browser; identifying in the retrieved document a reference to the Voice Browser specifying electronic content formatted for audible presentation in the Voice Browser. The Voice Browser can retrieve and audibly present the specified electronic content concurrently with the Visual Browser visually presenting the network-based document. The method can include retrieving a network-based document formatted for audible presentation in the Voice Browser; identifying in the retrieved document a reference to the Visual Browser specifying electronic content formatted for visual presentation in the Visual Browser. The Visual Browser can retrieve and visually present the specified electronic content concurrently with the Voice Browser audibly presenting the network-based document. | 03-27-2014 |
20140089098 | METHOD FOR PERFORMING INTERACTIVE SERVICES ON MOBILE DEVICE, SUCH AS TIME OR LOCATION INITIATED INTERACTIVE SERVICES - A system for performing interactive services at a mobile device is disclosed. In some cases, the system receives an indication of an event, and provides interactive services to a user of the mobile device based on the event. In some cases, an indication of an event invokes a script-based process that determines one or more actions to present to a user of the mobile device. | 03-27-2014 |
20140081640 | SPEAKER VERIFICATION METHODS AND APPARATUS - One aspect includes determining validity of an identity asserted by a speaker using a voice print associated with a user whose identity the speaker is asserting, the voice print obtained from characteristic features of at least one first voice signal obtained from the user uttering at least one enrollment utterance including at least one enrollment word by obtaining a second voice signal of the speaker uttering at least one challenge utterance that includes at least one word not in the at least one enrollment utterance, obtaining at least one characteristic feature from the second voice signal, comparing the at least one characteristic feature with at least a portion of the voice print to determine a similarity between the at least one characteristic feature and the at least a portion of the voice print, and determining whether the speaker is the user based, at least in part, on the similarity. | 03-20-2014 |
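The comparison at the heart of this abstract, characteristic features from a challenge utterance scored against an enrolled voice print, can be sketched with cosine similarity and a threshold. The feature vectors, similarity measure, and threshold are all invented for illustration; the application does not specify them.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(challenge_features, voice_print, threshold=0.8):
    """Accept the asserted identity when challenge features are similar
    enough to the enrolled voice print (illustrative rule)."""
    return cosine(challenge_features, voice_print) >= threshold

enrolled = np.array([0.9, 0.1, 0.4])     # voice print from enrollment
same_speaker = enrolled + 0.01           # features from a matching speaker
impostor = np.array([-0.5, 0.8, 0.1])    # features from a different speaker
```

The notable point of the claim is that the challenge utterance may contain words absent from enrollment, i.e. the comparison is text-independent, which a fixed-length feature representation like this one permits.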
20140080461 | LOCAL INTERCEPT METHODS, SUCH AS APPLICATIONS FOR PROVIDING CUSTOMER ASSISTANCE FOR TRAINING, INFORMATION CALLS AND DIAGNOSTICS - A method of displaying a tutorial to a user of a mobile device is disclosed. In some examples, the mobile device receives an input associated with one or more user functions of the mobile device and launches a locally based application in response to the received input. The locally based application may output instructions to the user explaining to the user how to implement the one or more user functions. | 03-20-2014 |
20140074468 | System and Method for Automatic Prediction of Speech Suitability for Statistical Modeling - An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material. | 03-13-2014 |