Natural language

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704001000 - LINGUISTICS

Entries
Document Title Date
20130030794APPARATUS AND METHOD FOR CLUSTERING SPEAKERS, AND A NON-TRANSITORY COMPUTER READABLE MEDIUM THEREOF - According to one embodiment, a speaker clustering apparatus includes a clustering unit, an extraction unit, and an error detection unit. The clustering unit is configured to extract acoustic features for speakers from an acoustic signal, and to cluster utterances included in the acoustic signal into the speakers by using the acoustic features. The extraction unit is configured to acquire character strings representing contents of the utterances, and to extract linguistic features of the speakers by using the character strings. The error detection unit is configured to decide that, when one of the character strings does not fit with a linguistic feature of a speaker into which an utterance of the one is clustered, the utterance is erroneously clustered by the clustering unit.01-31-2013
20130030793LINGUISTIC ERROR DETECTION - Potential linguistic errors within a sequence of words of a sentence are identified based on analysis of a configurable sliding window. The analysis is performed based on an assumption that if a sequence of words occurs frequently enough within a large, well-formed corpus, its joint probability for occurring in a sentence is very likely to be greater than the same words randomly ordered.01-31-2013
20130030792Customization of a Natural Language Processing Engine - A method, an apparatus and an article of manufacture for customizing a natural language processing engine. The method includes enabling selection of one or more parameters of a desired natural language processing task, the one or more parameters intended for use by a trained and an untrained user, mapping the one or more selected parameters to a collection of one or more intervals of an input parameter to an optimization algorithm, and applying the optimization algorithm with the collection of one or more intervals of an input parameter to a model used by a natural language processing engine to produce a customized model.01-31-2013
20100161312Method of semantic, syntactic and/or lexical correction, corresponding corrector, as well as recording medium and computer program for implementing this method - The method is suitable for dysorthographic or partially sighted persons, to facilitate the semantic, syntactic and/or lexical correction of an erroneous expression in a digital text input by a user. The method comprises the sequence of: a step (06-24-2010
20100161313Region-Matching Transducers for Natural Language Processing - Computer methods, apparatus and articles of manufacture therefor, are disclosed for developing a region-matching transducer for marking language data having delimited strings. The region-matching transducer defines one or more patterns of one or more sequences of delimited strings, with at least one of the patterns defined in the region-matching transducer having an arrangement of a plurality of class-matching networks. The plurality of class-matching networks defines a combination of two or more entity classes from one or both of part-of-speech classes and application-specific classes. The region-matching transducer has, for each of the one or more patterns, an arc that leads from a penultimate state with a transition label that identifies the entity class of the pattern, and shares states between patterns leading to a penultimate state when segments of delimited strings making up two or more patterns overlap.06-24-2010
20110184725Multi-stage text morphing - This invention is a multi-stage method for “text morphing,” wherein text morphing involves integrating or blending together substantive content from two or more bodies of text into a single body of text based on locations of linguistic commonality among the two or more bodies of text. This method for multi-stage text morphing entails: substitution of phrase synonyms between two bodies of text; substitution, between two bodies of text, of text segments with synonymous starting phrases and synonymous ending phrases; and substitution, between two bodies of text, of phrases or segments using associations within a larger reference body of text. Text morphing as disclosed herein can be useful for creative ideation, product development, integrative search engines, and entertainment purposes.07-28-2011
20080275694METHOD AND SYSTEM FOR AUTOMATICALLY EXTRACTING RELATIONS BETWEEN CONCEPTS INCLUDED IN TEXT - A method and system for automatically extracting relations between concepts included in electronic text is described. Aspects of the exemplary embodiment include a semantic network comprising a plurality of lemmas that are grouped into synsets representing concepts, each of the synsets having a corresponding sense, and a plurality of links connected between the synsets that represent semantic relations between the synsets. The semantic network further includes semantic information comprising at least one of: 1) an expanded set of semantic relation links representing: hierarchical semantic relations, synset/corpus semantic relations, verb/subject semantic relations, verb/direct object semantic relations, and fine grain/coarse grain semantic relationship; 2) a hierarchical category tree having a plurality of categories, wherein each of the categories contains a group of one or more synsets and a set of attributes, wherein the set of attributes of each of the categories are associated with each of the synsets in the respective category; and 3) a plurality of domains, wherein one or more of the domains is associated with at least a portion of the synsets, wherein each domain adds information regarding a linguistic context in which the corresponding synset is used in a language. A linguistic engine uses the semantic network to perform semantic disambiguation on the electronic text using one or more of the expanded set of semantic relation links, the hierarchical category tree, and the plurality of domains to assign a respective one of the senses to elements in the electronic text independently from contextual reference.11-06-2008
20120203545SYSTEM AND METHOD FOR COMPUTERIZED PSYCHOLOGICAL CONTENT ANALYSIS OF COMPUTER AND MEDIA GENERATED COMMUNICATIONS TO PRODUCE COMMUNICATIONS MANAGEMENT SUPPORT, INDICATIONS AND WARNINGS OF DANGEROUS BEHAVIOR, ASSESSMENT OF MEDIA IMAGES, AND PERSONNEL SELECTION SUPPORT - At least one computer-mediated communication produced by or received by an author is collected and parsed to identify categories of information within it. The categories of information are processed with at least one analysis to quantify at least one type of information in each category. A first output communication is generated regarding the at least one computer-mediated communication, describing the psychological state, attitudes or characteristics of the author of the communication. A second output communication is generated when a difference between the quantification of at least one type of information for at least one category and a reference for the at least one category is detected involving a psychological state, attitude or characteristic of the author to which a responsive action should be taken.08-09-2012
20130211823CONCEPTUAL WORLD REPRESENTATION NATURAL LANGUAGE UNDERSTANDING SYSTEM AND METHOD - A Natural Language Understanding system is provided for indexing of free text documents. The system according to the invention utilizes typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language independent domain knowledge concepts and language dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method.08-15-2013
20130211822SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - A speech recognition apparatus 08-15-2013
20100023319MODEL-DRIVEN FEEDBACK FOR ANNOTATION - A system, a method and a computer readable media for providing model-driven feedback to human annotators. In one exemplary embodiment, the method includes manually annotating an initial small dataset. The method further includes training an initial model using said annotated dataset. The method further includes comparing the annotations produced by the model with the annotations produced by the annotator. The method further includes notifying the annotator of discrepancies between the annotations and the predictions of the model. The method further includes allowing the annotator to modify the annotations if appropriate. The method further includes updating the model with the data annotated by the annotator.01-28-2010
20100017194System and method for suggesting recipients in electronic messages - A system and method for dynamically recognizing a potential recipient of an electronic message. The method includes receiving content input for an electronic communication. The electronic communication includes at least one field of a plurality of fields, including a subject line, a message body, and a recipient address field. The at least one field of the electronic communication is populated with the content input. The method also includes parsing the content input of the at least one field of the electronic communication. The method also includes semantically analyzing the parsed content input of the at least one field of the electronic communication to identify a content qualifier of a recipient rule. The method also includes suggesting a potential recipient of the electronic communication based on the content qualifier of the recipient rule associated with the content input of the at least one field of the electronic communication.01-21-2010
20080319735SYSTEMS AND METHODS FOR AUTOMATIC SEMANTIC ROLE LABELING OF HIGH MORPHOLOGICAL TEXT FOR NATURAL LANGUAGE PROCESSING APPLICATIONS - Systems and methods are provided for automated semantic role labeling for languages having complex morphology. In one aspect, a method for processing natural language text includes receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes, identifying a target verb as a stem of an inflected word in the text sentence, grouping morphemes from one or more inflected words with the same syntactic role into constituents, and predicting a semantic role of a constituent for the target verb.12-25-2008
20100153091USER-SPECIFIED PHRASE INPUT LEARNING - Architecture that enables a user to perform manual word-breaking by phrase input. Phrase input is where the user inserts a phrase-key (or separator) as a delimiter that indicates to an editor application such as an IME (input method editor) the composition of a specific phrase when entering characters (e.g., Asian). The word-breaking is controlled by the user. The conversion quality is improved as the user knows the desired input and ambiguous cases are reduced. A phrase can be specified while the user is composing the characters. By selecting a phrase-key separator, the user can specify the composing characters before the characters are presented as a phrase. Moreover, the architecture includes a phrase prioritization mechanism wherein each phrase can be treated as a single entity and assigned a character identifier (ID), which is related to the sequence of a candidate list.06-17-2010
20100153092Expanding Base Attributes for Terms - In one embodiment, a method for expanding concept attributes for a concept term includes receiving an attribute term for expansion and determining one or more word senses for the attribute term. A word sense is selected from the one or more word senses. One or more conceptually similar terms are selected for the attribute term based on the word sense, and it is determined that at least one of the one or more conceptually similar terms is an additional attribute. A first mapping associating the additional attribute with the attribute term is generated, and a second mapping associating the additional attribute with the concept term is generated. The mappings are stored in an onomasticon.06-17-2010
20090216525SYSTEM AND METHOD FOR TREATING HOMONYMS IN A SPEECH RECOGNITION SYSTEM - A system and method for homonym treatment in a speech recognition system are provided. The system and method may be used in mobile wireless communication devices that are voice operated after their initial activation.08-27-2009
20090192787GRAMMER CHECKER - A method for parsing a computerized text, the method including preparing a set of logical rules, using logical grammatical links, for parsing a text, using the logical rules to identify a part of speech of each word of text and all links between the words in the text, and labeling the links as grammatically correct links or grammatically incorrect links for correction, so as to parse substantially every word in the text.07-30-2009
20090192786TEXT INPUT DEVICE AND METHOD - The present invention relates to a text input device and a method for inputting text, and a computer program for performing the method. A text input device (07-30-2009
20090192785SYSTEM AND METHOD FOR OPTIMIZING NATURAL LANGUAGE DESCRIPTIONS OF OBJECTS IN A VIRTUAL ENVIRONMENT - A system and method for constructing a natural language description of one or more objects in a virtual environment includes determining a plurality of properties of an object and an environment given a current viewpoint in a virtual environment. An object description is created using the plurality of properties where the object description reflects multiple display characteristics of the object in the virtual environment. Object descriptions in the virtual environment are combined by classifying objects in the virtual environment to condense a natural language description.07-30-2009
20090192784SYSTEMS AND METHODS FOR ANALYZING ELECTRONIC DOCUMENTS TO DISCOVER NONCOMPLIANCE WITH ESTABLISHED NORMS - A computer-implemented method for analyzing documents to discover noncompliance with an established norm is provided. The method can include receiving one or more terms indicating possible noncompliance with a pre-established norm, and, based upon the at least one term, constructing at least one grammatical unit. The grammatical unit can specify a predetermined syntax and can correspond to semantic content that is indicative of noncompliance with the pre-established norm, wherein the norm can include a statute, regulation, policy, or other standard. The method can further include identifying from among multiple electronic documents each document that contains one or more grammatical units specifying a predetermined syntax and corresponding to semantic content indicative of noncompliance with the pre-established norm.07-30-2009
20110196671HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT AND FOR PRIORITIZING COMPOUND LANGUAGE SOLUTIONS ACCORDING TO QUANTITY OF TEXT COMPONENTS - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria.08-11-2011
20110196670INDEXING CONTENT AT SEMANTIC LEVEL - Systems and methods are disclosed that perform automated semantic tagging. Automated semantic tagging produces semantically linked tags for a given text content. Embodiments provide ontology mapping algorithms and concept weighting algorithms that create accurate semantic tags that can be used to improve enterprise content management, and search for better knowledge management and collaboration.08-11-2011
20110196669INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - There is provided an information processing apparatus including: an acquiring unit acquiring a title of content; an analyzing unit dividing the title into tokens; a calculating unit calculating, for each token, an evaluation value based on a token length and weighted according to the token's position in the title; a mapping unit mapping, for each token, a token point shown by an ordinal number showing the token's position in the title and the evaluation value, onto a coordinate plane; a deciding unit deciding, based on the mapped token points, coordinates of a criterion point used as a criterion for extracting a series identifier and an extraction criterion based on the criterion point; an extracting unit extracting token points that conform to the extraction criterion out of the token points; and a generating unit generating the series identifier from the character strings included in tokens associated with the extracted token points.08-11-2011
20110196668Integrated Language Model, Related Systems and Methods - An integrated language model includes an upper-level language model component and a lower-level language model component, with the upper-level language model component including a non-terminal and the lower-level language model component being applied to the non-terminal. The upper-level and lower-level language model components can be of the same or different language model formats, including finite state grammar (FSG) and statistical language model (SLM) formats. Systems and methods for making integrated language models allow designation of language model formats for the upper-level and lower-level components and identification of non-terminals. Automatic non-terminal replacement and retention criteria can be used to facilitate the generation of one or both language model components, which can include the modification of existing language models.08-11-2011
20110202335HANDHELD ELECTRONIC DEVICE PROVIDING A LEARNING FUNCTION TO FACILITATE CORRECTION OF ERRONEOUS TEXT ENTRY AND ASSOCIATED METHOD - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In addition to identifying and outputting representations of language objects that are stored in the memory and that correspond with a text input, the device provides a learning function which facilitates providing proposed corrected output by the device in certain circumstances of erroneous input.08-18-2011
20100076750System for Low-Latency Animation of Talking Heads - Methods and apparatus for rendering a talking head on a client device are disclosed. The client device has a client cache capable of storing audio/visual data associated with rendering the talking head. The method comprises storing sentences in a client cache of a client device that relate to bridging delays in a dialog, storing sentence templates to be used in dialogs, generating a talking head response to a user inquiry from the client device, and determining whether sentences or stored templates stored in the client cache relate to the talking head response. If the stored sentences or stored templates relate to the talking head response, the method comprises instructing the client device to use the appropriate stored sentence or template from the client cache to render at least a part of the talking head response and transmitting a portion of the talking head response not stored in the client cache, if any, to the client device to render a complete talking head response. If the client cache has no stored data associated with the talking head response, the method comprises transmitting the talking head response to be rendered on the client device.03-25-2010
20100076749LANGUAGE PROCESSING SYSTEM, LANGUAGE PROCESSING METHOD, LANGUAGE PROCESSING PROGRAM, AND RECORDING MEDIUM - A language processing system according to the present invention includes: an input device 03-25-2010
20120245926METHODS AND APPARATUS FOR FORMATTING TEXT FOR CLINICAL FACT EXTRACTION - An original text that is a representation of a narration of a patient encounter provided by a clinician may be received and re-formatted to produce a formatted text. One or more clinical facts may be extracted from the formatted text. A first fact of the clinical facts may be extracted from a first portion of the formatted text, and the first portion of the formatted text may be a formatted version of a first portion of the original text. A linkage may be maintained between the first fact and the first portion of the original text.09-27-2012
20120245925METHODS AND DEVICES FOR ANALYZING TEXT - A method, operating model, system, computer program, application, online service, application program interface (API), and computer program product for analyzing any email message or text, online post, online web pages, social media sites, and online news sites to detect predefined and actionable events and intent. A method for detecting important emails or messages, and actionable emails or messages that signify intent including questions or promises. A method for detecting past or possible future events in any online posts where the event is defined a priori.09-27-2012
20120245924CUSTOMER REVIEW AUTHORING ASSISTANT - An authoring assistant includes a parser which automatically identifies opinion expressions in input text. The text may include an author's review of an item, such as a product or service. A computer-implemented opinion review component generates an analysis of the text, which is based on the identified opinion expressions. The opinion review component computes an effective opinion of the text as a function of a measure of polarity associated with the identified opinion expressions. A representation generator generates a representation of the analysis for display on an associated user interface. The representation of the analysis includes a representation of the effective opinion. In the case of a review, the authoring assistant may allow the author to modify the review to reduce incoherence with a rating of the item.09-27-2012
20100114563REAL-TIME SEMANTIC ANNOTATION SYSTEM AND THE METHOD OF CREATING ONTOLOGY DOCUMENTS ON THE FLY FROM NATURAL LANGUAGE STRING ENTERED BY USER - Disclosed herein are a real-time semantic annotation system and a method of converting user-entered natural language strings into semantically-readable knowledge structure documents using the system in real time. The real-time semantic annotation system includes a natural language character string input device for enabling a user to enter natural language character strings, a character string pattern triplet-mapping table for storing natural language character string patterns and their corresponding triplets, a triplet extraction device for converting the entered natural language character strings into triplets by analyzing and processing the entered natural language character strings using the pattern-triplet mapping table, an alternative word recommendation device for providing notification that a user should enter an alternative word, and a machine-readable document generation device for generating machine-readable documents from the triplets using a semantically-readable knowledge structure.05-06-2010
20130080153INFORMATION PROCESSING APPARATUS, NON-TRANSITORY COMPUTER READABLE MEDIUM STORING INFORMATION PROCESSING PROGRAM, AND INFORMATION PROCESSING METHOD - An information processing apparatus includes a receiving unit that receives character sequences, a sorting unit that sorts the character sequences received by the receiving unit into known words and unknown words, and a detecting unit that detects character sequences sorted as unknown words by the sorting unit as incorrect words and detects a third character sequence between a first character sequence and a second character sequence, which have been sorted as unknown words by the sorting unit, as incorrect words when the third character sequence includes words sorted as known words by the sorting unit and the number of the known words is less than or equal to a predetermined number.03-28-2013
20130080152LINGUISTICALLY-ADAPTED STRUCTURAL QUERY ANNOTATION - A system and method for natural language processing of queries are provided. A lexicon includes text elements that are recognized as being a proper noun when capitalized. A natural language query includes a sequence of text elements including words. The query is processed. The processing includes a preprocessing step, in which part of speech features are assigned to the text elements in the query. This includes identifying, from a lexicon, a text element in the query which starts with a lowercase letter and assigning recapitalization information to the text element in the query, based on the lexicon. This information includes a part of speech feature of the capitalized form of the text element. Then parts of speech for the text elements in the query are disambiguated, which includes applying rules for recapitalizing text elements based on the recapitalization information.03-28-2013
20130080154NETWORK BASED RESTORATIVE JUSTICE - Systems and methods herein provide for resolution between at least two parties through a network. In one embodiment, a system includes an interface operable to communicatively couple to the network. The system also includes a processor operable to establish secure communications between at least two client terminals and a facilitator terminal through the network. The processor provides dialog interfaces to the client terminals and to the facilitator terminal, receives dialog of the parties from the client terminals via their respective dialog interfaces, and provides the dialog of the parties to the facilitator terminal. The processor receives dialog from the facilitator terminal via the facilitator's dialog interface to manage the dialog between the parties, generates a file detailing an agreement between the parties based on the dialog between the parties, transfers the file to the parties via their respective dialog interfaces, and stores the file for subsequent access by the parties.03-28-2013
20130080149SYSTEM AND METHOD FOR EXTRACTING CATEGORIES OF DATA - Lines of data that are from a historical document, such as a digitized city directory, have information extracted and stored in searchable data fields. Words and phrases within the lines of data are identified and tagged. Rendering rules are applied to the tagged words and phrases to extract names, addresses, occupations, spouse information and other data, and store that data in the searchable fields.03-28-2013
20130080151Systems and Methods for Teaching Phonemic Awareness - A system to teach phonemic awareness uses a plurality of phonemes and a plurality of graphemes. Each phoneme is a unique sound and an indivisible unit of sound in a spoken language, and each grapheme is a written representation of one of the plurality of phonemes. A plurality of distinct graphical images and a plurality of unique names are provided where each unique name is associated with one of the graphical images and represents a grouping of graphemes selected from the plurality of graphemes. The system uses a plurality of sets of display pieces having a plurality of individual display pieces. Each individual display piece includes at least a portion of one of the graphical images and the graphemes from the grouping of graphemes constituting the associated unique name. A predefined instructional environment defines a predefined spatial context and predefined rules governing the acquisition and utilization of individual display pieces.03-28-2013
20130080150Automatic Semantic Evaluation of Speech Recognition Results - A semantic error rate calculation may be provided. After receiving a spoken query from a user, the spoken query may be converted to text according to a first speech recognition hypothesis. A plurality of results associated with the converted query may be received and compared to a second plurality of results associated with the converted query.03-28-2013
20130085746PROOF READING OF TEXT DATA GENERATED THROUGH OPTICAL CHARACTER RECOGNITION - A novel system includes: a first proof reading tool for performing carpet proof reading on text data; a second proof reading tool for performing side-by-side proof reading on the text data; a storage unit configured to store a log of proof reading operations having been performed by using the first and second proof reading tools; and an analysis unit configured to determine, for each attribute serving as units in which carpet proof reading is performed with the first proof reading tool, whether or not to use the first proof reading tool in proof reading of the attribute, by comparing a first estimated value of a time taken when proof reading is performed by using the first proof reading tool with a second estimated value of a time taken when proof reading is performed by using the second proof reading tool without using the first proof reading tool, the first and second estimated values being calculated on the basis of the log.04-04-2013
20090076796Natural language processing method - Methods for converting a natural language sentence into a set of primitive sentences. The method includes identifying verbal blocks in the sentence, splitting the sentence into a set of logical clauses, disambiguating ambiguous verbal blocks within each logical clause, and constructing a primitive sentence for each verbal block by duplicating the shared noun phrases of verbal blocks.03-19-2009
20130085745SEMANTIC-BASED APPROACH FOR IDENTIFYING TOPICS IN A CORPUS OF TEXT-BASED ITEMS - A method of identifying topics in a corpus that includes a plurality of text-based items begins by extracting keytext from each of the plurality of text-based items, resulting in sets of keytext. The method continues by processing the keytext sets to generate a respective semantic footprint for each of the text-based items, resulting in a plurality of semantic footprints. The semantic footprints are used to calculate similarity values for the text-based items, wherein the similarity values indicate commonality between pairs of the text-based items. The method continues by clustering the text-based items into a number of topic groups, wherein the clustering is influenced by the similarity values, and by generating a topic heading for each of the number of topic groups, resulting in a number of topic headings. Next, the text-based items are grouped into accessible topic groups associated with the topic headings.04-04-2013
20130035928TxtAnalizer - Text to motion pictures software program working on sentences that can have visual interpretation: you type in the sentence, the program understands the meaning of the sentence (that means one can say the same thing in other words) and starts an external process, a video file. The video file closes itself after the video finishes and the program is ready to process the next sentence.02-07-2013
20130035931PREDICTING LEXICAL ANSWER TYPES IN OPEN DOMAIN QUESTION AND ANSWERING (QA) SYSTEMS - In an automated Question Answer (QA) system architecture for automatic open-domain Question Answering, a system, method and computer program product for predicting the Lexical Answer Type (LAT) of a question. The approach is completely unsupervised and is based on a large-scale lexical knowledge base automatically extracted from a Web corpus. This approach for predicting the LAT can be implemented as a specific subtask of a QA process, and/or used for general purpose knowledge acquisition tasks such as frame induction from text.02-07-2013
20130035932SYSTEM AND METHOD OF GENERATING RESPONSES TO TEXT-BASED MESSAGES - A system to generate a response to a text-based natural language message includes a user interface, processing device, and a computer-readable storage medium storing executable instructions to generate the response to the text-based natural language message. The instructions and a method for generating the response include identifying a sentence in the text-based natural language message, identifying an input clause in the sentence, and parsing the input clause, thereby defining a relationship between words in the input clause. The instructions and method also include assigning a semantic tag to the parsed input clause, comparing the input clause to a previously received clause, the previously received clause being correlated with a previously generated response clause, and generating an output response message derived from the previously generated response clause.02-07-2013
20130035930PREDICTING LEXICAL ANSWER TYPES IN OPEN DOMAIN QUESTION AND ANSWERING (QA) SYSTEMS - In an automated Question Answer (QA) system architecture for automatic open-domain Question Answering, a system, method and computer program product for predicting the Lexical Answer Type (LAT) of a question. The approach is completely unsupervised and is based on a large-scale lexical knowledge base automatically extracted from a Web corpus. This approach for predicting the LAT can be implemented as a specific subtask of a QA process, and/or used for general purpose knowledge acquisition tasks such as frame induction from text.02-07-2013
20130035929INFORMATION PROCESSING APPARATUS AND METHOD - According to one embodiment, an information processing apparatus includes an acquisition unit, an analysis unit, and a generation unit. The acquisition unit is configured to acquire a status of a user while the user is working with a resource. The analysis unit is configured to acquire text information included in the resource by analyzing the resource. The generation unit is configured to generate at least one work label from the status of the user and the text information, and to generate a work history including a part of the text information, to which the work label is assigned.02-07-2013
20130138427Fraud Detection Using Text Analysis - In one embodiment, a method executed by at least one processor includes receiving text submitted by a user. The method also includes determining a text score for the received text by comparing a first set of phrases included in the received text to a second set of phrases. The second set of phrases includes phrases from stored text. The stored text includes stored text known to be genuine and stored text known to be fraudulent. The method also includes determining that the received text is fraudulent based on the text score.05-30-2013
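The phrase comparison in the entry above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: phrases are taken to be word bigrams, the score is the difference between fraudulent and genuine phrase matches divided by the number of received phrases, and the names `phrases`, `text_score`, `is_fraudulent` and the threshold value are hypothetical.

```python
def phrases(text, n=2):
    """Return the set of lowercase word n-grams in text (assumed phrase unit)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def text_score(received, genuine_texts, fraudulent_texts, n=2):
    """Score in [-1, 1]; positive means closer to the known-fraudulent text."""
    received_phrases = phrases(received, n)
    if not received_phrases:
        return 0.0
    fraud = set().union(*(phrases(t, n) for t in fraudulent_texts))
    genuine = set().union(*(phrases(t, n) for t in genuine_texts))
    fraud_hits = len(received_phrases & fraud)
    genuine_hits = len(received_phrases & genuine)
    return (fraud_hits - genuine_hits) / len(received_phrases)


def is_fraudulent(received, genuine_texts, fraudulent_texts, threshold=0.2):
    """Flag the text when its score exceeds an (assumed) threshold."""
    return text_score(received, genuine_texts, fraudulent_texts) > threshold


genuine = ["selling my used bike in good condition"]
fraudulent = ["wire the money today via western union"]
print(text_score("please wire the money today", genuine, fraudulent))
print(is_fraudulent("used bike in good condition", genuine, fraudulent))
```

A real system would likely weight phrases rather than count them equally; the unweighted count is used here only to keep the comparison visible.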
20090157386DIAGNOSTIC EVALUATION OF MACHINE TRANSLATORS - A system for evaluating translation quality of a machine translator is discussed. The system includes a bilingual data generator configured to intermittently access a wide area network and generate a bilingual corpus from data received from the wide area network. The system also includes an example extraction component configured to receive an ontology input indicative of a plurality of ontological categories of evaluation and to extract evaluation examples from the bilingual corpus based on the ontology input. The system further includes an evaluation component configured to evaluate translation results from translation by a machine translator of the evaluation examples and to score the translation results according to the ontological categories.06-18-2009
20130041654AUTOMATED SENTENCE PLANNING IN A TASK CLASSIFICATION SYSTEM - Disclosed is a task classification system that interacts with a user. The task classification system may include a recognizer that may recognize symbols in the user's input communication, and a natural language understanding unit that may determine whether the user's input communication can be understood. If the user's input communication can be understood, the natural language understanding unit may generate understanding data. The system may also include a communicative goal generator that may generate communicative goals based on the symbols recognized by the recognizer and understanding data from the natural language understanding unit. The generated communicative goals may be related to information needed to be obtained from the user. The system may further include a sentence planning unit that may automatically plan one or more sentences based on the communicative goals generated by the communicative goal generator with at least one of the sentence plans being output to the user.02-14-2013
20130041655Systems and Methods for Word Offensiveness Detection and Processing Using Weighted Dictionaries and Normalization - Computer-implemented systems and methods are provided for identifying language that would be considered obscene or otherwise offensive to a user or proprietor of a system. A plurality of offensive words are received, where each offensive word is associated with a severity score identifying the offensiveness of that word. A string of words is received. A distance between a candidate word and each offensive word in the plurality of offensive words is calculated, and a plurality of offensiveness scores for the candidate word are calculated, each offensiveness score based on the calculated distance between the candidate word and the offensive word and the severity score of the offensive word. A determination is made as to whether the candidate word is an offender word, where the candidate word is deemed to be an offender word when the highest offensiveness score in the plurality of offensiveness scores exceeds an offensiveness threshold value.02-14-2013
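The scoring described in the entry above (a distance to each offensive word, combined with that word's severity, then compared against a threshold) might look roughly like the following sketch. The abstract does not give the formula, so the choices here are assumptions: Levenshtein edit distance as the distance, `severity * (1 - distance / longer_length)` as the offensiveness score, and 0.6 as the threshold. The function names and the placeholder word list are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def offensiveness_scores(candidate, severity_by_word):
    """One score per offensive word: severity scaled by normalized similarity."""
    scores = {}
    for word, severity in severity_by_word.items():
        distance = edit_distance(candidate.lower(), word)
        similarity = 1.0 - distance / max(len(candidate), len(word), 1)
        scores[word] = severity * similarity
    return scores


def offender_words(text, severity_by_word, threshold=0.6):
    """Return candidate words whose highest score exceeds the threshold."""
    offenders = []
    for candidate in text.split():
        scores = offensiveness_scores(candidate, severity_by_word)
        if scores and max(scores.values()) > threshold:
            offenders.append(candidate)
    return offenders


# Mild placeholder words stand in for a real weighted dictionary.
severity = {"darn": 1.0, "heck": 0.5}
print(offender_words("well d4rn that", severity))
```

Dividing the distance by the longer word length is one simple way to keep short and long words comparable; other normalizations would fit the abstract equally well.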
20130041653Coefficients Attribution for Different Objects Based on Natural Language Processing - In one embodiment, a system includes one or more computing systems that implement a social networking environment and is operable to parse users' actions that include free form text to determine and store objects and affinities contained in the text string through natural-language processing. The method comprises accessing a text string, identifying objects and affinity declarations via natural-language processing, assessing the combination of objects and context data to determine an instance of a broader concept, and determining an affinity coefficient through a natural-language processing dictionary. Once a database of stored instances and affinities has been generated and stored, it may be leveraged to push suggestions to members of the social network to enhance their social networking experience.02-14-2013
20090164207User device having sequential multimodal output user interface - In one aspect of the exemplary embodiments of this invention an apparatus includes a user interface that contains a plurality of input modalities and a plurality of output modalities, and a data processor coupled with the user interface and configurable to present a user with a content item that includes a plurality of attributes. In response to user input, the data processor is operable to partition at least some of the attributes into a plurality of presentation tokens, where an individual presentation token comprises at least one attribute. The data processor is further configurable to respond to further user input to define one of the plurality of input modalities to generate a trigger condition for individual ones of the presentation tokens, where generation of a trigger condition results in an associated presentation token being made manifest to the user. The plurality of input modalities may include two or more of physical or virtual keys, an input acoustic transducer, a speech recognition unit, and a gesture detection unit, and where the plurality of output modalities may include two or more of an output acoustic transducer, a speech synthesis unit, a vibro-tactile transducer, and a display screen.06-25-2009
20100042404METHOD FOR BUILDING A NATURAL LANGUAGE UNDERSTANDING MODEL FOR A SPOKEN DIALOG SYSTEM - A method of generating a natural language model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly. If not all the labeling data is received, the method comprises repeating the step of building a series of NLU models until all labeling data is received. After all the training data is received, at least once, the method comprises building a third NLU model using all the labeling data, wherein the third NLU model is used in generating the spoken dialog service.02-18-2010
20100042402APPARATUS, AND ASSOCIATED METHOD, FOR DETECTING FRAUDULENT TEXT MESSAGE - An apparatus, and an associated method, detects spam and other fraudulent messages sent to a recipient station. The textual portion of a received message is analyzed to determine whether the message includes errors made by non-native language speakers when authoring a text message. A text analysis engine analyzes the text using rules sets that identify grammatical errors made by non-native language speakers, usage errors made by non-native language speakers, and other errors.02-18-2010
20100042401 Semantic Cognitive Map - A semantic cognitive map created by associating each of a multitude of dictionary entries with a point among a multitude of points in a metric space, each of the dictionary entries associated with at least one onym, the at least one onym including at least one synonym or antonym, the metric space having a topology and metrics, the location of each of the multitude of points defined by a global minimum of an energy function of the multitude of points.02-18-2010
20100042400Method for Triggering at Least One First and Second Background Application via a Universal Language Dialog System - At least one transaction and at least one transaction parameter that is allocated thereto are determined based on at least one user statement in order to trigger at least one first and second background application via a universal language dialogue system, first transactions and first transaction parameters being assigned to the first background application and second transactions and second transaction parameters being associated with the second background application. The first and second transactions as well as the first and second transaction parameters are linked together via a universal dialogue specification which is evaluated to determine the at least one transaction and at least one associated transaction parameter in order to trigger at least one of the background applications via the universal language dialogue system.02-18-2010
20090157387Connected Text Data System - A connected text data system for efficiently and accurately translating connected text. The connected text data system includes inputting or receiving connected text, transmitting the connected text to a text iterator, scanning the connected text, identifying a plurality of words in the connected text, and translating the connected text to separated text by adding a space between each of the plurality of words.06-18-2009
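The entry above turns connected text into separated text by identifying words and inserting spaces. The abstract does not say how words are identified; the sketch below assumes a known vocabulary and uses a small dynamic program that prefers longer words when several splits are possible. The function name `separate` and the sample vocabulary are hypothetical.

```python
def separate(connected, vocabulary):
    """Split `connected` into vocabulary words; return None if no split exists."""
    n = len(connected)
    # best[i] holds a list of words covering connected[:i], or None if unreachable.
    best = [None] * (n + 1)
    best[0] = []
    max_len = max(map(len, vocabulary), default=0)
    for end in range(1, n + 1):
        # Earlier start positions give longer pieces, so those are tried first.
        for start in range(max(0, end - max_len), end):
            piece = connected[start:end]
            if best[start] is not None and piece in vocabulary:
                best[end] = best[start] + [piece]
                break
    return None if best[n] is None else " ".join(best[n])


vocab = {"natural", "language", "processing", "is", "fun"}
print(separate("naturallanguageprocessingisfun", vocab))
```

Returning `None` for input that cannot be fully covered keeps the failure case explicit instead of emitting a partial split.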
20090157384SEMI-SUPERVISED PART-OF-SPEECH TAGGING - A word is selected from a received text and features are identified from the word. The features are applied to a model to identify probabilities for sets of part-of-speech tags. The probabilities for the sets of part-of-speech tags are used to weight scores for possible part-of-speech tags for the selected word to form weighted scores. The weighted scores are used to select a part-of-speech tag for the word and the selected part of speech tag is stored or output. The scores for the possible part-of-speech tags are based on variational approximation parameters trained from a sparse prior over probability distributions describing the probability of a part-of-speech tag given a word.06-18-2009
20100106486IMAGE-BASED SEMANTIC DISTANCE - Image-based semantic distance technique embodiments are presented that involve establishing a measure of an image-based semantic distance between semantic concepts. Generally, this entails respectively computing a semantic concept representation for each concept based on a collection of images associated with the concept. A degree of difference is then computed between two semantic concept representations to produce the aforementioned semantic distance measure for the pair of corresponding concepts.04-29-2010
20100106485METHODS AND APPARATUS FOR CONTEXT-SENSITIVE INFORMATION RETRIEVAL BASED ON INTERACTIVE USER NOTES - Information retrieval systems and methods are provided based on interactive user notes. Information is retrieved from one or more data sources based on user notes by obtaining the user notes containing one or more information requests; identifying the one or more information requests from the user notes; interpreting at least one of the information requests in context; generating one or more queries required for the at least one interpreted information request; identifying an update to the user notes, the update containing one or more updated information requests; and processing the updated user notes to generate one or more queries required for the updated information requests. If the user notes contain multiple information requests, at least one query is generated for each of the plurality of information requests. The information requests can be interpreted based on user-specified context guides.04-29-2010
20100106487Style-checking method and apparatus for business writing - The method is software-based, and carried out on a computer system and cooperating apparatus. It checks written text for problems impairing clarity, conciseness and reader comfort in business documents. One embodiment includes routines for checking writing style in sentences and paragraphs, and for generating informational, critical and commendatory display indicators relating to reader comfort. The routines check subject and verb juxtaposition, verb strength, prepositional phrase use, transition words, unity-creating constructions, gerund use, and sentence variety. The indicators are displayed in the form of highlighted text and diacritical marks. Another embodiment is a method for quantifying reader discomfort. It includes routines for quantifying, reporting and displaying points indicating comfort-impairing problems of the type located by running the routines of the first embodiment. Yet another embodiment includes a method for editing text documents for reader comfort, by locating and fixing problem words and constructions.04-29-2010
20120185238Auto Generation of Social Media Content from Existing Sources - The invention provides for the automatic creation of custom content for social media based on existing text source and a set of preferences and parameters, including automatically preparing the input material from existing text source, automatically generating the social media content of said input material, automatically generating the published content of said social media content, and automatically producing the analysis and report of said published content and its consumption.07-19-2012
20100145678Method, System and Apparatus for Automatic Keyword Extraction - The present invention provides a method and a system for automatic keyword extraction based on supervised or unsupervised machine learning techniques. Novel linguistically-motivated machine learning features are introduced, including discourse comprehension features based on construction integration theory, numeric features making use of syntactic part-of-speech patterns, and probabilistic features based on analysis of online encyclopedia annotations. The improved keyword extraction methods are combined with word sense disambiguation into a system for automatically generating annotations to enrich text with links to encyclopedic knowledge.06-10-2010
20090125296METHODS AND SYSTEMS FOR USING DOMAIN SPECIFIC RULES TO IDENTIFY WORDS - Text entry systems are described that incorporate information from a specific domain to reduce the allowable words that can be spelled by ambiguous user input. The text entry systems can receive an indication identifying a key pressed by a user. The key may represent multiple characters such that the character intended by the user is ambiguous. The text entry systems identify words from a specific domain that can be spelled with any of the multiple characters represented by the key press. The text entry systems then display an indication to the user highlighting the letters of the identified words represented by the key press. Thus, the text entry systems reduce the possible words indicated by the user input based on domain-specific information.05-14-2009
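As a rough illustration of the entry above, the sketch below narrows a domain word list using ambiguous key presses. It assumes a standard 12-key phone layout (digits 2 to 9 mapped to letters) and a hypothetical domain list of street names; neither the layout nor the names `KEYPAD` and `matching_words` come from the abstract.

```python
# Assumed key layout: each digit key stands for several letters.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def matching_words(key_presses, domain_words):
    """Return domain words whose leading letters are consistent with the key presses."""
    matches = []
    for word in domain_words:
        if len(word) < len(key_presses):
            continue
        # zip stops at the shorter sequence, so only the typed prefix is checked.
        if all(ch in KEYPAD.get(key, "")
               for key, ch in zip(key_presses, word.lower())):
            matches.append(word)
    return matches


streets = ["main", "oak", "maple", "nash"]
print(matching_words("62", streets))
print(matching_words("625", streets))
```

Each additional key press can only shrink the candidate list, which is the reduction the abstract describes.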
20120166183SYSTEM AND METHOD FOR THE LOCALIZATION OF STATISTICAL CLASSIFIERS BASED ON MACHINE TRANSLATION - A system and method for localizing a spoken dialog system is disclosed. Source data from a source language spoken dialog system is accessed, including semantic annotations and transcriptions of a plurality of utterances. The transcriptions are machine-translated into a target language. Semantic classifiers are trained on the machine translated transcriptions and the source language semantic annotations.06-28-2012
20120166181Method For Locating Line Breaks In Text - A method for locating line breaks in text, carried out by a computer device having a processor and system memory, includes the steps of creating a probabilistic model of a paragraph of text, parameterized by inter-word spacing, and running an inference on the model to find a sequence of line breaks that maximizes the joint probability of line break positions with minimum deviation of inter-word spacing from an ideal value.06-28-2012
20130046532SYSTEM AND METHOD FOR PROVIDING DEFINITIONS - A system and method for providing definitions is described. A phrase to be defined is received. One or more documents, which each contain at least one definition, are determined. The phrase is matched to at least one of the definitions. One or more definitions for the phrase are presented.02-21-2013
20130046531PSYCHO-LINGUISTIC STATISTICAL DECEPTION DETECTION FROM TEXT CONTENT - An apparatus and method for determining whether a text is deceptive may comprise analyzing a body of textual content known to be one of text containing true content and text containing deceptive content; identifying psycho-linguistic cues that are indicative of a text being deceptive; statistically analyzing, via a computing device, a given text based upon the psycho-linguistic cues to determine if the text is deceptive. The apparatus and method may further comprise weighting the psycho-linguistic cues and statistically analyzing based on the weighted psycho-linguistic cues. The statistically analyzing step may be performed using one of a cue matching analysis, a weighted cue matching analysis, a Markov chain analysis, and a sequential probability ratio testing binary hypothesis analysis. The psycho-linguistic cues may be separated into categories, including increasing trend cues and decreasing trend cues and analyzed according to presence in a category from within the categories.02-21-2013
20090089046Word Use Difference Information Acquisition Program and Device - A device or computer implemented program for accurately and automatically obtaining general-purpose information regarding the usage difference between a plurality of synonyms and quasi-synonyms, such as the types of words with which the synonyms and quasi-synonyms are often used, is provided with: means for receiving the input of a plurality of words; means for extracting sentence data including an inputted word from a corpus; means for analyzing the sentence structure of the sentence data and extracting nouns that are in a grammatical relationship with the inputted word included in the sentence data; means for extracting the nodes representing the nouns and the nodes representing the semantic category of the noun from a thesaurus and forming a directional graph for each inputted word; means for comparing a plurality of directional graphs and extracting the difference nodes; and means for outputting the extracted difference nodes as information relating to the usage difference of the inputted words.04-02-2009
20090043565HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device.02-12-2009
20080243481Large Language Models in Machine Translation - Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.10-02-2008
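The backoff scoring in the entry above can be sketched as follows: an n-gram seen in the corpus is scored by its relative frequency, and an unseen one falls back to the (n-1)-gram score multiplied by a backoff factor. This is only an illustration under assumptions: the factor 0.4, the unnormalized scores, and the names `count_ngrams` and `score` are not taken from the abstract.

```python
from collections import Counter


def count_ngrams(tokens, max_order):
    """Count every n-gram of order 1..max_order in the token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(ngram, counts, total_tokens, backoff_factor=0.4):
    """Relative frequency if the n-gram was seen, else a discounted lower-order score."""
    ngram = tuple(ngram)
    factor = 1.0
    while len(ngram) > 1:
        context = ngram[:-1]
        if counts[ngram] > 0:
            return factor * counts[ngram] / counts[context]
        ngram = ngram[1:]          # drop the oldest token: the backoff n-gram
        factor *= backoff_factor   # apply the backoff factor once per step
    return factor * counts[ngram] / total_tokens


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_order=3)
print(score(("the", "cat"), counts, len(tokens)))
print(score(("on", "the", "cat"), counts, len(tokens)))
```

Because the scores are not renormalized into probabilities, this style of backoff needs only raw counts, which is what makes it practical for very large corpora.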
20090306966Method and apparatus to determine and use audience affinity and aptitude - An embodiment of the present invention is a method of presenting a media work which includes: detecting media work content properties in a portion of the media work; associating a presentation rate of the portion with the detected media work content properties; and presenting the portion at the presentation rate; wherein the media work content properties include one or more of: (a) indicia of a number of syllables in utterances; (b) indicia of a number of letters in a word; (c) indicia of the complexity of grammatical structures in portions of the media work; (d) indicia of arrival rate of newly presented objects; (e) indicia of temporal proximity between events in portions of the media work; or (f) indicia of number of phonemes per unit of time in portions of the media work.12-10-2009
20090306965DATA DETECTION - An apparatus for processing a sequence of tokens to detect predetermined data, wherein each said token has a token type, and the predetermined data has a structure that comprises a predetermined sequence of token types, including at least one optional token type. The apparatus comprises a processor arranged to: provide a tree for detecting the predetermined data, the tree comprising a plurality of states, each said state being linked with at least one other state by a respective condition, the arrangement of linked states forming a plurality of paths; and compare the token types of the sequence of tokens to respective conditions in the tree to match the sequence of tokens to one or more paths in the tree, wherein the predetermined data can be detected without using an epsilon reduction to take account of said at least one optional token type.12-10-2009
20090306964DATA DETECTION - A method of processing a sequence of characters, the method comprising converting the sequence of characters into a sequence of tokens so that each token comprises a lexeme and one of a plurality of token types. Each of the plurality of token types relates to at least one of a plurality of predetermined functions, wherein at least one said token type relates to multiple functions of the plurality of predetermined functions.12-10-2009
20090306963Representation of objects and relationships in databases, directories, web services, and applications as sentences as a method to represent context in structured data - Systems and methods are disclosed for tagging and translating database objects and relationships into sentences. The successive composition of these sentences forms hierarchies which encode contextual information about the objects. A virtual directory/context server functions using a common abstraction layer to access data from databases, applications, directories, Web Services, and other data sources within the enterprise. The virtual directory/context server includes a sentence/context builder module that enables the translation of relationships between data from the plurality of data sources into a human-readable form, for example, an English language sentence. Thus, applications can view, access, and/or modify the data from the data sources of the enterprise through the virtual directory/context server, for example, using the sentences representative of the relationships between the data. The sentences are indexed, which allows for searches that bring information not only about objects, but also about the context in which those objects appear.12-10-2009
20090306962SYSTEM AND METHOD TO PROVIDE WARNINGS ASSOCIATED WITH NATURAL LANGUAGE SEARCHES TO DETERMINE INTENDED ACTIONS AND ACCIDENTAL OMISSIONS - A method for providing notification of content potentially omitted from within an active document in a document preparation application comprises defining a natural language model for a set of phrasal forms associating each phrasal form with a content type; parsing a textual content of the active document to generate one or more natural language tokens; accessing the natural language model to identify each of the one or more natural language tokens that matches with a phrasal form; generating a list of expected content items having an expected content item for each of the one or more natural language tokens that matches with a phrasal form; scanning the active document to attempt to locate each expected content item; and displaying a notification of each expected content item not located. Each expected content item is generated based upon the content type associated with the corresponding matching phrasal form in the natural language model.12-10-2009
20090306967Automatic Sentiment Analysis of Surveys - In one aspect, the invention provides apparatuses and methods for determining the sentiment expressed in answers to survey questions. Advantageously, the sentiment may be automatically determined using natural language processing. In another aspect, the invention provides apparatuses and methods for analyzing the sentiment of survey respondents and presenting the information as actionable data.12-10-2009
20090112576DISAMBIGUATED TEXT MESSAGE RETYPE FUNCTION - A method of editing delimited ambiguous input on a handheld electronic device, the handheld electronic device including an input apparatus, an output apparatus, and a memory having a plurality of objects stored therein, the plurality of objects including a plurality of language objects and a plurality of frequency objects having a frequency value, the input apparatus including a plurality of input members, at least one of the input members having a plurality of linguistic elements assigned thereto. The method comprises detecting a selection of a language object generated from a first delimited ambiguous input, outputting a plurality of language objects which are complete word solutions of said first delimited ambiguous input, as well as an edit option, and detecting a selection of the edit option.04-30-2009
20090094021Determining A Document Specificity - In one embodiment, determining a document specificity includes accessing a record that records the clusters of documents. The number of themes of a document is determined from the number of clusters of the document. The specificity of the document is determined from the number of themes.04-09-2009
20090094019Efficiently Representing Word Sense Probabilities - Word sense probabilities are compressed for storage in a semantic index. Each word sense for a word is mapped to one of a number of “buckets” by assigning a bucket score to the word sense. A scoring function is utilized to assign the bucket scores that maximizes the entropy of the assigned bucket scores. Once the bucket scores have been assigned to the word senses, the bucket scores are stored in the semantic index. The bucket scores stored in the semantic index may be utilized to prune one or more of the word senses prior to construction of the semantic index. The bucket scores may also be utilized to prune and rank the word senses at the time a query is performed using the semantic index.04-09-2009
20130073280DYNAMIC SENTENCE FORMATION FROM STRUCTURED OBJECTS AND ACTIONS IN A SOCIAL NETWORKING SYSTEM - A social networking system includes a mechanism for integrating user actions on objects outside of the social networking system in the social graph. External system operators include widgets that, when executed by user devices, record user interactions that correspond to a defined structure of actions and objects. Third party operators utilize a tool provided by the social networking system to define the structure of actions and objects, verb tenses of action types, and noun forms of object types. External actions are recorded by the social networking system for publishing to the social graph in dynamically generated sentences formed using the structure of the actions and objects.03-21-2013
20130073279METHODS AND SYSTEMS FOR COMPILING COMMUNICATION FRAGMENTS AND CREATING EFFECTIVE COMMUNICATION - A method compiles communication fragments whereby the user is required to choose among variables to create a communication. The communication, which can be immediately displayed to the user for approval, may be manipulated in real time by selecting amongst a collection of variables to arrive at the final communication. An exemplary system gathers personal information and profile information from users of the communication service. This information is used when compiling a communication to allow for more customized and personalized communications. Information is used by an empirical database to gain intelligence on which communication has the highest likelihood of satisfying a receiver for a particular communication type. These communications, with the highest likelihood of success, are recommended by the communication service to potential senders of communication.03-21-2013
20130073278METHODS AND SYSTEMS FOR COMPILING COMMUNICATION FRAGMENTS AND CREATING EFFECTIVE COMMUNICATION - Methods and systems for forming a communication. At least one variable identifying an area of communication is received. Then, information associated with one or more users is received. A plurality of variables, each variable relevant to forming a communication within a communication category are received. A communication structure is generated based on at least two of the received variable identifying the area of communication, the received user information or the received plurality of variables. Communication fragments are identified based on the generated communication structure. Communication fragments are selected from those identified. A communication is formed based upon the selected communication fragments. Then, the formed communication is outputted.03-21-2013
20130073277METHODS AND SYSTEMS FOR COMPILING COMMUNICATION FRAGMENTS AND CREATING EFFECTIVE COMMUNICATION - Methods and systems compiles communication fragments based on user chosen variables to create a communication. The communication, can be immediately displayed to the user for approval, may be manipulated real-time by selecting amongst a collection of variables to arrive at the final communication. An exemplary system gathers personal information and profile information from users of the communication service. This information is used when compiling a communication to allow for more customized and personalized communications. Information is used by an empirical database to gain intelligence on which communication has the highest likelihood of satisfying a receiver for a particular communication type. These communications, with the highest likelihood of success, are recommended by the communication service to potential senders of communication.03-21-2013
20110015921SYSTEM AND METHOD FOR USING LINGUAL HIERARCHY, CONNOTATION AND WEIGHT OF AUTHORITY - An authoring environment comprising a linguistic construction tool and method to allow qualitative search and representation of results that may use one or any combination of lingual hierarchy, connotation and weight of authority for constructing a multidimensional conceptual model applicable to one or more documents. The linguistic construction tool and method may be used to augment the authoring process and the resulting documents. The linguistic construction tool may also be used to perform search related activities.01-20-2011
20130060560SERVER-BASED SPELL CHECKING - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for server-based spell check. One aspect of the subject matter described in this specification can be embodied in methods performed by a server. The methods include the actions of receiving a request to spell check text; dividing the text into multiple segments, each segment including no more than a predetermined number of terms; providing each segment to a spell checker programmed to spell check an input including no more than the predetermined number of terms; receiving, from the spell checker, one or more spelling correction suggestions, each spelling correction suggestion corresponding to a term in a segment, the term being designated as misspelled by the spell checker; and assembling the received one or more spelling correction suggestions into a response to the request to spell check the text.03-07-2013
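The segment-and-assemble flow in the entry above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the lookup-table checker `toy_spell_checker`, its `CORRECTIONS` table, and the function names are all hypothetical stand-ins for a real spell checker that accepts at most a fixed number of terms per call.

```python
# Hypothetical correction table standing in for a real spell checker.
CORRECTIONS = {"teh": "the", "jumpd": "jumped"}


def split_into_segments(text, max_terms):
    """Divide the text into segments of at most max_terms terms each."""
    terms = text.split()
    return [terms[i:i + max_terms] for i in range(0, len(terms), max_terms)]


def toy_spell_checker(segment, max_terms):
    """Return (position in segment, suggestion) for each misspelled term."""
    if len(segment) > max_terms:
        raise ValueError("segment exceeds the checker's term limit")
    return [(i, CORRECTIONS[t]) for i, t in enumerate(segment) if t in CORRECTIONS]


def spell_check_request(text, max_terms=3):
    """Check every segment and assemble the suggestions into one response."""
    response = []
    offset = 0  # index of the segment's first term within the whole text
    for segment in split_into_segments(text, max_terms):
        for position, suggestion in toy_spell_checker(segment, max_terms):
            response.append((offset + position, segment[position], suggestion))
        offset += len(segment)
    return response


print(spell_check_request("teh quick brown fox jumpd over"))
```

Each tuple in the response carries the term's index in the original text, so the caller can map suggestions back without knowing how the text was segmented.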
20130060563HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING IMPROVED DISAMBIGUATION - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message.03-07-2013
20130060562INFORMATION PROCESSING APPRATUS, NATURAL LANGUAGE ANALYSIS METHOD, PROGRAM AND RECORDING MEDIUM - An apparatus and method for calculating a score of matching a sentence with a query pattern having a dependency structure. The apparatus includes: an input unit acquiring an analysis target sentence, a query pattern and an index value indexing how a linguistic unit in the sentence tends to modify another; and a score calculation unit calculating a matching score indexing the degree of matching of the sentence with the query pattern. The matching score is represented by a function having an index value with which a dependency relation included in the query pattern is associated. The score is calculated by attempting association between a substructure of the query pattern and a range in the sentence and by performing recursive calculation in the substructure and the range while storing partial calculation result of the function in a memory area for reuse.03-07-2013
20130060561Encoding and Decoding of Small Amounts of Text - Text is encoded using a predetermined dictionary not unique to the encoded text to substitute codes for words and phrases thereby obviating transmission of the dictionary along with transmitted encoded text. The codes of the dictionary are made of one or more text characters such that the message, once encoded, continues to be a legitimate text message and can travel through any data transport medium through which a conventional text message can travel. Non-word characters delimit codes and unencoded words in an encoded message. Any phrase that can be confused with a code is flagged to indicate that it is not a code.03-07-2013
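A minimal sketch of the shared-dictionary idea in the entry above, restricted to single words for brevity. The code table `CODES`, the `^` flag character, and the function names are assumptions made for this illustration. The entry itself says only that codes are made of text characters, that non-word characters delimit them, and that anything confusable with a code is flagged.

```python
import re

# Hypothetical shared dictionary known in advance to sender and receiver.
CODES = {"tomorrow": "tm", "meeting": "mt", "please": "pl"}
DECODE = {code: word for word, code in CODES.items()}
FLAG = "^"  # assumed marker meaning "the next word is literal, not a code"


def encode(message):
    """Substitute codes for dictionary words; flag literals that look like codes."""
    def substitute(match):
        word = match.group(0)
        if word in CODES:
            return CODES[word]
        if word in DECODE:
            return FLAG + word
        return word
    return re.sub(r"[A-Za-z]+", substitute, message)


def decode(message):
    """Expand codes back to words; drop the flag from flagged literals."""
    def substitute(match):
        flagged, word = match.group(1), match.group(2)
        if word in DECODE:
            return word if flagged else DECODE[word]
        return match.group(0)
    return re.sub(r"(\^?)([A-Za-z]+)", substitute, message)


original = "please confirm meeting tomorrow, tm"
encoded = encode(original)
print(encoded)
print(decode(encoded))
```

Because the dictionary is fixed ahead of time, only the encoded text travels, and the result is still an ordinary text message.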
20120310633FILTERING DEVICE AND FILTERING METHOD - A filtering device includes: a table storage unit that stores an allowed word table in which a plurality of morphemes and the number of appearances thereof are associated with each other; a program stream acquiring unit that acquires a program stream generated according to a broadcasting code of ethics; a table update unit that extracts caption data or program information, which is a first text data item related to the content of a program, from the program stream when the acquired program stream includes the caption data or the program information, divides the extracted caption data; a data acquiring unit that acquires an arbitrary second text data item; and a data processing unit that divides the second text data item into morphemes, replaces a divided morpheme with a predetermined symbol when the divided morpheme has not been registered in the allowed word table.12-06-2012
20120310632COMPUTER SYSTEM WITH SECOND TRANSLAATOR FOR VEHICLE PARTS - Described are computer-based methods and apparatuses, including computer program products, for automation of auditing claims. Data indicative of an insurance company name is received, the data comprising one or more words. The data is processed through one or more processing steps to generate processed data comprising one or more processed words. One or more candidate word strings are selected based on the one or more processed words. Matching information is associated with each of the one or more candidate word strings. Analysis information is generated for each of the one or more candidate word strings based on the associated matching information. An insurance company identifier is associated with received data based on the analysis information and one or more matching rules.12-06-2012
20120310629SYSTEMS AND METHODS FOR AUTOMATICALLY DETERMINING CULTURE-BASED BEHAVIOR IN CUSTOMER SERVICE INTERACTIONS - Systems and methods are provided to automatically determine culture-based behavioral tendencies and preferences of individuals in the context of customer service interactions. For example, systems and methods are provided to process natural language dialog input of an individual to detect linguistic features indicative of individualistic and collectivistic behavioral tendencies and predict whether such individual will be cooperative or uncooperative with automated customer service.12-06-2012
20090271179METHOD AND SYSTEM FOR EXTENDING KEYWORD SEARCHING TO SYNTACTICALLY AND SEMANTICALLY ANNOTATED DATA - Methods and systems for extending keyword searching techniques to syntactically and semantically annotated data are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set as an enhanced document index with document terms as well as information pertaining to the grammatical roles of the terms and ontological and other semantic information. In one embodiment, the enhanced document index is a form of term-clause index, that indexes terms and syntactic and semantic annotations at the clause level. The enhanced document index permits the use of a traditional keyword search engine to process relationship queries as well as to process standard document level keyword searches. In one embodiment, the SQE comprises a Query Processor, a Data Set Preprocessor, a Keyword Search Engine, a Data Set Indexer, an Enhanced Natural Language Parser (“ENLP”), a data set repository, and, in some embodiments, a user interface or an application programming interface.10-29-2009
20120226493System and Methods for Using Short-Hand Interpretation Dictionaries in Collaboration Environments - A method for creating and using a short-hand interpretation dictionary in a collaboration environment includes creating or editing a document in a collaboration environment, said document comprising at least one short-hand notation; and replacing the at least one short-hand notation with an interpretation from at least one short-hand dictionary.09-06-2012
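The replacement step in the entry above can be sketched as a whole-word substitution against a short-hand dictionary. The dictionary contents and the function name `expand_shorthand` are hypothetical examples, not details from the application.

```python
import re

# Hypothetical short-hand dictionary for a collaboration environment.
SHORTHAND = {"EOD": "end of day", "PTAL": "please take a look"}


def expand_shorthand(text, dictionary=SHORTHAND):
    """Replace each short-hand notation with its interpretation."""
    # Longest keys first, so a longer notation wins over a prefix of it.
    keys = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda match: dictionary[match.group(1)], text)


print(expand_shorthand("PTAL by EOD."))
```

The word-boundary anchors keep the substitution from firing inside longer words such as "EODs".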
20120226492INFORMATION PROCESSING APPARATUS, NATURAL LANGUAGE ANALYSIS METHOD, PROGRAM AND RECORDING MEDIUM - An apparatus and method for calculating a score of matching a sentence with a query pattern having a dependency structure. The apparatus includes: an input unit acquiring an analysis target sentence, a query pattern and an index value indexing how a linguistic unit in the sentence tends to modify another; and a score calculation unit calculating a matching score indexing the degree of matching of the sentence with the query pattern. The matching score is represented by a function having an index value with which a dependency relation included in the query pattern is associated. The score is calculated by attempting association between a substructure of the query pattern and a range in the sentence and by performing recursive calculation in the substructure and the range while storing partial calculation result of the function in a memory area for reuse.09-06-2012
20100082331SEMANTICALLY-DRIVEN EXTRACTION OF RELATIONS BETWEEN NAMED ENTITIES - A system and method of developing rules for text processing enable retrieval of instances of named entities in a predetermined semantic relation (such as the DATE and PLACE of an EVENT) by extracting patterns from text strings in which attested examples of named entities satisfying the semantic relation occur. The patterns are generalized to form rules which can be added to the existing rules of a syntactic parser and subsequently applied to text to find candidate instances of other named entities in the predetermined semantic relation.04-01-2010
20130166284System and Method of Spoken Language Understanding in Human Computer Dialogs - A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify.06-27-2013
20130066625TEXT SEGMENTATION AND LABEL ASSIGNMENT WITH USER INTERACTION BY MEANS OF TOPIC SPECIFIC LANGUAGE MODELS AND TOPIC-SPECIFIC LABEL STATISTICS - The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user, who can select alternative segmentations and alternative labels as well as enter a user-defined segmentation and a user-defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated, incorporating the re-segmentation and re-labeling of successive parts of the document or the entire document.03-14-2013
20090265160COMPARING TEXT BASED DOCUMENTS - Text based documents are compared by lexically normalising each word of the text of a first document (10-22-2009
20090089044INTENT MANAGEMENT TOOL - Linguistic analysis is used to identify queries that use different natural language formations to request similar information. Common intent categories are identified for the queries requesting similar information. Intent responses can then be provided that are associated with the identified intent categories. An intent management tool can be used for identifying new intent categories, identifying obsolete intent categories, or refining existing intent categories.04-02-2009
20090234641METHOD AND SYSTEM FOR ASSISTING THE PROTECTION OF TRADE MARKS - Method for assisting the protection of trade marks comprising the following steps: collecting data comprising at least one natural language term relating to a field of activity, these data being indicated by a user; determining, in an automated manner, on the basis of the natural language term or terms indicated by the user, a suggestion comprising goods and/or services and their respective classes according to an administrative classification, and transmitting the suggestion to the user; receiving data indicative of a selection of good(s) and/or service(s) and/or class(es) chosen by the user from the automated suggestion; and compiling and/or storing this selection. The method can furthermore comprise the automated searching for priorities, the preparing of documents (paper or electronic) necessary for filing a trade mark application, and the tracking of the registration procedure.09-17-2009
20090234640Method and an apparatus for automatic semantic annotation of a process model - An apparatus and a method for automated semantic annotation of a process model having model elements named by natural language expressions, wherein said apparatus comprises at least one semantic pattern analyser which analyses the textual structure of each natural language expression on the basis of predefined semantic pattern descriptions to establish a semantic linkage between each model element to classes and instances of a reference process ontology for generating a semantically annotated process model.09-17-2009
20090234639Human-Like Response Emulator - Human-like response emulator stores a library (09-17-2009
20120232886COMPUTER NETWORK, COMPUTER-IMPLEMENTED METHOD, COMPUTER PROGRAM PRODUCT, CLIENT, AND SERVER FOR NATURAL LANGUAGE-BASED CONTROL OF A DIGITAL NETWORK - The present application relates to a computer network, a computer-implemented method, a computer program product, a client, and a server for natural language-based control of a digital network. In one aspect, the computer network for natural language-based control of a digital network may comprise: a digital network operable to provide sharing of access to a network between a plurality of devices connected in the digital network; a client installed in the digital network and operable to provide a unified natural language interface to a user to control the digital network using natural language; a server connected to the client over the network and operable to process a user request of the user performed through the unified natural language interface; and one or more software agents operable to execute at least one action on at least one of the plurality of devices based on the processed user request.09-13-2012
20090012777TOKEN STREAM DIFFERENCING WITH MOVED-BLOCK DETECTION - Methods and apparatus implementing systems and techniques for differencing token streams and detecting moved blocks of tokens. In general, in one implementation, the technique includes: obtaining a first token stream and a second token stream, comparing the first and second token streams to identify a group of tokens that are substantially similar in the first and second token streams, the similar-tokens group including common sub-sequences, which are identical in the first and second token streams, and at least one unmatched token, and presenting matched token information corresponding to the similar-tokens group to represent changes in document flow.01-08-2009
20090012778APPARATUS AND METHOD FOR EXPANDING NATURAL LANGUAGE QUERY REQUIREMENT - The present invention provides an apparatus for expanding a query requirement, comprising: a query requirement understanding device which generates an explicit query requirement according to a user query request; and a query requirement expanding device which generates an implicit query requirement associated with the explicit query requirement. The query requirement understanding device generates an explicit query requirement including a query concept and a question type by searching a knowledge base and a language base, and the query requirement expanding device generates an implicit query requirement including a query concept and a question type by searching the knowledge base, the language base and a relevancy database. The present invention further provides a method for expanding a query requirement. The apparatus and method for expanding a query requirement according to the present invention can facilitate a user's query and provide the user with an accurate, comprehensive query answer.01-08-2009
20090306968SYSTEM AND METHOD OF GRANTING IDENTIFICATION CODES TO ELECTRONIC TEACHING MATERIAL CONTENTS' SENTENCE STRUCTURES, SYSTEM AND METHOD OF SEARCHING DATA OF ELECTRONIC TEACHING MATERIAL CONTENTS, SYSTEM AND METHOD OF MANAGING POINTS OF USE AND SERVICE OF ELECTRONIC TEACHING MATERIAL CONTENTS - Disclosed is a system that grants identification codes to sentence structures of electronic teaching material contents and includes the following units. The identification code production unit distinguishes each syllable of a selected sentence structure of the electronic teaching material content according to the type of language, and produces a unique identification code using the first phoneme or syllable of each syllable. The identification code grant unit grants the identification code to the metadata of the file that stores the electronic teaching material contents.12-10-2009
20130166280Concept Search and Semantic Annotation for Mobile Messaging - A textual message processing system and method are described for use in a mobile environment. A user messaging application processes at least one user textual message during a user messaging session. A semantic annotation module identifies one or more semantically salient terms in the user textual message, and annotates the user textual message with annotation terms having a low semantic distance to the semantically salient terms. A user message history stores the annotated textual messages. The semantic annotation module may further annotate the user textual message with situational meta-data characterizing the user textual message. There may be a message search module for using one or more keywords to search the user message history including the annotation terms, and identifying as a search match any annotated textual messages within a semantic distance threshold of the one or more keywords.06-27-2013
20130166281OPTIMALLY SORTING A LIST OF ELEMENTS BASED ON THEIR SINGLETONS - A method provides a non-optimized list of elements, with some of the elements having multiple terms. A table of sub-elements is generated from the elements list, with each sub-element having one term only and with a number of times a sub-element appears in the elements list being weighted in the sub-elements table. A weighted singleton histogram table is generated using a singleton dictionary, and a total popularity score of each singleton is computed from the sub-elements table. For each element from the elements list, an elements score is generated based on the total popularity score of each singleton within the element. An optimally sorted list of the elements list is generated based on the elements scores.06-27-2013
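The scoring pipeline in the entry above can be sketched in a few lines. This is an illustration under stated assumptions: a singleton's popularity is taken to be the number of times it appears across the elements list, and an element's score is taken to be the sum of its singletons' popularity. The entry says only that the element score is "based on" those totals, so the summation and the function name `sort_by_singletons` are hypothetical.

```python
from collections import Counter


def sort_by_singletons(elements):
    """Sort elements by the summed popularity of their single-term parts."""
    # Sub-elements table: every single term, weighted by how often it appears.
    popularity = Counter(term for element in elements for term in element.split())

    def element_score(element):
        return sum(popularity[term] for term in element.split())

    # sorted() is stable, so equally scored elements keep their input order.
    return sorted(elements, key=element_score, reverse=True)


elements = ["red shoes", "red", "blue shoes", "green hat"]
print(sort_by_singletons(elements))
```

Here "red" and "shoes" each appear twice, so "red shoes" scores 4 and sorts first, followed by "blue shoes" with 3.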
20130166282METHOD AND APPARATUS FOR RATING DOCUMENTS AND AUTHORS - Methods and apparatus for determining a competence rating of an author relating to one or more topics are disclosed. An exemplary method comprises determining semantic information associated with one or more documents related to the one or more topics, determining amplification information associated with the one or more documents, determining occurrence information associated with the author, and determining a competence rating for the author based at least in part on the semantic information associated with the one or more documents, the amplification information associated with the one or more documents, and the occurrence information associated with the author. A document rating for at least one of the one or more documents may also be determined based at least in part on the one or more weighted semantic features and the amplification information.06-27-2013
20130166283METHOD AND APPARATUS FOR GENERATING PHONEME RULE - A phoneme rule generating apparatus includes a spectrum analyzer configured to analyze pronunciation patterns of voices included in a plurality of voice data, a clusterer configured to cluster the plurality of voice data based on the analyzed pronunciation patterns, a voice group generator configured to generate voice groups from the clustered voice data, a phoneme rule generator configured to generate a phoneme rule corresponding to each respective voice group from among the generated voice groups, and a group mapping DB configured to store the generated voice groups and the generated phoneme rules for accurate voice recognition.06-27-2013
20080294427Method and apparatus for performing a semantically informed merge operation - A method and apparatus for performing an informed semantic merge operation comprises selecting a source region in a document and a target region in the same or a different document. A bi-directionally coupled surface region is identified in the source region and a bi-directionally coupled surface region is identified in the target region. A first semantic object coupled to the surface region in the source region is identified and a second semantic object coupled to the surface region in the target region is identified. The subcomponents of the first semantic object are combined with the subcomponents of the second semantic object by merging.11-27-2008
20080294426Method and apparatus for anchoring expressions based on an ontological model of semantic information - A method and apparatus for the recording and maintenance of semantic elements in electronically-held information objects provide for grounding semantic objects in an ontology, such that inheritance and other relations between concepts are preserved in persistent storage. The disclosed method and apparatus provide semantic document authors with a means to anchor concept references to specific, persistent, semantic objects, thereby providing the system with access to all properties of the underlying data model of the semantic objects being referenced, while also specifying the type and scope of their relations, as well as behavioral aspects of the visual and editing environment.11-27-2008
20080294425Method and apparatus for performing semantic update and replace operations - A method of changing semantic information comprises changing a first bi-directional coupling between a surface region in a document and a first semantic object to a second bi-directional coupling between the surface region and a second semantic object. More particularly, the method may be comprised of identifying an occurrence of a surface region in a document, the surface region having a first link for coupling the surface region to a first semantic object, and the first semantic object having a first association for coupling the first semantic object with the surface region. The first link is replaced with a second link for coupling the surface region to a second semantic object. The first association is changed to a second association for coupling the second semantic object with the surface region. Another method for changing semantic information comprises selecting a semantic object stored in a data repository and changing the selected semantic object. A scope is then selected, either manually or automatically. A set of semantically anchored expressions associated with the semantic object is identified in response to the scope. A determination is made as to whether the semantically anchored expressions are consistent with the changed semantic object and, if not, the semantically anchored expressions are updated so as to be consistent with the changed semantic object.11-27-2008
20120271628METHOD OF USING VISUAL SEPARATORS TO INDICATE ADDITIONAL CHARACTER COMBINATIONS ON A HANDHELD ELECTRONIC DEVICE AND ASSOCIATED APPARATUS - A method and associated apparatus for using visual separators to indicate additional character combination choices from a disambiguation function on a handheld electronic device.10-25-2012
20080208569SYSTEM FOR IDENTIFYING WORD PATTERNS IN TEXT - A system for identifying word patterns in text operates in real time and is highly suitable for network and Internet use. The system comprises a semantic network that may be compiled on a local computer or at a remote host and a software text analysis module for receiving the text to be analyzed, parsing the text, submitting the text to the semantic network, and receiving the results. Recognized words are then examined, together with surrounding words in the text, to determine whether the words are part of a word pattern. Word patterns are located at nodes in the semantic network in a hierarchical structure, and certain word patterns correspond to objects of the semantic network. When all word patterns involving a word are located, links are followed to objects corresponding to the word patterns. Several nodes may point to a single object, but each object is represented only once in the semantic network. Identified objects may thus be identified in real time, as the text streams through the text analysis module.08-28-2008
20110035208System and Method for Extracting Radiological Information Utilizing Radiological Domain Report Ontology and Natural Language Processing - A system and method that employs radiological report domain ontology and natural language processing to specify and model historical radiological information as knowledge is provided. The system and method trains a statistical probability based natural language processing system to recognize the semantics of a radiological domain. A methodology is provided to submit portions or the entire content of textual historical radiological report to a natural language processor wherein such data is interpreted and reported in a structured hierarchy.02-10-2011
20120101810SYSTEM AND METHOD FOR PROVIDING A NATURAL LANGUAGE VOICE USER INTERFACE IN AN INTEGRATED VOICE NAVIGATION SERVICES ENVIRONMENT - A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment.04-26-2012
20120010872Method and System for Semantic Searching - In one embodiment, there is provided a computer-implemented method and system for implementing the method. The method comprises: preliminarily analyzing at least one corpus of natural language text comprising, for each sentence of each natural language text of the corpus, performing syntactic analysis using linguistic descriptions to generate at least one syntactic structure for the sentence; building a semantic structure for the sentence; associating each generated syntactic and semantic structure with the sentence; and saving each generated syntactic and semantic structure; for each corpus of natural language text that was preliminarily analyzed, performing an indexing operation to index lexical meanings and values of linguistic parameters of each syntactic structure and each semantic structure associated with sentences in the corpus; and searching in at least one preliminarily analyzed corpus for sentences comprising searched values for the linguistic parameters.01-12-2012
20110282653TEXT PROCESSING APPARATUS, TEXT PROCESSING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - A text processing apparatus is provided with a segment determination unit 11-17-2011
20110282652MAPPING OF RELATIONSHIP ENTITIES BETWEEN ONTOLOGIES - Methods, apparatus and systems, including computer program products, for reducing an error rate when mapping entities between a first ontology and a second ontology. One or more of a general language dictionary and an industry-specific dictionary are provided. Natural language processing of the first ontology is performed to identify one or more candidate relationship entities in the first ontology. Each candidate relationship entity includes a compound name having two or more semantic labels, and each candidate relationship entity has a name that exists in neither the general language dictionary nor the industry-specific dictionary. Each of the one or more candidate relationship entities in the first ontology is mapped to one or more entities in the second ontology using one or more configurable computer-implemented mapping algorithms.11-17-2011
20110282651GENERATING SNIPPETS BASED ON CONTENT FEATURES - Systems, methods, and computer storage media having computer-executable instructions embodied thereon that facilitate generation of snippets. In embodiments, text features within a keyword-sentence window are identified. The text features are utilized to determine break features that indicate favorability of breaking at a particular location of the keyword-sentence window. The break features are used to recognize features of partial snippets such that a snippet score to indicate the strength of the partial snippet can be calculated. Snippet scores associated with partial snippets are compared to select an optimal snippet, that is, the snippet having the highest snippet score.11-17-2011
20130024185Automatic Dynamic Contextual Date Entry Completion - A method performed in a computer device having associated therewith a plurality of unstructured documents having words therein, the method involves accessing at least some of the plurality of unstructured documents, extracting a multiset of words, forming a matrix from the documents in which each word in the multiset is represented in a column and each document from which the words came is represented in a row, treating each document as a vector in a multidimensional Euclidean space, uniquely pairing the unique documents, measuring the similarity between the pairs as a cosine of the angle between vectors, comparing the cosines to a specified threshold to determine relatedness among the documents, and based upon the relatedness, when an input is received by the computer device representing a string of a threshold number of characters, the computer device will provide at least one word that would complete the character string.01-24-2013
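The document-relatedness step described in the entry above (word-count vectors, pairwise cosine, threshold, then prefix completion) could be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the function names, the whitespace tokenizer, the 0.5 threshold, and the 3-character minimum are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_pairs(docs, threshold=0.5):
    """Return index pairs of documents whose cosine meets the threshold.

    The threshold value is a hypothetical choice for illustration.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    return [
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


def complete(prefix, docs, min_chars=3):
    """Suggest words from the given documents that start with the prefix.

    Nothing is suggested until min_chars characters have been typed
    (an assumed stand-in for the 'threshold number of characters').
    """
    if len(prefix) < min_chars:
        return []
    words = {w for d in docs for w in d.lower().split()}
    return sorted(w for w in words if w.startswith(prefix.lower()))
```

In this sketch a caller would first find related documents with `related_pairs` and then pass only those documents to `complete`, so that suggestions come from related material.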
20090157385Inverse Text Normalization - Embodiments are directed to efficient multilingual inverse text normalization (ITN) of text in spoken form to produce normalized text for display. Embodiments are directed to preprocessing the multilingual text into a language-independent representation, tokenizing text in spoken form, segmenting the tokenized text into ITN items by grouping consecutive words using an ITN lexicon, classifying the ITN items into ITN categories by using the ITN lexicon or tagged information from language model, applying one or more ITN rules that are selected based on the ITN categories into which ITN items have been classified to rewrite the ITN items; and post processing the ITN item and outputting inversely normalized text in written form for display. The ITN lexicon may include ITN lexicon entries that are each located within an ITN category in the ITN lexicon.06-18-2009
20090150140EFFICIENT STEMMING OF SEMITIC LANGUAGES - A system for stemming words of Semitic languages, the system including an affix scanner configured to scan a word of a Semitic language for at least one affix according to a predefined scanning sequence and determine if at least one predefined scanning criterion is met, and a stemmer configured to remove the affix from the word if the predefined scanning criterion is met.06-11-2009
20120290293Exploiting Query Click Logs for Domain Detection in Spoken Language Understanding - Domain detection training in a spoken language understanding system may be provided. Log data associated with a search engine, each associated with a search query, may be received. A domain label for each search query may be identified and the domain label and link data may be provided to a training set for a spoken language understanding model.11-15-2012
20120290292UNSTRUCTURED DATA SUPPORT WITH AUTOMATIC RULE GENERATION - A system to process unstructured data is provided. An example system to process unstructured data comprises a receiver to access a source of unstructured data, an entity type module to determine an entity type, a rules generator to automatically generate a linguistic rule based on the determined entity type, and an entity extractor to obtain an entity from the source of unstructured data, using the linguistic rule. The entity comprises an alpha-numeric string.11-15-2012
20110301940FREE TEXT VOICE TRAINING - A system and method provide acoustic training of a voice or speech recognition engine and/or voice or speech recognition software application. Instead of requiring a user to read from a prepared or predetermined script, the system and method described herein enable acoustic training using any free text spoken phrases provided by the user directly, or by a previously recorded speech, presentation, or the like, performed by the user.12-08-2011
20110301943SYSTEM AND METHOD OF DICTATION FOR A SPEECH RECOGNITION COMMAND SYSTEM - In embodiments of the present invention, a system and computer-implemented method for enabling dictation may include parsing standard reports in order to identify a plurality of logical phrases in the report used for discrete sections and descriptions. In the report method, the phrases may be parsed and identifier words throughout the report may be compared to eliminate ambiguities. The method may then involve constructing text macros that follow the parsed text, thereby enabling the user to speak the identifiers to indicate full, formatted text. Finally, the report method may involve constructing a mnemonic document so both beginner and experienced users can easily read the identifiers out loud to produce a report. The result of the method is an intuitive, notes-style way to use speech commands to quickly produce a standard, formatted report.12-08-2011
20110301942Method and Apparatus for Full Natural Language Parsing - The method and apparatus for discriminative natural language parsing use a deep convolutional neural network adapted for text and a structured tag inference in a graph. In the method and apparatus, a trained recursive convolutional graph transformer network, formed by the deep convolutional neural network and the graph, predicts “levels” of a parse tree based on predictions of previous levels.12-08-2011
20110301941NATURAL LANGUAGE PROCESSING METHOD AND SYSTEM - A computer implemented natural language processing method, the method including the steps of: analysing a sentence string within textual information to determine sub-components of the sentence string, assigning one or more unique tokens to each determined sub-component, determining a probability of use that a determined sub-component has one or more specific meanings, based on the determined probability of use, creating a valid set of unique tokens that are associated with the sentence string, and linking verb sub-components associated with one or more of the unique tokens in the valid set of unique tokens to a pre-defined limited sub-set of verbs to create an identification tuple that maps onto the sub-set of verbs.12-08-2011
20120109640METHOD AND SYSTEM FOR ANALYZING AND TRANSLATING VARIOUS LANGUAGES WITH USE OF SEMANTIC HIERARCHY - A method and computer system for analyzing sentences of various languages and constructing a language-independent semantic structure are provided. On the basis of comprehensive knowledge about languages and semantics, exhaustive linguistic descriptions are created, and lexical, morphological, syntactic, and semantic analyses for one or more sentences of a natural or artificial language are performed. A computer system is also provided to implement, analyze and store various linguistic structures and to perform lexical, morphological, syntactic, and semantic analyses. As a result, a generalized data structure, such as a semantic structure, is generated and used to describe the meaning of one or more sentences in language-independent form, applicable to automated abstracting, machine translation, control systems, Internet information retrieval, etc.05-03-2012
20120010874METHOD AND SYSTEM FOR PROVIDING A REPRESENTATIVE PHRASE BASED ON KEYWORD SEARCHES - Provided is a method and system for providing a representative phrase with respect to a real time popular keyword, which may determine programs including a popular keyword from broadcast information, and may generate a representative phrase with respect to the popular keyword using the determined programs, thereby providing the representative phrase by combining the generated representative phrase and the popular keyword.01-12-2012
20120010873SENTENCE TRANSLATION APPARATUS AND METHOD - Disclosed herein are a sentence translation apparatus and method. The sentence translation apparatus includes a voice recognition unit, a morphemic part-of-speech tagging unit, a pause extraction unit, and a sentence separation unit. The voice recognition unit creates a sentence in a first language based on results of recognition of a voice in a first language. The morphemic part-of-speech tagging unit tags morphemic parts of speech from the sentence in the first language. The pause extraction unit extracts pause information from the voice in the first language. The sentence separation unit separates the sentence in the first language based on information about the morphemic parts of speech tagged by the morphemic part-of-speech tagging unit and the pause information extracted by the pause extraction unit.01-12-2012
20110288856HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION AND SELECTIVE DISABLING OF FREQUENCY LEARNING - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user is likely to have intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The learning function is disabled, however, when the relevant words are found to be in a special category for which frequency learning, i.e., frequency revision, is not employed.11-24-2011
20110288854HUMAN READABLE SENTENCES TO REPRESENT COMPLEX COLOR CHANGES - Methods and a system for a natural language control interface are provided to enable a user to modify various properties of a document. The modifications comprise building sentences from modification words, and combining them together in one display. The modifications are displayed in real time for a user to observe as they are inputted. The order of the modifications is managed by the user and is configured to be changed, added and/or removed.11-24-2011
20110295593AUTOMATED MESSAGE ATTACHMENT LABELING USING FEATURE SELECTION IN MESSAGE CONTENT - Embodiments are directed towards an automated machine learning framework to extract keywords within a message that are relevant to an attachment to the message. The machine learning model finds a set of relevant sentences within the message determined to be relevant to the one or more attachments based on identification of one or more sentence level features within a given sentence. The sentence level features include, for example, anchor features, noisy sentence features, short message features, threading features, anaphora detections, and lexicon features. From the set of relevant sentences, useful keywords may be extracted using a sequence of heuristics to convert the sentence set into the set of useful keywords. The set of useful keywords may then be associated to at least one attachment such that the keywords may subsequently be used to perform various indexing, searching, sorting, and to provide further context to the attachment.12-01-2011
20120016664LANGUAGE ANALYSIS APPARATUS, LANGUAGE ANALYSIS METHOD, AND LANGUAGE ANALYSIS PROGRAM - A language analysis apparatus of the invention includes division rules, each of which is classified into one of levels according to the degree of risk of causing analysis accuracy problems when applied; a division point candidate generation unit 01-19-2012
20110295595DOCUMENT PROCESSING, TEMPLATE GENERATION AND CONCEPT LIBRARY GENERATION METHOD AND APPARATUS - The present invention relates to a document processing method and apparatus which can edit a natural language and generate a machine-processable document; a template generating method and apparatus which can be used for the document processing method and apparatus; and a concept library generating method and apparatus which can be used for the document processing method and apparatus and the template generating method and apparatus. The present invention provides a possibility for semantic interaction of documents in different systems and enhances efficiency.12-01-2011
20110295592Survey Analysis and Categorization Assisted by a Knowledgebase - The disclosure generally relates to knowledge retrieval using a knowledgebase storing general and/or expert knowledge. In particular, the disclosure relates to using an enhanced knowledgebase to implement a tool for analysis and categorization of surveys.12-01-2011
20090048823SYSTEM AND METHODS FOR OPINION MINING - A system that incorporates teachings of the present disclosure may include, for example, a system having a controller to identify from commentaries of an object or service one or more context-dependent opinions associated with one or more features of the object or the service, and synthesize a semantic orientation for each of one or more context-dependent opinions of the one or more features. Additional embodiments are disclosed.02-19-2009
20130218553INFORMATION NOTIFICATION SUPPORTING DEVICE, INFORMATION NOTIFICATION SUPPORTING METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, an information notification supporting device includes an analyzer configured to analyze an input voice so as to identify voice information indicating information related to speech; a storage unit configured to store therein a history of the voice information; an output controller configured to determine, using the history of the voice information, whether a user is able to listen to a message of which the user should be notified; and an output unit configured to output the message when it is determined that the user is in a state in which the user is able to listen to the message.08-22-2013
20130218554Multi-Concept Latent Semantic Analysis Queries - A method includes accessing text, identifying a plurality of terms from the text, determining a plurality of term vectors associated with the identified plurality of terms, and clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors. The method further includes creating a first pseudo-document according to the first cluster, creating a second pseudo-document according to the second cluster, identifying a first set of terms associated with the first cluster using latent semantic analysis (LSA) of the first pseudo-document, identifying a second set of terms associated with the second cluster using LSA of the second pseudo-document, and combining the first and second sets of terms into a list of output terms.08-22-2013
20130218555DEVICE FOR ANALYZING TEXT DOCUMENTS - An analysis device for analyzing a text document is provided. The analysis device includes a context storage unit configured to store context information that shows a position of a character set of a predetermined context in the text document. The analysis device also includes an index storage unit configured to store index information that shows a position of a word in the text document, for each word of a plurality of words contained in the text document. An input unit is configured to input a target word. A position detection unit is configured to detect from the index information a position of the target word contained in the text document. A frequency detection unit is configured to detect an appearance frequency of the target word per each type of context in the text document based on the position of the target word and on the context information.08-22-2013
20100004923Method and an apparatus for clustering process models - The invention relates to an apparatus for clustering process models each consisting of model elements comprising a text phrase which describes in a natural language a process activity according to a process modeling language grammar and a natural language grammar, wherein said apparatus comprises a process object ontology memory for storing a process object ontology, a distance calculation unit for calculating a distance matrix employing said processing modeling language grammar and said natural language grammar, wherein said distance matrix consists of distances each indicating a dissimilarity of a pair of said process models, and a clustering unit which partitions said process models into a set of clusters based on said calculated distance matrix.01-07-2010
20100036654Systems and methods for identifying collocation errors in text - Systems and methods for detecting collocation errors in a text sample using a reference database from a corpus are provided. Collocation candidates are identified within the text sample based upon syntactic patterns in the text sample. Whether a given collocation candidate contains a collocation error is detected, the detecting including: determining a first association measure using the reference database for the given collocation candidate; determining whether the first association measure satisfies a predetermined condition and identifying the given collocation candidate as proper if the first association measure satisfies the predetermined condition; determining an additional association measure for a variation of the given collocation candidate using the reference database; and determining whether or not the collocation candidate contains an error based upon the additional association measure of the variation.02-11-2010
20100030552DERIVING ONTOLOGY BASED ON LINGUISTICS AND COMMUNITY TAG CLOUDS - In some embodiments, a method comprises receiving a tag cloud including tags that hyperlink to web content. The method can also comprise separating the tags into different linguistic categories, assigning a weight to each tag, and grouping the tags into clusters, wherein tags in a cluster are associated with a context. The method can also include determining one or more domains for the tag clusters, wherein a domain is a broadest class that defines one or more of the tags in a linguistic category, determining a hierarchy for the tags based on the weights of the tags, and identifying linguistic relationships between the tags. The method can also comprise determining properties associated with one or more of the tags and one or more of the domains, wherein the tag's properties are determined using linguistic analysis and storing the tags, the hierarchies, the linguistic relationships, and the properties.02-04-2010
20090248398Vocal Alert Unit Having Automatic Situation Awareness - A system and method for instructing dynamic nodes in a dynamically changing mobile network how to maneuver. A receiver receives situation data indicative of a respective situation of each dynamic node in space and a situation unit coupled to the receiver determines the respective situation of each dynamic node. An analysis unit coupled to the situation unit analyzes the respective situation of each dynamic node in combination with specified criteria to generate respective situation awareness data for each dynamic node. A dynamic selector unit coupled to the analysis unit determines from the respective situation awareness data appropriate action to be performed by each node; and a communication unit coupled to the dynamic selector unit conveys to the respective dynamic node command data to permit rendering of a personalized command for informing the respective node of appropriate action to be carried out thereby.10-01-2009
20100114560SYSTEMS AND METHODS FOR EVALUATING A SEQUENCE OF CHARACTERS - A sequence of characters may be evaluated to determine the presence of a natural language word. The sequence of characters may be analyzed to find a subsequence of alphabetical characters. Based on a statistical model of a natural language, a probability that the subsequence is a natural language word may be calculated. The probability may then be used to determine if the subsequence is indeed a natural language word.05-06-2010
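As a rough illustration of the entry above, the sketch below scores alphabetical subsequences against a character-bigram model. The bigram model, the add-one smoothing, the `alphabet_size` of 28, and all function names are assumptions chosen for illustration; the abstract does not say which statistical model is used.

```python
import math
import re
from collections import Counter


def train_bigrams(words):
    """Count character bigrams over a word list, with ^ and $ as boundaries."""
    pairs, firsts = Counter(), Counter()
    for w in words:
        padded = f"^{w.lower()}$"
        for a, b in zip(padded, padded[1:]):
            pairs[(a, b)] += 1
            firsts[a] += 1
    return pairs, firsts


def word_score(seq, model, alphabet_size=28):
    """Geometric-mean transition probability of seq under the bigram model.

    Uses add-one smoothing so unseen bigrams get a small nonzero probability.
    Higher scores suggest the sequence looks more like a word in the
    training vocabulary.
    """
    pairs, firsts = model
    padded = f"^{seq.lower()}$"
    logp = 0.0
    for a, b in zip(padded, padded[1:]):
        logp += math.log((pairs[(a, b)] + 1) / (firsts[a] + alphabet_size))
    return math.exp(logp / (len(padded) - 1))


def alphabetic_runs(text):
    """Extract maximal runs of ASCII letters from an arbitrary character sequence."""
    return re.findall(r"[A-Za-z]+", text)
```

A caller would compare `word_score` against a cutoff tuned on held-out data to decide whether a run is a natural language word; the cutoff itself is left open here.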
20100114561LATENT METONYMICAL ANALYSIS AND INDEXING (LMAI) - The present invention relates to Latent Metonymical Analysis and Indexing (LMai), a novel concept for Advance Machine Learning or Unsupervised Machine Learning Techniques, which uses a statistical approach to identify the relationship between the words in a set of given documents (Unstructured Data). This approach does not necessarily need training data to make decisions on matching the related words together but actually has the ability to do the classification by itself. All that is needed is to give the algorithm a set of natural documents. The method is elegant enough to classify the relationships automatically without any human guidance during the process as shown in FIGS. 05-06-2010
20100114562DOCUMENT PROCESSOR AND ASSOCIATED METHOD - A computer implemented method of processing a digitally encoded document having a text composed by an author by using a processor to analyse the segmentation, punctuation and linguistics of text and storing the results in a digitally accessible format. Author traits are then predicted using a machine learning system based on the results of the segmentation, punctuation and linguistics analysis of the text.05-06-2010
20090150142BEHAVIOR DETERMINATION APPARATUS AND METHOD, BEHAVIOR LEARNING APPARATUS AND METHOD, ROBOT APPARATUS, AND MEDIUM RECORDED WITH PROGRAM - A robot includes a knowledge acquisition unit for extracting words from external instruction information, a network construction unit for constructing a network from the extracted words and updating weightings between the words, and a behavior determination unit for determining a behavior on the basis of a word network in which relationships between the words are weighted on a network.06-11-2009
20100235163METHOD AND SYSTEM FOR ENCODING CHINESE WORDS - A Chinese character or word encoding system and method for encoding a Unicode Differentiation Index (UDI) into the least significant 3 bits of one of the three component color of the foreground color of the RTF Chinese text. This encoded UDI value allows the correct identification of the encoded Chinese word. It also allows the identification of the traditional Chinese or simplified Chinese counterpart correctly. Further, the encoded UDI allows the identification of the font file differentiator when user is generating a correct Dualese script for a given Chinese word, wherein Dualese refers to a dual-script-in-one type of script.09-16-2010
20090112578Handheld Electronic Device and Method for Disambiguation of Compound Text Input and for Prioritizing Compound Language Solutions According to Completeness of Text Components - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria, including the degree of completeness of the text components of a compound language solution.04-30-2009
20090150141Method and system for learning second or foreign languages - The present invention provides a method for providing linguistically interesting terms to a user, the method comprising processing a received digital text by a natural language processing technology, and then comparing the processed digital text with a linguistically interesting term database with a plurality of predetermined linguistically interesting terms. When the processed digital text has at least one predetermined linguistically interesting term, then at least one predetermined linguistically interesting term is extracted and is identified in a display.06-11-2009
20120239382RECOMMENDATION METHOD AND RECOMMENDER COMPUTER SYSTEM USING DYNAMIC LANGUAGE MODEL - A recommendation method and a recommender computer system using a dynamic language model are provided. The recommender computer system using the dynamic language model includes a language model constructing computer module, a language model adapting computer module, a sentence selecting computer module and a sentence recommendation computer module. The language model constructing computer module is used for constructing a language model. The language model adapting computer module is used for dynamically merging different language models to construct a dynamic language model. The sentence selecting computer module generates a plurality of recommended sentences from a database according to a search keyword. The sentence recommendation computer module analyzes the difference level between the recommended sentences and the dynamic language model and sorts the recommended sentences to provide a recommendation list.09-20-2012
20120239383SYSTEM AND METHOD OF SPOKEN LANGUAGE UNDERSTANDING IN HUMAN COMPUTER DIALOGS - A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify.09-20-2012
20120239381SEMANTIC PHRASE SUGGESTION ENGINE - A semantic phrase suggestion engine that provides term and sentence suggestions based on context-specific user groups. Knowledge domains within a semantic network may be automatically derived from user software applications, and each term within the knowledge domain includes meta-data about the terms, e.g., term type and an importance indicator. The indicators may be defined within the context of specific user groups and relate to how many times that group has used the term (e.g., in documents, emails, etc.) The semantic phrase suggestion engine may also include spelling conditions and grammar conditions, which can then provide phrase suggestions according to the conditions and importance indicators, specific to a user group.09-20-2012
20120239380Classification-Based Redaction in Natural Language Text - When redacting natural language text, a classifier is used to provide a sensitive concept model according to features in natural language text and in which the various classes employed are sensitive concepts reflected in the natural language text. Similarly, the classifier is used to provide a utility concepts model based on utility concepts. Based on these models, and for one or more identified sensitive concepts and identified utility concepts, at least one feature in the natural language text is identified that implicates the at least one identified sensitive concept more than the at least one identified utility concept. At least some of the features thus identified may be perturbed such that the modified natural language text may be provided as at least one redacted document. In this manner, features are perturbed to maximize classification error for sensitive concepts while simultaneously minimizing classification error in the utility concepts.09-20-2012
20120109642COMPUTER-IMPLEMENTED PATENT PORTFOLIO ANALYSIS METHOD AND APPARATUS - A computer-implemented apparatus and method for performing patent portfolio analysis. The patent portfolio analysis apparatus and method clusters a group of patents based upon one or more techniques. The clustering techniques include linguistic clustering techniques (e.g., eigenvector analysis), claim meaning, and patent classification techniques. Different aspects of the clusters are analyzed, including financial, claim breadth, and assignee patent comparisons. Moreover, patents and/or their clusters are linked to the Internet in order to determine what products might be covered by the claims of the patents or whether materials on the Internet might render patent claims invalid.05-03-2012
20120109641METHOD, SYSTEM, AND APPARATUS FOR VALIDATION - In a method for validating data, a text of a document is received. At least one fact is extracted from the text. At least one expert refinement is merged with the at least one fact to create at least one modified fact. The at least one modified fact is provided for a review. An expert refinement to the at least one modified fact is captured in response to the review. A superset document based on the at least one pre-existing refinement and the expert refinement is stored.05-03-2012
20120109639METHOD, COMPUTER PROGRAM AND APPARATUS FOR ANALYZING SYMBOLS IN A COMPUTER SYSTEM - The present invention provides a computer-implemented method of analyzing messages in a computer system to allow workflows constituted by the messages to be identified, the method comprising: analyzing a sequence of messages in a computer system in order to classify the messages, thereby producing a corresponding sequence of classifications of the messages; and applying sequence induction to the sequence of classifications of the messages to produce (i) a set of sub-sequences of the classifications of the messages and (ii) a sequence grammar for the sub-sequences, from which a workflow constituted by the sequence of messages can be identified.05-03-2012
20120109638ELECTRONIC DEVICE AND METHOD FOR EXTRACTING COMPONENT NAMES USING THE SAME - A method for extracting component names from a document reads text content of the document, searches for component labels in the text content, and stores a position of each component label in the text content in a storage device. The method further extracts a component name corresponding to each component label in the text content according to the position of each component label, and creates a component table according to the component label and the component name.05-03-2012
20120109637EXTRACTING RICH TEMPORAL CONTEXT FOR BUSINESS ENTITIES AND EVENTS - Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.05-03-2012
20120109636SUBSTITUTION, INSERTION, AND DELETION (SID) DISTANCE AND VOICE IMPRESSIONS DETECTOR (VID) DISTANCE - A device may receive user input, select two strings to compare based on the user input, obtain a first set of keyboard codes for a first of the two strings, obtain a second set of keyboard codes for a second of the two strings, and determine a distance between the two strings based on the first and the second set of keyboard codes. In addition, the device may send a result associated with determining the distance to another device, store the result in a storage device, or display the result.05-03-2012
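The string comparison in the entry above (map each string to keyboard codes, then compute a substitution/insertion/deletion distance) might look something like the sketch below. The telephone-keypad mapping is a hypothetical choice of "keyboard codes", and the distance is a standard Levenshtein edit distance with unit costs; the patent's actual codes and cost scheme are not given in the abstract.

```python
# Hypothetical keyboard codes: letters mapped to telephone-keypad digits.
KEYPAD = {
    ch: digit
    for digit, letters in {
        "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
    }.items()
    for ch in letters
}


def keyboard_codes(s):
    """Translate a string to a list of key codes; unmapped characters pass through."""
    return [KEYPAD.get(ch, ch) for ch in s.lower()]


def sid_distance(a, b):
    """Unit-cost substitution/insertion/deletion (Levenshtein) distance
    between two sequences, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x from a
                cur[j - 1] + 1,          # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = cur
    return prev[-1]
```

Under this assumed mapping, strings typed with the same key presses compare as identical, for example `sid_distance(keyboard_codes("cat"), keyboard_codes("bat"))` is 0 because "c" and "b" share a key.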
20090138258Natural language enhanced user interface in a business rule management system - Some embodiments of a natural language enhanced user interface in a business rule management system have been presented. In one embodiment, one or more rule templates in a natural language are generated from one or more prefabricated sentences. Then a user interface is created using the one or more rule templates to allow a user to compose rules for a business rule management system.05-28-2009
20120035915LANGUAGE MODEL CREATION DEVICE, LANGUAGE MODEL CREATION METHOD, AND COMPUTER-READABLE STORAGE MEDIUM - The present invention uses a language model creation device 02-09-2012
20120035916Handheld Electronic Device and Associated Method Employing a Multiple-Axis Input Device and Learning a Context of a Text Input For Use by a Disambiguation Routine - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In addition to identifying and outputting representations of language objects that are stored in the memory and that correspond with a text input, the device is able to employ contextual data in certain circumstances to prioritize output and to learn new contextual data.02-09-2012
20110172989INTELLIGENT AND PARSIMONIOUS MESSAGE ENGINE - A message engine for analyzing or examining a message and generating a textual description of the message. The message engine can provide a textual description of a voice message. The message engine does not present a speech to text conversion of the complete voice message (that is, it does not convert the entire message to text and present the textual version of the entire voice message to the user). Rather, the message engine presents only the conceptual key words that describe the essence of the voice message to the user. As such, the message engine is a more intelligent version of a speech-to-text convertor. An exemplary message engine will only present in text the key conceptual words of the message rather than the entire speech to text translation of the whole message.07-14-2011
20090292527Methods, Apparatuses and Computer Program Products for Receiving and Utilizing Multidimensional Data Via A Phrase - Methods, apparatuses and computer program products are provided for receiving multidimensional data via a phrase. In this regard, various exemplary embodiments may guide a user in defining a phrase on a segment-by-segment basis. Recommendations may be provided to the user to guide the user in defining the segment to thereby define the phrase. Upon defining the phrase, the phrase may be parsed into one or more segments. The parsed segments may provide information about the phrase, and content associated with the parsed segments may be linked to data fields of, for example, a search engine or database. Using the linked data fields, operations may be performed with respect to the phrase including searches for data or storage of data.11-26-2009
20090182553METHOD AND APPARATUS FOR GENERATING A LANGUAGE INDEPENDENT DOCUMENT ABSTRACT - A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. A sentence containing the sequence of words is retrieved from the document, if the sequence of words is a significant phrase. An abstract of the document is searched to determine if the sentence has been previously included in the abstract. If not, the sentence is added to the abstract.07-16-2009
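The length-based scoring described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the score is simply the word length, word sequences are read as fixed-size windows inside each sentence, and the values of `window`, `threshold`, and `min_count` are arbitrary. None of these names or defaults come from the application.

```python
import re


def word_score(word):
    # Assumption: the score is the word length itself.
    return len(word)


def is_significant(words, threshold, min_count):
    # Significant if enough words score above the threshold.
    return sum(1 for w in words if word_score(w) > threshold) >= min_count


def build_abstract(text, window=3, threshold=6, min_count=2):
    abstract = []
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for sentence in sentences:
        words = re.findall(r"\w+", sentence)
        for start in range(max(1, len(words) - window + 1)):
            if is_significant(words[start:start + window], threshold, min_count):
                # Add the containing sentence only if it is not already present.
                if sentence not in abstract:
                    abstract.append(sentence)
                break
    return abstract


if __name__ == "__main__":
    sample = ("The cat sat. Statistical language modelling requires "
              "substantial corpora. It is so.")
    print(build_abstract(sample))
```

Because the score depends only on word length, the sketch needs no dictionary or stop-word list, which is consistent with the "language independent" framing in the title.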
20100100371Method, System, and Apparatus for Message Generation - Methods, systems, and apparatuses for message generation receive one or more keywords that are indicative of a message subject matter. Based on the keywords, information related to the keywords is searched, and message preferences of a message recipient are determined. A natural language message is created using the information related to the keywords and the message preferences, and the message is sent to the message recipient.04-22-2010
20110172991SENTENCE EXTRACTING METHOD, SENTENCE EXTRACTING APPARATUS, AND NON-TRANSITORY COMPUTER READABLE RECORD MEDIUM STORING SENTENCE EXTRACTING PROGRAM - A sentence similar to the sampling sentence group can be efficiently extracted from the extraction target sentence group by repeating the process of narrowing a plurality of pairs of morphemes extracted from the sampling sentence group in the order of closer number of higher similarity to the extraction target sentence including each pair of morphemes.07-14-2011
20120143595FAST TITLE/SUMMARY EXTRACTION FROM LONG DESCRIPTIONS - Techniques are described herein for automatic generation of a title or summary from a long body of text. A grammatical tree representing one or more sentences of the long body of text is generated. One or more nodes from the grammatical tree are selected to be removed. According to one embodiment, a particular node is selected to be removed based on its position in the grammatical tree and its node-type, where the node type represents a grammatical element of the sentence. Once the particular node is selected, a branch of the tree is cut at the node. After the branch has been cut, one or more sub-sentences are generated from the remaining nodes in the grammatical tree. The one or more sub-sentences may be returned as a title or summary.06-07-2012
20090204391GAMING MACHINE WITH CONVERSATION ENGINE FOR INTERACTIVE GAMING THROUGH DIALOG WITH PLAYER AND PLAYING METHOD THEREOF - A player inputs a message or a conversation sentence in the form of a sound or characters into an input unit of a gaming machine to request an inquiry of a history of games with the gaming machine in the past or an inquiry of a gaming history of the player. Then, the message inputted in the sound or the characters is analyzed by a conversation engine, and the history of the games with the gaming machine in the past or the gaming history of the player in the past, which is a target of the request of the inquiry, is read out of a memory or a portable memory owned by the player. Further, data on a message or a response sentence including the history thus read out are created by the conversation engine and are outputted in the sound or characters from an output unit.08-13-2009
20100125451Natural Language Recognition Using Context Information - A method of recognising digital ink input by a user into a computer-based digital ink recognition system is disclosed. The user interacts with a paper-based document. The paper-based document has disposed thereon coded data indicative of a particular field of the paper-based document and of at least one reference point of the paper-based document. An image sensor in a sensing device captures images of at least some of the coded data when the sensing device is placed in an operative position relative to the paper-based document. The sensing device then decodes at least some of the coded data to form indicating data indicative of the identity of the field of the paper-based document containing the coded data and at least one of a position and a movement of the sensing device relative to the paper-based document. A server receives the indicating data from the sensing device, and processes the indicating data using a recognizer residing on the server to produce intermediate format data. The intermediate format data is then transmitted to an application which decodes the intermediate format data into computer-readable format data using context information associated with the paper-based document.05-20-2010
20100125450SYNCHRONIZED TRANSCRIPTION RULES HANDLING - Methods, systems, and software are disclosed for providing rule handling functionality in a distributed transcription environment. Some embodiments provide client-server workflow management for providing and supporting distributed transcription services. Other embodiments provide audio-to-text synchronization to support certain transcription functionality. Still other embodiments provide logging functionality to support quality, personnel, billing, and/or other enterprise tasks. And other embodiments provide functionality to support rule generation, editing, validation, and/or execution.05-20-2010
20080288244METHOD AND SYSTEM FOR AUTOMATICALLY DETECTING MORPHEMES IN A TASK CLASSIFICATION SYSTEM USING LATTICES - In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user.11-20-2008
20090119095Machine Learning Systems and Methods for Improved Natural Language Processing - Disclosed is a method to generate at least one new set of concepts to be used to perform natural language processing (NLP) on data. The method includes receiving one or more sources of input data, and determining, based on the one or more sources of input data and on at least one initial set of concepts, at least one attribute representative of a type of information detail to be included in the at least one new set of concepts.05-07-2009
20110172990Knowledge Utilization - Data is organized in a knowledge network by defining a set of nodes, each node comprising data describing knowledge and a task pertinent to the knowledge, and defining relationships between the nodes based on the data.07-14-2011
20100100370SELF-ADJUSTING EMAIL SUBJECT AND EMAIL SUBJECT HISTORY - In one embodiment, an apparatus for automated generation of subject line content for e-mail messages includes an input operable to receive content data including text-based information corresponding to a body of an e-mail message, a text analyzer including logic operable to analyze received content data, a topic extractor including logic operable to extract topic data in accordance with an output of the text analyzer, a string generator including logic operable to generate subject line data in accordance with an output of the topic extractor, and a message output operable to output a multi-field e-mail message having a body field inclusive of the content data and a subject line field inclusive of generated subject line data.04-22-2010
20110172988ADAPTIVE CONSTRUCTION OF A STATISTICAL LANGUAGE MODEL - A statistical language model (SLM) may be iteratively refined by considering N-gram counts in new data, and blending the information contained in the new data with the existing SLM. A first group of documents is evaluated to determine the probabilities associated with the different N-grams observed in the documents. An SLM is constructed based on these probabilities. A second group of documents is then evaluated to determine the probabilities associated with each N-gram in that second group. The existing SLM is then evaluated to determine how well it explains the probabilities in the second group of documents, and a weighting parameter is calculated from that evaluation. Using the weighting parameter, a new SLM is then constructed as a weighted average of the existing SLM and the new probabilities.07-14-2011
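The blending step in the abstract above amounts to a weighted average of two N-gram distributions. The sketch below is an illustration under assumptions: it uses joint relative frequencies of whitespace-tokenized bigrams rather than conditional probabilities, and it takes the weighting parameter as an input because the abstract does not say how it is calculated. The function names `ngram_probs` and `blend` are hypothetical.

```python
from collections import Counter


def ngram_probs(docs, n=2):
    """Relative frequency of each N-gram across a group of documents."""
    counts = Counter()
    for doc in docs:
        tokens = doc.split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def blend(existing, new, weight):
    """Weighted average: weight * existing model + (1 - weight) * new data."""
    grams = set(existing) | set(new)
    return {
        gram: weight * existing.get(gram, 0.0) + (1.0 - weight) * new.get(gram, 0.0)
        for gram in grams
    }


if __name__ == "__main__":
    old_model = ngram_probs(["a b a b"])
    new_data = ngram_probs(["a b"])
    print(blend(old_model, new_data, weight=0.5))
```

When both inputs sum to one, the blended values also sum to one for any weight between 0 and 1, so the result can be blended again with a later group of documents.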
20090281792SELF-LEARNING DATA LENSES - A semantic conversion system (11-12-2009
20090281791UNIFIED TAGGING OF TOKENS FOR TEXT NORMALIZATION - Raw input text is received, and divided into sequences of tokens. Each token is marked with a text normalization tag that identifies a text normalization operation to be performed on the token during text normalization. The tags are assigned to the tokens by determining a most likely tag sequence, given the sequence of tokens being processed. The text normalization operations are performed on the tokens in order to provide clean output text, which can be output for further natural language processing.11-12-2009
20120296639Verification of Extracted Data - Facts are extracted from speech and recorded in a document using codings. Each coding represents an extracted fact and includes a code and a datum. The code may represent a type of the extracted fact and the datum may represent a value of the extracted fact. The datum in a coding is rendered based on a specified feature of the coding. For example, the datum may be rendered as boldface text to indicate that the coding has been designated as an “allergy.” In this way, the specified feature of the coding (e.g., “allergy”-ness) is used to modify the manner in which the datum is rendered. A user inspects the rendering and provides, based on the rendering, an indication of whether the coding was accurately designated as having the specified feature. A record of the user's indication may be stored, such as within the coding itself.11-22-2012
20090254336PROVIDING A TASK DESCRIPTION NAME SPACE MAP FOR THE INFORMATION WORKER - Providing for generation of a task oriented data structure that can correlate natural language descriptions of computer related tasks to application level commands and functions is described herein. By way of example, a system can include an activity translation component that can receive a natural language description of an application level task. Furthermore, the system can include a language modeling component that can generate the data structure based on an association between the description of the task and at least one application level command utilized in executing the computer related task. Once generated, the data structure can be utilized to automate computer related tasks by input of a human centric description of those tasks. According to further embodiments, machine learning can be employed to train classifiers and heuristic models to optimize task/description relationships and/or tailor such relationships to the needs of particular users.10-08-2009
20110270603Method and Apparatus for Language Processing - A method for language enhancement, including receiving text, identifying grammatical constructs within the text, and suggesting at least one alternate text portion for at least one original portion of the text, the alternate text portion being consistent with the grammatical constructs of the original portion and having substantially the same meaning as the original portion but conveying a different impression. Apparatus is also described and claimed.11-03-2011
20110208510System and Method for Converting Graphical Call Flows Into Finite State Machines - A method, system and module for automatically converting a call flow into a state-based representation are disclosed. The method comprises walking a call flow and converting each page of the call flow into a rule of a higher level representation of the call flow, augmenting the higher level representation with terminal symbols representing state variable assignments and comparisons associated with decision and computation shapes in the call flow and converting the higher level representation into a state-based representation.08-25-2011
20110208507Speech Correction for Typed Input - A method, computer program product, and system are provided for correcting one or more typed words on an electronic device. The method can include receiving one or more typed words from a text input device and generating one or more candidate words for the one or more typed words. The method can also include receiving an audio stream at the electronic device that corresponds to the one or more typed words. The audio stream can then be translated into text using the one or more candidate words, where the translation includes assigning a confidence score to each of the one or more candidate words. Based on the confidence score associated with each of the one or more candidate words, a candidate word can be selected among the one or more candidate words to represent each portion of the text. A word from the one or more typed words can be replaced with the selected candidate word based on the value of the confidence score associated with the selected candidate word.08-25-2011
20100138215SYSTEM AND METHOD FOR USING ALTERNATE RECOGNITION HYPOTHESES TO IMPROVE WHOLE-DIALOG UNDERSTANDING ACCURACY - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.06-03-2010
20080243483UTILIZING SPEECH GRAMMAR RULES WRITTEN IN A MARKUP LANGUAGE - The present invention provides a method and apparatus that utilize a context-free grammar written in a markup language format. The markup language format provides a hierarchical format in which grammar structures are delimited within and defined by a set of tags. The markup language format also provides grammar switch tags that indicate a transition from the context-free grammar to a dictation grammar or a text buffer grammar. In addition, the markup language format provides for the designation of code to be executed when particular grammar structures are recognized from a speech signal.10-02-2008
20080249763USER-TAILORABLE ROMANIZED CHINESE TEXT INPUT SYSTEMS AND METHODS - Methods and systems for romanizing Chinese ideograms allow a user to create a personalized spelling dictionary that converts a user's desired roman-alphabet spelling to an equivalent Chinese character. A phonetic combination from a standard Chinese dialect is selected. The user defines a roman alphabet equivalent of the selected phonetic combination that fits the way the user pronounces the phonetic combination in the user's own dialect or idiolect.10-09-2008
20080249762CATEGORIZATION OF DOCUMENTS USING PART-OF-SPEECH SMOOTHING - A method and system is provided for classifying documents based on the subjectivity of the content of the documents using a part-of-speech analysis to help account for unseen words. A classification system trains a classifier using the parts of speech of training documents so that the classifier can classify unseen words based on the part of speech of the unseen word. The classification system then trains a part-of-speech model using the parts of speech of the n-grams of training data and labels of the training documents, and trains a term model using the term unigrams and labels. To classify a target document, the classification system applies the part-of-speech model to the part-of-speech n-grams of the target document and the term model to term n-grams of the target document.10-09-2008
20080249764Smart Sentiment Classifier for Product Reviews - A sentiment classifier is described. In one implementation, a system applies both full text and complex feature analyses to sentences of a product review. Each analysis is weighted prior to linear combination into a final sentiment prediction. A full text model and a complex features model can be trained separately offline to support online full text analysis and complex features analysis. Complex features include opinion indicators, negation patterns, sentiment-specific sections of the product review, user ratings, sequence of text chunks, and sentence types and lengths. A Conditional Random Field (CRF) framework provides enhanced sentiment classification for each segment of a complex sentence to enhance sentiment prediction.10-09-2008
20110270606SYSTEMS AND METHODS FOR SEMANTIC SEARCH, CONTENT CORRELATION AND VISUALIZATION - Methods and systems for searching over large (i.e., Internet scale) data to discover relevant information artifacts based on similar content and/or relationships are disclosed. Improvements over simple keyword and phrase based searching over internet scale data are shown. Search engines providing accurate and contextually relevant search results are disclosed. Users are enabled to identify related documents and information artifacts and quickly ascertain, via visualization, which of these documents are original, which are derived (or copied) from a source document or information artifact, and which subset is independently generated (i.e., an original document or information artifact).11-03-2011
20110270605ASSESSING SPEECH PROSODY - A method, system and computer readable storage medium for assessing speech prosody. The method includes the steps of: receiving input speech data; acquiring a prosody constraint; assessing prosody of the input speech data according to the prosody constraint; and providing an assessment result, where at least one of the steps is carried out using a computer device.11-03-2011
20080270120Processing text with domain-specific spreading activation methods - A method for performing natural language processing of free text using domain-specific spreading activation. Embodiments of the present invention ontologize free text using an algorithm based on neurocognitive theory by simulating human recognition, semantic, and episodic memory approaches. Embodiments of the invention may be used to process clinical text for assignment of billing codes, analyze suicide notes or legal discovery materials, and for processing other collections of text. Further, embodiments of the invention may be used to more effectively search large databases, such as a database containing a large number of medical publications.10-30-2008
20120046938LARGE-SCALE SENTIMENT ANALYSIS - A method for determining a sentiment associated with an entity includes inputting a plurality of texts associated with the entity, labeling seed words in the plurality of texts as positive or negative, determining a score estimate for the plurality of words based on the labeling, re-enumerating paths of the plurality of words and determining a number of sentiment alternations, determining a final score for the plurality of words using only paths whose number of alternations is within a threshold, converting the final scores to corresponding z-scores for each of the plurality of words, and outputting the sentiment associated with the entity.02-23-2012
20080228467NATURAL LANGUAGE PARSING METHOD TO PROVIDE CONCEPTUAL FLOW - A method for parsing the flow of natural human language to convert a flow of machine recognizable language into a conceptual flow includes, first, recognizing the lexical structure and then, a basic semantic grouping is determined for the language flow in the lexical structure. The basic semantic grouping is then determined that denotes the main action, occurrence or state of being for the language flow. The responsibility of the main action, occurrence or state of being for the language flow is then determined within the lexical structure followed by semantically parsing the lexical structure. Thereafter, any ambiguities in the responsibilities are resolved in a recursive manner by applying a predetermined set of rules thereto.09-18-2008
20080228468English-Language Translation Of Exact Interpretations of Keyword Queries - The present invention relates to a methodology to translate exact interpretations of keyword queries into meaningful and grammatically correct plain-language queries in order to convey the meaning of these interpretations to the initiator of the search. The method includes the steps of generating at least one grammatically valid plain-language sentence interpretation for a keyword query from a generated sentence plain-language sentence clauses, wherein the grammatically valid plain-language sentence is based upon differing matching elements, and presenting at least one grammatically valid plain-language sentence interpretation for the keyword query to a keyword query system user for the user's review.09-18-2008
20080288243Information Processing Apparatus, Information Processing Method, Program, and Recording Medium - Disclosed herein is an information processing apparatus for analyzing text data, including: acquisition means for acquiring the text data; morpheme information registration means for registering morpheme information for use in analyzing the text data morphologically; morphological analysis means for analyzing the text data acquired by the acquisition means; compound word processing rule registration means for registering compound word processing rules for creating a compound word not registered in the morpheme information registration means; and compound word processing means, by use of the compound word processing rules registered in the compound word processing rule registration means, for combining the morphemes included in the morphological analysis information created by the morphological analysis means, into the compound word not registered in the morpheme information registration means and detecting the created compound word.11-20-2008
20090265161TRANSFORMING A NATURAL LANGUAGE REQUEST FOR MODIFYING A SET OF SUBSCRIPTIONS FOR A PUBLISH/SUBSCRIBE TOPIC STRING - A method, apparatus and software is disclosed for transforming a natural language request for modifying a set of subscriptions for a publish/subscribe topic string in which a predetermined element in the request is transformed into a publish/subscribe symbol in the topic string.10-22-2009
20080312905Extracting Tokens in a Natural Language Understanding Application - A method of processing text within a natural language understanding system can include applying a first tokenization technique to a sentence using a statistical tokenization model. A second tokenization technique using a named entity can be applied to the sentence when the first tokenization technique does not extract a needed token according to a class of the sentence. A token determined according to at least one of the tokenization techniques can be output.12-18-2008
20080312907METHOD AND SYSTEM FOR DATA MODELING ACCORDING TO USER PERSPECTIVES - Techniques for the design and use of a perception modeling language for communicating according to the perspective of at least two communicators. The disclosed method and system provide for forming a model including a predetermined number of states and a plurality of related transitions. The disclosed subject matter represents each of said predetermined number of states according to a plurality of perspectives, said perspectives including a plurality of states and a set of related transitions, and forms a perspective language by deriving a plurality of functions associating said plurality of perspectives for representing at least one actually observable system. Furthermore, the perspective modeling language derives a set of modeling perspectives for modeling said at least one actually observable system.12-18-2008
20080312908SYSTEMS AND METHODS FOR NORMALIZATION OF LINGUISTIC STRUCTURES - A text passage is analyzed to determine whether it contains a “be” verb or a “have” verb. If so, syntactic dependencies are obtained from the text passage, a direct object relation involving the “be” verb or “have” verb is obtained, and a verbal form of a noun appearing in the first direct object relation is obtained. The syntactic dependencies are rewritten based on the verbal form of the noun. Different syntactic rewriting criteria are applied if the text passage also contains a noun object preceding a past participle verb, or also contains an active present participle verb.12-18-2008
20080312904Sub-Model Generation to Improve Classification Accuracy - A method of classifying text input for use with a natural language understanding system can include determining classification information including a primary classification and one or more secondary classifications for a received text input using a statistical classification model (statistical model). A statistical classification sub-model (statistical sub-model) can be selectively built according to a model generation criterion applied to the classification information. The method further can include selecting the primary classification or the secondary classification for the text input as a final classification according to the statistical sub-model and outputting the final classification for the text input.12-18-2008
20080312906Reclassification of Training Data to Improve Classifier Accuracy - A method of creating a statistical classification model for a classifier within a natural language understanding system can include processing training data using an existing statistical classification model. Sentences of the training data correctly classified into a selected class of the statistical classification model can be selected. The selected sentences of the training data can be assigned to a fringe group or a core group according to confidence score. The training data can be updated by associating the fringe group with a fringe subclass of the selected class and the core group with a core subclass of the selected class. A new statistical classification model can be built from the updated training data. The new statistical classification model can be output.12-18-2008
20090326920Linguistic Service Platform - Linguistic service platform techniques are described. In implementations, one or more computer-readable media comprise instructions that are executable by a computer to designate a linguistic service having a particular property responsive to an application program interface call specifying the property. Communication may be brokered between the linguistic service and the application so that communication occurs without the application directly communicating with the linguistic service.12-31-2009
20120296636TAXONOMY AND APPLICATION OF LANGUAGE ANALYSIS AND PROCESSING - Words can be identified in text. Membership numerical values for the words can be determined in categories, or in communication types generated using those categories. The membership numerical values for the words can then be used to generate a signature. The signature can then be used to identify documents with a similar attitude.11-22-2012
20080281580DYNAMIC PARSER - The subject disclosure pertains to systems and methods for dynamic parsing. A dynamic parser can perform syntactic analysis or parsing of input data consisting of a set of tokens based upon a provided grammar including conditional tokens. While the parser grammar can be fixed, the dynamic parser can utilize an independent transform function at parse time to translate or replace particular tokens, effectively performing dynamic parsing. The transform function can be utilized in conjunction with conditional tokens to selectively activate and deactivate particular grammar rules. Additionally, systems and methods for automatically generating a dynamic parser from a grammar description are described herein.11-13-2008
20100010802System and Method for User Skill Determination - A system comprises a user interface configured to receive natural language input from a user. An input module couples to the user interface and is configured to process the received natural language input for selected words and phrases. A user skill determination module couples to the input module and is configured to determine a skill level of the user based on the selected words and phrases.01-14-2010
20120296634SYSTEMS AND METHODS FOR CATEGORIZING AND MODERATING USER-GENERATED CONTENT IN AN ONLINE ENVIRONMENT - Exemplary embodiments provide systems, devices and methods for computer-based categorization and moderation of user-generated content for publication of the content in an online environment. Exemplary embodiments automatically determine a probability value indicating that the user-generated content is either a positive example or a negative example of one or more unsuitable categories. If the user-generated content is determined to be a positive example of any of the unsuitable categories to a predefined degree of certainty, exemplary embodiments may automatically exclude the content from publication in the online environment.11-22-2012
20120296637METHOD AND APPARATUS FOR CALCULATING TOPICAL CATEGORIZATION OF ELECTRONIC DOCUMENTS IN A COLLECTION - A computer implemented method calculates topical categorization of electronic documents in a collection. A processor applies a metric to categorize semantic distance between two sections of a document or between two documents. The processor executes a topic algorithm using the categorization provided by the metric to determine topic boundaries. Topics are extracted based upon the topic boundaries; and the extracted topics are compared for similarity with topics in other documents for organizational and research purposes.11-22-2012
20120296635USER-MODIFIABLE WORD LATTICE DISPLAY FOR EDITING DOCUMENTS AND SEARCH QUERIES - An “Interactive Word Lattice” provides a user interface for interacting with and selecting user-modifiable paths through a lattice-based representation of alternative suggested text segments in response to a user's text segment input, such as phrases, sentences, paragraphs, entire documents, etc. More specifically, the user input is provided to a trained paraphrase generation model that returns a plurality of alternative text segments having the same or similar meaning as the original user input. An interactive graphical lattice-based representation of the alternative text segments is then presented to the user. One or more words of each alternative text segment represent a “node” of the lattice, while each connection between nodes represents a lattice “edge.” Both nodes and edges are user modifiable. Each possible path through the lattice corresponds to a different alternative text segment. Users select a path through the lattice to select an alternative text to the original input.11-22-2012
20080228466LANGUAGE NEUTRAL TEXT VERIFICATION - A resource string associated with output text is identified. A regular expression pattern is generated from the resource string. The regular expression pattern is matched to the output text. A verification result based on the matching of the regular expression pattern to the output text is provided.09-18-2008
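The verification flow in the entry above (resource string, then regular expression, then match against output text) can be sketched in Python. This is only an illustration under stated assumptions: the `{0}`-style placeholder syntax and the helper names `pattern_from_resource` and `verify_output` are hypothetical and are not taken from the application.

```python
import re


def pattern_from_resource(resource: str) -> re.Pattern:
    """Build a regex from a resource string (assumed '{0}', '{1}' placeholders)."""
    # Split the resource string on its numbered placeholders.
    literal_parts = re.split(r"\{\d+\}", resource)
    # Escape the literal text and let each placeholder match any non-empty run.
    body = "(.+?)".join(re.escape(part) for part in literal_parts)
    return re.compile(body, re.DOTALL)


def verify_output(resource: str, output_text: str) -> bool:
    """Return True when the output text fits the pattern derived from the resource."""
    return pattern_from_resource(resource).fullmatch(output_text) is not None


if __name__ == "__main__":
    print(verify_output("Copied {0} files to {1}.", "Copied 12 files to D:\\backup."))
    print(verify_output("Copied {0} files to {1}.", "Kopierte 12 Dateien"))
```

Because the pattern is derived from whichever localized resource string is loaded, the same check works without the test knowing the language of the output.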
20090089047Natural Language Hypernym Weighting For Word Sense Disambiguation - Technologies are described herein for probabilistically assigning weights to word senses and hypernyms of a word. The weights can be used in natural language processing applications such as information indexing and querying. A word hypernym weight (WHW) score can be determined by summing word sense probabilities of word senses from which the hypernym is inherited. WHW scores can be used to prune away hypernyms prior to indexing, to rank query results, and for other functions related to information indexing and querying. A semantic search technique can use WHW scores to retrieve an entry related to a word from an index in response to matching an indexed hypernym of the word with a query term applied to the index. More refined and accurate query results may be provided based on reduced user inputs.04-02-2009
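The WHW score described in the entry above, a sum of the probabilities of the word senses from which a hypernym is inherited, can be sketched as follows. The sense names, probabilities, hypernym sets, and the pruning threshold are made-up illustrative values, and `hypernym_weights` and `prune` are hypothetical helper names.

```python
from collections import defaultdict


def hypernym_weights(sense_probs, sense_hypernyms):
    """Sum, per hypernym, the probabilities of the senses that inherit it."""
    weights = defaultdict(float)
    for sense, prob in sense_probs.items():
        for hypernym in sense_hypernyms.get(sense, ()):
            weights[hypernym] += prob
    return dict(weights)


def prune(weights, threshold):
    """Keep only hypernyms whose score reaches the (assumed) threshold."""
    return {h: w for h, w in weights.items() if w >= threshold}


if __name__ == "__main__":
    # Illustrative data for the word "bank"; not from the application.
    sense_probs = {"bank.financial": 0.7, "bank.river": 0.3}
    sense_hypernyms = {
        "bank.financial": {"institution", "entity"},
        "bank.river": {"slope", "entity"},
    }
    scores = hypernym_weights(sense_probs, sense_hypernyms)
    print(scores)
    print(prune(scores, threshold=0.5))
```

A hypernym shared by every sense accumulates the full probability mass, while one reached only through an unlikely sense scores low and is a candidate for pruning before indexing.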
20090157388METHOD AND DEVICE FOR OUTPUTTING INFORMATION AND/OR STATUS MESSAGES, USING SPEECH - In a method and device for outputting information and/or messages from at least one device using speech, the information and/or messages required for vocal output are provided in a voice memory, the information and/or messages are read by a processing device according to a demand, and the information and/or messages are output via acoustic output device. The information and/or messages are output with a varying intonation according to their relevance.06-18-2009
20080270117Method and system for text compression and decompression - Creation and recovering of the pseudo-code (Y) form the basis of the present method of text compression and decompression. The pseudo-code (Y) is created by formula Y=C+X. The pseudo-code includes information of a repeating index/symbol (constant C) and a current index/symbol (X). The pseudo-code (Y) is converted back into original information by formula X=Y−C. To service the pseudo-code one needs to convert original symbols of text into indexes, and to create a permanent and temporary vocabulary. The present permanent vocabulary is a redundant vocabulary built in advance, includes dictionary with common symbols taken from books, articles, and dictionaries, and serves as a reference vocabulary stored in the permanent memory. The temporary vocabulary is built and is used during compression and decompression processes. The functionality of the temporary vocabulary is to convert a high bit length of indexes belonging to the permanent vocabulary into a low bit length indexes present in the temporary vocabulary.10-30-2008
20100138216 METHOD FOR THE EXTRACTION OF RELATION PATTERNS FROM ARTICLES - A method for building a knowledge base containing entailment relations, including 06-03-2010
20100145676METHOD AND APPARATUS FOR ADJUSTING THE LENGTH OF TEXT STRINGS TO FIT DISPLAY SIZES - The various aspects provide methods and devices which can reduce the length of a text string to fit dimensions of a display by identifying and deleting elements of the string that are not essential to its meaning. In the various aspects, handheld devices may be configured with software configured to analyze and modify text strings to shorten their length by adjusting font size, changing fonts, deleting unnecessary words, such as articles, abbreviating some words, deleting letters (e.g., vowels) from some words, and deleting non-critical words. The order in which transformations are effected may vary depending upon the text string according to a priority of transformations. Such transformation operations may be applied incrementally until the text string fits within the display size requirements. Similar methods may be implemented to increase the length of text strings by adding words in a manner that does not substantially change the meaning of the text string.06-10-2010
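The incremental shortening described in the entry above can be sketched as a prioritized list of transformations applied one at a time until the string fits. This is a simplified illustration: it measures length in characters rather than display pixels, it covers only two of the listed transformations (deleting articles and deleting vowels), and all function names are hypothetical.

```python
import re

ARTICLES = re.compile(r"\b(?:a|an|the)\s+", re.IGNORECASE)


def drop_articles(text: str) -> str:
    """Delete English articles together with the whitespace that follows them."""
    return ARTICLES.sub("", text)


def drop_inner_vowels(text: str) -> str:
    """Delete vowels except when they are the first letter of a word."""
    return re.sub(r"(?<=\w)[aeiou]", "", text, flags=re.IGNORECASE)


def fit_text(text: str, max_chars: int,
             transforms=(drop_articles, drop_inner_vowels)) -> str:
    """Apply transforms in priority order, stopping as soon as the text fits."""
    for transform in transforms:
        if len(text) <= max_chars:
            break
        text = transform(text)
    return text  # best effort: may still exceed max_chars


if __name__ == "__main__":
    print(fit_text("Open the settings menu", 18))
    print(fit_text("Open the settings menu", 13))
```

Ordering the transforms from least to most destructive keeps the string as readable as the available space allows.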
20120035914System and method for handling multiple languages in text - A system and method for processing text are disclosed. The method includes receiving text to be processed. A main language of the text is identified. At least one unknown sequence in the text is identified, each unknown sequence comprising at least one word that is unknown in the main language. For a secondary language, for each of the at least one unknown sequence, the method includes determining whether the unknown sequence includes a first word recognized in the secondary language and, if so, identifying a sequence of words in the secondary language which includes at least the first word. The identifying of the sequence of words in the secondary language includes applying an algorithm for determining whether the sequence of words in the secondary language is expandable beyond the first word to include adjacent words. The text is labeled based on the identified sequences of words in the secondary language.02-09-2012
20090099839System And Method For Prospecting Digital Information - A system and method for prospecting digital information is provided. A home evergreen index for a home subject area within a corpus of digital information is maintained and includes topic models matched to the corpus. A frontier evergreen index for a frontier subject area within the corpus topically distinct from the home subject area is identified. Quality assessments for frontier articles from the corpus identified by the topic models of the frontier evergreen index are obtained. The frontier articles with positive quality assessments are reclassified against the topic models in the home evergreen index. The frontier articles are provided in a display with home articles previously classified against the topic models in the home evergreen index.04-16-2009
20110208511METHOD AND SYSTEM FOR ANALYZING TEXT - An apparatus for providing a control input signal for an industrial process or technical system having one or more controllable elements includes elements for generating a semantic space for a text corpus, and elements for generating a norm from one or more reference words or texts, the or each reference word or text being associated with a defined respective value on a scale, and the norm being calculated as a reference point or set of reference points in the semantic space for the or each reference word or text with its associated respective scale value. Elements for reading at least one target word included in the text corpus, elements for predicting a value of a variable associated with the target word based on the semantic space and the norm, and elements for providing the predicted value in a control input signal to the industrial process or technical system. A method for predicting a value of a variable associated with a target word is also disclosed together with an associated system and computer readable medium.08-25-2011
20110208508Interactive Language Training System - An interactive language training system allows practice word/phrase lists customized by the user for training. The customized lists may include words/phrases extracted from content sources based upon user selections. Extraction may include analysis of the content sources to determine word/phrase frequency, topic, and/or other parameters. A video sample of a student speaking a word/phrase is compared with video examples of a speaker and provides visual feedback for pronunciation and articulation. Progress is monitored for each word/phrase on the list, and performance feedback provided. Communication, such as voice or video chat may be established between the student and a speaker to provide for additional practice.08-25-2011
20090125297AUTOMATIC GENERATION OF DISTRACTORS FOR SPECIAL-PURPOSE SPEECH RECOGNITION GRAMMARS - A computer-implemented method for dynamically generating a speech recognition grammar is provided. The method includes determining a target entry, and accessing a plurality of potential distractors. The method also includes selecting one or more distractors from the plurality of potential distractors. More particularly, each potential distractor selected is selected based upon an assessed acoustic dissimilarity between the distractor and the target entry. The method further includes dynamically generating a speech recognition grammar that includes the target entry and one or more of the distractors selected based upon an acoustic dissimilarity to the target entry.05-14-2009
20100145679Handheld Electronic Device With Text Disambiguation - In view of the foregoing, an improved handheld electronic device includes a keypad in the form of a reduced QWERTY keyboard and is enabled with disambiguation software. As a user enters keystrokes, the device provides output in the form of a default output and a number of variants from which a user can choose. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry, and when initiating an activity session on a word such as during editing, the display outputs variants of the entire word being edited, rather than providing as variants only those parts of a word that are being edited. The device also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. In certain predefined circumstances, the disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device.06-10-2010
20100145677System and Method for Making a User Dependent Language Model - A language model for a speech recognition engine is made based on user-viewed data files. The data files are reviewed and texts are extracted therefrom. The language model is generated based on the extracted texts. Transcriptions of previous user statements are not required. Different weighting factors can be applied to elements of the extracted texts based on the nature of the data files. The weighting factors are then considered during generation of the language model. A user dependent and application independent language model can be created prior to initial use of the speech recognition engine.06-10-2010
20090070101DEVICE FOR AUTOMATICALLY CREATING INFORMATION ANALYSIS REPORT, PROGRAM FOR AUTOMATICALLY CREATING INFORMATION ANALYSIS REPORT, AND METHOD FOR AUTOMATICALLY CREATING INFORMATION ANALYSIS REPORT03-12-2009
20110208512METHOD AND SYSTEM FOR GENERATING DERIVATIVE WORDS - The present invention provides a method for generating derivative words including the steps of: creating a number of derivative grammar arrays; matching the inputting character information with the derivative grammar arrays and obtaining the match derivative grammar arrays; obtaining match words from the language database according to the condition arrays of the obtained derivative grammar arrays and the inputting character information; and generating derivative words by adding the suffix alphabetic character sets of the obtained derivative grammar arrays to the ends of the words. In accordance with the established grammar rules, the words in the language database can be converted to derivative words and the derivative words do not need to be stored in the language database. Therefore, the storage space of the language database can be remarkably reduced. The present invention also provides a system for generating derivative words.08-25-2011
20090182554TEXT ANALYSIS METHOD - A list of reference terms can be provided. Text and the list of reference terms can be broken down into tokens. At least one candidate can be generated in the text for mapping to at least one of the reference terms. Characters of the candidate can be compared to characters of the reference term according to one or more mapping rules. A confidence value of the mapping can be generated based on the comparison of characters. Candidates can be ranked according to their confidence value.07-16-2009
20090119094APPARATUS AND METHOD FOR LINGUISTIC SCORING - In embodiments of the invention, a system receives selections from a user based on a list of pre-defined monitoring categories and/or optionally receives custom category definitions from the user. The option for custom category definitions may be advantageous due to the flexibility provided to a system administrator or other user. In embodiments of the invention, the pre-defined and/or custom monitoring categories may be or include complex hierarchical behavior. Such an approach provides monitoring algorithms that can achieve improved accuracy compared to known methods. In embodiments of the invention, the order of computations used in resolving a monitoring category may be re-ordered, statically and/or dynamically, to improve the efficiency of monitoring operations.05-07-2009
20090119093METHOD AND SYSTEM TO PARSE ADDRESSES USING A PROCESSING SYSTEM - A method and system for parsing an address is disclosed. The method and system comprise separating the address into a plurality of tokens and providing one or more token meaning discovery passes based upon region specific configuration information to determine the meaning of each token in the address. In so doing, an address can be parsed by a processing system in an efficient and effective fashion. By disclosing the meaning of each token of the address in accordance with a region specific configuration information rule set a parsing process is provided which allows for easy modification as the requirements for the parsing change.05-07-2009
20090094020Recommending Terms To Specify Ontology Space - In one embodiment, a set of target search terms for a search is received. Candidate terms are selected, where a candidate term is selected to reduce an ontology space of the search. The candidate terms are sent to a computer to recommend the candidate terms as search terms. In another embodiment, a document stored in one or more tangible media is accessed. A set of target tags for the document is received. Terms are selected, where a term is selected to reduce an ontology space of the document. The terms are sent to a computer to recommend the terms as tags.04-09-2009
20090164208METHOD AND APPARATUS FOR ALIGNING PARALLEL SPOKEN LANGUAGE CORPORA - The method for aligning parallel spoken language corpora comprises obtaining a statistics method and dictionaries-based word alignment set from the parallel spoken language corpora, aligning chunks of the parallel spoken language corpora by using the statistics method and dictionaries-based word alignment set, to obtain a chunk alignment set, and aligning words in aligned chunks of the parallel spoken language corpora to obtain a chunk alignment-based word alignment set. Chunk alignment set and word alignment set are obtained by aligning chunks in parallel spoken language corpora in a corpus repository using a statistics method and dictionaries-based high precision word alignment set obtained from the parallel spoken language corpora and further aligning words in the chunks, and by using them in the speech-to-speech machine translation, the ambiguities of spoken language word alignment can be decreased by using the integrality of chunks.06-25-2009
20090076795System And Method Of Generating Responses To Text-Based Messages - In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message is disclosed. The method includes identifying a sentence in the text-based natural language message. Also, identifying an input clause in the sentence. Further, comparing the input clause to a previously received clause, where the previously received clause is correlated with a previously generated response message. Additionally, generating an output response message based on the previously generated response message. The system includes means for performing the method steps.03-19-2009
20090138257Document analysis, commenting, and reporting system - A document analysis, commenting, and reporting system provides tools that automate quality assurance analysis tailored to specific document types. As one example, the specific document type may be a requirements specification and the system may tag different parts of requirements, including actors, entities, modes, and a remainder. However, the flexibility of the system permits analysis of any other document type, such as instruction manuals and best practices guides. The system helps avoid confusion over the document when it is delivered because of non-standard terms, ambiguous language, conflicts between document sections, incomplete or inaccurate descriptions, size and complexity of the document, and other issues.05-28-2009
20100023320SYSTEM AND METHOD OF SUPPORTING ADAPTIVE MISRECOGNITION IN CONVERSATIONAL SPEECH - A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message. The system and method creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command.01-28-2010
20090006077SPATIALLY INDEXED GRAMMAR AND METHODS OF USE - Improved systems and methods are described which simplify the individual's interaction with speech recognition software, expand the database of spoken point names that can be recognized, and increase the quality and therefore likelihood of success of speech recognition applications. The present systems and methods apply to various uses, such as providing driving directions, finding the nearest location based service, and finding the nearest “Where Am I?” type of location based services.01-01-2009
20080306730System and method to modify text entry - A system and method to modify entry of text is provided. The system includes an input device, a display device, and a processor configured to store a correlation between at least one word with at least one candidate phrase; receive at least one word into the input device; identify the at least one candidate phrase correlated to the at least one word; replace the at least one word with a selected phrase from the at least one candidate phrase; and store the selected phrase in a computer readable storage medium.12-11-2008
20090055166Method, Computer Program and Apparatus for Analysing Symbols in a Computer System - A computer-implemented method of analysing symbols in a computer system, and a computer program and apparatus therefor are provided. The symbols conform to a specification for the symbols. The specification is codified into a set of computer-readable rules. The symbols are analysed using the computer-readable rules to obtain patterns of the symbols by: determining the path that is taken by the symbols through the rules that successfully terminates, and grouping the symbols according to said paths.02-26-2009
20090055164Method and System of Optimal Selection Strategy for Statistical Classifications in Dialog Systems - An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.02-26-2009
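The ranking step in the entry above, ordering predictions by descending probability and then selecting a set of them, can be sketched as below. The abstract does not state the selection criterion, so the cumulative-probability cutoff (`mass`) is purely an assumption for illustration, as are the function name and the example labels.

```python
def select_predictions(predictions, mass=0.9):
    """Rank (label, probability) pairs and keep the top ones.

    Assumption: selection stops once the kept probabilities sum to `mass`.
    """
    # Order the predictions by descending probability.
    ranked = sorted(predictions, key=lambda pair: pair[1], reverse=True)
    selected, total = [], 0.0
    for label, prob in ranked:
        selected.append((label, prob))
        total += prob
        if total >= mass:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical classifier output for one utterance.
    preds = [("weather", 0.2), ("navigation", 0.7), ("music", 0.1)]
    print(select_predictions(preds, mass=0.8))
    print(select_predictions(preds, mass=0.5))
```

With this cutoff the size of the selected set varies per utterance: a confident classifier yields a single candidate, while an uncertain one passes several candidates on to the dialog manager.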
20090326926Displaying Time-Series Data and Correlated Events Derived from Text Mining - The present invention is directed to a method and system for correlating time-series data with events derived from text mining. The system is configured to receive a time period and a parameter concerning an entity, retrieve an event which is related to the entity and occurred within the time period from events which are previously extracted automatically from unstructured text, and display an indication of the event superimposed on a display representing the time series of the parameter for the time period.12-31-2009
20090326925PROJECTING SYNTACTIC INFORMATION USING A BOTTOM-UP PATTERN MATCHING ALGORITHM - Embodiments for converting a token collection that is derived from a natural language expression into a computational independent model (CIM) syntax tree representation are disclosed. In accordance with one embodiment, the conversion includes deriving a plurality of tokens from a natural language expression, where each of the plurality of tokens includes at least one word. The conversion further includes transforming the plurality of tokens into a CIM syntax tree representation based on a CIM phrase tree model. The conversion also includes providing the CIM syntax tree representation to an application.12-31-2009
20090326921GRAMMAR CHECKER FOR VISUALIZATION - A visualization development system is provided. The system includes a visualization tool to develop one or more visualizations and a grammar engine that operates with the visualization tool to automatically detect visualization problems during the development of the visualizations.12-31-2009
20090326923METHOD AND APPARATUS FOR NAMED ENTITY RECOGNITION IN NATURAL LANGUAGE - The present invention provides a method for recognizing a named entity included in natural language, comprising the steps of: performing gradual parsing model training with the natural language to obtain a classification model; performing gradual parsing and recognition according to the obtained classification model to obtain information on positions and types of candidate named entities; performing a refusal recognition process for the candidate named entities; and generating a candidate named entity lattice from the refusal-recognition-processed candidate named entities, and searching for an optimal path. The present invention uses a one-class classifier to score or evaluate these results to obtain the most reliable beginning and end borders of the named entities on the basis of the forward and backward parsing and recognizing results obtained only by using the local features.12-31-2009
20090326919Acquisition and application of contextual role knowledge for coreference resolution - Coreference resolution is the process of identifying when two noun phrases (NP) refer to the same entity. Two main contributions to computational coreference resolution are made. First, this work contributes a new method for recognizing when an NP is anaphoric. Second, traditional approaches to coreference resolution typically select the most appropriate antecedent by recognizing word similarity, proximity, and agreement in number, gender, and semantic class. This work contributes a new source of evidence that focuses on the roles that an anaphor and antecedent play in particular events or relationships. I show that using contextual role knowledge as part of the coreference resolution process increases the number of anaphors that can be resolved, and I demonstrate an unsupervised method for acquiring contextual role knowledge that does not require an annotated training corpus. A probabilistic model based on the Dempster-Shafer model of evidence is used to incorporate contextual role knowledge with traditional evidence sources.12-31-2009
20090006079Regular expression word verification - The present disclosure is directed to a method of verifying a compound word. The method includes receiving an input signal indicative of a textual input and accessing a rule and a lexical data structure from data stores. The rule is applied to the textual input to determine whether the textual input is a valid compound word. An output signal is provided that is indicative of whether the textual input is a compound word.01-01-2009
20080319737Method and apparatus for connecting a cellular telephone user to the internet - A method and devices are described for providing a user of a mobile communication apparatus with a prediction of a string of characters based on one or more characters. The method includes the steps of: receiving information that relates to the location of the mobile communication apparatus; receiving character information which is part of the string of characters to be predicted; accessing databases which comprise pre-defined strings of characters and selecting therefrom a group of strings of characters associated with the vicinity at which the mobile communication apparatus is currently located; transmitting the selected group of strings of characters to the mobile communication apparatus; and displaying the selected group of strings of characters at the mobile communication apparatus.12-25-2008
20080319736Discriminative Syntactic Word Order Model for Machine Translation - A discriminatively trained word order model is used to identify a most likely word order from a set of word orders for target words translated from a source sentence. For each set of word orders, the discriminatively trained word order model uses features based on information in a source dependency tree and a target dependency tree and features based on the order of words in the word order. The discriminatively trained statistical model is trained by determining a translation metric for each of a set of N-best word orders for a set of target words. Each of the N-best word orders are projective with respect to a target dependency tree and the N-best word orders are selected using a combination of an n-gram language model and a local tree order model.12-25-2008
20110224971N-Gram Selection for Practical-Sized Language Models - Described is a technology by which a statistical N-gram (e.g., language) model is trained using an N-gram selection technique that helps reduce the size of the final N-gram model. During training, a higher-order probability estimate for an N-gram is only added to the model when the training data justifies adding the estimate. To this end, if a backoff probability estimate is within a maximum likelihood set determined by that N-gram and the N-gram's associated context, or is between the higher-order estimate and the maximum likelihood set, then the higher-order estimate is not included in the model. The backoff probability estimate may be determined via an iterative process such that the backoff probability estimate is based on the final model rather than any lower-order model. Also described is additional pruning referred to as modified weighted difference pruning.09-15-2011
20090024385SEMANTIC PARSER - A method and an apparatus for semantic parsing of electronic text documents. The electronic text documents can comprise a plurality of sentences with several language components. The method comprises analyzing at least one sentence of the electronic text document and dynamically generating a graph from the analyzed sentence of the text document. The graph represents a semantic representation of the analyzed one or more sentences. The method continues the analysis until an ambiguous sentence is determined and analyzed by evaluating at least a portion of the generated graph.01-22-2009
20090248397Service Initiation Techniques - Service initiation techniques are described. In at least one implementation, a computing device receives a selection of text that is displayed in a user interface by an application. Selection is detected of one of a plurality of services that are displayed in the user interface. Responsive to the detection, the selection of text is provided to the selected service without further user intervention.10-01-2009
20110144976APPLICATION USER INTERFACE SYSTEM AND METHOD - The application user interface system and method enables users to give instructions to business software applications by simply entering text in an interface bar in the most intuitive verbal or written human way for carrying out a particular task. The business application, in turn, processes the instruction and presents the summary for review to the users. Upon approval by user, the particular task or request gets executed.06-16-2011
20090055167METHOD FOR TRANSLATION SERVICE USING THE CELLULAR PHONE - Disclosed is a method for providing translation service using a mobile communication terminal. The method includes a button input step of pressing a voice recognition key to use a voice recognition function, a menu screen provision step of selecting a translator menu item, a translation recognition method determination step of selecting a sentence input method or a word input method, a Korean input step of inputting Korean, a confirmation step of confirming whether a completed Korean sentence matches an intended sentence, and a translated sentence output step of providing a relevant translated sentence in a text form and reproducing the relevant translated sentence in a voice form.02-26-2009
20090055165DYNAMIC MIXED-INITIATIVE DIALOG GENERATION IN SPEECH RECOGNITION - Disclosed are a method (02-26-2009
20090055163Dynamic Mixed-Initiative Dialog Generation in Speech Recognition - Disclosed are a method (02-26-2009
20090099840Request Content Identification System, Request Content Identification Method Using Natural Language, and Program - A request content identification system performs an audio recognition process according to audio data inputted from an input device (04-16-2009
20110224973SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR DYNAMICALLY CORRECTING GRAMMAR ASSOCIATED WITH TEXT - In accordance with embodiments, there are provided mechanisms and methods for dynamically correcting grammar associated with text. These mechanisms and methods for dynamically correcting grammar associated with text can enable enhanced data display, simplified language support, etc.09-15-2011
20110224972Localization for Interactive Voice Response Systems - A language-neutral speech grammar extensible markup language (GRXML) document and a localized response document are used to build a localized GRXML document. The language-neutral GRXML document specifies an initial grammar rule element. The initial grammar rule element specifies a given response type identifier and a given action. The localized response document contains a given response entry that specifies the given response type identifier and a given response in a given language. The localized GRXML document specifies a new grammar rule element. The new grammar rule element specifies the given response in the given language and the given action. The localized GRXML document is installed in an interactive voice response (IVR) system. The localized GRXML document configures the IVR system to perform the given action when a user of the IVR system speaks the given response to the IVR system.09-15-2011
20090083027AUTOMATIC TEXT SKIMMING USING LEXICAL CHAINS - Automatic text skimming using lexical chains may be provided. First, at least one lexical chain may be created from an electronic document. Next, a list of positions within the electronic document may be created. The positions may include where at least one concept represented by one of the at least one lexical chain is mentioned. In addition, a list of the position where the at least one concept is mentioned may be assembled. A selection of at least one concept may be received from the list.03-26-2009
20090083028AUTOMATIC CORRECTION OF USER INPUT BASED ON DICTIONARY - Methods, systems, and apparatus, including computer program products, in which input keystroke data can be interpreted by a current mapping and a determination can be made whether the current mapping is valid based upon the characters identified by the mapping and the keystroke data. Invalid mappings can be corrected based upon alternative mapping of the keystroke data.03-26-2009
20110231183LANGUAGE MODEL CREATION DEVICE - This device 09-22-2011
20110231182MOBILE SYSTEMS AND METHODS OF SUPPORTING NATURAL LANGUAGE HUMAN-MACHINE INTERACTIONS - A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network.09-22-2011
20120078616Handheld Electronic Device and Associated Method Enabling Spell Checking in a Text Disambiguation Environment - An improved handheld electronic device and associated method enable spell checking in a reduced keyboard and disambiguation environment. The improved spell checking routine converts a misspelled word into a canonical version thereof and receives from a dictionary 03-29-2012
20120078615Multiple Touchpoints For Efficient Text Input - Methods and systems for using multiple simultaneous touchpoints of a touch-sensitive keyboard, such as an on-screen keyboard, for more efficient text input are provided. A method for generating text using a touch-sensitive keyboard may include receiving touch input from multiple simultaneous touchpoints. The method may also include determining a text character for each respective simultaneous touchpoint based on the touch input. The method may further include generating a text word based on the text characters determined from the multiple simultaneous touchpoints. A system for generating text using a touch-sensitive keyboard may include a touch input receiver, a slide detector and a text word generator.03-29-2012
20120078614Virtual keyboard for a non-tactile three dimensional user interface - A method, including presenting, by a computer system executing a non-tactile three dimensional user interface, a virtual keyboard on a display, the virtual keyboard including multiple virtual keys, and capturing a sequence of depth maps over time of a body part of a human subject. On the display, a cursor is presented at positions indicated by the body part in the captured sequence of depth maps, and one of the multiple virtual keys is selected in response to an interruption of a motion of the presented cursor in proximity to the one of the multiple virtual keys.03-29-2012
20120078613METHOD, SYSTEM, AND COMPUTER READABLE MEDIUM FOR GRAPHICALLY DISPLAYING RELATED TEXT IN AN ELECTRONIC DOCUMENT - Disclosed herein are systems and methods for navigating electronic texts. According to an aspect, a method may include receiving search criteria for searching an electronic text. Further, the method may include determining text subgroups within the electronic text. The method may also include determining, for each text subgroup, a similarity relationship between the search criteria and the text subgroup. Further, the method may include presenting, for each text subgroup, a graphic representing the similarity relationship between the text subgroup and the search criteria.03-29-2012
20120078612SYSTEMS AND METHODS FOR NAVIGATING ELECTRONIC TEXTS - Disclosed herein are systems and methods for navigating electronic texts. According to an aspect, a method may include determining text subgroups within an electronic text. The method may also include selecting a text seed within one of the text subgroups. Further, the method may include determining a similarity relationship between the text seed and one or more adjacent text subgroups that do not include the selected text seed. The method may also include associating the text seed with the one or more adjacent text subgroups based on the similarity relationship to create a text cluster.03-29-2012
20120078611CONTEXT-AWARE CONVERSATIONAL USER INTERFACE - An input handler may receive natural language input associated with a command from a user through a user interface, and a language parser may parse the natural language input to determine parsed natural language input. A context monitor may receive context information associated with the user, and a context parser may parse the context information to obtain parsed context information associated with the natural language input and with the command. A command interpreter may interpret the parsed natural language input, using the parsed context information, to thereby determine the command.03-29-2012
20120078610DETERMINING OFFER TERMS FROM TEXT - Systems, methods, and machine readable and executable instructions are provided for determining offer terms from text. A method for determining offer terms from text can include mapping keywords to a domain of a procurement event, and receiving, to a computing device, an offer text associated with the procurement event. Event-specific entities are identified, by the computing device, in the offer text. The computing device determines the domain of the procurement event from the identified event-specific entities, and using the mapped keywords corresponding to the determined domain, determines offer components from the offer text, extracts offer parameters from the offer text, and constructs the offer structure using the identified event-specific entities, derived offer components, and extracted offer parameters.03-29-2012
20090099842Creating A Voice Response Grammar From A Presentation Grammar - Methods, systems, and products are disclosed for creating a voice response grammar in a voice response server including identifying presentation documents for a presentation, each presentation document having a presentation grammar. Typical embodiments include storing each presentation grammar in a voice response grammar on a voice response server. In typical embodiments, identifying presentation documents for a presentation includes creating a data structure representing a presentation and listing at least one presentation document in the data structure representing a presentation. In typical embodiments listing the at least one presentation document includes storing a location of the presentation document in the data structure representing a presentation and storing each presentation grammar includes retrieving a presentation grammar of the presentation document in dependence upon the location of the presentation document.04-16-2009
20090099841AUTOMATIC SPEECH RECOGNITION METHOD AND APPARATUS - A system for calculating the look ahead probabilities at the nodes in a language model look ahead tree, wherein the words of the vocabulary of the language are located at the leaves of the tree, 04-16-2009
20080262831Method for the Natural Language Recognition of Numbers - A method for the natural language recognition of numbers, in particular, for use in a voice recognition system. The recognition method is as follows: a spoken numeral is detected and digitized, the numeral is broken down into number-related word components, the mutual position of the word components is determined within the numeral, the numerical values corresponding to the word components are compared and recognized using word component-number value pairs maintained in a digital dictionary, and the individual numerical values are strung together and/or added and/or multiplied according to the type and positions of the corresponding word components in the numeral such that the numerical value corresponding to the input numeral is obtained.10-23-2008
20090006080COMPUTER-READABLE MEDIUM HAVING SENTENCE DIVIDING PROGRAM STORED THEREON, SENTENCE DIVIDING APPARATUS, AND SENTENCE DIVIDING METHOD - A typical sentence having a specific typical characteristic in the sentence is divided. A division target typical sentence is divided on the basis of a small clause definition. The sentence is divided where positions suitable for dividing the typical sentence based on the structure are expressed by a user. A small clause string including small clauses that serve as independent sentences is created after the division. The small clause string is compared to the structure patterns, and a structure pattern that is determined to match the small clause string is selected as a result of the typical sentence division.01-01-2009
20080281581METHOD OF IDENTIFYING DOCUMENTS WITH SIMILAR PROPERTIES UTILIZING PRINCIPAL COMPONENT ANALYSIS - The present invention generally provides methods and systems for characterizing texts, for example, for identifying textual documents by language, topic, author, or other attributes. In some embodiments, a method of the invention can include creating an n-gram frequency spectrum for a document under analysis, preferably selecting a subset of the n-gram frequency spectrum, transforming the n-gram frequency spectrum into principal component space, and identifying one or more attributes of the document according to its similarity to (or distinction from) reference documents in the principal component space.11-13-2008
20090210218Deep Neural Networks and Methods for Using Same - A method and system for labeling a selected word of a sentence using a deep neural network includes, in one exemplary embodiment, determining an index term corresponding to each feature of the word, transforming the index term or terms of the word into a vector, and predicting a label for the word using the vector. The method and system, in another exemplary embodiment, includes determining, for each word in the sentence, an index term corresponding to each feature of the word, transforming the index term or terms of each word in the sentence into a vector, applying a convolution operation to the vector of the selected word and at least one of the vectors of the other words in the sentence, to transform the vectors into a matrix of vectors, each of the vectors in the matrix including a plurality of row values, constructing a single vector from the vectors in the matrix, and predicting a label for the selected word using the single vector.08-20-2009
20090254337COMPUTER-IMPLEMENTED METHOD AND SYSTEM FOR CONDUCTING A SEARCH OF ELECTRONICALLY STORED INFORMATION - A computer-implemented method, system, and computer program product are provided for conducting a search of electronically stored information. The method includes: (a) providing a user with an interactive targeting rule editor to enable the user to formulate a targeting rule to identify desired search results, the targeting rule comprising a natural language text string, the interactive targeting rule editor allowing the user to change one or more designated editable portions of the natural language text string to one of a set of specified alternate portions, delete one or more designated removable portions of the natural language text string, or add one or more of a set of specified insertable portions to form a syntactically valid targeting rule in accordance with a targeting rule grammar; (b) receiving the text string or a representation thereof from the user; (c) translating the text string or a representation thereof into an executable query; and (d) executing the executable query against the electronically stored information to generate search results.10-08-2009
20090248399System and method for analyzing text using emotional intelligence factors - A system, method and computer program products for facilitating the automated reading, disambiguation, analysis, indexing, retrieval and scoring of text by utilizing emotional intelligence-based factors. Text quality is scored based upon character development, rhythm, per-page quality, gaps, and climaxes, among other factors. The scores may be standardized by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation.10-01-2009
20090248400Rule Based Apparatus for Modifying Word Annotations - A rule based apparatus and method for modifying word annotations in an annotated text base is described. The apparatus includes an index creator component for creating an index of word annotations, an annotations modifying component for modifying word annotations, and a retriggering component, responsive to said annotations modifying component, for retriggering a rules engine to modify all occurrences of a matching word annotation in said annotated text base and for updating the index of word annotations with the modified occurrences of a matching word annotation in said annotated text base.10-01-2009
20120197630METHODS AND SYSTEMS TO SUMMARIZE A SOURCE TEXT AS A FUNCTION OF CONTEXTUAL INFORMATION - Methods and systems to summarize a source text as a function of contextual information, including to fit a summary within a context-based allotted time. The context-based allotted time may be apportioned amongst multiple portions of the source text, such as by relevance. The context-based allotted time and/or relevance may be user-specified and/or determined, such as by look-up, rule, computation, inference, and/or machine learning. During summary presentation, one or more portions of the source text may be re-summarized, such as to adjust a level of detail. A presentation rate may be user-controllable. Where new and/or changed contextual information affects an available time to review a remaining portion of the summary, the summary presentation may be automatically adjusted, and/or one or more portions of the source text may be re-summarized based on a revised context-based allotted time.08-02-2012
20090259459CONCEPTUAL WORLD REPRESENTATION NATURAL LANGUAGE UNDERSTANDING SYSTEM AND METHOD - A Natural Language Understanding system is provided for indexing of free text documents. The system according to the invention utilizes typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language independent domain knowledge concepts and language dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method.10-15-2009
20090198488System and method for analyzing communications using multi-placement hierarchical structures - A system and method are provided for analyzing communications to disambiguate the meaning of the analyzed communications using placements into a fixed hierarchical structure based on the words, position and grammar of the communications. The disambiguated meaning of the communications can be used in conjunction with other functional programs (e.g., search engines, email, word processing). The system and method may further analyze communications associated with a communicator to determine a profile (attributes, preferences, relationships, trends, ratios) for the communicator indicated by the communications. Automated communications can be generated to match the attributes of any communicator's preferences stored in their profile, to match the attributes of other communications, or to match certain standards.08-06-2009
20090265162Method for Retrieving Items Represented by Particles from an Information Database - A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles.10-22-2009
20100179805METHOD, APPARATUS, AND COMPUTER PROGRAM PRODUCT FOR ONE-STEP CORRECTION OF VOICE INTERACTION - A one-step correction mechanism for voice interaction is provided. Correction of a previous state is enabled simultaneously with recognition in a current or subsequent state. An application is decomposed into a set of tasks. Each task is associated with the collection of one piece of information. Each task may be in a different state. At any point during the interaction, while a task/state pair is active, the dialog manager may enable multiple other task/state pairs to be active in latent fashion. The application developer may then apply those facilities or resources to the active task/state pair and the latent task/state pairs, depending on the contextual condition of the interaction state of the application.07-15-2010
20100161316PROBABILISTIC NATURAL LANGUAGE PROCESSING USING A LIKELIHOOD VECTOR - A method for natural language processing on a computing device is described. The computing device receives a free text document. The computing device parses the free text document for gross structure. The gross structure includes sections, paragraphs and sentences. The computing device determines an application of at least one knowledge base. The free text document is parsed for fine structure on the computing device. The fine structure includes sub-sentences. The computing device applies the parsed document and at least one likelihood vector to a Bayesian network. The computing device outputs meanings and probabilities.06-24-2010
20100169077METHOD, SYSTEM AND COMPUTER READABLE RECORDING MEDIUM FOR CORRECTING OCR RESULT - Disclosed is a method, system and computer readable recording medium for correcting an OCR result. According to an exemplary embodiment of the present invention, there is provided a method for correcting an OCR result, the method including performing character recognition on content including character information using an OCR technique, removing extra carriage return information from the content, outputting the character recognition result, and correcting word spacing on the outputted result.07-01-2010
20100153093METHOD AND APPARATUS FOR PROVIDING CASE RESTORATION - A method and apparatus for providing case restoration in a communication network are disclosed. For example, the method obtains one or more content sources from one or more information feeds, and extracts textual information from the one or more content sources obtained from the one or more information feeds. The method then creates or updates a capitalization model based on the textual information.06-17-2010
20100179804Natural Language Assertion Processor - A method of processing natural language assertions (NLAs) can include identifying an NLA and then translating that NLA into a verification language assertion (VLA) using a natural language parser (NLP) and synthesis techniques. This VLA can be translated into an interpreted NLA (NLA*) using a VLA parser and pattern matching techniques. At this point, the process can allow user review of the NLA* and the NLA. When the user determines that the NLA* and the NLA are the same or differ only insignificantly, verification can be performed using the VLA. The results of the verification can then be back annotated on the NLA. In one fully-automatic embodiment, in addition to comparing the NLA and the NLA*, the VLA and a VLA* (generated from the NLA*) can be compared, thereby providing yet another test of accuracy for the user during verification.07-15-2010
20100161317SEMANTIC NETWORK METHODS TO DISAMBIGUATE NATURAL LANGUAGE MEANING - A computer implemented data processor system automatically disambiguates a contextual meaning of natural language symbols to enable precise meanings to be stored for later retrieval from a natural language database, so that natural language database design is automatic, to enable flexible and efficient natural language interfaces to computers, household appliances and hand-held devices.06-24-2010
20100161314Region-Matching Transducers for Text-Characterization - Computer methods, apparatus and articles of manufacture therefor, are disclosed for text-characterization using a finite state transducer that along each path accepts on a first side an n-gram of text-characterization (e.g., a language or a topic) and outputs on a second side a sequence of symbols identifying one or more text-characterizations from a set of text-characterizations. The finite state transducer is applied to input data. For each n-gram accepted by the finite state transducer, a frequency counter associated with the n-gram of the one or more text-characterizations in the set of text-characterizations is incremented. The input data is classified as one or more text-characterizations from the set of text-characterizations using the frequency counters associated therewith.06-24-2010
20120197632SYSTEM AND METHOD FOR THE TRANSFORMATION AND CANONICALIZATION OF SEMANTICALLY STRUCTURED DATA - A method of transforming and canonicalizing semantically structured data includes obtaining data from a network of computers, applying text patterns to the obtained data and placing the data in a first data file, providing a second data file containing the obtained data in a uniform format, and generating interface specific sentences from the data in the second data file.08-02-2012
20100049500DIALOGUE GENERATION APPARATUS AND DIALOGUE GENERATION METHOD - A dialogue generation apparatus includes a transmission/reception unit configured to receive incoming text and transmit return text, a presentation unit configured to present the contents of the incoming text to a user, a morphological analysis unit configured to perform a morphological analysis of the incoming text to obtain first words included in the incoming text and linguistic information on the first words, a selection unit configured to select second words that characterize the contents of the incoming text from the first words based on the linguistic information, a speech recognition unit configured to perform speech recognition of the user's speech after the presentation of the incoming text in such a manner that the second words are recognized preferentially, and produce a speech recognition result representing the contents of the user's speech, and a generation unit configured to generate the return text based on the speech recognition result.02-25-2010
20100049499DOCUMENT ANALYZING APPARATUS AND METHOD THEREOF - In a document analyzing apparatus (02-25-2010
20100049498DETERMINING UTILITY OF A QUESTION - A question search system provides a collection of questions having words for use in evaluating the utility of the questions based on a language model. The question search system calculates n-gram probabilities for words within the questions of the collection. The n-gram probability of a word for a sequence of n−1 words indicates the probability of that word being next after that sequence in the collection of questions. The n-gram probabilities for the words of the collection represent the language model of the collection. The question search system calculates a language model utility score for each question within a collection that indicates the likelihood that a question is repeatedly asked by users. The question search system derives the language model utility score for a question from the n-gram probabilities of the words within that question.02-25-2010
20100185437PROCESS OF DIALOGUE AND DISCUSSION - A method for effecting a dialogue with an emulated brain. The method includes the step of receiving a query in the form of a semantic string. The semantic string is then parsed into basic concepts of the query. The basic concepts are then clumped into a clump concept. If the clump concept constitutes part of a dialogue, then the dialogue thread is activated by determining the context of the clump concept and assessing a potential reply from a group of weighted replies, which expected replies are weighted based on the parsed concepts produced in the step of parsing. The heaviest weighted one of the expected replies is selected and the weight of the selected reply after it is selected is downgraded. The selected reply is then generated for output in a sentence structure.07-22-2010
20100185436Arabic poetry meter identification system and method - The Arabic poetry meter identification system and method produces coded Al-Khalyli transcriptions of Arabic poetry. The meters (Wazn, Awzan being forms of the Arabic poems units Bayt, Abyate) are identified. A spoken or written poem is accepted as input. A coded transcription of the poetry pattern forms is produced from input processing. The system identifies and distinguishes between proper spoken poetic meter and improper poetic meter. Errors in the poem meters (Bahr, Buhur) and the ending rhyme pattern, “Qafiya”, are detected and verified. The system accepts user selection of a desired poem meter and then interactively aids the user in the composition of poetry in the selected meter, suggesting alternative words and word groups that follow the desired poem pattern and dactyl components. The system can be in a stand-alone device or integrated with other computing devices.07-22-2010
20080215312Handheld Electronic Device With Text Disambiguation - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device.09-04-2008
20080215311DISCRIMINATIVE TRAINING OF LANGUAGE MODELS FOR TEXT AND SPEECH CLASSIFICATION - Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.09-04-2008
20080215310Method and system for mapping a natural language text into animation - A method for analyzing a natural language sentence describing an action, to create an action structure to be used in creating an animation of the action, the method comprising: processing the natural language sentence to create a grammatical tree comprising an action word and its associated values; providing constructs for the action word, each of the constructs having parameter types for defining the action expressed by the action word; identifying from the constructs at least one construct wherein at least one of the parameter types can take on at least one of the associated values thereby defining a matching value; and recording the at least one of the parameter types from the at least one construct as well as the matching value, thereby creating the action structure.09-04-2008
20100191521METHODS AND APPARATUS FOR EVALUATING SEMANTIC PROXIMITY - Methods and apparatus to evaluate the semantic proximity between a reference free-form text entry and a candidate free-form text request.07-29-2010
20100191519TOOL AND FRAMEWORK FOR CREATING CONSISTENT NORMALIZATION MAPS AND GRAMMARS - A runtime framework and authoring tool are provided for enabling linguistic experts to author text normalization maps and grammar libraries without requiring a high level of technical or programming skills. Authors define or select terminals, map the terminals, and define rules for the mapping. The tool enables authors to validate their work by executing the map in the same way the recognition engine does, which keeps results consistent from authoring to user operations. The runtime is used by the speech engines and by the tools to provide consistent normalization for supported scenarios.07-29-2010
20120197631System for Identifying Textual Relationships - A computer-implemented method identifies textual statement relationships. Textual statement pairs including a first and second textual statement are identified, and parsed word group pairs are extracted from first and second textual statements. The parsed word groups are compared, and a parsed word score for each statement pair is calculated. Word vectors for the first and second textual statements are created and compared. A word vector score is calculated based on the comparison of the word vectors for the first and second textual statements. A match score is determined for the textual statement pair, with the match score being representative of at least one of the parsed word score and the word vector score.08-02-2012
20100153096Handheld Electronic Device and Method for Disambiguation of Compound Text Input and That Employs N-Gram Data to Limit Generation of Low-Probability Compound Language Solutions - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to analyze the combinations of language objects in light of N-gram data stored on the device to avoid proposing low-probability compound language solutions.06-17-2010
20100228539METHOD AND APPARATUS FOR PSYCHOMOTOR AND PSYCHOLINGUISTIC PREDICTION ON TOUCH BASED DEVICE09-09-2010
20100161315CORRELATED CALL ANALYSIS - A method of correlating received communication data with operational communication characteristics is provided. The method includes receiving audible input from a source in a communication over a communications network, recording the received audible input, and transcribing the recorded audible input into a transcript. The method further includes outputting the transcript, specifying features of the transcript to be analyzed, specifying and recording operational communication characteristics particular to the communication, analyzing the transcript for the specified features to identify patterns associated with the audible input, computing statistical correlations of the identified patterns with the operational communication characteristics, and outputting results of the computed statistical correlations on a user interface.06-24-2010
20110112825SENTIMENT PREDICTION FROM TEXTUAL DATA - A semantically organized domain space is created from a training corpus. Affective data are mapped onto the domain space to generate affective anchors for the domain space. A sentiment associated with an input text is determined based on the affective anchors. A speech output may be generated from the input text based on the determined sentiment.05-12-2011
20100217583SYSTEM AND METHOD FOR PROCESSING ONLINE READING INTERACTIONS - An online reading processing system and method for providing interactive messages to users at end computer devices are provided, which include storing online reading information in a data storage medium; setting the online reading information with a head mark and a tail mark of at least one expert-marked key range by a setting module; reading the information after being set and hiding the head mark and the tail mark thereof; receiving the key range marked by the user; determining whether the user-marked key range covers the head mark and the tail mark so as to form interactive messages according to the determination, thereby solving the drawback of failing to provide appropriate feedback or assessment according to users' behaviors as encountered in the prior techniques, and also increasing online reading interaction and enjoyment.08-26-2010
20100228540Methods and Systems for Query-Based Searching Using Spoken Input - Systems and methods for query-based searching using spoken input are disclosed. In systems and methods according to embodiments of the invention, continuous speech natural language queries are accepted from a user using a client device. Speech processing tasks are divided between the client device and one or more server systems. Once user speech is recognized, the system searches one or more data repositories containing queries for at least one query that matches the recognized speech and returns information related to the query.09-09-2010
20100228538COMPUTATIONAL LINGUISTIC SYSTEMS AND METHODS - An apparatus and corresponding method are disclosed for selecting and managing morphological, syntactic and semantic information found in natural languages using a reduced instruction set grammar (RISG). The apparatus and corresponding method 1) convert natural language inputs into morphological tokens and store those tokens, 2) convert morphological tokens into syntactic groups and store those groups, and/or 3) convert syntactic groups into semantic blocks and store those blocks, and vice versa. The process can start with text and find the corresponding morphological tokens, syntactic groups and/or semantic blocks or start with semantic block(s) and find the corresponding morphological tokens.09-09-2010
20090076798Apparatus and method for post-processing dialogue error in speech dialogue system using multilevel verification - Provided are an apparatus and method for post-processing a dialogue error in a speech dialogue system using multilevel verification, in which both of a user's current utterance and a whole dialogue flow are taken into account through the multilevel verification including speech recognition results analysis, linguistic analysis, discourse analysis and dialogue analysis. As a result, various errors that may occur in the speech dialogue system are detected, and error post-processing appropriate to a detected error type is performed, so that speech recognition errors may be reduced.03-19-2009
20090076797System and Method For Accessing Images With A Novel User Interface And Natural Language Processing - Systems and methods for accessing images with natural language processing are provided. The methods for accessing images include linking an image with image-summarizing text by applying a hierarchical clustering algorithm to cluster one or more abstract sentences and one or more images, and linking an image with image-summarizing text if the abstract sentence belongs to a cluster that includes the image. The systems for accessing images include a natural language processor that applies a hierarchical clustering algorithm to link one or more abstract sentences in an article with one or more images in the article, and a user interface in which selecting image-summarizing text displays one or more linked images.03-19-2009
20090076794ADDING PROTOTYPE INFORMATION INTO PROBABILISTIC MODELS - Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.03-19-2009
20100235164QUESTION-ANSWERING SYSTEM AND METHOD BASED ON SEMANTIC LABELING OF TEXT DOCUMENTS AND USER QUESTIONS - A question-answering system for searching exact answers in text documents provided in the electronic or digital form to questions formulated by user in the natural language is based on automatic semantic labeling of text documents and user questions. The system performs semantic labeling with the help of markers in terms of basic knowledge types, their components and attributes, in terms of question types from the predefined classifier for target words, and in terms of components of possible answers. A matching procedure makes use of mentioned types of semantic labels to determine exact answers to questions and present them to the user in the form of fragments of sentences or a newly synthesized phrase in the natural language. Users can independently add new types of questions to the system classifier and develop required linguistic patterns for the system linguistic knowledge base.09-16-2010
20100235165SYSTEM AND METHOD FOR AUTOMATIC SEMANTIC LABELING OF NATURAL LANGUAGE TEXTS - Systems and methods for automatic semantic labeling of natural language documents provided in electronic or digital form include a semantic processor that performs a basic linguistic analysis of text, including recognizing in the text semantic relationships of the type objects and/or classes of objects, facts and cause-effect relationships; matching linguistically analyzed text against target semantic relationship patterns, created by generalization of particular cases of target semantic relationships; and generating semantic relationship labels based on linguistically analyzed text and a result of the matching.09-16-2010
20100223051Method and System for Determining Text Coherence - A method and system for determining text coherence in an essay is disclosed. A method of evaluating the coherence of an essay includes receiving an essay having one or more discourse elements and text segments. The one or more discourse elements are annotated either manually or automatically. A text segment vector is generated for each text segment in a discourse element using sparse random indexing vectors. The method or system then identifies one or more essay dimensions and measures the semantic similarity of each text segment based on the essay dimensions. Finally, a coherence level is assigned to the essay based on the measured semantic similarities.09-02-2010
20110112826SYSTEM AND METHOD FOR SIMULATING EXPRESSION OF MESSAGE - A system and a method for simulating expression of a message are provided. The system comprises a network platform end and at least one user end. The network platform end comprises a message capturing module for capturing a user message; a feature analyzing module for performing a characteristic analysis on content of the user message, so as to mark at least one simulation action tag on the message content; and a simulation message generating module for acquiring simulation instructions corresponding to the at least one simulation action tag and combining the same with the message content to generate a simulation message. The user end comprises a user device for receiving the simulation message and outputting the message content and simulation instructions contained in the simulation message to a simulation device; and the simulation device for playing the received message content and executing corresponding simulation instructions.05-12-2011
20100241420AUTOMATED SENTENCE PLANNING IN A TASK CLASSIFICATION SYSTEM - The invention relates to a system that interacts with a user in an automated dialog system (09-23-2010
20100241419Method for identifying the integrity of information - A preferred method for identifying at least one of a grammatical, linguistic and/or conceptual integrity of a data corpus is disclosed. In a preferred method, the associations between several word elements of a data corpus are identified. Then, the word elements experiencing several associations are used for identifying the continuum between associations and the number of word elements involved and/or not involved in the associations which is then used for identifying at least one of a: linguistic, semantic, grammatical, conceptual or other integrity or coherence of the analyzed data corpus, such as a query for optionally displaying a data corpus understanding and/or selecting a particular search behavior or other.09-23-2010
20100211380INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD, AND PROGRAM - An information processing apparatus includes: an acquiring unit acquiring text data as data associated with plural contents; a separating unit separating the text data acquired by the acquiring means into words of a predetermined unit in accordance with attributes; a comparing unit calculating a correspondence length indicating the number of words which continuously correspond to each other in order of the attributes between the text data, by comparing the words, which are separated by the separating means, between the text data of the plural contents; a calculating unit calculating a similarity degree score indicating a similarity degree between the contents corresponding to the text data on the basis of the correspondence length obtained by the comparing means; and a display controlling unit controlling displaying outlines of the plural contents on the basis of the similarity degree score between a predetermined content and another content among the plural contents.08-19-2010
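The "correspondence length" in the entry above, the number of words that match consecutively between two texts, can be sketched as the longest common contiguous run of words. The attribute-based word separation is replaced here by a plain word list, and the normalisation in `similarity_score` is a hypothetical choice, since the abstract only says the score is based on the correspondence length.

```python
def correspondence_length(words_a, words_b):
    # Longest run of consecutive words shared by both sequences,
    # computed row by row with dynamic programming.
    best = 0
    prev = [0] * (len(words_b) + 1)
    for wa in words_a:
        curr = [0] * (len(words_b) + 1)
        for j, wb in enumerate(words_b, start=1):
            if wa == wb:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity_score(words_a, words_b):
    # Hypothetical scoring: correspondence length relative to the
    # shorter text, giving a value between 0 and 1.
    shorter = min(len(words_a), len(words_b))
    if shorter == 0:
        return 0.0
    return correspondence_length(words_a, words_b) / shorter
```

For example, "the evening news at nine" and "watch the evening news tonight" share the three-word run "the evening news", so the correspondence length is 3.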
20100211378MODULAR APPROACH TO BUILDING LARGE LANGUAGE MODELS - Methods for building arbitrarily large language models are presented herein. The methods provide a scalable solution to estimating a language model using a large data set by breaking the language model estimation process into sub-processes and parallelizing computation of various portions of the process.08-19-2010
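One way to picture the decomposition described in the entry above is shard-and-merge n-gram counting: each shard of the data set is counted independently and the partial counts are then summed. This is only an illustrative sketch; the abstract does not specify which sub-processes are used, and the function names are hypothetical. The shards are processed in a plain loop here, although each call to `ngram_counts` is independent and could be handed to a separate worker.

```python
from collections import Counter


def ngram_counts(tokens, n=2):
    # Count the n-grams in one shard of the corpus.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def merged_counts(shards, n=2):
    # Sum the per-shard counts; the per-shard step has no shared state.
    total = Counter()
    for shard in shards:
        total.update(ngram_counts(shard, n))
    return total
```

Because the merge step is a simple sum, the result does not depend on the order in which shards finish.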
20120191446System and method for creating a parser generator and associated computer program - A system is provided for building a parser generator. The system includes a grammar input module for inputting in the parser generator a grammar expressed in a given formalism. A checking module formally verifies that a given grammar belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible. A checking module formally verifies that a grammar expressed in the formalism is well-formed. A semantic action module defines a parsing result depending on semantic actions embedded in the grammar. The semantic action module ensures in a formal way that all semantic actions of the grammar are terminating semantic actions. A formal module generates a parser with total correctness guarantees, using the modules to verify that the grammar is well-formed, belongs to a certain class of feasible, terminating grammars and all its semantic actions are terminating.07-26-2012
20120143597System and Methods for Evaluating Feature Opinions for Products, Services, and Entities - A system for evaluating a review having unstructured text comprises a segment splitter for separating at least a portion of the unstructured text into one or more segments, each segment comprising one or more words; a segment parser coupled to the segment splitter for assigning one or more lexical categories to one or more of the one or more words of each segment; an information extractor coupled to the segment parser for identifying a feature word and an opinion word contained in the one or more segments; and a sentiment rating engine coupled to the information extractor for calculating an opinion score based upon an opinion grouping, the opinion grouping including at least the feature word and the opinion word identified by the information extractor.06-07-2012
20120245923CORPUS-BASED SYSTEM AND METHOD FOR ACQUIRING POLAR ADJECTIVES - A system, method, and computer program product for generating a polar vocabulary are provided. The method includes extracting textual content from each review in a corpus of reviews. Each of the reviews includes an author's rating, e.g., of a specific product or service to which the textual content relates. A set of frequent nouns is identified from the textual content of the reviews. Adjectival terms are extracted from the textual content of the reviews. Each adjectival term is associated in the textual content with one of the frequent nouns. A polar vocabulary including at least some of the extracted adjectival terms is generated. A polarity measure is associated with each adjectival term in the vocabulary which is based on the ratings of those reviews from which the adjectival term was extracted.09-27-2012
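The final step in the entry above, attaching a polarity measure to each adjectival term based on the ratings of the reviews it came from, can be sketched as follows. The mean rating is an assumed choice of measure, the input format is hypothetical, and the extraction of adjectival terms linked to frequent nouns is taken as already done.

```python
from collections import defaultdict


def polarity_measures(reviews):
    # reviews: iterable of (author_rating, adjectival_terms) pairs.
    # Assumption: the polarity of a term is the mean rating of the
    # reviews in which it occurs; a term counts once per review.
    ratings_by_term = defaultdict(list)
    for rating, terms in reviews:
        for term in set(terms):
            ratings_by_term[term].append(rating)
    return {term: sum(r) / len(r) for term, r in ratings_by_term.items()}
```

With ratings on a 1 to 5 scale, a term seen in a 5-star and a 4-star review gets a measure of 4.5, while a term seen only in a 1-star review gets 1.0.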
20090326922CLIENT SIDE RECONCILIATION OF TYPOGRAPHICAL ERRORS IN MESSAGES FROM INPUT-LIMITED DEVICES - A method for reconciling typographical errors, includes: receiving an electronic text message from a pervasive device with limited input keypads on a receiving device configured with a messaging application; determining an input protocol of the pervasive device; examining the electronic text message for words that are not in the messaging application's dictionary; identifying words that are not in the messaging application's dictionary; mapping each of the identified words to a set of keystrokes used to produce each of the identified words based on a series of input protocols that the receiving device has stored in a memory; utilizing each set of keystrokes from each of the input protocols in an algorithm to compute each permutation of the keystrokes; checking the computed permutations against the messaging application's dictionary to determine viable matches of the computed permutations; and presenting the viable matches to a user of the receiving device.12-31-2009
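The mapping and matching steps in the entry above can be sketched for a single input protocol. The sketch assumes a one-press-per-letter predictive keypad with the standard telephone letter layout, and it reads "each permutation of the keystrokes" as every letter string the same key sequence could produce; both are assumptions, and the function name is hypothetical.

```python
from itertools import product

# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def viable_matches(word, dictionary):
    # Map the unknown word back to the keys that produced it
    # (raises KeyError for characters that are not on the keypad).
    keys = [LETTER_TO_KEY[ch] for ch in word.lower()]
    # Enumerate every letter string those keys could have produced
    # and keep the ones found in the dictionary.
    candidates = ("".join(p) for p in product(*(KEYPAD[k] for k in keys)))
    return sorted(c for c in candidates if c in dictionary)
```

For the out-of-dictionary word "gnod", typed on keys 4-6-6-3, the dictionary words "gone", "good", "home" and "hood" all come from the same key sequence and would be presented as viable matches. The candidate set grows exponentially with word length, so a practical version would more likely index the dictionary by key sequence.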
20090276208REDUCING SPAM EMAIL THROUGH IDENTIFICATION OF SOURCE - Embodiments of the present invention address deficiencies of the art in respect to email and provide a novel and non-obvious method and computer program product for detecting undesirable email. In one embodiment of the invention, the method includes receiving an email including text and identifying at least one natural language grammar mistake in the text. The method further includes calculating a country of origin of an author of the text based on the at least one natural language grammar mistake and calculating a first value based on the country of origin of the author of the text. The method further includes correcting the at least one natural language grammar mistake in the text and determining whether the email is undesirable based on the text that was corrected and the first value.11-05-2009
20090326924Projecting Semantic Information from a Language Independent Syntactic Model - Embodiments for the conversion of Computational Independent Model (CIM) rule expressions into semantically non-ambiguous syntax trees are disclosed. In accordance with one embodiment, a method includes analyzing a sentential structure of a Computational Independent Model (CIM) rule expression for clauses. The clauses include at least one expression and at least one rule. The method further includes constructing a semantically non-ambiguous LF syntax tree from the CIM rule expression. The construction is implemented using a logical form (LF) model.12-31-2009
20080312909SYSTEM FOR ADAPTIVE MULTI-CULTURAL SEARCHING AND MATCHING OF PERSONAL NAMES - An automated name searching system incorporates an automatic name classifier and a multi-path architecture in which different algorithms are applied based on cultural identity of the query name. The name classifier operates with a preemptive list, analysis of morphological elements, length, and linguistic rules. A name regularizer produces a character based computational representation of the name. A pronunciation equivalent representation such as an IPA language representation, and language specific rules to generate name searching keys, are used in a first pass to eliminate database entries which are obviously not matches for the query name. The methods can also be implemented as a callable set of library routines including an intelligent preprocessor and a name evaluator that produces a score comparing a query name and database name, based on a variety of user-adjustable parameters. The user-controlled parameters permit tuning of the search methodologies for specific custom applications.12-18-2008
20120143596Voice Communication Management - A method, a computer program product, and an apparatus for managing a voice communication are provided. In one illustrative embodiment, an audio phrase produced by a first user is identified in the voice communication between the first user and a second user. A determination is made whether the audio phrase is present in a policy which prohibits the transmission of the set of undesired audio phrases. Responsive to a determination that the audio phrase is present in the policy which prohibits the transmission of the set of undesired audio phrases, a communication of the audio phrase is modified.06-07-2012
20090018818OPERATING DEVICE FOR NATURAL LANGUAGE INPUT - An operating device for natural language input is disclosed. A user expresses a request to the operating device by inputting natural language, and a processor determines the format of the input. If the input is in voice format, a voice identification cell transforms it into word data and transmits it to a natural language analysis unit; if the input is in word or character identification format, the natural language analysis unit directly analyzes the sentence type and issues an instruction accordingly. An executive interface then finds a matching equipment end and transmits the instruction to it for operation in real time. The design can thus respond to the equipment end as required by the user, so as to achieve the best man-machine communication channel.01-15-2009
20090112577SYSTEM AND METHOD FOR LOCALIZATION OF ASSETS USING DICTIONARY FILE BUILD - A system and method for organizing localization content for video game development is disclosed. The method includes generating executable instructions for a video game being developed, wherein the video game is being developed for deployment in a plurality of natural languages, wherein text strings and/or multimedia data to be rendered during game play are referenced by the executable instructions. The method further includes identifiably storing the text strings and/or multimedia data in one or more dictionary files such that the text strings and/or multimedia data are identifiably referenced by references in the executable instructions and are identifiably referenced by a corresponding natural language.04-30-2009
20080270118Recognition architecture for generating Asian characters - Architecture for correcting incorrect recognition results in an Asian language speech recognition system. A spelling mode can be launched in response to receiving speech input, the spelling mode for correcting incorrect spelling of the recognition results or generating new words. Correction can be obtained using speech and/or manual selection and entry. The architecture facilitates correction in a single pass, rather than multiple times as in conventional systems. Words corrected using the spelling mode are corrected as a unit and treated as a word. The spelling mode applies to languages of at least the Asian continent, such as Simplified Chinese, Traditional Chinese, and/or other Asian languages such as Japanese.10-30-2008
20080270121SYSTEM AND METHOD FOR COMPUTER ANALYSIS OF COMPUTER GENERATED COMMUNICATIONS TO PRODUCE INDICATIONS AND WARNING OF DANGEROUS BEHAVIOR - The present invention is a system and method for computer analysis of computer generated communications to produce indications and warnings of dangerous behavior. A method of computer analysis of computer generated communications in accordance with the invention, includes collecting at least one computer generated communication produced by or received by an author; parsing the collected at least one computer generated communication to identify categories of information therein; processing the categories of information with at least one analysis to quantify at least one type of information in each category; and generating an output communication when a difference between the quantification of at least one type of information for at least one category and a reference for the at least one category is detected involving a psychological state of the author to which a responsive action should be taken with content of the output communication and the at least one category being programmable to define a psychological state in response to which an action should be taken and what the action is to be taken in response to the defined psychological state.10-30-2008
20110040555System and method for creating and playing timed, artistic multimedia representations of typed, spoken, or loaded narratives, theatrical scripts, dialogues, lyrics, or other linguistic texts - A system and method generate artistic multimedia representations of user-input texts, spoken or loaded narratives, theatrical scripts, or other linguistic corpus types, via a user interface, or batch interface, by classifying component words, and/or phrases into lexemes and/or parts of speech, and interpreting said classifications to construct playable structures. A database of natural language grammatical rules, a set of media objects, parameters, and rendering directives, and an algorithm facilitate the generation of sequential scenes from grammatical representations, convert user-input texts into playable structures of graphics, sounds, animations, and modifications, where playable structures may be combined to create a scene, or multiple scenes, and may be played in the order of occurrence in the input text as a sequential and timed multimedia representation of the input, and subsequently output, in real-time, or stored in memory for later output, via output devices such as a monitor and/or speakers.02-17-2011
20110040553NATURAL LANGUAGE PROCESSING - A method and system for computational interpretation of natural language, wherein an input string is received from input means. The input string is first tokenized to provide a list of words. The list of words is then stemmed to provide the words in root form. The stemmed list is then tagged to provide classification tags for each word, which allows context-sensitive information to be generated for each word. Lastly, said tags are used for parsing the structural dependencies of each word.02-17-2011
20100299142SYSTEM AND METHOD FOR SELECTING AND PRESENTING ADVERTISEMENTS BASED ON NATURAL LANGUAGE PROCESSING OF VOICE-BASED INPUT - A system and method for selecting and presenting advertisements based on natural language processing of voice-based inputs is provided. A user utterance may be received at an input device, and a conversational, natural language processor may identify a request from the utterance. At least one advertisement may be selected and presented to the user based on the identified request. The advertisement may be presented as a natural language response, thereby creating a conversational feel to the presentation of advertisements. The request and the user's subsequent interaction with the advertisement may be tracked to build user statistical profiles, thus enhancing subsequent selection and presentation of advertisements.11-25-2010
20100299140IDENTIFYING AND ROUTING OF DOCUMENTS OF POTENTIAL INTEREST TO SUBSCRIBERS USING INTEREST DETERMINATION RULES - A method, system and computer program product for identifying documents of interest. A profile of a subscriber is created based on information obtained about the subscriber. Subscriber-interest determination rules are used to identify potential topics of interest of the subscriber based on the subscriber's profile as well as based on external knowledge sources. Each potential interest of the subscriber may be represented by a pointer that references a concept. Additionally, concepts in the documents published by the publishers are identified. A comparison may be made between the concepts identified in the documents published by the publishers with those concepts representing the potential topics of interests of the subscriber. Those documents with matching concepts may then be identified as potentially being of interest for the subscriber. In this manner, documents of interest are more accurately identified for the document seeker.11-25-2010
20100299135Automated Extraction of Semantic Content and Generation of a Structured Document from Speech - Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document.11-25-2010
20100299137STORAGE MEDIUM STORING PRONUNCIATION EVALUATING PROGRAM, PRONUNCIATION EVALUATING APPARATUS AND PRONUNCIATION EVALUATING METHOD - A game apparatus includes a CPU, and the CPU evaluates a pronunciation of a user with respect to an original sentence (ES). First, envelopes as to a volume of a voice of the original sentence (ES) and a volume of a voice of the user are taken, and the average values of the volumes are made uniform. When the volumes are made uniform to each other, a degree of similarity (scoreA) of distributions of local solutions when the volumes are equal to or more than the average values, a degree of similarity (scoreB) of distributions (timing of concaves/convexes of the waveform) of values of the high or low level indicating whether or not the volume is equal to or more than a value multiplying the average value by a predetermined value, and a degree of similarity (scoreC) of dispersion values (dispersion of concaves/convexes of the waveform) of the envelopes are evaluated by utilizing the respective envelopes. On the basis of these degrees of similarity (scoreA, scoreB, scoreC), the rhythm of the pronunciation by the user is evaluated.11-25-2010
20100299136Dialogue System and a Method for Executing a Fully Mixed Initiative Dialogue (FMID) Interaction Between a Human and a Machine - A method for executing a fully mixed initiative dialogue (FMID) interaction between a human and a machine, a dialogue system for a FMID interaction between a human and a machine and a computer readable data storage medium having stored thereon computer code for instructing a computer processor to execute a method for executing a FMID interaction between a human and a machine are provided. The method includes retrieving a predefined grammar setting out parameters for the interaction; receiving a voice input; analysing the grammar to dynamically derive one or more semantic combinations based on the parameters; obtaining semantic content by performing voice recognition on the voice input; and assigning the semantic content as fulfilling the one or more semantic combinations.11-25-2010
20100299138APPARATUS AND METHOD FOR LANGUAGE EXPRESSION USING CONTEXT AND INTENT AWARENESS - A language expression apparatus and method based on context and intent awareness are provided. The apparatus and method may recognize a context and an intent of a user and may generate a language expression based on the recognized context and the recognized intent, thereby providing an interpretation/translation service and/or providing an education service for learning a language.11-25-2010
20090018821LANGUAGE PROCESSING DEVICE, LANGUAGE PROCESSING METHOD, AND LANGUAGE PROCESSING PROGRAM - A language processing device includes first analysis unit 01-15-2009
20090106019METHOD AND SYSTEM FOR PRIORITIZING COMMUNICATIONS BASED ON SENTENCE CLASSIFICATIONS - A method and system for prioritizing communications based on classifications of sentences within the communications is provided. A sentence classification system may classify sentences of communications according to various classifications such as “sentence mode.” The sentence classification system trains a sentence classifier using training data and then classifies sentences using the trained sentence classifier. After the sentences of a communication are classified, a document ranking system may generate a rank for the communication based on the classifications of the sentences within the communication. The document ranking system trains a document rank classifier using training data and then calculates the rank of communications using the trained document rank classifier.04-23-2009
20100305942METHOD AND APPARATUS FOR GENERATING A LANGUAGE INDEPENDENT DOCUMENT ABSTRACT - A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. A sentence containing the sequence of words is retrieved from the document, if the sequence of words is a significant phrase. An abstract of the document is searched to determine if the sentence has been previously included in the abstract. If not, the sentence is added to the abstract.12-02-2010
20090070103Management and Processing of Information - Disclosed is a method to perform natural language (NL) processing. The method includes accessing a data source having one or more data portions, and applying multi-stage NL processing on the one or more data portions, using a dynamically generated set of concepts relating to one or more subject matters and relationships between at least some of the concepts, to determine the association of the one or more data portions with one or more of the concepts.03-12-2009
20090070100METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR SPOKEN LANGUAGE GRAMMAR EVALUATION - A method, system, and computer program product for spoken language grammar evaluation are provided. The method includes playing a recorded question to a candidate, recording a spoken answer from the candidate, and converting the spoken answer into text. The method further includes comparing the text to a grammar database, calculating a spoken language grammar evaluation score based on the comparison, and outputting the spoken language grammar evaluation score.03-12-2009
20090070102SPEECH RECOGNITION METHOD, SPEECH RECOGNITION SYSTEM AND SERVER THEREOF - A speech recognition method includes a model selection step which selects a recognition model and translation dictionary information based on characteristic information of input speech, a speech recognition step which translates input speech into text data based on the selected recognition model, and a translation step which translates the text data based on the selected translation dictionary information.03-12-2009
20110112824DETERMINING AT LEAST ONE CATEGORY PATH FOR IDENTIFYING INPUT TEXT - In a method of determining at least one category path for identifying an input text, one or more categories that are most relevant to the input text are determined, one or more concepts that are most relevant to the input text using information from a labeled text data source and the one or more categories determined to be the most relevant to the input text are determined, and one or more category paths through a hierarchy of predefined category levels are determined for one or more of the determined concepts.05-12-2011
20110112823Ellipsis and movable constituent handling via synthetic token insertion - Movable and elliptic constituents are handled in a parser by inserting synthetic tokens that do not occur in the input. Parser actions can push a syntax tree or semantic value to be realized later as a synthetic token, and some synthetic tokens (for cataphoric ellipsis) may be inserted without a prior push but require a later definition. At clause boundary it may be checked that all mandatory tokens have been inserted.05-12-2011
20130138425MULTIPLE RULE DEVELOPMENT SUPPORT FOR TEXT ANALYTICS - Methods, computer program products and systems are provided for applying text analytics rules to a corpus of documents. The embodiments facilitate selection of a document from the corpus within a graphical user interface (GUI), where the GUI opens the selected document to display text of the selected document and also a token parse tree that lists tokens associated with text components of the document, facilitate construction of a text analytics rule, via the GUI, by user selection of one or more tokens from the token parse tree, and, in response to a user selecting one or more tokens from the token parse tree, provide a list of hits via the GUI, the hits including a listing of text components from documents of the corpus that are associated with tokens that comply with the constructed text analytics rule.05-30-2013
20130138426AUTOMATED CONTENT GENERATION - Described are computer-based methods and apparatuses, including computer program products, for automated content generation. In some examples, the method includes generating content metadata from document content via natural language processing based on one or more context parameters associated with the document content. The method can further include receiving user feedback about the content metadata from a computing device associated with a user associated with the document content. The method can further include modifying the one or more context parameters based on the received user feedback.05-30-2013
20130138428SYSTEMS AND METHODS FOR AUTOMATICALLY DETECTING DECEPTION IN HUMAN COMMUNICATIONS EXPRESSED IN DIGITAL FORM - An apparatus and method for determining whether text is deceptive has a computer programmed with software that automatically analyzes text in digital form by at least one of statistical analysis of psycho-linguistic cues, IP geo-location, gender analysis, authorship analysis, and analysis to detect coded/camouflaged messages. The computer has truth data against which the text message can be compared and a graphical user interface. The computer may be connectable to the Internet and may obtain the text automatically. Speech-to-text software may be used to convert verbal messages to text for analysis. The system may be made available on a webpage, web service, on a computer or by a wireless device. The text may be emails, website content, tweets. In one embodiment, the system detects coded messages (FIG. 05-30-2013
20130138430METHODS AND APPARATUS TO CLASSIFY TEXT COMMUNICATIONS - Methods and apparatus to classify text communications are disclosed. An example method includes determining a first score indicating a likelihood that a text belongs to a first classification mode by combining a first sentence score and a second sentence score retrieved from an index, the first sentence score indicating a probability that a first sentence in the text belongs to the first classification mode, the second sentence score indicating that a second sentence following the first sentence belongs to the first classification mode, determining a second score indicating a likelihood that the text belongs to a second classification mode, comparing the first score to the second score, classifying the text as the first classification mode when the first score is greater than the second score, and determining a confidence level that the text belongs to the first classification mode by dividing the first score by the second score.05-30-2013
20100312548Querying Dialog Prompts - Implementations use hash values in proxy for images to enable aggregating of images for creating a knowledge base regarding certain images determined to be of interest.12-09-2010
20100324888SOLVING CONSTRAINT SATISFACTION PROBLEMS FOR USER INTERFACE AND SEARCH ENGINE - A method for interpreting a Natural Language by an artificial construct using constraint satisfaction problem solving, comprises a) providing a plurality of ways suitable to define at least a grammar for at least a Natural Language, b) providing a plurality of constraint satisfaction problem instructions, c) providing a plurality of values for solving a plurality of constraints, d) converting said plurality of constraints to at least one constraint satisfaction problem pattern, e) receiving a Natural Language construct, f) unifying said plurality of constraints through said at least one constraint satisfaction problem pattern at execution runtime by the artificial construct to solve the constraint satisfaction problem, g) interpreting said Natural Language construct according to a plurality of constraint satisfaction problem instructions, and h) answering to a Natural Language construct by a Natural Language construct.12-23-2010
20130144602Quantitative Type Data Analyzing Device and Method for Quantitatively Analyzing Data - A method for quantitatively analyzing data is applied to a computer system for determining whether a document under test is sensitive. The method obtains a sample message from the computer system and partitions content of the sample message to derive at least one original paragraph. The method then partitions the original paragraph to derive original sentences and to derive a plurality of original sentence characteristics from the original sentences. After that, the method produces a feature vector according to the derived sentence characteristics.06-06-2013
20130144605Text Mining Analysis and Output System - A natural language authoring system that organizes technical, financial, legal and market information into Point of View specific analytical, visual and narrative decision-support content. The expert system transforms a user's point of view into a tailored narrative and/or visualization report. Expert rules embed interactive advertising, such as affiliate URL links, into analytical, visual and narrative and statistical content. The rules may be modified by one or more users, thereby capturing knowledge as the rules are utilized by users of the system.06-06-2013
20130144606System and Method for Using Data and Derived Features to Automatically Generate a Narrative Story - A system and method for automatically generating a narrative story receives data and information pertaining to a domain event. The received data and information and/or one or more derived features are then used to identify a plurality of angles for the narrative story. The plurality of angles is then filtered, for example through use of parameters that specify a focus for the narrative story, length of the narrative story, etc. Points associated with the filtered plurality of angles are then assembled and the narrative story is rendered using the filtered plurality of angles and the assembled points.06-06-2013
20130144607CHARACTER-BASED AUTOMATED TEXT SUMMARIZATION - Methods, devices, systems and tools are presented that allow the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms. High-content media files are shortened in a manner that preserves important details, by splitting the files into segments, rating the segments, and reassembling preferred segments into a final abridged piece. Summarization of media can be customized by user selection of criteria, and opens new possibilities for delivering entertainment, news, and information in the form of dense, information-rich content that can be viewed by means of broadcast or cable distribution, “on-demand” distribution, internet and cell phone digital video streaming, or can be downloaded onto an iPod™ and other portable video playback devices.06-06-2013
20080270116Large-Scale Sentiment Analysis - A method for determining a sentiment associated with an entity includes inputting a plurality of texts associated with the entity, labeling seed words in the plurality of texts as positive or negative, determining a score estimate for the plurality of words based on the labeling, re-enumerating paths of the plurality of words and determining a number of sentiment alternations, determining a final score for the plurality of words using only paths whose number of alternations is within a threshold, converting the final scores to corresponding z-scores for each of the plurality of words, and outputting the sentiment associated with the entity.10-30-2008
20100332219METHOD AND APPARATUS FOR DETERMINING TEXT PASSAGE SIMILARITY - According to one embodiment of the invention, a method includes classifying a number of noun phrases in a first text passage and a second text passage into a number of classifications. The method also includes determining a similarity between a noun phrase from the first text passage and a noun phrase from the second text passage for each of the noun phrases of a same classification. Additionally, a similarity between a sentence from the first text passage and a sentence from the second text passage is determined for each of the sentences in the first and second text passages based on similarities between the noun phrases. The method also includes determining a similarity between the first text passage and the second text passage based on a similarity between sentences.12-30-2010
20100332217Method for text improvement via linguistic abstractions - This invention provides hierarchical, gradual and iterative methods, systems, and software for improving and correcting natural language text. The methods comprise the steps of applying natural language processing (NLP) algorithms to a corpus of sentences so as to abstract each sentence; applying scoring and linguistic annotation to each abstract sentence; applying NLP algorithms to abstract input sentences; applying search algorithms to match an abstract input sentence to at least one abstract corpus sentence; and applying NLP algorithms to adapt said matched abstract corpus sentence to the input sentence.12-30-2010
20110010164SYSTEM AND METHOD FOR GENERATING MANUALLY DESIGNED AND AUTOMATICALLY OPTIMIZED SPOKEN DIALOG SYSTEMS - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for generating a natural language spoken dialog system. The method includes nominating a set of allowed dialog actions and a set of contextual features at each turn in a dialog, and selecting an optimal action from the set of nominated allowed dialog actions using a machine learning algorithm. The method includes generating a response based on the selected optimal action at each turn in the dialog. The set of manually nominated allowed dialog actions can incorporate a set of business rules. Prompt wordings in the generated natural language spoken dialog system can be tailored to a current context while following the set of business rules. A compression label can represent at least one of the manually nominated allowed dialog actions.01-13-2011
20110010165APPARATUS AND METHOD FOR OPTIMIZING A CONCATENATE RECOGNITION UNIT - An apparatus and method for optimizing a concatenate recognition unit are provided. The apparatus and method of optimizing a concatenate recognition unit may generate an optimized concatenate recognition unit based on a basic language model generated using the concatenate recognition unit extracted from statistical information.01-13-2011
20110029303WORD CLASSIFICATION SYSTEM, METHOD, AND PROGRAM - A word classification system is provided with an inter-word pattern learning section for learning at least either the context information or the layout information between classification-known words which co-appear and creating an inter-word pattern for determining whether data relating to a word pair which is a combination of words is data relating to a same-classification word pair which is the combination of words in the same classification or data relating to a different-classification word pair which is a combination of words in different classifications on the basis of the relationship between the classification-known words which co-appear in a document.02-03-2011
20110029302METHOD AND SYSTEM FOR CANDIDATE MATCHING - A method and system for candidate matching, such as used in match-making services, assesses narrative responses to measure candidate qualities. A candidate database includes self-assessment data and narrative data. Narrative data concerning a defined topic is analyzed to determine candidate qualities separate from topical information. Candidate qualities thus determined are included in candidate profiles and used to identify desirable candidates.02-03-2011
20110029301METHOD AND APPARATUS FOR RECOGNIZING SPEECH ACCORDING TO DYNAMIC DISPLAY - A speech recognition apparatus and method that can improve speech recognition rate and recognition speed by reflecting information for dynamic display, are provided. The speech recognition apparatus generates a display variation signal indicating that variations have occurred on a screen and creates display information about the varied screen. The speech recognition apparatus adjusts a word weight for at least one word related to the varied screen and a domain weight for at least one domain included in the varied screen, according to the display variation signal and the display information. The adjusted word weight and the adjusted domain weight are dynamically reflected in a language model that is used for speech recognition.02-03-2011
20110035210CONDITIONAL RANDOM FIELDS (CRF)-BASED RELATION EXTRACTION SYSTEM - A system for extracting information from text, the system including parsing functionality operative to parse a text using a grammar, the parsing functionality including named entity recognition functionality operative to recognize named entities and recognition probabilities associated therewith and relationship extraction functionality operative to utilize the named entities and the probabilities to determine relationships between the named entities, and storage functionality operative to store outputs of the parsing functionality in a database.02-10-2011
20110035209Entry of text and selections into computing devices - Aids for improving the use of computing devices incorporating touch sensitive screens and other computing devices, including a method for correcting words incorrectly entered into a computing device which has the steps of: selecting as the word to be corrected one of the one or more words displayed on a computing device display screen during use of text entry software; entering text correction mode and leaving the text entry program; displaying the characters comprising the word to be corrected in such a way that each character can be selected individually by the user; selecting a character to be corrected or deleted, or a character adjacent where a missing character(s) will be inserted; correcting the character selected in the previous step (which can include deleting the character selected) or inserting a character(s); optionally repeating the last two steps to correct additional characters until the word selected to be corrected is changed to a corrected word to which no more changes or corrections need to be made; exiting correction mode and re-entering the text entry program; and replacing the word selected to be corrected with the corrected word.02-10-2011
20110112828Handheld Electronic Device with Text Disambiguation - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device.05-12-2011
20110112827SYSTEM AND METHOD FOR HYBRID PROCESSING IN A NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT - A system and method for hybrid processing in a natural language voice services environment that includes a plurality of multi-modal devices may be provided. In particular, the hybrid processing may generally include the plurality of multi-modal devices cooperatively interpreting and processing one or more natural language utterances included in one or more multi-modal requests. For example, a virtual router may receive various messages that include encoded audio corresponding to a natural language utterance contained in a multi-modal interaction provided to one or more of the devices. The virtual router may then analyze the encoded audio to select a cleanest sample of the natural language utterance and communicate with one or more other devices in the environment to determine an intent of the multi-modal interaction. The virtual router may then coordinate resolving the multi-modal interaction based on the intent of the multi-modal interaction.05-12-2011
20090063133SYSTEM AND ARTICLE OF MANUFACTURE FOR FILTERING CONTENT USING NEURAL NETWORKS - Provided are a system and article of manufacture for filtering communications received from over a network for a person-to-person communication program. A communication is received for the person-to-person communication program. The communication is processed to determine predefined language statements. Information on the determined language statements is inputted into a neural network to produce an output value. A determination is made as to whether the output value indicates that the communication is unacceptable. The communication is forwarded to the person-to-person communication program unchanged if the output value indicates that the communication is acceptable. An action is performed with respect to the communication upon determining that the communication is unacceptable that differs from the forwarding of the communication that occurs if the output value indicates that the communication is acceptable.03-05-2009
20090063131Methods and systems for language representation - A method of representing a language statement having one or more words includes capturing an expression of the language statement, associating one or more properties with each of the one or more words in the language statement, substantially removing as necessary one or more first ambiguities in the language statement, establishing one or more functional roles for each of the one or more words in the language statement, processing as necessary one or more second ambiguities in the language statement, and providing a representation of the language statement including the one or more properties associated with and the one or more functional roles established for each of the one or more words, the one or more first ambiguities substantially removed, and the one or more second ambiguities processed.03-05-2009
20090063132Information Processing Apparatus, Information Processing Method, and Program - An information processing apparatus includes: morphological analysis means for performing morphological analysis on a text document; managing means for managing a connection pattern indicating a connection relationship of a morpheme of a predetermined part of speech; and extracting means for extracting, from a string of morphemes obtained by performing morphological analysis by the morphological analysis means, a phrase including a plurality of morphemes having a same connection relationship as the connection relationship indicated by the connection pattern managed by the managing means.03-05-2009
20110246180ENHANCING LANGUAGE DETECTION IN SHORT COMMUNICATIONS - A method, system, and computer usable program product for enhancing language detection in short communications are provided in the illustrative embodiments. A short communication is stored in an element of a line cache. The line cache is accessible to an application executing in a data processing system. The element is an element in a set of elements in the line cache. A compound text is assembled from contents of a subset of the elements of the line cache. A language identifier (language ID) is received for the compound text from a language detection algorithm. The language ID is stored in a language cache element of a language ID cache. The language ID cache is accessible to the application and includes a set of language cache elements. A language of the short communication is determined using the contents of a subset of language cache elements.10-06-2011
20110246183TOPIC TRANSITION ANALYSIS SYSTEM, METHOD, AND PROGRAM - The present invention provides a topic transition analysis system that determines a position on a primary media stream leading to a certain statement made in a language communication carried out in a secondary channel associated with the primary media stream. The topic transition analysis system includes a statement trigger string determination unit receiving a primary media stream and one or a plurality of language communication streams (hereinafter, language streams) executed in parallel with the media stream and determining whether or not a certain statement on the one or plurality of language streams has been made newly in response to contents of the media stream.10-06-2011
20110246181NLP-BASED SYSTEMS AND METHODS FOR PROVIDING QUOTATIONS - Techniques for providing quotations obtained from text documents using natural language processing techniques are described. Some embodiments provide a content recommendation system (“CRS”) configured to provide quotations by extracting quotations from a corpus of text documents, and providing access to the extracted quotations in response to search requests received from users. The CRS may extract quotations by using natural language processing-based techniques to identify one or more entities, such as people, places, objects, concepts, or the like, that are referenced by the extracted quotations. The CRS may then store the extracted quotations along with identified entities, such as quotation speakers and subjects, for later access via search requests.10-06-2011
20110082687METHOD AND SYSTEM FOR TAKING ACTIONS BASED ON ANALYSIS OF ENTERPRISE COMMUNICATION MESSAGES - A computer-based system receives and analyzes digital communication between at least one party in a business enterprise and another party using a natural language analyzer to extract meanings from the message. The system includes a database storing specific actions to be taken upon the detection of specified meanings in such communications. Certain actions may require the system to interrogate the enterprise computer system's database to locate the existence or nature of specified data. The directed actions take the form of communications within an enterprise to assist activities related to the analyzed digital communication.04-07-2011
20110087486SYSTEM, REPORT, AND METHOD FOR GENERATING NATURAL LANGUAGE NEWS-BASED STORIES - The present invention generally relates to a system, report, and method for automatically generating a series of natural language news-based stories to be presented via a digital interface or printed publication to a portfolio user. The disclosure relates to a filter or selection of a handful of relevant and desired financial instruments, or events created in a large group of events such as sports results, travel information, auction related data, online shopping tools, social media, retail store promotion generation, search engine daily report, etc. for a specific use. These financial instruments, based on different selections from a portfolio manager via a management tool, are then used to either produce a strategies page where a list of useful covered call trades and hedged trades is displayed in the form of a table, or natural language news-based stories relating to a selected list of financial instruments found in a portfolio. The events are based on different selections from a portfolio manager via a management tool and are then used to either produce a secondary page where a list of the selected event data is displayed or natural language news-based stories relating to a selected list of events found in a portfolio from a large event database.04-14-2011
20110213609LANGUAGE-INDEPENDENT PROGRAM INSTRUCTION - A natural language-independent computer program is constructed. A data element is defined by a graphical representation in a user interface. A data element has a data type and a value. An operator is defined on multiple data elements by association of the graphical representations in the user interface. A natural language-independent graph data structure is defined by the association of data elements representing the logic of a computer program. The data types and operators have referenced descriptions in one or more natural languages, enabling a logical expression such as a computer program to be defined and understood in one or more natural languages.09-01-2011
20120173227METHOD, TERMINAL, AND COMPUTER-READABLE RECORDING MEDIUM FOR SUPPORTING COLLECTION OF OBJECT INCLUDED IN THE IMAGE - The present invention relates to a method for supporting a collection of an object included in a created image. The method includes the steps of: (a) creating an image of an object; (b) automatically creating and providing a combined sentence correct under the grammar of a language for the object on a first area on a screen of the terminal by using at least part of recognition information on what an identity of the object is, a place where the image was created and a time when the image was created, and automatically getting and providing a thumbnail corresponding to the recognized object on a second area on the screen of the terminal; and (c) if a Collection button is selected, storing data provided on the first and the second areas onto a storage space, to thereby complete the collection of the object.07-05-2012
20100153095Virtual Pet Chatting System, Method, and Virtual Pet Question and Answer Server - A virtual pet chatting system includes a virtual pet client unit, a virtual pet data maintaining unit as well as a questioning and answering unit. A virtual pet chatting method includes: sending, by a first virtual pet, a natural language question to a second virtual pet; and generating, by the second virtual pet, a natural language response sentence according to the natural language question, after understanding the natural language and performing reasoning taking into account attributes of a virtual pet. A virtual pet questioning and answering server includes a natural language understanding module and a response sentence generating module.06-17-2010
20110144977WRITTEN EXPRESSION DEVELOPMENT SYSTEM - A system is disclosed for developing writing skills, and more particularly, developing written expression skills. The system may include various types of instruction sheets arranged in order of increasing complexity to aid the user in written expression and creative writing skill development.06-16-2011
20110082686REDUCED KEYBOARD SYSTEM AND A METHOD FOR GLOBAL DISAMBIGUATION - A reduced keyboard system for text input comprising: a first keyboard having a first plurality of keys, the keys being adapted to be keystroked for input of a word; a virtual keyboard having a plurality of virtual keys, the plurality of virtual keys corresponding respectively to the first plurality of keys and wherein the virtual keyboard is adapted to generate a linear pattern from the keystroked keys of the first keyboard; and a dictionary database associated with the virtual keyboard, the dictionary database having a plurality of classes wherein each of the classes contains at least one candidate word having first and last letters corresponding to predetermined keys of the virtual keyboard, wherein the linear pattern and dictionary database are adapted to enable recognition and disambiguation of the inputted word.04-07-2011
20100191520TEXT AND SPEECH RECOGNITION SYSTEM USING NAVIGATION INFORMATION - A system and method are provided for recognizing a user's speech input. The method includes the steps for detecting the user's speech input, recognizing the user's speech input by comparing the speech input to a list of entries using language model statistics to determine the most likely entry matching the user's speech input, and detecting navigation information of a trip to a predetermined destination, where the most likely entry is determined by modifying the language model statistics taking into account the navigation information. A system and method is further provided that takes into account navigation trip information to determine the most likely entry using language model statistics for recognizing text input.07-29-2010
20110246184SYSTEM AND METHOD FOR INCREASING ACCURACY OF SEARCHES BASED ON COMMUNICATION NETWORK - Disclosed are systems, methods and computer-readable media for using a local communication network to generate a speech model. The method includes retrieving for an individual a list of numbers in a calling history, identifying a local neighborhood associated with each number in the calling history, truncating the local neighborhood associated with each number based on the at least one parameter, retrieving a local communication network associated with each number in the calling history and each phone number in the local neighborhood, and creating a language model for the individual based on the retrieved local communication network. The generated language model may be used for improved automatic speech recognition for audible searches as well as other modules in a spoken dialog system.10-06-2011
20110087482Method for identifying and manipulating language information - A preferred method and methods for manipulating linguistic information in grammatical disarray are disclosed. In a preferred method, a plurality of word elements in sequential order from a data corpus are analyzed with a conceptual-grammatical relational protocol such as CIRN producing an unsuccessful outcome; wherein said unsuccessful outcome involves the failure of forming an association or failure of identifying an association between said word elements. Then, the word elements are shuffled, forming different sequential orders to later be reanalyzed with same or other conceptual-grammatical relational protocols until a successful outcome is attained; wherein a successful outcome includes at least one of a: association between said word elements, and identification of an association between said word elements.04-14-2011
20110246179SIGNAL PROCESSING APPROACH TO SENTIMENT ANALYSIS FOR ENTITIES IN DOCUMENTS - A document can be processed to provide sentiment values for phrases in the document. The sequence of sentiment values associated with the sequence of phrases in a document can be handled as if they were a sampled discrete time signal. For phrases which have been identified as entities, a filtering operation can be applied to the sequence of sentiment values around each entity to determine a sentiment value for the entity.10-06-2011
20100010804METHODS AND SYSTEMS FOR EXTRACTING PHENOTYPIC INFORMATION FROM THE LITERATURE VIA NATURAL LANGUAGE PROCESSING - Systems and methods for extracting and encoding genotype-phenotype information from journal articles and other publications are provided. In some embodiments, the disclosed subject matter includes a preprocessor, boundary identifier, parser, phrase recognizer and an encoder to convert natural-language input text and parameters into structured text. The structured text can take the form of codes which account for genotype-phenotype information and are compatible with a controlled vocabulary.01-14-2010
20100063800Method, System and Software for Implementing an Automated Call Routing Application in a Speech Enabled Call Center Environment - A system, method and software for implementing an automated call routing application in a speech enabled call center environment are provided. In operation, the invention provides for the identification of a call center transaction selection from a natural language user utterance and the invocation of one or more scripts operable to route the user to a call center service agent configured to service the selected transaction. In the event a transaction selection cannot be readily identified or can only be partially identified, the invention provides for the initiation of a dialog module or script directed to eliciting a discernable transaction selection and/or the presentation of one or more menus from which the user may select an available call center transaction.03-11-2010
20100063795DATA PROCESSING DEVICE, DATA PROCESSING METHOD, AND DATA PROCESSING PROGRAM - [PROBLEMS] To provide a data processing device such as a text mining device capable of extracting characteristic structures properly even in case a plurality of words indicating identical contents or a plurality of words semantically associated are contained in input data. [MEANS FOR SOLVING PROBLEMS] Association node extraction unit (03-11-2010
20100131265Method, Apparatus and Computer Program Product for Providing Context Aware Queries in a Network - A method for providing context aware queries in a network may include receiving a question directed to a question answering service from an originating node, routing the question to one or more candidate nodes selected based at least in part on context information associated with the question, receiving an answer to the question from at least one of the candidate nodes, and providing the answer to the originating node based at least in part on result parameters associated with the originating node. An apparatus and computer program product corresponding to the method are also provided.05-27-2010
20100131264SYSTEM AND METHOD FOR HANDLING MISSING SPEECH DATA - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for handling missing speech data. The computer-implemented method includes receiving speech with a missing segment, generating a plurality of hypotheses for the missing segment, identifying a best hypothesis for the missing segment, and recognizing the received speech by inserting the identified best hypothesis for the missing segment. In another method embodiment, the final step is replaced with synthesizing the received speech by inserting the identified best hypothesis for the missing segment. In one aspect, the method further includes identifying a duration for the missing segment and generating the plurality of hypotheses of the identified duration for the missing segment. The step of identifying the best hypothesis for the missing segment can be based on speech context, a pronouncing lexicon, and/or a language model. Each hypothesis can have an identical acoustic score.05-27-2010
20100131263Identifying and Generating Audio Cohorts Based on Audio Data Input - A computer implemented method, apparatus, and computer program product for generating audio cohorts. An audio analysis engine receives audio data from a set of audio input devices. The audio data is associated with a plurality of objects. The audio data comprises a set of audio patterns. The audio data is processed to identify attributes of the audio data to form digital audio data. The digital audio data comprises metadata describing the attributes of the audio data. A set of audio cohorts is generated using the digital audio data and cohort criteria. Each audio cohort in the set of audio cohorts comprises a set of objects from the plurality of objects that share at least one audio attribute in common.05-27-2010
20100057442DEVICE, METHOD, AND PROGRAM FOR DETERMINING RELATIVE POSITION OF WORD IN LEXICAL SPACE - The position of a word in the lexical space is determined stably and highly accurately by arbitrarily setting a predetermined initial condition, determining the occurrence frequency and cooccurrence relationship of the word under a given condition, and minimizing the difference between the values of the occurrence frequency and cooccurrence and the initial layout values arbitrarily set.03-04-2010
20100063799Process for Constructing a Semantic Knowledge Base Using a Document Corpus - Related free-text documents, a corpus, are used to empirically derive a semantic knowledge base through a method in which documents are segmented into unique sentences, and then used to define sentential propositions which are arranged in a knowledge hierarchy. The method takes compound natural language sentences and transforms them to simple sentences by a process that is a part of the invention. A knowledge editor enables a domain expert using the methods of the invention to map the sentences in the corpus to sentential proposition(s). The resulting knowledge base can be used to semantically analyze documents in data mining and decision support applications, and can assist word processors or speech recognition devices. The invention is illustrated in connection with radiology reports, but it has wide applicability.03-11-2010
20100063798ERROR-DETECTING APPARATUS AND METHODS FOR A CHINESE ARTICLE - The invention discloses an error-detecting method for a Chinese article, handling a Chinese sentence including a first erroneous Chinese character string in a first location. The method includes subdividing the first erroneous Chinese character string into a plurality of first subgroups, wherein each of the first subgroups consists of two consecutive and non-consecutive Chinese characters out of the first erroneous Chinese character string. The method further includes providing a database containing a plurality of first correct Chinese character strings and a plurality of corresponding first correct indices, wherein the first correct indices consist of two consecutive and non-consecutive Chinese characters out of the first correct Chinese character strings. The method further includes acquiring one of the first correct indices according to the first subgroup, and one of the first correct Chinese character strings according to the acquired first correct index. The method further includes generating a best candidate sentence according to the acquired first correct Chinese character string, and showing the Chinese sentence and the best candidate sentence on a display device.03-11-2010
20100063797DISCOVERING QUESTION AND ANSWER PAIRS - The present invention provides a new approach to extracting question-answer pairs from online forums. The system develops a classification-based technique to discover questions in forums using sequential patterns automatically extracted from both questions and non-question sentences in forums as features. Once the questions are discovered, the system discovers the answers. The invention includes a graph-based method that is complementary with supervised methods for knowledge extraction, and techniques for question answering.03-11-2010
20100063796Word Sense Disambiguation Using Emergent Categories - Disclosed herein is a computer implemented method and system for word sense disambiguation in a natural language sentence. The natural language sentence is parsed for identifying possible parts of speech for each term and identifying possible phrase structures. Terms comprising one or more linguistic roles are identified. The possible sense combinations for the terms with linguistic roles are identified. Emergent categories are applied to identify possible valid senses for each of the terms with identified linguistic roles. Linguistic role pairs are identified from among the terms identified with linguistic roles. The correspondence functions with the correspondence function types matching the identified linguistic role pairs are identified from an emergent categories database. The pair-wise senses for each term are compared with the identified linguistic roles to identify the possible sense combinations. The possible senses are inferred for each term with identified linguistic roles in the natural language sentence and previous sentences.03-11-2010
20110178794METHODS AND SYSTEMS FOR INTERPRETING TEXT USING INTELLIGENT GLOSSARIES - A computer implemented method used to interpret text, including from a set of formal glossaries which may refer one to the other and are intended to define precisely the terminology of a field of endeavor. Such glossaries are known as intelligent, in the sense that they allow machines to make deductions, without the need for human intervention. However, they may also accept human intervention. Once a word is defined in an intelligent glossary, all the logical consequences of the use of that word in a formal and well-formed sentence are computable. The process includes a question and answer mechanism, which applies the definitions contained in the intelligent glossaries to a given formal sentence. The methods may be applied in the development of knowledge management methods and tools that are based on semantics; for example: modeling of essential knowledge in the field based on the relevant semantics.07-21-2011
20110178793DIALOGUE ANALYZER CONFIGURED TO IDENTIFY PREDATORY BEHAVIOR - A dialogue analyzer configured to identify online communications relating to lewd, predatory, hostile, and/or otherwise inappropriate subject matter is disclosed. Identified communications include those occurring via social networks, instant messaging, online chat rooms, computer in-game chat, email and the like. The communications of a monitored computer user are scanned to identify those communications that match predetermined lexical rules. The rules comprise sets of word-concepts that may be associated based on spelling, sound, meaning, appearance or probability of appearance in a text string, etc. Various numbers and configurations of word concepts may be implemented in a rule in order to more accurately scan the online communication data for a potential match. When a match is found, a copy of the communication, along with contextual information, is presented to a parent or guardian user. This information is presented at a central website and via an email notification to the parent or guardian. Various embodiments are described.07-21-2011
20100057443SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command.03-04-2010
20110099003INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - An information processing apparatus includes a category classifying unit configured to classify a document into one or more categories, a word extracting unit configured to extract one or more words from the document, a word score calculating unit configured to calculate a word score for each of the one or more words extracted from the document on the basis of an appearance frequency of the word in each of the one or more categories, the word score serving as an index of interest of the word, a sentence-for-computation extracting unit configured to extract one or more sentences from the document, and a sentence score calculating unit configured to calculate a sentence score for each of the extracted one or more sentences on the basis of the word score calculated by the word score calculating unit, the sentence score serving as an index of interest of the sentence.04-28-2011
20110099002INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - An information processing apparatus includes a holding unit configured to hold, in advance, presentation data to be presented to a person; a detection unit configured to detect, in a captured image obtained by capturing an image of a photographic subject, the photographic subject; a reading unit configured to read presentation data associated with a detection result of the photographic subject from among items of presentation data held in advance; and an output unit configured to output the read presentation data.04-28-2011
20110087483EMOTION ANALYZING METHOD, EMOTION ANALYZING SYSTEM, COMPUTER READABLE AND WRITABLE RECORDING MEDIUM AND EMOTION ANALYZING DEVICE - A system for analyzing a sentence emotion is provided. The system comprises a case repository, an input module, a sentence structure analyzing module, a similarity analyzing module and an emotion detection module. The case repository stores several case sentences and each case sentence comprises at least one major term and is corresponding to at least one emotion annotation. The input module receives an input sentence and the sentence structure analyzing module analyzes a sentence structure of the input sentence. The similarity analyzing module performs a semantic analysis and a syntax analysis according to the sentence structure to obtain a similarity level between the input sentence and each of the case sentences. The emotion detection module detects at least one emotion of the input sentence according to the similarity level between the input sentence and each of the case sentences.04-14-2011
20110251839METHOD AND SYSTEM FOR INTERACTIVELY FINDING SYNONYMS USING POSITIVE AND NEGATIVE FEEDBACK - Determining synonyms of words in a set of documents. Particularly, when provided with a word or phrase as input, in exemplary embodiments there is afforded the return of a predetermined number of “top” synonym words (or phrases) for an input word (or phrase) in a specific collection of text documents. Further, a user is able to provide ongoing and iterative positive or negative feedback on the returned synonym words, by manually accepting or rejecting such words as the process is underway.10-13-2011
20110087484APPARATUS AND METHOD FOR DETECTING SENTENCE BOUNDARIES - Provided are an apparatus and a method for detecting sentence boundaries. The apparatus includes a sentence boundary candidate extracting unit, a document context analyzing unit, a sentence boundary candidate classifying unit, and a sentence generating unit. The sentence boundary candidate extracting unit extracts a sentence boundary candidate from an input document. The document context analyzing unit extracts features from information on preceding and following contexts of the sentence boundary candidate. The features are used in two or more statistical algorithms. The sentence boundary candidate classifying unit classifies whether the sentence boundary candidate is a sentence boundary or not, using the features and the two or more statistical algorithms. The sentence generating unit extracts sentence units from the document based on a result of the classification of whether the sentence boundary candidate is a sentence boundary or not.04-14-2011
20110153310MULTIMODAL AUGMENTED REALITY FOR LOCATION MOBILE INFORMATION SERVICE - In one or more embodiments, one or more methods and/or systems described can perform producing a lattice of object hypotheses based on multiple reference objects from image information; receiving input speech information that includes a request for information associated with at least one reference object of the multiple reference objects; producing a lattice of speech hypotheses based on at least a first possible description included in the speech information; producing a lattice of scored semantic hypotheses based on at least the lattice of object hypotheses and the lattice of speech hypotheses; determining that a single semantic interpretation score of the lattice of scored semantic hypotheses exceeds a predetermined value; and providing requested information associated with the at least the first reference object of the plurality of reference objects.06-23-2011
20110082688Apparatus and Method for Analyzing Intention - An apparatus and system for analyzing intention are provided. The apparatus for analyzing an intention applies a context-free grammar to each of one or more sentences in units of one or more phrases to perform phrase spotting on each sentence, thereby extending a recognition range for an out-of-grammar (OOG) expression. Meanwhile, the apparatus for analyzing an intention determines whether sentences that have undergone phrase spotting are grammatically valid by applying a dependency grammar to the sentences to filter an invalid sentence, and generates the intention analysis result of a valid sentence, thereby grammatically and/or semantically verifying a sentence that has undergone speech recognition while extending a speech recognition range.04-07-2011
20130158977System and Method for Evaluating Speech Exposure - Systems and methods are provided for detecting and analyzing speech spoken in the vicinity of a user. The detected speech may be analyzed to determine the quality, volume, complexity, language, and other attributes. A value metric may be calculated for the received speech, such as to inform parents of a child's progress related to learning to speak, or to provide feedback to a foreign language learner. A corresponding device may display the number of words, the value metric, or other information about speech received by the device.06-20-2013
20130158978Adaptation of Vocabulary Levels for Enhanced Collaboration - A mechanism is provided for adapting vocabulary levels in a collaborative session. A vocabulary level indicator is received for a first user in the collaborative session. During generation of an electronic communication by a second user in the collaborative session, text entered in the electronic communication is scanned in order to identify a vocabulary level associated with text. The vocabulary level associated with the text is compared to the vocabulary level indicator for the first user. Responsive to the text exceeding the vocabulary level indicator for the first user thereby indicating violating text, an indication is provided to the second user that the violating text is above a vocabulary level of the first user.06-20-2013
20130158980SUGGESTING INTENT FRAME(S) FOR USER REQUEST(S) - Techniques are described herein that are capable of suggesting intent frame(s) for user request(s). For instance, the intent frame(s) may be suggested to elicit a request from a user. An intent frame is a natural language phrase (e.g., a sentence) that includes at least one carrier phrase and at least one slot. A slot in an intent frame is a placeholder that is identified as being replaceable by one or more words that identify an entity and/or an action to indicate an intent of the user. A carrier phrase in an intent frame includes one or more words that suggest a type of entity and/or action that is to be identified by the one or more words that may replace the corresponding slot. In accordance with these techniques, the intent frame(s) are suggested in response to determining that natural language functionality of a processing system is activated.06-20-2013
20110077936SYSTEM AND METHOD FOR GENERATING VOCABULARY FROM NETWORK DATA - A method is provided in one example and includes receiving data propagating in a network environment and separating the data into one or more fields. At least some of the fields are evaluated in order to identify nouns and noun phrases within the fields. The method also includes identifying selected words within the nouns and noun phrases based on a whitelist and a blacklist. The whitelist includes a plurality of designated words to be tagged and the blacklist includes a plurality of rejected words that are not to be tagged. A resultant composite is generated for the selected nouns and noun phrases that are tagged. The resultant composite is incorporated into the whitelist if the resultant composite is approved.03-31-2011
20110060585INPUTTING METHOD BY PREDICTING CHARACTER SEQUENCE AND ELECTRONIC DEVICE FOR PRACTICING THE METHOD - The present invention relates to a method of predicting and entering a character string and an electronic device in which the method is implemented. The method of predicting and entering a character string includes a step (S03-10-2011
20120303358SEMANTIC TEXTUAL ANALYSIS - A method of comparing the semantic similarity of two different text phrases in which the grammatical structure of the two different text phrases is analysed and a keyword set for each of the different text phrases is derived. The semantic similarity of the phrases can be determined in accordance with the grammatical structure of the two different text phrases and the contents of the two keyword sets.11-29-2012
20120303357SELF-LEARNING METHODS FOR AUTOMATICALLY GENERATING A SUMMARY OF A DOCUMENT, KNOWLEDGE EXTRACTION AND CONTEXTUAL MAPPING - Advance Machine Learning or Unsupervised Machine Learning Techniques are provided that relate to Self-learning processes by which a machine generates a sensible automated summary, extracts knowledge, and extracts contextually related Topics along with the justification that explains “why they are related” automatically without any human intervention or guidance (backed ontology's) during the process. Such processes also relate to generating a 360-Degree Contextual Result (360-DCR) using Auto-summary, Knowledge Extraction and Contextual Mapping.11-29-2012
20120303356AUTOMATED SELF-SERVICE USER SUPPORT BASED ON ONTOLOGY ANALYSIS - A method for providing information to a user in response to a received user query. A natural language analysis generates substrings relevant to the user query. An ontology analysis outputs: terms of an ontology matching the relevant generated substrings; and relationships between the terms. A query analysis analyzes the user query regarding the outputted terms and relationships, including ascertaining whether the user query is more suitable for service than for an information search. If it is so ascertained, then service actions for the user to perform are identified to the user. If it is not so ascertained, then: the user query is refined based on the outputted terms and relationships; a search query is generated based on the refined user query, a search is initiated based on the search query, and results of the search are provided to the user.11-29-2012
20120303355Method and System for Text Message Normalization Based on Character Transformation and Web Data - A method for generating non-standard tokens that correspond to standard tokens used in speech synthesis systems has been developed. The method includes selecting a standard token from a plurality of standard tokens stored in memory, using a random field model to select a predetermined operation to perform on each character in the selected token, performing the selected operation on each character to generate an output token, and storing the output token in the memory in association with the selected token. The output token is different from each token in the plurality of standard tokens.11-29-2012
20110060584ERROR CORRECTION USING FACT REPOSITORIES - The disclosed system and method apply stores of factual information to correct errors in digital text, for example, generated from OCR, speech and/or handwriting recognition devices, and other automatic recognition devices. A text produced by OCR, speech recognition, handwriting recognition, and others may be processed to extract discussed facts. Databases of facts are searched based on information in the text. After comparing facts asserted in the text with the factual data from the databases, suggested corrections of the text are produced.03-10-2011
20120150533PROVIDING DEFINITIONS THAT ARE SENSITIVE TO THE CONTEXT OF A TEXT - Systems and techniques for providing definitions to a user. The provision embodies the context of a text in which the defined term appears. In one aspect, a system includes an electronic device that includes one or more data processing devices programmed to respond to receipt of the user selection of the first term by performing operations. The operations include accessing, from the one or more persistent data storage devices, the characterizations of the contexts of the texts, comparing the accessed characterizations of the contexts of the texts with one or more characteristics of the context of the textual content of a media file, and ranking the definitions of the first term according to respective likelihoods that the definitions appropriately characterize the usage of the first term within the textual content of the media file.06-14-2012
20130158979System and Method for Identifying Phrases in Text - A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.06-20-2013
20130158984METHOD OF AND SYSTEM FOR VALIDATING A FACT CHECKING SYSTEM - A fact checking system is able to verify the correctness of information and/or characterize information by comparing the information with one or more sources. The fact checking system automatically monitors, processes, fact checks information and indicates a status of the information. Fact checking results are able to be validated by re-fact checking the fact check results.06-20-2013
20100262419METHOD OF CONTROLLING COMMUNICATIONS BETWEEN AT LEAST TWO USERS OF A COMMUNICATION SYSTEM - A communication system includes at least a sound re-production system (10-14-2010
20100305941AUTOMATION OF AUDITING CLAIMS - Described are computer-based methods and apparatuses, including computer program products, for automation of auditing claims. Data indicative of an insurance company name is received, the data comprising one or more words. The data is processed through one or more processing steps to generate processed data comprising one or more processed words. One or more candidate word strings are selected based on the one or more processed words. Matching information is associated with each of the one or more candidate word strings. Analysis information is generated for each of the one or more candidate word strings based on the associated matching information. An insurance company identifier is associated with received data based on the analysis information and one or more matching rules.12-02-2010
20090089045METHOD OF TRANSFORMING NATURAL LANGUAGE EXPRESSION INTO FORMAL LANGUAGE REPRESENTATION - This invention comprises a series of steps which transforms one or more natural language expressions into a single, well-formed formal language representation. Each natural language expression is partially parsed into simple fragments, each of which is then associated with one or more short formal expressions. Each formal expression is constructed in such a way as to contain one or more placeholder variables, each of which is associated with one or more attributes to constrain the types of entities that each variable can potentially represent. The resulting plurality of formal expressions is then filtered for relevance within a given context, and the surviving expressions manipulated based upon a plurality of rules, which are cognizant of the attributes associated with each variable contained therein. A user is then presented with the resulting plurality of formal expressions, whereupon the user optionally selects, rejects, adds to, logically connects and otherwise manipulates each member of said plurality. When the user is satisfied that the plurality represents an intended meaning, the formal expressions are combined into a single, formal representation.04-02-2009
20100280819Dialog Design Apparatus and Method - This invention relates to a dialog design apparatus and method. More specifically, this invention relates to a state oriented dialog design apparatus and method to facilitate the creation of natural language dialogs and creating data structures for voice user interfaces. The dialog design apparatus may include inputting means for receiving a user's prompt; response generating means for the user to generate at least one response; dialog structure generating means for structurally managing the user's input and response; and output means for outputting and displaying at least one dialog structure. A state in the present invention may include at least one system prompt and at least one response, and a linking unit may link a first state to a second state related to the first state, link the second state to a third state, and so on until certain system actions are achieved. A loop detecting unit in the present invention detects and identifies loops in the dialog structure.11-04-2010
20110153312METHOD AND COMPUTER SYSTEM FOR AUTOMATICALLY ANSWERING NATURAL LANGUAGE QUESTIONS - A computer system and method for automatically answering natural language questions. The system comprises an input to receive said natural language questions; a data store to record linked pairs of questions and corresponding answers; a matcher configured to compare a said received natural language question with said linked question and answer pairs; and an output to transfer a said received natural language question to a researcher if no matches are found. The system may further comprise a system to link pairs of questions and corresponding answers into groups, to enable the generation of a prototypical answer for each group of pairs of questions and answers and to store said prototypical answers in said data store; wherein said matcher compares a said received natural language question with a question in said data store having an associated prototypical answer and outputs said associated prototypical answer for said question in response to said matching. Alternatively, said matcher may be configured to output all linked question and answer pairs which match said received natural language question. The system may be further adapted to distribute natural language questions to be answered to researchers by assigning unpopularity scores to each of said natural language questions.06-23-2011
20110161071SYSTEM AND METHOD FOR DETERMINING SENTIMENT EXPRESSED IN DOCUMENTS - A system, computer readable storage medium storing instructions, and computer-implemented method for determining sentiment expressed in documents is disclosed. A document is received from a plurality of documents. A sentence in the document that includes at least one sentiment signature within a predetermined distance of at least one keyword from a list of keywords is identified, wherein the list of keywords is extracted from the plurality of documents and is filtered using a phase transition formula, and wherein the at least one sentiment signature corresponds to an expression of at least one sentiment in the sentence. At least one category corresponding to the at least one keyword of the sentence is determined, wherein the at least one category is included in a list of categories that is generated using the list of keywords. At least one sentiment corresponding to the at least one category is determined based on the at least one sentiment signature.06-30-2011
20110161070PRE-HIGHLIGHTING TEXT IN A SEMANTIC HIGHLIGHTING SYSTEM - A method, computer system and/or computer program product pre-highlight text that is located in a search. A text highlight and a triple statement semantic annotation based on the text highlight of a first document are received. The triple statement semantic annotation comprises a subject, a relationship and an object. A natural language processing (NLP) pattern based on the triple statement semantic annotation is generated. The NLP pattern is representative of a linguistic pattern between the text highlight and the triple statement semantic annotation. A multi-dimensional linguistic profile is generated based on the text highlight, the triple statement semantic annotation and the NLP pattern, wherein the multi-dimensional linguistic profile defines entities, relationships and attributes associated with document text. Text in a second document is compared with the multi-dimensional linguistic profile, and text in the second document is highlighted based on the comparison.06-30-2011
20100004922METHOD AND SYSTEM FOR AUTOMATICALLY GENERATING REMINDERS IN RESPONSE TO DETECTING KEY TERMS WITHIN A COMMUNICATION - A computer-implemented method of automatically generating an electronic reminder is provided. The method includes identifying, using term-recognition circuitry, at least one key term within an electronic message received with an electronic communications device. The method further includes generating at least one reminder based upon the at least one key term. One or more reminders are, according to the method, electronically conveyed to a user at a time later than when the message was received.01-07-2010
20100004924Method and system context-aware for identifying, activating and executing software that best respond to user requests generated in natural language - A computer-implemented method capable of identifying, activating, and executing commands, methods, functions, interfaces, and software-based applications that can satisfy a specific natural language user request represented by a text stream and generated from any means such as typing, voice, gestures, signs or by human thoughts.01-07-2010
20100004925Clique based clustering for named entity recognition system - A soft clustering method comprises (i) grouping items into non-exclusive cliques based on features associated with the items, and (ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate item groups on the basis of mutual similarity of the features of the items constituting the cliques. In some named entity recognition embodiments illustrated herein as examples, named entities together with contexts are grouped into cliques based on mutual context similarity. Each clique includes a plurality of different named entities having mutual context similarity. The cliques are clustered to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques.01-07-2010
20120203544CORRECTING TYPING MISTAKES BASED ON PROBABILITIES OF INTENDED CONTACT FOR NON-CONTACTED KEYS - Systems and methods for identifying word candidates based on a sequence of contact events within one or more keys on a keyboard. In some examples, the system identifies a probability of intended contact for keys adjacent to a contacted key, and returns the identified probabilities to a typing correction system that identifies likely word candidates that correspond to text input sequences.08-09-2012
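As an illustration of the idea in the entry above (assigning a probability of intended contact to keys next to the contacted key, then using those probabilities to rank word candidates), here is a minimal Python sketch. It is not the claimed method: the keyboard geometry, the `p_hit` value of 0.8, the even split of the remaining probability across neighbors, and the names `intended_key_probs` and `rank_candidates` are all assumptions made for the example.

```python
# Hypothetical geometry: keys are "adjacent" when their row and column
# indices each differ by at most one. Real keyboards are staggered.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def intended_key_probs(contacted, p_hit=0.8):
    """Return {key: probability that this key was the intended one}."""
    r, c = POS[contacted]
    neighbors = [
        k for k, (kr, kc) in POS.items()
        if k != contacted and abs(kr - r) <= 1 and abs(kc - c) <= 1
    ]
    probs = {contacted: p_hit}
    for k in neighbors:
        # Assumption: the leftover mass is shared evenly by the neighbors.
        probs[k] = (1.0 - p_hit) / len(neighbors)
    return probs


def rank_candidates(contacts, lexicon):
    """Score same-length lexicon words by the product of per-key probabilities."""
    per_key = [intended_key_probs(k) for k in contacts]
    scored = []
    for word in lexicon:
        if len(word) != len(contacts):
            continue
        score = 1.0
        for ch, probs in zip(word, per_key):
            score *= probs.get(ch, 0.0)
        if score > 0:
            scored.append((word, score))
    return sorted(scored, key=lambda ws: ws[1], reverse=True)
```

With the contact sequence "cst" and a lexicon containing "cat", "cut" and "dog", only "cat" receives a non-zero score, because "a" is adjacent to the contacted "s" while "u" and "o" are not.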
20120203543Method for Analyzing Message Archives and Corresponding Computer Program - A method for analysing a large number of messages, wherein the number of messages is reduced based on pattern recognition and pattern simplification, rules for the pattern recognition and pattern simplification are based on a regular grammatical structure, and patterns are sought in the remaining messages, or directly, i.e., without previous simplification. Syntactic pattern recognition is used for each type of pattern search, and a finite machine is derivable using the regular grammatical structure underlying each pattern recognition by transforming the mapping rules into transfer function, such that structural connections between the messages can be displayed graphically.08-09-2012
20090240487MACHINE TRANSLATION - A method for computer-assisted translation from a source language to a target language makes use of a number of rules. Each rule forms an association between a representation of a sequence of source language tokens with a corresponding tree-based structure in the target language. The tree-based structure for each of at least some of the rules represents one or more asymmetrical relations within a number of target tokens associated with the tree-based structure and provides an association of the target tokens with the sequence of source language tokens of the rule. An input sequence of source tokens is decoded according to the rules to generate a representation of one or more output sequences of target language tokens. Decoding includes, for each of at least some sub-sequences of the input sequence of source tokens, determining a tree-based structure associated with the sub-sequence according to a match to one of the plurality of rules.09-24-2009
20090240488CORRECTIVE FEEDBACK LOOP FOR AUTOMATED SPEECH RECOGNITION - A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier.09-24-2009
20120065963System And Method Of Generating Responses To Text-Based Messages - In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message is disclosed. The method includes identifying a first selected input clause in a sentence in the text-based natural language message. The method also includes assigning a semantic tag to the first selected input clause and matching the semantic tag to a historical input tag. The historical input tag is associated with a first previously generated response clause. The method further includes generating an output response message based on the historical response clause, the output response message derived from the historical input tag and a second previously generated response clause. The system includes means for performing the method steps.03-15-2012
20080243482Method for performing effective drill-down operations in text corpus visualization and exploration using language model approaches for key phrase weighting - The invention relates to a method and an apparatus for performing a drill-down operation on a text corpus comprising documents, using language models for key phrase weighting, said method comprising the steps of weighting key phrases occurring both in a foreground language model, which contains a selected document cluster of said text corpus, and in a background language model, which does not contain said selected document cluster, by calculating for each key phrase a key phrase weight comprising a ratio between the foreground weight of said key phrase and a background weight of said key phrase, and assigning documents of the foreground language model to cluster labels which are formed by key phrases having high calculated key phrase weights.10-02-2008
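The ratio-based weighting described in the entry above can be sketched in a few lines of Python. This is an illustrative approximation only: the entry does not say how foreground and background weights are estimated, so the sketch assumes smoothed relative frequencies, and the function name `phrase_weights` and the `smoothing` parameter are hypothetical.

```python
from collections import Counter


def phrase_weights(foreground_docs, background_docs, smoothing=1.0):
    """Weight each foreground phrase by (foreground weight / background weight).

    Each document is a list of key phrases. Weights are add-one style
    smoothed relative frequencies, which is an assumption of this sketch.
    """
    fg = Counter(p for doc in foreground_docs for p in doc)
    bg = Counter(p for doc in background_docs for p in doc)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(fg) | set(bg))

    weights = {}
    for phrase in fg:
        fg_w = (fg[phrase] + smoothing) / (fg_total + smoothing * vocab_size)
        bg_w = (bg[phrase] + smoothing) / (bg_total + smoothing * vocab_size)
        weights[phrase] = fg_w / bg_w
    return weights


# The selected cluster is the foreground; the rest of the corpus is the background.
cluster = [["neural network", "training data"], ["neural network"]]
rest = [["training data", "stock market"], ["stock market", "training data"]]
weights = phrase_weights(cluster, rest)
```

Phrases that are frequent in the selected cluster but rare elsewhere get the highest ratio, which is what makes them plausible cluster labels for a drill-down step.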
20080243480System and method for determining semantically related terms - Systems and methods for determining semantically related terms are disclosed. Generally, a semantically related term tool receives a seed set and identifies a plurality of terms that constitute the seed set. For each term of the seed set, the semantically related term tool identifies concept terms associated with terms of the seed set other than the term being processed, joins the term being processed with each of the identified concept terms, and adds the resulting terms to a plurality of semantically related terms. The semantically related term tool removes invalid terms from the plurality of semantically related terms based on a language model and ranks at least a portion of the remaining terms of the plurality of semantically related terms based on a metric indicating a degree of semantic relationship between a term of the plurality of semantically related terms and one or more terms of the seed set.10-02-2008
20080243478Efficient Implementation of Morphology for Agglutinative Languages - A method for constructing an automaton for automated analysis of agglutinative languages, the method including constructing an affix automaton for each of a plurality of affix types of an agglutinative language, where each of the affix types is associated with one or more affixes associated with a morphological concept, combining any of the affix automatons to form a plurality of template automatons, where each of the template automatons is patterned after any of a plurality of agglutination templates of any of the affix types for the language, and combining the template automatons into a master automaton.10-02-2008
20080243484SYSTEMS AND METHODS FOR GENERATING WEIGHTED FINITE-STATE AUTOMATA REPRESENTING GRAMMARS - A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string.10-02-2008
20110257963METHOD AND SYSTEM FOR SEMANTIC SEARCHING - A method comprising a preliminary automated analysis of at least one corpus of natural language text is disclosed. For each sentence of a corpus, the method includes performing a syntactic analysis using linguistic descriptions to generate at least one syntactic structure for the sentence, building a semantic structure for the sentence, associating each generated syntactic and semantic structure with the sentence, and saving each structure. For each corpus text that was preliminary analyzed, performing an indexing operation to index lexical meanings and values of linguistic parameters of each syntactic structure and each semantic structure associated with sentences in the corpus text. A semantic search includes at least one automatic preliminary analyzed corpus of sentences comprising searched values of linguistic, syntactic and semantic parameters. Due to a deep semantic analysis of a corpus, the search may be executed in various languages, in resources of various languages, and in the text of corpora of various languages regardless of the language of the query.10-20-2011
20110257960METHOD AND APPARATUS FOR CONTEXT-INDEXED NETWORK RESOURCE SECTIONS - Techniques to provide context-indexed network resource sections include, in response to receiving first data that describes a network resource, determining a section of a plurality of sections included in the network resource. A section context token that indicates a probability in the section of a topic from a context vocabulary is determined. The context vocabulary includes concepts describing temporal, spatial, environmental or activity circumstances of consumers. Second data that indicates the section in association with the section context token is stored.10-20-2011
20110131033Weight-Ordered Enumeration of Referents and Cutting Off Lengthy Enumerations - In many reference resolution problems there are many candidate referents, and the overhead of enumerating them can be considerable. The overhead is reduced by stopping enumeration before all candidate referents have been enumerated, utilizing the properties of ordered and semi-ordered enumerators. Converting semi-ordered enumerators into ordered enumerators and combining several ordered enumerators into a single using dynamic weightings for handling determiner interpretations are disclosed.06-02-2011
20080221874Method and Apparatus for Fast Semi-Automatic Semantic Annotation - A method, apparatus and computer instructions are provided for fast semi-automatic semantic annotation. Given a limited annotated corpus, the present invention assigns a tag and a label to each word of the next limited annotated corpus using a parser engine, a similarity engine, and an SVM engine. A rover then combines the parse trees from the three engines and annotates the next chunk of limited annotated corpus with confidence, such that the effort required for human annotation is reduced.09-11-2008
20100332218KEYWORD BASED MESSAGE HANDLING - An apparatus comprising a controller, wherein said controller is configured to display a message text, receive an input indicating a keyword; determine an associated operation and to generate a response message according to the associated operation.12-30-2010
20110054884SYSTEM FOR ASSISTING IN DRAFTING APPLICATIONS - This invention relates to a Method and System for assisting in drafting applications comprising a server (03-03-2011
20110054883SPEECH UNDERSTANDING SYSTEM USING AN EXAMPLE-BASED SEMANTIC REPRESENTATION PATTERN - A speech understanding apparatus includes: a speech recognition unit for recognizing an input speech to produce a speech recognition result; a sentence analysis unit for performing morpheme analysis on a sentence corresponding to the speech recognition result, extracting additional information, and performing syntax analysis; a hierarchy describing unit for describing hierarchy of the sentence; a class transformation unit for performing class transformation on the sentence; a semantic representation determination unit for marking optional expressions for the sentence, deleting meaningless expressions and the additional information, converting the sentence into its base form, and deleting morphemic tags or symbols to determine a semantic representation; a semantic representation retrieval unit for retrieving the determined semantic representation from an example-based semantic representation pattern database; and a retrieval result processing unit for selectively producing a retrieved semantic representation.03-03-2011
20110054882MECHANISM FOR IDENTIFYING INVALID SYLLABLES IN DEVANAGARI SCRIPT - A mechanism for identifying invalid syllables in Devanagari script is disclosed. A method of embodiments of the invention includes receiving Devanagari text from an application of a computing device for parsing, determining a character type for a character of the Devanagari text, determining a new state associated with the character by referencing a Devanagari state machine with the determined character type and a current state of the Devanagari text, and transmitting an invalid syllable signal to the application for display on a display device to an end user of the application if the determined new state is invalid.03-03-2011
20110264444KEYWORD DISPLAY SYSTEM, KEYWORD DISPLAY METHOD, AND PROGRAM - The present invention is a keyword display system that includes a speaker specifier for specifying a speaker; a weight determinator for determining a weight of the specified speaker; a keyword extractor for extracting keywords from a speech of the aforementioned speaker; a keyword relation degree calculator for calculating a relation degree between the aforementioned extracted keywords, carrying out a weighting for this calculated relation degree by using the weight of the speaker having spoken the aforementioned keywords, and calculating a keyword relation degree between the keywords; and a keyword display controller for displaying a relevancy between the aforementioned extracted keywords responding to the aforementioned keyword relation degree.10-27-2011
20110264443INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM - Disclosed herein is an information processing device including: a data acquirer configured to acquire a sentence collection having a plurality of sentences and a plurality of phrases included in the sentence collection; a phrase feature decider configured to decide phrase features each representing a characteristic of a respective one of the phrases acquired by the data acquirer; a collection feature decider configured to decide a collection feature representing a characteristic of the sentence collection; and a compressor configured to generate compressed phrase features by using the phrase features and the collection feature, the compressed phrase features having a dimension lower than a dimension of the phrase features and each representing a characteristic of the respective one of the phrases acquired by the data acquirer.10-27-2011
20110264442VISUALLY EMPHASIZING PREDICTED KEYS OF VIRTUAL KEYBOARD - A computing system includes a touch display and a virtual keyboard visually presented by the touch display. The virtual keyboard includes a plurality of touch-selectable keys each having a visual appearance that dynamically changes. A touch-selectable key has a deemphasized visual appearance if the touch-selectable key is not predicted to be a next selected key, and the touch-selectable key has a prediction-emphasized visual appearance if the touch-selectable key is predicted to be a next selected key.10-27-2011
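One simple way to picture the behavior in the entry above is to predict the next key from a word list and map each key to one of the two visual states. The Python sketch below does only that; the entry does not say how prediction is performed, so prefix matching against a vocabulary is an assumption, and the names `predict_next_keys` and `key_styles` are hypothetical.

```python
def predict_next_keys(prefix, vocabulary):
    """Return the set of letters that extend `prefix` toward a vocabulary word."""
    prefix = prefix.lower()
    return {
        word.lower()[len(prefix)]
        for word in vocabulary
        if word.lower().startswith(prefix) and len(word) > len(prefix)
    }


def key_styles(prefix, vocabulary, keys="abcdefghijklmnopqrstuvwxyz"):
    """Map every key to 'emphasized' or 'deemphasized' for the current prefix."""
    predicted = predict_next_keys(prefix, vocabulary)
    return {
        k: ("emphasized" if k in predicted else "deemphasized")
        for k in keys
    }


styles = key_styles("th", ["the", "then", "this", "that", "to"])
```

After typing "th" with this vocabulary, the keys "a", "e" and "i" are emphasized and every other key is deemphasized.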
20110257962SYSTEM AND METHOD FOR AUTOMATICALLY GENERATING SENTENCES OF A LANGUAGE - A system and method for automatically generating sentences in a language is disclosed. The system comprising a grammar processor for converting an input grammar into a hierarchical representation, and a grammar explorer module for traversing the grammar hierarchy based on an explore specification, which defines what nodes of the hierarchy should be explored. The explorer module takes the exploration specification as input and traverses the hierarchy according to the exploration types specified in the exploration specification. The system and method can be used to automatically generate assembly instructions for a microprocessor given its assembly language grammar, to generate sentences of a natural language like English from its grammar and to generate programs in a high-level programming language like C.10-20-2011
20110257961SYSTEM AND METHOD FOR GENERATING QUESTIONS AND MULTIPLE CHOICE ANSWERS TO ADAPTIVELY AID IN WORD COMPREHENSION - An adaptive learning system and method provides for automatically generating question types to a user for word comprehension and selecting multiple choice answers for display. Questions are developed for the user by obtaining online content and indexing the content into individual sentences and questions. The system provides questions in a series of rounds to the user and then adaptively tracks the progress of the user based on the categorization of each question.10-20-2011
20080235004DISAMBIGUATING TEXT THAT IS TO BE CONVERTED TO SPEECH USING CONFIGURABLE LEXEME BASED RULES - A software language including language constructs for disambiguating text that is to be converted to speech using configurable lexeme based rules. The language can include at least one conditional statement and a significance indicator. The conditional statement can define a sense of usage for a lexeme. The significance indicator can define a criterion for selecting an associated sense of usage. The language can also include an action expression that is associated with a conditional statement that defines a set of programmatic actions to be executed upon a selection of the associated usage sense. The conditional statement can include a context range specification that defines a scope of an input string for examination when evaluating the conditional statement. Further, the conditional statement can include a directive that represents a defined condition of the lexeme or the text surrounding the lexeme.09-25-2008
20110040554Automatic Evaluation of Spoken Fluency - A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score to quantify the spoken fluency skills of the speakers. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.02-17-2011
20100292985DOCUMENT MANAGEMENT APPARATUS AND DOCUMENT MANAGEMENT METHOD - A document management apparatus is aimed at easily processing, managing and reusing newly taken image data in accordance with user's needs. The apparatus includes: a document area analyzing unit configured to analyze and extract a document area from image data; a text information analyzing unit configured to analyze and extract text information with respect to the document area; a text information semantic analysis unit configured to analyze and extract semantics of the text information from the text information; a managing unit configured to associate the document area, the text information and the semantics of the text information with each other, and manage them as integrated information; an integrated information presenting unit configured to present to a user at least the semantics of the text information, of the integrated information managed by the managing unit; and a user-designated semantic setting unit configured to be capable of allowing the user to change the semantics of the text information presented by the integrated information presenting unit and to set the changed semantics.11-18-2010
20100292984METHOD FOR QUICKLY INPUTTING CORRELATIVE WORD - The present invention provides a text input method, which is integrated in a text input program or device supporting word input (e.g., software/hardware keyboard, input method, etc.) and assists a user in easily inputting a word or a phrase (e.g., various tense forms of a verb, etc.) relating to a certain word. The user may fast input a specific word relating to the certain word by a specific operation (e.g., clicking a software or hardware key, moving a screen contact point, etc.) or by a combination of a plurality of operations.11-18-2010
20120150532SYSTEM AND METHOD FOR FEATURE-RICH CONTINUOUS SPACE LANGUAGE MODELS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for predicting probabilities of words for a language model. An exemplary system configured to practice the method receives a sequence of words and external data associated with the sequence of words and maps the sequence of words to an X-dimensional vector, corresponding to a vocabulary size. Then the system processes each X-dimensional vector, based on the external data, to generate respective Y-dimensional vectors, wherein each Y-dimensional vector represents a dense continuous space, and outputs at least one next word predicted to follow the sequence of words based on the respective Y-dimensional vectors. The X-dimensional vector, which is a binary sparse representation, can be higher dimensional than the Y-dimensional vector, which is a dense continuous space. The external data can include part-of-speech tags, topic information, word similarity, word relationships, a particular topic, and succeeding parts of speech in a given history.06-14-2012
20110125487Joint disambiguation of syntactic and semantic ambiguity - Ambiguities in a natural language expression are interpreted by jointly disambiguating multiple alternative syntactic and semantic interpretations. More than one syntactic alternative, represented by parse contexts, are analyzed together with joint analysis of referents, word senses, relation types, and layout of a semantic representation for each syntactic alternative. Best combinations of interpretations are selected from all participating parse contexts, and are used to form parse contexts for the next step in parsing.05-26-2011
20110137640Handheld Electronic Device With Reduced Keyboard and Associated Method of Providing Quick Text Entry in a Message - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message.06-09-2011
20110137639ADAPTING A LANGUAGE MODEL TO ACCOMMODATE INPUTS NOT FOUND IN A DIRECTORY ASSISTANCE LISTING - A statistical language model is trained for use in a directory assistance system using the data in a directory assistance listing corpus. Calculations are made to determine how important words in the corpus are in distinguishing a listing from other listings, and how likely words are to be omitted or added by a user. The language model is trained using these calculations.06-09-2011
20110137638ROBUST SPEECH RECOGNITION BASED ON SPELLING WITH PHONETIC LETTER FAMILIES - A system and method for entering a destination into a navigation system, usually a vehicle navigation system, that uses phonetic letter families, or groups of letters which sound similar, to improve the reliability and accuracy of speech recognition. The method involves grouping each letter of the English alphabet into a family of letters which sound similar, such as A, J, and K. When a destination name is spelled by a user, each letter is recognized in terms of the phonetic letter family to which it belongs. This phonetic equivalent spelling is compared to the navigation database of street, city, and state names, which has also been converted to its phonetic equivalent spelling. If a match is found, the user is asked to confirm that this is the desired destination.06-09-2011
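The matching step in the entry above (convert both the spelled input and the database names to a phonetic-family spelling, then compare) can be sketched as follows. Only the A, J, K family comes from the entry; every other grouping in `FAMILIES`, and the names `phonetic_key` and `match_destinations`, are assumptions chosen for illustration.

```python
# Only "AJK" is taken from the abstract; the other families are guesses.
FAMILIES = ["AJK", "BCDEGPTVZ", "FSX", "IY", "MN", "QUW", "L", "O", "R", "H"]

# Represent each family by its first letter.
LETTER_TO_FAMILY = {ch: family[0] for family in FAMILIES for ch in family}


def phonetic_key(name):
    """Replace each letter with its family representative; drop other characters."""
    return "".join(
        LETTER_TO_FAMILY[ch] for ch in name.upper() if ch in LETTER_TO_FAMILY
    )


def match_destinations(spelled, database):
    """Return database names whose phonetic-family spelling equals the input's."""
    target = phonetic_key(spelled)
    return [name for name in database if phonetic_key(name) == target]


# "MAIN" misheard letter by letter as N-J-Y-M still maps to the same key.
matches = match_destinations("NJYM", ["Main", "Oak", "Pine"])
```

Because the comparison happens on family representatives, a recognizer only has to get each letter's family right, not the letter itself. The trade-off is that distinct names can collide on the same key, which is presumably why the entry has the user confirm the destination.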
20100324889ENABLING GLOBAL GRAMMARS FOR A PARTICULAR MULTIMODAL APPLICATION - Methods, apparatus, and computer program products are described for enabling global grammars for a particular multimodal application according to the present invention by loading a multimodal web page; determining whether the loaded multimodal web page is one of a plurality of multimodal web pages of the particular multimodal application. If the loaded multimodal web page is one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes loading any currently unloaded global grammars of the particular multimodal application identified in the multimodal web page and maintaining any previously loaded global grammars. If the loaded multimodal web page is not one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes unloading any currently loaded global grammars.12-23-2010
20100121631DATA DETECTION - A method for detecting data in a sequence of characters or text using both a statistical engine and a pattern engine. The statistical engine is trained to recognize certain types of data and the pattern engine is programmed to recognize the grammatical pattern of certain types of data. The statistical engine may scan the sequence of characters to output first data, and the pattern engine may break down the first data into subsets of data. Alternatively, the statistical engine may output items that have a predetermined probability or greater of being a certain type of data and the pattern engine may then detect the data from the output items and/or remove incorrect information from the output items.05-13-2010
20120310631HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT AND THAT EMPLOYS N-GRAM DATA TO LIMIT GENERATION OF LOW-PROBABILITY COMPOUND LANGUAGE SOLUTIONS - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to analyze the combinations of language objects in light of N-gram data stored on the device to avoid proposing low-probability compound language solutions.12-06-2012
20120310630TOKENIZATION PLATFORM - A tokenization platform and method is described for accurately tokenizing character strings, including but not limited to non-delimited character strings of the type commonly used in Internet domain names and computer filenames, to accurately identify words and phrases occurring therein. In one embodiment, a phased tokenization approach is used in which the final phase is a lexical analysis-based tokenization using a dictionary. The dictionary may be advantageously created and updated based upon one or more query logs associated with respective information retrieval systems, thereby ensuring that the dictionary accurately reflects currently-used terminology and captures alternative spellings and presentations of words and phrases submitted by users.12-06-2012
20120310628METHOD AND SYSTEM FOR PROVIDING ACCESS TO INFORMATION OF POTENTIAL INTEREST TO A USER - The present invention provides a method and system for providing access to information of potential interest to a user. Closed-caption information is analyzed to find related information on the Internet. User interactions with a TV which receives programming including closed-caption information are monitored to determine user interests or topics.12-06-2012
20120310627DOCUMENT CLASSIFICATION WITH WEIGHTED SUPERVISED N-GRAM EMBEDDING - Methods and systems for document classification include embedding n-grams from an input text in a latent space, embedding the input text in the latent space based on the embedded n-grams and weighting said n-grams according to spatial evidence of the respective n-grams in the input text, classifying the document along one or more axes, and adjusting weights used to weight the n-grams based on the output of the classifying step.12-06-2012
20090292528APPARATUS FOR PROVIDING INFORMATION FOR VEHICLE - A system is provided with a conversation support means that creates a conversation response and outputs it as sound, text, etc. The response is created by inserting a reference keyword as a leading keyword into a separately prepared response sentence model. The conversation support means retrieves reference keywords, provided beforehand, by dictionary collation from conversation content entered by the user by voice, manual entry, etc. The retrieved reference keyword itself, or another reference keyword associated with it, is handled as the leading keyword. The series of user conversation contents input through the conversation support is accumulated as base data for determining user interest. The base data is analyzed to determine the user's interests in order to provide a suitable information service.11-26-2009
20110153311Method and an apparatus for automatically providing a common modelling pattern - At least one embodiment of the present invention is directed to a method and/or an apparatus for automatically providing a common modelling pattern as a function of a plurality of stored process models. The common modelling patterns are identified according to three substeps, namely semantic annotation, extraction of pattern based description and composite process pattern mining. The detected common modelling patterns serve as best practice candidates as regards process engineering. At least one embodiment of the present invention finds application in a variety of domains being related to process management, such as process design, process mining and semantic process planning.06-23-2011
20110099001System for extracting information from a natural language text - In the method of extraction, the words of the text are encoded by comparing them with the contents of a lexicon of tool words (essentially articles, prepositions, conjunctions, and verbal auxiliaries), and nominal groups are then identified by searching subsets of the resulting succession of encoded words to look for groups of encoded words that comply with predefined syntactical rules.04-28-2011
20100241418VOICE RECOGNITION DEVICE AND VOICE RECOGNITION METHOD, LANGUAGE MODEL GENERATING DEVICE AND LANGUAGE MODEL GENERATING METHOD, AND COMPUTER PROGRAM - A speech recognition device includes one or more intention extracting language models in which an intention of a focused specific task is inherent, an absorbing language model in which no intention of the task is inherent, a language score calculating section that calculates a language score indicating a linguistic similarity between the content of an utterance and each of the intention extracting language models and the absorbing language model, and a decoder that estimates an intention in the content of an utterance based on the language score of each of the language models calculated by the language score calculating section.09-23-2010
20100082332METHODS AND APPARATUS FOR PROTECTING USERS FROM OBJECTIONABLE TEXT - Methods and apparatus are provided for protecting users from objectionable text. Users are protected from objectionable text, by obtaining a predefined acceptable word list containing a plurality of acceptable words; receiving a textual entry from at least one user; and limiting the textual entry to only the acceptable words. The acceptable word list may comprise a dictionary of the acceptable words, and can be maintained by a central server or by a client associated with at least one of the users. The textual entry can be limited by only allowing the user to enter a subsequent character following entry of one or more entered characters if the subsequent character following the one or more entered characters comprises at least a portion of one of the acceptable words. The acceptable word list can optionally be updated with one or more additional acceptable words. The acceptable word list optionally comprises a context sensitive word list or one or more context sensitive rules.04-01-2010
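The character-by-character limiting described in entry 20100082332 can be sketched as a prefix check. This is a minimal sketch under stated assumptions: the word list, the function names, and the single-word scope are all hypothetical, and the abstract does not say how the check is actually implemented.

```python
# Hypothetical acceptable word list; a real list would come from a server or client.
ACCEPTABLE_WORDS = {"hello", "help", "friend"}


def build_prefixes(words):
    """Collect every non-empty prefix of every acceptable word."""
    return {word[:i] for word in words for i in range(1, len(word) + 1)}


def allow_next(entered, ch, prefixes):
    """Allow a character only if the extended entry is still part of an acceptable word."""
    return (entered + ch) in prefixes


def filter_entry(typed, prefixes):
    """Simulate text entry, silently dropping characters that are not allowed."""
    accepted = ""
    for ch in typed.lower():
        if allow_next(accepted, ch, prefixes):
            accepted += ch
    return accepted


prefixes = build_prefixes(ACCEPTABLE_WORDS)
result = filter_entry("helxlo", prefixes)  # the "x" is rejected
```

Precomputing the prefix set keeps each keystroke check to a single set lookup; a trie would serve the same purpose for larger lists.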
20090299731AURAL SIMILARITY MEASURING SYSTEM FOR TEXT - The aural similarity measuring system and method provides a measure of the aural similarity between a target text (12-03-2009
20090299730MOBILE TERMINAL AND METHOD FOR CORRECTING TEXT THEREOF - A method for selecting text created in a mobile terminal by word and correcting it or changing it to another word, and a mobile terminal implementing the same are disclosed. The mobile terminal includes: a display unit to display one or more words of text, and to display tags for each of the one or more words; an input unit to select at least one of the tagged one or more words as selected one word; and a controller to display candidate words having a similar pronunciation to that of the word selected via the input unit, select one of the candidate words as selected one candidate word, and change the selected one word from the text to the selected one candidate word.12-03-2009
20090299729PARALLEL FRAGMENT EXTRACTION FROM NOISY PARALLEL CORPORA - Machine translation algorithms for translating between a first language and a second language are often trained using parallel fragments, comprising a first language corpus and a second language corpus comprising an element-for-element translation of the first language corpus. Such training may involve large training sets that may be extracted from large bodies of similar sources, such as databases of news articles written in the first and second languages describing similar events; however, extracted fragments may be comparatively “noisy,” with extra elements inserted in each corpus. Extraction techniques may be devised that can differentiate between “bilingual” elements represented in both corpora and “monolingual” elements represented in only one corpus, and for extracting cleaner parallel fragments of bilingual elements. Such techniques may involve conditional probability determinations on one corpus with respect to the other corpus, or joint probability determinations that concurrently evaluate both corpora for bilingual elements.12-03-2009
20090292526MONITORING CONVERSATIONS TO IDENTIFY TOPICS OF INTEREST - A system and method for monitoring conversations of a community of users to identify topics of interest is provided. A user community which is based partly on social networking connections relative to a first user is identified. Conversations involving at least one member of the identified user community are monitored. Based in part on an aggregated analysis of the monitored conversations, keywords are selected to present to the first user. The first user is then provided with a display in which the selected keywords associated with the user community are presented to the first user such that the first user can select a keyword to access content associated therewith.11-26-2009
20090292525APPARATUS, METHOD AND STORAGE MEDIUM STORING PROGRAM FOR DETERMINING NATURALNESS OF ARRAY OF WORDS - An apparatus is provided which determines the naturalness of an array of words as a sentence. When an entire source text to be translated is not registered in a lexicon, the source text is divided into plural words. A parallel translation for each word in the source text is obtained to generate parallel translation patterns, and a web search is made for a text which includes each of the parallel translation patterns (step 11-26-2009
20090292529SYSTEM AND METHOD OF PROVIDING A SPOKEN DIALOG INTERFACE TO A WEBSITE - Disclosed is a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method.11-26-2009
20110191099System and Methods for Improving Accuracy of Speech Recognition - The invention provides a system and method for improving speech recognition. A computer software system is provided for implementing the system and method. A user of the computer software system may speak to the system directly and the system may respond, in spoken language, with an appropriate response. Grammar rules may be generated automatically from sample utterances when implementing the system for a particular application. Dynamic grammar rules may also be generated during interaction between the user and the system. In addition to arranging searching order of grammar files based on a predetermined hierarchy, a dynamically generated searching order based on history of contexts of a single conversation may be provided for further improved speech recognition. Dialogue between the system and the user of the system may be recorded and extracted for use by a speech recognition engine to refine or create language models so that accuracy of speech recognition relevant to a particular knowledge area may be improved.08-04-2011
20110191098PHRASE-BASED DOCUMENT CLUSTERING WITH AUTOMATIC PHRASE EXTRACTION - Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.08-04-2011
20110191097Systems and Methods for Word Offensiveness Processing Using Aggregated Offensive Word Filters - Computer-implemented systems and methods are provided for identifying language that would be considered obscene or otherwise offensive to a user or proprietor of a system. A first plurality of offensive words is received, and a second plurality of offensive words is received. A string of words is received, where one or more detected offensive words are selected from the string of words that match words from the first plurality of offensive words or the second plurality of offensive words. The string of words is processed based upon the detection of offensive words in the string of words.08-04-2011
20110137641INFORMATION ANALYSIS DEVICE, INFORMATION ANALYSIS METHOD, AND PROGRAM - An information analysis device (06-09-2011
20110264441NAVLIPI - Articles, surfaces, media or educational material containing a universal script, comprised of glyphs derived almost entirely from the Roman script and with only a few new glyphs, for transcription of all the world's languages, with particular attention to a means for expression of the phonemic idiosyncrasies within and between languages and language families are provided.10-27-2011
20100030553Linguistic Analysis - A method of operating a computer to perform linguistic analysis includes the steps of splitting an input text into words and sentences; for each sentence, comparing phrases in the sentence with known phrases stored in a database, as follows: for each word in the sentence, comparing its value and values of words following it with values of words of stored phrases, starting with the longest stored phrase that starts with that word, and working from longest to shortest; in the event a match is found for two or more consecutive words, and considering the words around the phrase, labelling the matched phrase with an overphrase that describes the grammar use of the matched phrase; after the penultimate word has been compared, recasting the sentence by replacing the matched phrases by their respective overphrases; and then repeating the comparison process with the recast sentence until there is no further recasting.02-04-2010
20100023318METHOD AND DEVICE FOR RETRIEVING DATA AND TRANSFORMING SAME INTO QUALITATIVE DATA OF A TEXT-BASED DOCUMENT - Method for extracting information from a data file comprising a first step wherein the data are transmitted to a device (01-28-2010
20120209594METHOD AND AN APPARATUS TO DISAMBIGUATE REQUESTS - A method and an apparatus to disambiguate requests are presented. In one embodiment, the method includes receiving a request for information from a user. Then data is retrieved from a back-end database in response to the request. Based on a predetermined configuration of a disambiguation system and the data retrieved, the ambiguity within the request is dynamically resolved.08-16-2012
20110119049Specializing disambiguation of a natural language expression - Disambiguation of the meaning of a natural language expression proceeds by constructing a natural language expression, and then incrementally specializing the meaning representation to more specific meanings as more information and constraints are obtained, in accordance with one or more specialization hierarchies between semantic descriptors. The method is generalized to disjunctive sets of interpretations that can be specialized hierarchically.05-19-2011
20110119050Method for the automatic determination of context-dependent hidden word distributions - Described is a method, the Latent Words Language Model (LWLM), that automatically determines context-dependent word distributions (called hidden or latent words) for each word of a text. The probabilistic word distributions reflect the probability that another word of the vocabulary of a language would occur at that position in the text. Furthermore, a method is described to use these word distributions in statistical language processing applications, such as information extraction applications (for example, semantic role labeling, named entity recognition), automatic machine translation, textual entailment, paraphrasing, information retrieval, and speech recognition.05-19-2011
20110119047Joint disambiguation of the meaning of a natural language expression - At least two ambiguous aspects of the meaning of a natural language expression are disambiguated jointly. In the preferred embodiment, word sense ambiguity, reference ambiguity, and relation ambiguity are resolved simultaneously, finding the disambiguation result(s) that simultaneously optimize the weight of the solution, taking into account semantic information, constraints, and common sense knowledge. Choices are enumerated for each constituent being disambiguated, combinations of choices are constructed and evaluated according to semantic information on which meanings are sensible, and the choices with the best weights are selected, with the enumeration pruned aggressively to reduce computational cost.05-19-2011
20100153094TOPIC MAP BASED INDEXING AND SEARCHING APPARATUS - A topic map based indexing apparatus analyzes community Q/A lists to acquire Q/A analysis information, removes redundant answers depending on the Q/A analysis information, removes insignificant answers based on the degree of reliability, ranks answer lists, and extracts the highest ranking answer as a best answer, to thereby store, in a community Q/A topic map, index information containing the community Q/A lists and the Q/A analysis information. A topic map based searching apparatus analyzes a user question to acquire question analysis information, searches similar questions from community Q/A lists belonging to a specific topic node of a pre-stored community Q/A topic map, ranks the searched similar questions depending on the question analysis information, removes redundant answers among answers to the ranked similar questions, ranks the answers, and extracts the highest ranking answer as a best answer.06-17-2010
20120041757HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device.02-16-2012
20090076799Coreference Resolution In An Ambiguity-Sensitive Natural Language Processing System - Technologies are described herein for coreference resolution in an ambiguity-sensitive natural language processing system. Techniques for integrating reference resolution functionality into a natural language processing system can process documents to be indexed within an information search and retrieval system. Ambiguity awareness features, as well as ambiguity resolution functionality, can operate in coordination with coreference resolution. Annotation of coreference entities, as well as ambiguous interpretations, can be supported by in-line markup within text content or by external entity maps. Information expressed within documents can be formally organized in terms of facts, or relationships between entities in the text. Expansion can support applying multiple aliases, or ambiguities, to an entity being indexed so that all of the possible references or interpretations for that entity are captured into the index. Alternative stored descriptions can support retrieval of a fact by either the original description or a coreferential description.03-19-2009
20110307246Methods And Systems For Changing A Communication Quality Of A Communication Session Based On A Meaning Of Speech Data - Methods and systems are described for changing a communication quality of a communication session based on a meaning of speech data. Speech data exchanged between clients participating in a communication session is parsed. A meaning of the parsed speech data is determined. An action is performed to change a communication quality of the communication session based on the meaning of the parsed speech data.12-15-2011
20120041756VOICE RECOGNITION DEVICE - Voice recognition is realized by a pattern matching with a voice pattern model, and when a large number of paraphrased words are required for one facility, such as a name of a hotel or a tourist facility, the pattern matching needs to be performed with the voice pattern models of all the paraphrased words, resulting in an enormous amount of calculation. Further, it is difficult to generate all the paraphrased words, and a large amount of labor is required. A voice recognition device includes: voice recognition means for applying the voice recognition to an input voice by using a language model and an acoustic model, and outputting a predetermined number of recognition results each including a set of a recognition score and a text representation; and N-best candidate rearrangement means for: comparing the recognition result to a morpheme dictionary held in a morpheme dictionary memory; checking whether a representation of the recognition result can be expressed by any one of combinations of the morphemes of the morpheme dictionaries; correcting the recognition score when the representation can be expressed; and rearranging an order according to the corrected recognition score so as to acquire recognition results.02-16-2012
20090276209SYSTEM AND METHOD FOR AUTOMATICALLY PROCESSING CANDIDATE RESUMES AND JOB SPECIFICATIONS EXPRESSED IN NATURAL LANGUAGE INTO A NORMALIZED FORM USING FREQUENCY ANALYSIS - Systems and methods for automatically processing candidate resumes and job specifications expressed in natural language into a normalized form using frequency analysis. A database of elements is provided in which each element is expressed in natural language and at least some of which are associated with a corresponding set of synonymous words or phrases. Candidate resumes and job specifications are received in electronic form and expressed in natural language. The candidate resumes and job specifications are analyzed to extract elements expressed in candidate resumes and job specifications. The extracted elements are compared to the database. For each extracted element, the most frequent element or synonym is identified and used as a common form for the extracted element. A set of candidate resumes is matched with a corresponding job specification by comparing the set of elements expressed in common form for the resumes with the set of elements expressed in common form for the job specification.11-05-2009
20120130705TEXT SEGMENTATION WITH MULTIPLE GRANULARITY LEVELS - Text processing includes: segmenting received text based on a lexicon of smallest semantic units to obtain medium-grained segmentation results; merging the medium-grained segmentation results to obtain coarse-grained segmentation results, the coarse-grained segmentation results having coarser granularity than the medium-grained segmentation results; looking up in the lexicon of smallest semantic units respective search elements that correspond to segments in the medium-grained segmentation results; and forming fine-grained segmentation results based on the respective search elements, the fine-grained segmentation results having finer granularity than the medium-grained segmentation results.05-24-2012
20120130706SYSTEMS AND METHODS FOR CHARACTER CORRECTION IN COMMUNICATION DEVICES - A system and method for character error correction is provided, useful for a user of mobile appliances to produce written text with reduced errors. The system includes an interface, a word prediction engine, a statistical engine, an editing distance calculator, and a selector. A string of characters, known as the inputted word, may be entered into the mobile device via the interface. The word prediction engine may generate word candidates similar to the inputted word using fuzzy logic and user preferences generated from past user behavior. The statistical engine may generate variable error costs determined by the probability of erroneously inputting any given character. The editing distance calculator may determine the editing distance between the inputted word and each of the word candidates by grid comparison using the variable error costs. The selector may choose one or more preferred candidates from the word candidates using the editing distances.05-24-2012
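The grid comparison with variable error costs described in entry 20120130706 resembles a weighted edit distance. The following is a minimal sketch, assuming a standard dynamic-programming grid; the cost table, the default costs, and the function names are hypothetical and are not taken from the patent.

```python
def weighted_edit_distance(typed, candidate, substitution_cost, indel_cost=1.0):
    """Edit distance where each substitution has its own cost.

    substitution_cost maps (typed_char, intended_char) to a cost; pairs that
    are likely typing slips get a low cost. Unlisted pairs cost 1.0.
    """
    rows, cols = len(typed) + 1, len(candidate) + 1
    grid = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        grid[i][0] = i * indel_cost
    for j in range(1, cols):
        grid[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = typed[i - 1], candidate[j - 1]
            sub = 0.0 if a == b else substitution_cost.get((a, b), 1.0)
            grid[i][j] = min(
                grid[i - 1][j] + indel_cost,   # extra character typed
                grid[i][j - 1] + indel_cost,   # character omitted
                grid[i - 1][j - 1] + sub,      # match or substitution
            )
    return grid[-1][-1]


def pick_candidates(typed, candidates, substitution_cost, top_n=1):
    """Rank word candidates by weighted edit distance and keep the closest ones."""
    ranked = sorted(
        candidates,
        key=lambda word: weighted_edit_distance(typed, word, substitution_cost),
    )
    return ranked[:top_n]


# Hypothetical cost: "r" and "t" are neighbouring keys, so the slip is cheap.
costs = {("r", "t"): 0.3}
best = pick_candidates("car", ["cab", "cat"], costs)
```

In the system described, the costs would come from the statistical engine and the candidates from the word prediction engine; here both are hard-coded for illustration.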
20090306961SEMANTIC RELATIONSHIP-BASED LOCATION DESCRIPTION PARSING - An automated arrangement for parsing location descriptions is provided in which semantic verification is integrated into a parsing process to reduce the generation of false results. The semantic verification involves checking up to three semantic relationships between keywords (i.e., syntactical components) parsed from the location description in a tokenization process to determine if a tokenization result is valid. The semantic relationships include: a) a spatial “part-of” relationship between location keywords; b) a spatial “near-by” relationship; and, c) a spatial “intersect” relationship. The semantic relationships between particular locations may be pre-calculated and stored as extended vocabulary to enable the semantic verification to occur early in the parsing process to thus increase overall parsing efficiency. The results of the parsing are sorted based on a rank score that is derived using the semantic relationships between the locations.12-10-2009
20120253788Augmented Conversational Understanding Agent - An augmented conversational understanding agent may be provided. Upon receiving, by an agent, at least one natural language phrase from a user, a context associated with the at least one natural language phrase may be identified. The natural language phrase may be associated, for example, with a conversation between the user and a second user. An agent action associated with the identified context may be performed according to the at least one natural language phrase and a result associated with performing the action may be displayed.10-04-2012
20090006078METHOD AND SYSTEM FOR NATURAL LANGUAGE DICTIONARY GENERATION - A method and computer system for analyzing a text corpus in a natural language is provided. An initial morphological description having word inflection rules for various groups of words in the natural language is created by a linguist. A plurality of text corpuses are analyzed to obtain information on the occurrence of a plurality of word forms for each word token in each text corpus. A morphological dictionary which contains information about each base form and word inflection rules for each word token with verified hypothesis is generated.01-01-2009
20120004901PHONETIC KEYS FOR THE JAPANESE LANGUAGE - Various embodiments of phonetic keys for the Japanese language are described herein. A Kana rule set is applied to Kana characters provided by a user. The Kana characters are defined in an alphabetic language based on the sound of the Kana characters. A full phonetic key is then generated based on the defined Kana characters. A replaced-vowel phonetic key is generated by replacing a vowel in the full phonetic key and a no-vowel phonetic key is generated by removing the vowel in the full phonetic key. Kana records in a database are then processed to determine a relevant Kana record that has a phonetic key identical to at least one of the full phonetic key, the replaced-vowel phonetic key, and the no-vowel phonetic key. The relevant Kana records are then presented to the user.01-05-2012
20120004905TECHNIQUES FOR CREATING COMPUTER GENERATED NOTES - Text is extracted from an information resource such as documents, emails, relational database tables and other digitized information sources. The extracted text is processed using a decomposition function to create nodes. Nodes are a particular data structure that stores elemental units of information. The nodes can convey meaning because they relate a subject term or phrase to an attribute term or phrase. Removed from the node data structure, the node contents are or can become a text fragment which conveys meaning, i.e., a note. The notes generated from each digital resource are associated with the digital resource from which they are captured. The notes are then stored, organized and presented in several ways which facilitate knowledge acquisition and utilization by a user.01-05-2012
20120004904METHOD AND SYSTEM FOR PROVIDING REPRESENTATIVE PHRASE - A method and system for providing a representative phrase corresponding to a real time (current time) popular keyword. The method and system may extend a representative criterion word, determined by analyzing morphemes of words in documents grouped into a cluster, and may combine the extended representative criterion word and the popular keyword, thereby providing the representative phrases. The method and system may display the popular keyword and the representative phrases on a web page, or the like.01-05-2012
20120004902Computerized Selection for Healthcare Services - A method for producing healthcare data records from graphical inputs by computer users. Includes generating a plurality of user input categories, displaying on a graphical display icons that correspond to a first of the user input categories and receiving a first user selection of a first icon of the plurality of icons, and displaying on the graphical display a plurality of icons that correspond to a second of the user input categories and receiving a second user selection of a second icon of the plurality of icons. The method also includes displaying icons that correspond to a physical target on which the medical action or observation is performed and receiving a third user selection of the physical target, and applying a syntax to populate a data record of the action using the at least two of the first, second, and third user selections.01-05-2012
20100010805RELATIVE DELTA COMPUTATIONS FOR DETERMINING THE MEANING OF LANGUAGE INPUTS - A method for processing language input can include the step of determining at least two possible meanings for a language input. For each possible meaning, a probability that the possible meaning is a correct interpretation of the language input can be determined. At least one relative delta computation can be computed based at least in part upon the probabilities. At least one irregularity within the language input can be detected based upon the relative delta computation. The irregularity can include mumble, ambiguous input, and/or compound input. At least one programmatic action can be performed responsive to the detection of the irregularity.01-14-2010
20120046939SYSTEMS AND METHODS FOR GENERATING WEIGHTED FINITE-STATE AUTOMATA REPRESENTING GRAMMARS - A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string.02-23-2012
20120046937SEMANTIC CLASSIFICATION OF VARIABLE DATA CAMPAIGN INFORMATION - A method and system for semantically classifying variable data campaign information. The method and system include loading, by a processing device, a variable data campaign from a computer readable storage medium operably connected to the processing device; extracting, by the processing device, variable data from the campaign; semantically classifying, by the processing device, the variable data to produce semantically classified variable data; building, by the processing device, a variable data campaign model based upon the semantically classified variable data; and storing, by the processing device, the variable data campaign model in the computer readable storage medium.02-23-2012
20120046936System and method for distributed audience feedback on semantic analysis of media content - A system and computer implemented method of distributed audience feedback of media content in real time or substantially real time, including: semantically analyzing, at a semantic speech analysis engine, media content from a media program and identifying relevant topic data; distributing, at a topic data publisher, the identified relevant topic data to an audience of the media program; collecting, at a server, audience opinions on the identified relevant topic data; and processing the collected audience opinions. Other embodiments are disclosed.02-23-2012
20120010876VOICE INTEGRATION PLATFORM - A voice integration platform and method provide for integration of a voice interface with a data system that includes stored data. The voice integration platform comprises one or more generic software components, the generic software components being configured to enable development of a specific voice user interface that is designed to interact with the data system in order to present the stored data to a user.01-12-2012
20120010875CLASSIFYING TEXT VIA TOPICAL ANALYSIS, FOR APPLICATIONS TO SPEECH RECOGNITION - An assignment device (01-12-2012
20110166853Method and System for Text Retrieval for Computer-Assisted Item Creation - A tool, method, and system for use in the development of sentence-based test items are disclosed. The tool may include a user interface that may include a database selection field, a sentence pattern entry field, an option pane, and an output pane. The tool may search a database for one or more sentences and may generate one or more responses to the one or more sentences. The one or more sentences and one or more responses may be used to produce the sentence-based test items. The tool may allow test items to be developed more quickly and easily than manual test item authoring. Accordingly, test item development costs may be lowered and test security may be enhanced.07-07-2011
20110166852DIALOGUE SYSTEM USING EXTENDED DOMAIN AND NATURAL LANGUAGE RECOGNITION METHOD AND COMPUTER-READABLE MEDIUM THEREOF - A dialogue system uses an extended domain in order to have a dialogue with a user using natural language. If a dialogue pattern actually input by the user is different from a dialogue pattern predicted by an expert, an extended domain generated in real time based on user input is used and an extended domain generated in advance is used to have a dialogue with the user.07-07-2011
20110166851Word-Level Correction of Speech Input - The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.07-07-2011
20110166850CROSS-GUIDED DATA CLUSTERING BASED ON ALIGNMENT BETWEEN DATA DOMAINS - A system and associated method for cross-guided data clustering by aligning target clusters in a target domain to source clusters in a source domain. The cross-guided clustering process takes the target domain and the source domain as inputs. A common word attribute shared by both the target domain and the source domain is a pivot vocabulary, and all other words in both domains are a non-pivot vocabulary. The non-pivot vocabulary is projected onto the pivot vocabulary to improve measurement of similarity between data items. Source centroids representing clusters in the source domain are created and projected to the pivot vocabulary. Target centroids representing clusters in the target domain are initially created by conventional clustering method and then repetitively aligned to converge with the source centroids by use of a cross-domain similarity graph that measures a respective similarity of each target centroid to each source centroid.07-07-2011
20120065959WORD GRAPH - One example embodiment includes a method for constructing a word graph. The method includes obtaining a subject text and dividing the subject text into one or more units. The method also includes dividing the units into one or more sub-units and recording each of the one or more sub-units.03-15-2012
20120016663IDENTIFYING RELATED NAMES - Provided are techniques for identifying related names. A collection of names from different languages is stored, wherein each of the names has a native orthographic form and a romanized form. An input name is received in a known encoding scheme. An alphabet of the input name is determined based on the known encoding scheme. One or more romanized names are generated based on the query name and the determined query name alphabet. Culture-sensitive regularization rules are applied to create an additional romanized name. The one or more romanized names and the additional romanized name are matched against the romanized names in the collection of names from the different languages. Data store records that have romanized names that match the one or more romanized names or the additional romanized name are returned.01-19-2012
20120016662METHOD AND APPARATUS FOR PROCESSING BIOMETRIC INFORMATION USING DISTRIBUTED COMPUTATION - An approach is provided for providing biometric information processing using distributed computation. A biometric information processing infrastructure determines to receive an input including, at least in part, biometric information. The biometric information processing infrastructure selects one or more analyses for processing the input. The biometric information processing infrastructure also determines one or more processes associated with the one or more analyses. The biometric information processing infrastructure further determines to derive one or more computation closures from the one or more processes. The biometric information processing infrastructure determines to decompose the one or more computation closures for distribution in one or more computation spaces.01-19-2012
20120016661SYSTEM, METHOD AND DEVICE FOR INTELLIGENT TEXTUAL CONVERSATION SYSTEM - A method of intelligent textual markup in an information exchange includes: determining semantic elements in said information exchange; determining relations between said semantic elements; representing said semantic elements as nodes in a directed graph; and representing said relations as edges connecting said nodes. A data processing system for enabling a visual representation of semantic relations in an information exchange includes: a semantic analysis engine adapted to determine semantic elements of said information exchange; a relation analysis engine adapted to determine relations between said semantic elements; and a presentation engine adapted to present said semantic elements as nodes and said relations as edges in a directed graph representing said information exchange.01-19-2012
20120022858HANDHELD ELECTRONIC DEVICE AND ASSOCIATED METHOD EMPLOYING A MULTIPLE-AXIS INPUT DEVICE AND PROVIDING A LEARNING FUNCTION IN A TEXT DISAMBIGUATION ENVIRONMENT - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.01-26-2012
20120022857SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE - A cooperative conversational voice user interface is provided. The cooperative conversational voice user interface may build upon short-term and long-term shared knowledge to generate one or more explicit and/or implicit hypotheses about an intent of a user utterance. The hypotheses may be ranked based on varying degrees of certainty, and an adaptive response may be generated for the user. Responses may be worded based on the degrees of certainty and to frame an appropriate domain for a subsequent utterance. In one implementation, misrecognitions may be tolerated, and conversational course may be corrected based on subsequent utterances and/or responses.01-26-2012
20120022856Browsing of Contextual Information - Systems and methods for searching and browsing a data store of contextually related data objects. The system includes a search/browse module that receives a search query. The search/browse module identifies data objects that match the search query and generates sentences from data objects that are contextually related to the matching data objects. The sentences are human-readable sentences, for example in subject-verb-object format, where each sentence represents the relationship between two data objects. The sentences are output for display as a hierarchy of sentences. Additionally, a user can browse the data store of contextually related data objects by selecting a sentence that is displayed to the user. The search/browse module then outputs attributes of the data object represented by the sentence for display in two separate regions of a user interface.01-26-2012
20120022855Searching and Browsing of Contextual Information - Systems and methods for searching and browsing a data store of contextually related data objects. The system includes a search/browse module that receives a search query. The search/browse module identifies data objects that match the search query and generates sentences from data objects that are contextually related to the matching data objects. The sentences are human-readable sentences, for example in subject-verb-object format, where each sentence represents the relationship between two data objects. The sentences are output for display as a hierarchy of sentences. Additionally, a user can browse the data store of contextually related data objects by selecting a sentence that is displayed to the user. The search/browse module then outputs attributes of the data object represented by the sentence for display in two separate regions of a user interface.01-26-2012
20120022854INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM - An apparatus and method provide logic for processing information. In one implementation, an apparatus includes a receiving unit configured to receive a selection of displayed content from a user. An obtaining unit is configured to obtain data corresponding to the selection. The data includes text data. An identification unit is configured to identify a keyword within the text data, and a storage unit is configured to generate a command to transmit the keyword to a device.01-26-2012
20110093256Method of disambiguating information - A preferred method and system for disambiguating information are disclosed. In a preferred method, the word elements of homonyms form a plurality of information corpuses which are analyzed by conceptual and/or grammatical relational analysis such as CIRN for identifying successful outcomes and unsuccessful outcomes; wherein said successful outcomes involve the proper grammatical classification of the homonym, thus leading to identification of the correct meaning.04-21-2011
20110093259METHOD AND DEVICE FOR GENERATING VOCABULARY ENTRY FROM ACOUSTIC DATA - A method and a device (04-21-2011
20110093258SYSTEM AND METHOD FOR TEXT CLEANING - A method and system for cleaning an electronic document are provided. The method comprises: identifying at least one sentence in the electronic document; numerically representing features of the sentence to obtain a numeric feature representation associated with the sentence; inputting the numeric feature representation into a machine learning classifier, the machine learning classifier being configured to determine, based on each numeric feature representation, whether the sentence associated with that numeric feature representation is a bad sentence; and removing sentences determined to be bad sentences from the electronic document to create a cleaned document.04-21-2011
20110093257Information retrieval through indentification of prominent notions - A system and method for information retrieval from a corpus of text based on offline prominent sentences extraction, and online prominent sentences retrieval ordered by predefined criteria, and recommending online cross-interest prominent sentences.04-21-2011
20120059647Touchless Texting Exercise - A method, system, and computer program product are provided for touchless texting that enhances user activity. A plurality of graphical images are displayed on a computer display. An exercise motion is detected using a camera, and the motion is resolved to a selected graphical image from the plurality of graphical images. The selected graphical image is entered into an application.03-08-2012
20120065961SPEECH MODEL GENERATING APPARATUS, SPEECH SYNTHESIS APPARATUS, SPEECH MODEL GENERATING PROGRAM PRODUCT, SPEECH SYNTHESIS PROGRAM PRODUCT, SPEECH MODEL GENERATING METHOD, AND SPEECH SYNTHESIS METHOD - According to one embodiment, a speech model generating apparatus includes a spectrum analyzer, a chunker, a parameterizer, a clustering unit, and a model training unit. The spectrum analyzer acquires a speech signal corresponding to text information and calculates a set of spectral coefficients. The chunker acquires boundary information indicating a beginning and an end of linguistic units and chunks the speech signal into linguistic units. The parameterizer calculates a set of spectral trajectory parameters for a trajectory of the spectral trajectory parameters of the linguistic unit on the basis of the spectral coefficients. The clustering unit clusters the spectral trajectory parameters calculated for each of the linguistic units into clusters on the basis of linguistic information. The model training unit obtains a trained spectral trajectory model indicating a characteristic of a cluster based on the spectral trajectory parameters belonging to the same cluster.03-15-2012
20120158400METHODS AND SYSTEMS FOR KNOWLEDGE DISCOVERY - In an aspect, provided is a Natural Language Processing (NLP) workflow engine to analyze text. The engine can combine one or more independent NLP components (e.g. Tokenization, Part of Speech Tagging, Named Entity Recognition) into a meaningful processing workflow.06-21-2012
20120209592STATISTICAL STEMMING - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating suffix rewriting rules. A method includes obtaining a plurality of canonical suffix-rewriting rules each associated with one or more words, generating a suffix tree from the words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of final suffix-rewriting rules from the nodes in the minimum colored subset. Another method includes receiving applicable and non-applicable words for a suffix-rewriting rule, generating a suffix tree from the applicable words and the non-applicable words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of suffix-rewriting rules, wherein each rule corresponds to a node in the minimum colored subset with a valid status.08-16-2012
20120209593HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING QUICK TEXT ENTRY IN A MESSAGE - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message.08-16-2012
20120065962Systems and Methods of Building and Using Custom Word Lists - Standard word lists that are often used for such operations as predictive text, spell checking, and word completion are based on general linguistic data that might not accurately reflect actual text usage patterns of particular users. Systems and methods of building and using a custom word list for use in text operations on an electronic device are provided. A collection of text items associated with a user of the electronic device is scanned to identify words in the text items. A weighting is then assigned to each identified word, and the words and corresponding weightings are stored.03-15-2012
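The scan, weight, and store steps in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not say how words are identified or how the weighting is computed, so the regular-expression tokenizer and the relative-frequency weighting are illustrative choices, and `build_custom_word_list` is a hypothetical name.

```python
import re
from collections import Counter


def build_custom_word_list(text_items):
    """Scan a user's text items and return a {word: weighting} mapping.

    Assumption: the weighting is the word's relative frequency across all items.
    """
    counts = Counter()
    for item in text_items:
        # Assumed tokenization: lowercase runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", item.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}
```

A predictive-text or word-completion routine could then rank candidate words by these weightings instead of by general linguistic data.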
20110071819APPARATUS, SYSTEM, AND METHOD FOR NATURAL LANGUAGE PROCESSING - Various embodiments are described for searching and retrieving documents based on a natural language input. A computer-implemented natural language processor electronically receives a natural language input phrase from an interface device. The natural language processor attributes a concept to the phrase with the natural language processor. The natural language processor searches a database for a set of documents to identify one or more documents associated with the attributed concept to be included in a response to the natural language input phrase. The natural language processor maintains the concepts during an interactive session with the natural language processor. The natural language processor resolves ambiguous input patterns in the natural language input phrase with the natural language processor. The natural language processor includes a processor, a memory and/or storage component, and an input/output device.03-24-2011
20120072204SYSTEMS AND METHODS FOR NORMALIZING INPUT MEDIA - A method and system for processing input media for provision to a text to speech engine comprising: a rules engine configured to maintain and update rules for processing the input media; a pre-parsing filter module configured to determine one or more metadata attributes using pre-parsing rules; a parsing filter module configured to identify content component from the input media using the parsing rules; a context and language detector configured to determine a default context and a default language; a learning agent configured to divide the content component into units of interest; a tagging module configured to iteratively assign tags to the units of interest using the tagging rules, wherein each tag is associated with a post-parsing rule; a post-parsing filter module configured to modify the content component by executing the post-parsing rules identified by the tags assigned to the phrases and strings. The context and language detector, tagging module, learning agent and post-parsing filter module are configured to iteratively process the content component and modifications thereto until there are no further modifications or a threshold number of iterations are performed.03-22-2012
20100042403CONTEXT BASED ONLINE ADVERTISING - A software and/or hardware facility for inferring user context and delivering advertisements, such as coupons, using natural language and/or sentiment analysis is disclosed. The facility may infer context information based on a user's emotional state, attitude, needs, or intent from the user's interaction with or through a mobile device. The facility may then determine whether it is appropriate to deliver an advertisement to the user and select an advertisement for delivery. The facility may also determine an appropriate expiration time and/or discount amount for the advertisement.02-18-2010
20130185059Method and System for Automatically Detecting Morphemes in a Task Classification System Using Lattices - The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication.07-18-2013
20130185056SYSTEM FOR GENERATING TEST SCENARIOS AND TEST CONDITIONS AND EXPECTED RESULTS - A requirements testing system facilitates the review and analysis of requirement statements for software applications. The requirements testing system automatically generates test artifacts from the requirement statements, including test scenarios, test conditions, test hints, and expected results. These test artifacts characterize the requirements statements to provide valuable analysis information that aids understanding what the intentions of the requirement statements are. The automation of the generation of these test artifacts produces numerous benefits, including fewer errors, objectivity, and no dependency on the skills and experience of a creator.07-18-2013
20110087485NET MODERATOR - A method and an apparatus for moderating an inappropriate relationship between two parties by analyzing a dialog between the two parties. The method and apparatus creates an alert depending upon the nature of the dialog between the two parties. The alert is sent to a third party who can moderate the relationship between the two parties. The third party can ban or block the dialog between the two parties based upon the inappropriate relationship between the two parties. A banning or block of the dialog between the two parties can also be automated.04-14-2011
20130185054TECHNIQUES FOR INSERTING DIACRITICAL MARKS TO TEXT INPUT VIA A USER DEVICE - A computer-implemented method for assisting a user to input Vietnamese text to a user device lacking a subset of characters in a Vietnamese alphabet includes receiving a character input by a user, determining three words previously input by the user, the three words having already had diacritical marks inserted, transmitting the three words and the character to a server via a network, receiving first and second information corresponding to the character from the server via the network, the first and second information generated at the server based on a context of the three words, the context determined at the server using a language model, the first and second information indicating whether the character requires a diacritical mark and a specific diacritical mark, respectively, generating a modified character comprising a character in the Vietnamese alphabet based on the character and the first and second information, and displaying the modified character.07-18-2013
20120158399SAMPLE CLUSTERING TO REDUCE MANUAL TRANSCRIPTIONS IN SPEECH RECOGNITION SYSTEM - Techniques for grouping a plurality of samples automatically transcribed from a plurality of utterances. The method comprises forming clusters from the plurality of samples, wherein the clusters include two or more of the plurality of samples. One or more samples are selected from a cluster and manually-processed data samples for the one or more samples are obtained. A weighting factor may be assigned to the data samples based, at least in part, on the number of samples in the cluster associated with the selected data sample.06-21-2012
20110066424Text Stitching From Multiple Images - A reading machine has processing for detecting common text between a pair of individual images. The reading machine combines the text from the pair of images into a file or data structure if common text is detected, and determines if incomplete text phrases are present in the common text. If incomplete text phrases are present, the machine signals a user to move an image input device in a direction to capture more of the text.03-17-2011
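One simple way to picture the common-text step in the abstract above is to treat the text recognized from each image as a word sequence and look for the longest run of words that ends the first text and begins the second. That overlap rule is an assumption made for illustration, since the abstract does not state how common text is detected, and `stitch` is a hypothetical name.

```python
def stitch(left, right, min_overlap=1):
    """Merge two recognized texts that share words at the seam.

    Returns the combined text, or None when no common text is found.
    Assumption: the common text is a suffix of `left` and a prefix of `right`.
    """
    left_words, right_words = left.split(), right.split()
    longest = min(len(left_words), len(right_words))
    # Try the longest possible overlap first, then shrink.
    for size in range(longest, min_overlap - 1, -1):
        if left_words[-size:] == right_words[:size]:
            return " ".join(left_words + right_words[size:])
    return None
```

A `None` result corresponds to the case where no common text is detected, which is where a device might prompt the user to move the image input device.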
20120130707Linguistic Assistance Systems And Methods - Systems and methods determine a linguistic preference between two or more phrases. Each of the phrases is submitted to at least one search engine as a search string. Search results are retrieved from each of the at least one search engine for each submitted search string and total hit values of each search result are compared. One of the two or more phrases associated with the greatest total hit value is displayed to a user as the preferred phrase.05-24-2012
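The comparison described in the abstract above can be sketched without committing to any real search API. In this sketch each search engine is represented by a hypothetical callable that takes a search string and returns a hit count; summing the counts across engines is an assumption about how total hit values are combined.

```python
def preferred_phrase(phrases, engines):
    """Return (best_phrase, totals) given candidate phrases and search engines.

    `engines` is a list of hypothetical callables: search string -> hit count.
    """
    totals = {
        phrase: sum(engine(phrase) for engine in engines)
        for phrase in phrases
    }
    # The phrase with the greatest total hit value is treated as preferred.
    best = max(totals, key=totals.get)
    return best, totals
```

In practice each callable would wrap a real search service; stub functions backed by fixed counts are enough to exercise the comparison logic.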
20110106528Method and System to Automatically Change or Update the Configuration or Setting of a Communication System - A method and device for automatically changing or updating a configuration or setting of a communication system is disclosed. In one aspect, the method includes providing information to the communication system, the information comprising natural human language, storing the information in a digital storage device, detecting a triggering event in the information, and changing the configuration or setting of the communication system automatically using a processor. The information is an input to the communication system, an input from at least one alternate communication system, or a combination of an input to the communication system and an input from the at least one alternate communication system.05-05-2011
20110106526METHOD AND APPARATUS FOR PRUNING SIDE INFORMATION FOR GRAMMAR-BASED COMPRESSION - A computer-implemented method for generating side information for grammar-based data compression systems, such as YK compression systems, is described. An admissible grammar (G) for an input sequence (A(s05-05-2011
20110106527Method and Apparatus for Adapting a Voice Extensible Markup Language-enabled Voice System for Natural Speech Recognition and System Response - A system for analyzing natural language spoken through a voice recognition system comprising: a language separator for separating a natural language expression into multiple word segments; and a grammar module for creating XML-based description sets or binary sets using word segments as input. In a preferred embodiment, the word segments are further processed as class objects and then organized according to original spoken order and wherein content fields are created to contain the class objects for comparison during voice interaction using the voice recognition system.05-05-2011
20090083026Summarizing document with marked points - A summary of a text document may be presented in the form of a list of points. A summary of text can be created by choosing words or groups of words from the original text, by modifying words in the original text, etc. Collections of the chosen words can be presented in a list form together with a mark that indicates that the text is a list of words that might not form complete sentences. Presentation of a summary in list form may lower a reader's expectation as to readability issues such as sentence flow, word flow, etc., and thus the reader may be more accepting of a machine-generated summary presented in list form than of a machine generated summary presented as sentences or paragraphs.03-26-2009
20100094618Transcription data extraction - A computer program product, for performing data determination from medical record transcriptions, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the transcription for an indicating phrase associated with a type of data desired to be determined from the transcription, the type of desired data being relevant to medical records, determine whether data indicated by text disposed proximately to the indicating phrase is of the desired type, and store an indication of the data if the data is of the desired type.04-15-2010
20120284016TEXT MINING METHOD, TEXT MINING DEVICE AND TEXT MINING PROGRAM - Disclosed are a text mining method, device, and program capable of performing text mining with a specific topic as an object with high precision. An element identification unit calculates a feature degree, which is an index for indicating a degree that within a text set of interest, which is a set of text that is to be analyzed, an element of the text appears. An output unit identifies distinctive elements within the text set of interest on the basis of the calculated feature degree and outputs the identified elements. The element identification unit corrects the feature degree on the basis of a topic relatedness degree, which is a value indicating a degree related to a topic of analysis, which is a topic for which each text portion of the text being analyzed has been partitioned into predetermined units that are to be analyzed.11-08-2012
20120316866SYSTEM AND METHOD OF PROVIDING A SPOKEN DIALOG INTERFACE TO A WEBSITE - Disclosed is a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method.12-13-2012
20120316865INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - An information processing apparatus performs topic analysis on one or more collected documents to calculate a probability indicating the degree of fitness of each sentence constituting the collected document for each item of a local topic, performs linguistic analysis on the collected document to detect a unique expression pattern in each item of the local topic, sets topic usefulness for each sentence constituting the collected document on the basis of evaluation of the sentence by an evaluator, sets a total evaluation value with respect to each item of the local topic on the basis of the topic analysis result and the topic usefulness, selects an item of the local topic on the basis of the total evaluation values, and extracts an appropriate sentence for a unique expression pattern in the selected item of the local topic from the collected document as a profound text candidate.12-13-2012
20120316864READING ORDER DETERMINATION APPARATUS, METHOD, AND PROGRAM FOR DETERMINING READING ORDER OF CHARACTERS - A method and apparatus for determining a reading order of characters. The method includes preparing a list of character information, which is character information extracted from image data by character recognition processing and preparing a list of line information, which is made up of a line box surrounding a set of characters which are continuously aligned in the same direction in image data and an alignment direction of characters in the line box. In response to a request for adding character information to the list of character information, extracting a line box containing a character region of the character to be added, obtaining all character information having the character region contained in the concerned line box from the list of character information and rearranging according to the position with respect to the alignment direction of characters corresponding to the line box to determine a new reading order of characters.12-13-2012
20120123768METHOD AND APPARATUS FOR DETERMINING TEXT PASSAGE SIMILARITY - According to one embodiment of the invention, a method includes classifying a number of noun phrases in a first text passage and a second text passage into a number of classifications. The method also includes determining a similarity between a noun phrase from the first text passage and a noun phrase from the second text passage for each of the noun phrases of a same classification. Additionally, a similarity between a sentence from the first text passage and a sentence from the second text passage is determined for each of the sentences in the first and second text passages based on similarities between the noun phrases. The method also includes determining a similarity between the first text passage and the second text passage based on a similarity between sentences.05-17-2012
20120123767AUTOMATICALLY ASSESSING DOCUMENT QUALITY FOR DOMAIN-SPECIFIC DOCUMENTATION - Methods and arrangements for document quality assessment. Documents are accepted and a quality specification containing predetermined quality criteria is assimilated. Each document is assessed based on the predetermined quality criteria, and a quality score is assigned to each document, the quality score being a function of positive and negative attributes assessed for each document.05-17-2012
20090132236SELECTION OF RELIABLE KEY WORDS FROM UNRELIABLE SOURCES IN A SYSTEM AND METHOD FOR CONDUCTING A SEARCH - The invention provides for a system to select data including a reception component that receives at least one data entry from at least one data source, a processor component to determine the entropy of a word extracted from the at least one data entry, a filtering component to select reliable words, wherein reliable words are words with low entropy values, the filtering component further excluding words with high entropy values, and a transmission component to output a set of reliable words, wherein the set of reliable words is associated with the at least one data entry from which the reliable words were extracted.05-21-2009
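The entropy-based filtering described in the entry above can be illustrated with a minimal sketch. The abstract does not say over which distribution the entropy is computed; the sketch assumes each data entry carries a category label and measures how evenly a word's occurrences spread across those labels. The function names `word_entropies` and `reliable_words` and the `max_entropy` threshold are hypothetical and are not taken from the application.

```python
import math
from collections import Counter, defaultdict


def word_entropies(entries):
    """Return {word: entropy in bits} for (label, text) pairs.

    Assumption: entropy is taken over the labels of the entries in which a
    word appears. A word seen under a single label has entropy 0.
    """
    counts = defaultdict(Counter)  # word -> Counter of labels
    for label, text in entries:
        for word in set(text.lower().split()):
            counts[word][label] += 1

    entropies = {}
    for word, by_label in counts.items():
        total = sum(by_label.values())
        entropies[word] = -sum(
            (c / total) * math.log2(c / total) for c in by_label.values()
        )
    return entropies


def reliable_words(entries, max_entropy=0.5):
    """Keep low-entropy words and exclude high-entropy ones."""
    return {w for w, h in word_entropies(entries).items() if h <= max_entropy}


if __name__ == "__main__":
    sample = [
        ("pizza", "best pizza place"),
        ("pizza", "pizza place downtown"),
        ("bank", "best bank downtown"),
    ]
    print(sorted(reliable_words(sample)))
```

In this toy input, words such as "best" and "downtown" occur under both labels and are filtered out, while "pizza", "place", and "bank" are kept.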
20120166179SYSTEM AND METHOD FOR CLASSIFYING COMMUNICATIONS THAT HAVE LOW LEXICAL CONTENT AND/OR HIGH CONTEXTUAL CONTENT INTO GROUPS USING TOPICS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for identifying document topics. A system configured to practice the method receives a document from a corpus of documents, learns interpersonal relationships of users associated with the document, performs a lexical analysis of the document, and, based on the interpersonal relationships of the users and the lexical analysis, identifies a topic for the document. The approaches disclosed herein can integrate user-people relationships to identify topics for documents with low lexical or high contextual content. The system can learn this user-people relationship from context. The system uses this learned behavior to identify communication documents correctly. Another aspect is the separation of the two phases. The system overlays the learned model on the lexical topic analysis, allowing the system to capture user-defined topics and user behavior that is learned from other factors such as medium (calls, events, etc.) or user preferences.06-28-2012
20100250237INTERACTIVE MANUAL, SYSTEM AND METHOD FOR VEHICLES AND OTHER COMPLEX EQUIPMENT - A method and system of providing an interactive manual, including a speech engine to receive and process speech from a user, convert the speech into a word sequence, and identify meaning structures from the word sequence, a structured manual including information related to an operation of a device, a visual model to relate visual representation of the information, a dialog management arrangement to interpret the meaning structures in a context and to extract pertinent information and the visual representation from the structured manual and the visual model, and an output arrangement to output the information and visual representation.09-30-2010
20100250235TEXT ANALYSIS USING PHRASE DEFINITIONS AND CONTAINERS - In one example, a phrase analyzer may analyze a text input stream to identify phrases contained in the text input stream. The phrase analyzer may receive a specification, which includes dictionaries of phrases and synonyms, and a specification of the phrases, or sequences of phrases to be matched. The phrase analyzer may compare the input stream to the specification and may produce, as output, an identification of which phrases appear in the input stream, and where in the input stream those phrases occur.09-30-2010
20100250238Lexical Association Metric for Knowledge-Free Extraction of Phrasal Terms - A method and system for determining a lexical association of phrasal terms are described. A corpus having a plurality of words is received, and a plurality of contexts including one or more context words proximate to a word in the corpus is determined. An occurrence count for each context is determined, and a global rank is assigned based on the occurrence count. Similarly, a number of occurrences of a word being used in a context is determined, and a local rank is assigned to the word-context pair based on the number of occurrences. A rank ratio is then determined for each word-context pair. A rank ratio is equal to the global rank divided by the local rank for a word-context pair. A mutual rank ratio is determined by multiplying the rank ratios corresponding to a phrase. The mutual rank ratio is used to identify phrasal terms in the corpus.09-30-2010
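The rank-ratio computation in the entry above can be sketched for two-word phrases. This is an illustration under stated assumptions, not the application's implementation: contexts are limited to the immediate neighbour (`("_", w2)` for the first word, `(w1, "_")` for the second), the local rank orders the contexts observed for one word by pair count, and tied counts share a rank. The helper names `dense_ranks` and `mutual_rank_ratios` are hypothetical.

```python
from collections import Counter, defaultdict


def dense_ranks(counts):
    """Rank keys by descending count; rank 1 is most frequent, ties share a rank."""
    ordered = sorted(set(counts.values()), reverse=True)
    rank_of = {c: i + 1 for i, c in enumerate(ordered)}
    return {key: rank_of[c] for key, c in counts.items()}


def mutual_rank_ratios(tokens):
    """Return {(w1, w2): mutual rank ratio} for every bigram in tokens."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter()          # context -> total occurrences
    pair_counts = defaultdict(Counter)  # word -> Counter of its contexts

    for (w1, w2), n in bigrams.items():
        left_ctx = ("_", w2)   # context in which w1 occurs
        right_ctx = (w1, "_")  # context in which w2 occurs
        context_counts[left_ctx] += n
        context_counts[right_ctx] += n
        pair_counts[w1][left_ctx] += n
        pair_counts[w2][right_ctx] += n

    global_rank = dense_ranks(context_counts)
    local_rank = {w: dense_ranks(c) for w, c in pair_counts.items()}

    scores = {}
    for w1, w2 in bigrams:
        # Rank ratio = global rank / local rank for each word-context pair.
        rr1 = global_rank[("_", w2)] / local_rank[w1][("_", w2)]
        rr2 = global_rank[(w1, "_")] / local_rank[w2][(w1, "_")]
        # Mutual rank ratio = product of the rank ratios of the phrase.
        scores[(w1, w2)] = rr1 * rr2
    return scores


if __name__ == "__main__":
    corpus = "hot dog the dog the cat the hot dog".split()
    for phrase, score in sorted(mutual_rank_ratios(corpus).items()):
        print(phrase, score)
```

A context that is comparatively rare overall but dominant for a particular word yields a large ratio, so in the toy corpus "hot dog" scores higher than "the dog".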
20100250236COMPUTER-ASSISTED ABSTRACTION OF DATA AND DOCUMENT CODING - A computer-assisted method of abstracting and coding data, which includes receiving one or more documents, is disclosed. The methods and systems extract information from a record based on extraction rules that correspond to an identified record type, determine codes corresponding to the information extracted from the record, present the correspondence between the extracted information and the codes, receive from the user-input device a validation of the correspondence between the extracted information and one of the codes, and output a report including the validated information and the validated code.09-30-2010
20120084076CONTEXT-BASED DISAMBIGUATION OF ACRONYMS AND ABBREVIATIONS - Context-based disambiguation of acronyms and/or abbreviations may determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more words. A contextual search query including the target abbreviation and said one or more keywords may be generated. A pseudo document index may be searched for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing an index of one or more pseudo documents, associated one or more abbreviations and associated context keywords. One or more pseudo documents associated with the target abbreviation may be returned based on the searching of the pseudo document index.04-05-2012
20120084075CHARACTER INPUT APPARATUS EQUIPPED WITH AUTO-COMPLETE FUNCTION, METHOD OF CONTROLLING THE CHARACTER INPUT APPARATUS, AND STORAGE MEDIUM - A character input apparatus which makes it possible to suppress degradation of use-friendliness in a case where a visually disabled user inputs characters using an auto-complete function. In the character string input apparatus, a character string to be input as a portion following a character string input by a user is predicted based on the character string input by the user, and the character string input by the user is completed using the predicted character string as a portion complementary thereto. In a voice guidance mode, information associated with a key selected by the user is read aloud by voice. When the voice guidance mode is enabled, the character string input apparatus disables the auto-complete function and performs control such that a character string cannot be automatically completed.04-05-2012
20120084074Association Of Semantic Meaning With Data Elements Using Data Definition Tags - Technology is described for associating semantic meaning with data elements. The system can include a messaging module configured to receive a message having data elements. A storage module can store the data elements from the message in a structured format. A message dictionary can be configured to identify a type of the message received and to lexically identify data elements of the message using the message dictionary and the type of message. In addition, a taxonomy module can be configured to provide a semantic meaning for the data elements of the lexically identified portions of the message. Further, a data definition tag repository can store data definition tags and link the message dictionary, the taxonomy, and storage location of the data elements in the storage module. The data definition tags can enable the semantic meaning of data elements to be queried.04-05-2012
20120130708INFORMATION PROCESSOR - An information processor includes a keyword registration means for accepting an input of a keyword composed of a predetermined character string and storing the accepted keyword in a storage device; and a content display means for displaying externally acquired content on a display device. The content display means is configured to display the content on the display device by replacing a character string in a preset range containing the keyword with other display data if the keyword stored in the storage device exists in character information contained in the content.05-24-2012
20120166180Compassion, Variety and Cohesion For Methods Of Text Analytics, Writing, Search, User Interfaces - The present invention increases precision and recall of search engines, while decreasing hardware resources needed, using musical rhythmic analysis to detect sentiment and emotion, using poetic and metaphoric resonances with dictionary meanings, to annotate, distinguish and summarize n-grams of word meanings, then intersecting n-grams to locate mutually salient sentences, using metaphor salience analysis to cluster sentences and paragraphs into automatically named concepts, automatically characterizing quality and depth to which documents describe concepts, using editorial metrics of compassion, variety of perspectives and logical cohesion, to automatically set pricing for written works and their copyrights, and to monitor blogs and social media for newly important concepts, and provide advanced user interfaces.06-28-2012
20120166182Autocompletion for Partially Entered Query - A server system receives, respectively, a first character string from a first user and a second character string from a second user. There are one or more differences between the first and second character strings. The server system obtains from a plurality of previously submitted complete queries, respectively, a first set of predicted complete queries corresponding to the first character string and a second set of predicted complete queries corresponding to the second character string. There are one or more identical queries in both the first and second sets. The server system conveys at least a first subset of the first set to the first user and at least a second subset of the second set to the second user. Both the first subset and the second subset include a respective identical query.06-28-2012
20120166178SYSTEMS AND METHODS FOR MODEL-BASED PROCESSING OF LINGUISTIC USER INPUTS - The present invention includes model-based processing of linguistic user inputs. In one embodiment, the present invention includes a computer-implemented method comprising receiving linguistic inputs, parsing the linguistic inputs, mapping the linguistic inputs to a formal representation used by a model, applying the formal representation against the model, where the model comprises said formal representation, and where the model specifies relationships between the elements of the formal representation and defines process information, and accessing software resources based on the formal representation of the user input and the relationships and process information in said model.06-28-2012
20120166177SYSTEMS AND METHODS FOR ACCESSING APPLICATIONS BASED ON USER INTENT MODELING - In one embodiment, the present invention includes a computer-implemented method comprising storing information in a datastore, the information corresponding to a plurality of computer applications, wherein the plurality of computer applications have associated annotations, receiving an input from a user, providing a first verb and a first noun corresponding to a user intent based on said input, and specifying one or more of said plurality of applications based on the verb and noun annotations for the plurality of applications and the first verb and first noun corresponding to the user intent. The annotations comprise a verb describing one or more activities performed by an associated application and a noun describing work objects on which the activities are performed. Users access the applications in the datastore.06-28-2012
20100204984VIRTUAL PET SYSTEM, METHOD AND APPARATUS FOR VIRTUAL PET CHATTING - A virtual pet system includes: a virtual pet client, adapted to receive a sentence in natural language and send the sentence to a Q&A server; the Q&A server, adapted to receive the sentence, process the sentence through natural language comprehension, generate an answer in natural language based on a result of natural language comprehension and reasoning knowledge, and send the answer in natural language to the virtual pet client. A method for virtual pet chatting includes: receiving a sentence in natural language, performing natural language comprehension for the sentence, and generating an answer in natural language based on a result of natural language comprehension and reasoning knowledge. A Q&A server includes: a sentence comprehension engine unit, adapted to process a received sentence in natural language through natural language comprehension, and send a result of natural language comprehension to a reasoning engine unit; the reasoning engine unit, adapted to generate an answer in natural language based on reasoning knowledge and the result of natural language comprehension, and send the answer in natural language; a knowledge base, adapted to store the reasoning knowledge.08-12-2010
20100204983Method and System for Extracting Web Query Interfaces - A computer program product being embodied on a computer readable medium for extracting semantic information about a plurality of documents being accessible via a computer network, the computer program product including computer-executable instructions for: generating a plurality of tokens from at least one of the documents, each token being indicative of a displayed item and a corresponding position; and, constructing at least one parse tree indicative of a semantic structure of the at least one document from the tokens dependently upon a grammar being indicative of presentation conventions.08-12-2010
20100204982System and Method for Generating Data for Complex Statistical Modeling for use in Dialog Systems - Embodiments of a dialog system that utilizes a grammar-based labeling scheme to generate labeled sentences for use in training statistical models are described. During the process of training data development, a grammar is constructed manually based on the application domain or adapted from a general grammar rule. An annotation schema is created accordingly based on the application requirements, such as syntactic and semantic information. Such information is then included in the grammar specification. After the labeled grammar is constructed, a generation algorithm is then used to generate sentences for training various statistical models.08-12-2010
20120136650SUGGESTING SPELLING CORRECTIONS FOR PERSONAL NAMES - Personal name spelling correction suggestion technique embodiments are presented which provide suggestions for alternate spellings of a personal name. This involves creating a personal name directory which can be queried to suggest spelling corrections for personal names. A hash function that maps any personal name in a particular language and misspellings thereof to similar binary codewords is used to produce one or more binary codewords for each personal name in the directory. The same hash function is used to produce one or more binary codewords from a personal name presented in a query. The personal name directory is employed to identify up to a prescribed number of personal names, each of which has one or more associated binary codewords that are similar to one or more of the binary codewords produced from the personal name query. The identified personal names are suggested as alternate names for the query personal name.05-31-2012
20120136651ONE-ROW KEYBOARD AND APPROXIMATE TYPING - In one aspect, the present invention comprises an apparatus for character entry on an electronic device, comprising: a keyboard with one row of keys; and an electronic display device in communication with the keyboard; wherein one or more keys on the keyboard has a correspondence with a plurality of characters, and wherein the correspondence enables QWERTY-based typing. In another aspect, the invention comprises an apparatus for character entry on an electronic device, comprising: a keyboard with a plurality of keys; and an electronic display device in communication with the keyboard; wherein one or more keys on the keyboard has a correspondence with a plurality of characters, and wherein, for each of the one or more keys, the plurality of characters comprises: (a) a home row character associated with a particular finger when touch typing; and (b) a non-home-row character associated with the particular finger when touch typing.05-31-2012
20120136652METHOD, A COMPUTER PROGRAM AND APPARATUS FOR ANALYZING SYMBOLS IN A COMPUTER - The invention provides a computer-implemented method of analyzing symbols in a computer system, the symbols conforming to a specification for the symbols, in which the specification has been codified into a set of computer-readable rules; and, the symbols analyzed using the computer-readable rules to obtain patterns of the symbols by determining the path that is taken by the symbols through the rules that successfully terminates, and grouping the symbols according to said paths, the method comprising: upon receipt of a message at a computer, performing a lexical analysis of the message; and, in dependence on lexical analysis of the message assigning the message to one of the groups identified according to said paths. The invention also provides a computer programmed to perform the method and a computer program comprising program instructions for causing a computer to perform the method.05-31-2012
20120136649Natural Language Interface - The present disclosure involves systems, software, and computer implemented methods for providing a natural language interface for searching a database. One process includes operations for receiving a natural language query. One or more tokens contained in the natural language query are identified. A set of sentences is generated based on the identified tokens, each sentence representing a possible logical interpretation of the natural language query and including a combination of at least one of the identified tokens. At least one sentence in the set of sentences is selected for searching a database based on the identified tokens.05-31-2012
20110184729APPARATUS AND METHOD FOR EXTRACTING AND ANALYZING OPINION IN WEB DOCUMENT - The present invention deals with an apparatus and method for extracting and analyzing opinions in web documents, wherein automatic extraction and analysis are performed effectively on user opinion information from web documents that are scattered across many websites on the Internet so that opinion search services may be easily implemented which enable search and statistical results to be checked as affirmative/negative opinions, and opinion search users can easily implement a system that helps in searching and monitoring the opinions of other users with respect to a specific keyword. In addition, according to the present invention, marketing representatives and stock investors and corporate value assessors of each company can quickly check the opinions of many users about an applicable corporation and goods that exist on the vast Internet, and expenses that used to be spent on questionnaires and consulting companies to find opinions of existing users can be greatly reduced, and opinion extraction and statistics for each user can be effectively performed and utilized.07-28-2011
20110184728HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT AND FOR PRIORITIZING COMPOUND LANGUAGE SOLUTIONS ACCORDING TO COMPLETENESS OF TEXT COMPONENTS - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria, including the degree of completeness of the text components of a compound language solution.07-28-2011
20110184727Prose style morphing - This invention is a system and method for incrementally and multi-dimensionally adjusting prose style. It comprises: a prose input interface, through which the user inputs, or otherwise selects, prose; a multi-dimensional style-adjusting interface, through which the user incrementally adjusts multiple dimensions of prose style; and a style-morphing engine which executes the adjustments specified by the user through the multi-dimensional style-adjusting interface. The dimensions of prose style to be adjusted may be selected from the group consisting of: person perspective; tense; voice; length; vocabulary; formality; colloquiality; complexity; emotion; emoticons; color; font; romantic; positivity; strength; precision; certainty; alliteration; humor; nationality; regionality; gender specificity; obscenity filter; academic jargon; business jargon; legal jargon; medical jargon; scientific jargon; and connectivity jargon. In an example, the style-morphing engine may include a database of sets of phrase synonyms and may use this database to make phrase substitutions that incrementally and multi-dimensionally change the style of the prose. In another example, the style-morphing engine may include a semantic algorithm or Natural Language Processor (NLP) that identifies phrases with similar meanings but different values across different style dimensions and may use it to make phrase substitutions that incrementally and multi-dimensionally change the style of the prose. This invention that enables incremental and multi-dimensional adjustment of prose style has a wide variety of useful applications.07-28-2011
20110184726Morphing text by splicing end-compatible segments - This invention is a method for “text morphing,” wherein text morphing involves integrating or blending together substantive content from two or more bodies of text into a single body of text based on locations of linguistic commonality among the two or more bodies of text. This method entails: identifying pairs of “Synonym-Different-Synonym” (SDS) text segments between an import body of text and an export body of text; and, for each selected pair of SDS text segments, substituting some or all of the SDS text segment from the export body of text for some or all of the SDS text segment in the import body of text. In some respects, this method is analogous to splicing and substituting gene segments with compatible starting and ending sequences, but different middle sequences. Text morphing as disclosed herein can be useful for creative ideation, product development, integrative search engines, and entertainment purposes.07-28-2011
20110184724SPEECH RECOGNITION - Presented is a method and system for speech recognition. The method includes determining noise level in an environment, comparing the determined noise level with a predetermined noise level threshold value, using a first set of grammar for speech recognition, if the determined noise level is below the predetermined noise level threshold value, and using a second set of grammar for speech recognition, if the determined noise level is above the predetermined noise level threshold value.07-28-2011
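The threshold rule in the entry above can be sketched as follows. The abstract does not state how the noise level is measured or what happens when the level equals the threshold; this sketch assumes an RMS level in decibels relative to full scale and treats the equal case as noisy. The names `noise_level_db` and `select_grammar` and the example grammars are hypothetical.

```python
import math


def noise_level_db(samples):
    """Estimate the level of a block of samples in dB relative to full scale.

    Assumption: samples are floats in [-1.0, 1.0]; the level is the RMS value.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)


def select_grammar(noise_db, threshold_db, first_grammar, second_grammar):
    """Use the first grammar below the threshold, otherwise the second."""
    return first_grammar if noise_db < threshold_db else second_grammar


if __name__ == "__main__":
    full = ["call <name>", "play <song>", "navigate to <address>"]
    reduced = ["yes", "no", "cancel"]
    quiet_room = [0.001] * 100
    loud_room = [0.5] * 100
    print(select_grammar(noise_level_db(quiet_room), -40.0, full, reduced))
    print(select_grammar(noise_level_db(loud_room), -40.0, full, reduced))
```

Choosing a smaller grammar in noisy conditions is only one plausible use of the second set; the entry itself does not say how the two grammars differ.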
20100174526SYSTEM AND METHODS FOR QUANTITATIVE ASSESSMENT OF INFORMATION IN NATURAL LANGUAGE CONTENTS - A method is disclosed for quantitatively assessing information in natural language contents related to an object name. The method includes identifying a sentence in a document, determining a subject and a predicate in the sentence, and retrieving an object-specific data set related to the object name. The object-specific data set includes property names and association-strength values. Each property name is associated with an association-strength value. The method also includes identifying a first property name in the property names that matches the subject, assigning a first association-strength value associated with the first property name to the subject, identifying a second property name in the property names that matches the predicate, assigning a second association-strength value associated with the second property name to the predicate, and multiplying the first association-strength value and the second association-strength value to produce a sentence information index.07-08-2010
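The sentence information index in the entry above is a product of two looked-up values, which a short sketch makes concrete. The sketch assumes the subject and predicate have already been determined by some parser, matches property names case-insensitively, and assigns 0.0 when no property name matches; all three are assumptions. The function name and the example data set are hypothetical.

```python
def sentence_information_index(subject, predicate, association_strengths):
    """Multiply the association strengths matched by subject and predicate.

    association_strengths: {property name: association-strength value} for
    one object name. Unmatched terms contribute 0.0 (an assumption).
    """
    subject_strength = association_strengths.get(subject.lower(), 0.0)
    predicate_strength = association_strengths.get(predicate.lower(), 0.0)
    return subject_strength * predicate_strength


if __name__ == "__main__":
    # Hypothetical object-specific data set for the object name "computer".
    computer = {"cpu": 0.9, "fast": 0.5, "memory": 0.8}
    print(sentence_information_index("CPU", "fast", computer))
    print(sentence_information_index("weather", "fast", computer))
```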
20100174527DICTIONARY REGISTERING SYSTEM, DICTIONARY REGISTERING METHOD, AND DICTIONARY REGISTERING PROGRAM - There is provided a dictionary registration system which makes it possible to register a word into a user dictionary while minimizing an adverse effect that the word may have on natural language processing, if any. The dictionary registration system performs natural language processing by using a user dictionary, and includes a data processing apparatus that performs the natural language processing by managing and using the user dictionary and a storage apparatus that retains system dictionary information and user dictionary information for use in the natural language processing. The storage apparatus includes the system dictionary information for use in the natural language processing, and the user dictionary. The data processing apparatus includes: a word information registering unit that registers information on an input word into the user dictionary; a difference creating unit that creates differences in a result of processing between a first result of processing when the natural language processing is performed by using the system dictionary information, and a second result of processing when the natural language processing is performed by using the system dictionary information and the user dictionary information; a correct-incorrect accepting unit that accepts correct-incorrect judgments as to whether changes from the first result of processing to the second result of processing are correct or incorrect, the changes corresponding to the differences created by the difference creating unit; and a dictionary registration unit that registers registration information on the accepted word into the user dictionary along with part or all of pairs of the correct-incorrect judgments accepted and input sentences from which the differences given the respective correct-incorrect judgments are created.07-08-2010
20100049503Method and apparatus for processing natural language using tape-intersection - Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction.02-25-2010
20100049502METHOD AND SYSTEM OF GENERATING REFERENCE VARIATIONS FOR DIRECTORY ASSISTANCE DATA - Methods and systems of performing user input recognition are disclosed. A digital directory comprising listings is accessed. Metadata information is associated with individual listings describing the individual listings. The metadata information is modified to generate transformed metadata information. Therefore, the transformed metadata information is generated as a function of context information relating to a typical user interaction with the listings. Information is generated for aiding in an automated user input recognition process based on the transformed metadata information.02-25-2010
20100049501DYNAMIC SPEECH SHARPENING - An enhanced system for speech interpretation is provided. The system may include receiving a user verbalization and generating one or more preliminary interpretations of the verbalization by identifying one or more phonemes in the verbalization. An acoustic grammar may be used to map the phonemes to syllables or words, and the acoustic grammar may include one or more linking elements to reduce a search space associated with the grammar. The preliminary interpretations may be subject to various post-processing techniques to sharpen accuracy of the preliminary interpretation. A heuristic model may assign weights to various parameters based on a context, a user profile, or other domain knowledge. A probable interpretation may be identified based on a confidence score for each of a set of candidate interpretations generated by the heuristic model. The model may be augmented or updated based on various information associated with the interpretation of the verbalization.02-25-2010
20120215523TIME-SERIES ANALYSIS OF KEYWORDS - Processing for a time-series analysis of keywords comprises clustering or classifying pieces of document data, each of which is description of a phenomenon in a natural language, on the basis of frequencies of occurrence of keywords in the pieces of document data, individual keywords being also clustered or classified by clustering or classifying the pieces of document data, and performing a time-series analysis of frequencies of occurrence of pieces of document data containing individual keywords in clusters or classes into which the pieces of document data are clustered or classified or a time-series analysis of frequencies of occurrence of pieces of document data containing clusters or classes into which the individual keywords are clustered or classified. Frequency distribution showing variation of the frequencies of occurrence of the pieces of document data is acquired by the time-series analysis.08-23-2012
20120215522HANDHELD ELECTRONIC DEVICE PROVIDING A LEARNING FUNCTION TO FACILITATE CORRECTION OF ERRONEOUS TEXT ENTRY, AND ASSOCIATED METHOD - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In addition to identifying and outputting representations of language objects that are stored in the memory and that correspond with a text input, the device provides a learning function which facilitates providing proposed corrected output by the device in certain circumstances of erroneous input.08-23-2012
20120221324Document Processing Apparatus - In a text document processing apparatus, there is provided standard knowledge network data composed of networked phrases having strong mutual relation to each other, the phrases being selected from a knowledge field including contents of a text document to be examined. In addition, there is provided a document knowledge preparing function that prepares knowledge network data of the document to be examined, the knowledge network data being composed of networked phrases having strong mutual relation to each other, the phrases being selected from the text document. Further, a processing unit checks a specified word constituting the knowledge network data of the document to be examined against the standard knowledge network data and, when the phrases networked to the specified word differ between the two, outputs difference information including information of the specified word.08-30-2012
20110208509SYSTEM AND METHOD FOR THE TRANSFORMATION AND CANONICALIZATION OF SEMANTICALLY STRUCTURED DATA - A method of transforming and canonicalizing semantically structured data includes obtaining data from a network of computers, applying text patterns to the obtained data and placing the data in a first data file, providing a second data file containing the obtained data in a uniform format, and generating interface specific sentences from the data in the second data file.08-25-2011
20100299141Document Based Character Ambiguity Resolution - Methods and apparatus for document based ambiguous character resolution. An application searches a document for words that do not contain ambiguous characters and adds them to a dictionary, then searches the document for words that do contain ambiguous characters. For each ambiguous word, a set of candidate solutions is created by resolving the ambiguous characters in all possible ways. The dictionary is searched for words matching members of the candidate solution set. When a single member is matched, the ambiguous characters are resolved accordingly. When no member or more than one member is matched, a user is prompted to resolve the ambiguous characters. Alternatively, when more than one member is matched, the ambiguous characters are resolved to obtain the largest word, the smallest word, the most words, or the fewest words.11-25-2010
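The resolution procedure in the entry above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `AMBIGUOUS` table and the function name `resolve_document` are hypothetical, and the case where zero or several candidates match is simply reported as `None` (where the abstract would prompt a user or apply a size-based rule).

```python
from itertools import product

# Hypothetical ambiguity table: each ambiguous character maps to its possible readings.
AMBIGUOUS = {"1": ("l", "I"), "0": ("O", "o")}


def resolve_document(words, ambiguous=AMBIGUOUS):
    """Map each ambiguous word to its unique resolution, or None if unresolved."""

    def is_ambiguous(word):
        return any(ch in ambiguous for ch in word)

    # Pass 1: words without ambiguous characters form the document dictionary.
    dictionary = {w for w in words if not is_ambiguous(w)}

    results = {}
    # Pass 2: expand each ambiguous word in all possible ways and look for matches.
    for word in words:
        if not is_ambiguous(word):
            continue
        options = [ambiguous.get(ch, (ch,)) for ch in word]
        candidates = {"".join(combo) for combo in product(*options)}
        matches = candidates & dictionary
        # Exactly one match resolves the word; otherwise leave it for the user.
        results[word] = next(iter(matches)) if len(matches) == 1 else None
    return results


resolved = resolve_document(["hello", "he11o", "world", "w0rld", "xyz1"])
```

Here `he11o` and `w0rld` each have exactly one candidate present in the dictionary, while `xyz1` has none and stays unresolved.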
20100299139METHOD FOR PROCESSING NATURAL LANGUAGE QUESTIONS AND APPARATUS THEREOF - A method and an apparatus for selecting an answer to a natural language question. The method includes: detecting a named entity in the natural language question; extracting information related to an answer from the natural language question; searching in linked data according to the detected named entity; generating a candidate answer according to a search result; parsing the candidate answer according to the information related to the answer; and obtaining a value of a feature of the candidate answer; and evaluating each candidate answer by synthesizing the value of the feature of the candidate answer.11-25-2010
20120179454APPARATUS AND METHOD FOR AUTOMATICALLY GENERATING GRAMMAR FOR USE IN PROCESSING NATURAL LANGUAGE - Provided is an apparatus and method for automatically generating grammar for use in the processing of natural language. The apparatus may extract a corpus relevant to a target domain from a collection of corpora and may generate grammar for use in the target domain based on the extracted corpus. The apparatus may set one domain out of a plurality of domains as a target domain to be processed by an intention analysis system. The apparatus may extract a corpus relevant to the target domain from a collection of corpora and generate grammar based on the extracted corpus.07-12-2012
20120179453PREPROCESSING OF TEXT - Performance of statistical machine learning techniques, particularly classification techniques applied to the extraction of attributes and values concerning products, is improved by preprocessing a body of text to be analyzed to remove extraneous information. The body of text is split into a plurality of segments. In an embodiment, sentence identification criteria are applied to identify sentences as the plurality of segments. Thereafter, the plurality of segments are clustered to provide a plurality of clusters. One or more of the resulting clusters are then analyzed to identify segments having low relevance to their respective clusters. Such low relevance segments are then removed from their respective clusters and, consequently, from the body of text. As the resulting relevance-filtered body of text no longer includes portions of the body of text containing mostly extraneous information, the reliability of any subsequent statistical machine learning techniques may be improved.07-12-2012
20100010801CONFLICT RESOLUTION AND ERROR RECOVERY STRATEGIES - A plethora of strategies is afforded to facilitate conflict resolution and error recovery with respect to parsing, among other things. Grammar authors can select amongst a range of strategies or options on a case-by-case basis to address conflicts, ambiguities, errors, and the like. The strategies can be either static or dynamic. In one instance, code external to a parsing system can be invoked to resolve conflicts or recover from errors, and further enable change of strategy without requiring modification of the parser. Interaction between the parsing system and the external code can also be formalized to ensure general type safety of the system.01-14-2010
20100010803TEXT PARAPHRASING METHOD AND PROGRAM, CONVERSION RULE COMPUTING METHOD AND PROGRAM, AND TEXT PARAPHRASING SYSTEM - A paraphrase model of a question text inputted by a user is learned, and a paraphrase expression is generated in real time. When information in text set storage unit is updated, text pair extracting unit extracts a paraphrase text pair from the text set storage unit and stores it in text pair storage unit. Model learning unit learns a question text paraphrase model from the paraphrase text pair in text pair storage unit, and stores it in model storage unit. Text pair extracting unit extracts a paraphrase text pair again from the text set storage unit by using the question text paraphrase model which the model storage unit possesses, and stores it in the text pair storage unit. In case where the stored paraphrase text pair is the same as the paraphrase text pair stored in the text pair storage unit, learning of the question text paraphrase model is ended. Candidate creating unit reads the question text paraphrase model from the model storage unit and generates a paraphrase candidate of the inputted question text.01-14-2010
20120232887INFORMATION EXTRACTION ACROSS MULTIPLE EXPERTISE-SPECIFIC SUBJECT AREAS - Techniques are disclosed for bridging terminology differences between at least two subject areas. By way of example, a computer-implemented method includes executing the following steps on a computer. A first affinity measure is computed between a first term in a first corpus, corresponding to a first subject area, and a bridge term. A second affinity measure is computed between a second term in a second corpus, corresponding to a second subject area, and the bridge term. A third affinity measure is computed between the first term and the second term based on the first affinity measure and the second affinity measure. The bridge term is a term that appears in both the first corpus and the second corpus.09-13-2012
20120253792Sentiment Classification Based on Supervised Latent N-Gram Analysis - A method for sentiment classification of a text document using high-order n-grams utilizes a multilevel embedding strategy to project n-grams into a low-dimensional latent semantic space where the projection parameters are trained in a supervised fashion together with the sentiment classification task. Using, for example, a deep convolutional neural network, the semantic embedding of n-grams, the bag-of-occurrence representation of text from n-grams, and the classification function from each review to the sentiment class are learned jointly in one unified discriminative framework.10-04-2012
20120253790Personalization of Queries, Conversations, and Searches - Personalization of user interactions may be provided. Upon receiving a phrase from a user, a plurality of semantic concepts associated with the user may be loaded. If the phrase is determined to comprise at least one of the plurality of semantic concepts associated with the user, a first action may be performed according to the phrase. If the phrase is determined not to comprise at least one of the plurality of semantic concepts associated with the user, a second action may be performed according to the phrase.10-04-2012
20120253789Conversational Dialog Learning and Correction - Conversational dialog learning and correction may be provided. Upon receiving a natural language phrase from a first user, at least one second user associated with the natural language phrase may be identified. A context state may be created according to the first user and the at least one second user. The natural language phrase may then be translated into an agent action according to the context state.10-04-2012
20100004921AUTO-GENERATED TO-DO LIST - Methods, systems, and computer readable media for providing an auto-generated to-do list are described. Text is received in an instant messenger conversation, wherein the text comprises a task sender, a task body, and a task date, and an input is received selecting a selection of the text, wherein the selection comprises the task body. The text is analyzed to identify the task sender, the task body, and the task date. The task is then entered into the to-do list, wherein the task comprises the task sender, the task body, and the task date, thereby providing an auto-generated to-do list.01-07-2010
20100268528Method & Apparatus for Identifying Contract Characteristics - A contract characteristic identification application includes a user interface, a plurality of contract characteristic definitions, a natural language processing module and a characteristic identification function. At least one contract characteristic is defined and evaluated and the text of at least one contract is entered into the application. A document evaluation function included in the natural language processing module operates to evaluate the contents of the text of the contract against the defined contract characteristic and returns a listing of contract text that is closest to the defined contract characteristic of interest.10-21-2010
20120316867COMPUTER SYSTEM WITH SECOND TRANSLATOR FOR VEHICLE PARTS - Described are computer-based methods and apparatuses, including computer program products, for automation of auditing claims. Data indicative of an insurance company name is received, the data comprising one or more words. The data is processed through one or more processing steps to generate processed data comprising one or more processed words. One or more candidate word strings are selected based on the one or more processed words. Matching information is associated with each of the one or more candidate word strings. Analysis information is generated for each of the one or more candidate word strings based on the associated matching information. An insurance company identifier is associated with received data based on the analysis information and one or more matching rules.12-13-2012
20120259615TEXT PREDICTION - One or more techniques and/or systems are provided for suggesting a word and/or phrase to a user based at least upon a prefix of one or more characters that the user has inputted. Words in a database are respectively assigned a unique identifier. Generally, the unique identifiers are assigned sequentially and contiguously, beginning with a first word alphabetically and ending with a last word alphabetically. When a user inputted prefix is received, a range of unique identifiers corresponding to words respectively having a prefix that matches the user inputted prefix are identified. Typically, the range of unique identifiers corresponds to substantially all of the words that begin with the given prefix and does not correspond to words that do not begin with the given prefix. The unique identifiers may then be compared to a probability database to identify which words have a higher probability of being selected by the user.10-11-2012
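The identifier scheme in the entry above can be illustrated with a short sketch. It assumes the word database fits in a sorted in-memory list, so that a word's position serves as its sequential identifier; the names `prefix_range` and `suggest` and the `probabilities` mapping are hypothetical stand-ins for the word and probability databases.

```python
from bisect import bisect_left


def prefix_range(sorted_words, prefix):
    """Return the half-open ID range [lo, hi) of words starting with prefix."""
    lo = bisect_left(sorted_words, prefix)
    hi = lo
    # Matching words are contiguous in alphabetical order, so scan until the first miss.
    while hi < len(sorted_words) and sorted_words[hi].startswith(prefix):
        hi += 1
    return lo, hi


def suggest(sorted_words, probabilities, prefix, k=3):
    """Return up to k words in the prefix range, most probable first."""
    lo, hi = prefix_range(sorted_words, prefix)
    ranked = sorted(range(lo, hi), key=lambda i: probabilities.get(i, 0.0), reverse=True)
    return [sorted_words[i] for i in ranked[:k]]


# Identifiers are assigned sequentially in alphabetical order: apple=0, apply=1, ...
words = sorted({"apple", "apply", "apt", "banana", "band"})
probabilities = {0: 0.2, 1: 0.5, 2: 0.1, 3: 0.4, 4: 0.3}
top = suggest(words, probabilities, "ap", k=2)
```

Because the identifiers are contiguous, a prefix is represented by just two integers, and only that range needs to be looked up in the probability table.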
20120259621Translating Texts Between Languages - Methods and computer systems for translating sentences between languages from an intermediate language-independent semantic representation are provided. Based on a comprehensive understanding about languages and semantics, exhaustive linguistic descriptions are used to analyze sentences, build syntactic structures and language independent semantic structures and representations, and synthesize one or more sentences in a natural or artificial language. A computer system is also provided to analyze and synthesize various linguistic structures and perform translation of a wide spectrum of various sentence types. As a result, a generalized data structure, such as a semantic structure, is generated from a sentence of an input language and can be transformed into a natural sentence expressing its meaning correctly in an output language. The methods and systems can be applied to automated abstracting, machine translation, natural language processing, control systems, Internet information retrieval, etc.10-11-2012
20120259619SHORT MESSAGE AGE CLASSIFICATION - Systems and methods for short message age classification in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, classifying messages using a classifier includes determining keyword feature information for a message using the classifier, classifying the determined feature information using the classifier, and estimating user age using the classifier.10-11-2012
20120259618COMPUTING DEVICE AND METHOD FOR COMPARING TEXT DATA - A method for comparing text data reads two patent documents comprising varying text sections. The method compares characters of a first text section in a first patent document with a corresponding second text section in a second patent document, and acquires a same sub-character string that has a maximum matching length and matching positions of the first and second text sections. The method marks characters before the matching positions of the first and second text sections as different characters. The method displays a comparison result list of the comparison between the first patent document and the second patent document on a display device.10-11-2012
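The core comparison step in the entry above, finding the longest shared sub-string and its positions in both text sections, can be sketched with a standard dynamic-programming routine. This is one common technique and is not claimed to be the method of the application; the function name is hypothetical.

```python
def longest_common_substring(a, b):
    """Return (substring, start_in_a, start_in_b) for the longest common substring."""
    best_len, end_a, end_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, end_a, end_b = cur[j], i, j
        prev = cur
    return a[end_a - best_len:end_a], end_a - best_len, end_b - best_len


first, second = "the quick fox", "a quick dog"
match, pos_a, pos_b = longest_common_substring(first, second)
# Characters before the matching positions are the ones marked as different.
differing = (first[:pos_a], second[:pos_b])
```

The routine uses time proportional to the product of the two section lengths and keeps only two rows of the table in memory.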
20120259616SYSTEMS, METHODS AND DEVICES FOR GENERATING AN ADJECTIVE SENTIMENT DICTIONARY FOR SOCIAL MEDIA SENTIMENT ANALYSIS - Embodiments generally relate to systems and methods for generating a sentiment dictionary and calculating sentiment scores of adjectives within the sentiment dictionary. A set of seed words can be identified and expanded using synonyms and antonyms of the set of seed words. Social media data can be parsed to identify adjectives that link to the set of seed words with the words “and” or “but.” Matrices representing the attraction and repulsion among the linked adjectives can be generated. A factorization algorithm can be minimized to determine an output matrix that comprises positive and negative sentiment scores for each of the adjectives. In embodiments, a sentiment score for part or all of the social media data can be calculated using the output matrix, and one or more parts of the social media data can be classified as a positive or negative sentiment.10-11-2012
20110119048SYSTEM AND METHOD FOR COMPUTERIZED PSYCHOLOGICAL CONTENT ANALYSIS OF COMPUTER AND MEDIA GENERATED COMMUNICATIONS TO PRODUCE COMMUNICATIONS MANAGEMENT SUPPORT, INDICATIONS AND WARNINGS OF DANGEROUS BEHAVIOR, ASSESSMENT OF MEDIA IMAGES, AND PERSONNEL SELECTION SUPPORT - At least one computer-mediated communication produced by or received by an author is collected and parsed to identify categories of information within it. The categories of information are processed with at least one analysis to quantify at least one type of information in each category. A first output communication is generated regarding the at least one computer-mediated communication, describing the psychological state, attitudes or characteristics of the author of the communication. A second output communication is generated when a difference between the quantification of at least one type of information for at least one category and a reference for the at least one category is detected involving a psychological state, attitude or characteristic of the author to which a responsive action should be taken. The content of the second output communication and the at least one category are programmable to define a psychological state, attitude or characteristic in response to which an action should be taken and the action that is to be taken in response.05-19-2011
20090018820Character String Anonymizing Apparatus, Character String Anonymizing Method, and Character String Anonymizing Program - A character string anonymizing apparatus classifies each of a plurality of pieces of text data, each including a character string, into a plurality of kinds of data in accordance with a classification condition, extracts a plurality of words included in each of the plurality of pieces of text data (hereinafter, referred to as linked data) classified into the same kind by the classification, extracts, among word sets including one or more of the extracted words, a word set in which the number of pieces of linked data including all words forming each of the word sets is greater than or equal to a threshold, and anonymizes, among words included in a character string included in each of the plurality of pieces of text data, a word matching at least some of the extracted words and not matching words forming the extracted word set.01-15-2009
20090018819TRACKING CHANGES IN STRATIFIED DATA-STREAMS - Disclosed are systems, methods, and computer readable media for detecting and coordinating changes in stratified data streams. The method embodiment comprises receiving one or more data streams, each data stream comprising at least one lexical item and having at least one metavalue, detecting a change in a frequency of the at least one lexical item for each metavalue separately, coordinating the change in frequency of the at least one lexical item with changes in frequencies of lexical items associated with the at least one lexical item by grouping the at least one lexical item and the associated lexical items over time and across at least one metavalue, wherein end grouping is a coordinated change-event, and presenting a summarization of the coordinated change-event to a user.01-15-2009
20090018817Method and System for Connecting Characters, Words and Signs to a Telecommunication Number - This invention relates to the field of wireless data and instant communication technologies and describes a method and a system for connecting words, phrases, or symbols of any languages or multimedia expressions, within the content of transmitted data, to telecommunication codes. The presented method of the invention selects a group of Telecom Codes, defines Content Names, assigns the Content Names to the Telecom Codes, receives the transmitted content, and redirects the content to the connected Telecom Codes after detecting the existence of the Content Names. The presented system of the invention combines both software and hardware functions, with the hardware portion comprising a Processor, a Memory, a Display Device, an Input Device, and a Communication Interface, and the software portion comprising an Operating System, a Client Data Management Module including Management Interface, a Database Software, a group of Telecom Codes as well as other connectable Telecom Codes configured in the Database Software, a group of defined Content Names configured in the Database Software, the Connection Relations and the Rules of Directing configured in the Database Software, an Analysis and Redirecting Module, and a Communication Interface. The presented method and system of the invention address the difficulty of memorizing, and the cumbersomeness of entering, long Telecom Codes of many digits, and lead to five new application developments: Information Portal, Multiple Content Names Connecting to Single Telecom Code, Multiple Telecom Codes Connecting to Single Content Name, Connection of Classified Advertisements, and Interactive Customer Relation Management (ICRM) and Supplier Relation Management (ISRM) System.01-15-2009
20120232885SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy, wherein the visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.09-13-2012
20130173256NATURAL LANGUAGE PROCESSING ('NLP') - Natural language processing (‘NLP’) including: receiving text specifying predetermined evidence; receiving a text passage to process, the text passage including conditions and logical operators, the text passage comprising criteria for evidence; decomposing the text passage into coarse grained text fragments, including grouping text segments in dependence upon the logical operators; analyzing each coarse grained text fragment to identify conditions; evaluating each identified condition in accordance with the predetermined evidence and predefined condition evaluation rules; evaluating each coarse grained text fragment in dependence upon the condition evaluations and the logical operators; and calculating, in dependence upon the evaluations of each text fragment, a truth value indicating a degree to which the evidence meets the criteria of the text passage.07-04-2013
20130173249Natural Language Processing ('NLP') - Natural language processing (‘NLP’) including: receiving text specifying predetermined evidence; receiving a text passage to process, the text passage including conditions and logical operators, the text passage comprising criteria for evidence; decomposing the text passage into coarse grained text fragments, including grouping text segments in dependence upon the logical operators; analyzing each coarse grained text fragment to identify conditions; evaluating each identified condition in accordance with the predetermined evidence and predefined condition evaluation rules; evaluating each coarse grained text fragment in dependence upon the condition evaluations and the logical operators; and calculating, in dependence upon the evaluations of each text fragment, a truth value indicating a degree to which the evidence meets the criteria of the text passage.07-04-2013
20130173253SPEECH EFFECTS - A method of complementing a spoken text. The method including receiving text data representative of a natural language text, receiving effect control data including at least one effect control record, each effect control record being associated with a respective location in the natural language text, receiving a stream of audio data, analyzing the stream of audio data for natural language utterances that correlate with the natural language text at a respective one of the locations, and outputting, in response to a determination by the analyzing that a natural language utterance in the stream of audio data correlates with a respective one of the locations, at least one effect control signal based on the effect control record associated with the respective location.07-04-2013
20130173254Sentiment Analyzer - A sentiment analysis tool receives a t-gram from an electronic device. The t-gram comprises gram(s), each of the gram(s) representing a word in a collection of words. A polarity is set for the t-gram. Possible smaller-gram combinations are generated from the t-gram. Until a condition is met, iterative actions are taken. A likelihood ratio is calculated for the largest of the smaller-gram combinations employing the training set. A determination is made of whether the likelihood ratio meets a minimum replication threshold. If satisfied: the smaller-gram combinations most distant from an undefined polarity value are selected; the smaller-gram combinations employed in calculating the likelihood ratio are excluded; the polarity value for the t-gram is increased in proportion to the likelihood ratio; and the training set is reduced to v-grams that include the t-gram. Otherwise, the size of the smaller-gram is reduced by 1.07-04-2013
20120265520TEXT PROCESSOR AND METHOD OF TEXT PROCESSING - A text processor and a method of text processing comprises obtaining a plurality of word groups each comprising a sequence of words from a text, determining a frequency of occurrence of each of the word groups within a text corpus, by interrogating a database including the frequency information, and indicating word groups that have a frequency of occurrence that is below a threshold value.10-18-2012
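The check described in the entry above can be sketched in a few lines. In this illustration a `collections.Counter` built from a toy reference corpus stands in for the frequency database, word groups are taken to be bigrams, and tokenization is plain whitespace splitting; all of these, along with the function names, are assumptions made for the example.

```python
from collections import Counter


def word_groups(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rare_groups(text, corpus_counts, n=2, threshold=2):
    """Return the word groups of text whose corpus frequency is below threshold."""
    tokens = text.lower().split()
    # A Counter returns 0 for groups never seen in the corpus.
    return [g for g in word_groups(tokens, n) if corpus_counts[g] < threshold]


corpus = "the cat sat on the mat the cat sat on the rug".split()
corpus_counts = Counter(word_groups(corpus, 2))
flagged = rare_groups("The cat sat on the rug", corpus_counts)
```

Only the group ("the", "rug") is flagged, since every other bigram of the input occurs twice in the toy corpus.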
20120265521Methods and systems relating to information extraction - The invention relates to information extraction systems having discriminative models which utilize hierarchical cluster trees and active learning to enhance training.10-18-2012
20080300864Syndication of documents in increments - Some embodiments of a publishing tool to provide syndication in increments have been presented. In one embodiment, a set of documents in different formats and/or different natural languages has been generated from a master document. In response to a change in the master document, a corresponding part in each of the plurality of documents is synchronously generated without regenerating an entirety of each of the plurality of documents. Then each of the set of documents is updated using the corresponding part generated.12-04-2008
20080300865METHOD, SYSTEM, AND APPARATUS FOR NATURAL LANGUAGE MIXED-INITIATIVE DIALOGUE PROCESSING - In a natural language, mixed-initiative system, a method of processing user dialogue can include receiving a user input and determining whether the user input specifies an action to be performed or a token of an action. The user input can be selectively routed to an action interpreter or a token interpreter according to the determining step.12-04-2008
20080300863Publishing tool for translating documents - Some embodiments of a publishing tool to translate documents have been presented. In one embodiment, a master document written in a first natural language is received. The master document is repurposed to generate a set of output documents in one or more predetermined formats, wherein each of the output documents is in a distinct one of a set of natural languages.12-04-2008
20110004465Computation and Analysis of Significant Themes - Systems and computer-implemented processes for computation and analysis of significant themes in a corpus of documents. The computation and analysis of significant themes can be executed on a processor and involves generating a lexical unit document association (LUDA) vector for each lexical unit that has been provided and quantifying similarities between each unique pair of lexical units. The LUDA vector characterizes a measure of association between its corresponding lexical unit and documents in the corpus. The lexical units can then be grouped into clusters such that each cluster contains a set of lexical units that are most similar as determined by the LUDA vectors and a predetermined clustering threshold.01-06-2011
20110004464METHOD AND SYSTEM FOR SMART MARK-UP OF NATURAL LANGUAGE BUSINESS RULES - Smart Mark-up or highlighting delimits a rule using ontology technology to identify words and fields as objects and/or possible values in the rule. These technologies support the user in formalizing parts of the rules in a manner consistent with the system's data.01-06-2011
20110004463SYSTEMS AND METHODS FOR EXTRACTING PATTERNS FROM GRAPH AND UNSTRUCTURED DATA - A computing system receives input data having both graph and unstructured data and computes a current log likelihood of the input data. The computing system compares the current log likelihood with a previous log likelihood of the input data. If the current log likelihood is larger than the previous log likelihood, the computing system updates topic modeling parameters, community modeling parameters, and the link generation parameter until the computing system obtains a maximal value of the log likelihood of the input data. Then, the computing system creates a graph indicating topic similarity between the input data based on the topic modeling parameters, creates another graph indicating community similarity between entities associated with the input data based on the community modeling parameters, and predicts a link existence between input data or entities based on the link generation parameter, the topic modeling parameter and the community modeling parameter.01-06-2011
20110004462Generating Topic-Specific Language Models - Speech recognition may be improved by generating and using a topic specific language model. A topic specific language model may be created by performing an initial pass on an audio signal using a generic or basis language model. A speech recognition device may then determine topics relating to the audio signal based on the words identified in the initial pass and retrieve a corpus of text relating to those topics. Using the retrieved corpus of text, the speech recognition device may create a topic specific language model. In one example, the speech recognition device may adapt or otherwise modify the generic language model based on the retrieved corpus of text.01-06-2011
20120323563GENERATING SNIPPET FOR REVIEW ON THE INTERNET - A method and system for generating snippet for review on the Internet. The method includes the steps of: receiving a review and a set of feedbacks corresponding to the review, where the review includes a plurality of evaluating sentences that evaluates product features of a product; calculating support degrees of each of the plurality of evaluating sentences by using the set of feedbacks; extracting, by relying on calculated support degrees of each of the evaluating sentences, at least one of the evaluating sentences from the plurality of evaluating sentences; and designating extracted evaluating sentence as a snippet of the review; where at least one of the steps is carried out by using a computer device.12-20-2012
20110131036SYSTEM AND METHOD OF SUPPORTING ADAPTIVE MISRECOGNITION IN CONVERSATIONAL SPEECH - A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message. The system and method creates, stores, and uses extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command.06-02-2011
20110131035HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT EMPLOYING DIFFERENT GROUPINGS OF DATA SOURCES TO DISAMBIGUATE DIFFERENT PARTS OF INPUT - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to generate compound language solutions by employing different groupings of data sources to generate different portions of the compound language solutions.06-02-2011
20110131034METHOD, A COMPUTER PROGRAM AND APPARATUS FOR PROCESSING A COMPUTER MESSAGE - Embodiments of the invention provide a method, computer program and apparatus for processing a computer message, the method comprising: upon receipt of a computer message at a computer, classifying the computer message and assigning it a message cluster identification in dependence thereon; and, utilising a message template to trans-denotate the message, wherein the message template is selected in dependence on the message cluster identification.06-02-2011
20120265519SYSTEM AND METHOD FOR OBJECT DETECTION - A system and method for object detection is provided, which system and method combines parsing and classification technologies for extracting objects, e.g., events, entities or the like, from text. In exemplary embodiment, the output of a parsing technique is transformed into a model suitable as input for classification in order to provide event or entity detection results.10-18-2012
20110010163METHOD, DEVICE, COMPUTER PROGRAM AND COMPUTER PROGRAM PRODUCT FOR PROCESSING LINGUISTIC DATA IN ACCORDANCE WITH A FORMALIZED NATURAL LANGUAGE - A method, device and computer program product for processing, in a computer system, linguistic data in accordance with a grammar of a Formalized Natural Language. The grammar of the Formalized Natural language is a text grammar operating on a set of texts of type Text. This text grammar is defined by a set of four elements W, N, R and Text. W is a finite set of invariable words of type Word, to be used as terminal, elementary expressions of a text. N is a finite set of non-terminal help symbols, to be used for the derivation and the representation of texts. R is a finite set of inductive rules for the production of grammatical expressions of the Formalized Natural Language, and Text is an element of N and start-symbol for grammatical derivation of all texts of type Text of the Formalized Natural Language. Linguistic data to be processed are acquired and processed in accordance with the Formalized Natural Language. A physical representation of a processed syntactic and semantic structure of the linguistic data is provided.01-13-2011
20120330647HIERARCHICAL MODELS FOR LANGUAGE MODELING - The described implementations relate to natural language processing, and more particularly to training a language prior model using a model structure. The language prior model can be trained using parameterized representations of lexical structures such as training sentences, as well as parameterized representations of lexical units such as words or n-grams. During training, the parameterized representations of the lexical structures and the lexical units can be adjusted using the model structure. When the language prior model is trained, the parameterized representations of the lexical structures can reflect how the lexical units were used in the lexical structures.12-27-2012
20110046944PLAIN ENGLISH DOCUMENT TRANSLATION METHOD - A method for building a plain English guide containing translations of a complex document is provided. The method begins by creating a working template from an electronic copy of the complex document using a computing device. Next, the method displays the working template on a user interface connected to the computing device. After that, the method performs one or more translation actions. Then, the method selects one of a set of custom icons after performing each of the translation actions. Finally, the method produces the plain English guide using the computing device to generate a report.02-24-2011
20110046943METHOD AND APPARATUS FOR PROCESSING DATA - A data processing method and apparatus that may set emotion based on development of a story are provided. The method and apparatus may set emotion without inputting emotion for each sentence of text data. Emotion setting information is generated based on development of the story and the like, and may be applied to the text data.02-24-2011
20120323564PROGRAM SEARCH DEVICE AND PROGRAM SEARCH METHOD - A program search device includes: a table storage unit that stores an allowed word table; a program stream acquiring unit that acquires a program stream generated according to a broadcasting code of ethics; a table update unit that extracts caption data or program information, which is a first text data item related to the content of a program; a program storage unit that stores a program included in the acquired program stream; a data acquiring unit that acquires a second text data item; a data processing unit that divides the second text data item into morphemes, replaces the divided morpheme with a predetermined symbol; an index giving unit that gives a set of the recombined third text data item; and a program extracting unit that extracts the program stored in the program storage unit.12-20-2012
20120323561HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device.12-20-2012
20120323562METHOD AND SYSTEM FOR CONVERTING IMAGE TEXT DOCUMENTS IN BIT-MAPPED FORMATS TO SEARCHABLE TEXT AND FOR SEARCHING THE SEARCHABLE TEXT - A system and method for searching optical character recognition results of image text documents includes an image text transformer that linguistically analyzes the optical character recognition results within a context of multiple lexicons to form edited text results and creates a reflection repository having reflection files therein corresponding to the image documents from the optical character recognition results. A search engine searches the reflection files and a user device displays a first reflection file from the reflection files or a first image document from the image documents in response to searching. The files are displayed on a display associated with a user device.12-20-2012
20120323559INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - An apparatus is provided for determining a lyric importance level, comprising a memory and a processor executing instructions stored in the memory. The processor executes instructions stored in the memory to acquire lyric information, the lyric information identifying: lyrics of a song; and lyric location information indicating locations of the lyrics within the song. The processor further executes instructions stored in the memory to acquire section information, the section information identifying: sections of the song; section importance levels corresponding to the sections; and section location information indicating locations of the sections within the song. The processor still further executes instructions stored in the memory to identify, based on the lyric location information and the section location information, one or more sections corresponding to a subset of the lyrics; and determine, based on the section importance levels, a lyric importance level of the subset.12-20-2012
20120323558METHOD AND APPARATUS FOR CREATING A PREDICTING MODEL - A method for creating a predictive model is disclosed herein, including the steps of determining trends and patterns in electronic data, using at least a first machine language algorithm, refining the determination of the algorithm, searching for social models that describe the identified trends and patterns using at least a second machine language algorithm, verifying causal links, constructing at least one model about human node behavior and interactions, utilizing the social models to do at least one of the following: validate hypotheses, predict future behavior, and examine hypothetical scenarios, and automatically updating predictions when new data is introduced.12-20-2012
20120323560METHOD FOR SYMBOLIC CORRECTION IN HUMAN-MACHINE INTERFACES - Disclosed embodiments include methods and systems for symbolic correction in human-machine interfaces that comprise (a) implementing a language model; (b) implementing a hypothesis model; (c) implementing an error model; and (d) processing a symbolic input message based on weighted finite-state transducers to encode 1) a set of input hypotheses using the hypothesis model, 2) the language model, and 3) the error model to perform correction on the sequential pre-segmented symbolic input message in the human-machine interface. According to a particular embodiment, the processing step comprises a combination of the language model, the hypothesis model, and the error model performed without parsing by employing a composition operation between the transducers and a lowest cost path search, exact or approximate, on the composed transducer.12-20-2012
20120271627CROSS-LANGUAGE TEXT CLASSIFICATION - Methods are described for performing classification (categorization) of text documents written in various languages. Language-independent semantic structures are constructed before classifying documents. These structures reflect lexical, morphological, syntactic, and semantic properties of documents. The methods suggested are able to perform cross-language text classification which is based on document properties reflecting their meaning. The methods are applicable to genre classification, topic detection, news analysis, authorship analysis, etc.10-25-2012
20120271626APPARATUS AND METHOD FOR LINGUISTIC SCORING - In embodiments of the invention, a system receives selections from a user based on a list of pre-defined monitoring categories and/or optionally receives custom category definitions from the user. The option for custom category definitions may be advantageous due to the flexibility provided to a system administrator or other user. In embodiments of the invention, the pre-defined and/or custom monitoring categories may be or include complex hierarchical behavior. Such an approach provides monitoring algorithms that can achieve improved accuracy compared to known methods. In embodiments of the invention, the order of computations used in resolving a monitoring category may be re-ordered, statically and/or dynamically, to improve the efficiency of monitoring operations.10-25-2012
20120271625MULTIMODAL NATURAL LANGUAGE QUERY SYSTEM FOR PROCESSING AND ANALYZING VOICE AND PROXIMITY BASED QUERIES - The present disclosure provides a natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries. The natural language query system includes a Web-enabled device including a speech input module for receiving a voice-based query in natural language form from a user and a location/proximity module for receiving location/proximity information from a location/proximity device. The query system also includes a speech conversion module for converting the voice-based query in natural language form to text in natural language form and a natural language processing module for converting the text in natural language form to text in searchable form. The query system further includes a semantic engine module for converting the text in searchable form to a formal database query and a database-look-up module for using the formal database query to obtain a result related to the voice-based query in natural language form from a database.10-25-2012
20120271624PROCESSING GEOGRAPHICAL LOCATION DATA IN A DOCUMENT - Techniques for processing geographical location data in a document comprise: obtaining geographical location data in the document; grading the geographical location data according to a predetermined condition to determine an associated relationship between the geographical location data; marking on an electronic map the associated relationship between the geographical location data; and presenting the marked electronic map.10-25-2012
20100010800Automatic Pattern Generation In Natural Language Processing - Disclosed herein is a computer implemented method and system of generating declared patterns from components of a sentence. Parts of speech in the sentence are tagged for identifying parts of speech of each word and phrase in the sentence. Sentence chunking is then performed using the identified parts of speech of each word and phrase to generate pattern units. A first dictionary and a database of equivalent pattern specification sets are then applied to identify grammatical roles and senses of the generated pattern units. A second dictionary and a conceptionary are then applied to identify an equivalent name set for each of the generated pattern units. The declared patterns are then generated for the sentence using the identified equivalent name set for each of the generated pattern units.01-14-2010
20110238411DOCUMENT PROOFING SUPPORT APPARATUS, METHOD AND PROGRAM - According to one embodiment, a document proofing support apparatus includes an input unit, an analysis unit, a detection unit, a database unit, a retrieval unit, and a display unit. The input unit is configured to receive input of one of at least one proof document and at least one entry document. The analysis unit is configured to perform a morphological, a syntactic and a dependency analysis and generate analysis information including a dependency relation. The detection unit is configured to detect as a possible coined word character string a compound word having a nominal continuation relation. The database unit is configured to store syntactic information. The retrieval unit is configured to retrieve a dependency-relation sentence, and to determine the possible coined word character string as a coined word if the dependency-relation sentence exists. The display unit is configured to display a message including the coined word.09-29-2011
20110238410Semantic Clustering and User Interfaces - Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, as for example to prioritize topics whose handling by the conversation agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on.09-29-2011
20110238409Semantic Clustering and Conversational Agents - Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, as for example to prioritize topics whose handling by the conversation agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on.09-29-2011
20120278065GENERATING SNIPPET FOR REVIEW ON THE INTERNET - A method and system for generating snippet for review on the Internet. The method includes the steps of: receiving a review and a set of feedbacks corresponding to the review, where the review includes a plurality of evaluating sentences that evaluates product features of a product; calculating support degrees of each of the plurality of evaluating sentences by using the set of feedbacks; extracting, by relying on calculated support degrees of each of the evaluating sentences, at least one of the evaluating sentences from the plurality of evaluating sentences; and designating extracted evaluating sentence as a snippet of the review; where at least one of the steps is carried out by using a computer device.11-01-2012
20120278066COMMUNICATION INTERFACE APPARATUS AND METHOD FOR MULTI-USER AND SYSTEM - A communication interface apparatus for a system and a plurality of users is provided. The communication interface apparatus for the system and the plurality of users includes a first process unit configured to receive voice information and face information from at least one user, and determine whether the received voice information is voice information of at least one registered user based on user models corresponding to the respective received voice information and face information; a second process unit configured to receive the face information, and determine whether the at least one user's attention is on the system based on the received face information; and a third process unit configured to receive the voice information, analyze the received voice information, and determine whether the received voice information is substantially meaningful to the system based on a dialog model that represents conversation flow on a situation basis.11-01-2012
20120278064SYSTEM AND METHOD FOR DETERMINING SENTIMENT FROM TEXT CONTENT - A system and method for determining sentiment from user-generated text content is provided. A sentiment score is determined for one or more terms in a user-generated text content. A sentiment value is determined for the text content that is based at least in part on the sentiment score for the one or more terms.11-01-2012
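The two-step scoring in entry 20120278064 (a sentiment score per term, then a sentiment value for the whole text) can be sketched roughly as follows. This is only an illustration: the lexicon `TERM_SCORES`, the whitespace tokenization, and the choice of averaging are assumptions, since the abstract does not say how term scores are obtained or combined.

```python
# Hypothetical term-level sentiment scores; the abstract names no lexicon.
TERM_SCORES = {"great": 1.0, "good": 0.5, "poor": -0.5, "terrible": -1.0}


def sentiment_value(text, term_scores=TERM_SCORES):
    """Return a text-level sentiment value derived from term-level scores.

    Assumption: the value is the mean score of the terms found in the
    lexicon, and 0.0 when no scored term appears in the text.
    """
    terms = text.lower().split()
    scores = [term_scores[t] for t in terms if t in term_scores]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(sentiment_value("great phone terrible battery"))
    print(sentiment_value("good screen"))
```

Averaging keeps the value in the same range as the term scores; a sum or a weighted combination would fit the abstract's wording equally well.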
20110246182SYSTEMS FOR DYNAMICALLY GENERATING AND PRESENTING NARRATIVE CONTENT - In some embodiments, a non-transitory processor-readable medium stores code representing instructions that when executed cause a processor to select a narrative content template based at least in part on a predetermined content type associated with a real-world and/or virtual event. The code further represents instructions that when executed cause the processor to select a narrative tone type. The code further represents instructions that when executed cause the processor to, for each phrase included in an ordered set of phrases associated with the narrative content template, select, based at least in part on the narrative tone type, a phrase variation from a set of phrase variations associated with that phrase, and define, based on the selected phrase variation and at least one datum from a set of data, a narrative content portion associated with the real-world event. The code further represents instructions that when executed cause the processor to output, at a display, the narrative content portion.10-06-2011
20120089388Segmenting Words Using Scaled Probabilities - Systems, methods, and apparatuses including computer program products for segmenting words using scaled probabilities. In one implementation, a method is provided. The method includes receiving a probability of an n-gram identifying a word, determining a number of atomic units in the corresponding n-gram, identifying a scaling weight depending on the number of atomic units in the n-gram, and applying the scaling weight to the probability of the n-gram identifying a word to determine a scaled probability of the n-gram identifying a word.04-12-2012
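The four steps in entry 20120089388 can be sketched as below. The abstract does not define the atomic unit, the weight values, or how the weight is applied, so this sketch assumes characters as atomic units, a made-up weight table `SCALING_WEIGHTS`, and multiplication as the way the weight is applied.

```python
# Hypothetical scaling weights keyed by the number of atomic units.
SCALING_WEIGHTS = {1: 1.0, 2: 2.0, 3: 3.0}
DEFAULT_WEIGHT = 4.0  # assumed weight for n-grams longer than the table covers


def scaled_probability(ngram, probability, weights=SCALING_WEIGHTS):
    """Scale the probability that `ngram` identifies a word.

    Assumptions: each character is one atomic unit, and the weight is
    applied by multiplication, so the result is a score, not necessarily
    a normalized probability.
    """
    unit_count = len(ngram)                           # number of atomic units
    weight = weights.get(unit_count, DEFAULT_WEIGHT)  # weight depends on count
    return weight * probability                       # apply weight


if __name__ == "__main__":
    print(scaled_probability("ab", 0.1))
```

A weight that grows with length offsets the tendency of longer n-grams to have lower raw probabilities, which would otherwise bias a segmenter toward short segments.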
20120089387GENERAL PURPOSE CORRECTION OF GRAMMATICAL AND WORD USAGE ERRORS - Architecture that detects and corrects writing errors in a human language based on the utilization of three different stages: error detection, correction candidate generation, and correction candidate ranking. The architecture is a generic framework for generating fluent alternatives to non-grammatical word sequences in a written sample. Error detection is addressed by a suite of language model related scores and other scores such as parse scores that can identify a particularly unlikely sequence of words. Correction candidate generation is addressed by a lookup in a very large corpus of “correct” English that looks for alternative arrangements of the same or similar words or subsequences of these words in the same context. Correction candidate ranking is addressed by a language model ranker.04-12-2012
20110276322TEXTUAL ENTAILMENT METHOD FOR LINKING TEXT OF AN ABSTRACT TO TEXT IN THE MAIN BODY OF A DOCUMENT - Aspects of the exemplary embodiment relate to a system and method for processing a document which enables assessment of the coherence of an abstract of the document. The method includes storing the document in memory and, for each sentence of the abstract, comparing the sentence with sentences of a main body of the document using textual entailment techniques to identify whether the sentence of the abstract entails a sentence in the main body of the document. Links can then be generated between the entailing sentences of the abstract and the corresponding entailed sentences of the document. The document and generated links are output. The links enable the coherence of the abstract to be evaluated, either manually or automatically, using an evaluation component of the system.11-10-2011
20120095752LEVERAGING BACK-OFF GRAMMARS FOR AUTHORING CONTEXT-FREE GRAMMARS - A system and method of refining context-free grammars (CFGs). The method includes deriving back-off grammar (BOG) rules from an initially developed CFG and utilizing the initial CFG and the derived BOG rules to recognize user utterances. Based on a response of the initial CFG and the derived BOG rules to the user utterances, at least a portion of the derived BOG rules are utilized to modify the initial CFG and thereby produce a refined CFG. The above method can be carried out iteratively, with each new iteration utilizing a refined CFG from preceding iterations.04-19-2012
20120095751TEXT SEGMENTATION AND LABEL ASSIGNMENT WITH USER INTERACTION BY MEANS OF TOPIC SPECIFIC LANGUAGE MODELS AND TOPIC-SPECIFIC LABEL STATISTICS - The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user being capable to select alternative segmentations and alternative labels as well as to enter a user defined segmentation and user defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated incorporating the re-segmentation and re-labeling of successive parts of the document or the entire document.04-19-2012
20120095750PARSING OBSERVABLE COLLECTIONS - Parsing technology is applied to observable collections. More specifically, a parser, such as combinator parser, can be employed to perform syntactic analysis over one or more observable collections. Further, multiple observable collections can be combined into a single collection and time can be captured by annotating collection items or generating time items.04-19-2012
20120330648CONTEXT-BASED DISAMBIGUATION OF ACRONYMS AND ABBREVIATIONS - Context-based disambiguation of acronyms and/or abbreviations may determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more words. A contextual search query including the target abbreviation and said one or more keywords may be generated. A pseudo document index may be searched for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing an index of one or more pseudo documents, associated one or more abbreviations and associated context keywords. One or more pseudo documents associated with the target abbreviation may be returned based on the searching of the pseudo document index.12-27-2012
20120330649SYSTEMS AND METHODS FOR EXTRACTING PATTERNS FROM GRAPH AND UNSTRUCTURED DATA - A computing system receives input data having both graph and unstructured data and computes a current log likelihood of the input data. The computing system compares the current log likelihood with a previous log likelihood of the input data. If the current log likelihood is larger than the previous log likelihood, the computing system updates topic modeling parameters, community modeling parameters, and the link generation parameter until the computing system obtains a maximal value of the log likelihood of the input data. Then, the computing system creates a graph indicating topic similarity between the input data based on the topic modeling parameters, creates another graph indicating community similarity between entities associated with the input data based on the community modeling parameters, and predicts a link existence between input data or entities based on the link generation parameter, the topic modeling parameter and the community modeling parameter.12-27-2012
20110320191TEXT CREATION SYSTEM AND METHOD - A text creation system and method is described. Input text is provided in an authoring language and may be provided in one or more rendering languages and/or writing styles. The input text is analyzed to determine the semantic content of the input, and the semantic information is stored in a database. In one embodiment, templates are provided to acquire input and information about the template and the semantic data of the input, along with a database of equivalent meanings in different languages, are utilized to instantly generate output in a number of languages. One embodiment is a method and system of presenting advertisements in multiple languages.12-29-2011
20110320190AUTOMATED SENTENCE PLANNING IN A TASK CLASSIFICATION SYSTEM - The invention relates to a task classification system (12-29-2011
20110320189SYSTEMS AND METHODS FOR FILTERING DICTATED AND NON-DICTATED SECTIONS OF DOCUMENTS - A system and method for filtering documents to determine section boundaries between dictated and non-dictated text. The system and method identifies portions of a text report that correspond to an original dictation and, correspondingly, those portions that are not part of the original dictation. The system and method include comparing tokenized and normalized forms of the original dictation and the final report, determining mismatches between the two forms, and applying machine-learning techniques to identify document headers, footers, page turns, macros, and lists automatically and accurately.12-29-2011
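The comparison step in entry 20110320189 (tokenize and normalize both texts, then find mismatches) can be illustrated with Python's standard `difflib`. This covers only the mismatch detection; the machine-learning classification of headers, footers, and macros is left out. The normalization rule (lowercase alphanumeric tokens) and the function names are assumptions made for the example.

```python
import difflib
import re


def normalize(text):
    """Tokenize into lowercase alphanumeric tokens (assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def non_dictated_tokens(dictation, report):
    """Return report tokens that have no counterpart in the dictation."""
    a, b = normalize(dictation), normalize(report)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    extra = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" spans are report text absent from the dictation.
        if tag in ("insert", "replace"):
            extra.extend(b[j1:j2])
    return extra


if __name__ == "__main__":
    dictated = "patient reports mild chest pain"
    final = "Clinic Header patient reports mild chest pain Page 1"
    print(non_dictated_tokens(dictated, final))
```

In this sketch the mismatched spans would be the candidates handed to a later classifier that decides whether each one is a header, footer, page turn, macro, or list.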
20110320188WEB-BASED SPEECH RECOGNITION WITH SCRIPTING AND SEMANTIC OBJECTS - The present invention is a system and method for creating and implementing transactional speech applications (SAs) using Web technologies, without reliance on server-side standard or custom services. A transactional speech application may be any application that requires interpretation of speech in conjunction with a speech recognition (SR) system, such as, for example, consumer survey systems. A speech application in accordance with the present invention is represented within a Web page, as an application script that interprets semantic objects according to a context. Any commonly known scripting language can be used to write the application script, such as JavaScript (or ECMAScript), PerlScript, and VBscript. The present invention is “Web-based” to the extent that it implements Web technologies, but it need not include or access the World Wide Web.12-29-2011
20110320187Natural Language Question Answering System And Method Based On Deep Semantics - In a computer system, systems and methods for automatically answering natural language questions using deep semantics are provided. Methods include receiving a natural language question, mapping it into one or more deductive database queries that capture one or more intents behind the question, computing one or more result sets of the question using one or more deductive database queries and a deductive database and providing one or more result sets. Systems include natural language question compilers and deductive databases. The natural language question compiler is configured to receive a natural language question and map it into one or more deductive database queries that capture one or more intents behind the question. The deductive database is configured to receive the mapped one or more deductive database queries, compute one or more result sets of the question using the one or more deductive database queries, and provide one or more result sets.12-29-2011
20110320186ENTITY RECOGNITION - The invention relates to a method of querying technical domains that recognises the concepts represented by strings of characters, rather than merely comparing strings. It can be used to compute conceptual similarity between terms. The method employs string distance metrics and a cyclic progression of lexical processing to recognise constituent term concepts that are then combined to form full-term concepts by means of a grammar. Terms can be extracted and identified as being conceptually similar (or dissimilar) to other terms even if they have never previously been encountered.12-29-2011
20120101809SYSTEM AND METHOD FOR DYNAMICALLY GENERATING A RECOGNITION GRAMMAR IN AN INTEGRATED VOICE NAVIGATION SERVICES ENVIRONMENT - The system and method described herein may dynamically generate a recognition grammar associated with a conversational voice user interface in an integrated voice navigation services environment. In particular, in response to receiving a natural language utterance that relates to a navigation context at the voice user interface, a conversational language processor may generate a dynamic recognition grammar that organizes grammar information based on one or more topological domains. For example, the one or more topological domains may be determined based on a current location associated with a navigation device, whereby a speech recognition engine may use the grammar information organized in the dynamic recognition grammar according to the one or more topological domains to generate one or more interpretations associated with the natural language utterance.04-26-2012
20120101808SENTIMENT ANALYSIS FROM SOCIAL MEDIA CONTENT - Methods and systems for extracting and analyzing user-generated content (UGC) in order to provide opinion-bearing information concerning different categories of a product. Harvested Web pages are examined for keywords to identify categories to which they pertain. Opinion-bearing information regarding those categories is then extracted and analyzed to determine its orientation and, optionally, its strength. The resulting sentiment determinations can be aggregated across multiple product reviews and the like to develop a sentiment summary, which can be reported and used as the basis for advertising, marketing and purchasing decisions, among others.04-26-2012
20120101807QUESTION TYPE AND DOMAIN IDENTIFYING APPARATUS AND METHOD - A question type and domain identifying apparatus includes: a question type identifier for recognizing the number of words of a user's question to identify whether the user's question is a query for performing information searching or a question for performing a question and answer (Q&A); a question domain distributor for distributing one of plural preset domain specialized Q&A engines, as a Q&A engine of the user's question based on the recognized word number; and a Q&A engine block, including the domain specialized Q&A engines, for selectively performing information searching or a Q&A with respect to the user's question in response to the distribution of the question domain distributor.04-26-2012
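As a rough illustration of the word-count test described in the entry above, the sketch below labels short inputs as search queries and longer inputs as questions for a Q&A engine. The function name and the threshold of three words are assumptions made for illustration; the abstract does not state a value, and the domain distribution step is not modeled.

```python
def identify_question_type(text, max_query_words=3):
    """Label input as a keyword 'query' or a natural-language 'question'.

    max_query_words is an assumed threshold, not a value from the abstract.
    """
    word_count = len(text.split())
    return "query" if word_count <= max_query_words else "question"


print(identify_question_type("patent speech recognition"))
print(identify_question_type("Who invented the first speech recognizer?"))
```

A real system would presumably tune the threshold per language, since word counts differ between, say, English and agglutinative languages.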
20120101806SEMANTICALLY GENERATING PERSONALIZED RECOMMENDATIONS BASED ON SOCIAL FEEDS TO A USER IN REAL-TIME AND DISPLAY METHODS THEREOF - Systems and methods of selecting recommendations for a user in an online environment are disclosed. In one aspect, embodiments of the present disclosure include a method, which may be implemented on a system, of performing semantic analysis on a content item associated with the user, online interactions of the user, and profile information related to the user to identify associated content metadata and keywords, assigning a weight to the content metadata and keywords based on semantic-type categories, comparing the content metadata and keywords to target metadata and keywords to identify recommendation matches, and selecting one or more recommendations to be provided to the user based on the recommendation matches.04-26-2012
20120101805METHOD AND APPARATUS FOR DETECTING A SENTIMENT OF SHORT MESSAGES - A method, computer readable medium and apparatus for detecting a sentiment for a short message are disclosed. For example, the method receives the short message, and obtains an abstraction of the short message. The method then determines the sentiment of the short message based upon the abstraction.04-26-2012
20120290288Parsing of text using linguistic and non-linguistic list properties - A system and method are disclosed for extracting information from text which can be performed without prior knowledge as to whether the text includes a list. The method applies parser rules to a sentence spanning lines of text to identify a set of candidate list items in the sentence. Each candidate list item is assigned a set of features including one or more non-linguistic features and a linguistic feature. The linguistic feature defines a syntactic function of an element of the candidate list item that is able to be in a dependency relation with an element of an identified candidate list introducer in the same sentence. When two or more candidate list items are found with compatible sets of features, a list is generated which links these as list items of a common list introducer. Dependency relations are extracted between the list introducer and list items and information based on the extracted dependency relations is output.11-15-2012
20120290290Sentence Simplification for Spoken Language Understanding - Sentence simplification may be provided. A spoken phrase may be received and converted to a text phrase. An intent associated with the text phrase may be identified. The text phrase may then be reformatted according to the identified intent and a task may be performed according to the reformatted text phrase.11-15-2012
20100198584SERVER FOR AUTOMATICALLY SCORING OPINION CONVEYED BY TEXT MESSAGE CONTAINING PICTORIAL-SYMBOLS - A server is disclosed for computing a score of an opinion that a message in a text file is expected to convey regarding a subject to be evaluated, wherein the message is written using literal strings and pictorial symbols. In this server, by the use of a pictorial-symbol dictionary memory storing a correspondence between designated pictorial-symbols to be rated and scores of opinions expressed by the respective pictorial-symbols, at least one of the used pictorial-symbols in the message which is coincident with at least one of the designated pictorial-symbols stored in the pictorial-symbol dictionary memory, is extracted from the message, at least one of the opinion scores which corresponds to the at least one extracted pictorial-symbol is retrieved within the pictorial-symbol dictionary memory, and an aggregate net opinion score for the message is calculated, based on an aggregate opinion score for the at least one extracted pictorial-symbol.08-05-2010
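The dictionary lookup described in the entry above can be sketched as follows. The symbol table, its score values, and the choice of a plain sum as the aggregate are all assumptions for illustration; the abstract only says that an aggregate net opinion score is calculated from the scores of the extracted symbols.

```python
# Hypothetical pictorial-symbol dictionary; the scores are illustrative only.
SYMBOL_SCORES = {":)": 1.0, ":(": -1.0, ":D": 2.0}


def score_message(message, symbol_scores=SYMBOL_SCORES):
    """Sum the scores of every dictionary symbol occurrence in the message."""
    total = 0.0
    for symbol, score in symbol_scores.items():
        # str.count finds non-overlapping occurrences of the symbol.
        total += message.count(symbol) * score
    return total


print(score_message("Great phone :) :D but the battery :("))
```

Symbols that are not in the dictionary are simply ignored, which matches the abstract's restriction to designated pictorial symbols.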
20100198583INDICATING METHOD FOR SPEECH RECOGNITION SYSTEM - The present invention relates to an indicating method for a speech recognition system comprising a multimedia electronic product and a speech recognition device. The steps of this method include: users enter voice commands into a voice input unit, where the commands are converted into speech signals; the signals are acquired and stored by a recording unit, converted by a microprocessor into a volume-indicating oscillogram, and then displayed by a display module. At the same time, compliance with speech recognition conditions is decided in that process. That is to say, an indicating module provides diagram, letter or color marking or speech indication according to the volume-indicating oscillogram, followed by playback over a sound amplifying unit, so that users can understand the voice input status and adjust the volume to complete voice command operations through voice indication, explanations in graphs or letters and other interactive guidance, together with the audio-indicating oscillogram, thus further enhancing the speech recognition rate and avoiding such problems and deficiencies as distortions related to abnormal and poor sound acquisition or inconvenience of use.08-05-2010
20100169076Text-to-Scene Conversion - The invention relates to a method of converting a set of words into a three-dimensional scene description, which may then be rendered into three-dimensional images. The invention may generate arbitrary scenes in response to a substantially unlimited range of input words. Scenes may be generated by combining objects, poses, facial expressions, environments, etc., so that they represent the input set of words. Poses may have generic elements so that referenced objects may be replaced by those mentioned in the input set of words. Likewise, a character may be dressed according to its role in the set of words. Various constraints for object positioning may be declared. The environment, including but not limited to place, time of day, and time of year, may be inferred from the input set of words.07-01-2010
20100169075ADJUSTMENT OF TEMPORAL ACOUSTICAL CHARACTERISTICS - Embodiments may be a standalone module or part of mobile devices, desktop computers, servers, stereo systems, or any other systems that might benefit from condensed audio presentations of item structures such as lists or tables. Embodiments may comprise logic such as hardware and/or code to adjust the temporal characteristics of items comprising words. The items may be included in a structure such as a text listing or table, an audio listing or table, or a combination thereof, or may be individual words or phrases. For instance, embodiments may comprise a keyword extractor to extract keywords from the items and an abbreviations generator to generate abbreviations based upon the keywords. Further embodiments may comprise a text-to-speech generator to generate audible items based upon the abbreviations to render to a user while traversing the item structure.07-01-2010
20100169078Style-checking method and apparatus for business writing - The method is software-based, and carried out on a computer system and cooperating apparatus. It checks written text for problems impairing clarity, conciseness and reader comfort in business documents. One embodiment includes routines for checking writing style in sentences and paragraphs, and for generating informational, critical and commendatory display indicators relating to reader comfort. The routines check subject and verb juxtaposition, verb strength, prepositional phrase use, transition words, unity-creating constructions, gerund use, and sentence variety. The indicators are displayed in the form of highlighted text and diacritical marks. Another embodiment is a method for quantifying reader discomfort. It includes routines for quantifying, reporting and displaying points indicating comfort-impairing problems of the type located by running the routines of the first embodiment. Yet another embodiment includes a method for editing text documents for reader comfort, by locating and fixing problem words and constructions.07-01-2010
20130013290LANGUAGE PROCESSOR - A referring expression processor which uses a probabilistic model and in which referring expressions including descriptive, anaphoric and deictic expressions are understood and generated in the course of dialogue is provided. The referring expression processor according to the present invention includes: a referring expression processing section which performs at least one of understanding and generation of referring expressions using a probabilistic model constructed with a referring expression Bayesian network, each referring expression Bayesian network representing relationships between a reference domain (D) which is a set of possible referents, a referent (X) in the reference domain, a concept (C) concerning the referent and a word (W) which represents the concept; and a memory which stores data necessary for constructing the referring expression Bayesian network.01-10-2013
20130013296HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING IMPROVED DISAMBIGUATION WITH REDUCED DEGRADATION OF DEVICE PERFORMANCE - In view of the foregoing, an improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. Incoming data, such as the text of a message, can be scanned for proper nouns, for instance, since such proper nouns might not already be stored in memory and might be expected to be entered by the user when, for example, forwarding or responding to the message. A proper noun can be identified, for instance, on the basis that it begins with an upper case letter. The proper nouns can be stored, for example, in memory that may, by way of further example, be a temporary dictionary.01-10-2013
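A minimal sketch of the proper-noun scan described in the entry above: words that begin with an upper case letter and are not already stored are collected into a temporary dictionary. The function name, the tokenizing regular expression, and the `known_words` set are assumptions for illustration.

```python
import re


def collect_proper_nouns(message, known_words):
    """Return capitalized words from the message that are not already stored."""
    temporary_dictionary = set()
    for word in re.findall(r"[A-Za-z]+", message):
        # Upper case first letter is the identification rule named in the abstract;
        # the known_words check skips ordinary words capitalized at sentence start.
        if word[0].isupper() and word.lower() not in known_words:
            temporary_dictionary.add(word)
    return temporary_dictionary


known = {"please", "call", "about", "the", "meeting", "in"}
print(sorted(collect_proper_nouns("Please call Nakamura about the meeting in Lisbon", known)))
```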
20130013292HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A method for disambiguating user inputs through a handheld mobile device is disclosed. According to the method, an ambiguous input sequence is received from an input device. A list including one or more disambiguated character sequences is generated on a display device corresponding to the ambiguous input sequence. An additional input is received from the input device. A processor determines that the additional input is an operational input associated with one of a plurality of operations on the disambiguated character sequences. The processor processes the disambiguated character sequences according to the one of the plurality of operations associated with the operational input.01-10-2013
20130013294HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING QUICK TEXT ENTRY IN A MESSAGE - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message.01-10-2013
20130013293HANDHELD ELECTRONIC DEVICE PROVIDING A LEARNING FUNCTION TO FACILITATE CORRECTION OF ERRONEOUS TEXT ENTRY, AND ASSOCIATED METHOD - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In addition to identifying and outputting representations of language objects that are stored in the memory and that correspond with a text input, the device provides a learning function which facilitates providing proposed corrected output by the device in certain circumstances of erroneous input.01-10-2013
20130013291SYSTEMS AND METHODS FOR SENTENCE COMPARISON AND SENTENCE-BASED SEARCH - Systems and methods for performing logical semantic sentence comparisons and sentence-based searches. Training is performed by running an NLP pipeline on unstructured text comprising sentences and creating sentence matrix representations on the unstructured text; storing the matrix representations in an indexed database; combining the stored matrix representations; running an SVD on the combined matrix; storing the SVD components in the indexed database; reiterating through the output of the NLP pipeline the sentences of the unstructured training text to form a low-dimensional matrix conversion for each sentence for storage in the database based on the calculated SVD components. Subsequent query statements are run through the same process and converted into low-dimensional matrix representations using the SVD components from training; the low-dimensionality query matrix is compared to the stored low-dimensional matrices to determine the closest relevant documents, which are returned to the user.01-10-2013
20130013289Method of Extracting Experience Sentence and Classifying Verb in Blog - Provided are a method of extracting an experience-revealing sentence from a blog document and a method of classifying verbs into activity verbs and state verbs in a sentence recorded in a blog document. The method of extracting an experience sentence from a blog document includes generating a sentence classifier using a machine learning algorithm based on grammatical features, and classifying experience sentences that represent actual experiences of users and non-experience sentences that represent no experience in the blog document using the sentence classifier. By classifying sentences in a blog document into experience sentences and non-experience sentences, it is possible to extract experiences that a user has actually had or that have actually happened to a user from the document.01-10-2013
20130013295METHOD AND SYSTEM FOR PROVIDING INITIAL PATENT CLAIM ANALYSIS - Information relating to intellectual property, across one or more intellectual property applications having various types of intellectual property data, can be provided and/or accessed in an integrated manner. Commonality(ies) are determined between disparate intellectual property applications, that may be applied by the intellectual property applications in accessing the intellectual property information. Responsive to a user request, which may include a specified commonality, stored information regarding the disparate data corresponding to the disparate intellectual property applications is retrieved. The commonality is utilized in bridging the gap to the intellectual property data for the disparate intellectual property applications. The bridging is provided by use of a commonality and by an IP engine.01-10-2013
20100131266HANDHELD ELECTRONIC DEVICE INCLUDING AUTOMATIC PREFERRED SELECTION OF A PUNCTUATION, AND ASSOCIATED METHOD - A method of enabling input on a handheld electronic device, which includes an input apparatus having a number of input members that are capable of being actuated, wherein at least one of the input members has a plurality of selectable output alternatives, includes detecting as a first input an actuation of an input member, generating a first output, detecting as a second input an actuation of an input member having a plurality of selectable output alternatives comprising at least a primary punctuation and a secondary punctuation, determining that said first output has a predetermined characteristic, preferring as a second output said secondary punctuation, and outputting said second output.05-27-2010
20120150534Computer-Implemented Systems and Methods for Determining a Difficulty Level of a Text - Systems and methods are provided for determining a difficulty level of a text. A determination is made as to a number of cohesive devices present in a text. A further determination is made as to a number of cohesive devices expected in the text. A cohesiveness metric is calculated based on the number of cohesive devices present in the text and the number of cohesive devices expected in the text, where the cohesiveness metric is used to identify a difficulty level of the text.06-14-2012
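The entry above bases its metric on two counts: cohesive devices present and cohesive devices expected. The sketch below uses their ratio, which is an assumed formula since the abstract does not give one. The list of cohesive devices is hypothetical, and the expected count is supplied by the caller because the abstract does not say how it is derived.

```python
import re

# Hypothetical list of cohesive devices (connectives); not taken from the patent.
COHESIVE_DEVICES = {"however", "therefore", "because", "moreover", "thus"}


def count_cohesive_devices(text):
    """Count words in the text that appear in the assumed device list."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for word in words if word in COHESIVE_DEVICES)


def cohesiveness_metric(text, expected):
    """Ratio of observed to expected cohesive devices (an assumed formula)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return count_cohesive_devices(text) / expected


sample = "It rained. Therefore the game was cancelled, because the field flooded."
print(cohesiveness_metric(sample, expected=4))
```

Under this assumed formula, a value well below 1 would suggest that the text leaves more connections implicit than expected.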
20120150531SYSTEM AND METHOD FOR LEARNING LATENT REPRESENTATIONS FOR NATURAL LANGUAGE TASKS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for learning latent representations for natural language tasks. A system configured to practice the method analyzes, for a first natural language processing task, a first natural language corpus to generate a latent representation for words in the first corpus. Then the system analyzes, for a second natural language processing task, a second natural language corpus having a target word, and predicts a label for the target word based on the latent representation. In one variation, the target word is one or more words such as a rare word and/or a word not encountered in the first natural language corpus. The system can optionally assign the label to the target word. The system can operate according to a connectionist model that includes a learnable linear mapping that maps each word in the first corpus to a low dimensional latent space.06-14-2012
20130018649System and a Method for Generating Semantically Similar Sentences for Building a Robust SLM (inventors: Om D. Deshmukh, New Delhi, IN; Sachindra Joshi, New Delhi, IN; Shajith I. Mohamed, Karnataka, IN; Ashish Verma, New Delhi, IN) - A system and method are described for generating semantically similar sentences for a statistical language model. A semantic class generator determines for each word in an input utterance a set of corresponding semantically similar words. A sentence generator computes a set of candidate sentences each containing at most one member from each set of semantically similar words. A sentence verifier grammatically tests each candidate sentence to determine a set of grammatically correct sentences semantically similar to the input utterance. Also note that the generated semantically similar sentences are not restricted to be selected from an existing sentence database.01-17-2013
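The candidate-generation step in the entry above can be sketched as a Cartesian product over per-word alternatives, so that each candidate uses one member of each word's set. The `similar_words` mapping stands in for the semantic class generator and is a hypothetical input; the grammatical verification step is not modeled here.

```python
from itertools import product


def candidate_sentences(utterance, similar_words):
    """Yield sentences formed by picking one option per word position."""
    options = []
    for word in utterance.split():
        # Keep the original word first, then any assumed alternatives.
        options.append([word] + similar_words.get(word, []))
    for choice in product(*options):
        yield " ".join(choice)


similar = {"pay": ["settle"], "bill": ["invoice", "statement"]}
sentences = list(candidate_sentences("pay my bill", similar))
print(len(sentences))
print(sentences)
```

The number of candidates grows multiplicatively with the size of each set, which is one reason a filtering stage such as the sentence verifier would be needed in practice.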
20130018653HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT EMPLOYING DIFFERENT GROUPINGS OF DATA SOURCES TO DISAMBIGUATE DIFFERENT PARTS OF INPUT - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to generate compound language solutions by employing different groupings of data sources to generate different portions of the compound language solutions.01-17-2013
20130018652EVIDENCE DIFFUSION AMONG CANDIDATE ANSWERS DURING QUESTION ANSWERING - Diffusing evidence among candidate answers during question answering may identify a relationship between a first candidate answer and a second candidate answer, wherein the candidate answers are generated by a question-answering computer process, the candidate answers have associated supporting evidence, and the candidate answers have associated confidence scores. All or some of the evidence may be transferred from the first candidate answer to the second candidate answer based on the identified relationship. A new confidence score may be computed for the second candidate answer based on the transferred evidence.01-17-2013
20130018650Selection of Language Model Training Data - An intelligent selection system selects language model training data to obtain in-domain training datasets. The selection is accomplished by estimating a cross-entropy difference for each candidate text segment from a generic language dataset. The cross-entropy difference is a difference between the cross-entropy of the text segment according to the in-domain language model and the cross-entropy of the text segment according to a language model trained on a random sample of the data source from which the text segment is drawn. If the difference satisfies a threshold condition, the text segment is added as an in-domain text segment to a training dataset.01-17-2013
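A minimal sketch of the cross-entropy difference test in the entry above. For brevity it uses add-one-smoothed unigram models, which is an assumption; the abstract does not name a model type. The function names and the condition `difference < threshold` are likewise assumed, since the abstract only says the difference must satisfy a threshold condition. Candidate segments are assumed to be non-empty.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Return word counts and the total token count for a list of sentences."""
    counts = Counter(word for s in sentences for word in s.split())
    return counts, sum(counts.values())


def cross_entropy(segment, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    words = segment.split()
    log_prob = sum(math.log2((counts[w] + 1) / (total + vocab_size)) for w in words)
    return -log_prob / len(words)


def select_in_domain(candidates, in_domain, generic_sample, threshold=0.0):
    """Keep segments whose in-domain cross-entropy is lower than the generic one."""
    vocab = {w for s in in_domain + generic_sample + candidates for w in s.split()}
    in_model = train_unigram(in_domain)
    generic_model = train_unigram(generic_sample)
    selected = []
    for segment in candidates:
        difference = (cross_entropy(segment, in_model, len(vocab))
                      - cross_entropy(segment, generic_model, len(vocab)))
        if difference < threshold:
            selected.append(segment)
    return selected


in_domain = ["the patient received a dose", "the dose was reduced"]
generic = ["the team won the game", "a game was played"]
print(select_in_domain(["the patient dose", "the team game"], in_domain, generic))
```

Lowering the threshold below zero makes the selection stricter, trading training-set size for closeness to the target domain.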
20130018651PROVISION OF USER INPUT IN SYSTEMS FOR JOINTLY DISCOVERING TOPICS AND SENTIMENTS (inventors: Divna Djordjevic, Antibes, FR; Rayid Ghani, Chicago, IL, US; Marko Krema, Evanston, IL, US) - A generative model is used to develop at least one topic model and at least one sentiment model for a body of text. The at least one topic model is displayed such that, in response, a user may provide user input indicating modifications to the at least one topic model. Based on the received user input, the generative model is used to provide at least one updated topic model and at least one updated sentiment model based on the user input. Thereafter, the at least one updated topic model may again be displayed in order to solicit further user input, which further input is then used to once again update the models. The at least one updated topic model and the at least one updated sentiment model may be employed to analyze target text in order to identify topics and associated sentiments therein.01-17-2013
20110144975TYPEWRITER SYSTEM AND TEXT INPUT METHOD USING MEDIATED INTERFACE DEVICE - Disclosed is a typewriter system and a text input method capable of accurately recognizing words by correcting words input using a mediated interface device based on a dictionary. A plurality of texts are combined by referencing a text recognition order set in which recognition results of texts are arranged according to a recognition order from texts input through the mediated interface device and the combined text is filtered using part index maps formed of part words that are an accumulated set of texts forming complete words. The part words passing through the part index maps are again filtered using a dictionary including context information formed of a set of words in a specific category, thereby making it possible to accurately recognize the words. The part words that cannot form words in a dictionary are removed in advance using the part index maps, thereby improving the recognition efficiency.06-16-2011
20110161072LANGUAGE MODEL CREATION APPARATUS, LANGUAGE MODEL CREATION METHOD, SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION METHOD, AND RECORDING MEDIUM - A frequency counting unit (06-30-2011
20110161069METHOD, COMPUTER PROGRAM PRODUCT AND APPARATUS FOR PROVIDING A THREAT DETECTION SYSTEM - An apparatus for providing a threat detection system may include a processor configured at least to perform parsing data to identify terms included in a lexicon of multi-dimensional threat factors, generating scoring results for at least some of the terms, and providing a graphical display of at least some of the terms based on the scoring results. A corresponding method and computer program product are also provided.06-30-2011
20110161068SYSTEM AND METHOD OF USING A SENSE MODEL FOR SYMBOL ASSIGNMENT - Systems and methods for automatically discovering and assigning symbols for identified text in a software application include receiving electronic signals from an input device indicating identified text for which symbol assignment is desired. Additional information such as part of speech, additional words, context of use, etc. may also be provided. The identified text and optional additional information is analyzed to establish a mapping of the identified text to one or more identified word senses from a word sense model database. An electronic determination of whether any of the identified word senses has an associated symbol is conducted. Related word senses may also be analyzed to determine if any related word senses have symbols. One of the determined symbols may then be associated with the identified text such that the symbol is thereafter displayed in conjunction with or instead of the text in the application.06-30-2011
20130024186Deep Model Statistics Method for Machine Translation - In one embodiment, the invention provides a method for machine translation of a source document in an input language to a target document in an output language, comprising generating translation options corresponding to at least portions of each sentence in the input language; and selecting a translation option for the sentence based on statistics associated with the translation options.01-24-2013
20130024184DATA PROCESSING SYSTEM AND METHOD FOR ASSESSING QUALITY OF A TRANSLATION - The invention provides a data processing system and method for analysing text. The invention uses statistical text classification techniques to assist with the quality assurance of translated texts by using a one pass analysis technique and calculating and ranking probed texts with a dissimilarity score. The ranked items are used to direct, inform, guide and assist human reviewers, auditors, proof-readers, post-editors and evaluators of the accuracy of the translation. The invention provides a significant time saving and improved accuracy in assessing a document's adherence to an enterprise's corporate messaging and authoring standards and provides for a level of automated quality assurance within automated translation workflows.01-24-2013
20130173248LEVERAGING LANGUAGE STRUCTURE TO DYNAMICALLY COMPRESS A SHORT MESSAGE SERVICE (SMS) MESSAGE - A message within a message queue can be identified. The message queue can be within a software entity of a computing device. The message can be analyzed to determine an encoding scheme to apply to the message. The message can be encoded using the encoding scheme to create an encoded message. The encoding scheme can be a word level encoding scheme, a language-based encoding scheme, or a grammar encoding scheme.07-04-2013
20130173250STATISTICAL STEMMING - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating suffix rewriting rules. A method includes obtaining a plurality of canonical suffix-rewriting rules each associated with one or more words, generating a suffix tree from the words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of final suffix-rewriting rules from the nodes in the minimum colored subset. Another method includes receiving applicable and non-applicable words for a suffix-rewriting rule, generating a suffix tree from the applicable words and the non-applicable words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of suffix-rewriting rules, wherein each rule corresponds to a node in the minimum colored subset with a valid status.07-04-2013
20130173251ELECTRONIC DEVICE AND NATURAL LANGUAGE ANALYSIS METHOD THEREOF - A natural language analysis method for an electronic device is provided. The language analysis method includes the steps of: receiving user inputs and generating signals; converting signals into textual information; segmenting the textual information into a number of vocabulary segments, each vocabulary segment including a number of separated vocabularies; retrieving the use frequency of each vocabulary, sorting the vocabulary segments, and obtaining a first sorting of the number of vocabulary segments into descending order; segmenting the textual information into a number of sentence segmentations; obtaining a second sorting of the vocabulary segmentations, according to the number of sentence segmentations and the number of vocabulary segment results; and determining a reply to the textual information, according to the topmost result after the second sorting. An electronic device using the language analysis method is also provided.07-04-2013
20130173252ELECTRONIC DEVICE AND NATURAL LANGUAGE ANALYSIS METHOD THEREOF - A language analysis method for an electronic device storing a basic corpus and a temporary corpus is provided. The language analysis method includes steps of receiving user inputs and generating signals; converting signals into textualized information; analyzing the textualized information; obtaining a first understanding result according to the basic corpus, the vocabulary segmentation results, and the sentence segmentation results; determining whether the first understanding result is an appropriate understanding according to the context; determining one or more anaphoric vocabularies when the first understanding result is an inappropriate understanding; determining a temporary understanding result of the one or more anaphoric vocabularies and a second understanding result of the textualized information according to the context; and determining a reply for the textualized information, according to the second understanding result, the basic corpus, and the temporary corpus. An electronic device using the language analysis method is also provided.07-04-2013
20130173255Methods for Creating A Phrase Thesaurus - The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). More specifically, the invention concerns a phrase-based modeling of generic structures of verbal interaction and use of these models for the purpose of automating part of the design of such grammar networks.07-04-2013
20130173257Systems and Processes for Identifying Features and Determining Feature Associations in Groups of Documents - Systems and computer-implemented processes for identification of features and determination of feature associations in a group of documents can involve providing a plurality of keywords identified among the terms of at least some of the documents. A value measure can be calculated for each keyword. High-value keywords are defined as those keywords having value measures that exceed a threshold. For each high-value keyword, term-document associations (TDA) are accessed. The TDA characterize measures of association between each term and at least some documents in the group. A processor quantifies similarities between unique pairs of high-value keywords based on the TDA for each respective high-value keyword and generates a similarity matrix that indicates one or more sets that each comprise highly associated high-value keywords.07-04-2013
20080243479OPEN INFORMATION EXTRACTION FROM THE WEB - To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text, by applying heuristics to the corpus of text. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor assigns a probability to each retained tuple to indicate a likelihood that the retained tuple is an actual instance of a relationship between a plurality of objects comprising the retained tuple. The retained tuples comprise an extraction graph that can be queried for information.10-02-2008
20100088087MULTI-TAPABLE PREDICTIVE TEXT - A multi-tapable predictive text method and device that allows for both multi-tap and predictive text entry to be used in conjunction thereby facilitating entry of text in languages having a large number of characters and/or on devices having a small number of keys. The method includes selecting at least one set of symbols from a plurality of sets of symbols associated with at least one key of an input device, at least one of the sets of symbols corresponding to at least two alphanumeric characters, and each set of symbols selectable by activating the key a prescribed number of times, analyzing the selected sets of symbols using a predictive text engine to generate a list of potential character strings, and displaying at least one of the potential character strings of the list of character strings for selection by a user.04-08-2010
20080235005Device, System and Method of Handling User Requests - Devices, systems and methods of handling user requests. For example, a method includes: receiving an electronic representation of a submitted request; calculating request-related information, submitter-related information, and/or recipient-related information; based on the request-related information and the submitter-related information, determining one or more recipients for the request; distributing the request to said one or more recipients; and storing the request and one or more responses received from said one or more recipients.09-25-2008
20130185060PHRASE BASED DOCUMENT CLUSTERING WITH AUTOMATIC PHRASE EXTRACTION - Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents.07-18-2013
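As a rough illustration of the statistical idea in entry 20130185060 above, the following Python sketch scores adjacent word pairs by pointwise mutual information (PMI), one common mutual information metric. The function name `pmi_bigrams`, the toy documents, and the `min_count` threshold are assumptions made for this sketch and are not taken from the patent.

```python
import math
from collections import Counter

def pmi_bigrams(documents, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter()
    bigrams = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # pairs seen too rarely are treated as chance co-occurrences
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Hypothetical toy corpus, for illustration only.
docs = [
    "new york is a big city",
    "she moved to new york last year",
    "a new car is a big expense",
    "york is a city in england",
]
scores = pmi_bigrams(docs)
```

Pairs with a high score could then be kept as candidate phrases and used alongside single words as clustering features. A realistic corpus would need to be far larger than this one for the scores to be meaningful.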
20110264445METHOD OF USING VISUAL SEPARATORS TO INDICATE ADDITIONAL CHARACTER COMBINATION CHOICES ON A HANDHELD ELECTRONIC DEVICE AND ASSOCIATED APPARATUS - A method and associated apparatus for using visual separators to indicate additional character combination choices from a disambiguation function on a handheld electronic device.10-27-2011
20130179152Computer Implemented Method, Apparatus, Network Server And Computer Program Product - A computer implemented method for generating user element explanations for elements of a formal language may include: identifying at least one element of said formal language, selecting according to a target domain a respective mapping rule for each identified element, wherein the mapping rule refers to at least one word of a natural language and/or at least one audio file, generating at least one user element explanation for each identified element by automatically combining the respective at least one word of a natural language and/or at least one audio file according to predefined grammar rules, wherein the predefined grammar rules form part of the selected mapping rule and specify how to combine the at least one word of a natural language and/or audio file into said user element explanation, and linking the generated at least one user element explanation with the respective identified element of said formal language.07-11-2013
20080221873METHODS AND APPARATUS FOR NATURAL SPOKEN LANGUAGE SPEECH RECOGNITION - A word prediction apparatus and method that improve precision and accuracy, and a speech recognition method and an apparatus therefor, are provided. For the prediction of a sixth word “?”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then, “donyu”, which is the sixth word from “sho-senkyoku no”, is predicted. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only by “sho-senkyoku no”.09-11-2008
20080221872METHODS AND APPARATUS FOR NATURAL SPOKEN LANGUAGE SPEECH RECOGNITION - A word prediction method and apparatus improves precision and accuracy. For the prediction of a sixth word “?”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then, “donyu”, which is the sixth word from “sho-senkyoku no”, is predicted. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only by “sho-senkyoku no”.09-11-2008
20080221871HUMAN/MACHINE INTERFACE - A method to allow a user to cause a machine to make an utterance, together with apparatus for obtaining input from a user. The method comprises the steps of: analysing the context within which the utterance is to be made; creating a list of utterances appropriate to the context; on a human/machine interface, creating an indication that identifies to the user those utterances that are available, and allowing the user to indicate one of those utterances; and causing an utterance indicated by the user to be made. The apparatus comprises: a visual display configured to display a plurality of indicia angularly spaced on a locus about an origin, each of the plurality of indicia corresponding to a respective option; and an input device for use in indicating an angular position, wherein the apparatus is configured to generate an input event corresponding to an option associated with one of the plurality of indicia at an angular position corresponding to an angular position indicated by the input device.09-11-2008
20080221870System and method for revising natural language parse trees - An improved system and method for revising natural language parse trees is provided. A revision dependency parser may learn a set of transformation rules that may be applied to dependency parse trees generated by a base parser for revising the dependency parse trees. A corpus of natural language sentences and a set of correct dependency parse trees may be used to train a revision dependency parser to correct dependency parse trees generated by the base parser. A revision engine may compare the dependency parse trees produced by the base parser with the correct ones present in the training data to produce an observation-rule pair for each dependency. A rule may specify a transformation on the predicted dependency parse tree generated by the base parser to replace an incorrect dependency with a corrected dependency or may change the type of dependency expressed for the grammatical function of the dependent word.09-11-2008
20080221869CONVERTING DEPENDENCY GRAMMARS TO EFFICIENTLY PARSABLE CONTEXT-FREE GRAMMARS - Dependency grammars are transformed to context-free grammars. The context-free grammars can be used in a parser to parse input sentences and identify relationships among words in the sentence.09-11-2008
20130179149COMMUNICATION PROCESSING - Disclosed are methods and apparatus for processing linguistic expressions (e.g., opinionated text documents). The linguistic expressions are processed by, firstly, detecting topics of interest discussed in the linguistic expressions. The sentiment, or sentiments, of an originator with respect to each of the topics detected in the linguistic expressions is then assessed. The originators are then grouped (or clustered) into one or more groups based on the similarities between the originators' respective sets of detected topics and corresponding sentiments. Semantic information is then associated with a given group. Finally, for a given member of a given group, a profile is created or updated. This profile comprises attributes that may be based on a degree of membership of the given member to the given group and the semantic information associated with the given group.07-11-2013
20130179150NOTE COMPILER INTERFACE - A computer implemented method is performed at an electronic device adapted to receive text-based input and having a display. The method comprises receiving text-based input, analyzing the inputted text, and performing, in dependence on the analyzing, at least one of two actions. The first action comprises comparing at least some of the analyzed text with other text accessed by the device; if a match is found between said at least some of the analyzed text and said accessed text, retrieving data associated with the matching accessed text, and associating the retrieved data with the inputted text for subsequent provision to the user. The second action comprises providing data associated with at least some of the analyzed text to a module of the device. In this way, data pull and push actions are performed in dependence on the analyzing. An electronic device and computer program product are also provided.07-11-2013
20130179148METHOD AND APPARATUS FOR DATABASE AUGMENTATION AND MULTI-WORD SUBSTITUTION - A method and communication device are provided for database augmentation using linguistic data stored on a device, and utilizing a database stored on a device to perform multi-word substitution. A database may be augmented by monitoring other databases that contain linguistic data, such as contact databases containing linguistic data regarding entities that a device may communicate with, and updating the database with linguistic data in the other databases. The linguistic data in the augmented database may be compared with words received from an input apparatus to determine whether any of the received words should be replaced with linguistic data from the augmented database. The augmented database may contain one word entries and multi-word entries to allow for multi-word substitution.07-11-2013
20130179151METHOD AND SYSTEM FOR CONSTRUCTING A LANGUAGE MODEL - Disclosed herein are various embodiments of methods and systems for constructing a first language model for use by a first Language Processing (LP) application of a plurality of LP applications. Each LP application of the plurality of LP applications receives one or more of a language based input, a derivative of the language based input, a response to the language based input and a derivative of the response. The method includes processing at least one input by a second LP application of the plurality of LP applications. Based on the processing of the second LP application, at least one output is generated. Subsequently, at least a portion of the first language model is constructed based on the at least one output.07-11-2013
20130179153Computer-Implemented Systems and Methods for Detecting Punctuation Errors - Systems and methods are provided for detecting punctuation errors in a text including one or more sentences. A sentence including a plurality of words is received, the sentence including one or more preexisting punctuation marks. One or more punctuation marks are determined with a statistical classifier based on a set of rules, to be inserted in the sentence. The determined punctuation marks are compared with the preexisting punctuation marks. A report of punctuation errors is output based on the comparison.07-11-2013
20130144608Incorporation of Variables Into Textual Content - Embodiments of the invention provide techniques for incorporating variable values into textual content. In one embodiment, an abstract phrase including a text phrase and a variable at a particular position in the text phrase is received. The abstract phrase may include multiple variables. A text value for the variable is received. The text phrase of the abstract phrase is combined with the text value according to the particular position of the variable. An integration rule is applied at a boundary of the text phrase of the abstract phrase and the text value, where the integration rule is based on a language rule. The integration rule modifies a portion of the text phrase of the abstract phrase or a portion of the text value to produce an integrated phrase.06-06-2013
20130144609TEXT PROCESSING SYSTEM, TEXT PROCESSING METHOD, AND TEXT PROCESSING PROGRAM - Provided is a text processing system capable of avoiding declining processing efficiency in analyses of text that does not contain breaks.06-06-2013
20130173258Broad-Coverage Normalization System For Social Media Language - A method for identification of a standard text token in a dictionary that corresponds to a non-standard token identified in text includes identification of a first standard token that is associated with the non-standard token using a predetermined conditional random field (CRF) model and identification of a second standard token that is associated with the non-standard token using a spell checker. The method further includes identification of noisy channel scores using data from the CRF model and the spell checker for the first standard token and the second standard token, respectively. The method further includes presentation of one of the first and second standard tokens having the greatest identified noisy channel score to a user with a user interface device.07-04-2013
20120253793System for natural language understanding - A general-purpose apparatus for analyzing natural language text that allows for the implementation of a broad range of natural language understanding applications. The apparatus for natural language understanding analyzes a source text and transforms the source text into a semantically-interpretable syntactic representation (SISR), comprising a syntax template and semantic clause annotations. The general-purpose apparatus for natural language understanding is adaptable to various source text natural languages and is adaptable to various natural language understanding applications, such as query answering, translation, summarization, information extraction, disambiguation, and parsing. A natural language query answering apparatus for answering questions about a source text, whereby the query answering apparatus utilizes the general-purpose apparatus for transforming the natural language query into SISR format.10-04-2012
20120253791Task Driven User Intents - Identification of user intents may be provided. A plurality of network applications may be identified, and an ontology associated with each of the plurality of applications may be defined. If a phrase received from a user is associated with at least one of the defined ontologies, an action associated with the network application may be executed.10-04-2012
20110270607METHOD AND SYSTEM FOR SEMANTIC SEARCHING OF NATURAL LANGUAGE TEXTS - A method and system comprising an automated analysis of at least one corpus of natural language text is disclosed. For each sentence of a corpus, the analysis includes performing a syntactic analysis using linguistic descriptions to generate at least one syntactic structure for the sentence, building a semantic structure for the sentence, associating each generated syntactic and semantic structure with the sentence, and saving each generated syntactic and semantic structure. For each corpus text that was preliminarily analyzed, performing an indexing operation to index lexical meanings and values of linguistic parameters of each syntactic structure and each semantic structure associated with sentences in the corpus text. A semantic search as disclosed herein includes at least one automatic preliminarily analyzed corpus of sentences comprising searched values of linguistic, syntactic and semantic parameters. Due to deep semantic analysis of one or more corpora, the search may be executed in various languages, in resources of various languages, and in text corpora of various languages, regardless of the language of the query.11-03-2011
20110270604SYSTEMS AND METHODS FOR SEMI-SUPERVISED RELATIONSHIP EXTRACTION - Systems and methods are disclosed to perform relation extraction in text by applying a convolution strategy to determine a kernel between sentences; applying one or more semi-supervised strategies to the kernel to encode syntactic and semantic information to recover a relational pattern of interest; and applying a classifier to the kernel to identify the relational pattern of interest in the text in response to a query.11-03-2011
20130138429Method and Apparatus for Information Searching - Techniques for performing searches using synonym pairs generated from data mining are described herein. These techniques may include receiving, by a server, a query including a keyword. The server may generate multiple synonym pairs associated with the keyword by mining multiple item descriptions under a certain context, and then calculate a comprehensive relevance for each individual synonym pair. If the comprehensive relevance is greater than a predetermined value, the server may perform searches based on the individual synonym pair.05-30-2013
20130138424Context-Aware Interaction System Using a Semantic Model - The subject disclosure is directed towards detecting symbolic activity within a given environment using a context-dependent grammar. In response to receiving sets of input data corresponding to one or more input modalities, a context-aware interactive system processes a model associated with interpreting the symbolic activity using context data for the given environment. Based on the model, related sets of input data are determined. The context-aware interactive system uses the input data to interpret user intent with respect to the input and thereby, identify one or more commands for a target output mechanism.05-30-2013
20130138423Contextual search for modeling notations - A method, an apparatus, and a computer program product for contextual-based search of modeling notations to be used in a model. The method comprises obtaining a contextual property of a notation to be used in a diagram, wherein the contextual property defines a context of a usage of the notation in the diagram; and searching in a notation-base for notations, whereby a search result set is obtained, wherein the search result set comprises notations that were previously used in a similar context to the contextual property, wherein the notation-base is stored in a data storage.05-30-2013
20130090917FILTERING PROHIBITED LANGUAGE FORMED INADVERTENTLY VIA A USER-INTERFACE - Some embodiments of the inventive subject matter are directed to detecting that a text string is subject to an algorithmic function that would modify one or more parts of the text string to be at least one proposed text substring for presentation via a user interface, wherein the at least one proposed text substring is a portion of the text string. Some embodiments are further directed to evaluating the at least one proposed text substring against one or more prohibited text strings prohibited for presentation via the user interface and detecting, in response to the evaluating of the at least one proposed text substring against the one or more prohibited text strings, that the at least one proposed text substring is one of the one or more prohibited text strings. Some embodiments are further directed to modifying the at least one proposed text substring, in response to detecting that the at least one proposed text substring is one of the one or more prohibited text strings.04-11-2013
20130090918SYSTEM, METHOD AND APPARATUS FOR DETECTING RELATED TOPICS AND COMPETITION TOPICS BASED ON TOPIC TEMPLATES AND ASSOCIATION WORDS - A system for detecting related topics and competition topics for a target topic includes an information extracting apparatus configured to create topic templates and association words from documents created online to generate topic templates and association words. The system also includes a related topic detecting apparatus configured to detect and trace related topics and competition topics for the target topic based on the topic templates and the association words.04-11-2013
20130090919ELECTRONIC DEVICE AND DICTIONARY DATA DISPLAY METHOD - An electronic device includes a display module and a dictionary storage module which stores dictionary data that causes a plurality of entry words including compound words obtained by connecting a plurality of words to correspond to explanatory information on the entry words. When the user retrieves a dictionary, entry words for compound words are retrieved from the entry words in the dictionary storage module and words common to the retrieved compound words are listed and displayed on the display module. Entry words for compound words connecting with a word specified by a user operation in the displayed list are read from the dictionary data and displayed in list form on the display module.04-11-2013
20130090916System and Method for Detecting and Correcting Mismatched Chinese Character - A system and method for detecting and correcting mismatched Chinese characters in a phrase. The system comprises a database for the look-up of characters and Chinese phrases, a module to compare the input phrases with the look-up data retrieved from the database and a module to correct the mismatched characters. The database contains correct phrases as well as attributes associated with each character, such as pronunciation and radical composition. The modules input a Chinese phrase that has at least two characters and compare it with the data retrieved from the database to determine if there are incorrect characters. The spell checking method includes two groups of steps: mismatched character detection and mismatched character correction. Whether there is any mismatched character to be corrected is determined by the edit distance, the phrase length and comparisons of the pronunciation and radical composition of the mismatched characters.04-11-2013
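The edit-distance and phrase-length checks mentioned in entry 20130090916 above can be sketched in Python as follows. This is only a minimal illustration: the function names, the two-phrase list standing in for the database, and the `max_distance` threshold are assumptions, and the pronunciation and radical comparisons described in the abstract are not modelled.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def suggest_corrections(phrase, known_phrases, max_distance=1):
    """Return known phrases of equal length that differ by 1..max_distance edits."""
    return [known for known in known_phrases
            if len(known) == len(phrase)
            and 0 < edit_distance(phrase, known) <= max_distance]

# Hypothetical stand-in for the database of correct phrases.
known_phrases = ["一鼓作气", "画蛇添足"]
suggestions = suggest_corrections("一股作气", known_phrases)
```

Here the input phrase has the same length as a stored phrase and differs from it by one character, so that stored phrase is returned as a candidate correction. A phrase that already matches a stored entry yields no suggestions.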
20130090920SYSTEMS AND METHODS FOR ACCESSING WEB PAGES USING NATURAL LANGUAGE - Systems and methods for building an interface that receives and responds to varied natural language expressions. In an embodiment, the system receives a natural language expression in text or audio, and translates it by building at least one data structure which reflects the concepts expressed in the natural language expression. The data structure may comprise a symbol representing each concept. In an embodiment, a parser utilizes the data structure to parse language expressions to single concept symbols that represent the meaning of the expressions. Response actions may also be performed in response to the parsed language expressions. In addition, a parser may receive a single concept symbol, and generate one or many natural language expressions of the meaning of the concept symbol. Furthermore, the system may be configured to understand the local meaning of words and phrases.04-11-2013
20130096909SYSTEM AND METHOD FOR SUGGESTION MINING - A system and method for extraction of suggestions for improvement from a corpus of documents, such as customer reviews, are disclosed. A structured terminology provided for a topic includes a set of semantic classes, each including a set of terms. A thesaurus of terms relating to suggestions of improvement is provided. Text elements of text strings in the documents which are instances of terms in the structured terminology are labeled with the corresponding semantic class and text elements which are instances of terms in the thesaurus are also labeled. A set of patterns is applied to the labeled text strings to identify suggestions of improvement expressions. The patterns define syntactic relations between text elements, some of which are required to be instances of one of the terms in a particular semantic class or thesaurus. A set of suggestions for improvements is output based on the identified suggestions of improvement expressions.04-18-2013
20130096911NORMALISATION OF NOISY TYPEWRITTEN TEXTS - Described herein is a method and system for normalising a SMS sequence in which the sequence is pre-processed to identify noisy segments in the sequence, normalising those noisy segments and normalising the rest of the SMS sequence in accordance with predefined rules. A morphosyntactic analysis is carried out on the normalised text before an output is provided either as a typewritten text or as a synthetic speech signal.04-18-2013
20130096910METHOD AND SYSTEM FOR ADAPTING TEXT CONTENT TO THE LANGUAGE BEHAVIOR OF AN ONLINE COMMUNITY - A method for adapting a piece of text content to the language behavior of an online community, comprising the following steps: 04-18-2013
20130103388DOCUMENT ANALYZING APPARATUS - A document analyzing apparatus includes a document analyzer and a comparator. The document analyzer is used for deconstructing a text file of a document stored in a data storage device to obtain a plurality of model sentences, and then storing the model sentences in the data storage device. The document analyzer further applies a position index to each of the model sentences, wherein the position index points to the storing position of the document having the model sentence in the data storage device. The comparator is used for comparing a processing sentence with each of the model sentences for similarity. The document analyzing apparatus in the present invention is capable of deconstructing text files into small units such as sentences so as to facilitate the user to search or classify the documents.04-25-2013
20130103389Selecting Terms in a Document - Determining a mapping between a textual representation in a document and a concept is disclosed. A document is received. A set of candidate textual representations in the document is identified. For at least one candidate textual representation included in the set, an associated concept included in a taxonomy of concepts is determined. The candidate textual representation and the associated concept are provided as output.04-25-2013
20130103386PERFORMING SENTIMENT ANALYSIS - There is provided a computer-implemented method of performing sentiment analysis. An exemplary method comprises identifying one or more sentences in a microblog. The microblog comprises an entity. The method further includes identifying one or more opinion words in the sentences based on an opinion lexicon. Additionally, the method includes determining, for each of the sentences, an opinion value for the entity. The opinion value is determined based on an opinion value for each of the opinion words in an opinion lexicon.04-25-2013
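The lexicon-based scoring outlined in entry 20130103386 above might look roughly like the Python sketch below. It splits a post into sentences, keeps those that mention the entity, and sums the lexicon values of the opinion words in each. The lexicon contents, the function name `sentence_opinions`, the simple regular-expression sentence splitter, and the summing rule are all assumptions for illustration. The abstract does not say how the word values are combined.

```python
import re

# Hypothetical opinion lexicon: word -> opinion value.
OPINION_LEXICON = {"great": 1, "love": 1, "slow": -1, "terrible": -1}

def sentence_opinions(microblog, entity, lexicon=OPINION_LEXICON):
    """Return (sentence, score) pairs for sentences that mention the entity."""
    results = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", microblog.strip()):
        words = re.findall(r"[a-z']+", sentence.lower())
        if entity.lower() not in words:
            continue  # sentence does not mention the entity
        score = sum(lexicon.get(word, 0) for word in words)
        results.append((sentence, score))
    return results

post = "I love the new phone! The phone battery is terrible and slow. Lunch was great."
results = sentence_opinions(post, "phone")
```

With this toy input, the first sentence gets a positive value for "phone", the second a negative one, and the third is skipped because it does not mention the entity.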
20130103385PERFORMING SENTIMENT ANALYSIS - There is provided a computer-implemented method of performing sentiment analysis. An exemplary method comprises performing a first sentiment analysis on microblogging data based on a method using an opinion lexicon. The method also includes training a classifier using training data from the first sentiment analysis. Additionally, the method includes identifying a new opinion term in the microblogging data by performing a statistical test. The new opinion terms are not in the opinion lexicon. The method also includes identifying new microblogging data based on the new opinion term. Further, the method includes performing a second sentiment analysis on the new microblogging data using the classifier.04-25-2013
20130103391NATURAL LANGUAGE PROCESSING FOR SOFTWARE COMMANDS - A system and method for facilitating user access to software functionality. An example method includes receiving natural language input; determining an identity of a user providing the input; employing the identity to facilitate selecting a software command to associate with the received natural language input; and employing software to act on the command. In a more specific embodiment, the method further includes determining an initial set of available software commands, and narrowing the initial set of available software commands based on the identity of a user and enterprise data associated with the identity of the user, resulting in a narrowed set of software commands in response thereto. Example enterprise data includes enterprise organizational chart information (e.g., corporate hierarchy information) and user access privilege information maintained by an ERP system.04-25-2013
20130103390METHOD AND APPARATUS FOR PARAPHRASE ACQUISITION - A computer based natural language processing method for identifying paraphrases in corpora using statistical analysis comprises deriving a set of starting paraphrases (SPs) from a parallel corpus, each SP having at least two phrases that are phrase aligned; generating a set of paraphrase patterns (PPs) by identifying shared terms within two aligned phrases of an SP, and defining a PP having slots in place of the shared terms, in right hand side (RHS) and left hand side (LHS) expressions; and collecting output paraphrases (OPs) by identifying instances of the PPs in a non-parallel corpus. By using the reliably derived paraphrase information from a small parallel corpus to generate the PPs, and extending the range of instances of the PPs over the large non-parallel corpus, better coverage of the paraphrases in the language and fewer errors are encountered.04-25-2013
20130103387COMPUTER PROCESSES FOR ANALYZING AND IMPROVING DOCUMENT READABILITY - Computer-based processes are disclosed for analyzing and improving document readability. Document readability is improved by using rules and associated logic to automatically detect various types of writing problems and to make and/or suggest edits for eliminating such problems. Many of the rules seek to generate more concise formulations of the analyzed sentences, such as by eliminating unnecessary words, rearranging words and phrases, and making various other types of edits.04-25-2013
20130124195Phrase-Based Dialogue Modeling With Particular Application to Creating a Recognition Grammar - The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). The invention enables phrase-based modeling of generic structures of verbal interaction to be used for the purpose of automating part of the design of such grammar networks. Most particularly, the invention enables such grammar networks to be used in providing a voice-controlled user interface to human readable text data that is also machine-readable (such as a Web page, a word processing document, a PDF document, or a spreadsheet).05-16-2013
20130124190SYSTEM AND METHODOLOGY THAT FACILITATES PROCESSING A LINGUISTIC INPUT - Aspects for teaching processing linguistic expressions are disclosed, which include apparatuses, methods, and computer-readable storage media to facilitate such processing. In a particular aspect, modifying a linguistic expression includes receiving an input that includes the linguistic expression and a selection of a target vernacular, and retrieving a phonetic scheme corresponding to the target vernacular, which includes a set of accentuation rules associated with the target vernacular. An audible equivalent of the linguistic expression is then generated in the target vernacular according to the phonetic scheme. In another aspect, phonetic schemes are generated by aggregating linguistic information corresponding to a plurality of vernaculars, and analyzing the linguistic information to ascertain a plurality of accentuation rules. A phonetic scheme is then generated for each of the plurality of vernaculars, which includes a set of accentuation rules associated with the corresponding vernacular.05-16-2013
20130124194SYSTEMS AND METHODS FOR MANIPULATING DATA USING NATURAL LANGUAGE COMMANDS - Systems and methods for manipulating data using natural language commands in accordance with embodiments of the invention are disclosed. In one embodiment, a natural language enterprise system includes a database configured to store a natural language index, where the natural language index maps keywords to actions to data, a natural language application server configured to communicate with the database, wherein the natural language application server is configured to receive a command statement, parse the received command statement to identify at least one keyword in the command statement, query the database using at least one keyword to identify at least one action to data using the natural language index, locate at least one piece of enterprise data to which at least one action to data may be performed, and initiate at least one action to data that is applied to at least one of the located pieces of enterprise data.05-16-2013
20130124189NETWORK-BASED BACKGROUND EXPERT - A system and methodology that provides a network-based, e.g., cloud-based, background expert for predicting and/or accomplishing a user's goals is disclosed herein. Moreover, the system monitors, in the background, user generated data and/or publicly available data to determine and/or infer a user's goal, with or without an active indication/request from the user. Typically, the user-generated data can include user conversations, such as, but not limited to, speech data in a voice call, text messages, chat dialogues, etc. Further, the system identifies an action or task that facilitates accomplishment of the user goal in real-time. Moreover, the system can automatically perform the action/task and/or request user authorization prior to performing the action/task.05-16-2013
20110313757SYSTEMS AND METHODS FOR ADVANCED GRAMMAR CHECKING - In embodiments of the present invention improved capabilities are described for a method of grammar checking, comprising providing a first level of grammar checking through a computer-based grammar checking facility to grammar check a body of text provided by a source in order to improve the grammatical correctness of the text; providing an excerpt of the body of text containing an identified grammatical error as a result of the first level of automated grammar checking to a second level of human augmented grammar checking consisting of at least one human proofreader for review; incorporating the results of the human proofreader review to contribute to an at least one corrected version of the provided body of text; and sending the at least one corrected version back to the source. The method of grammar checking may provide for automatic grammar correction and text enrichment, such as when text is entered via a computer device with input limitations.12-22-2011
20110313756Text sizer (TM) - This invention called Text Sizer ™ is an innovative method and system for changing the length of a body of text. It may be embodied in the following steps. First, a first text segment may be selected in a body of text. Second, alternative text segments are automatically identified, wherein each alternative text segment may be substituted for the first text segment in the body of text without causing a grammatical error. Third, a second text segment with a length that is different than the length of the first text segment is selected from among the alternative text segments. Finally, the second text segment is substituted for the first text segment in the body of text. This method has many applications. One might wish to reduce the length of a body of text so that it fits within a constrained space. For example, a report or proposal may have page limits. Alternatively, one might wish to expand the length of a selected portion of a body of text. For example, one might wish to elaborate or include additional information on topics covered in a particular segment of text. Text Sizer ™ provides users with this capability.12-22-2011
20130124193System and Method Implementing a Text Analysis Service - One embodiment includes a computer implemented method of processing documents. The method includes generating a text analysis task object that includes instructions regarding a document processing pipeline and a document identifier. The method further includes accessing, by a worker system, the text analysis task object and generating the document processing pipeline according to the instructions. The method further includes performing text analysis using the document processing pipeline on a document identified by the document identifier.05-16-2013
20130124192ALERT NOTIFICATIONS IN AN ONLINE MONITORING SYSTEM - An online monitoring system assists parents or other individuals in monitoring social networking activity and/or mobile phone usage of their children or others. The online monitoring system may gather data corresponding with monitored social networking and/or mobile phone accounts. The data may be analyzed to provide summarized information and alert notifications to parents or other individuals. The analyses provided by the online monitoring service may include several text-based analyses: keyword analysis, sentiment analysis, and structure analysis. The keyword analysis may include analyzing text to determine whether it includes any blacklisted or whitelisted words. The sentiment analysis may include determining an overall sentiment of text based on the sentiment of words within the text. The structure analysis may include analyzing the sentence structure of the text to identify grammatical parts. An overall structure score is determined based on the sentiment of the grammatical parts.05-16-2013
20130132072ENGINE FOR HUMAN LANGUAGE COMPREHENSION OF INTENT AND COMMAND EXECUTION - The invention provides a computer system for interacting with a user. A set of concepts initially forms a target set of concepts. An input module receives a language input from the user. An analysis system executes a plurality of narrowing cycles until a concept packet having at least one concept has been identified. Each narrowing cycle includes identifying at least one portion of the language and determining a subset of concepts from the target set of concepts to form a new target subset. An action item identifier identifies an action item from the action items based on the concept packet. An action executer executes an action based on the action item that has been identified.05-23-2013
20130132071Method and Apparatus for Automatically Analyzing Natural Language to Extract Useful Information - An automatic language-processing system uses a human-curated lexicon to associate words and word groups with broad sentiments such as fear or anger, and topics such as accounting fraud or earnings projections. Grammar processing further characterizes the sentiments or topics with logical (“is” or “is not”), conditional (probability), temporal (past, present, future), quantitative (larger/smaller, higher/lower, etc.), and speaker identification (“I” or “He” or “Alan Greenspan”) measures. Information about the characterized sentiments and topics found in electronic messages is stored in a database for further analysis, display, and use in automatic trading systems.05-23-2013
20130132070Computer-Based Construction of Arbitrarily Complex Formal Grammar Expressions - A method, system and computer program product for building an expression, including utilizing any formal grammar of a context-free language, displaying an expression on a computer display via a graphical user interface, replacing at least one non-terminal display object within the displayed expression with any of at least one non-terminal display object and at least one terminal display object, and repeating the replacing step a plurality of times for a plurality of non-terminal display objects until no non-terminal display objects remain in the displayed expression, wherein the non-terminal display objects correspond to non-terminal elements within the grammar, and wherein the terminal display objects correspond to terminal elements within the grammar.05-23-2013
20130144603ENHANCED VOICE CONFERENCING WITH HISTORY - Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to enhance voice conferencing among multiple speakers. Some embodiments of the AEFS enhance voice conferencing by recording and presenting voice conference history information based on speaker-related information. The AEFS receives data that represents utterances of multiple speakers who are engaging in a voice conference with one another. The AEFS then determines speaker-related information, such as by identifying a current speaker, locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS records conference history information (e.g., a transcript) based on the determined speaker-related information. The AEFS then informs a user of the conference history information, such as by presenting a transcript of the voice conference and/or related information items on a display of a conferencing device associated with the user.06-06-2013
20130144604SYSTEMS AND METHODS FOR EXTRACTING ATTRIBUTES FROM TEXT CONTENT - Systems and methods for extracting attributes from text content are described. Example embodiments may include a computer implemented method for extracting attributes from text data, wherein the text data is obtained from at least one information source. As described, the implementation may include receiving, from a user, an address for the at least one information source and an attribute name, creating a tagged information file by associating a part of speech tag to text data obtained from the at least one information source, identifying a location of the attribute name in the tagged information file using an approximate text matching technique and determining at least one attribute descriptor from the tagged information file wherein the tagged information file is parsed based on a part of speech tag associated with the attribute name to determine a conclusion of the attribute descriptor.06-06-2013
20100280820INTERACTIVE VOICE RESPONSE SYSTEM - Methods and systems for testing and analyzing integrated voice response systems are provided. Computer devices are used to simulate caller responses or inputs to components of the integrated voice response systems. The computer devices receive responses from the components. The responses may be in the form of VXML and grammar files that are used to implement call flow logic. The responses may be analyzed to evaluate the performance of the components and/or call flow logic.11-04-2010
20080208568SYSTEM AND METHOD FOR PROVIDING CONTEXT TO AN INPUT METHOD BY TAGGING EXISTING APPLICATIONS - An improved system and method for providing context information of executable code to an input method is provided. Advanced text input methods may be made aware of the type of text expected to be received as input so that input methods may achieve a higher accuracy in recognition of text input. Generic interfaces provide a framework for supporting application authoring platforms to allow application developers to easily specify context information to the system and have it reliably forwarded to the correct input methods. Additionally, a context tagging tool may associate specific text input fields of an existing application with an input scope without modifying the application itself. The context tagging tool may create a manifest that contains tags associating the specific text input fields with an input scope. Any advanced input methods used by the application may be updated with instructions for accessing the context information stored in the manifest.08-28-2008
20080208567Web-based proofing and usage guidance - A system is disclosed for checking grammar and usage using a flexible portfolio of different mechanisms, and automatically providing a variety of different examples of standard usage, selected from analogous Web content. The system can be used for checking the grammar and usage in any application that involves natural language text, such as word processing, email, and presentation applications. The grammar and usage can be evaluated using several complementary evaluation modules, which may include one based on a trained classifier, one based on regular expressions, and one based on comparative searches of the Web or a local corpus. The evaluation modules can provide a set of suggested alternative segments with corrected grammar and usage. A followup, screened Web search based on the alternative segments, in context, may provide several different in-context examples of proper grammar and usage that the user can consider and select from.08-28-2008
20080208566Automated word-form transformation and part of speech tag assignment - A method of creating a data structure for use with a morphological algorithm is discussed. The method includes creating a data structure having a plurality of paths. The data structure maps a plurality of words into a set of classes. The method further includes modifying the data structure to remove a portion of one or more of the paths that is not necessary to unambiguously map the words to the set of classes and storing the data structure on a tangible computer readable medium.08-28-2008
20080201132SYSTEM AND METHOD FOR FINDING THE MOST LIKELY ANSWER TO A NATURAL LANGUAGE QUESTION - Automated question answering is disclosed that relates to the selection of an answer to a question from a pool of potential answers which are manually or automatically extracted from a large collection of textual documents. A feature extraction component, a feature combination component, an answer selection component, and an answer presentation component, among others, are included. The input to the system is a set of one or more natural language questions and a collection of textual documents. The output is a (possibly ranked) set of factual answers to the questions, these answers being extracted from the document collection.08-21-2008
20080201131METHOD AND APPARATUS FOR AUTOMATICALLY DISCOVERING FEATURES IN FREE FORM HETEROGENEOUS DATA - Techniques are provided for automatically discovering one or more features in free form heterogeneous data. In one aspect of the invention, the techniques include obtaining free form heterogeneous data, wherein the data comprises one or more data items, applying a label to each data item, using the labeled data to build a language model, wherein a word distribution associated with each label can be derived from the model, and using the word distribution associated with each label to discover one or more features in the data, wherein discovering one or more features in the data facilitates one or more operations that use at least a portion of the labeled data.08-21-2008
20080201130Text Segmentation and Label Assignment with User Interaction by Means of Topic Specific Language Models and Topic-Specific Label Statistics - The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user being capable to select alternative segmentations and alternative labels as well as to enter a user defined segmentation and user defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated incorporating the re-segmentation and re-labelling of successive parts of the document or the entire document. Furthermore the method comprises a learning functionality, logging and analyzing user introduced modifications for adaptation of user's preferences and for further training of the statistical models.08-21-2008
20110213610Processor Implemented Systems and Methods for Measuring Syntactic Complexity on Spontaneous Non-Native Speech Data by Using Structural Event Detection - Systems and methods are provided for providing a score for a spontaneous non-native speech response to a prompt. A transcription of the spontaneous speech response is accessed. A plurality of clauses are identified within the spontaneous speech response, where identifying a clause includes identifying a beginning boundary and an end boundary of the clause in the spontaneous speech response. A plurality of disfluencies in the spontaneous speech response is identified. One or more proficiency metrics are calculated based on the plurality of identified clauses and the plurality of the identified disfluencies, and a score for the spontaneous speech response is generated based on the one or more proficiency metrics.09-01-2011
20100286979AUTOMATIC CONTEXT SENSITIVE LANGUAGE CORRECTION AND ENHANCEMENT USING AN INTERNET CORPUS - A computer-assisted language correction system including spelling correction functionality, misused word correction functionality, grammar correction functionality and vocabulary enhancement functionality utilizing contextual feature-sequence functionality employing an internet corpus.11-11-2010
20080300862AUTHORING SYSTEM - A method for supervising text includes receiving input text in a natural language, the input text including at least one source sentence. The input text is analyzed, which includes, for a source sentence in the input text, generating a syntactic representation. A target sentence is generated in the same natural language, based on the syntactic representation. The source sentence is compared with the target sentence to determine whether there is a match. A decision is output, based on the comparison.12-04-2008
20110238408Semantic Clustering - Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, as for example to prioritize topics whose handling by the conversation agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on.09-29-2011
20130151239ORTHOGRAPHICAL VARIANT DETECTION APPARATUS AND ORTHOGRAPHICAL VARIANT DETECTION PROGRAM - Provided is an orthographical variant detection apparatus which detects orthographical variant candidates with a high precision. The orthographical variant detection apparatus includes a term extraction unit that extracts terms from document data, a similarity computation unit that computes similarity of an arbitrary pair of the extracted terms, an orthographical variant candidate determination unit that determines, based on the similarity, whether or not the terms in the pair of terms are orthographical variant candidates, and a group classification unit that groups the orthographical variant candidates based on a character string commonly included in the pair of terms as the orthographical variant candidates.06-13-2013
20130151240INTERACTIVE FACT CHECKING SYSTEM - A fact checking system is able to verify the correctness of information and/or characterize information by comparing the information with one or more sources. The fact checking system automatically monitors, processes, fact checks information and indicates a status of the information. The fact checking system is able to be interactive with a user, so that a user is able to respond to a fact check result and receive additional information.06-13-2013
20130151235LINGUISTIC KEY NORMALIZATION - Systems, methods, and apparatuses including computer program products are provided for training machine learning systems. In some implementations, a method is provided. The method includes receiving a collection of phrases, normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules, and generating a normalized phrase table including a plurality of key-value pairs, each key value pair includes a key corresponding to a normalized phrase and a value corresponding to one or more un-normalized phrases associated with the normalized key, each un-normalized phrase having one or more parameters.06-13-2013
20130151236COMPUTER IMPLEMENTED SEMANTIC SEARCH METHODOLOGY, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR DETERMINING INFORMATION DENSITY IN TEXT - A method, computer program product and system are disclosed for determining the semantic density of textualized digital media (a measure of how much information is conveyed in a sentence or clause relative to its length). The more semantically dense text is, the more information it conveys in a given space. Users input a topic, a timeline, and one or more target web media sources for analysis. Text in the target media sources is deconstructed to determine density, and a density rating assigned to the web media source. Over time, users can track trends in the density of text media relative to a given topic, and determine how much information is being conveyed in connection with the topic, such as a political campaign. Line graphs, pie charts, and other time-elapsed output graphic representations of the semantic density are generated and rendered for the user.06-13-2013
20130151237DYNAMIC METHOD FOR EMOTICON TRANSLATION - A vehicle communication system is provided and may include at least one communication device that audibly communicates information within the vehicle. A controller may receive a character string from an external device and may determine if the character string represents an emoticon. The controller may translate the character string into a face description if the character string represents an emoticon and may audibly communicate the face description via the at least one communication device.06-13-2013
20130151238Generation of Natural Language Processing Model for an Information Domain - Embodiments relate to a method, apparatus and program product for generating a natural language processing model for an information domain. The method derives a skeleton of a natural language lexicon from a source model and uses it to form a dictionary. It also applies a set of syntactical rules defining concepts and relationships to the dictionary and expands the skeleton of the natural language lexicon based on a plurality of reference documents from the information domain. Using the expanded skeleton of the natural language lexicon, it also provides a natural language processing model for the information domain.06-13-2013
20080270119Generating sentence variations for automatic summarization - A new system is hereby provided that generates automatic summaries of groups of multiple documents using multiple variations of each sentence from a selected group of representative sentences from the documents, and then selecting from the multiple variations when assembling the automatic summary. The system may generate alternative strings of text, select from among the alternative strings of text, and provide a summary of the group of documents using the strings of text selected from among the alternatives. The alternative strings of text may be generated based on each of a plurality of sentences from the group of documents. Selecting from among the alternative strings of text may be based on one or more criteria indicating the strings of text to be representative of the content of the group of documents.10-30-2008
20120259620MESSAGE OPTIMIZATION - The present invention provides a system and method for optimizing a message. Components of a starting message are identified, and at least one rule is applied for modifying at least one message component to create at least one variation of the starting message. Message variants are tested by sending each variant to a sample of people and measuring a response rate for each sent message variant. The measured response rates are used to create an optimal version of the message. In one embodiment, message variants may be created and tested in multiple rounds.10-11-2012
20120259617SYSTEM AND METHOD FOR SLANG SENTIMENT CLASSIFICATION FOR OPINION MINING - The present disclosure describes a method of sentiment oriented slang for opinion mining. With increasing use of the internet, many users can submit their review comments directly to companies; these comments can be automatically processed and summarized with critical issues from time to time, helping a company get real time feedback from its customers. The method comprises receiving at least one document comprising a plurality of sentiment oriented slang. The next step of the method comprises identifying the plurality of sentiment oriented slang in the at least one document. Further, a polarity score of each slang word identified is determined and sentiment information is displayed on an output device as an output.10-11-2012
20090216524METHOD AND SYSTEM FOR ESTIMATING A SENTIMENT FOR AN ENTITY - A method for estimating a sentiment conveyed by the content of information sources towards an entity is presented. The sentiment is obtained with respect to a query context that may be specified, e.g. by specific terms or expressions, like a product or service name. A sentiment dictionary having a plurality of sentiment terms is provided, wherein each sentiment term has assigned a sentiment value, and at least one of said sentiment terms is associated to a group context. Text documents are screened for occurrences of sentiment terms that are associated to a group context corresponding to the query context. Calculating a sentiment score value is performed as a function of the occurrences of sentiment terms having a similar or same group context as the query context. The method may be carried out automatically without manual analysis of the actual semantic content of the text documents under consideration.08-27-2009
20100312547CONTEXTUAL VOICE COMMANDS - Among other things, techniques and systems are disclosed for implementing contextual voice commands. On a device, a data item in a first context is displayed. On the device, a physical input selecting the displayed data item in the first context is received. On the device, a voice input that relates the selected data item to an operation in a second context is received. The operation is performed on the selected data item in the second context.12-09-2010
20130124191MICROBLOG SUMMARIZATION - Various embodiments provide summarization techniques that can be applied to blogs or microblogs to present information that is determined to be useful, in a shortened form. In one or more embodiments, a procedure is utilized to automatically acquire a set of concepts from various sources, such as free text. These acquired concepts are then used to guide a clustering process. Clusters are ranked and then summarized by incorporating sentiment and the frequency of words.05-16-2013
20130204608IMAGE ANNOTATIONS ON WEB PAGES - An image in a web page may be annotated after deriving information about an image when the image may be displayed on multiple web pages. The web pages that show the image may be analyzed in light of each other to determine metadata about the image, then various additional content may be added to the image. The additional content may be hyperlinks to other webpages. The additional content may be displayed as annotations on top of the images and in other manners. Many embodiments may perform searching, analysis, and classification of images prior to the web page being served.08-08-2013
20130204609LANGUAGE INDEPENDENT PROBABILISTIC CONTENT MATCHING - Content is received and compared against rules for identifying a type of content. Each rule has both segmented and unsegmented patterns. The content is matched against the patterns and assigned a confidence score that is higher if the content matches a segmented pattern and lower if the content matches an unsegmented pattern.08-08-2013
20130204610Quasi Natural Language Man-Machine Conversation Device Base on Semantic Logic - The presented is a tool and method for language presentation, browsing, editing, translation and communication based on Semantic Web, to be utilized as interface for collaborating software products and services or human-machine interaction. The conceptual system is extended to further include such objects as language components, sentence patterns or syntax rules, to get solutions for semantic logic representation devices, language presentation devices, semantic-language converting devices, the registry and delegation system, in forming a language-component-based system for browsing, editing, conversion and communication. It is always allowed to bring need-based control over the conceptual system and the registry with their scope and scale being kept at appropriate level; with a widespread community participation, the establishment of semantic-language converting device ecosystem will be important guarantee of a flexible and diversified language expression system; therefore to constitute the core of those pragmatic standards or specifications for machine translation, human-machine interface and the web system.08-08-2013
20130204611TEXTUAL ENTAILMENT RECOGNITION APPARATUS, TEXTUAL ENTAILMENT RECOGNITION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - A textual entailment recognition apparatus (08-08-2013
20130204612INTERACTIVE ENVIRONMENT FOR PERFORMING ARTS SCRIPTS - One or more embodiments present blocking information associated with a manuscript to a user. In one embodiment, a determination is made that at least one line from a digital representation of a manuscript has been selected. Another determination is made that the line is associated with a set of blocking information. The set of blocking information is presented on a digital representation of a stage.08-08-2013
20130185055System and Method for Performing Analysis on Information, Such as Social Media - A system for analyzing text-based information is presented. Each datum of information includes an author, a description and a timestamp. A fetcher fetches the raw information according to keywords. A parser parses the raw information to refine the results. A lexicon management module extracts lemmas from the raw information, and creates an edited lexicon containing the raw data and the lemmas for each datum. A data manager correlates lemmas in the edited lexicon and identifies clusters of lemmas that are correlated between each other. The results can be visually displayed to a user, and clusters of lemma that are less correlated than the other clusters can be visually identified. In one aspect, the user is able to excise the less correlated clusters, in order to further refine the results of the keyword search.07-18-2013
20130185057Computer-Implemented Systems and Methods for Scoring of Spoken Responses Based on Part of Speech Patterns - Systems and methods are provided for scoring a speech sample. Automatic speech recognition is performed on the speech sample using an automatic speech recognition system to generate a transcription of the sample. Words in the transcription are associated with parts of speech, and part of speech sequences are extracted from the parts of speech associations. A grammar metric is generated based on the part of speech sequences, and the speech sample is scored based on the grammar metric.07-18-2013
20130185058FORMAT FOR DISPLAYING TEXT ANALYTICS RESULTS - A system can receive text. The text can be divided into various portions. One or more significance indicators can be associated with each portion of text: these significance indicators can also be received by the system. The system can then display a portion of text and the associated significance indicators to the user.07-18-2013
20100318348APPLYING A STRUCTURED LANGUAGE MODEL TO INFORMATION EXTRACTION - One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model.12-16-2010
20100318347Content-Based Audio Playback Emphasis - Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.12-16-2010
20120284019HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING IMPROVED DISAMBIGUATION WITH REDUCED DEGRADATION OF DEVICE PERFORMANCE - In view of the foregoing, an improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. Incoming data, such as the text of a message, can be scanned for proper nouns, for instance, since such proper nouns might not already be stored in memory and might be expected to be entered by the user when, for example, forwarding or responding to the message. A proper noun can be identified, for instance, on the basis that it begins with an upper case letter. The proper nouns can be stored, for example, in memory that may, by way of further example, be a temporary dictionary.11-08-2012
20120284018HANDHELD ELECTRONIC DEVICE AND METHOD EMPLOYING LOGICAL PROXIMITY OF CHARACTERS IN SPELL CHECKING - An improved handheld electronic device and associated method employing an improved spell checking routine enable proposed spelling corrections having a close logical proximity to an active input to be output at a position of preference for easy selection by the user. By way of example, a base character and the various accented forms thereof can be said to have a logical proximity to one another that is closer than their logical proximity to any character having a different base character, whether additionally having a diacritical element or not.11-08-2012
20120284017Systems, Methods, and Programs for Detecting Unauthorized Use of Text Based Communications - Systems, methods, and programs for generating an authorized profile for a text communication device or account may sample a text communication generated by the text communication device or account during communication and may store the text sample. The systems, methods, and programs may extract a language pattern from the stored text sample and may create an authorized profile based on the language pattern. Systems, methods, and programs for detecting unauthorized use of a text communication device or account may sample a text communication generated by the device or account during communication, may extract a language pattern from the text sample, and may compare the extracted language pattern of the sample with an authorized user profile.11-08-2012
20130158981LINKING NEWSWORTHY EVENTS TO PUBLISHED CONTENT - Methods, systems, and computer programs are presented for linking newsworthy events in a document to published content. One method includes an operation for receiving features by a classifier that is operable to determine a probability of the availability of news for a sentence. When the features are found in the sentence, the probability of the availability of news for the sentence increases, where the sentence includes one or more noun phrases and ends in a full stop. The classifier determines which sentences in a document are candidate sentences for being linked to news articles, and for each candidate sentence, the method includes an operation for finding an associated news article when there is an associated news article exceeding a relevance threshold. Further, the method includes operations for adding links in the document to the found associated news articles, and for displaying the document with the added links.06-20-2013
20130158982Computer-Implemented Systems and Methods for Content Scoring of Spoken Responses - Systems and methods are provided for scoring a non-scripted speech sample. A system includes one or more data processors and one or more computer-readable mediums. The computer-readable mediums are encoded with a non-scripted speech sample data structure, where the non-scripted speech sample data structure includes: a speech sample identifier that identifies a non-scripted speech sample, a content feature extracted from the non-scripted speech sample, and a content-based speech score for the non-scripted speech sample. The computer-readable mediums further include instructions for commanding the one or more data processors to extract the content feature from a set of words automatically recognized in the non-scripted speech sample and to score the non-scripted speech sample by providing the extracted content feature to a scoring model to generate the content-based speech score.06-20-2013
20130158983System and Method for Identifying Phrases in Text - A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens.06-20-2013
20130158985System and Method for Converting Graphical Call Flows Into Finite State Machines - A method, system and module for automatically converting a call flow into a state-based representation are disclosed. The method comprises walking a call flow and converting each page of the call flow into a rule of a higher level representation of the call flow, augmenting the higher level representation with terminal symbols representing state variable assignments and comparisons associated with decision and computation shapes in the call flow and converting the higher level representation into a state-based representation.06-20-2013
20130158986COMMUNICATIONS ANALYSIS SYSTEM AND PROCESS - A communications analysis process, including: accessing communications data representing communications of one or more persons; processing the communications data to determine similarity data representing similarities between concepts expressed by the one or more persons at different times during said communications; and processing the similarity data to determine one or more metrics of said communications.06-20-2013
20110288855MULTI-PHONEME STREAMER AND KNOWLEDGE REPRESENTATION SPEECH RECOGNITION SYSTEM AND METHOD - A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. Applications include improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command/response scenarios.11-24-2011
20110295594SYSTEM, METHOD, AND PROGRAM FOR PROCESSING TEXT USING OBJECT COREFERENCE TECHNOLOGY - System, method and program product for text processing using object coreference technology. In particular, the invention provides a text processing method which includes: acquiring text to be processed; extracting subject words and entity words corresponding to the subject words from the text; grouping the subject words; determining entity words that reference a same concerned object according to the grouped subject words; and generating a processing policy for entity words that reference a same concerned object. The invention also includes a system with means for carrying out the method. The invention generally provides automatic, more comprehensive, accurate and efficient analysis and processing of text data. The invention can be used to mine a large amount of comment data about an entity, and can also be used to suggest a place in an article where an embedded advertisement may be inserted.12-01-2011
20110295591SYSTEM AND METHOD TO ACQUIRE PARAPHRASES - An automatic paraphrase acquisition technique is provided. A common theme of the various embodiments described herein resides in careful design of simple tasks that can elicit the necessary information for the automated process. These tasks are performed quickly and inexpensively. By gathering the results produced, paraphrases can be generated automatically using the method and/or system.12-01-2011
20130191115Methods and Systems for Transcribing or Transliterating to an Iconphonological Orthography - Described are methods of developing an iconographical, phonological, orthography for any spoken language. Such “iconophonological” orthographies can be applied to languages for which no written form exists, or can be used to supplement or replace extant writing systems. The iconicity of the orthographies represents features of the vocal tract, which limits the number of icons to easily learned sets. This simplification, and the phonological correspondence between the icons and spoken language, makes the orthographies easy to learn. The orthographies can use letters that represent the linguistic characteristics of the spoken language. By incorporation of cultural aesthetics, some embodiments bring a sense of ethnic belonging, and thus create an immediate emotional bond with the orthography.07-25-2013
20130191114SYSTEM AND METHOD FOR PROVIDING UNIVERSAL COMMUNICATION - Provided is a system for providing universal communication, by generating a universal communication signal including a frequency component including light, a sound, a language, a dialect, an electromagnetic wave, and a vibration, by recording/storing the generated universal communication signal, and by converting an input signal into a universal communication signal, to enable communication between a human and a communication media or a non-human entity.07-25-2013
20130191113USER OPINION EXTRACTION METHOD USING SOCIAL NETWORK - A user opinion extraction method that includes: searching for social network groups that respectively have made one or more connections to a site having a domain that relates to software to be developed using a search module; analyzing structures of social networks for the retrieved social network groups using an analysis module; selecting a social network group from which user opinions are to be extracted based on a result of the analysis and collecting user opinions from SNS sentences that are mutually transmitted and received between a plurality of nodes within the selected social network group using a collection module; calculating degrees of influences of the collected user opinions on the social network group using a calculation module; and extracting at least one user opinion from among the collected user opinions in order of a higher degree of the influence on the social network group using an extraction module.07-25-2013
20120290291INPUT PROCESSING FOR CHARACTER MATCHING AND PREDICTED WORD MATCHING - A mobile computing device that operates a method that processes handwritten user input for character matching and predictive word matching. A user inputs handwritten input on a touch-sensitive display using, for example, a stylus. The method determines and displays a set of candidate character matches for the handwritten input. The user then selects a character from the candidate character matches. The method determines and displays a set of candidate predicted word matches based on the user selected character match. The user can then select to input a desired candidate predicted word match.11-15-2012
20120290289METHOD AND APPARATUS FOR SUMMARIZING COMMUNICATIONS - A method, apparatus and computer program are provided for summarizing one or more communications. The method, apparatus and computer program process and/or facilitate a processing of one or more communications to generate at least one summary. The method, apparatus and computer program further cause, at least in part, a transformation of the at least one summary based, at least in part, on at least one narrative viewpoint. The method, apparatus and computer program further cause, at least in part, a presentation of the transformation.11-15-2012
20120004903RULE GENERATION - A method for implementing at least one rule for an application is described. The method includes receiving an input rule. Based on the input rule, a program executable code is generated. The generated program executable code can then be associated with the application.01-05-2012
20120029910System and Method for Inputting Text into Electronic Devices - The present invention provides a system comprising a user interface configured to receive text input by a user, a text prediction engine comprising a plurality of language models and configured to receive the input text from the user interface and to generate concurrently text predictions using the plurality of language models, and wherein the text prediction engine is further configured to provide text predictions to the user interface for display and user selection. An analogous method and an interface for use with the system and method are also provided.02-02-2012
20120029909SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT FOR SPEECH PROCESSING - According to one embodiment, a speech processing device includes an utterance error occurrence determination information storage unit that stores utterance error occurrence determination information; a related word information storage unit that stores related word information including words; an utterance error occurrence determining unit that compares each of the divided words with the condition, gives the error pattern to the word corresponding to the condition, and determines that the word which does not correspond to the condition does not cause the utterance error; and a phoneme string generating unit that generates a phoneme string of the utterance error. The one of the error patterns associated with one of the conditions is the speech error, the utterance error occurrence determining unit further gives an incorrectly spoken word from the related word information, and the phoneme string generating unit generates a phoneme string of the incorrectly spoken word.02-02-2012
20120029908INFORMATION PROCESSING DEVICE, RELATED SENTENCE PROVIDING METHOD, AND PROGRAM - There is provided an information processing device including an information providing unit that provides related information related to main information, a related sentence generation unit that generates a sentence indicating a relation between the main information and the related information and a related sentence providing unit that provides the sentence generated by the related sentence generation unit.02-02-2012
20120029907myMedicalpen - A digital pen designed to assist users in spelling words as they write. The invention is an electronic pen with a speaker located near the top of the device. A microphone may be located directly under the speaker in the form of a small screened concave or convex aperture. A switch on the back of the pen allows the user to choose between three settings: Medical Dictionary (D), Off (O), and Prescription Drug List (P). The device works by the user speaking the desired word into the microphone. The word will then appear on the illuminated digital display screen which lights up. The pen asks the user to confirm or deny the displayed word. The user says “yes” or “no” into the microphone. If denied, the pen displays another word until the correct word is located. Once confirmed, the pen will audibly and visibly spell the word one letter at a time as the user writes. The pen may be switched to the prescription drug list mode as needed.02-02-2012
20130197900Method and System for Determining Word Senses by Latent Semantic Distance - The invention relates to methods and systems for semantic disambiguation of a plurality of words. A representative method comprises providing a dataset of words associated by meaning into sets of synonyms; locating said sets at respective vertices of a graph according to semantic similarity and semantic relationship; transforming the graph into a Euclidean vector space comprising vectors indicative of respective locations of said sets; identifying a first group of said sets which include a first of said pair of words; identifying a second group of said sets which include a second of said pair of words; determining a closest pair in said vector space of said sets taken from said first and second groups of sets respectively; and outputting a meaning of said plurality of words based on said closest pair of said sets and at least one of said semantic relationships between said closest pair of said sets.08-01-2013
20130197899SYSTEM AND METHOD FOR CONTEXTUALIZING DEVICE OPERATING PROCEDURES - A system and method for contextualizing operating procedures are provided. A set of procedures is provided, each including text describing user actions which are to be performed on a physical device to implement the procedure. A device model refers to components of the device on which user actions are performable and provides state charts which link an action performable on the respective component with states assumed by it. The text of each procedure is segmented to form a sequence of steps. Each step includes an action to be performed on one of the components of the device that is referred to in the device model. When a request for one of the procedures is received, the corresponding sequence of instruction steps is retrieved. A current one of the instruction steps is contextualized, based on device data received from the device and the state chart of the respective component.08-01-2013
20120296638METHOD AND SYSTEM FOR QUICKLY RECOGNIZING AND RESPONDING TO USER INTENTS AND QUESTIONS FROM NATURAL LANGUAGE INPUT USING INTELLIGENT HIERARCHICAL PROCESSING AND PERSONALIZED ADAPTIVE SEMANTIC INTERFACE - In embodiments of the present invention, capabilities are described for quickly understanding and responding to user intents and questions, wherein the understanding is based on supervised system learning, intelligent layered semantic and syntactic information processing, and a personalized adaptive semantic interface. Supervised system learning creates a reference pattern set for the intent repository and possible question categories. Each layer in the layered processing increases the probability of intent/question recognition. The personalized adaptive voice interface learns from the user's interactions over time by enriching the pattern sets and personal index for successfully resolved user intents and questions. Collectively, these technologies improve the response time for correctly recognizing and responding to the user's intents and questions.11-22-2012
20130204613LARGE-SCALE SENTIMENT ANALYSIS - A method for determining a sentiment associated with an entity includes inputting a plurality of texts associated with the entity, labeling seed words in the plurality of texts as positive or negative, determining a score estimate for the plurality of words based on the labeling, re-enumerating paths of the plurality of words and determining a number of sentiment alternations, determining a final score for the plurality of words using only paths whose number of alternations is within a threshold, converting the final scores to corresponding z-scores for each of the plurality of words, and outputting the sentiment associated with the entity.08-08-2013
20130204614REQUEST ACQUISITION SUPPORT SYSTEM IN SYSTEM DEVELOPMENT, REQUEST ACQUISITION SUPPORT METHOD AND RECORDING MEDIUM - A request pick-up assisting system includes: a question information registering unit registering a question item and attributes of a questionee; a basic connection word candidate extracting unit referring to information that includes words of the question, and extracting words that coexist with the words of the question; an attribute connection word candidate extracting unit referring to information including words of the question, and information that includes words constituting attributes of the questionee, and extracting, for each attribute, words in which the words of the question and the attributes coexist; an attribute specificity calculating unit calculating an attribute specificity based on dissimilarity between groups of words; an effective attribute extracting unit comparing attribute specificity for each attribute, and extracting a suitable attribute; a connection word extracting unit extracting a connection word about the effective attribute; and an association chart creating unit generating an association chart, by referencing the extracted connection word.08-08-2013
20120065960GENERATING PARSER COMBINATION BY COMBINING LANGUAGE PROCESSING PARSERS - A computer implemented method, a computer system, and a program for generating a parser combination. The method includes: generating a parser combination by combining parsers each associated with at least one grammar description, where the step is carried out using (i) at least one grammar description means and (ii) a computer device. The computer system includes: a processor, a memory connected to the processor, and a parser generator for generating a parser combination in the memory by combining parsers each associated with at least one grammar description, and at least one grammar description type means.03-15-2012
20120072205HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT AND THAT EMPLOYS N-GRAM DATA TO LIMIT GENERATION OF LOW-PROBABILITY COMPOUND LANGUAGE SOLUTIONS - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to analyze the combinations of language objects in light of N-gram data stored on the device to avoid proposing low-probability compound language solutions.03-22-2012
20090157389SYSTEM AND METHOD FOR COMPUTERIZED PSYCHOLOGICAL CONTENT ANALYSIS OF COMPUTER AND MEDIA GENERATED COMMUNICATIONS TO PRODUCE COMMUNICATIONS MANAGEMENT SUPPORT, INDICATIONS AND WARNINGS OF DANGEROUS BEHAVIOR, ASSESSMENT OF MEDIA IMAGES, AND PERSONNEL SELECTION SUPPORT - At least one computer-mediated communication produced by or received by an author is collected and parsed to identify categories of information within it. The categories of information are processed with at least one analysis to quantify at least one type of information in each category. A first output communication is generated regarding the at least one computer-mediated communication, describing the psychological state, attitudes or characteristics of the author of the communication. A second output communication is generated when a difference between the quantification of at least one type of information for at least one category and a reference for the at least one category is detected involving a psychological state, attitude or characteristic of the author to which a responsive action should be taken. The content of the second output communication and the at least one category are programmable to define a psychological state, attitude or characteristic in response to which an action should be taken and the action that is to be taken in response.06-18-2009
20130211821User Experience with Customized User Dictionary - In one embodiment, constructing one or more customized dictionaries for a particular user, each of the customized dictionaries comprising a different blending of one or more frequently used words collected from texts submitted by one or more users; and in response to the user inputting text to an electronic device, selecting one of the customized dictionaries and utilizing it to aid the particular user in inputting text.08-15-2013