Entries |
Document | Title - Abstract | Date |
20080201130 | Text Segmentation and Label Assignment with User Interaction by Means of Topic Specific Language Models and Topic-Specific Label Statistics - The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment are provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user, who can select alternative segmentations and alternative labels as well as enter a user-defined segmentation and user-defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated, incorporating the re-segmentation and re-labelling of successive parts of the document or the entire document. Furthermore, the method comprises a learning functionality, logging and analyzing user-introduced modifications for adaptation to the user's preferences and for further training of the statistical models. | 08-21-2008 |
20080201131 | METHOD AND APPARATUS FOR AUTOMATICALLY DISCOVERING FEATURES IN FREE FORM HETEROGENEOUS DATA - Techniques are provided for automatically discovering one or more features in free form heterogeneous data. In one aspect of the invention, the techniques include obtaining free form heterogeneous data, wherein the data comprises one or more data items, applying a label to each data item, using the labeled data to build a language model, wherein a word distribution associated with each label can be derived from the model, and using the word distribution associated with each label to discover one or more features in the data, wherein discovering one or more features in the data facilitates one or more operations that use at least a portion of the labeled data. | 08-21-2008 |
20080201132 | SYSTEM AND METHOD FOR FINDING THE MOST LIKELY ANSWER TO A NATURAL LANGUAGE QUESTION - Automated question answering is disclosed that relates to the selection of an answer to a question from a pool of potential answers which are manually or automatically extracted from a large collection of textual documents. A feature extraction component, a feature combination component, an answer selection component, and an answer presentation component, among others, are included. The input to the system is a set of one or more natural language questions and a collection of textual documents. The output is a (possibly ranked) set of factual answers to the questions, these answers being extracted from the document collection. | 08-21-2008 |
20080208566 | Automated word-form transformation and part of speech tag assignment - A method of creating a data structure for use with a morphological algorithm is discussed. The method includes creating a data structure having a plurality of paths. The data structure maps a plurality of words into a set of classes. The method further includes modifying the data structure to remove a portion of one or more of the paths that is not necessary to unambiguously map the words to the set of classes and storing the data structure on a tangible computer readable medium. | 08-28-2008 |
20080208567 | Web-based proofing and usage guidance - A system is disclosed for checking grammar and usage using a flexible portfolio of different mechanisms, and automatically providing a variety of different examples of standard usage, selected from analogous Web content. The system can be used for checking the grammar and usage in any application that involves natural language text, such as word processing, email, and presentation applications. The grammar and usage can be evaluated using several complementary evaluation modules, which may include one based on a trained classifier, one based on regular expressions, and one based on comparative searches of the Web or a local corpus. The evaluation modules can provide a set of suggested alternative segments with corrected grammar and usage. A followup, screened Web search based on the alternative segments, in context, may provide several different in-context examples of proper grammar and usage that the user can consider and select from. | 08-28-2008 |
20080208568 | SYSTEM AND METHOD FOR PROVIDING CONTEXT TO AN INPUT METHOD BY TAGGING EXISTING APPLICATIONS - An improved system and method for providing context information of executable code to an input method is provided. Advanced text input methods may be made aware of the type of text expected to be received as input so that input methods may achieve a higher accuracy in recognition of text input. Generic interfaces provide a framework for supporting application authoring platforms to allow application developers to easily specify context information to the system and have it reliably forwarded to the correct input methods. Additionally, a context tagging tool may associate specific text input fields of an existing application with an input scope without modifying the application itself. The context tagging tool may create a manifest that contains tags associating the specific text input fields with an input scope. Any advanced input methods used by the application may be updated with instructions for accessing the context information stored in the manifest. | 08-28-2008 |
20080208569 | SYSTEM FOR IDENTIFYING WORD PATTERNS IN TEXT - A system for identifying word patterns in text operates in real time and is highly suitable for network and Internet use. The system comprises a semantic network that may be compiled on a local computer or at a remote host and a software text analysis module for receiving the text to be analyzed, parsing the text, submitting the text to the semantic network, and receiving the results. Recognized words are then examined, together with surrounding words in the text, to determine whether the words are part of a word pattern. Word patterns are located at nodes in the semantic network in a hierarchical structure, and certain word patterns correspond to objects of the semantic network. When all word patterns involving a word are located, links are followed to objects corresponding to the word patterns. Several nodes may point to a single object, but each object is represented only once in the semantic network. Objects may thus be identified in real time, as the text streams through the text analysis module. | 08-28-2008 |
20080215310 | Method and system for mapping a natural language text into animation - A method for analyzing a natural language sentence describing an action, to create an action structure to be used in creating an animation of the action, the method comprising: processing the natural language sentence to create a grammatical tree comprising an action word and its associated values; providing constructs for the action word, each of the constructs having parameter types for defining the action expressed by the action word; identifying from the constructs at least one construct wherein at least one of the parameter types can take on at least one of the associated values thereby defining a matching value; and recording the at least one of the parameter types from the at least one construct as well as the matching value, thereby creating the action structure. | 09-04-2008 |
20080215311 | DISCRIMINATIVE TRAINING OF LANGUAGE MODELS FOR TEXT AND SPEECH CLASSIFICATION - Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers. | 09-04-2008 |
20080215312 | Handheld Electronic Device With Text Disambiguation - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device. | 09-04-2008 |
20080221869 | CONVERTING DEPENDENCY GRAMMARS TO EFFICIENTLY PARSABLE CONTEXT-FREE GRAMMARS - Dependency grammars are transformed to context-free grammars. The context-free grammars can be used in a parser to parse input sentences and identify relationships among words in the sentence. | 09-11-2008 |
20080221870 | System and method for revising natural language parse trees - An improved system and method for revising natural language parse trees is provided. A revision dependency parser may learn a set of transformation rules that may be applied to dependency parse trees generated by a base parser for revising the dependency parse trees. A corpus of natural language sentences and a set of correct dependency parse trees may be used to train a revision dependency parser to correct dependency parse trees generated by the base parser. A revision engine may compare the dependency parse trees produced by the base parser with the correct ones present in the training data to produce an observation-rule pair for each dependency. A rule may specify a transformation on the predicted dependency parse tree generated by the base parser to replace an incorrect dependency with a corrected dependency or may change the type of dependency expressed for the grammatical function of the dependent word. | 09-11-2008 |
20080221871 | HUMAN/MACHINE INTERFACE - A method to allow a user to cause a machine to make an utterance, together with apparatus for obtaining input from a user. The method comprises the steps of: analysing the context within which the utterance is to be made; creating a list of utterances appropriate to the context; on a human/machine interface, creating an indication that identifies to the user those utterances that are available, and allowing the user to indicate one of those utterances; and causing an utterance indicated by the user to be made. The apparatus comprises: a visual display configured to display a plurality of indicia angularly spaced on a locus about an origin, each of the plurality of indicia corresponding to a respective option; and an input device for use in indicating an angular position, wherein the apparatus is configured to generate an input event corresponding to an option associated with one of the plurality of indicia at an angular position corresponding to an angular position indicated by the input device. | 09-11-2008 |
20080221872 | METHODS AND APPARATUS FOR NATURAL SPOKEN LANGUAGE SPEECH RECOGNITION - A word prediction method and apparatus improves precision and accuracy. For the prediction of a sixth word “?”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then, “donyu”, which is the sixth word from “sho-senkyoku no”, is predicted. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only by “sho-senkyoku no”. | 09-11-2008 |
20080221873 | METHODS AND APPARATUS FOR NATURAL SPOKEN LANGUAGE SPEECH RECOGNITION - A word prediction apparatus and method that improve precision and accuracy, and a speech recognition method and an apparatus therefor, are provided. For the prediction of a sixth word “?”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then, “donyu”, which is the sixth word from “sho-senkyoku no”, is predicted. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only by “sho-senkyoku no”. | 09-11-2008 |
20080221874 | Method and Apparatus for Fast Semi-Automatic Semantic Annotation - A method, apparatus and computer instructions are provided for fast semi-automatic semantic annotation. Given a limited annotated corpus, the present invention assigns a tag and a label to each word of the next limited annotated corpus using a parser engine, a similarity engine, and an SVM engine. A rover then combines the parse trees from the three engines and annotates the next chunk of limited annotated corpus with confidence, such that the effort required for human annotation is reduced. | 09-11-2008 |
20080228466 | LANGUAGE NEUTRAL TEXT VERIFICATION - A resource string associated with output text is identified. A regular expression pattern is generated from the resource string. The regular expression pattern is matched to the output text. A verification result based on the matching of the regular expression pattern to the output text is provided. | 09-18-2008 |
20080228467 | NATURAL LANGUAGE PARSING METHOD TO PROVIDE CONCEPTUAL FLOW - A method for parsing the flow of natural human language to convert a flow of machine recognizable language into a conceptual flow includes first recognizing the lexical structure and then determining a basic semantic grouping for the language flow in the lexical structure. The basic semantic grouping that denotes the main action, occurrence or state of being for the language flow is then determined. The responsibility of the main action, occurrence or state of being for the language flow is then determined within the lexical structure, followed by semantically parsing the lexical structure. Thereafter, any ambiguities in the responsibilities are resolved in a recursive manner by applying a predetermined set of rules thereto. | 09-18-2008 |
20080228468 | English-Language Translation Of Exact Interpretations of Keyword Queries - The present invention relates to a methodology to translate exact interpretations of keyword queries into meaningful and grammatically correct plain-language queries in order to convey the meaning of these interpretations to the initiator of the search. The method includes the steps of generating at least one grammatically valid plain-language sentence interpretation for a keyword query from a generated sentence plain-language sentence clauses, wherein the grammatically valid plain-language sentence is based upon differing matching elements, and presenting at least one grammatically valid plain-language sentence interpretation for the keyword query to a keyword query system user for the user's review. | 09-18-2008 |
20080235004 | DISAMBIGUATING TEXT THAT IS TO BE CONVERTED TO SPEECH USING CONFIGURABLE LEXEME BASED RULES - A software language including language constructs for disambiguating text that is to be converted to speech using configurable lexeme based rules. The language can include at least one conditional statement and a significance indicator. The conditional statement can define a sense of usage for a lexeme. The significance indicator can define a criterion for selecting an associated sense of usage. The language can also include an action expression that is associated with a conditional statement and that defines a set of programmatic actions to be executed upon a selection of the associated usage sense. The conditional statement can include a context range specification that defines a scope of an input string for examination when evaluating the conditional statement. Further, the conditional statement can include a directive that represents a defined condition of the lexeme or the text surrounding the lexeme. | 09-25-2008 |
20080235005 | Device, System and Method of Handling User Requests - Devices, systems and methods of handling user requests. For example, a method includes: receiving an electronic representation of a submitted request; calculating request-related information, submitter-related information, and/or recipient-related information; based on the request-related information and the submitter-related information, determining one or more recipients for the request; distributing the request to said one or more recipients; and storing the request and one or more responses received from said one or more recipients. | 09-25-2008 |
20080243478 | Efficient Implementation of Morphology for Agglutinative Languages - A method for constructing an automaton for automated analysis of agglutinative languages, the method including constructing an affix automaton for each of a plurality of affix types of an agglutinative language, where each of the affix types is associated with one or more affixes associated with a morphological concept, combining any of the affix automatons to form a plurality of template automatons, where each of the template automatons is patterned after any of a plurality of agglutination templates of any of the affix types for the language, and combining the template automatons into a master automaton. | 10-02-2008 |
20080243479 | OPEN INFORMATION EXTRACTION FROM THE WEB - To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text, by applying heuristics to the corpus of text. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor assigns a probability to each retained tuple to indicate a likelihood that the retained tuple is an actual instance of a relationship between a plurality of objects comprising the retained tuple. The retained tuples comprise an extraction graph that can be queried for information. | 10-02-2008 |
20080243480 | System and method for determining semantically related terms - Systems and methods for determining semantically related terms are disclosed. Generally, a semantically related term tool receives a seed set and identifies a plurality of terms that constitute the seed set. For each term of the seed set, the semantically related term tool identifies concept terms associated with terms of the seed set other than the term being processed, joins the term being processed with each of the identified concept terms, and adds the resulting terms to a plurality of semantically related terms. The semantically related term tool removes invalid terms from the plurality of semantically related terms based on a language model and ranks at least a portion of the remaining terms of the plurality of semantically related terms based on a metric indicating a degree of semantic relationship between a term of the plurality of semantically related terms and one or more terms of the seed set. | 10-02-2008 |
20080243481 | Large Language Models in Machine Translation - Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus. | 10-02-2008 |
20080243482 | Method for performing effective drill-down operations in text corpus visualization and exploration using language model approaches for key phrase weighting - The invention relates to a method and an apparatus for performing a drill-down operation on a text corpus comprising documents, using language models for key phrase weighting, said method comprising the steps of weighting key phrases occurring both in a foreground language model, which contains a selected document cluster of said text corpus, and in a background language model, which does not contain said selected document cluster, by calculating for each key phrase a key phrase weight comprising a ratio between the foreground weight of said key phrase and a background weight of said key phrase, and assigning documents of the foreground language model to cluster labels which are formed by key phrases having high calculated key phrase weights. | 10-02-2008 |
20080243483 | UTILIZING SPEECH GRAMMAR RULES WRITTEN IN A MARKUP LANGUAGE - The present invention provides a method and apparatus that utilize a context-free grammar written in a markup language format. The markup language format provides a hierarchical format in which grammar structures are delimited within and defined by a set of tags. The markup language format also provides grammar switch tags that indicate a transition from the context-free grammar to a dictation grammar or a text buffer grammar. In addition, the markup language format provides for the designation of code to be executed when particular grammar structures are recognized from a speech signal. | 10-02-2008 |
20080243484 | SYSTEMS AND METHODS FOR GENERATING WEIGHTED FINITE-STATE AUTOMATA REPRESENTING GRAMMARS - A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string. | 10-02-2008 |
20080249762 | CATEGORIZATION OF DOCUMENTS USING PART-OF-SPEECH SMOOTHING - A method and system are provided for classifying documents based on the subjectivity of the content of the documents using a part-of-speech analysis to help account for unseen words. A classification system trains a classifier using the parts of speech of training documents so that the classifier can classify unseen words based on the part of speech of the unseen word. The classification system then trains a part-of-speech model using the parts of speech of the n-grams of training data and labels of the training documents, and trains a term model using the term unigrams and labels. To classify a target document, the classification system applies the part-of-speech model to the part-of-speech n-grams of the target document and the term model to term n-grams of the target document. | 10-09-2008 |
20080249763 | USER-TAILORABLE ROMANIZED CHINESE TEXT INPUT SYSTEMS AND METHODS - Methods and systems for romanizing Chinese ideograms allow a user to create a personalized spelling dictionary that converts a user's desired roman-alphabet spelling to an equivalent Chinese character. A phonetic combination from a standard Chinese dialect is selected. The user defines a roman alphabet equivalent of the selected phonetic combination that fits the way the user pronounces the phonetic combination in the user's own dialect or idiolect. | 10-09-2008 |
20080249764 | Smart Sentiment Classifier for Product Reviews - A sentiment classifier is described. In one implementation, a system applies both full text and complex feature analyses to sentences of a product review. Each analysis is weighted prior to linear combination into a final sentiment prediction. A full text model and a complex features model can be trained separately offline to support online full text analysis and complex features analysis. Complex features include opinion indicators, negation patterns, sentiment-specific sections of the product review, user ratings, sequence of text chunks, and sentence types and lengths. A Conditional Random Field (CRF) framework provides enhanced sentiment classification for each segment of a complex sentence to enhance sentiment prediction. | 10-09-2008 |
20080262831 | Method for the Natural Language Recognition of Numbers - A method for the natural language recognition of numbers, in particular, for use in a voice recognition system. The recognition method is as follows: a spoken numeral is detected and digitized, the numeral is broken down into number-related word components, the mutual position of the word components is determined within the numeral, the numerical values corresponding to the word components are compared and recognized using word component-number value pairs maintained in a digital dictionary, and the individual numerical values are strung together and/or added and/or multiplied according to the type and positions of the corresponding word components in the numeral such that the numerical value corresponding to the input numeral is obtained. | 10-23-2008 |
20080270116 | Large-Scale Sentiment Analysis - A method for determining a sentiment associated with an entity includes inputting a plurality of texts associated with the entity, labeling seed words in the plurality of texts as positive or negative, determining a score estimate for the plurality of words based on the labeling, re-enumerating paths of the plurality of words and determining a number of sentiment alternations, determining a final score for the plurality of words using only paths whose number of alternations is within a threshold, converting the final scores to corresponding z-scores for each of the plurality of words, and outputting the sentiment associated with the entity. | 10-30-2008 |
20080270117 | Method and system for text compression and decompression - Creation and recovery of the pseudo-code (Y) form the basis of the present method of text compression and decompression. The pseudo-code (Y) is created by the formula Y=C+X. The pseudo-code includes information of a repeating index/symbol (constant C) and a current index/symbol (X). The pseudo-code (Y) is converted back into the original information by the formula X=Y−C. To service the pseudo-code one needs to convert original symbols of text into indexes, and to create a permanent and a temporary vocabulary. The present permanent vocabulary is a redundant vocabulary built in advance, includes a dictionary with common symbols taken from books, articles, and dictionaries, and serves as a reference vocabulary stored in the permanent memory. The temporary vocabulary is built and used during the compression and decompression processes. The function of the temporary vocabulary is to convert high bit length indexes belonging to the permanent vocabulary into low bit length indexes present in the temporary vocabulary. | 10-30-2008 |
20080270118 | Recognition architecture for generating Asian characters - Architecture for correcting incorrect recognition results in an Asian language speech recognition system. A spelling mode can be launched in response to receiving speech input, the spelling mode for correcting incorrect spelling of the recognition results or generating new words. Correction can be obtained using speech and/or manual selection and entry. The architecture facilitates correction in a single pass, rather than multiple times as in conventional systems. Words corrected using the spelling mode are corrected as a unit and treated as a word. The spelling mode applies to languages of at least the Asian continent, such as Simplified Chinese, Traditional Chinese, and/or other Asian languages such as Japanese. | 10-30-2008 |
20080270119 | Generating sentence variations for automatic summarization - A new system is hereby provided that generates automatic summaries of groups of multiple documents using multiple variations of each sentence from a selected group of representative sentences from the documents, and then selecting from the multiple variations when assembling the automatic summary. The system may generate alternative strings of text, select from among the alternative strings of text, and provide a summary of the group of documents using the strings of text selected from among the alternatives. The alternative strings of text may be generated based on each of a plurality of sentences from the group of documents. Selecting from among the alternative strings of text may be based on one or more criteria indicating the strings of text to be representative of the content of the group of documents. | 10-30-2008 |
20080270120 | Processing text with domain-specific spreading activation methods - A method for performing natural language processing of free text using domain-specific spreading activation. Embodiments of the present invention ontologize free text using an algorithm based on neurocognitive theory by simulating human recognition, semantic, and episodic memory approaches. Embodiments of the invention may be used to process clinical text for assignment of billing codes, analyze suicide notes or legal discovery materials, and for processing other collections of text. Further, embodiments of the invention may be used to more effectively search large databases, such as a database containing a large number of medical publications. | 10-30-2008 |
20080270121 | SYSTEM AND METHOD FOR COMPUTER ANALYSIS OF COMPUTER GENERATED COMMUNICATIONS TO PRODUCE INDICATIONS AND WARNING OF DANGEROUS BEHAVIOR - The present invention is a system and method for computer analysis of computer generated communications to produce indications and warnings of dangerous behavior. A method of computer analysis of computer generated communications in accordance with the invention, includes collecting at least one computer generated communication produced by or received by an author; parsing the collected at least one computer generated communication to identify categories of information therein; processing the categories of information with at least one analysis to quantify at least one type of information in each category; and generating an output communication when a difference between the quantification of at least one type of information for at least one category and a reference for the at least one category is detected involving a psychological state of the author to which a responsive action should be taken with content of the output communication and the at least one category being programmable to define a psychological state in response to which an action should be taken and what the action is to be taken in response to the defined psychological state. | 10-30-2008 |
20080275694 | METHOD AND SYSTEM FOR AUTOMATICALLY EXTRACTING RELATIONS BETWEEN CONCEPTS INCLUDED IN TEXT - A method and system for automatically extracting relations between concepts included in electronic text is described. Aspects of the exemplary embodiment include a semantic network comprising a plurality of lemmas that are grouped into synsets representing concepts, each of the synsets having a corresponding sense, and a plurality of links connected between the synsets that represent semantic relations between the synsets. The semantic network further includes semantic information comprising at least one of: 1) an expanded set of semantic relation links representing: hierarchical semantic relations, synset/corpus semantic relations, verb/subject semantic relations, verb/direct object semantic relations, and fine grain/coarse grain semantic relationship; 2) a hierarchical category tree having a plurality of categories, wherein each of the categories contains a group of one or more synsets and a set of attributes, wherein the set of attributes of each of the categories are associated with each of the synsets in the respective category; and 3) a plurality of domains, wherein one or more of the domains is associated with at least a portion of the synsets, wherein each domain adds information regarding a linguistic context in which the corresponding synset is used in a language. A linguistic engine uses the semantic network to perform semantic disambiguation on the electronic text using one or more of the expanded set of semantic relation links, the hierarchical category tree, and the plurality of domains to assign a respective one of the senses to elements in the electronic text independently from contextual reference. | 11-06-2008 |
20080281580 | DYNAMIC PARSER - The subject disclosure pertains to systems and methods for dynamic parsing. A dynamic parser can perform syntactic analysis or parsing of input data consisting of a set of tokens based upon a provided grammar including conditional tokens. While the parser grammar can be fixed, the dynamic parser can utilize an independent transform function at parse time to translate or replace particular tokens, effectively performing dynamic parsing. The transform function can be utilized in conjunction with conditional tokens to selectively activate and deactivate particular grammar rules. Additionally, systems and methods for automatically generating a dynamic parser from a grammar description are described herein. | 11-13-2008 |
20080281581 | METHOD OF IDENTIFYING DOCUMENTS WITH SIMILAR PROPERTIES UTILIZING PRINCIPAL COMPONENT ANALYSIS - The present invention generally provides methods and systems for characterizing texts, for example, for identifying textual documents by language, topic, author, or other attributes. In some embodiments, a method of the invention can include creating an n-gram frequency spectrum for a document under analysis, preferably selecting a subset of the n-gram frequency spectrum, transforming the n-gram frequency spectrum into principal component space, and identifying one or more attributes of the document according to its similarity to (or distinction from) reference documents in the principal component space. | 11-13-2008 |
20080288243 | Information Processing Apparatus, Information Processing Method, Program, and Recording Medium - Disclosed herein is an information processing apparatus for analyzing text data, including: acquisition means for acquiring the text data; morpheme information registration means for registering morpheme information for use in analyzing the text data morphologically; morphological analysis means for analyzing the text data acquired by the acquisition means; compound word processing rule registration means for registering compound word processing rules for creating a compound word not registered in the morpheme information registration means; and compound word processing means, by use of the compound word processing rules registered in the compound word processing rule registration means, for combining the morphemes included in the morphological analysis information created by the morphological analysis means, into the compound word not registered in the morpheme information registration means and detecting the created compound word. | 11-20-2008 |
20080288244 | METHOD AND SYSTEM FOR AUTOMATICALLY DETECTING MORPHEMES IN A TASK CLASSIFICATION SYSTEM USING LATTICES - In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user. | 11-20-2008 |
20080294425 | Method and apparatus for performing semantic update and replace operations - A method of changing semantic information comprises changing a first bi-directional coupling between a surface region in a document and a first semantic object to a second bi-directional coupling between the surface region and a second semantic object. More particularly, the method may be comprised of identifying an occurrence of a surface region in a document, the surface region having a first link for coupling the surface region to a first semantic object, and the first semantic object having a first association for coupling the first semantic object with the surface region. The first link is replaced with a second link for coupling the surface region to a second semantic object. The first association is changed to a second association for coupling the second semantic object with the surface region. Another method for changing semantic information comprising selecting a semantic object stored in a data repository and changing the selected semantic object. A scope is then selected, either manually or automatically. A set of semantically anchored expressions associated with the semantic object is identified in response to the scope. A determination is made if the semantically anchored expressions are consistent with the changed semantic object and, if not, the semantically anchored expressions are updated so as to be consistent with the changed semantic object. | 11-27-2008 |
20080294426 | Method and apparatus for anchoring expressions based on an ontological model of semantic information - A method and apparatus for the recording and maintenance of semantic elements in electronically-held information objects provide for grounding semantic objects in an ontology, such that inheritance and other relations between concepts are preserved in persistent storage. The disclosed method and apparatus provide semantic document authors with a means to anchor concept references to specific, persistent, semantic objects, thereby providing the system with access to all properties of the underlying data model of the semantic objects being referenced, while also specifying the type and scope of their relations, as well as behavioral aspects of the visual and editing environment. | 11-27-2008 |
20080294427 | Method and apparatus for performing a semantically informed merge operation - A method and apparatus for performing an informed semantic merge operation comprises selecting a source region in a document and a target region in the same or a different document. A bi-directionally coupled surface region is identified in the source region and a bi-directionally coupled surface region is identified in the target region. A first semantic object coupled to the surface region in the source region is identified and a second semantic object coupled to the surface region in the target region is identified. The subcomponents of the first semantic object are combined with the subcomponents of the second semantic object by merging. | 11-27-2008 |
20080300862 | AUTHORING SYSTEM - A method for supervising text includes receiving input text in a natural language, the input text including at least one source sentence. The input text is analyzed, which includes, for a source sentence in the input text, generating a syntactic representation. A target sentence is generated in the same natural language, based on the syntactic representation. The source sentence is compared with the target sentence to determine whether there is a match. A decision is output, based on the comparison. | 12-04-2008 |
20080300863 | Publishing tool for translating documents - Some embodiments of a publishing tool to translate documents have been presented. In one embodiment, a master document written in a first natural language is received. The master document is repurposed to generate a set of output documents in one or more predetermined formats, wherein each of the output documents is in a distinct one of a set of natural languages. | 12-04-2008 |
20080300864 | Syndication of documents in increments - Some embodiments of a publishing tool to provide syndication in increments have been presented. In one embodiment, a set of documents in different formats and/or different natural languages has been generated from a master document. In response to a change in the master document, a corresponding part in each of the plurality of documents is synchronously generated without regenerating an entirety of each of the plurality of documents. Then each of the set of documents is updated using the corresponding part generated. | 12-04-2008 |
20080300865 | METHOD, SYSTEM, AND APPARATUS FOR NATURAL LANGUAGE MIXED-INITIATIVE DIALOGUE PROCESSING - In a natural language, mixed-initiative system, a method of processing user dialogue can include receiving a user input and determining whether the user input specifies an action to be performed or a token of an action. The user input can be selectively routed to an action interpreter or a token interpreter according to the determining step. | 12-04-2008 |
20080306730 | System and method to modify text entry - A system and method to modify entry of text is provided. The system includes an input device, a display device, and a processor configured to store a correlation between at least one word with at least one candidate phrase; receive at least one word into the input device; identify the at least one candidate phrase correlated to the at least one word; replace the at least one word with a selected phrase from the at least one candidate phrase; and store the selected phrase in a computer readable storage medium. | 12-11-2008 |
20080312904 | Sub-Model Generation to Improve Classification Accuracy - A method of classifying text input for use with a natural language understanding system can include determining classification information including a primary classification and one or more secondary classifications for a received text input using a statistical classification model (statistical model). A statistical classification sub-model (statistical sub-model) can be selectively built according to a model generation criterion applied to the classification information. The method further can include selecting the primary classification or the secondary classification for the text input as a final classification according to the statistical sub-model and outputting the final classification for the text input. | 12-18-2008 |
20080312905 | Extracting Tokens in a Natural Language Understanding Application - A method of processing text within a natural language understanding system can include applying a first tokenization technique to a sentence using a statistical tokenization model. A second tokenization technique using a named entity can be applied to the sentence when the first tokenization technique does not extract a needed token according to a class of the sentence. A token determined according to at least one of the tokenization techniques can be output. | 12-18-2008 |
20080312906 | Reclassification of Training Data to Improve Classifier Accuracy - A method of creating a statistical classification model for a classifier within a natural language understanding system can include processing training data using an existing statistical classification model. Sentences of the training data correctly classified into a selected class of the statistical classification model can be selected. The selected sentences of the training data can be assigned to a fringe group or a core group according to confidence score. The training data can be updated by associating the fringe group with a fringe subclass of the selected class and the core group with a core subclass of the selected class. A new statistical classification model can be built from the updated training data. The new statistical classification model can be output. | 12-18-2008 |
20080312907 | METHOD AND SYSTEM FOR DATA MODELING ACCORDING TO USER PERSPECTIVES - Techniques for the design and use of a perception modeling language for communicating according to the perspective of at least two communicators. The disclosed method and system provide for forming a model including a predetermined number of states and a plurality of related transitions. The disclosed subject matter represents each of said predetermined number of states according to a plurality of perspectives, said perspectives including a plurality of states and a set of related transitions, and forms a perspective language by deriving a plurality of functions associating said plurality of perspectives for representing at least one actually observable system. Furthermore, the perspective modeling language derives a set of modeling perspectives for modeling said at least one actually observable system. | 12-18-2008 |
20080312908 | SYSTEMS AND METHODS FOR NORMALIZATION OF LINGUISTIC STRUCTURES - A text passage is analyzed to determine whether it contains a “be” verb or a “have” verb. If so, syntactic dependencies are obtained from the text passage, a direct object relation involving the “be” verb or “have” verb is obtained, and a verbal form of a noun appearing in the first direct object relation is obtained. The syntactic dependencies are rewritten based on the verbal form of the noun. Different syntactic rewriting criteria are applied if the text passage also contains a noun object preceding a past participle verb, or also contains an active present participle verb. | 12-18-2008 |
20080312909 | SYSTEM FOR ADAPTIVE MULTI-CULTURAL SEARCHING AND MATCHING OF PERSONAL NAMES - An automated name searching system incorporates an automatic name classifier and a multi-path architecture in which different algorithms are applied based on cultural identity of the query name. The name classifier operates with a preemptive list, analysis of morphological elements, length, and linguistic rules. A name regularizer produces a character based computational representation of the name. A pronunciation equivalent representation such as an IPA language representation, and language specific rules to generate name searching keys, are used in a first pass to eliminate database entries which are obviously not matches for the query name. The methods can also be implemented as a callable set of library routines including an intelligent preprocessor and a name evaluator that produces a score comparing a query name and database name, based on a variety of user-adjustable parameters. The user-controlled parameters permit tuning of the search methodologies for specific custom applications. | 12-18-2008 |
20080319735 | SYSTEMS AND METHODS FOR AUTOMATIC SEMANTIC ROLE LABELING OF HIGH MORPHOLOGICAL TEXT FOR NATURAL LANGUAGE PROCESSING APPLICATIONS - Systems and methods are provided for automated semantic role labeling for languages having complex morphology. In one aspect, a method for processing natural language text includes receiving as input a natural language text sentence comprising a sequence of white-space delimited words including inflected words that are formed of morphemes including a stem and one or more affixes, identifying a target verb as a stem of an inflected word in the text sentence, grouping morphemes from one or more inflected words with the same syntactic role into constituents, and predicting a semantic role of a constituent for the target verb. | 12-25-2008 |
20080319736 | Discriminative Syntactic Word Order Model for Machine Translation - A discriminatively trained word order model is used to identify a most likely word order from a set of word orders for target words translated from a source sentence. For each set of word orders, the discriminatively trained word order model uses features based on information in a source dependency tree and a target dependency tree and features based on the order of words in the word order. The discriminatively trained statistical model is trained by determining a translation metric for each of a set of N-best word orders for a set of target words. Each of the N-best word orders is projective with respect to a target dependency tree and the N-best word orders are selected using a combination of an n-gram language model and a local tree order model. | 12-25-2008 |
20080319737 | Method and apparatus for connecting a cellular telephone user to the internet - A method and devices are described for providing a user of a mobile communication apparatus with a prediction of a string of characters based on one or more characters. The method includes the steps of: receiving information that relates to the location of the mobile communication apparatus; receiving character information which is part of the string of characters to be predicted; accessing databases which comprise pre-defined strings of characters and selecting therefrom a group of strings of characters associated with the vicinity at which the mobile communication apparatus is currently located; transmitting the selected group of strings of characters to the mobile communication apparatus; and displaying the selected group of strings of characters at the mobile communication apparatus. | 12-25-2008 |
20090006077 | SPATIALLY INDEXED GRAMMAR AND METHODS OF USE - Improved systems and methods are described which simplify the individual's interaction with speech recognition software, expand the database of spoken point names that can be recognized, and increase the quality and therefore likelihood of success of speech recognition applications. The present systems and methods apply to various uses, such as providing driving directions, finding the nearest location based service, and finding the nearest “Where Am I?” type of location based services. | 01-01-2009 |
20090006078 | METHOD AND SYSTEM FOR NATURAL LANGUAGE DICTIONARY GENERATION - A method and computer system for analyzing a text corpus in a natural language is provided. An initial morphological description having word inflection rules for various groups of words in the natural language is created by a linguist. A plurality of text corpuses are analyzed to obtain information on the occurrence of a plurality of word forms for each word token in each text corpus. A morphological dictionary which contains information about each base form and word inflection rules for each word token with verified hypothesis is generated. | 01-01-2009 |
20090006079 | Regular expression word verification - The present disclosure is directed to a method of verifying a compound word. The method includes receiving an input signal indicative of a textual input and accessing a rule and a lexical data structure from data stores. The rule is applied to the textual input to determine whether the textual input is a valid compound word. An output signal is provided that is indicative of whether the textual input is a compound word. | 01-01-2009 |
20090006080 | COMPUTER-READABLE MEDIUM HAVING SENTENCE DIVIDING PROGRAM STORED THEREON, SENTENCE DIVIDING APPARATUS, AND SENTENCE DIVIDING METHOD - A typical sentence having a specific typical characteristic in the sentence is divided. A division target typical sentence is divided on the basis of a small clause definition. The sentence is divided where positions suitable for dividing the typical sentence based on the structure are expressed by a user. A small clause string including small clauses that serve as independent sentences is created after the division. The small clause string is compared to the structure patterns, and a structure pattern that is determined to match the small clause string is selected as a result of the typical sentence division. | 01-01-2009 |
20090012777 | TOKEN STREAM DIFFERENCING WITH MOVED-BLOCK DETECTION - Methods and apparatus implementing systems and techniques for differencing token streams and detecting moved blocks of tokens. In general, in one implementation, the technique includes: obtaining a first token stream and a second token stream, comparing the first and second token streams to identify a group of tokens that are substantially similar in the first and second token streams, the similar-tokens group including common sub-sequences, which are identical in the first and second token streams, and at least one unmatched token, and presenting matched token information corresponding to the similar-tokens group to represent changes in document flow. | 01-08-2009 |
20090012778 | APPARATUS AND METHOD FOR EXPANDING NATURAL LANGUAGE QUERY REQUIREMENT - The present invention provides an apparatus for expanding a query requirement, comprising: a query requirement understanding device which generates an explicit query requirement according to a user query request; and a query requirement expanding device which generates an implicit query requirement associated with the explicit query requirement. The query requirement understanding device generates an explicit query requirement including a query concept and a question type by searching a knowledge base and a language base, and the query requirement expanding device generates an implicit query requirement including a query concept and a question type by searching the knowledge base, the language base and a relevancy database. The present invention further provides a method for expanding a query requirement. The apparatus and method for expanding a query requirement according to the present invention can facilitate a user's query and provide the user with an accurate, comprehensive query answer. | 01-08-2009 |
20090018817 | Method and System for Connecting Characters, Words and Signs to a Telecommunication Number - This invention relates to the field of wireless data and instant communication technologies and describes a method and a system for connecting words, phrases, or symbols of any language or multimedia expressions, within the content of transmitted data, to telecommunication codes. The presented method of the invention selects a group of Telecom Codes, defines Content Names, assigns the Content Names to the Telecom Codes, receives the transmitted content, and redirects the content to the connected Telecom Codes after detecting the existence of the Content Names. The presented system of the invention combines both software and hardware functions, with the hardware portion comprising a Processor, a Memory, a Display Device, an Input Device, and a Communication Interface, and the software portion comprising an Operating System, a Client Data Management Module including Management Interface, a Database Software, a group of Telecom Codes as well as other connectable Telecom Codes configured in the Database Software, a group of defined Content Names configured in the Database Software, the Connection Relations and the Rules of Directing configured in the Database Software, an Analysis and Redirecting Module, and a Communication Interface. The presented method and system of the invention solve the difficulty of memorizing and entering long Telecom Codes of many digits, and lead to five new application developments: Information Portal, Multiple Content Names Connecting to Single Telecom Code, Multiple Telecom Codes Connecting to Single Content Name, Connection of Classified Advertisements, and Interactive Customer Relation Management (ICRM) and Supplier Relation Management (ISRM) System. | 01-15-2009 |
20090018818 | OPERATING DEVICE FOR NATURAL LANGUAGE INPUT - An operating device for natural language input is disclosed. A user can express a request to the operating device by inputting natural language, and a processor then determines the format of the natural language. If the natural language is in voice format, a voice identification cell transforms it into word data and transmits it to a natural language analysis unit; if the natural language is in word or character identification format, the natural language analysis unit directly analyzes the sentence type and issues an instruction accordingly. An executive interface then finds a matched equipment end and transmits the instruction to it for operation in real time. This design lets the equipment end respond as required by the user, providing an effective man-machine communication channel. | 01-15-2009 |
20090018819 | TRACKING CHANGES IN STRATIFIED DATA-STREAMS - Disclosed are systems, methods, and computer readable media for detecting and coordinating changes in stratified data streams. The method embodiment comprises receiving one or more data streams, each data stream comprising at least one lexical item and having at least one metavalue, detecting a change in a frequency of the at least one lexical item for each metavalue separately, coordinating the change in frequency of the at least one lexical item with changes in frequencies of lexical items associated with the at least one lexical item by grouping the at least one lexical item and the associated lexical items over time and across at least one metavalue, wherein end grouping is a coordinated change-event, and presenting a summarization of the coordinated change-event to a user. | 01-15-2009 |
20090018820 | Character String Anonymizing Apparatus, Character String Anonymizing Method, and Character String Anonymizing Program - A character string anonymizing apparatus classifies each of a plurality of pieces of text data, each including a character string, into a plurality of kinds of data in accordance with a classification condition, extracts a plurality of words included in each of the plurality of pieces of text data (hereinafter, referred to as linked data) classified into the same kind by the classification, extracts, among word sets including one or more of the extracted words, a word set in which the number of pieces of linked data including all words forming each of the word sets is greater than or equal to a threshold, and anonymizes, among words included in a character string included in each of the plurality of pieces of text data, a word matching at least some of the extracted words and not matching words forming the extracted word set. | 01-15-2009 |
20090018821 | LANGUAGE PROCESSING DEVICE, LANGUAGE PROCESSING METHOD, AND LANGUAGE PROCESSING PROGRAM - A language processing device includes first analysis unit | 01-15-2009 |
20090024385 | SEMANTIC PARSER - A method and an apparatus for semantic parsing of electronic text documents. The electronic text documents can comprise a plurality of sentences with several language components. The method comprises analyzing at least one sentence of the electronic text document and dynamically generating a graph from the analyzed sentence of the text document. The graph represents a semantic representation of the analyzed one or more sentences. The method continues the analysis until an ambiguous sentence is determined and analyzed by evaluating at least a portion of the generated graph. | 01-22-2009 |
20090043565 | HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device. | 02-12-2009 |
20090048823 | SYSTEM AND METHODS FOR OPINION MINING - A system that incorporates teachings of the present disclosure may include, for example, a system having a controller to identify from commentaries of an object or service one or more context-dependent opinions associated with one or more features of the object or the service, and synthesize a semantic orientation for each of one or more context-dependent opinions of the one or more features. Additional embodiments are disclosed. | 02-19-2009 |
20090055163 | Dynamic Mixed-Initiative Dialog Generation in Speech Recognition - Disclosed are a method ( | 02-26-2009 |
20090055164 | Method and System of Optimal Selection Strategy for Statistical Classifications in Dialog Systems - An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability. | 02-26-2009 |
20090055165 | DYNAMIC MIXED-INITIATIVE DIALOG GENERATION IN SPEECH RECOGNITION - Disclosed are a method ( | 02-26-2009 |
20090055166 | Method, Computer Program and Apparatus for Analysing Symbols in a Computer System - A computer-implemented method of analysing symbols in a computer system, and a computer program and apparatus therefor are provided. The symbols conform to a specification for the symbols. The specification is codified into a set of computer-readable rules. The symbols are analysed using the computer-readable rules to obtain patterns of the symbols by: determining the path that is taken by the symbols through the rules that successfully terminates, and grouping the symbols according to said paths. | 02-26-2009 |
20090055167 | METHOD FOR TRANSLATION SERVICE USING THE CELLULAR PHONE - Disclosed is a method for providing translation service using a mobile communication terminal. The method includes a button input step of pressing a voice recognition key to use a voice recognition function, a menu screen provision step of selecting a translator menu item, a translation recognition method determination step of selecting a sentence input method or a word input method, a Korean input step of inputting Korean, a confirmation step of confirming whether a completed Korean sentence matches an intended sentence, and a translated sentence output step of providing a relevant translated sentence in a text form and reproducing the relevant translated sentence in a voice form. | 02-26-2009 |
20090063131 | Methods and systems for language representation - A method of representing a language statement having one or more words includes capturing an expression of the language statement, associating one or more properties with each of the one or more words in the language statement, substantially removing as necessary one or more first ambiguities in the language statement, establishing one or more functional roles for each of the one or more words in the language statement, processing as necessary one or more second ambiguities in the language statement, and providing a representation of the language statement including the one or more properties associated with and the one or more functional roles established for each of the one or more words, the one or more first ambiguities substantially removed, and the one or more second ambiguities processed. | 03-05-2009 |
20090063132 | Information Processing Apparatus, Information Processing Method, and Program - An information processing apparatus includes: morphological analysis means for performing morphological analysis on a text document; managing means for managing a connection pattern indicating a connection relationship of a morpheme of a predetermined part of speech; and extracting means extracting, from a string of morphemes obtained by performing morphological analysis by the morphological analysis means, a phrase including a plurality of morphemes having a same connection relationship as the connection relationship indicated by the connection pattern managed by the managing means. | 03-05-2009 |
20090063133 | SYSTEM AND ARTICLE OF MANUFACTURE FOR FILTERING CONTENT USING NEURAL NETWORKS - Provided are a system and article of manufacture for filtering communications received from over a network for a person-to-person communication program. A communication is received for the person-to-person communication program. The communication is processed to determine predefined language statements. Information on the determined language statements is inputted into a neural network to produce an output value. A determination is made as to whether the output value indicates that the communication is unacceptable. The communication is forwarded to the person-to-person communication program unchanged if the output value indicates that the communication is acceptable. An action is performed with respect to the communication upon determining that the communication is unacceptable that differs from the forwarding of the communication that occurs if the output value indicates that the communication is acceptable. | 03-05-2009 |
20090070100 | METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR SPOKEN LANGUAGE GRAMMAR EVALUATION - A method, system, and computer program product for spoken language grammar evaluation are provided. The method includes playing a recorded question to a candidate, recording a spoken answer from the candidate, and converting the spoken answer into text. The method further includes comparing the text to a grammar database, calculating a spoken language grammar evaluation score based on the comparison, and outputting the spoken language grammar evaluation score. | 03-12-2009 |
20090070101 | DEVICE FOR AUTOMATICALLY CREATING INFORMATION ANALYSIS REPORT, PROGRAM FOR AUTOMATICALLY CREATING INFORMATION ANALYSIS REPORT, AND METHOD FOR AUTOMATICALLY CREATING INFORMATION ANALYSIS REPORT | 03-12-2009 |
20090070102 | SPEECH RECOGNITION METHOD, SPEECH RECOGNITION SYSTEM AND SERVER THEREOF - A speech recognition method includes a model selection step which selects a recognition model and translation dictionary information based on characteristic information of input speech and a speech recognition step which translates input speech into text data based on the selected recognition model and translation step which translates the text data based on the selected translation dictionary information. | 03-12-2009 |
20090070103 | Management and Processing of Information - Disclosed is a method to perform natural language (NL) processing. The method includes accessing a data source having one or more data portions, and applying multi-stage NL processing on the one or more data portions, using a dynamically generated set of concepts relating to one or more subject matters and relationships between at least some of the concepts, to determine the association of the one or more data portions with one or more of the concepts. | 03-12-2009 |
20090076794 | ADDING PROTOTYPE INFORMATION INTO PROBABILISTIC MODELS - Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like. | 03-19-2009 |
20090076795 | System And Method Of Generating Responses To Text-Based Messages - In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message is disclosed. The method includes identifying a sentence in the text-based natural language message. Also, identifying an input clause in the sentence. Further, comparing the input clause to a previously received clause, where the previously received clause is correlated with a previously generated response message. Additionally, generating an output response message based on the previously generated response message. The system includes means for performing the method steps. | 03-19-2009 |
20090076796 | Natural language processing method - Methods for converting a natural language sentence into a set of primitive sentences. The methods include identifying verbal blocks in the sentence, splitting the sentence into a set of logical clauses, disambiguating ambiguous verbal blocks within each logical clause, and constructing a primitive sentence for each verbal block by duplicating the shared noun phrases of verbal blocks. | 03-19-2009 |
20090076797 | System and Method For Accessing Images With A Novel User Interface And Natural Language Processing - Systems and methods for accessing images with natural language processing are provided. The methods for accessing images include linking an image with image-summarizing text by applying a hierarchical clustering algorithm to cluster one or more abstract sentences and one or more images, and linking an image with image-summarizing text if the abstract sentence belongs to a cluster that includes the image. The systems for accessing images include a natural language processor that applies a hierarchical clustering algorithm to link one or more abstract sentences in an article with one or more images in the article, and a user interface in which selecting image-summarizing text displays one or more linked images. | 03-19-2009 |
20090076798 | Apparatus and method for post-processing dialogue error in speech dialogue system using multilevel verification - Provided are an apparatus and method for post-processing a dialogue error in a speech dialogue system using multilevel verification, in which both of a user's current utterance and a whole dialogue flow are taken into account through the multilevel verification including speech recognition results analysis, linguistic analysis, discourse analysis and dialogue analysis. As a result, various errors that may occur in the speech dialogue system are detected, and error post-processing appropriate to a detected error type is performed, so that speech recognition errors may be reduced. | 03-19-2009 |
20090076799 | Coreference Resolution In An Ambiguity-Sensitive Natural Language Processing System - Technologies are described herein for coreference resolution in an ambiguity-sensitive natural language processing system. Techniques for integrating reference resolution functionality into a natural language processing system can process documents to be indexed within an information search and retrieval system. Ambiguity awareness features, as well as ambiguity resolution functionality, can operate in coordination with coreference resolution. Annotation of coreference entities, as well as ambiguous interpretations, can be supported by in-line markup within text content or by external entity maps. Information expressed within documents can be formally organized in terms of facts, or relationships between entities in the text. Expansion can support applying multiple aliases, or ambiguities, to an entity being indexed so that all of the possible references or interpretations for that entity are captured into the index. Alternative stored descriptions can support retrieval of a fact by either the original description or a coreferential description. | 03-19-2009 |
20090083026 | Summarizing document with marked points - A summary of a text document may be presented in the form of a list of points. A summary of text can be created by choosing words or groups of words from the original text, by modifying words in the original text, etc. Collections of the chosen words can be presented in a list form together with a mark that indicates that the text is a list of words that might not form complete sentences. Presentation of a summary in list form may lower a reader's expectation as to readability issues such as sentence flow, word flow, etc., and thus the reader may be more accepting of a machine-generated summary presented in list form than of a machine generated summary presented as sentences or paragraphs. | 03-26-2009 |
20090083027 | AUTOMATIC TEXT SKIMMING USING LEXICAL CHAINS - Automatic text skimming using lexical chains may be provided. First, at least one lexical chain may be created from an electronic document. Next, a list of positions within the electronic document may be created. The positions may include where at least one concept represented by one of the at least one lexical chain is mentioned. In addition, a list of the position where the at least one concept is mentioned may be assembled. A selection of at least one concept may be received from the list. | 03-26-2009 |
20090083028 | AUTOMATIC CORRECTION OF USER INPUT BASED ON DICTIONARY - Methods, systems, and apparatus, including computer program products, in which input keystroke data can be interpreted by a current mapping and a determination can be made whether the current mapping is valid based upon the characters identified by the mapping and the keystroke data. Invalid mappings can be corrected based upon alternative mapping of the keystroke data. | 03-26-2009 |
20090089044 | INTENT MANAGEMENT TOOL - Linguistic analysis is used to identify queries that use different natural language formations to request similar information. Common intent categories are identified for the queries requesting similar information. Intent responses can then be provided that are associated with the identified intent categories. An intent management tool can be used for identifying new intent categories, identifying obsolete intent categories, or refining existing intent categories. | 04-02-2009 |
20090089045 | METHOD OF TRANSFORMING NATURAL LANGUAGE EXPRESSION INTO FORMAL LANGUAGE REPRESENTATION - This invention comprises a series of steps which transforms one or more natural language expressions into a single, well-formed formal language representation. Each natural language expression is partially parsed into simple fragments, each of which is then associated with one or more short formal expressions. Each formal expression is constructed in such a way as to contain one or more placeholder variables, each of which is associated with one or more attributes to constrain the types of entities that each variable can potentially represent. The resulting plurality of formal expressions is then filtered for relevance within a given context, and the surviving expressions manipulated based upon a plurality of rules, which are cognizant of the attributes associated with each variable contained therein. A user is then presented with the resulting plurality of formal expressions, whereupon the user optionally selects, rejects, adds to, logically connects and otherwise manipulates each member of said plurality. When the user is satisfied that the plurality represents an intended meaning, the formal expressions are combined into a single, formal representation. | 04-02-2009 |
20090089046 | Word Use Difference Information Acquisition Program and Device - A device or computer implemented program for accurately and automatically obtaining general-purpose information regarding the usage difference between a plurality of synonyms and quasi-synonyms, such as the types of words with which the synonyms and quasi-synonyms are often used, is provided with: means for receiving the input of a plurality of words; means for extracting sentence data including an inputted word from a corpus; means for analyzing the sentence structure of the sentence data and extracting nouns that are in a grammatical relationship with the inputted word included in the sentence data; means for extracting the nodes representing the nouns and the nodes representing the semantic category of the noun from a thesaurus and forming a directional graph for each inputted word; means for comparing a plurality of directional graphs and extracting the difference nodes; and means for outputting the extracted difference nodes as information relating to the usage difference of the inputted words. | 04-02-2009 |
20090089047 | Natural Language Hypernym Weighting For Word Sense Disambiguation - Technologies are described herein for probabilistically assigning weights to word senses and hypernyms of a word. The weights can be used in natural language processing applications such as information indexing and querying. A word hypernym weight (WHW) score can be determined by summing word sense probabilities of word senses from which the hypernym is inherited. WHW scores can be used to prune away hypernyms prior to indexing, to rank query results, and for other functions related to information indexing and querying. A semantic search technique can use WHW scores to retrieve an entry related to a word from an index in response to matching an indexed hypernym of the word with a query term applied to the index. More refined and accurate query results may be provided based on reduced user inputs. | 04-02-2009 |
20090094019 | Efficiently Representing Word Sense Probabilities - Word sense probabilities are compressed for storage in a semantic index. Each word sense for a word is mapped to one of a number of “buckets” by assigning a bucket score to the word sense. A scoring function is utilized to assign the bucket scores that maximizes the entropy of the assigned bucket scores. Once the bucket scores have been assigned to the word senses, the bucket scores are stored in the semantic index. The bucket scores stored in the semantic index may be utilized to prune one or more of the word senses prior to construction of the semantic index. The bucket scores may also be utilized to prune and rank the word senses at the time a query is performed using the semantic index. | 04-09-2009 |
20090094020 | Recommending Terms To Specify Ontology Space - In one embodiment, a set of target search terms for a search is received. Candidate terms are selected, where a candidate term is selected to reduce an ontology space of the search. The candidate terms are sent to a computer to recommend the candidate terms as search terms. In another embodiment, a document stored in one or more tangible media is accessed. A set of target tags for the document is received. Terms are selected, where a term is selected to reduce an ontology space of the document. The terms are sent to a computer to recommend the terms as tags. | 04-09-2009 |
20090094021 | Determining A Document Specificity - In one embodiment, determining a document specificity includes accessing a record that records the clusters of documents. The number of themes of a document is determined from the number of clusters of the document. The specificity of the document is determined from the number of themes. | 04-09-2009 |
20090099839 | System And Method For Prospecting Digital Information - A system and method for prospecting digital information is provided. A home evergreen index for a home subject area within a corpus of digital information is maintained and includes topic models matched to the corpus. A frontier evergreen index for a frontier subject area within the corpus topically distinct from the home subject area is identified. Quality assessments for frontier articles from the corpus identified by the topic models of the frontier evergreen index are obtained. The frontier articles with positive quality assessments are reclassified against the topic models in the home evergreen index. The frontier articles are provided in a display with home articles previously classified against the topic models in the home evergreen index. | 04-16-2009 |
20090099840 | Request Content Identification System, Request Content Identification Method Using Natural Language, and Program - A request content identification system performs an audio recognition process according to audio data inputted from an input device ( | 04-16-2009 |
20090099841 | AUTOMATIC SPEECH RECOGNITION METHOD AND APPARATUS - A system for calculating the look ahead probabilities at the nodes in a language model look ahead tree, wherein the words of the vocabulary of the language are located at the leaves of the tree, | 04-16-2009 |
20090099842 | Creating A Voice Response Grammar From A Presentation Grammar - Methods, systems, and products are disclosed for creating a voice response grammar in a voice response server including identifying presentation documents for a presentation, each presentation document having a presentation grammar. Typical embodiments include storing each presentation grammar in a voice response grammar on a voice response server. In typical embodiments, identifying presentation documents for a presentation includes creating a data structure representing a presentation and listing at least one presentation document in the data structure representing a presentation. In typical embodiments listing the at least one presentation document includes storing a location of the presentation document in the data structure representing a presentation and storing each presentation grammar includes retrieving a presentation grammar of the presentation document in dependence upon the location of the presentation document. | 04-16-2009 |
20090106019 | METHOD AND SYSTEM FOR PRIORITIZING COMMUNICATIONS BASED ON SENTENCE CLASSIFICATIONS - A method and system for prioritizing communications based on classifications of sentences within the communications is provided. A sentence classification system may classify sentences of communications according to various classifications such as “sentence mode.” The sentence classification system trains a sentence classifier using training data and then classifies sentences using the trained sentence classifier. After the sentences of a communication are classified, a document ranking system may generate a rank for the communication based on the classifications of the sentences within the communication. The document ranking system trains a document rank classifier using training data and then calculates the rank of communications using the trained document rank classifier. | 04-23-2009 |
20090112576 | DISAMBIGUATED TEXT MESSAGE RETYPE FUNCTION - A method of editing delimited ambiguous input on a handheld electronic device, the handheld electronic device including an input apparatus, an output apparatus, and a memory having a plurality of objects stored therein, the plurality of objects including a plurality of language objects and a plurality of frequency objects having a frequency value, the input apparatus including a plurality of input members, at least one of the input members having a plurality of linguistic elements assigned thereto. The method comprises detecting a selection of a language object generated from a first delimited ambiguous input, outputting a plurality of language objects which are complete word solutions of said first delimited ambiguous input, as well as an edit option, and detecting a selection of the edit option. | 04-30-2009 |
20090112577 | SYSTEM AND METHOD FOR LOCALIZATION OF ASSETS USING DICTIONARY FILE BUILD - A system and method for organizing localization content for video game development is disclosed. The method includes generating executable instructions for a video game being developed, wherein the video game is being developed for deployment in a plurality of natural languages, wherein text strings and/or multimedia data to be rendered during game play are referenced by the executable instructions. The method further includes identifiably storing the text strings and/or multimedia data in one or more dictionary files such that the text strings and/or multimedia data are identifiably referenced by references in the executable instructions and are identifiably referenced by a corresponding natural language. | 04-30-2009 |
20090112578 | Handheld Electronic Device and Method for Disambiguation of Compound Text Input and for Prioritizing Compound Language Solutions According to Completeness of Text Components - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria, including the degree of completeness of the text components of a compound language solution. | 04-30-2009 |
20090119093 | METHOD AND SYSTEM TO PARSE ADDRESSES USING A PROCESSING SYSTEM - A method and system for parsing an address is disclosed. The method and system comprise separating the address into a plurality of tokens and providing one or more token meaning discovery passes based upon region specific configuration information to determine the meaning of each token in the address. In so doing, an address can be parsed by a processing system in an efficient and effective fashion. By disclosing the meaning of each token of the address in accordance with a region specific configuration information rule set a parsing process is provided which allows for easy modification as the requirements for the parsing change. | 05-07-2009 |
20090119094 | APPARATUS AND METHOD FOR LINGUISTIC SCORING - In embodiments of the invention, a system receives selections from a user based on a list of pre-defined monitoring categories and/or optionally receives custom category definitions from the user. The option for custom category definitions may be advantageous due to the flexibility provided to a system administrator or other user. In embodiments of the invention, the pre-defined and/or custom monitoring categories may be or include complex hierarchical behavior. Such an approach provides monitoring algorithms that can achieve improved accuracy compared to known methods. In embodiments of the invention, the order of computations used in resolving a monitoring category may be re-ordered, statically and/or dynamically, to improve the efficiency of monitoring operations. | 05-07-2009 |
20090119095 | Machine Learning Systems and Methods for Improved Natural Language Processing - Disclosed is a method to generate at least one new set of concepts to be used to perform natural language processing (NLP) on data. The method includes receiving one or more sources of input data, and determining, based on the one or more sources of input data and on at least one initial set of concepts, at least one attribute representative of a type of information detail to be included in the at least one new set of concepts. | 05-07-2009 |
20090125296 | METHODS AND SYSTEMS FOR USING DOMAIN SPECIFIC RULES TO IDENTIFY WORDS - Text entry systems are described that incorporate information from a specific domain to reduce the allowable words that can be spelled by ambiguous user input. The text entry systems can receive an indication identifying a key pressed by a user. The key may represent multiple characters such that the character intended by the user is ambiguous. The text entry systems identify words from a specific domain that can be spelled with any of the multiple characters represented by the key press. The text entry systems then display an indication to the user highlighting the letters of the identified words represented by the key press. Thus, the text entry systems reduce the possible words indicated by the user input based on domain-specific information. | 05-14-2009 |
20090125297 | AUTOMATIC GENERATION OF DISTRACTORS FOR SPECIAL-PURPOSE SPEECH RECOGNITION GRAMMARS - A computer-implemented method for dynamically generating a speech recognition grammar is provided. The method includes determining a target entry, and accessing a plurality of potential distractors. The method also includes selecting one or more distractors from the plurality of potential distractors. More particularly, each potential distractor selected is selected based upon an assessed acoustic dissimilarity between the distractor and the target entry. The method further includes dynamically generating a speech recognition grammar that includes the target entry and one or more of the distractors selected based upon an acoustic dissimilarity to the target entry. | 05-14-2009 |
20090132236 | SELECTION OR RELIABLE KEY WORDS FROM UNRELIABLE SOURCES IN A SYSTEM AND METHOD FOR CONDUCTING A SEARCH - The invention provides for a system to select data including a reception component that receives at least one data entry from at least one data source, a processor component to determine the entropy of a word extracted from the at least one data entry, a filtering component to select reliable words, wherein reliable words are words with low entropy values, the filtering component further excluding words with high entropy values, and a transmission component to output a set of reliable words, wherein the set of reliable words is associated with the at least one data entry from which the reliable words were extracted. | 05-21-2009 |
20090138257 | Document analysis, commenting, and reporting system - A document analysis, commenting, and reporting system provides tools that automate quality assurance analysis tailored to specific document types. As one example, the specific document type may be a requirements specification and the system may tag different parts of requirements, including actors, entities, modes, and a remainder. However, the flexibility of the system permits analysis of any other document type, such as instruction manuals and best practices guides. The system helps avoid confusion over the document when it is delivered because of non-standard terms, ambiguous language, conflicts between document sections, incomplete or inaccurate descriptions, size and complexity of the document, and other issues. | 05-28-2009 |
20090138258 | Natural language enhanced user interface in a business rule management system - Some embodiments of a natural language enhanced user interface in a business rule management system have been presented. In one embodiment, one or more rule templates in a natural language are generated from one or more prefabricated sentences. Then a user interface is created using the one or more rule templates to allow a user to compose rules for a business rule management system. | 05-28-2009 |
20090150140 | EFFICIENT STEMMING OF SEMITIC LANGUAGES - A system for stemming words of Semitic languages, the system including an affix scanner configured to scan a word of a Semitic language for at least one affix according to a predefined scanning sequence and determine if at least one predefined scanning criterion is met, and a stemmer configured to remove the affix from the word if the predefined scanning criterion is met. | 06-11-2009 |
20090150141 | Method and system for learning second or foreign languages - The present invention provides a method for providing linguistically interesting terms to a user, the method comprising processing a received digital text by a natural language processing technology, and then comparing the processed digital text with a linguistically interesting term database with a plurality of predetermined linguistically interesting terms. When the processed digital text has at least one predetermined linguistically interesting term, then at least one predetermined linguistically interesting term is extracted and is identified in a display. | 06-11-2009 |
20090150142 | BEHAVIOR DETERMINATION APPARATUS AND METHOD, BEHAVIOR LEARNING APPARATUS AND METHOD, ROBOT APPARATUS, AND MEDIUM RECORDED WITH PROGRAM - A robot includes a knowledge acquisition unit for extracting words from external instruction information, a network construction unit for constructing a network from the extracted words and updating weightings between the words, and a behavior determination unit for determining a behavior on the basis of a word network in which relationships between the words are weighted on a network. | 06-11-2009 |
20090157384 | SEMI-SUPERVISED PART-OF-SPEECH TAGGING - A word is selected from a received text and features are identified from the word. The features are applied to a model to identify probabilities for sets of part-of-speech tags. The probabilities for the sets of part-of-speech tags are used to weight scores for possible part-of-speech tags for the selected word to form weighted scores. The weighted scores are used to select a part-of-speech tag for the word and the selected part of speech tag is stored or output. The scores for the possible part-of-speech tags are based on variational approximation parameters trained from a sparse prior over probability distributions describing the probability of a part-of-speech tag given a word. | 06-18-2009 |
20090157385 | Inverse Text Normalization - Embodiments are directed to efficient multilingual inverse text normalization (ITN) of text in spoken form to produce normalized text for display. Embodiments are directed to preprocessing the multilingual text into a language-independent representation, tokenizing text in spoken form, segmenting the tokenized text into ITN items by grouping consecutive words using an ITN lexicon, classifying the ITN items into ITN categories by using the ITN lexicon or tagged information from a language model, applying one or more ITN rules that are selected based on the ITN categories into which ITN items have been classified to rewrite the ITN items, and post-processing the ITN items and outputting inversely normalized text in written form for display. The ITN lexicon may include ITN lexicon entries that are each located within an ITN category in the ITN lexicon. | 06-18-2009 |
20090157386 | DIAGNOSTIC EVALUATION OF MACHINE TRANSLATORS - A system for evaluating translation quality of a machine translator is discussed. The system includes a bilingual data generator configured to intermittently access a wide area network and generate a bilingual corpus from data received from the wide area network. The system also includes an example extraction component configured to receive an ontology input indicative of a plurality of ontological categories of evaluation and to extract evaluation examples from the bilingual corpus based on the ontology input. The system further includes an evaluation component configured to evaluate translation results from translation by a machine translator of the evaluation examples and to score the translation results according to the ontological categories. | 06-18-2009 |
20090157387 | Connected Text Data System - A connected text data system for efficiently and accurately translating connected text. The connected text data system includes inputting or receiving connected text, transmitting the connected text to a text iterator, scanning the connected text, identifying a plurality of words in the connected text, and translating the connected text to separated text by adding a space between each of the plurality of words. | 06-18-2009 |
20090157388 | METHOD AND DEVICE FOR OUTPUTTING INFORMATION AND/OR STATUS MESSAGES, USING SPEECH - In a method and device for outputting information and/or messages from at least one device using speech, the information and/or messages required for vocal output are provided in a voice memory, the information and/or messages are read by a processing device according to a demand, and the information and/or messages are output via an acoustic output device. The information and/or messages are output with a varying intonation according to their relevance. | 06-18-2009 |
20090157389 | SYSTEM AND METHOD FOR COMPUTERIZED PSYCHOLOGICAL CONTENT ANALYSIS OF COMPUTER AND MEDIA GENERATED COMMUNICATIONS TO PRODUCE COMMUNICATIONS MANAGEMENT SUPPORT, INDICATIONS AND WARNINGS OF DANGEROUS BEHAVIOR, ASSESSMENT OF MEDIA IMAGES, AND PERSONNEL SELECTION SUPPORT - At least one computer-mediated communication produced by or received by an author is collected and parsed to identify categories of information within it. The categories of information are processed with at least one analysis to quantify at least one type of information in each category. A first output communication is generated regarding the at least one computer-mediated communication, describing the psychological state, attitudes or characteristics of the author of the communication. A second output communication is generated when a difference between the quantification of at least one type of information for at least one category and a reference for the at least one category is detected involving a psychological state, attitude or characteristic of the author to which a responsive action should be taken. The content of the second output communication and the at least one category are programmable to define a psychological state, attitude or characteristic in response to which an action should be taken and the action that is to be taken in response. | 06-18-2009 |
20090164207 | User device having sequential multimodal output user interface - In one aspect of the exemplary embodiments of this invention an apparatus includes a user interface that contains a plurality of input modalities and a plurality of output modalities, and a data processor coupled with the user interface and configurable to present a user with a content item that includes a plurality of attributes. In response to user input the data processor is operable to partition at least some of the attributes into a plurality of presentation tokens, where an individual presentation token comprises at least one attribute. The data processor is further configurable to respond to further user input to define one of the plurality of input modalities to generate a trigger condition for individual ones of the presentation tokens, where generation of a trigger condition results in an associated presentation token being made manifest to the user. The plurality of input modalities may include two or more of physical or virtual keys, an input acoustic transducer, a speech recognition unit, and a gesture detection unit, and where the plurality of output modalities may include two or more of an output acoustic transducer, a speech synthesis unit, a vibro-tactile transducer, and a display screen. | 06-25-2009 |
20090164208 | METHOD AND APPARATUS FOR ALIGNING PARALLEL SPOKEN LANGUAGE CORPORA - The method for aligning parallel spoken language corpora comprises obtaining a statistics method and dictionaries-based word alignment set from the parallel spoken language corpora, aligning chunks of the parallel spoken language corpora by using the statistics method and dictionaries-based word alignment set, to obtain a chunk alignment set, and aligning words in aligned chunks of the parallel spoken language corpora to obtain a chunk alignment-based word alignment set. Chunk alignment set and word alignment set are obtained by aligning chunks in parallel spoken language corpora in a corpus repository using a statistics method and dictionaries-based high precision word alignment set obtained from the parallel spoken language corpora and further aligning words in the chunks, and by using them in the speech-to-speech machine translation, the ambiguities of spoken language word alignment can be decreased by using the integrality of chunks. | 06-25-2009 |
20090182553 | METHOD AND APPARATUS FOR GENERATING A LANGUAGE INDEPENDENT DOCUMENT ABSTRACT - A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequences that have a score greater than the threshold score equals or exceeds a predetermined number. A sentence containing the sequence of words is retrieved from the document, if the sequence of words is a significant phrase. An abstract of the document is searched to determine if the sentence has been previously included in the abstract. If not, the sentence is added to the abstract. | 07-16-2009 |
20090182554 | TEXT ANALYSIS METHOD - A list of reference terms can be provided. Text and the list of reference terms can be broken down into tokens. At least one candidate can be generated in the text for mapping to at least one of the reference terms. Characters of the candidate can be compared to characters of the reference term according to one or more mapping rules. A confidence value of the mapping can be generated based on the comparison of characters. Candidates can be ranked according to their confidence value. | 07-16-2009 |
20090192784 | SYSTEMS AND METHODS FOR ANALYZING ELECTRONIC DOCUMENTS TO DISCOVER NONCOMPLIANCE WITH ESTABLISHED NORMS - A computer-implemented method for analyzing documents to discover noncompliance with an established norm is provided. The method can include receiving one or more terms indicating possible noncompliance with a pre-established norm, and, based upon the at least one term, constructing at least one grammatical unit. The grammatical unit can specify a predetermined syntax and can correspond to semantic content that is indicative of noncompliance with the pre-established norm, wherein the norm can include a statute, regulation, policy, or other standard. The method can further include identifying from among multiple electronic documents each document that contains one or more grammatical units specifying a predetermined syntax and corresponding to semantic content indicative of noncompliance with the pre-established norm. | 07-30-2009 |
20090192785 | SYSTEM AND METHOD FOR OPTIMIZING NATURAL LANGUAGE DESCRIPTIONS OF OBJECTS IN A VIRTUAL ENVIRONMENT - A system and method for constructing a natural language description of one or more objects in a virtual environment includes determining a plurality of properties of an object and an environment given a current viewpoint in a virtual environment. An object description is created using the plurality of properties where the object description reflects multiple display characteristics of the object in the virtual environment. Object descriptions in the virtual environment are combined by classifying objects in the virtual environment to condense a natural language description. | 07-30-2009 |
20090192786 | TEXT INPUT DEVICE AND METHOD - The present invention relates to a text input device and a method for inputting text, and a computer program for performing the method. A text input device ( | 07-30-2009 |
20090192787 | GRAMMER CHECKER - A method for parsing a computerized text, the method including preparing a set of logical rules, using logical grammatical links, for parsing a text, using the logical rules to identify a part of speech of each word of text and all links between the words in the text, and labeling the links as grammatically correct links or grammatically incorrect links for correction, so as to parse substantially every word in the text. | 07-30-2009 |
20090198488 | System and method for analyzing communications using multi-placement hierarchical structures - A system and method are provided for analyzing communications to disambiguate the meaning of the analyzed communications using placements into a fixed hierarchical structure based on the words, position and grammar of the communications. The disambiguated meaning of the communications can be used in conjunction with other functional programs (e.g., search engines, email, word processing). The system and method may further analyze communications associated with a communicator to determine a profile (attributes, preferences, relationships, trends, ratios) for the communicator indicated by the communications. Automated communications can be generated to match the attributes of any communicator's preferences stored in their profile, to match the attributes of other communications, or to match certain standards. | 08-06-2009 |
20090204391 | GAMING MACHINE WITH CONVERSATION ENGINE FOR INTERACTIVE GAMING THROUGH DIALOG WITH PLAYER AND PLAYING METHOD THEREOF - A player inputs a message or a conversation sentence in the form of a sound or characters into an input unit of a gaming machine to request an inquiry of a history of games with the gaming machine in the past or an inquiry of a gaming history of the player. Then, the message inputted in the sound or the characters is analyzed by a conversation engine, and the history of the games with the gaming machine in the past or the gaming history of the player in the past, which is a target of the request of the inquiry, is read out of a memory or a portable memory owned by the player. Further, data on a message or a response sentence including the history thus read out are created by the conversation engine and are outputted in the sound or characters from an output unit. | 08-13-2009 |
20090210218 | Deep Neural Networks and Methods for Using Same - A method and system for labeling a selected word of a sentence using a deep neural network includes, in one exemplary embodiment, determining an index term corresponding to each feature of the word, transforming the index term or terms of the word into a vector, and predicting a label for the word using the vector. The method and system, in another exemplary embodiment, includes determining, for each word in the sentence, an index term corresponding to each feature of the word, transforming the index term or terms of each word in the sentence into a vector, applying a convolution operation to the vector of the selected word and at least one of the vectors of the other words in the sentence, to transform the vectors into a matrix of vectors, each of the vectors in the matrix including a plurality of row values, constructing a single vector from the vectors in the matrix, and predicting a label for the selected word using the single vector. | 08-20-2009 |
20090216524 | METHOD AND SYSTEM FOR ESTIMATING A SENTIMENT FOR AN ENTITY - A method for estimating a sentiment conveyed by the content of information sources towards an entity is presented. The sentiment is obtained with respect to a query context that may be specified, e.g. by specific terms or expressions, like a product or service name. A sentiment dictionary having a plurality of sentiment terms is provided, wherein each sentiment term has assigned a sentiment value, and at least one of said sentiment terms is associated to a group context. Text documents are screened for occurrences of sentiment terms that are associated to a group context corresponding to the query context. Calculating a sentiment score value is performed as a function of the occurrences of sentiment terms having a similar or same group context as the query context. The method may be carried out automatically without manual analysis of the actual semantic content of the text documents under consideration. | 08-27-2009 |
20090216525 | SYSTEM AND METHOD FOR TREATING HOMONYMS IN A SPEECH RECOGNITION SYSTEM - A system and method for homonym treatment in a speech recognition system and method are provided. The system and method for homonym treatment in a speech recognition system may be used in mobile wireless communication devices that are voice operated after their initial activation. | 08-27-2009 |
20090234639 | Human-Like Response Emulator - Human-like response emulator stores a library ( | 09-17-2009 |
20090234640 | Method and an apparatus for automatic semantic annotation of a process model - An apparatus and a method for automated semantic annotation of a process model having model elements named by natural language expressions, wherein said apparatus comprises at least one semantic pattern analyser which analyses the textual structure of each natural language expression on the basis of predefined semantic pattern descriptions to establish a semantic linkage between each model element to classes and instances of a reference process ontology for generating a semantically annotated process model. | 09-17-2009 |
20090234641 | METHOD AND SYSTEM FOR ASSISTING THE PROTECTION OF TRADE MARKS - Method for assisting the protection of trade marks comprising the following steps: collecting data comprising at least one natural language term relating to a field of activity, these data being indicated by a user; determining, in an automated manner, on the basis of the natural language term or terms indicated by the user, a suggestion comprising goods and/or services and their respective classes according to an administrative classification, and transmitting the suggestion to the user; receiving data indicative of a selection of good(s) and/or service(s) and/or class(es) chosen by the user from the automated suggestion; and compiling and/or storing this selection. The method can furthermore comprise the automated searching for priorities, the preparing of documents (paper or electronic) necessary for filing a trade mark application, and the tracking of the registration procedure. | 09-17-2009 |
20090240487 | MACHINE TRANSLATION - A method for computer-assisted translation from a source language to a target language makes use of a number of rules. Each rule forms an association between a representation of a sequence of source language tokens with a corresponding tree-based structure in the target language. The tree-based structure for each of at least some of the rules represents one or more asymmetrical relations within a number of target tokens associated with the tree-based structure and provides an association of the target tokens with the sequence of source language tokens of the rule. An input sequence of source tokens is decoded according to the rules to generate a representation of one or more output sequences of target language tokens. Decoding includes, for each of at least some sub-sequences of the input sequence of source tokens, determining a tree-based structure associated with the sub-sequence according to a match to one of the plurality of rules. | 09-24-2009 |
20090240488 | CORRECTIVE FEEDBACK LOOP FOR AUTOMATED SPEECH RECOGNITION - A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier. | 09-24-2009 |
20090248397 | Service Initiation Techniques - Service initiation techniques are described. In at least one implementation, a computing device receives a selection of text that is displayed in a user interface by an application. Selection is detected of one of a plurality of services that are displayed in the user interface. Responsive to the detection, the selection of text is provided to the selected service without further user intervention. | 10-01-2009 |
20090248398 | Vocal Alert Unit Having Automatic Situation Awareness - A system and method for instructing dynamic nodes in a dynamically changing mobile network how to maneuver. A receiver receives situation data indicative of a respective situation of each dynamic node in space and a situation unit coupled to the receiver determines the respective situation of each dynamic node. An analysis unit coupled to the situation unit analyzes the respective situation of each dynamic node in combination with specified criteria to generate respective situation awareness data for each dynamic node. A dynamic selector unit coupled to the analysis unit determines from the respective situation awareness data appropriate action to be performed by each node; and a communication unit coupled to the dynamic selector unit conveys to the respective dynamic node command data to permit rendering of a personalized command for informing the respective node of appropriate action to be carried out thereby. | 10-01-2009 |
20090248399 | System and method for analyzing text using emotional intelligence factors - A system, method and computer program products for facilitating the automated reading, disambiguation, analysis, indexing, retrieval and scoring of text by utilizing emotional intelligence-based factors. Text quality is scored based upon character development, rhythm, per-page quality, gaps, and climaxes, among other factors. The scores may be standardized by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. | 10-01-2009 |
20090248400 | Rule Based Apparatus for Modifying Word Annotations - A rule based apparatus and method for modifying word annotations in an annotated text base is described. The apparatus includes an index creator component for creating an index of word annotations, an annotations modifying component for modifying word annotations, and a retriggering component, responsive to said annotations modifying component, for retriggering a rules engine to modify all occurrences of a matching word annotation in said annotated text base and updating the index of word annotations with the modified occurrences of a matching word annotation in said annotated text base. | 10-01-2009 |
20090254336 | PROVIDING A TASK DESCRIPTION NAME SPACE MAP FOR THE INFORMATION WORKER - Providing for generation of a task oriented data structure that can correlate natural language descriptions of computer related tasks to application level commands and functions is described herein. By way of example, a system can include an activity translation component that can receive a natural language description of an application level task. Furthermore, the system can include a language modeling component that can generate the data structure based on an association between the description of the task and at least one application level command utilized in executing the computer related task. Once generated, the data structure can be utilized to automate computer related tasks by input of a human centric description of those tasks. According to further embodiments, machine learning can be employed to train classifiers and heuristic models to optimize task/description relationships and/or tailor such relationships to the needs of particular users. | 10-08-2009 |
20090254337 | COMPUTER-IMPLEMENTED METHOD AND SYSTEM FOR CONDUCTING A SEARCH OF ELECTRONICALLY STORED INFORMATION - A computer-implemented method, system, and computer program product are provided for conducting a search of electronically stored information. The method includes: (a) providing a user with an interactive targeting rule editor to enable the user to formulate a targeting rule to identify desired search results, the targeting rule comprising a natural language text string, the interactive targeting rule editor allowing the user to change one or more designated editable portions of the natural language text string to one of a set of specified alternate portions, delete one or more designated removable portions of the natural language text string, or add one or more of a set of specified insertable portions to form a syntactically valid targeting rule in accordance with a targeting rule grammar; (b) receiving the text string or a representation thereof from the user; (c) translating the text string or a representation thereof into an executable query; and (d) executing the executable query against the electronically stored information to generate search results. | 10-08-2009 |
20090259459 | CONCEPTUAL WORLD REPRESENTATION NATURAL LANGUAGE UNDERSTANDING SYSTEM AND METHOD - A Natural Language Understanding system is provided for indexing of free text documents. The system according to the invention utilizes typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language independent domain knowledge concepts and language dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method. | 10-15-2009 |
20090265160 | COMPARING TEXT BASED DOCUMENTS - Text based documents are compared by lexically normalising each word of the text of a first document ( | 10-22-2009 |
20090265161 | TRANSFORMING A NATURAL LANGUAGE REQUEST FOR MODIFYING A SET OF SUBSCRIPTIONS FOR A PUBLISH/SUBSCRIBE TOPIC STRING - A method, apparatus and software is disclosed for transforming a natural language request for modifying a set of subscriptions for a publish/subscribe topic string in which a predetermined element in the request is transformed into a publish/subscribe symbol in the topic string. | 10-22-2009 |
20090265162 | Method for Retrieving Items Represented by Particles from an Information Database - A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles. | 10-22-2009 |
20090271179 | METHOD AND SYSTEM FOR EXTENDING KEYWORD SEARCHING TO SYNTACTICALLY AND SEMANTICALLY ANNOTATED DATA - Methods and systems for extending keyword searching techniques to syntactically and semantically annotated data are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set as an enhanced document index with document terms as well as information pertaining to the grammatical roles of the terms and ontological and other semantic information. In one embodiment, the enhanced document index is a form of term-clause index, that indexes terms and syntactic and semantic annotations at the clause level. The enhanced document index permits the use of a traditional keyword search engine to process relationship queries as well as to process standard document level keyword searches. In one embodiment, the SQE comprises a Query Processor, a Data Set Preprocessor, a Keyword Search Engine, a Data Set Indexer, an Enhanced Natural Language Parser (“ENLP”), a data set repository, and, in some embodiments, a user interface or an application programming interface. | 10-29-2009 |
20090276208 | REDUCING SPAM EMAIL THROUGH IDENTIFICATION OF SOURCE - Embodiments of the present invention address deficiencies of the art in respect to email and provide a novel and non-obvious method and computer program product for detecting undesirable email. In one embodiment of the invention, the method includes receiving an email including text and identifying at least one natural language grammar mistake in the text. The method further includes calculating a country of origin of an author of the text based on the at least one natural language grammar mistake and calculating a first value based on the country of origin of the author of the text. The method further includes correcting the at least one natural language grammar mistake in the text and determining whether the email is undesirable based on the text that was corrected and the first value. | 11-05-2009 |
20090276209 | SYSTEM AND METHOD FOR AUTOMATICALLY PROCESSING CANDIDATE RESUMES AND JOB SPECIFICATIONS EXPRESSED IN NATURAL LANGUAGE INTO A NORMALIZED FORM USING FREQUENCY ANALYSIS - Systems and methods for automatically processing candidate resumes and job specifications expressed in natural language into a normalized form using frequency analysis. A database of elements is provided in which each element is expressed in natural language and at least some of which are associated with a corresponding set of synonymous words or phrases. Candidate resumes and job specifications are received in electronic form and expressed in natural language. The candidate resumes and job specifications are analyzed to extract elements expressed in candidate resumes and job specifications. The extracted elements are compared to the database. For each extracted element, the most frequent element or synonym is identified and used as a common form for the extracted element. A set of candidate resumes is matched with a corresponding job specification by comparing the set of elements expressed in common form for the resumes with the set of elements expressed in common form for the job specification. | 11-05-2009 |
20090281791 | UNIFIED TAGGING OF TOKENS FOR TEXT NORMALIZATION - Raw input text is received, and divided into sequences of tokens. Each token is marked with a text normalization tag that identifies a text normalization operation to be performed on the token during text normalization. The tags are assigned to the tokens by determining a most likely tag sequence, given the sequence of tokens being processed. The text normalization operations are performed on the tokens in order to provide clean output text, which can be output for further natural language processing. | 11-12-2009 |
20090281792 | SELF-LEARNING DATA LENSES - A semantic conversion system ( | 11-12-2009 |
20090292525 | APPARATUS, METHOD AND STORAGE MEDIUM STORING PROGRAM FOR DETERMINING NATURALNESS OF ARRAY OF WORDS - An apparatus is provided which determines the naturalness of an array of words as a sentence. When an entire source text to be translated is not registered in a lexicon, the source text is divided into plural words. A parallel translation for each word in the source text is obtained to generate parallel translation patterns, and a web search is made for a text which includes each of the parallel translation patterns (step | 11-26-2009 |
20090292526 | MONITORING CONVERSATIONS TO IDENTIFY TOPICS OF INTEREST - A system and method for monitoring conversations of a community of users to identify topics of interest is provided. A user community which is based partly on social networking connections relative to a first user is identified. Conversations involving at least one member of the identified user community are monitored. Based in part on an aggregated analysis of the monitored conversations, keywords are selected to present to the first user. The first user is then provided with a display in which the selected keywords associated with the user community are presented to the first user such that the first user can select a keyword to access content associated therewith. | 11-26-2009 |
20090292527 | Methods, Apparatuses and Computer Program Products for Receiving and Utilizing Multidimensional Data Via A Phrase - Methods, apparatuses and computer program products are provided for receiving multidimensional data via a phrase. In this regard, various exemplary embodiments may guide a user in defining a phrase on a segment-by-segment basis. Recommendations may be provided to the user to guide the user in defining the segment to thereby define the phrase. Upon defining the phrase, the phrase may be parsed into one or more segments. The parsed segments may provide information about the phrase, and content associated with the parsed segments may be linked to data fields of, for example, a search engine or database. Using the linked data fields, operations may be performed with respect to the phrase including searches for data or storage of data. | 11-26-2009 |
20090292528 | APPARATUS FOR PROVIDING INFORMATION FOR VEHICLE - A system is provided with a conversation support means. The conversation support means creates a conversation response and outputs it as sound, characters, etc. The conversation response is created by inserting a reference keyword as a leading keyword into a separately prepared response sentence model. The conversation support means retrieves, by dictionary collation, a reference keyword provided beforehand in the conversation support from the conversation content entered by the user by sound, manual entry, etc. Furthermore, the retrieved reference keyword itself, or another reference keyword associated with the retrieved reference keyword, is handled as the leading keyword. A series of user conversation contents inputted through the conversation support are accumulated as base data for determining a user interest. The base data is analyzed to determine a user interest for providing a suitable information service. | 11-26-2009 |
20090292529 | SYSTEM AND METHOD OF PROVIDING A SPOKEN DIALOG INTERFACE TO A WEBSITE - Disclosed is a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method. | 11-26-2009 |
20090299729 | PARALLEL FRAGMENT EXTRACTION FROM NOISY PARALLEL CORPORA - Machine translation algorithms for translating between a first language and a second language are often trained using parallel fragments, comprising a first language corpus and a second language corpus comprising an element-for-element translation of the first language corpus. Such training may involve large training sets that may be extracted from large bodies of similar sources, such as databases of news articles written in the first and second languages describing similar events; however, extracted fragments may be comparatively “noisy,” with extra elements inserted in each corpus. Extraction techniques may be devised that can differentiate between “bilingual” elements represented in both corpora and “monolingual” elements represented in only one corpus, and for extracting cleaner parallel fragments of bilingual elements. Such techniques may involve conditional probability determinations on one corpus with respect to the other corpus, or joint probability determinations that concurrently evaluate both corpora for bilingual elements. | 12-03-2009 |
20090299730 | MOBILE TERMINAL AND METHOD FOR CORRECTING TEXT THEREOF - A method for selecting text created in a mobile terminal by word and correcting it or changing it to another word, and a mobile terminal implementing the same are disclosed. The mobile terminal includes: a display unit to display one or more words of text, and to display tags for each of the one or more words; an input unit to select at least one of the tagged one or more words as selected one word; and a controller to display candidate words having a similar pronunciation to that of the word selected via the input unit, select one of the candidate words as selected one candidate word, and change the selected one word from the text to the selected one candidate word. | 12-03-2009 |
20090299731 | AURAL SIMILARITY MEASURING SYSTEM FOR TEXT - The aural similarity measuring system and method provides a measure of the aural similarity between a target text ( | 12-03-2009 |
20090306961 | SEMANTIC RELATIONSHIP-BASED LOCATION DESCRIPTION PARSING - An automated arrangement for parsing location descriptions is provided in which semantic verification is integrated into a parsing process to reduce the generation of false results. The semantic verification involves checking up to three semantic relationships between keywords (i.e., syntactical components) parsed from the location description in a tokenization process to determine if a tokenization result is valid. The semantic relationships include: a) a spatial “part-of” relationship between location keywords; b) a spatial “near-by” relationship; and, c) a spatial “intersect” relationship. The semantic relationships between particular locations may be pre-calculated and stored as extended vocabulary to enable the semantic verification to occur early in the parsing process to thus increase overall parsing efficiency. The results of the parsing are sorted based on a rank score that is derived using the semantic relationships between the locations. | 12-10-2009 |
20090306962 | SYSTEM AND METHOD TO PROVIDE WARNINGS ASSOCIATED WITH NATURAL LANGUAGE SEARCHES TO DETERMINE INTENDED ACTIONS AND ACCIDENTAL OMISSIONS - A method for providing notification of content potentially omitted from within an active document in a document preparation application comprises defining a natural language model for a set of phrasal forms associating each phrasal form with a content type; parsing a textual content of the active document to generate one or more natural language tokens; accessing the natural language model to identify each of the one or more natural language tokens that matches with a phrasal form; generating a list of expected content items having an expected content item for each of the one or more natural language tokens that matches with a phrasal form; scanning the active document to attempt to locate each expected content item; and displaying a notification of each expected content item not located. Each expected content item is generated based upon the content type associated with the corresponding matching phrasal form in the natural language model. | 12-10-2009 |
20090306963 | Representation of objects and relationships in databases, directories, web services, and applications as sentences as a method to represent context in structured data - Systems and methods are disclosed for tagging and translating database objects and relationships into sentences. The successive composition of these sentences forms hierarchies which encode contextual information about the objects. A virtual directory/context server functions using a common abstraction layer to access data from databases, applications, directories, Web Services, and other data sources within the enterprise. The virtual directory/context server includes a sentence/context builder module that enables the translation of relationships between data from the plurality of data sources into a human-readable form, for example, an English language sentence. Thus, applications can view, access, and/or modify the data from the data sources of the enterprise through the virtual directory/context server, for example, using the sentences representative of the relationships between the data. The sentences are indexed, which allows for searches that bring information not only about objects, but also about the context in which those objects appear. | 12-10-2009 |
20090306964 | DATA DETECTION - A method of processing a sequence of characters, the method comprising converting the sequence of characters into a sequence of tokens so that each token comprises a lexeme and one of a plurality of token types. Each of the plurality of token types relates to at least one of a plurality of predetermined functions, wherein at least one said token type relates to multiple functions of the plurality of predetermined functions. | 12-10-2009 |
20090306965 | DATA DETECTION - An apparatus for processing a sequence of tokens to detect predetermined data, wherein each said token has a token type, and the predetermined data has a structure that comprises a predetermined sequence of token types, including at least one optional token type. The apparatus comprises a processor arranged to: provide a tree for detecting the predetermined data, the tree comprising a plurality of states, each said state being linked with at least one other state by a respective condition, the arrangement of linked states forming a plurality of paths; and compare the token types of the sequence of tokens to respective conditions in the tree to match the sequence of tokens to one or more paths in the tree, wherein the predetermined data can be detected without using an epsilon reduction to take account of said at least one optional token type. | 12-10-2009 |
20090306966 | Method and apparatus to determine and use audience affinity and aptitude - An embodiment of the present invention is a method of presenting a media work which includes: detecting media work content properties in a portion of the media work; associating a presentation rate of the portion with the detected media work content properties; and presenting the portion at the presentation rate; wherein the media work content properties include one or more of: (a) indicia of a number of syllables in utterances; (b) indicia of a number of letters in a word; (c) indicia of the complexity of grammatical structures in portions of the media work; (d) indicia of arrival rate of newly presented objects; (e) indicia of temporal proximity between events in portions of the media work; or (f) indicia of number of phonemes per unit of time in portions of the media work. | 12-10-2009 |
20090306967 | Automatic Sentiment Analysis of Surveys - In one aspect, the invention provides apparatuses and methods for determining the sentiment expressed in answers to survey questions. Advantageously, the sentiment may be automatically determined using natural language processing. In another aspect, the invention provides apparatuses and methods for analyzing the sentiment of survey respondents and presenting the information as actionable data. | 12-10-2009 |
20090306968 | SYSTEM AND METHOD OF GRANTING IDENTIFICATION CODES TO ELECTRONIC TEACHING MATERIAL CONTENTS' SENTENCE STRUCTURES, SYSTEM AND METHOD OF SEARCHING DATA OF ELECTRONIC TEACHING MATERIAL CONTENTS, SYSTEM AND METHOD OF MANAGING POINTS OF USE AND SERVICE OF ELECTRONIC TEACHING MATERIAL CONTENTS - Disclosed is a system that grants identification codes to sentence structures of electronic teaching material contents and includes the following units. The identification code production unit distinguishes each syllable of a selected sentence structure of the electronic teaching material content according to the type of language, and produces a unique identification code using the first phoneme or syllable of each syllable. The identification code grant unit grants the identification code to the metadata of the file that stores the above electronic teaching material contents. | 12-10-2009 |
20090326919 | Acquisition and application of contextual role knowledge for coreference resolution - Coreference resolution is the process of identifying when two noun phrases (NP) refer to the same entity. Two main contributions to computational coreference resolution are made. First, this work contributes a new method for recognizing when an NP is anaphoric. Second, traditional approaches to coreference resolution typically select the most appropriate antecedent by recognizing word similarity, proximity, and agreement in number, gender, and semantic class. This work contributes a new source of evidence that focuses on the roles that an anaphor and antecedent play in particular events or relationships. I show that using contextual role knowledge as part of the coreference resolution process increases the number of anaphors that can be resolved, and I demonstrate an unsupervised method for acquiring contextual role knowledge that does not require an annotated training corpus. A probabilistic model based on the Dempster-Shafer model of evidence is used to incorporate contextual role knowledge with traditional evidence sources. | 12-31-2009 |
20090326920 | Linguistic Service Platform - Linguistic service platform techniques are described. In implementations, one or more computer-readable media comprise instructions that are executable by a computer to designate a linguistic service having a particular property responsive to an application program interface call specifying the property. Communication may be brokered between the linguistic service and the application so that communication occurs without the application directly communicating with the linguistic service. | 12-31-2009 |
20090326921 | GRAMMAR CHECKER FOR VISUALIZATION - A visualization development system is provided. The system includes a visualization tool to develop one or more visualizations and a grammar engine that operates with the visualization tool to automatically detect visualization problems during the development of the visualizations. | 12-31-2009 |
20090326922 | CLIENT SIDE RECONCILIATION OF TYPOGRAPHICAL ERRORS IN MESSAGES FROM INPUT-LIMITED DEVICES - A method for reconciling typographical errors includes: receiving an electronic text message from a pervasive device with limited input keypads on a receiving device configured with a messaging application; determining an input protocol of the pervasive device; examining the electronic text message for words that are not in the messaging application's dictionary; identifying words that are not in the messaging application's dictionary; mapping each of the identified words to a set of keystrokes used to produce each of the identified words based on a series of input protocols that the receiving device has stored in a memory; utilizing each set of keystrokes from each of the input protocols in an algorithm to compute each permutation of the keystrokes; checking the computed permutations against the messaging application's dictionary to determine viable matches of the computed permutations; and presenting the viable matches to a user of the receiving device. | 12-31-2009 |
20090326923 | METHOD AND APPARATUS FOR NAMED ENTITY RECOGNITION IN NATURAL LANGUAGE - The present invention provides a method for recognizing a named entity included in natural language, comprising the steps of: performing gradual parsing model training with the natural language to obtain a classification model; performing gradual parsing and recognition according to the obtained classification model to obtain information on positions and types of candidate named entities; performing a refusal recognition process for the candidate named entities; and generating a candidate named entity lattice from the refusal-recognition-processed candidate named entities, and searching for an optimal path. The present invention uses a one-class classifier to score or evaluate these results to obtain the most reliable beginning and end borders of the named entities on the basis of the forward and backward parsing and recognizing results obtained only by using the local features. | 12-31-2009 |
20090326924 | Projecting Semantic Information from a Language Independent Syntactic Model - Embodiments for the conversion of Computational Independent Model (CIM) rule expressions into semantically non-ambiguous syntax trees are disclosed. In accordance with one embodiment, a method includes analyzing a sentential structure of a Computational Independent Model (CIM) rule expression for clauses. The clauses include at least one expression and at least one rule. The method further includes constructing a semantically non-ambiguous LF syntax tree from the CIM rule expression. The construction is implemented using a logical form (LF) model. | 12-31-2009 |
20090326925 | PROJECTING SYNTACTIC INFORMATION USING A BOTTOM-UP PATTERN MATCHING ALGORITHM - Embodiments for converting a token collection that is derived from a natural language expression into a computational independent model (CIM) syntax tree representation are disclosed. In accordance with one embodiment, the conversion includes deriving a plurality of tokens from a natural language expression, where each of the plurality of tokens includes at least one word. The conversion further includes transforming the plurality of tokens into a CIM syntax tree representation based on a CIM phrase tree model. The conversion also includes providing the CIM syntax tree representation to an application. | 12-31-2009 |
20090326926 | Displaying Time-Series Data and Correlated Events Derived from Text Mining - The present invention is directed to a method and system for correlating time-series data with events derived from text mining. The system is configured to receive a time period and a parameter concerning an entity, retrieve an event which is related to the entity and occurred within the time period from events which are previously extracted automatically from unstructured text, and display an indication of the event superimposed on a display representing the time series of the parameter for the time period. | 12-31-2009 |
20100004921 | AUTO-GENERATED TO-DO LIST - Methods, systems, and computer readable media for providing an auto-generated to-do list are described. Text is received in an instant messenger conversation, wherein the text comprises a task sender, a task body, and a task date, and an input is received selecting a selection of the text, wherein the selection comprises the task body. The text is analyzed to identify the task sender, the task body, and the task date. The task is then entered into the to-do list, wherein the task comprises the task sender, the task body, and the task date, thereby providing an auto-generated to-do list. | 01-07-2010 |
20100004922 | METHOD AND SYSTEM FOR AUTOMATICALLY GENERATING REMINDERS IN RESPONSE TO DETECTING KEY TERMS WITHIN A COMMUNICATION - A computer-implemented method of automatically generating an electronic reminder is provided. The method includes identifying, using term-recognition circuitry, at least one key term within an electronic message received with an electronic communications device. The method further includes generating at least one reminder based upon the at least one key term. One or more reminders are, according to the method, electronically conveyed to a user at a time later than when the message was received. | 01-07-2010 |
20100004923 | Method and an apparatus for clustering process models - The invention relates to an apparatus for clustering process models each consisting of model elements comprising a text phrase which describes in a natural language a process activity according to a process modeling language grammar and a natural language grammar, wherein said apparatus comprises a process object ontology memory for storing a process object ontology, a distance calculation unit for calculating a distance matrix employing said process modeling language grammar and said natural language grammar, wherein said distance matrix consists of distances each indicating a dissimilarity of a pair of said process models, and a clustering unit which partitions said process models into a set of clusters based on said calculated distance matrix. | 01-07-2010 |
20100004924 | Method and system context-aware for identifying, activating and executing software that best respond to user requests generated in natural language - A computer-implemented method capable of identifying, activating, and executing commands, methods, functions, interfaces, and software-based applications that can satisfy a specific natural language user request represented by a text stream and generated from any means such as typing, voice, gestures, signs or by human thoughts. | 01-07-2010 |
20100004925 | Clique based clustering for named entity recognition system - A soft clustering method comprises (i) grouping items into non-exclusive cliques based on features associated with the items, and (ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate item groups on the basis of mutual similarity of the features of the items constituting the cliques. In some named entity recognition embodiments illustrated herein as examples, named entities together with contexts are grouped into cliques based on mutual context similarity. Each clique includes a plurality of different named entities having mutual context similarity. The cliques are clustered to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques. | 01-07-2010 |
20100010800 | Automatic Pattern Generation In Natural Language Processing - Disclosed herein is a computer implemented method and system of generating declared patterns from components of a sentence. Parts of speech in the sentence are tagged for identifying parts of speech of each word and phrase in the sentence. Sentence chunking is then performed using the identified parts of speech of each word and phrase to generate pattern units. A first dictionary and a database of equivalent pattern specification sets are then applied to identify grammatical roles and senses of the generated pattern units. A second dictionary and a conceptionary are then applied to identify an equivalent name set for each of the generated pattern units. The declared patterns are then generated for the sentence using the identified equivalent name set for each of the generated pattern units. | 01-14-2010 |
20100010801 | CONFLICT RESOLUTION AND ERROR RECOVERY STRATEGIES - A plethora of strategies is afforded to facilitate conflict resolution and error recovery with respect to parsing, among other things. Grammar authors can select amongst a range of strategies or options on a case-by-case basis to address conflicts, ambiguities, errors, and the like. The strategies can be either static or dynamic. In one instance, code external to a parsing system can be invoked to resolve conflicts or recover from errors, and further enable change of strategy without requiring modification of the parser. Interaction between the parsing system and the external code can also be formalized to ensure general type safety of the system. | 01-14-2010 |
20100010802 | System and Method for User Skill Determination - A system comprises a user interface configured to receive natural language input from a user. An input module couples to the user interface and is configured to process the received natural language input for selected words and phrases. A user skill determination module couples to the input module and is configured to determine a skill level of the user based on the selected words and phrases. | 01-14-2010 |
20100010803 | TEXT PARAPHRASING METHOD AND PROGRAM, CONVERSION RULE COMPUTING METHOD AND PROGRAM, AND TEXT PARAPHRASING SYSTEM - A paraphrase model of a question text inputted by a user is learned, and a paraphrase expression is generated in real time. When information in a text set storage unit is updated, a text pair extracting unit extracts a paraphrase text pair from the text set storage unit and stores it in a text pair storage unit. A model learning unit learns a question text paraphrase model from the paraphrase text pair in the text pair storage unit, and stores it in a model storage unit. The text pair extracting unit extracts a paraphrase text pair again from the text set storage unit by using the question text paraphrase model held in the model storage unit, and stores it in the text pair storage unit. In a case where the stored paraphrase text pair is the same as the paraphrase text pair stored in the text pair storage unit, learning of the question text paraphrase model is ended. A candidate creating unit reads the question text paraphrase model from the model storage unit and generates a paraphrase candidate of the inputted question text. | 01-14-2010 |
20100010804 | METHODS AND SYSTEMS FOR EXTRACTING PHENOTYPIC INFORMATION FROM THE LITERATURE VIA NATURAL LANGUAGE PROCESSING - Systems and methods for extracting and encoding genotype-phenotype information from journal articles and other publications are provided. In some embodiments, the disclosed subject matter includes a preprocessor, boundary identifier, parser, phrase recognizer and an encoder to convert natural-language input text and parameters into structured text. The structured text can take the form of codes which account for genotype-phenotype information and are compatible with a controlled vocabulary. | 01-14-2010 |
20100010805 | RELATIVE DELTA COMPUTATIONS FOR DETERMINING THE MEANING OF LANGUAGE INPUTS - A method for processing language input can include the step of determining at least two possible meanings for a language input. For each possible meaning, a probability that the possible meaning is a correct interpretation of the language input can be determined. At least one relative delta computation can be computed based at least in part upon the probabilities. At least one irregularity within the language input can be detected based upon the relative delta computation. The irregularity can include mumble, ambiguous input, and/or compound input. At least one programmatic action can be performed responsive to the detection of the irregularity. | 01-14-2010 |
20100017194 | System and method for suggesting recipients in electronic messages - A system and method for dynamically recognizing a potential recipient of an electronic message. The method includes receiving content input for an electronic communication. The electronic communication includes at least one field of a plurality of fields, including a subject line, a message body, and a recipient address field. The at least one field of the electronic communication is populated with the content input. The method also includes parsing the content input of the at least one field of the electronic communication. The method also includes semantically analyzing the parsed content input of the at least one field of the electronic communication to identify a content qualifier of a recipient rule. The method also includes suggesting a potential recipient of the electronic communication based on the content qualifier of the recipient rule associated with the content input of the at least one field of the electronic communication. | 01-21-2010 |
20100023318 | METHOD AND DEVICE FOR RETRIEVING DATA AND TRANSFORMING SAME INTO QUALITATIVE DATA OF A TEXT-BASED DOCUMENT - Method for extracting information from a data file comprising a first step wherein the data are transmitted to a device ( | 01-28-2010 |
20100023319 | MODEL-DRIVEN FEEDBACK FOR ANNOTATION - A system, a method and a computer readable medium for providing model-driven feedback to human annotators. In one exemplary embodiment, the method includes manually annotating an initial small dataset. The method further includes training an initial model using said annotated dataset. The method further includes comparing the annotations produced by the model with the annotations produced by the annotator. The method further includes notifying the annotator of discrepancies between the annotations and the predictions of the model. The method further includes allowing the annotator to modify the annotations if appropriate. The method further includes updating the model with the data annotated by the annotator. | 01-28-2010 |
20100023320 | SYSTEM AND METHOD OF SUPPORTING ADAPTIVE MISRECOGNITION IN CONVERSATIONAL SPEECH - A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message. The system and method create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command. | 01-28-2010 |
20100030552 | DERIVING ONTOLOGY BASED ON LINGUISTICS AND COMMUNITY TAG CLOUDS - In some embodiments, a method comprises receiving a tag cloud including tags that hyperlink to web content. The method can also comprise separating the tags into different linguistic categories, assigning a weight to each tag, and grouping the tags into clusters, wherein tags in a cluster are associated with a context. The method can also include determining one or more domains for the tag clusters, wherein a domain is a broadest class that defines one or more of the tags in a linguistic category, determining a hierarchy for the tags based on the weights of the tags, and identifying linguistic relationships between the tags. The method can also comprise determining properties associated with one or more of the tags and one or more of the domains, wherein the tag's properties are determined using linguistic analysis and storing the tags, the hierarchies, the linguistic relationships, and the properties. | 02-04-2010 |
20100030553 | Linguistic Analysis - A method of operating a computer to perform linguistic analysis includes the steps of splitting an input text into words and sentences; for each sentence, comparing phrases in the sentence with known phrases stored in a database, as follows: for each word in the sentence, comparing its value and values of words following it with values of words of stored phrases, starting with the longest stored phrase that starts with that word, and working from longest to shortest; in the event a match is found for two or more consecutive words, and considering the words around the phrase, labelling the matched phrase with an overphrase that describes the grammar use of the matched phrase; after the penultimate word has been compared, recasting the sentence by replacing the matched phrases by their respective overphrases; and then repeating the comparison process with the recast sentence until there is no further recasting. | 02-04-2010 |
20100036654 | Systems and methods for identifying collocation errors in text - Systems and methods for detecting collocation errors in a text sample using a reference database from a corpus are provided. Collocation candidates are identified within the text sample based upon syntactic patterns in the text sample. Whether a given collocation candidate contains a collocation error is detected, the detecting including: determining a first association measure using the reference database for the given collocation candidate; determining whether the first association measure satisfies a predetermined condition and identifying the given collocation candidate as proper if the first association measure satisfies the predetermined condition; determining an additional association measure for a variation of the given collocation candidate using the reference database; and determining whether or not the collocation candidate contains an error based upon the additional association measure of the variation. | 02-11-2010 |
20100042400 | Method for Triggering at Least One First and Second Background Application via a Universal Language Dialog System - At least one transaction and at least one transaction parameter that is allocated thereto are determined based on at least one user statement in order to trigger at least one first and second background application via a universal language dialogue system, first transactions and first transaction parameters being assigned to the first background application and second transactions and second transaction parameters being associated with the second background application. The first and second transactions as well as the first and second transaction parameters are linked together via a universal dialogue specification which is evaluated to determine the at least one transaction and at least one associated transaction parameter in order to trigger at least one of the background applications via the universal language dialogue system. | 02-18-2010 |
20100042401 | Semantic Cognitive Map - A semantic cognitive map created by associating each of a multitude of dictionary entries with a point among a multitude of points in a metric space, each of the dictionary entries associated with at least one onym, the at least one onym including at least one synonym or antonym, the metric space having a topology and metrics, the location of each of the multitude of points defined by a global minimum of an energy function of the multitude of points. | 02-18-2010 |
20100042402 | APPARATUS, AND ASSOCIATED METHOD, FOR DETECTING FRAUDULENT TEXT MESSAGE - An apparatus, and an associated method, detects spam and other fraudulent messages sent to a recipient station. The textual portion of a received message is analyzed to determine whether the message includes errors made by non-native language speakers when authoring a text message. A text analysis engine analyzes the text using rules sets that identify grammatical errors made by non-native language speakers, usage errors made by non-native language speakers, and other errors. | 02-18-2010 |
20100042403 | CONTEXT BASED ONLINE ADVERTISING - A software and/or hardware facility for inferring user context and delivering advertisements, such as coupons, using natural language and/or sentiment analysis is disclosed. The facility may infer context information based on a user's emotional state, attitude, needs, or intent from the user's interaction with or through a mobile device. The facility may then determine whether it is appropriate to deliver an advertisement to the user and select an advertisement for delivery. The facility may also determine an appropriate expiration time and/or discount amount for the advertisement. | 02-18-2010 |
20100042404 | METHOD FOR BUILDING A NATURAL LANGUAGE UNDERSTANDING MODEL FOR A SPOKEN DIALOG SYSTEM - A method of generating a natural language model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models are built by adding a previous batch of labeled data to training data and using a new batch of labeling data as test data to generate the series of NLU models with training data that increases constantly. If not all the labeling data is received, the method comprises repeating the step of building a series of NLU models until all labeling data is received. After all the training data is received, at least once, the method comprises building a third NLU model using all the labeling data, wherein the third NLU model is used in generating the spoken dialog service. | 02-18-2010 |
20100049498 | DETERMINING UTILITY OF A QUESTION - A question search system provides a collection of questions having words for use in evaluating the utility of the questions based on a language model. The question search system calculates n-gram probabilities for words within the questions of the collection. The n-gram probability of a word for a sequence of n−1 words indicates the probability of that word being next after that sequence in the collection of questions. The n-gram probabilities for the words of the collection represent the language model of the collection. The question search system calculates a language model utility score for each question within a collection that indicates the likelihood that a question is repeatedly asked by users. The question search system derives the language model utility score for a question from the n-gram probabilities of the words within that question. | 02-25-2010 |
20100049499 | DOCUMENT ANALYZING APPARATUS AND METHOD THEREOF - In a document analyzing apparatus ( | 02-25-2010 |
20100049500 | DIALOGUE GENERATION APPARATUS AND DIALOGUE GENERATION METHOD - A dialogue generation apparatus includes a transmission/reception unit configured to receive incoming text and transmit return text, a presentation unit configured to present the contents of the incoming text to a user, a morphological analysis unit configured to perform a morphological analysis of the incoming text to obtain first words included in the incoming text and linguistic information on the first words, a selection unit configured to select second words that characterize the contents of the incoming text from the first words based on the linguistic information, a speech recognition unit configured to perform speech recognition of the user's speech after the presentation of the incoming text in such a manner that the second words are recognized preferentially, and produce a speech recognition result representing the contents of the user's speech, and a generation unit configured to generate the return text based on the speech recognition result. | 02-25-2010 |
20100049501 | DYNAMIC SPEECH SHARPENING - An enhanced system for speech interpretation is provided. The system may include receiving a user verbalization and generating one or more preliminary interpretations of the verbalization by identifying one or more phonemes in the verbalization. An acoustic grammar may be used to map the phonemes to syllables or words, and the acoustic grammar may include one or more linking elements to reduce a search space associated with the grammar. The preliminary interpretations may be subject to various post-processing techniques to sharpen accuracy of the preliminary interpretation. A heuristic model may assign weights to various parameters based on a context, a user profile, or other domain knowledge. A probable interpretation may be identified based on a confidence score for each of a set of candidate interpretations generated by the heuristic model. The model may be augmented or updated based on various information associated with the interpretation of the verbalization. | 02-25-2010 |
20100049502 | METHOD AND SYSTEM OF GENERATING REFERENCE VARIATIONS FOR DIRECTORY ASSISTANCE DATA - Methods and systems of performing user input recognition are disclosed. A digital directory comprising listings is accessed. Metadata information is associated with individual listings describing the individual listings. The metadata information is modified to generate transformed metadata information. The transformed metadata information is generated as a function of context information relating to a typical user interaction with the listings. Information is generated for aiding in an automated user input recognition process based on the transformed metadata information. | 02-25-2010 |
20100049503 | Method and apparatus for processing natural language using tape-intersection - Operations for weighted and non-weighted multi-tape automata are described for use in natural language processing tasks such as morphological analysis, disambiguation, and entity extraction. | 02-25-2010 |
20100057442 | DEVICE, METHOD, AND PROGRAM FOR DETERMINING RELATIVE POSITION OF WORD IN LEXICAL SPACE - The position of a word in the lexical space is determined stably and highly accurately by arbitrarily setting a predetermined initial condition, determining the occurrence frequency and cooccurrence relationship of the word under a given condition, and minimizing the difference between the values of the occurrence frequency and cooccurrence and the initial layout values arbitrarily set. | 03-04-2010 |
20100057443 | SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command. | 03-04-2010 |
20100063795 | DATA PROCESSING DEVICE, DATA PROCESSING METHOD, AND DATA PROCESSING PROGRAM - [PROBLEMS] To provide a data processing device such as a text mining device capable of extracting characteristic structures properly even in case a plurality of words indicating identical contents or a plurality of words semantically associated are contained in input data. [MEANS FOR SOLVING PROBLEMS] Association node extraction unit ( | 03-11-2010 |
20100063796 | Word Sense Disambiguation Using Emergent Categories - Disclosed herein is a computer implemented method and system for word sense disambiguation in a natural language sentence. The natural language sentence is parsed for identifying possible parts of speech for each term and identifying possible phrase structures. Terms comprising one or more linguistic roles are identified. The possible sense combinations for the terms with linguistic roles are identified. Emergent categories are applied to identify possible valid senses for each of the terms with identified linguistic roles. Linguistic role pairs are identified from among the terms identified with linguistic roles. The correspondence functions with the correspondence function types matching the identified linguistic role pairs are identified from an emergent categories database. The pair-wise senses for each term are compared with the identified linguistic roles to identify the possible sense combinations. The possible senses are inferred for each term with identified linguistic roles in the natural language sentence and previous sentences. | 03-11-2010 |
20100063797 | DISCOVERING QUESTION AND ANSWER PAIRS - The present invention provides a new approach to extracting question-answer pairs from online forums. The system develops a classification-based technique to discover questions in forums using sequential patterns automatically extracted from both questions and non-question sentences in forums as features. Once the questions are discovered, the system discovers the answers. The invention includes a graph-based method that is complementary with supervised methods for knowledge extraction and with techniques for question answering. | 03-11-2010 |
20100063798 | ERROR-DETECTING APPARATUS AND METHODS FOR A CHINESE ARTICLE - The invention discloses an error-detecting method for a Chinese article, handling a Chinese sentence including a first erroneous Chinese character string in a first location. The method includes subdividing the first erroneous Chinese character string into a plurality of first subgroups, wherein each of the first subgroups consists of two consecutive and non-consecutive Chinese characters out of the first erroneous Chinese character string. The method further includes providing a database containing a plurality of first correct Chinese character strings and a plurality of corresponding first correct indices, wherein the first correct indices consist of two consecutive and non-consecutive Chinese characters out of the first correct Chinese character strings. The method further includes acquiring one of the first correct indices according to the first subgroup, and one of the first correct Chinese character strings according to the acquired first correct index. The method further includes generating a best candidate sentence according to the acquired first correct Chinese character string, and showing the Chinese sentence and the best candidate sentence on a display device. | 03-11-2010 |
20100063799 | Process for Constructing a Semantic Knowledge Base Using a Document Corpus - Related free-text documents, a corpus, are used to empirically derive a semantic knowledge base through a method in which documents are segmented into unique sentences, and then used to define sentential propositions which are arranged in a knowledge hierarchy. The method takes compound natural language sentences and transforms them to simple sentences by a process that is a part of the invention. A knowledge editor enables a domain expert using the methods of the invention to map the sentences in the corpus to sentential proposition(s). The resulting knowledge base can be used to semantically analyze documents in data mining and decision support applications, and can assist word processors or speech recognition devices. The invention is illustrated in connection with radiology reports, but it has wide applicability. | 03-11-2010 |
20100063800 | Method, System and Software for Implementing an Automated Call Routing Application in a Speech Enabled Call Center Environment - A system, method and software for implementing an automated call routing application in a speech enabled call center environment are provided. In operation, the invention provides for the identification of a call center transaction selection from a natural language user utterance and the invocation of one or more scripts operable to route the user to a call center service agent configured to service the selected transaction. In the event a transaction selection cannot be readily identified or can only be partially identified, the invention provides for the initiation of a dialog module or script directed to eliciting a discernable transaction selection and/or the presentation of one or more menus from which the user may select an available call center transaction. | 03-11-2010 |
20100076749 | LANGUAGE PROCESSING SYSTEM, LANGUAGE PROCESSING METHOD, LANGUAGE PROCESSING PROGRAM, AND RECORDING MEDIUM - A language processing system according to the present invention includes: an input device | 03-25-2010 |
20100076750 | System for Low-Latency Animation of Talking Heads - Methods and apparatus for rendering a talking head on a client device are disclosed. The client device has a client cache capable of storing audio/visual data associated with rendering the talking head. The method comprises storing sentences in a client cache of a client device that relate to bridging delays in a dialog, storing sentence templates to be used in dialogs, generating a talking head response to a user inquiry from the client device, and determining whether sentences or stored templates stored in the client cache relate to the talking head response. If the stored sentences or stored templates relate to the talking head response, the method comprises instructing the client device to use the appropriate stored sentence or template from the client cache to render at least a part of the talking head response and transmitting a portion of the talking head response not stored in the client cache, if any, to the client device to render a complete talking head response. If the client cache has no stored data associated with the talking head response, the method comprises transmitting the talking head response to be rendered on the client device. | 03-25-2010 |
20100082331 | SEMANTICALLY-DRIVEN EXTRACTION OF RELATIONS BETWEEN NAMED ENTITIES - A system and method of developing rules for text processing enable retrieval of instances of named entities in a predetermined semantic relation (such as the DATE and PLACE of an EVENT) by extracting patterns from text strings in which attested examples of named entities satisfying the semantic relation occur. The patterns are generalized to form rules which can be added to the existing rules of a syntactic parser and subsequently applied to text to find candidate instances of other named entities in the predetermined semantic relation. | 04-01-2010 |
20100082332 | METHODS AND APPARATUS FOR PROTECTING USERS FROM OBJECTIONABLE TEXT - Methods and apparatus are provided for protecting users from objectionable text. Users are protected from objectionable text, by obtaining a predefined acceptable word list containing a plurality of acceptable words; receiving a textual entry from at least one user; and limiting the textual entry to only the acceptable words. The acceptable word list may comprise a dictionary of the acceptable words, and can be maintained by a central server or by a client associated with at least one of the users. The textual entry can be limited by only allowing the user to enter a subsequent character following entry of one or more entered characters if the subsequent character following the one or more entered characters comprises at least a portion of one of the acceptable words. The acceptable word list can optionally be updated with one or more additional acceptable words. The acceptable word list optionally comprises a context sensitive word list or one or more context sensitive rules. | 04-01-2010 |
20100088087 | MULTI-TAPABLE PREDICTIVE TEXT - A multi-tapable predictive text method and device that allows for both multi-tap and predictive text entry to be used in conjunction, thereby facilitating entry of text in languages having a large number of characters and/or on devices having a small number of keys. The method includes selecting at least one set of symbols from a plurality of sets of symbols associated with at least one key of an input device, at least one of the sets of symbols corresponding to at least two alphanumeric characters, and each set of symbols selectable by activating the key a prescribed number of times, analyzing the selected sets of symbols using a predictive text engine to generate a list of potential character strings, and displaying at least one of the potential character strings of the list of character strings for selection by a user. | 04-08-2010 |
20100094618 | Transcription data extraction - A computer program product, for performing data determination from medical record transcriptions, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the transcription for an indicating phrase associated with a type of data desired to be determined from the transcription, the type of desired data being relevant to medical records, determine whether data indicated by text disposed proximately to the indicating phrase is of the desired type, and store an indication of the data if the data is of the desired type. | 04-15-2010 |
20100100370 | SELF-ADJUSTING EMAIL SUBJECT AND EMAIL SUBJECT HISTORY - In one embodiment, an apparatus for automated generation of subject line content for e-mail messages includes an input operable to receive content data including text-based information corresponding to a body of an e-mail message, a text analyzer including logic operable to analyze received content data, a topic extractor including logic operable to extract topic data in accordance with an output of the text analyzer, a string generator including logic operable to generate subject line data in accordance with an output of the topic extractor, and a message output operable to output a multi-field e-mail message having a body field inclusive of the content data and a subject line field inclusive of generated subject line data. | 04-22-2010 |
20100100371 | Method, System, and Apparatus for Message Generation - Methods, systems, and apparatuses for message generation receive one or more keywords that are indicative of a message subject matter. Based on the keywords, information related to the keywords is searched, and message preferences of a message recipient are determined. A natural language message is created using the information related to the keywords and the message preferences, and the message is sent to the message recipient. | 04-22-2010 |
20100106485 | METHODS AND APPARATUS FOR CONTEXT-SENSITIVE INFORMATION RETRIEVAL BASED ON INTERACTIVE USER NOTES - Information retrieval systems and methods are provided based on interactive user notes. Information is retrieved from one or more data sources based on user notes by obtaining the user notes containing one or more information requests; identifying the one or more information requests from the user notes; interpreting at least one of the information requests in context; generating one or more queries required for the at least one interpreted information request; identifying an update to the user notes, the update containing one or more updated information requests; and processing the updated user notes to generate one or more queries required for the updated information requests. If the user notes contain multiple information requests, at least one query is generated for each of the plurality of information requests. The information requests can be interpreted based on user-specified context guides. | 04-29-2010 |
20100106486 | IMAGE-BASED SEMANTIC DISTANCE - Image-based semantic distance technique embodiments are presented that involve establishing a measure of an image-based semantic distance between semantic concepts. Generally, this entails respectively computing a semantic concept representation for each concept based on a collection of images associated with the concept. A degree of difference is then computed between two semantic concept representations to produce the aforementioned semantic distance measure for the pair of corresponding concepts. | 04-29-2010 |
20100106487 | Style-checking method and apparatus for business writing - The method is software-based, and carried out on a computer system and cooperating apparatus. It checks written text for problems impairing clarity, conciseness and reader comfort in business documents. One embodiment includes routines for checking writing style in sentences and paragraphs, and for generating informational, critical and commendatory display indicators relating to reader comfort. The routines check subject and verb juxtaposition, verb strength, prepositional phrase use, transition words, unity-creating constructions, gerund use, and sentence variety. The indicators are displayed in the form of highlighted text and diacritical marks. Another embodiment is a method for quantifying reader discomfort. It includes routines for quantifying, reporting and displaying points indicating comfort-impairing problems of the type located by running the routines of the first embodiment. Yet another embodiment includes a method for editing text documents for reader comfort, by locating and fixing problem words and constructions. | 04-29-2010 |
20100114560 | SYSTEMS AND METHODS FOR EVALUATING A SEQUENCE OF CHARACTERS - A sequence of characters may be evaluated to determine the presence of a natural language word. The sequence of characters may be analyzed to find a subsequence of alphabetical characters. Based on a statistical model of a natural language, a probability that the subsequence is a natural language word may be calculated. The probability may then be used to determine if the subsequence is indeed a natural language word. | 05-06-2010 |
20100114561 | LATENT METONYMICAL ANALYSIS AND INDEXING (LMAI) - The present invention relates to Latent Metonymical Analysis and Indexing (LMai), a novel concept for advanced machine learning or unsupervised machine learning techniques, which uses a statistical approach to identify the relationship between the words in a set of given documents (unstructured data). This approach does not necessarily need training data to make decisions on matching the related words together but has the ability to do the classification by itself. All that is needed is to give the algorithm a set of natural documents. The method is elegant enough to classify the relationships automatically without any human guidance during the process as shown in FIGS. | 05-06-2010 |
20100114562 | DOCUMENT PROCESSOR AND ASSOCIATED METHOD - A computer implemented method of processing a digitally encoded document having a text composed by an author by using a processor to analyse the segmentation, punctuation and linguistics of text and storing the results in a digitally accessible format. Author traits are then predicted using a machine learning system based on the results of the segmentation, punctuation and linguistics analysis of the text. | 05-06-2010 |
20100114563 | REAL-TIME SEMANTIC ANNOTATION SYSTEM AND THE METHOD OF CREATING ONTOLOGY DOCUMENTS ON THE FLY FROM NATURAL LANGUAGE STRING ENTERED BY USER - Disclosed herein are a real-time semantic annotation system and a method of converting user-entered natural language strings into semantically-readable knowledge structure documents using the system in real time. The real-time semantic annotation system includes a natural language character string input device for enabling a user to enter natural language character strings, a character string pattern triplet-mapping table for storing natural language character string patterns and their corresponding triplets, a triplet extraction device for converting the entered natural language character strings into triplets by analyzing and processing the entered natural language character strings using the pattern-triplet mapping table, an alternative word recommendation device for providing notification that a user should enter an alternative word, and a machine-readable document generation device for generating machine-readable documents from the triplets using a semantically-readable knowledge structure. | 05-06-2010 |
20100121631 | DATA DETECTION - A method for detecting data in a sequence of characters or text using both a statistical engine and a pattern engine. The statistical engine is trained to recognize certain types of data and the pattern engine is programmed to recognize the grammatical pattern of certain types of data. The statistical engine may scan the sequence of characters to output first data, and the pattern engine may break down the first data into subsets of data. Alternatively, the statistical engine may output items that have a predetermined probability or greater of being a certain type of data and the pattern engine may then detect the data from the output items and/or remove incorrect information from the output items. | 05-13-2010 |
20100125450 | SYNCHRONIZED TRANSCRIPTION RULES HANDLING - Methods, systems, and software are disclosed for providing rule handling functionality in a distributed transcription environment. Some embodiments provide client-server workflow management for providing and supporting distributed transcription services. Other embodiments provide audio-to-text synchronization to support certain transcription functionality. Still other embodiments provide logging functionality to support quality, personnel, billing, and/or other enterprise tasks. And other embodiments provide functionality to support rule generation, editing, validation, and/or execution. | 05-20-2010 |
20100125451 | Natural Language Recognition Using Context Information - A method of recognising digital ink input by a user into a computer-based digital ink recognition system is disclosed. The user interacts with a paper-based document. The paper-based document has disposed thereon coded data indicative of a particular field of the paper-based document and of at least one reference point of the paper-based document. An image sensor in a sensing device captures images of at least some of the coded data when the sensing device is placed in an operative position relative to the paper-based document. The sensing device then decodes at least some of the coded data to form indicating data indicative of the identity of the field of the paper-based document containing the coded data and at least one of a position and a movement of the sensing device relative to the paper-based document. A server receives the indicating data from the sensing device, and processes the indicating data using a recognizer residing on the server to produce intermediate format data. The intermediate format data is then transmitted to an application which decodes the intermediate format data into computer-readable format data using context information associated with the paper-based document. | 05-20-2010 |
20100131263 | Identifying and Generating Audio Cohorts Based on Audio Data Input - A computer implemented method, apparatus, and computer program product for generating audio cohorts. An audio analysis engine receives audio data from a set of audio input devices. The audio data is associated with a plurality of objects. The audio data comprises a set of audio patterns. The audio data is processed to identify attributes of the audio data to form digital audio data. The digital audio data comprises metadata describing the attributes of the audio data. A set of audio cohorts is generated using the digital audio data and cohort criteria. Each audio cohort in the set of audio cohorts comprises a set of objects from the plurality of objects that share at least one audio attribute in common. | 05-27-2010 |
20100131264 | SYSTEM AND METHOD FOR HANDLING MISSING SPEECH DATA - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for handling missing speech data. The computer-implemented method includes receiving speech with a missing segment, generating a plurality of hypotheses for the missing segment, identifying a best hypothesis for the missing segment, and recognizing the received speech by inserting the identified best hypothesis for the missing segment. In another method embodiment, the final step is replaced with synthesizing the received speech by inserting the identified best hypothesis for the missing segment. In one aspect, the method further includes identifying a duration for the missing segment and generating the plurality of hypotheses of the identified duration for the missing segment. The step of identifying the best hypothesis for the missing segment can be based on speech context, a pronouncing lexicon, and/or a language model. Each hypothesis can have an identical acoustic score. | 05-27-2010 |
20100131265 | Method, Apparatus and Computer Program Product for Providing Context Aware Queries in a Network - A method for providing context aware queries in a network may include receiving a question directed to a question answering service from an originating node, routing the question to one or more candidate nodes selected based at least in part on context information associated with the question, receiving an answer to the question from at least one of the candidate nodes, and providing the answer to the originating node based at least in part on result parameters associated with the originating node. An apparatus and computer program product corresponding to the method are also provided. | 05-27-2010 |
20100131266 | HANDHELD ELECTRONIC DEVICE INCLUDING AUTOMATIC PREFERRED SELECTION OF A PUNCTUATION, AND ASSOCIATED METHOD - A method of enabling input on a handheld electronic device, which includes an input apparatus having a number of input members that are capable of being actuated, wherein at least one of the input members has a plurality of selectable output alternatives, includes detecting as a first input an actuation of an input member, generating a first output, detecting as a second input an actuation of an input member having a plurality of selectable output alternatives comprising at least a primary punctuation and a secondary punctuation, determining that said first output has a predetermined characteristic, preferring as a second output said secondary punctuation, and outputting said second output. | 05-27-2010 |
20100138215 | SYSTEM AND METHOD FOR USING ALTERNATE RECOGNITION HYPOTHESES TO IMPROVE WHOLE-DIALOG UNDERSTANDING ACCURACY - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention. | 06-03-2010 |
20100138216 | METHOD FOR THE EXTRACTION OF RELATION PATTERNS FROM ARTICLES - A method for building a knowledge base containing entailment relations, including | 06-03-2010 |
20100145676 | METHOD AND APPARATUS FOR ADJUSTING THE LENGTH OF TEXT STRINGS TO FIT DISPLAY SIZES - The various aspects provide methods and devices which can reduce the length of a text string to fit dimensions of a display by identifying and deleting elements of the string that are not essential to its meaning. In the various aspects, handheld devices may be configured with software configured to analyze and modify text strings to shorten their length by adjusting font size, changing fonts, deleting unnecessary words, such as articles, abbreviating some words, deleting letters (e.g., vowels) from some words, and deleting non-critical words. The order in which transformations are effected may vary depending upon the text string according to a priority of transformations. Such transformation operations may be applied incrementally until the text string fits within the display size requirements. Similar methods may be implemented to increase the length of text strings by adding words in a manner that does not substantially change the meaning of the text string. | 06-10-2010 |
20100145677 | System and Method for Making a User Dependent Language Model - A language model for a speech recognition engine is made based on user-viewed data files. The data files are reviewed and texts are extracted therefrom. The language model is generated based on the extracted texts. Transcriptions of previous user statements are not required. Different weighting factors can be applied to elements of the extracted texts based on the nature of the data files. The weighting factors are then considered during generation of the language model. A user dependent and application independent language model can be created prior to initial use of the speech recognition engine. | 06-10-2010 |
20100145678 | Method, System and Apparatus for Automatic Keyword Extraction - The present invention provides a method and a system for automatic keyword extraction based on supervised or unsupervised machine learning techniques. Novel linguistically-motivated machine learning features are introduced, including discourse comprehension features based on construction integration theory, numeric features making use of syntactic part-of-speech patterns, and probabilistic features based on analysis of online encyclopedia annotations. The improved keyword extraction methods are combined with word sense disambiguation into a system for automatically generating annotations to enrich text with links to encyclopedic knowledge. | 06-10-2010 |
20100145679 | Handheld Electronic Device With Text Disambiguation - In view of the foregoing, an improved handheld electronic device includes a keypad in the form of a reduced QWERTY keyboard and is enabled with disambiguation software. As a user enters keystrokes, the device provides output in the form of a default output and a number of variants from which a user can choose. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry, and when initiating an activity session on a word such as during editing, the display outputs variants of the entire word being edited, rather than providing as variants only those parts of a word that are being edited. The device also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. In certain predefined circumstances, the disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device. | 06-10-2010 |
20100153091 | USER-SPECIFIED PHRASE INPUT LEARNING - Architecture that enables a user to perform manual word-breaking by phrase input. Phrase input is where the user inserts a phrase-key (or separator) as a delimiter that indicates to an editor application such as an IME (input method editor) the composition of a specific phrase when entering characters (e.g., Asian). The word-breaking is controlled by the user. The conversion quality is improved as the user knows the desired input and ambiguous cases are reduced. A phrase can be specified while the user is composing the characters. By selecting a phrase-key separator, the user can specify the composing characters before the characters are presented as a phrase. Moreover, the architecture includes a phrase prioritization mechanism wherein each phrase can be treated as a single entity and assigned a character identifier (ID), which is related to the sequence of a candidate list. | 06-17-2010 |
20100153092 | Expanding Base Attributes for Terms - In one embodiment, a method for expanding concept attributes for a concept term includes receiving an attribute term for expansion and determining one or more word senses for the attribute term. A word sense is selected from the one or more word senses. One or more conceptually similar terms are selected for the attribute term based on the word sense, and it is determined that at least one of the one or more conceptually similar terms is an additional attribute. A first mapping associating the additional attribute with the attribute term is generated, and a second mapping associating the additional attribute with the concept term is generated. The mappings are stored in an onomasticon. | 06-17-2010 |
20100153093 | METHOD AND APPARATUS FOR PROVIDING CASE RESTORATION - A method and apparatus for providing case restoration in a communication network are disclosed. For example, the method obtains one or more content sources from one or more information feeds, and extracts textual information from the one or more content sources obtained from the one or more information feeds. The method then creates or updates a capitalization model based on the textual information. | 06-17-2010 |
20100153094 | TOPIC MAP BASED INDEXING AND SEARCHING APPARATUS - A topic map based indexing apparatus analyzes community Q/A lists to acquire Q/A analysis information, removes redundant answers depending on the Q/A analysis information, removes insignificant answers based on the degree of reliability, ranks answer lists, and extracts the highest ranking answer as a best answer, to thereby store, in a community Q/A topic map, index information containing the community Q/A lists and the Q/A analysis information. A topic map based searching apparatus analyzes a user question to acquire question analysis information, searches similar questions from community Q/A lists belonging to a specific topic node of a pre-stored community Q/A topic map, ranks the searched similar questions depending on the question analysis information, removes redundant answers among answers to the ranked similar questions, ranks the answers, and extracts the highest ranking answer as a best answer. | 06-17-2010 |
20100153095 | Virtual Pet Chatting System, Method, and Virtual Pet Question and Answer Server - A virtual pet chatting system includes a virtual pet client unit, a virtual pet data maintaining unit as well as a questioning and answering unit. A virtual pet chatting method includes: sending, by a first virtual pet, a natural language question to a second virtual pet; and generating, by the second virtual pet, a natural language response sentence according to the natural language question, after understanding the natural language and performing reasoning taking into account attributes of a virtual pet. A virtual pet questioning and answering server includes a natural language understanding module and a response sentence generating module. | 06-17-2010 |
20100153096 | Handheld Electronic Device and Method for Disambiguation of Compound Text Input and That Employs N-Gram Data to Limit Generation of Low-Probability Compound Language Solutions - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to analyze the combinations of language objects in light of N-gram data stored on the device to avoid proposing low-probability compound language solutions. | 06-17-2010 |
20100161312 | Method of semantic, syntactic and/or lexical correction, corresponding corrector, as well as recording medium and computer program for implementing this method - The method is suitable for dysorthographic or partially sighted persons, to facilitate the semantic, syntactic and/or lexical correction of an erroneous expression in a digital text input by a user. The method comprises the sequence of: a step ( | 06-24-2010 |
20100161313 | Region-Matching Transducers for Natural Language Processing - Computer methods, apparatus and articles of manufacture therefor, are disclosed for developing a region-matching transducer for marking language data having delimited strings. The region-matching transducer defines one or more patterns of one or more sequences of delimited strings, with at least one of the patterns defined in the region-matching transducer having an arrangement of a plurality of class-matching networks. The plurality of class-matching networks defines a combination of two or more entity classes from one or both of part-of-speech classes and application-specific classes. The region-matching transducer has, for each of the one or more patterns, an arc that leads from a penultimate state with a transition label that identifies the entity class of the pattern, and shares states between patterns leading to a penultimate state when segments of delimited strings making up two or more patterns overlap. | 06-24-2010 |
20100161314 | Region-Matching Transducers for Text-Characterization - Computer methods, apparatus and articles of manufacture therefor, are disclosed for text-characterization using a finite state transducer that along each path accepts on a first side an n-gram of text-characterization (e.g., a language or a topic) and outputs on a second side a sequence of symbols identifying one or more text-characterizations from a set of text-characterizations. The finite state transducer is applied to input data. For each n-gram accepted by the finite state transducer, a frequency counter associated with the n-gram of the one or more text-characterizations in the set of text-characterizations is incremented. The input data is classified as one or more text-characterizations from the set of text-characterizations using the frequency counters associated therewith. | 06-24-2010 |
20100161315 | CORRELATED CALL ANALYSIS - A method of correlating received communication data with operational communication characteristics is provided. The method includes receiving audible input from a source in a communication over a communications network, recording the received audible input, and transcribing the recorded audible input into a transcript. The method further includes outputting the transcript, specifying features of the transcript to be analyzed, specifying and recording operational communication characteristics particular to the communication, analyzing the transcript for the specified features to identify patterns associated with the audible input, computing statistical correlations of the identified patterns with the operational communication characteristics, and outputting results of the computed statistical correlations on a user interface. | 06-24-2010 |
20100161316 | PROBABILISTIC NATURAL LANGUAGE PROCESSING USING A LIKELIHOOD VECTOR - A method for natural language processing on a computing device is described. The computing device receives a free text document. The computing device parses the free text document for gross structure. The gross structure includes sections, paragraphs and sentences. The computing device determines an application of at least one knowledge base. The free text document is parsed for fine structure on the computing device. The fine structure includes sub-sentences. The computing device applies the parsed document and at least one likelihood vector to a Bayesian network. The computing device outputs meanings and probabilities. | 06-24-2010 |
20100161317 | SEMANTIC NETWORK METHODS TO DISAMBIGUATE NATURAL LANGUAGE MEANING - A computer implemented data processor system automatically disambiguates a contextual meaning of natural language symbols to enable precise meanings to be stored for later retrieval from a natural language database, so that natural language database design is automatic, to enable flexible and efficient natural language interfaces to computers, household appliances and hand-held devices. | 06-24-2010 |
20100169075 | ADJUSTMENT OF TEMPORAL ACOUSTICAL CHARACTERISTICS - Embodiments may be a standalone module or part of mobile devices, desktop computers, servers, stereo systems, or any other systems that might benefit from condensed audio presentations of item structures such as lists or tables. Embodiments may comprise logic such as hardware and/or code to adjust the temporal characteristics of items comprising words. The items may be included in a structure such as a text listing or table, an audio listing or table, or a combination thereof, or may be individual words or phrases. For instance, embodiments may comprise a keyword extractor to extract keywords from the items and an abbreviations generator to generate abbreviations based upon the keywords. Further embodiments may comprise a text-to-speech generator to generate audible items based upon the abbreviations to render to a user while traversing the item structure. | 07-01-2010 |
20100169076 | Text-to-Scene Conversion - The invention relates to a method of converting a set of words into a three-dimensional scene description, which may then be rendered into three-dimensional images. The invention may generate arbitrary scenes in response to a substantially unlimited range of input words. Scenes may be generated by combining objects, poses, facial expressions, environments, etc., so that they represent the input set of words. Poses may have generic elements so that referenced objects may be replaced by those mentioned in the input set of words. Likewise, a character may be dressed according to its role in the set of words. Various constraints for object positioning may be declared. The environment, including but not limited to place, time of day, and time of year, may be inferred from the input set of words. | 07-01-2010 |
20100169077 | METHOD, SYSTEM AND COMPUTER READABLE RECORDING MEDIUM FOR CORRECTING OCR RESULT - Disclosed is a method, system and computer readable recording medium for correcting an OCR result. According to an exemplary embodiment of the present invention, there is provided a method for correcting an OCR result, the method including performing character recognition on content including character information using an OCR technique, removing extra carriage return information from the content, outputting the character recognition result, and correcting word spacing on the outputted result. | 07-01-2010 |
20100169078 | Style-checking method and apparatus for business writing - The method is software-based, and carried out on a computer system and cooperating apparatus. It checks written text for problems impairing clarity, conciseness and reader comfort in business documents. One embodiment includes routines for checking writing style in sentences and paragraphs, and for generating informational, critical and commendatory display indicators relating to reader comfort. The routines check subject and verb juxtaposition, verb strength, prepositional phrase use, transition words, unity-creating constructions, gerund use, and sentence variety. The indicators are displayed in the form of highlighted text and diacritical marks. Another embodiment is a method for quantifying reader discomfort. It includes routines for quantifying, reporting and displaying points indicating comfort-impairing problems of the type located by running the routines of the first embodiment. Yet another embodiment includes a method for editing text documents for reader comfort, by locating and fixing problem words and constructions. | 07-01-2010 |
20100174526 | SYSTEM AND METHODS FOR QUANTITATIVE ASSESSMENT OF INFORMATION IN NATURAL LANGUAGE CONTENTS - A method is disclosed for quantitatively assessing information in natural language contents related to an object name. The method includes identifying a sentence in a document, determining a subject and a predicate in the sentence, and retrieving an object-specific data set related to the object name. The object-specific data set includes property names and association-strength values. Each property name is associated with an association-strength value. The method also includes identifying a first property name in the property names that matches the subject, assigning a first association-strength value associated with the first property name to the subject, identifying a second property name in the property names that matches the predicate, assigning a second association-strength value associated with the second property name to the predicate, and multiplying the first association-strength value and the second association-strength value to produce a sentence information index. | 07-08-2010 |
20100174527 | DICTIONARY REGISTERING SYSTEM, DICTIONARY REGISTERING METHOD, AND DICTIONARY REGISTERING PROGRAM - There is provided a dictionary registration system which makes it possible to register a word into a user dictionary while minimizing an adverse effect that the word may have on natural language processing, if any. The dictionary registration system performs natural language processing by using a user dictionary, and includes a data processing apparatus that performs the natural language processing by managing and using the user dictionary and a storage apparatus that retains system dictionary information and user dictionary information for use in the natural language processing. The storage apparatus includes the system dictionary information for use in the natural language processing, and the user dictionary. The data processing apparatus includes: a word information registering unit that registers information on an input word into the user dictionary; a difference creating unit that creates differences in a result of processing between a first result of processing when the natural language processing is performed by using the system dictionary information and a second result of processing when the natural language processing is performed by using the system dictionary information and the user dictionary information; a correct-incorrect accepting unit that accepts correct-incorrect judgments as to whether changes from the first result of processing to the second result of processing are correct or incorrect, the changes corresponding to the differences created by the difference creating unit; and a dictionary registration unit that registers registration information on the accepted word into the user dictionary along with part or all of pairs of the correct-incorrect judgments accepted and input sentences from which the differences given the respective correct-incorrect judgments are created. | 07-08-2010 |
20100179804 | Natural Language Assertion Processor - A method of processing natural language assertions (NLAs) can include identifying an NLA and then translating that NLA into a verification language assertion (VLA) using a natural language parser (NLP) and synthesis techniques. This VLA can be translated into an interpreted NLA (NLA*) using a VLA parser and pattern matching techniques. At this point, the process can allow user review of the NLA* and the NLA. When the user determines that the NLA* and the NLA are the same or have insignificant difference, then verification can be performed using the VLA. The results of the verification can then be back annotated on the NLA. In one fully-automatic embodiment, in addition to comparing the NLA and the NLA*, the VLA and a VLA* (generated from the NLA*) can be compared, thereby providing yet another test of accuracy for the user during verification. | 07-15-2010 |
20100179805 | METHOD, APPARATUS, AND COMPUTER PROGRAM PRODUCT FOR ONE-STEP CORRECTION OF VOICE INTERACTION - A one-step correction mechanism for voice interaction is provided. Correction of a previous state is enabled simultaneously with recognition in a current or subsequent state. An application is decomposed into a set of tasks. Each task is associated with the collection of one piece of information. Each task may be in a different state. At any point during the interaction, while a task/state pair is active, the dialog manager may enable multiple other task/state pairs to be active in latent fashion. The application developer may then use those facilities or resources to the active task/state and the latent task/state pairs depending on contextual condition of the interaction state of the application. | 07-15-2010 |
20100185436 | Arabic poetry meter identification system and method - The Arabic poetry meter identification system and method produces coded Al-Khalyli transcriptions of Arabic poetry. The meters (Wazn, Awzan being forms of the Arabic poems units Bayt, Abyate) are identified. A spoken or written poem is accepted as input. A coded transcription of the poetry pattern forms is produced from input processing. The system identifies and distinguishes between proper spoken poetic meter and improper poetic meter. Errors in the poem meters (Bahr, Buhur) and the ending rhyme pattern, “Qafiya”, are detected and verified. The system accepts user selection of a desired poem meter and then interactively aids the user in the composition of poetry in the selected meter, suggesting alternative words and word groups that follow the desired poem pattern and dactyl components. The system can be in a stand-alone device or integrated with other computing devices. | 07-22-2010 |
20100185437 | PROCESS OF DIALOGUE AND DISCUSSION - A method for effecting a dialogue with an emulated brain. The method includes the step of receiving a query in the form of a semantic string. The semantic string is then parsed into basic concepts of the query. The basic concepts are then clumped into a clump concept. If the clump concept constitutes part of a dialogue, then the dialogue thread is activated by determining the context of the clump concept and assessing a potential reply from a group of weighted replies, which expected replies are weighted based on the parsed concepts produced in the step of parsing. The heaviest weighted one of the expected replies is selected and the weight of the selected reply after it is selected is downgraded. The selected reply is then generated for output in a sentence structure. | 07-22-2010 |
20100191519 | TOOL AND FRAMEWORK FOR CREATING CONSISTENT NORMALIZATION MAPS AND GRAMMARS - A runtime framework and authoring tool are provided for enabling linguistic experts to author text normalization maps and grammar libraries without requiring a high level of technical or programming skills. Authors define or select terminals, map the terminals, and define rules for the mapping. The tool enables authors to validate their work by executing the map in the same way the recognition engine does, ensuring consistent results from authoring to user operations. The runtime is used by the speech engines and by the tools to provide consistent normalization for supported scenarios. | 07-29-2010 |
20100191520 | TEXT AND SPEECH RECOGNITION SYSTEM USING NAVIGATION INFORMATION - A system and method are provided for recognizing a user's speech input. The method includes the steps for detecting the user's speech input, recognizing the user's speech input by comparing the speech input to a list of entries using language model statistics to determine the most likely entry matching the user's speech input, and detecting navigation information of a trip to a predetermined destination, where the most likely entry is determined by modifying the language model statistics taking into account the navigation information. A system and method is further provided that takes into account navigation trip information to determine the most likely entry using language model statistics for recognizing text input. | 07-29-2010 |
20100191521 | METHODS AND APPARATUS FOR EVALUATING SEMANTIC PROXIMITY - Methods and apparatus to evaluate the semantic proximity between reference free-form text entry and a candidate free-form text request. | 07-29-2010 |
20100198583 | INDICATING METHOD FOR SPEECH RECOGNITION SYSTEM - The present invention relates to an indicating method for speech recognition system, comprising a multimedia electronic product and a speech recognition device. The steps of this method include: users enter voice commands into a voice input unit and convert these commands into speech signals, which are acquired and stored by a recording unit, converted by a microprocessor into a volume indicating oscillogram, and then displayed by a display module. At the same time, compliance with speech recognition conditions will be decided in that process. That is to say, an indicating module is used for diagram, letter or color marking or speech indication according to volume indicating oscillogram, followed by playing over a sound amplifying unit, so that users can understand the voice input status and adjust the volume to fulfill voice command operations virtually through voice indication, explanations in graphs or letters and other interactive guidance, together with audio indication oscillogram, thus further enhancing speech recognition rate and avoiding such problems and deficiencies as distortions related to abnormal and poor sound acquisition or inconvenience for use. | 08-05-2010 |
20100198584 | SERVER FOR AUTOMATICALLY SCORING OPINION CONVEYED BY TEXT MESSAGE CONTAINING PICTORIAL-SYMBOLS - A server is disclosed for computing a score of an opinion that a message in a text file is expected to convey regarding a subject to be evaluated, wherein the message is written using literal strings and pictorial symbols. In this server, by the use of a pictorial-symbol dictionary memory storing a correspondence between designated pictorial-symbols to be rated and scores of opinions expressed by the respective pictorial-symbols, at least one of the used pictorial-symbols in the message which is coincident with at least one of the designated pictorial-symbols stored in the pictorial-symbol dictionary memory, is extracted from the message, at least one of the opinion scores which corresponds to the at least one extracted pictorial-symbol is retrieved within the pictorial-symbol dictionary memory, and an aggregate net opinion score for the message is calculated, based on an aggregate opinion score for the at least one extracted pictorial-symbol. | 08-05-2010 |
20100204982 | System and Method for Generating Data for Complex Statistical Modeling for use in Dialog Systems - Embodiments of a dialog system that utilizes grammar-based labeling scheme to generate labeled sentences for use in training statistical models. During the process of training data development, a grammar is constructed manually based on the application domain or adapted from a general grammar rule. An annotation schema is created accordingly based on the application requirements, such as syntactic and semantic information. Such information is then included in the grammar specification. After the labeled grammar is constructed, a generation algorithm is then used to generate sentences for training various statistical models. | 08-12-2010 |
20100204983 | Method and System for Extracting Web Query Interfaces - A computer program product being embodied on a computer readable medium for extracting semantic information about a plurality of documents being accessible via a computer network, the computer program product including computer-executable instructions for: generating a plurality of tokens from at least one of the documents, each token being indicative of a displayed item and a corresponding position; and, constructing at least one parse tree indicative of a semantic structure of the at least one document from the tokens dependently upon a grammar being indicative of presentation conventions. | 08-12-2010 |
20100204984 | VIRTUAL PET SYSTEM, METHOD AND APPARATUS FOR VIRTUAL PET CHATTING - A virtual pet system includes: a virtual pet client, adapted to receive a sentence in natural language and send the sentence to a Q&A server; the Q&A server, adapted to receive the sentence, process the sentence through natural language comprehension, generate an answer in natural language based on a result of natural language comprehension and reasoning knowledge, and send the answer in natural language to the virtual pet client. A method for virtual pet chatting includes: receiving a sentence in natural language, performing natural language comprehension for the sentence, and generating an answer in natural language based on a result of natural language comprehension and reasoning knowledge. A Q&A server includes: a sentence comprehension engine unit, adapted to process a received sentence in natural language through natural language comprehension, and send a result of natural language comprehension to a reasoning engine unit; the reasoning engine unit, adapted to generate an answer in natural language based on reasoning knowledge and the result of natural language comprehension, and send the answer in natural language; a knowledge base, adapted to store the reasoning knowledge. | 08-12-2010 |
20100211378 | MODULAR APPROACH TO BUILDING LARGE LANGUAGE MODELS - Methods for building arbitrarily large language models are presented herein. The methods provide a scalable solution to estimating a language model using a large data set by breaking the language model estimation process into sub-processes and parallelizing computation of various portions of the process. | 08-19-2010 |
20100211380 | INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD, AND PROGRAM - An information processing apparatus includes: an acquiring unit acquiring text data as data associated with plural contents; a separating unit separating the text data acquired by the acquiring means into words of a predetermined unit in accordance with attributes; a comparing unit calculating a correspondence length indicating the number of words which continuously correspond to each other in order of the attributes between the text data, by comparing the words, which are separated by the separating means, between the text data of the plural contents; a calculating unit calculating a similarity degree score indicating a similarity degree between the contents corresponding to the text data on the basis of the correspondence length obtained by the comparing means; and a display controlling unit controlling displaying outlines of the plural contents on the basis of the similarity degree score between a predetermined content and another content among the plural contents. | 08-19-2010 |
20100217583 | SYSTEM AND METHOD FOR PROCESSING ONLINE READING INTERACTIONS - An online reading processing system and method for providing interactive messages to users at end computer devices are provided, which include storing online reading information in a data storage medium; setting the online reading information with a head mark and a tail mark of at least one expert-marked key range by a setting module; reading the information after being set and hiding the head mark and the tail mark thereof; receiving the key range marked by the user; determining whether the user-marked key range covers the head mark and the tail mark so as to form interactive messages according to the determination, thereby solving the drawback of failing to provide appropriate feedback or assessment according to users' behaviors as encountered in the prior techniques, and also increasing online reading interaction and enjoyment. | 08-26-2010 |
20100223051 | Method and System for Determining Text Coherence - A method and system for determining text coherence in an essay is disclosed. A method of evaluating the coherence of an essay includes receiving an essay having one or more discourse elements and text segments. The one or more discourse elements are annotated either manually or automatically. A text segment vector is generated for each text segment in a discourse element using sparse random indexing vectors. The method or system then identifies one or more essay dimensions and measures the semantic similarity of each text segment based on the essay dimensions. Finally, a coherence level is assigned to the essay based on the measured semantic similarities. | 09-02-2010 |
20100228538 | COMPUTATIONAL LINGUISTIC SYSTEMS AND METHODS - An apparatus and corresponding method are disclosed for selecting and managing morphological, syntactic and semantic information found in natural languages using a reduced instruction set grammar (RISG). The apparatus and corresponding method 1) convert natural language inputs into morphological tokens and store those tokens, 2) convert morphological tokens into syntactic groups and store those groups, and/or 3) convert syntactic groups into semantic blocks and store those blocks, and vice versa. The process can start with text and find the corresponding morphological tokens, syntactic groups and/or semantic blocks or start with semantic block(s) and find the corresponding morphological tokens. | 09-09-2010 |
20100228539 | METHOD AND APPARATUS FOR PSYCHOMOTOR AND PSYCHOLINGUISTIC PREDICTION ON TOUCH BASED DEVICE | 09-09-2010 |
20100228540 | Methods and Systems for Query-Based Searching Using Spoken Input - Systems and methods for query-based searching using spoken input are disclosed. In systems and methods according to embodiments of the invention, continuous speech natural language queries are accepted from a user using a client device. Speech processing tasks are divided between the client device and one or more server systems. Once user speech is recognized, the system searches one or more data repositories containing queries for at least one query that matches the recognized speech and returns information related to the query. | 09-09-2010 |
20100235163 | METHOD AND SYSTEM FOR ENCODING CHINESE WORDS - A Chinese character or word encoding system and method for encoding a Unicode Differentiation Index (UDI) into the least significant 3 bits of one of the three component colors of the foreground color of the RTF Chinese text. This encoded UDI value allows the correct identification of the encoded Chinese word. It also allows the traditional Chinese or simplified Chinese counterpart to be identified correctly. Further, the encoded UDI allows the identification of the font file differentiator when a user is generating a correct Dualese script for a given Chinese word, wherein Dualese refers to a dual-script-in-one type of script. | 09-16-2010 |
20100235164 | QUESTION-ANSWERING SYSTEM AND METHOD BASED ON SEMANTIC LABELING OF TEXT DOCUMENTS AND USER QUESTIONS - A question-answering system for searching text documents, provided in electronic or digital form, for exact answers to questions formulated by a user in natural language is based on automatic semantic labeling of text documents and user questions. The system performs semantic labeling with the help of markers in terms of basic knowledge types, their components and attributes, in terms of question types from the predefined classifier for target words, and in terms of components of possible answers. A matching procedure makes use of the mentioned types of semantic labels to determine exact answers to questions and present them to the user in the form of fragments of sentences or a newly synthesized phrase in natural language. Users can independently add new types of questions to the system classifier and develop required linguistic patterns for the system linguistic knowledge base. | 09-16-2010 |
20100235165 | SYSTEM AND METHOD FOR AUTOMATIC SEMANTIC LABELING OF NATURAL LANGUAGE TEXTS - Systems and methods for automatic semantic labeling of natural language documents provided in electronic or digital form include a semantic processor that performs a basic linguistic analysis of text, including recognizing in the text semantic relationships of the type objects and/or classes of objects, facts and cause-effect relationships; matching linguistically analyzed text against target semantic relationship patterns, created by generalization of particular cases of target semantic relationships; and generating semantic relationship labels based on linguistically analyzed text and a result of the matching. | 09-16-2010 |
20100241418 | VOICE RECOGNITION DEVICE AND VOICE RECOGNITION METHOD, LANGUAGE MODEL GENERATING DEVICE AND LANGUAGE MODEL GENERATING METHOD, AND COMPUTER PROGRAM - A speech recognition device includes one or more intention extracting language models in which an intention of a focused specific task is inherent, an absorbing language model in which no intention of the task is inherent, a language score calculating section that calculates a language score indicating a linguistic similarity between each of the intention extracting language models and the absorbing language model, and the content of an utterance, and a decoder that estimates an intention in the content of an utterance based on a language score of each of the language models calculated by the language score calculating section. | 09-23-2010 |
20100241419 | Method for identifying the integrity of information - A preferred method for identifying at least one of a grammatical, linguistic and/or conceptual integrity of a data corpus is disclosed. In a preferred method, the associations between several word elements of a data corpus are identified. Then, the word elements experiencing several associations are used for identifying the continuum between associations and the number of word elements involved and/or not involved in the associations which is then used for identifying at least one of a: linguistic, semantic, grammatical, conceptual or other integrity or coherence of the analyzed data corpus, such as a query for optionally displaying a data corpus understanding and/or selecting a particular search behavior or other. | 09-23-2010 |
20100241420 | AUTOMATED SENTENCE PLANNING IN A TASK CLASSIFICATION SYSTEM - The invention relates to a system that interacts with a user in an automated dialog system ( | 09-23-2010 |
20100250235 | TEXT ANALYSIS USING PHRASE DEFINITIONS AND CONTAINERS - In one example, a phrase analyzer may analyze a text input stream to identify phrases contained in the text input stream. The phrase analyzer may receive a specification, which includes dictionaries of phrases and synonyms, and a specification of the phrases, or sequences of phrases to be matched. The phrase analyzer may compare the input stream to the specification and may produce, as output, an identification of which phrases appear in the input stream, and where in the input stream those phrases occur. | 09-30-2010 |
20100250236 | COMPUTER-ASSISTED ABSTRACTION OF DATA AND DOCUMENT CODING - A computer-assisted method of abstracting and coding data, which includes receiving one or more documents, is disclosed. The methods and systems extract information from a record based on extraction rules that correspond to an identified record type, determine codes corresponding to the information extracted from the record, present the correspondence between the extracted information and the codes, receive from the user-input device a validation of the correspondence between the extracted information and one of the codes, and output a report including the validated information and the validated code. | 09-30-2010 |
20100250237 | INTERACTIVE MANUAL, SYSTEM AND METHOD FOR VEHICLES AND OTHER COMPLEX EQUIPMENT - A method and system of providing an interactive manual, including a speech engine to receive and process speech from a user, convert the speech into a word sequence, and identify meaning structures from the word sequence, a structured manual including information related to an operation of a device, a visual model to relate visual representation of the information, a dialog management arrangement to interpret the meaning structures in a context and to extract pertinent information and the visual representation from the structured manual and the visual model, and an output arrangement to output the information and visual representation. | 09-30-2010 |
20100250238 | Lexical Association Metric for Knowledge-Free Extraction of Phrasal Terms - A method and system for determining a lexical association of phrasal terms are described. A corpus having a plurality of words is received, and a plurality of contexts including one or more context words proximate to a word in the corpus is determined. An occurrence count for each context is determined, and a global rank is assigned based on the occurrence count. Similarly, a number of occurrences of a word being used in a context is determined, and a local rank is assigned to the word-context pair based on the number of occurrences. A rank ratio is then determined for each word-context pair. A rank ratio is equal to the global rank divided by the local rank for a word-context pair. A mutual rank ratio is determined by multiplying the rank ratios corresponding to a phrase. The mutual rank ratio is used to identify phrasal terms in the corpus. | 09-30-2010 |
20100262419 | METHOD OF CONTROLLING COMMUNICATIONS BETWEEN AT LEAST TWO USERS OF A COMMUNICATION SYSTEM - A communication system includes at least a sound re-production system ( | 10-14-2010 |
20100268528 | Method & Apparatus for Identifying Contract Characteristics - A contract characteristic identification application includes a user interface, a plurality of contract characteristic definitions, a natural language processing module and a characteristic identification function. At least one contract characteristic is defined and evaluated and the text of at least one contract is entered into the application. A document evaluation function included in the natural language processing module operates to evaluate the contents of the text of the contract against the defined contract characteristic and returns a listing of contract text that is closest to the defined contract characteristic of interest. | 10-21-2010 |
20100280819 | Dialog Design Apparatus and Method - This invention relates to a dialog design apparatus and method. More specifically, this invention relates to a state oriented dialog design apparatus and method to facilitate the creation of natural language dialogs and creating data structures for voice user interfaces. The dialog design apparatus may include inputting means for receiving a user's prompt; response generating means for the user to generate at least one response; dialog structure generating means for structurally managing the user's input and response; and output means for outputting and displaying at least one dialog structure. A state in the present invention may include at least one system prompt and at least one response, and a linking unit may link a first state to a second state related to the first state, link the second state to a third state, and so on until certain system actions are achieved. A loop detecting unit in the present invention detects and identifies loops in the dialog structure. | 11-04-2010 |
20100280820 | INTERACTIVE VOICE RESPONSE SYSTEM - Methods and systems for testing and analyzing integrated voice response systems are provided. Computer devices are used to simulate caller responses or inputs to components of the integrated voice response systems. The computer devices receive responses from the components. The responses may be in the form of VXML and grammar files that are used to implement call flow logic. The responses may be analyzed to evaluate the performance of the components and/or call flow logic. | 11-04-2010 |
20100286979 | AUTOMATIC CONTEXT SENSITIVE LANGUAGE CORRECTION AND ENHANCEMENT USING AN INTERNET CORPUS - A computer-assisted language correction system including spelling correction functionality, misused word correction functionality, grammar correction functionality and vocabulary enhancement functionality utilizing contextual feature-sequence functionality employing an internet corpus. | 11-11-2010 |
20100292984 | METHOD FOR QUICKLY INPUTTING CORRELATIVE WORD - The present invention provides a text input method, which is integrated in a text input program or device supporting word input (e.g., software/hardware keyboard, input method, etc.) and assists a user in easily inputting a word or a phrase (e.g., various tense forms of a verb, etc.) relating to a certain word. The user may fast input a specific word relating to the certain word by a specific operation (e.g., clicking a software or hardware key, moving a screen contact point, etc.) or by a combination of a plurality of operations. | 11-18-2010 |
20100292985 | DOCUMENT MANAGEMENT APPARATUS AND DOCUMENT MANAGEMENT METHOD - A document management apparatus is aimed at easily processing, managing and reusing newly taken image data in accordance with user's needs. The apparatus includes: a document area analyzing unit configured to analyze and extract a document area from image data; a text information analyzing unit configured to analyze and extract text information with respect to the document area; a text information semantic analysis unit configured to analyze and extract semantics of the text information from the text information; a managing unit configured to associate the document area, the text information and the semantics of the text information with each other, and manage them as integrated information; an integrated information presenting unit configured to present to a user at least the semantics of the text information, of the integrated information managed by the managing unit; and a user-designated semantic setting unit configured to be capable of allowing the user to change the semantics of the text information presented by the integrated information presenting unit and to set the changed semantics. | 11-18-2010 |
20100299135 | Automated Extraction of Semantic Content and Generation of a Structured Document from Speech - Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document. | 11-25-2010 |
20100299136 | Dialogue System and a Method for Executing a Fully Mixed Initiative Dialogue (FMID) Interaction Between a Human and a Machine - A method for executing a fully mixed initiative dialogue (FMID) interaction between a human and a machine, a dialogue system for a FMID interaction between a human and a machine and a computer readable data storage medium having stored thereon computer code for instructing a computer processor to execute a method for executing a FMID interaction between a human and a machine are provided. The method includes retrieving a predefined grammar setting out parameters for the interaction; receiving a voice input; analysing the grammar to dynamically derive one or more semantic combinations based on the parameters; obtaining semantic content by performing voice recognition on the voice input; and assigning the semantic content as fulfilling the one or more semantic combinations. | 11-25-2010 |
20100299137 | STORAGE MEDIUM STORING PRONUNCIATION EVALUATING PROGRAM, PRONUNCIATION EVALUATING APPARATUS AND PRONUNCIATION EVALUATING METHOD - A game apparatus includes a CPU, and the CPU evaluates a pronunciation of a user with respect to an original sentence (ES). First, envelopes as to a volume of a voice of the original sentence (ES) and a volume of a voice of the user are taken, and the average values of the volumes are made uniform. When the volumes are made uniform to each other, a degree of similarity (scoreA) of distributions of local solutions when the volumes are equal to or more than the average values, a degree of similarity (scoreB) of distributions (timing of concaves/convexes of the waveform) of values of the high or low level indicating whether or not the volume is equal to or more than a value multiplying the average value by a predetermined value, and a degree of similarity (scoreC) of dispersion values (dispersion of concaves/convexes of the waveform) of the envelopes are evaluated by utilizing the respective envelopes. On the basis of these degrees of similarity (scoreA, scoreB, scoreC), the rhythm of the pronunciation by the user is evaluated. | 11-25-2010 |
20100299138 | APPARATUS AND METHOD FOR LANGUAGE EXPRESSION USING CONTEXT AND INTENT AWARENESS - A language expression apparatus and a method based on context and intent awareness are provided. The apparatus and method may recognize a context and an intent of a user and may generate a language expression based on the recognized context and the recognized intent, thereby providing an interpretation/translation service and/or providing an education service for learning a language. | 11-25-2010 |
20100299139 | METHOD FOR PROCESSING NATURAL LANGUAGE QUESTIONS AND APPARATUS THEREOF - A method and an apparatus for selecting an answer to a natural language question. The method includes: detecting a named entity in the natural language question; extracting information related to an answer from the natural language question; searching in linked data according to the detected named entity; generating a candidate answer according to a search result; parsing the candidate answer according to the information related to the answer; obtaining a value of a feature of the candidate answer; and evaluating each candidate answer by synthesizing the value of the feature of the candidate answer. | 11-25-2010 |
20100299140 | IDENTIFYING AND ROUTING OF DOCUMENTS OF POTENTIAL INTEREST TO SUBSCRIBERS USING INTEREST DETERMINATION RULES - A method, system and computer program product for identifying documents of interest. A profile of a subscriber is created based on information obtained about the subscriber. Subscriber-interest determination rules are used to identify potential topics of interest of the subscriber based on the subscriber's profile as well as based on external knowledge sources. Each potential interest of the subscriber may be represented by a pointer that references a concept. Additionally, concepts in the documents published by the publishers are identified. A comparison may be made between the concepts identified in the documents published by the publishers and those concepts representing the potential topics of interest of the subscriber. Those documents with matching concepts may then be identified as potentially being of interest to the subscriber. In this manner, documents of interest are more accurately identified for the document seeker. | 11-25-2010 |
20100299141 | Document Based Character Ambiguity Resolution - Methods and apparatus for document based ambiguous character resolution. An application searches a document for words that do not contain ambiguous characters and adds them to a dictionary, then searches the document for words that do contain ambiguous characters. For each ambiguous word, a set of candidate solutions is created by resolving the ambiguous characters in all possible ways. The dictionary is searched for words matching members of the candidate solution set. When a single member is matched, the ambiguous characters are resolved accordingly. When no member or more than one member is matched, a user is prompted to resolve the ambiguous characters. Alternatively, when more than one member is matched, the ambiguous characters are resolved to obtain the largest word, the smallest word, the most words, or the fewest words. | 11-25-2010 |
20100299142 | SYSTEM AND METHOD FOR SELECTING AND PRESENTING ADVERTISEMENTS BASED ON NATURAL LANGUAGE PROCESSING OF VOICE-BASED INPUT - A system and method for selecting and presenting advertisements based on natural language processing of voice-based inputs is provided. A user utterance may be received at an input device, and a conversational, natural language processor may identify a request from the utterance. At least one advertisement may be selected and presented to the user based on the identified request. The advertisement may be presented as a natural language response, thereby creating a conversational feel to the presentation of advertisements. The request and the user's subsequent interaction with the advertisement may be tracked to build user statistical profiles, thus enhancing subsequent selection and presentation of advertisements. | 11-25-2010 |
20100305941 | AUTOMATION OF AUDITING CLAIMS - Described are computer-based methods and apparatuses, including computer program products, for automation of auditing claims. Data indicative of an insurance company name is received, the data comprising one or more words. The data is processed through one or more processing steps to generate processed data comprising one or more processed words. One or more candidate word strings are selected based on the one or more processed words. Matching information is associated with each of the one or more candidate word strings. Analysis information is generated for each of the one or more candidate word strings based on the associated matching information. An insurance company identifier is associated with received data based on the analysis information and one or more matching rules. | 12-02-2010 |
20100305942 | METHOD AND APPARATUS FOR GENERATING A LANGUAGE INDEPENDENT DOCUMENT ABSTRACT - A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. A sentence containing the sequence of words is retrieved from the document if the sequence of words is a significant phrase. An abstract of the document is searched to determine if the sentence has been previously included in the abstract. If not, the sentence is added to the abstract. | 12-02-2010 |
20100312547 | CONTEXTUAL VOICE COMMANDS - Among other things, techniques and systems are disclosed for implementing contextual voice commands. On a device, a data item in a first context is displayed. On the device, a physical input selecting the displayed data item in the first context is received. On the device, a voice input that relates the selected data item to an operation in a second context is received. The operation is performed on the selected data item in the second context. | 12-09-2010 |
20100312548 | Querying Dialog Prompts - Implementations use hash values in proxy for images to enable aggregating of images for creating a knowledge base regarding certain images determined to be of interest. | 12-09-2010 |
20100318347 | Content-Based Audio Playback Emphasis - Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript. | 12-16-2010 |
20100318348 | APPLYING A STRUCTURED LANGUAGE MODEL TO INFORMATION EXTRACTION - One feature of the present invention uses the parsing capabilities of a structured language model in the information extraction process. During training, the structured language model is first initialized with syntactically annotated training data. The model is then trained by generating parses on semantically annotated training data enforcing annotated constituent boundaries. The syntactic labels in the parse trees generated by the parser are then replaced with joint syntactic and semantic labels. The model is then trained by generating parses on the semantically annotated training data enforcing the semantic tags or labels found in the training data. The trained model can then be used to extract information from test data using the parses generated by the model. | 12-16-2010 |
20100324888 | SOLVING CONSTRAINT SATISFACTION PROBLEMS FOR USER INTERFACE AND SEARCH ENGINE - A method for interpreting a Natural Language by an artificial construct using constraint satisfaction problem solving comprises a) providing a plurality of ways suitable to define at least a grammar for at least a Natural Language, b) providing a plurality of constraint satisfaction problem instructions, c) providing a plurality of values for solving a plurality of constraints, d) converting said plurality of constraints to at least one constraint satisfaction problem pattern, e) receiving a Natural Language construct, f) unifying said plurality of constraints through said at least one constraint satisfaction problem pattern at execution runtime by the artificial construct to solve the constraint satisfaction problem, g) interpreting said Natural Language construct according to a plurality of constraint satisfaction problem instructions, and h) answering to a Natural Language construct by a Natural Language construct. | 12-23-2010 |
20100324889 | ENABLING GLOBAL GRAMMARS FOR A PARTICULAR MULTIMODAL APPLICATION - Methods, apparatus, and computer program products are described for enabling global grammars for a particular multimodal application according to the present invention by loading a multimodal web page; determining whether the loaded multimodal web page is one of a plurality of multimodal web pages of the particular multimodal application. If the loaded multimodal web page is one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes loading any currently unloaded global grammars of the particular multimodal application identified in the multimodal web page and maintaining any previously loaded global grammars. If the loaded multimodal web page is not one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes unloading any currently loaded global grammars. | 12-23-2010 |
20100332217 | Method for text improvement via linguistic abstractions - This invention provides hierarchical, gradual and iterative methods, systems, and software for improving and correcting natural language text. The methods comprise the steps of applying natural language processing (NLP) algorithms to a corpus of sentences so as to abstract each sentence; applying scoring and linguistic annotation to each abstract sentence; applying NLP algorithms to abstract input sentences; applying search algorithms to match an abstract input sentence to at least one abstract corpus sentence; and applying NLP algorithms to adapt said matched abstract corpus sentence to the input sentence. | 12-30-2010 |
20100332218 | KEYWORD BASED MESSAGE HANDLING - An apparatus comprising a controller, wherein said controller is configured to display a message text, receive an input indicating a keyword, determine an associated operation, and generate a response message according to the associated operation. | 12-30-2010 |
20100332219 | METHOD AND APPARATUS FOR DETERMINING TEXT PASSAGE SIMILARITY - According to one embodiment of the invention, a method includes classifying a number of noun phrases in a first text passage and a second text passage into a number of classifications. The method also includes determining a similarity between a noun phrase from the first text passage and a noun phrase from the second text passage for each of the noun phrases of a same classification. Additionally, a similarity between a sentence from the first text passage and a sentence from the second text passage is determined for each of the sentences in the first and second text passages based on similarities between the noun phrases. The method also includes determining a similarity between the first text passage and the second text passage based on a similarity between sentences. | 12-30-2010 |
20110004462 | Generating Topic-Specific Language Models - Speech recognition may be improved by generating and using a topic specific language model. A topic specific language model may be created by performing an initial pass on an audio signal using a generic or basis language model. A speech recognition device may then determine topics relating to the audio signal based on the words identified in the initial pass and retrieve a corpus of text relating to those topics. Using the retrieved corpus of text, the speech recognition device may create a topic specific language model. In one example, the speech recognition device may adapt or otherwise modify the generic language model based on the retrieved corpus of text. | 01-06-2011 |
20110004463 | SYSTEMS AND METHODS FOR EXTRACTING PATTERNS FROM GRAPH AND UNSTRUCTURED DATA - A computing system receives input data having both graph and unstructured data and computes a current log likelihood of the input data. The computing system compares the current log likelihood with a previous log likelihood of the input data. If the current log likelihood is larger than the previous log likelihood, the computing system updates topic modeling parameters, community modeling parameters, and the link generation parameter until the computing system obtains a maximal value of the log likelihood of the input data. Then, the computing system creates a graph indicating topic similarity between the input data based on the topic modeling parameters, creates another graph indicating community similarity between entities associated with the input data based on the community modeling parameters, and predicts a link existence between input data or entities based on the link generation parameter, the topic modeling parameters and the community modeling parameters. | 01-06-2011 |
20110004464 | METHOD AND SYSTEM FOR SMART MARK-UP OF NATURAL LANGUAGE BUSINESS RULES - Smart Mark-up or highlighting delimits a rule using ontology technology to identify words and fields as objects and/or possible values in the rule. These technologies support the user in formalizing parts of the rules in a manner consistent with the system's data. | 01-06-2011 |
20110004465 | Computation and Analysis of Significant Themes - Systems and computer-implemented processes for computation and analysis of significant themes in a corpus of documents. The computation and analysis of significant themes can be executed on a processor and involves generating a lexical unit document association (LUDA) vector for each lexical unit that has been provided and quantifying similarities between each unique pair of lexical units. The LUDA vector characterizes a measure of association between its corresponding lexical unit and documents in the corpus. The lexical units can then be grouped into clusters such that each cluster contains a set of lexical units that are most similar as determined by the LUDA vectors and a predetermined clustering threshold. | 01-06-2011 |
20110010163 | METHOD, DEVICE, COMPUTER PROGRAM AND COMPUTER PROGRAM PRODUCT FOR PROCESSING LINGUISTIC DATA IN ACCORDANCE WITH A FORMALIZED NATURAL LANGUAGE - A method, device and computer program product for processing, in a computer system, linguistic data in accordance with a grammar of a Formalized Natural Language. The grammar of the Formalized Natural language is a text grammar operating on a set of texts of type Text. This text grammar is defined by a set of four elements W, N, R and Text. W is a finite set of invariable words of type Word, to be used as terminal, elementary expressions of a text. N is a finite set of non-terminal help symbols, to be used for the derivation and the representation of texts. R is a finite set of inductive rules for the production of grammatical expressions of the Formalized Natural Language, and Text is an element of N and start-symbol for grammatical derivation of all texts of type Text of the Formalized Natural Language. Linguistic data to be processed are acquired and processed in accordance with the Formalized Natural Language. A physical representation of a processed syntactic and semantic structure of the linguistic data is provided. | 01-13-2011 |
20110010164 | SYSTEM AND METHOD FOR GENERATING MANUALLY DESIGNED AND AUTOMATICALLY OPTIMIZED SPOKEN DIALOG SYSTEMS - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for generating a natural language spoken dialog system. The method includes nominating a set of allowed dialog actions and a set of contextual features at each turn in a dialog, and selecting an optimal action from the set of nominated allowed dialog actions using a machine learning algorithm. The method includes generating a response based on the selected optimal action at each turn in the dialog. The set of manually nominated allowed dialog actions can incorporate a set of business rules. Prompt wordings in the generated natural language spoken dialog system can be tailored to a current context while following the set of business rules. A compression label can represent at least one of the manually nominated allowed dialog actions. | 01-13-2011 |
20110010165 | APPARATUS AND METHOD FOR OPTIMIZING A CONCATENATE RECOGNITION UNIT - An apparatus and method for optimizing a concatenate recognition unit are provided. The apparatus and method of optimizing a concatenate recognition unit may generate an optimized concatenate recognition unit based on a basic language model generated using the concatenate recognition unit extracted from statistical information. | 01-13-2011 |
20110015921 | SYSTEM AND METHOD FOR USING LINGUAL HIERARCHY, CONNOTATION AND WEIGHT OF AUTHORITY - An authoring environment comprising a linguistic construction tool and method to allow qualitative search and representation of results that may use one or any combination of lingual hierarchy, connotation and weight of authority for constructing a multidimensional conceptual model applicable to one or more documents. The linguistic construction tool and method may be used to augment the authoring process and the resulting documents. The linguistic construction tool may also be used to perform search related activities. | 01-20-2011 |
20110029301 | METHOD AND APPARATUS FOR RECOGNIZING SPEECH ACCORDING TO DYNAMIC DISPLAY - A speech recognition apparatus and method that can improve speech recognition rate and recognition speed by reflecting information for dynamic display, are provided. The speech recognition apparatus generates a display variation signal indicating that variations have occurred on a screen and creates display information about the varied screen. The speech recognition apparatus adjusts a word weight for at least one word related to the varied screen and a domain weight for at least one domain included in the varied screen, according to the display variation signal and the display information. The adjusted word weight and the adjusted domain weight are dynamically reflected in a language model that is used for speech recognition. | 02-03-2011 |
20110029302 | METHOD AND SYSTEM FOR CANDIDATE MATCHING - A method and system for candidate matching, such as used in match-making services, assesses narrative responses to measure candidate qualities. A candidate database includes self-assessment data and narrative data. Narrative data concerning a defined topic is analyzed to determine candidate qualities separate from topical information. Candidate qualities thus determined are included in candidate profiles and used to identify desirable candidates. | 02-03-2011 |
20110029303 | WORD CLASSIFICATION SYSTEM, METHOD, AND PROGRAM - A word classification system is provided with an inter-word pattern learning section for learning at least either the context information or the layout information between classification-known words which co-appear and creating an inter-word pattern for determining whether data relating to a word pair which is a combination of words is data relating to a same-classification word pair which is the combination of words in the same classification or data relating to a different-classification word pair which is a combination of words in different classifications on the basis of the relationship between the classification-known words which co-appear in a document. | 02-03-2011 |
20110035208 | System and Method for Extracting Radiological Information Utilizing Radiological Domain Report Ontology and Natural Language Processing - A system and method that employs radiological report domain ontology and natural language processing to specify and model historical radiological information as knowledge is provided. The system and method trains a statistical probability based natural language processing system to recognize the semantics of a radiological domain. A methodology is provided to submit portions or the entire content of textual historical radiological report to a natural language processor wherein such data is interpreted and reported in a structured hierarchy. | 02-10-2011 |
20110035209 | Entry of text and selections into computing devices - Aids for improving the use of computing devices incorporating touch sensitive screens and other computing devices, including a method for correcting words incorrectly entered into a computing device which has the steps of: selecting as the word to be corrected one of the one or more words displayed on a computing device display screen during use of text entry software; entering text correction mode and leaving the text entry program; displaying the characters comprising the word to be corrected in such a way that each character can be selected individually by the user; selecting a character to be corrected or deleted, or a character adjacent where a missing character(s) will be inserted; correcting the character selected in the previous step (which can include deleting the character selected) or inserting a character(s); optionally repeating the last two steps to correct additional characters until the word selected to be corrected is changed to a corrected word to which no more changes or corrections need to be made; exiting correction mode and re-entering the text entry program; and replacing the word selected to be corrected with the corrected word. | 02-10-2011 |
20110035210 | CONDITIONAL RANDOM FIELDS (CRF)-BASED RELATION EXTRACTION SYSTEM - A system for extracting information from text, the system including parsing functionality operative to parse a text using a grammar, the parsing functionality including named entity recognition functionality operative to recognize named entities and recognition probabilities associated therewith and relationship extraction functionality operative to utilize the named entities and the probabilities to determine relationships between the named entities, and storage functionality operative to store outputs of the parsing functionality in a database. | 02-10-2011 |
20110040553 | NATURAL LANGUAGE PROCESSING - A method and system for computational interpretation of natural language, wherein an input string is received from input means. The input string is first tokenized for providing a list of words. Then the list of words is stemmed for providing the words in the root form. The stemmed list is then tagged for providing classification tags for each word, which allows generating the context sensitive information for each word. Lastly said tags are used for parsing the structural dependencies for each word. | 02-17-2011 |
20110040554 | Automatic Evaluation of Spoken Fluency - A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score to quantify the spoken fluency skills of the speakers. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker. | 02-17-2011 |
20110040555 | System and method for creating and playing timed, artistic multimedia representations of typed, spoken, or loaded narratives, theatrical scripts, dialogues, lyrics, or other linguistic texts - A system and method generate artistic multimedia representations of user-input texts, spoken or loaded narratives, theatrical scripts, or other linguistic corpus types, via a user interface, or batch interface, by classifying component words, and/or phrases into lexemes and/or parts of speech, and interpreting said classifications to construct playable structures. A database of natural language grammatical rules, a set of media objects, parameters, and rendering directives, and an algorithm facilitate the generation of sequential scenes from grammatical representations, convert user-input texts into playable structures of graphics, sounds, animations, and modifications, where playable structures may be combined to create a scene, or multiple scenes, and may be played in the order of occurrence in the input text as a sequential and timed multimedia representation of the input, and subsequently output, in real-time, or stored in memory for later output, via output devices such as a monitor and/or speakers. | 02-17-2011 |
20110046943 | METHOD AND APPARATUS FOR PROCESSING DATA - A data processing method and apparatus that may set emotion based on development of a story are provided. The method and apparatus may set emotion without inputting emotion for each sentence of text data. Emotion setting information is generated based on development of the story and the like, and may be applied to the text data. | 02-24-2011 |
20110046944 | PLAIN ENGLISH DOCUMENT TRANSLATION METHOD - A method for building a plain English guide containing translations of a complex document is provided. The method begins by creating a working template from an electronic copy of the complex document using a computing device. Next, the method displays the working template on a user interface connected to the computing device. After that, the method performs one or more translation actions. Then, the method selects one of a set of custom icons after performing each of the translation actions. Finally, the method produces the plain English guide using the computing device to generate a report. | 02-24-2011 |
20110054882 | MECHANISM FOR IDENTIFYING INVALID SYLLABLES IN DEVANAGARI SCRIPT - A mechanism for identifying invalid syllables in Devanagari script is disclosed. A method of embodiments of the invention includes receiving Devanagari text from an application of a computing device for parsing, determining a character type for a character of the Devanagari text, determining a new state associated with the character by referencing a Devanagari state machine with the determined character type and a current state of the Devanagari text, and transmitting an invalid syllable signal to the application for display on a display device to an end user of the application if the determined new state is invalid. | 03-03-2011 |
20110054883 | SPEECH UNDERSTANDING SYSTEM USING AN EXAMPLE-BASED SEMANTIC REPRESENTATION PATTERN - A speech understanding apparatus includes: a speech recognition unit for recognizing an input speech to produce a speech recognition result; a sentence analysis unit for performing morpheme analysis on a sentence corresponding to the speech recognition result, extracting additional information, and performing syntax analysis; a hierarchy describing unit for describing hierarchy of the sentence; a class transformation unit for performing class transformation on the sentence; a semantic representation determination unit for marking optional expressions for the sentence, deleting meaningless expressions and the additional information, converting the sentence into its base form, and deleting morphemic tags or symbols to determine a semantic representation; a semantic representation retrieval unit for retrieving the determined semantic representation from an example-based semantic representation pattern database; and a retrieval result processing unit for selectively producing a retrieved semantic representation. | 03-03-2011 |
20110054884 | SYSTEM FOR ASSISTING IN DRAFTING APPLICATIONS - This invention relates to a Method and System for assisting in drafting applications comprising a server ( | 03-03-2011 |
20110060584 | ERROR CORRECTION USING FACT REPOSITORIES - The disclosed system and method apply stores of factual information to correct errors in digital text, for example, generated from OCR, speech and/or handwriting recognition devices, and other automatic recognition devices. A text produced by OCR, speech recognition, handwriting recognition, and others may be processed to extract discussed facts. Databases of facts are searched based on information in the text. After comparing facts asserted in the text with the factual data from the databases, suggested corrections of the text are produced. | 03-10-2011 |
20110060585 | INPUTTING METHOD BY PREDICTING CHARACTER SEQUENCE AND ELECTRONIC DEVICE FOR PRACTICING THE METHOD - The present invention relates to a method of predicting and entering a character string and an electronic device in which the method is implemented. The method of predicting and entering a character string includes a step (S | 03-10-2011 |
20110066424 | Text Stitching From Multiple Images - A reading machine has processing for detecting common text between a pair of individual images. The reading machine combines the text from the pair of images into a file or data structure if common text is detected, and determines if incomplete text phrases are present in the common text. If incomplete text phrases are present, the machine signals a user to move an image input device in a direction to capture more of the text. | 03-17-2011 |
20110071819 | APPARATUS, SYSTEM, AND METHOD FOR NATURAL LANGUAGE PROCESSING - Various embodiments are described for searching and retrieving documents based on a natural language input. A computer-implemented natural language processor electronically receives a natural language input phrase from an interface device. The natural language processor attributes a concept to the phrase with the natural language processor. The natural language processor searches a database for a set of documents to identify one or more documents associated with the attributed concept to be included in a response to the natural language input phrase. The natural language processor maintains the concepts during an interactive session with the natural language processor. The natural language processor resolves ambiguous input patterns in the natural language input phrase with the natural language processor. The natural language processor includes a processor, a memory and/or storage component, and an input/output device. | 03-24-2011 |
20110077936 | SYSTEM AND METHOD FOR GENERATING VOCABULARY FROM NETWORK DATA - A method is provided in one example and includes receiving data propagating in a network environment and separating the data into one or more fields. At least some of the fields are evaluated in order to identify nouns and noun phrases within the fields. The method also includes identifying selected words within the nouns and noun phrases based on a whitelist and a blacklist. The whitelist includes a plurality of designated words to be tagged and the blacklist includes a plurality of rejected words that are not to be tagged. A resultant composite is generated for the selected nouns and noun phrases that are tagged. The resultant composite is incorporated into the whitelist if the resultant composite is approved. | 03-31-2011 |
20110082686 | REDUCED KEYBOARD SYSTEM AND A METHOD FOR GLOBAL DISAMBIGUATION - A reduced keyboard system for text input comprising: a first keyboard having a first plurality of keys, the keys being adapted to be keystroked for input of a word; a virtual keyboard having a plurality of virtual keys, the plurality of virtual keys corresponding respectively to the first plurality of keys and wherein the virtual keyboard is adapted to generate a linear pattern from the keystroked keys of the first keyboard; and a dictionary database associated with the virtual keyboard, the dictionary database having a plurality of classes wherein each of the classes contains at least one candidate word having first and last letters corresponding to predetermined keys of the virtual keyboard, wherein the linear pattern and dictionary database are adapted to enable recognition and disambiguation of the inputted word. | 04-07-2011 |
20110082687 | METHOD AND SYSTEM FOR TAKING ACTIONS BASED ON ANALYSIS OF ENTERPRISE COMMUNICATION MESSAGES - A computer-based system receives and analyzes digital communication between at least one party in a business enterprise and another party using a natural language analyzer to extract meanings from the message. The system includes a database storing specific actions to be taken upon the detection of specified meanings in such communications. Certain actions may require the system to interrogate the enterprise computer system's database to locate the existence or nature of specified data. The directed actions take the form of communications within an enterprise to assist activities related to the analyzed digital communication. | 04-07-2011 |
20110082688 | Apparatus and Method for Analyzing Intention - An apparatus and system for analyzing intention are provided. The apparatus for analyzing an intention applies a context-free grammar to each of one or more sentences in units of one or more phrases to perform phrase spotting on each sentence, thereby extending a recognition range for an out-of-grammar (OOG) expression. Meanwhile, the apparatus for analyzing an intention determines whether sentences that have undergone phrase spotting are grammatically valid by applying a dependency grammar to the sentences to filter an invalid sentence, and generates the intention analysis result of a valid sentence, thereby grammatically and/or semantically verifying a sentence that has undergone speech recognition while extending a speech recognition range. | 04-07-2011 |
20110087482 | Method for identifying and manipulating language information - A preferred method and methods for manipulating linguistic information in grammatical disarray are disclosed. In a preferred method, a plurality of word elements in sequential order from a data corpus are analyzed with a conceptual-grammatical relational protocol such as CIRN producing an unsuccessful outcome; wherein said unsuccessful outcome involves the failure of forming an association or failure of identifying an association between said word elements. Then, the word elements are shuffled, forming different sequential orders to later be reanalyzed with same or other conceptual-grammatical relational protocols until a successful outcome is attained; wherein a successful outcome includes at least one of: an association between said word elements, and an identification of an association between said word elements. | 04-14-2011 |
20110087483 | EMOTION ANALYZING METHOD, EMOTION ANALYZING SYSTEM, COMPUTER READABLE AND WRITABLE RECORDING MEDIUM AND EMOTION ANALYZING DEVICE - A system for analyzing a sentence emotion is provided. The system comprises a case repository, an input module, a sentence structure analyzing module, a similarity analyzing module and an emotion detection module. The case repository stores several case sentences and each case sentence comprises at least one major term and is corresponding to at least one emotion annotation. The input module receives an input sentence and the sentence structure analyzing module analyzes a sentence structure of the input sentence. The similarity analyzing module performs a semantic analysis and a syntax analysis according to the sentence structure to obtain a similarity level between the input sentence and each of the case sentences. The emotion detection module detects at least one emotion of the input sentence according to the similarity level between the input sentence and each of the case sentences. | 04-14-2011 |
20110087484 | APPARATUS AND METHOD FOR DETECTING SENTENCE BOUNDARIES - Provided are an apparatus and a method for detecting sentence boundaries. The apparatus includes a sentence boundary candidate extracting unit, a document context analyzing unit, a sentence boundary candidate classifying unit, and a sentence generating unit. The sentence boundary candidate extracting unit extracts a sentence boundary candidate from an input document. The document context analyzing unit extracts features from information on preceding and following contexts of the sentence boundary candidate. The features are used in two or more statistical algorithms. The sentence boundary candidate classifying unit classifies whether the sentence boundary candidate is a sentence boundary or not, using the features and the two or more statistical algorithms. The sentence generating unit extracts sentence units from the document based on a result of the classification of whether the sentence boundary candidate is a sentence boundary or not. | 04-14-2011 |
20110087485 | NET MODERATOR - A method and an apparatus for moderating an inappropriate relationship between two parties by analyzing a dialog between the two parties. The method and apparatus creates an alert depending upon the nature of the dialog between the two parties. The alert is sent to a third party who can moderate the relationship between the two parties. The third party can ban or block the dialog between the two parties based upon the inappropriate relationship between the two parties. A banning or block of the dialog between the two parties can also be automated. | 04-14-2011 |
20110087486 | SYSTEM, REPORT, AND METHOD FOR GENERATING NATURAL LANGUAGE NEWS-BASED STORIES - The present invention generally relates to a system, report, and method for automatically generating a series of natural language news-based stories to be presented via a digital interface or printed publication to a portfolio user. The disclosure relates to a filter or selection of a handful of relevant and desired financial instruments, or events created in a large group of events such as sports results, travel information, auction related data, online shopping tools, social media, retail store promotion generation, search engine daily report, etc. for a specific use. These financial instruments, based on different selections from a portfolio manager via a management tool, are then used to either produce a strategies page where a list of useful covered call trades and hedged trades is displayed in the form of a table, or natural language news-based stories relating to a selected list of financial instruments found in a portfolio. The events are based on different selections from a portfolio manager via a management tool and are then used to either produce a secondary page where a list of the selected event data is displayed or natural language news-based stories relating to a selected list of events found in a portfolio from a large event database. | 04-14-2011 |
20110093256 | Method of disambiguating information - A preferred method and system for disambiguating information are disclosed. In a preferred method, the word elements of homonyms form a plurality of information corpuses which are analyzed by conceptual and/or grammatical relational analysis such as CIRN for identifying successful outcomes and unsuccessful outcomes; wherein said successful outcome involves the proper grammatical classification of the homonym, thus leading to identification of the correct meaning. | 04-21-2011 |
20110093257 | Information retrieval through indentification of prominent notions - A system and method for information retrieval from a corpus of text based on offline prominent sentences extraction, and online prominent sentences retrieval ordered by predefined criteria, and recommending online cross-interest prominent sentences. | 04-21-2011 |
20110093258 | SYSTEM AND METHOD FOR TEXT CLEANING - A method and system for cleaning an electronic document are provided. The method comprises: identifying at least one sentence in the electronic document; numerically representing features of the sentence to obtain a numeric feature representation associated with the sentence; inputting the numeric feature representation into a machine learning classifier, the machine learning classifier being configured to determine, based on each numeric feature representation, whether the sentence associated with that numeric feature representation is a bad sentence; and removing sentences determined to be bad sentences from the electronic document to create a cleaned document. | 04-21-2011 |
20110093259 | METHOD AND DEVICE FOR GENERATING VOCABULARY ENTRY FROM ACOUSTIC DATA - A method and a device ( | 04-21-2011 |
20110099001 | System for extracting information from a natural language text - In the method of extraction, the words of the text are encoded by comparing them with the contents of a lexicon of tool words (essentially articles, prepositions, conjunctions, and verbal auxiliaries), and nominal groups are then identified by searching subsets of the resulting succession of encoded words to look for groups of encoded words that comply with predefined syntactical rules. | 04-28-2011 |
20110099002 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - An information processing apparatus includes a holding unit configured to hold, in advance, presentation data to be presented to a person; a detection unit configured to detect, in a captured image obtained by capturing an image of a photographic subject, the photographic subject; a reading unit configured to read presentation data associated with a detection result of the photographic subject from among items of presentation data held in advance; and an output unit configured to output the read presentation data. | 04-28-2011 |
20110099003 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - An information processing apparatus includes a category classifying unit configured to classify a document into one or more categories, a word extracting unit configured to extract one or more words from the document, a word score calculating unit configured to calculate a word score for each of the one or more words extracted from the document on the basis of an appearance frequency of the word in each of the one or more categories, the word score serving as an index of interest of the word, a sentence-for-computation extracting unit configured to extract one or more sentences from the document, and a sentence score calculating unit configured to calculate a sentence score for each of the extracted one or more sentences on the basis of the word score calculated by the word score calculating unit, the sentence score serving as an index of interest of the sentence. | 04-28-2011 |
20110106526 | METHOD AND APPARATUS FOR PRUNING SIDE INFORMATION FOR GRAMMAR-BASED COMPRESSION - A computer-implemented method for generating side information for grammar-based data compression systems, such as YK compression systems, is described. An admissible grammar (G) for an input sequence (A(s | 05-05-2011 |
20110106527 | Method and Apparatus for Adapting a Voice Extensible Markup Language-enabled Voice System for Natural Speech Recognition and System Response - A system for analyzing natural language spoken through a voice recognition system comprising: a language separator for separating a natural language expression into multiple word segments; and a grammar module for creating XML-based description sets or binary sets using word segments as input. In a preferred embodiment, the word segments are further processed as class objects and then organized according to original spoken order and wherein content fields are created to contain the class objects for comparison during voice interaction using the voice recognition system. | 05-05-2011 |
20110106528 | Method and System to Automatically Change or Update the Configuration or Setting of a Communication System - A method and device for automatically changing or updating a configuration or setting of a communication system is disclosed. In one aspect, the method includes providing information to the communication system, the information comprising natural human language, storing the information in a digital storage device, detecting a triggering event in the information, and changing the configuration or setting of the communication system automatically using a processor. The information is an input to the communication system, an input from at least one alternate communication system, or a combination of an input to the communication system and an input from the at least one alternate communication system. | 05-05-2011 |
20110112823 | Ellipsis and movable constituent handling via synthetic token insertion - Movable and elliptic constituents are handled in a parser by inserting synthetic tokens that do not occur in the input. Parser actions can push a syntax tree or semantic value to be realized later as a synthetic token, and some synthetic tokens (for cataphoric ellipsis) may be inserted without a prior push but require a later definition. At clause boundary it may be checked that all mandatory tokens have been inserted. | 05-12-2011 |
20110112824 | DETERMINING AT LEAST ONE CATEGORY PATH FOR IDENTIFYING INPUT TEXT - In a method of determining at least one category path for identifying an input text, one or more categories that are most relevant to the input text are determined, one or more concepts that are most relevant to the input text using information from a labeled text data source and the one or more categories determined to be the most relevant to the input text are determined, and one or more category paths through a hierarchy of predefined category levels are determined for one or more of the determined concepts. | 05-12-2011 |
20110112825 | SENTIMENT PREDICTION FROM TEXTUAL DATA - A semantically organized domain space is created from a training corpus. Affective data are mapped onto the domain space to generate affective anchors for the domain space. A sentiment associated with an input text is determined based on the affective anchors. A speech output may be generated from the input text based on the determined sentiment. | 05-12-2011 |
20110112826 | SYSTEM AND METHOD FOR SIMULATING EXPRESSION OF MESSAGE - A system and a method for simulating expression of a message are provided. The system comprises a network platform end and at least one user end. The network platform end comprises a message capturing module for capturing a user message; a feature analyzing module for performing a characteristic analysis on content of the user message, so as to mark at least one simulation action tag on the message content; and a simulation message generating module for acquiring simulation instructions corresponding to the at least one simulation action tag and combining the same with the message content to generate a simulation message. The user end comprises a user device for receiving the simulation message and outputting the message content and simulation instructions contained in the simulation message to a simulation device; and the simulation device for playing the received message content and executing corresponding simulation instructions. | 05-12-2011 |
20110112827 | SYSTEM AND METHOD FOR HYBRID PROCESSING IN A NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT - A system and method for hybrid processing in a natural language voice services environment that includes a plurality of multi-modal devices may be provided. In particular, the hybrid processing may generally include the plurality of multi-modal devices cooperatively interpreting and processing one or more natural language utterances included in one or more multi-modal requests. For example, a virtual router may receive various messages that include encoded audio corresponding to a natural language utterance contained in a multi-modal interaction provided to one or more of the devices. The virtual router may then analyze the encoded audio to select a cleanest sample of the natural language utterance and communicate with one or more other devices in the environment to determine an intent of the multi-modal interaction. The virtual router may then coordinate resolving the multi-modal interaction based on the intent of the multi-modal interaction. | 05-12-2011 |
20110112828 | Handheld Electronic Device with Text Disambiguation - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device. | 05-12-2011 |
20110119047 | Joint disambiguation of the meaning of a natural language expression - At least two ambiguous aspects of the meaning of a natural language expression are disambiguated jointly. In the preferred embodiment, word sense ambiguity, reference ambiguity, and relation ambiguity are resolved simultaneously, finding the disambiguation result(s) that simultaneously optimize the weight of the solution, taking into account semantic information, constraints, and common sense knowledge. Choices are enumerated for each constituent being disambiguated, combinations of choices are constructed and evaluated according to semantic information on which meanings are sensible, and the choices with the best weights are selected, with the enumeration pruned aggressively to reduce computational cost. | 05-19-2011 |
20110119048 | SYSTEM AND METHOD FOR COMPUTERIZED PSYCHOLOGICAL CONTENT ANALYSIS OF COMPUTER AND MEDIA GENERATED COMMUNICATIONS TO PRODUCE COMMUNICATIONS MANAGEMENT SUPPORT, INDICATIONS AND WARNINGS OF DANGEROUS BEHAVIOR, ASSESSMENT OF MEDIA IMAGES, AND PERSONNEL SELECTION SUPPORT - At least one computer-mediated communication produced by or received by an author is collected and parsed to identify categories of information within it. The categories of information are processed with at least one analysis to quantify at least one type of information in each category. A first output communication is generated regarding the at least one computer-mediated communication, describing the psychological state, attitudes or characteristics of the author of the communication. A second output communication is generated when a difference between the quantification of at least one type of information for at least one category and a reference for the at least one category is detected involving a psychological state, attitude or characteristic of the author to which a responsive action should be taken. The content of the second output communication and the at least one category are programmable to define a psychological state, attitude or characteristic in response to which an action should be taken and the action that is to be taken in response. | 05-19-2011 |
20110119049 | Specializing disambiguation of a natural language expression - Disambiguation of the meaning of a natural language expression proceeds by constructing a natural language expression, and then incrementally specializing the meaning representation to more specific meanings as more information and constraints are obtained, in accordance with one or more specialization hierarchies between semantic descriptors. The method is generalized to disjunctive sets of interpretations that can be specialized hierarchically. | 05-19-2011 |
20110119050 | Method for the automatic determination of context-dependent hidden word distributions - Described is a method, the Latent Words Language Model (LWLM), that automatically determines context-dependent word distributions (called hidden or latent words) for each word of a text. The probabilistic word distributions reflect the probability that another word of the vocabulary of a language would occur at that position in the text. Furthermore, a method is described to use these word distributions in statistical language processing applications, such as information extraction applications (for example, semantic role labeling, named entity recognition), automatic machine translation, textual entailment, paraphrasing, information retrieval, and speech recognition. | 05-19-2011 |
20110125487 | Joint disambiguation of syntactic and semantic ambiguity - Ambiguities in a natural language expression are interpreted by jointly disambiguating multiple alternative syntactic and semantic interpretations. More than one syntactic alternative, represented by parse contexts, are analyzed together with joint analysis of referents, word senses, relation types, and layout of a semantic representation for each syntactic alternative. Best combinations of interpretations are selected from all participating parse contexts, and are used to form parse contexts for the next step in parsing. | 05-26-2011 |
20110131033 | Weight-Ordered Enumeration of Referents and Cutting Off Lengthy Enumerations - In many reference resolution problems there are many candidate referents, and the overhead of enumerating them can be considerable. The overhead is reduced by stopping enumeration before all candidate referents have been enumerated, utilizing the properties of ordered and semi-ordered enumerators. Converting semi-ordered enumerators into ordered enumerators and combining several ordered enumerators into a single enumerator using dynamic weightings for handling determiner interpretations are disclosed. | 06-02-2011 |
20110131034 | METHOD, A COMPUTER PROGRAM AND APPARATUS FOR PROCESSING A COMPUTER MESSAGE - Embodiments of the invention provide a method, computer program and apparatus for processing a computer message, the method comprising: upon receipt of a computer message at a computer, classifying the computer message and assigning it a message cluster identification in dependence thereon; and, utilising a message template to trans-denotate the message, wherein the message template is selected in dependence on the message cluster identification. | 06-02-2011 |
20110131035 | HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT EMPLOYING DIFFERENT GROUPINGS OF DATA SOURCES TO DISAMBIGUATE DIFFERENT PARTS OF INPUT - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to generate compound language solutions by employing different groupings of data sources to generate different portions of the compound language solutions. | 06-02-2011 |
20110131036 | SYSTEM AND METHOD OF SUPPORTING ADAPTIVE MISRECOGNITION IN CONVERSATIONAL SPEECH - A system and method are provided for receiving speech and/or non-speech communications of natural language questions and/or commands and executing the questions and/or commands. The invention provides a conversational human-machine interface that includes a conversational speech analyzer, a general cognitive model, an environmental model, and a personalized cognitive model to determine context, domain knowledge, and invoke prior information to interpret a spoken utterance or a received non-spoken message. The system and method creates, stores, and uses extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech or non-speech communication and presenting the expected results for a particular question or command. | 06-02-2011 |
20110137638 | ROBUST SPEECH RECOGNITION BASED ON SPELLING WITH PHONETIC LETTER FAMILIES - A system and method for entering a destination into a navigation system, usually a vehicle navigation system, that uses phonetic letter families, or groups of letters which sound similar, to improve the reliability and accuracy of speech recognition. The method involves grouping each letter of the English alphabet into a family of letters which sound similar, such as A, J, and K. When a destination name is spelled by a user, each letter is recognized in terms of the phonetic letter family to which it belongs. This phonetic equivalent spelling is compared to the navigation database of street, city, and state names, which has also been converted to its phonetic equivalent spelling. If a match is found, the user is asked to confirm that this is the desired destination. | 06-09-2011 |
20110137639 | ADAPTING A LANGUAGE MODEL TO ACCOMMODATE INPUTS NOT FOUND IN A DIRECTORY ASSISTANCE LISTING - A statistical language model is trained for use in a directory assistance system using the data in a directory assistance listing corpus. Calculations are made to determine how important words in the corpus are in distinguishing a listing from other listings, and how likely words are to be omitted or added by a user. The language model is trained using these calculations. | 06-09-2011 |
20110137640 | Handheld Electronic Device With Reduced Keyboard and Associated Method of Providing Quick Text Entry in a Message - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message. | 06-09-2011 |
20110137641 | INFORMATION ANALYSIS DEVICE, INFORMATION ANALYSIS METHOD, AND PROGRAM - An information analysis device ( | 06-09-2011 |
20110144975 | TYPEWRITER SYSTEM AND TEXT INPUT METHOD USING MEDIATED INTERFACE DEVICE - Disclosed is a typewriter system and a text input method capable of accurately recognizing words by correcting words input using a mediated interface device based on a dictionary. A plurality of texts are combined by referencing a text recognition order set in which recognition results of texts are arranged according to a recognition order from texts input through the mediated interface device and the combined text is filtered using part index maps formed of part words that are an accumulated set of texts forming complete words. The part words passing through the part index maps are again filtered using a dictionary including context information formed of a set of words in a specific category, thereby making it possible to accurately recognize the words. The part words that cannot form words in a dictionary are removed in advance using the part index maps, thereby improving the recognition efficiency. | 06-16-2011 |
20110144976 | APPLICATION USER INTERFACE SYSTEM AND METHOD - The application user interface system and method enables users to give instructions to business software applications by simply entering text in an interface bar in the most intuitive verbal or written human way for carrying out a particular task. The business application, in turn, processes the instruction and presents the summary for review to the users. Upon approval by user, the particular task or request gets executed. | 06-16-2011 |
20110144977 | WRITTEN EXPRESSION DEVELOPMENT SYSTEM - A system is disclosed for developing writing skills, and more particularly, developing written expression skills. The system may include various types of instruction sheets arranged in order of increasing complexity to aid the user in written expression and creative writing skill development. | 06-16-2011 |
20110153310 | MULTIMODAL AUGMENTED REALITY FOR LOCATION MOBILE INFORMATION SERVICE - In one or more embodiments, one or more methods and/or systems described can perform producing a lattice of object hypotheses based on multiple reference objects from image information; receiving input speech information that includes a request for information associated with at least one reference object of the multiple reference objects; producing a lattice of speech hypotheses based on at least a first possible description included in the speech information; producing a lattice of scored semantic hypotheses based on at least the lattice of object hypotheses and the lattice of speech hypotheses; determining that a single semantic interpretation score of the lattice of scored semantic hypotheses exceeds a predetermined value; and providing requested information associated with the at least the first reference object of the plurality of reference objects. | 06-23-2011 |
20110153311 | Method and an apparatus for automatically providing a common modelling pattern - At least one embodiment of the present invention is directed to a method and/or an apparatus for automatically providing a common modelling pattern as a function of a plurality of stored process models. The common modelling patterns are identified according to three substeps, namely semantic annotation, extraction of pattern based description and composite process pattern mining. The detected common modelling patterns serve as best practice candidates as regards process engineering. At least one embodiment of the present invention finds application in a variety of domains being related to process management, such as process design, process mining and semantic process planning. | 06-23-2011 |
20110153312 | METHOD AND COMPUTER SYSTEM FOR AUTOMATICALLY ANSWERING NATURAL LANGUAGE QUESTIONS - A computer system and method for automatically answering natural language questions. The system comprises an input to receive said natural language questions; a data store to record linked pairs of questions and corresponding answers; a matcher configured to compare a said received natural language question with said linked question and answer pairs and an output to transfer a said received natural language question to a researcher if no matches are found. The system may further comprise a system to link pairs of questions and corresponding answers into groups, to enable the generation of a prototypical answer for each group of pairs of questions and answers and to store said prototypical answers in said data store; wherein said matcher compares a said received natural language question with a question in said data store having an associated prototypical answer and outputs said associated prototypical answer for said question in response to said matching. Alternatively, said matcher may be configured to output all linked question and answer pairs which match said received natural language question. The system may be further adapted to distribute natural language questions to be answered to researchers by assigning unpopularity scores to each of said natural language questions. | 06-23-2011 |
20110161068 | SYSTEM AND METHOD OF USING A SENSE MODEL FOR SYMBOL ASSIGNMENT - Systems and methods for automatically discovering and assigning symbols for identified text in a software application include receiving electronic signals from an input device indicating identified text for which symbol assignment is desired. Additional information such as part of speech, additional words, context of use, etc. may also be provided. The identified text and optional additional information is analyzed to establish a mapping of the identified text to one or more identified word senses from a word sense model database. An electronic determination of whether any of the identified word senses has an associated symbol is conducted. Related word senses may also be analyzed to determine if any related word senses have symbols. One of the determined symbols may then be associated with the identified text such that the symbol is thereafter displayed in conjunction with or instead of the text in the application. | 06-30-2011 |
20110161069 | METHOD, COMPUTER PROGRAM PRODUCT AND APPARATUS FOR PROVIDING A THREAT DETECTION SYSTEM - An apparatus for providing a threat detection system may include a processor configured at least to perform parsing data to identify terms included in a lexicon of multi-dimensional threat factors, generating scoring results for at least some of the terms, and providing a graphical display of at least some of the terms based on the scoring results. A corresponding method and computer program product are also provided. | 06-30-2011 |
20110161070 | PRE-HIGHLIGHTING TEXT IN A SEMANTIC HIGHLIGHTING SYSTEM - A method, computer system and/or computer program product pre-highlight text that is located in a search. A text highlight and a triple statement semantic annotation based on the text highlight of a first document are received. The triple statement semantic annotation comprises a subject, a relationship and an object. A natural language processing (NLP) pattern based on the triple statement semantic annotation is generated. The NLP pattern is representative of a linguistic pattern between the text highlight and the triple statement semantic annotation. A multi-dimensional linguistic profile is generated based on the text highlight, the triple statement semantic annotation and the NLP pattern, wherein the multi-dimensional linguistic profile defines entities, relationships and attributes associated with document text. Text in a second document is compared with the multi-dimensional linguistic profile, and text in the second document is highlighted based on the comparison. | 06-30-2011 |
20110161071 | SYSTEM AND METHOD FOR DETERMINING SENTIMENT EXPRESSED IN DOCUMENTS - A system, computer readable storage medium storing instructions, and computer-implemented method for determining sentiment expressed in documents is disclosed. A document is received from a plurality of documents. A sentence in the document that includes at least one sentiment signature within a predetermined distance of at least one keyword from a list of keywords is identified, wherein the list of keywords is extracted from the plurality of documents and is filtered using a phase transition formula, and wherein the at least one sentiment signature corresponds to an expression of at least one sentiment in the sentence. At least one category corresponding to the at least one keyword of the sentence is determined, wherein the at least one category is included in a list of categories that is generated using the list of keywords. At least one sentiment corresponding to the at least one category is determined based on the at least one sentiment signature. | 06-30-2011 |
20110161072 | LANGUAGE MODEL CREATION APPARATUS, LANGUAGE MODEL CREATION METHOD, SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION METHOD, AND RECORDING MEDIUM - A frequency counting unit ( | 06-30-2011 |
20110166850 | CROSS-GUIDED DATA CLUSTERING BASED ON ALIGNMENT BETWEEN DATA DOMAINS - A system and associated method for cross-guided data clustering by aligning target clusters in a target domain to source clusters in a source domain. The cross-guided clustering process takes the target domain and the source domain as inputs. A common word attribute shared by both the target domain and the source domain is a pivot vocabulary, and all other words in both domains are a non-pivot vocabulary. The non-pivot vocabulary is projected onto the pivot vocabulary to improve measurement of similarity between data items. Source centroids representing clusters in the source domain are created and projected to the pivot vocabulary. Target centroids representing clusters in the target domain are initially created by conventional clustering method and then repetitively aligned to converge with the source centroids by use of a cross-domain similarity graph that measures a respective similarity of each target centroid to each source centroid. | 07-07-2011 |
20110166851 | Word-Level Correction of Speech Input - The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word. | 07-07-2011 |
20110166852 | DIALOGUE SYSTEM USING EXTENDED DOMAIN AND NATURAL LANGUAGE RECOGNITION METHOD AND COMPUTER-READABLE MEDIUM THEREOF - A dialogue system uses an extended domain in order to have a dialogue with a user using natural language. If a dialogue pattern actually input by the user is different from a dialogue pattern predicted by an expert, an extended domain generated in real time based on user input is used and an extended domain generated in advance is used to have a dialogue with the user. | 07-07-2011 |
20110166853 | Method and System for Text Retrieval for Computer-Assisted Item Creation - A tool, method, and system for use in the development of sentence-based test items are disclosed. The tool may include a user interface that may include a database selection field, a sentence pattern entry field, an option pane, and an output pane. The tool may search a database for one or more sentences and may generate one or more responses to the one or more sentences. The one or more sentences and one or more responses may be used to produce the sentence-based test items. The tool may allow test items to be developed more quickly and easily than manual test item authoring. Accordingly, test item development costs may be lowered and test security may be enhanced. | 07-07-2011 |
20110172988 | ADAPTIVE CONSTRUCTION OF A STATISTICAL LANGUAGE MODEL - A statistical language model (SLM) may be iteratively refined by considering N-gram counts in new data, and blending the information contained in the new data with the existing SLM. A first group of documents is evaluated to determine the probabilities associated with the different N-grams observed in the documents. An SLM is constructed based on these probabilities. A second group of documents is then evaluated to determine the probabilities associated with each N-gram in that second group. The existing SLM is then evaluated to determine how well it explains the probabilities in the second group of documents, and a weighting parameter is calculated from that evaluation. Using the weighting parameter, a new SLM is then constructed as a weighted average of the existing SLM and the new probabilities. | 07-14-2011 |
20110172989 | INTELLIGENT AND PARSIMONIOUS MESSAGE ENGINE - A message engine for analyzing or examining a message and generating a textual description of the message. The message engine can provide a textual description of a voice message. The message engine does not present a speech to text conversion of the complete voice message (that is, it does not convert the entire message to text and present the textual version of the entire voice message to the user). Rather, the message engine presents only the conceptual key words that describe the essence of the voice message to the user. As such, the message engine is a more intelligent version of a speech-to-text convertor. An exemplary message engine will only present in text the key conceptual words of the message rather than the entire speech to text translation of the whole message. | 07-14-2011 |
20110172990 | Knowledge Utilization - Data is organized in a knowledge network by defining a set of nodes, each node comprising data describing knowledge and a task pertinent to the knowledge, and defining relationships between the nodes based on the data. | 07-14-2011 |
20110172991 | SENTENCE EXTRACTING METHOD, SENTENCE EXTRACTING APPARATUS, AND NON-TRANSITORY COMPUTER READABLE RECORD MEDIUM STORING SENTENCE EXTRACTING PROGRAM - A sentence similar to the sampling sentence group can be efficiently extracted from the extraction target sentence group by repeating the process of narrowing a plurality of pairs of morphemes extracted from the sampling sentence group in the order of closer number of higher similarity to the extraction target sentence including each pair of morphemes. | 07-14-2011 |
20110178793 | DIALOGUE ANALYZER CONFIGURED TO IDENTIFY PREDATORY BEHAVIOR - A dialogue analyzer configured to identify online communications relating to lewd, predatory, hostile, and/or otherwise inappropriate subject matter is disclosed. Identified communications include those occurring via social networks, instant messaging, online chat rooms, computer in-game chat, email and the like. The communications of a monitored computer user are scanned to identify those communications that match predetermined lexical rules. The rules comprise sets of word-concepts that may be associated based on spelling, sound, meaning, appearance or probability of appearance in a text string, etc. Various numbers and configurations of word concepts may be implemented in a rule in order to more accurately scan the online communication data for a potential match. When a match is found, a copy of the communication, along with contextual information, is presented to a parent or guardian user. This information is presented at a central website and via an email notification to the parent or guardian. Various embodiments are described. | 07-21-2011 |
20110178794 | METHODS AND SYSTEMS FOR INTERPRETING TEXT USING INTELLIGENT GLOSSARIES - A computer implemented method used to interpret text, including from a set of formal glossaries which may refer one to the other and are intended to define precisely the terminology of a field of endeavor. Such glossaries are known as intelligent, in the sense that they allow machines to make deductions, without the need for human intervention. However, they may also accept human intervention. Once a word is defined in an intelligent glossary, all the logical consequences of the use of that word in a formal and well-formed sentence are computable. The process includes a question and answer mechanism, which applies the definitions contained in the intelligent glossaries to a given formal sentence. The methods may be applied in the development of knowledge management methods and tools that are based on semantics; for example: modeling of essential knowledge in the field based on the relevant semantics. | 07-21-2011 |
20110184724 | SPEECH RECOGNITION - Presented is a method and system for speech recognition. The method includes determining noise level in an environment, comparing the determined noise level with a predetermined noise level threshold value, using a first set of grammar for speech recognition, if the determined noise level is below the predetermined noise level threshold value, and using a second set of grammar for speech recognition, if the determined noise level is above the predetermined noise level threshold value. | 07-28-2011 |
20110184725 | Multi-stage text morphing - This invention is a multi-stage method for “text morphing,” wherein text morphing involves integrating or blending together substantive content from two or more bodies of text into a single body of text based on locations of linguistic commonality among the two or more bodies of text. This method for multi-stage text morphing entails: substitution of phrase synonyms between two bodies of text; substitution, between two bodies of text, of text segments with synonymous starting phrases and synonymous ending phrases; and substitution, between two bodies of text, of phrases or segments using associations within a larger reference body of text. Text morphing as disclosed herein can be useful for creative ideation, product development, integrative search engines, and entertainment purposes. | 07-28-2011 |
20110184726 | Morphing text by splicing end-compatible segments - This invention is a method for “text morphing,” wherein text morphing involves integrating or blending together substantive content from two or more bodies of text into a single body of text based on locations of linguistic commonality among the two or more bodies of text. This method entails: identifying pairs of “Synonym-Different-Synonym” (SDS) text segments between an import body of text and an export body of text; and, for each selected pair of SDS text segments, substituting some or all of the SDS text segment from the export body of text for some or all of the SDS text segment in the import body of text. In some respects, this method is analogous to splicing and substituting gene segments with compatible starting and ending sequences, but different middle sequences. Text morphing as disclosed herein can be useful for creative ideation, product development, integrative search engines, and entertainment purposes. | 07-28-2011 |
20110184727 | Prose style morphing - This invention is a system and method for incrementally and multi-dimensionally adjusting prose style. It comprises: a prose input interface, through which the user inputs, or otherwise selects, prose; a multi-dimensional style-adjusting interface, through which the user incrementally adjusts multiple dimensions of prose style; and a style-morphing engine which executes the adjustments specified by the user through the multi-dimensional style-adjusting interface. The dimensions of prose style to be adjusted may be selected from the group consisting of: person perspective; tense; voice; length; vocabulary; formality; colloquiality; complexity; emotion; emoticons; color; font; romantic; positivity; strength; precision; certainty; alliteration; humor; nationality; regionality; gender specificity; obscenity filter; academic jargon; business jargon; legal jargon; medical jargon; scientific jargon; and connectivity jargon. In an example, the style-morphing engine may include a database of sets of phrase synonyms and may use this database to make phrase substitutions that incrementally and multi-dimensionally change the style of the prose. In another example, the style-morphing engine may include a semantic algorithm or Natural Language Processor (NLP) that identifies phrases with similar meanings but different values across different style dimensions and may use it to make phrase substitutions that incrementally and multi-dimensionally change the style of the prose. This invention that enables incremental and multi-dimensional adjustment of prose style has a wide variety of useful applications. | 07-28-2011 |
20110184728 | HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT AND FOR PRIORITIZING COMPOUND LANGUAGE SOLUTIONS ACCORDING TO COMPLETENESS OF TEXT COMPONENTS - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria, including the degree of completeness of the text components of a compound language solution. | 07-28-2011 |
20110184729 | APPARATUS AND METHOD FOR EXTRACTING AND ANALYZING OPINION IN WEB DOCUMENT - The present invention deals with an apparatus and method for extracting and analyzing opinions in web documents, wherein automatic extraction and analysis are performed effectively on user opinion information from web documents that are scattered across many websites on the Internet so that opinion search services may be easily implemented which enable search and statistical results to be checked as affirmative/negative opinions, and opinion search users can easily implement a system that helps in searching and monitoring the opinions of other users with respect to a specific keyword. In addition, according to the present invention, marketing representatives and stock investors and corporate value assessors of each company can quickly check the opinions of many users about an applicable corporation and goods that exist on the vast Internet, and expenses that used to be spent on questionnaires and consulting companies to find opinions of existing users can be greatly reduced, and opinion extraction and statistics for each user can be effectively performed and utilized. | 07-28-2011 |
20110191097 | Systems and Methods for Word Offensiveness Processing Using Aggregated Offensive Word Filters - Computer-implemented systems and methods are provided for identifying language that would be considered obscene or otherwise offensive to a user or proprietor of a system. A first plurality of offensive words are received, and a second plurality of offensive words are received. A string of words is received, where one or more detected offensive words are selected from the string of words that match words from the first plurality of offensive words or the second plurality of offensive words. The string of words is processed based upon the detection of offensive words in the string of words. | 08-04-2011 |
20110191098 | PHRASE-BASED DOCUMENT CLUSTERING WITH AUTOMATIC PHRASE EXTRACTION - Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents. | 08-04-2011 |
20110191099 | System and Methods for Improving Accuracy of Speech Recognition - The invention provides a system and method for improving speech recognition. A computer software system is provided for implementing the system and method. A user of the computer software system may speak to the system directly and the system may respond, in spoken language, with an appropriate response. Grammar rules may be generated automatically from sample utterances when implementing the system for a particular application. Dynamic grammar rules may also be generated during interaction between the user and the system. In addition to arranging searching order of grammar files based on a predetermined hierarchy, a dynamically generated searching order based on history of contexts of a single conversation may be provided for further improved speech recognition. Dialogue between the system and the user of the system may be recorded and extracted for use by a speech recognition engine to refine or create language models so that accuracy of speech recognition relevant to a particular knowledge area may be improved. | 08-04-2011 |
20110196668 | Integrated Language Model, Related Systems and Methods - An integrated language model includes an upper-level language model component and a lower-level language model component, with the upper-level language model component including a non-terminal and the lower-level language model component being applied to the non-terminal. The upper-level and lower-level language model components can be of the same or different language model formats, including finite state grammar (FSG) and statistical language model (SLM) formats. Systems and methods for making integrated language models allow designation of language model formats for the upper-level and lower-level components and identification of non-terminals. Automatic non-terminal replacement and retention criteria can be used to facilitate the generation of one or both language model components, which can include the modification of existing language models. | 08-11-2011 |
20110196669 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - There is provided an information processing apparatus including: an acquiring unit acquiring a title of content; an analyzing unit dividing the title into tokens; a calculating unit calculating, for each token, an evaluation value based on a token length and weighted according to the token's position in the title; a mapping unit mapping, for each token, a token point shown by an ordinal number showing the token's position in the title and the evaluation value, onto a coordinate plane; a deciding unit deciding, based on the mapped token points, coordinates of a criterion point used as a criterion for extracting a series identifier and an extraction criterion based on the criterion point; an extracting unit extracting token points that conform to the extraction criterion out of the token points; and a generating unit generating the series identifier from the character strings included in tokens associated with the extracted token points. | 08-11-2011 |
20110196670 | INDEXING CONTENT AT SEMANTIC LEVEL - Systems and methods are disclosed that perform automated semantic tagging. Automated semantic tagging produces semantically linked tags for a given text content. Embodiments provide ontology mapping algorithms and concept weighting algorithms that create accurate semantic tags that can be used to improve enterprise content management, and search for better knowledge management and collaboration. | 08-11-2011 |
20110196671 | HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT AND FOR PRIORITIZING COMPOUND LANGUAGE SOLUTIONS ACCORDING TO QUANTITY OF TEXT COMPONENTS - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria. | 08-11-2011 |
20110202335 | HANDHELD ELECTRONIC DEVICE PROVIDING A LEARNING FUNCTION TO FACILITATE CORRECTION OF ERRONEOUS TEXT ENTRY AND ASSOCIATED METHOD - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In addition to identifying and outputting representations of language objects that are stored in the memory and that correspond with a text input, the device provides a learning function which facilitates providing proposed corrected output by the device in certain circumstances of erroneous input. | 08-18-2011 |
20110208507 | Speech Correction for Typed Input - A method, computer program product, and system are provided for correcting one or more typed words on an electronic device. The method can include receiving one or more typed words from a text input device and generating one or more candidate words for the one or more typed words. The method can also include receiving an audio stream at the electronic device that corresponds to the one or more typed words. The audio stream can then be translated into text using the one or more candidate words, where the translation includes assigning a confidence score to each of the one or more candidate words. Based on the confidence score associated with each of the one or more candidate words, a candidate word can be selected among the one or more candidate words to represent each portion of the text. A word from the one or more typed words can be replaced with the selected candidate word based on the value of the confidence score associated with the selected candidate word. | 08-25-2011 |
20110208508 | Interactive Language Training System - An interactive language training system allows practice word/phrase lists customized by the user for training. The customized lists may include words/phrases extracted from content sources based upon user selections. Extraction may include analysis of the content sources to determine word/phrase frequency, topic, and/or other parameters. A video sample of a student speaking a word/phrase is compared with video examples of a speaker and provides visual feedback for pronunciation and articulation. Progress is monitored for each word/phrase on the list, and performance feedback provided. Communication, such as voice or video chat may be established between the student and a speaker to provide for additional practice. | 08-25-2011 |
20110208509 | SYSTEM AND METHOD FOR THE TRANSFORMATION AND CANONICALIZATION OF SEMANTICALLY STRUCTURED DATA - A method of transforming and canonicalizing semantically structured data includes obtaining data from a network of computers, applying text patterns to the obtained data and placing the data in a first data file, providing a second data file containing the obtained data in a uniform format, and generating interface specific sentences from the data in the second data file. | 08-25-2011 |
20110208510 | System and Method for Converting Graphical Call Flows Into Finite State Machines - A method, system and module for automatically converting a call flow into a state-based representation are disclosed. The method comprises walking a call flow and converting each page of the call flow into a rule of a higher level representation of the call flow, augmenting the higher level representation with terminal symbols representing state variable assignments and comparisons associated with decision and computation shapes in the call flow and converting the higher level representation into a state-based representation. | 08-25-2011 |
20110208511 | METHOD AND SYSTEM FOR ANALYZING TEXT - An apparatus for providing a control input signal for an industrial process or technical system having one or more controllable elements includes elements for generating a semantic space for a text corpus, and elements for generating a norm from one or more reference words or texts, the or each reference word or text being associated with a defined respective value on a scale, and the norm being calculated as a reference point or set of reference points in the semantic space for the or each reference word or text with its associated respective scale value. Elements for reading at least one target word included in the text corpus, elements for predicting a value of a variable associated with the target word based on the semantic space and the norm, and elements for providing the predicted value in a control input signal to the industrial process or technical system. A method for predicting a value of a variable associated with a target word is also disclosed together with an associated system and computer readable medium. | 08-25-2011 |
20110208512 | METHOD AND SYSTEM FOR GENERATING DERIVATIVE WORDS - The present invention provides a method for generating derivative words including the steps of: creating a number of derivative grammar arrays; matching the inputting character information with the derivative grammar arrays and obtaining the match derivative grammar arrays; obtaining match words from the language database according to the condition arrays of the obtained derivative grammar arrays and the inputting character information; and generating derivative words by adding the suffix alphabetic character sets of the obtained derivative grammar arrays to the ends of the words. In accordance with the established grammar rules, the words in the language database can be converted to derivative words and the derivative words do not need to be stored in the language database. Therefore, the storage space of the language database can be remarkably reduced. The present invention also provides a system for generating derivative words. | 08-25-2011 |
20110213609 | LANGUAGE-INDEPENDENT PROGRAM INSTRUCTION - A natural language-independent computer program is constructed. A data element is defined by a graphical representation in a user interface. A data element has a data type and a value. An operator is defined on multiple data elements by association of the graphical representations in the user interface. A natural language-independent graph data structure is defined by the association of data elements representing the logic of a computer program. The data types and operators have referenced descriptions in one or more natural languages, enabling a logical expression such as a computer program to be defined and understood in one or more natural languages. | 09-01-2011 |
20110213610 | Processor Implemented Systems and Methods for Measuring Syntactic Complexity on Spontaneous Non-Native Speech Data by Using Structural Event Detection - Systems and methods are provided for providing a score for a spontaneous non-native speech response to a prompt. A transcription of the spontaneous speech response is accessed. A plurality of clauses are identified within the spontaneous speech response, where identifying a clause includes identifying a beginning boundary and an end boundary of the clause in the spontaneous speech response. A plurality of disfluencies in the spontaneous speech response is identified. One or more proficiency metrics are calculated based on the plurality of identified clauses and the plurality of the identified disfluencies, and a score for the spontaneous speech response is generated based on the one or more proficiency metrics. | 09-01-2011 |
20110224971 | N-Gram Selection for Practical-Sized Language Models - Described is a technology by which a statistical N-gram (e.g., language) model is trained using an N-gram selection technique that helps reduce the size of the final N-gram model. During training, a higher-order probability estimate for an N-gram is only added to the model when the training data justifies adding the estimate. To this end, if a backoff probability estimate is within a maximum likelihood set determined by that N-gram and the N-gram's associated context, or is between the higher-order estimate and the maximum likelihood set, then the higher-order estimate is not included in the model. The backoff probability estimate may be determined via an iterative process such that the backoff probability estimate is based on the final model rather than any lower-order model. Also described is additional pruning referred to as modified weighted difference pruning. | 09-15-2011 |
20110224972 | Localization for Interactive Voice Response Systems - A language-neutral speech grammar extensible markup language (GRXML) document and a localized response document are used to build a localized GRXML document. The language-neutral GRXML document specifies an initial grammar rule element. The initial grammar rule element specifies a given response type identifier and a given action. The localized response document contains a given response entry that specifies the given response type identifier and a given response in a given language. The localized GRXML document specifies a new grammar rule element. The new grammar rule element specifies the given response in the given language and the given action. The localized GRXML document is installed in an interactive voice response (IVR) system. The localized GRXML document configures the IVR system to perform the given action when a user of the IVR system speaks the given response to the IVR system. | 09-15-2011 |
20110224973 | SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR DYNAMICALLY CORRECTING GRAMMAR ASSOCIATED WITH TEXT - In accordance with embodiments, there are provided mechanisms and methods for dynamically correcting grammar associated with text. These mechanisms and methods for dynamically correcting grammar associated with text can enable enhanced data display, simplified language support, etc. | 09-15-2011 |
20110231182 | MOBILE SYSTEMS AND METHODS OF SUPPORTING NATURAL LANGUAGE HUMAN-MACHINE INTERACTIONS - A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network. | 09-22-2011 |
20110231183 | LANGUAGE MODEL CREATION DEVICE - This device | 09-22-2011 |
20110238408 | Semantic Clustering - Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, as for example to prioritize topics whose handling by the conversation agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on. | 09-29-2011 |
20110238409 | Semantic Clustering and Conversational Agents - Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, as for example to prioritize topics whose handling by the conversation agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on. | 09-29-2011 |
20110238410 | Semantic Clustering and User Interfaces - Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, as for example to prioritize topics whose handling by the conversation agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on. | 09-29-2011 |
20110238411 | DOCUMENT PROOFING SUPPORT APPARATUS, METHOD AND PROGRAM - According to one embodiment, a document proofing support apparatus includes an input unit, an analysis unit, a detection unit, a database unit, a retrieval unit, and a display unit. The input unit is configured to receive input of one of at least one proof document and at least one entry document. The analysis unit is configured to perform a morphological, a syntactic and a dependency analysis and generate analysis information including a dependency relation. The detection unit is configured to detect as a possible coined word character string a compound word having a nominal continuation relation. The database unit is configured to store syntactic information. The retrieval unit is configured to retrieve a dependency-relation sentence, and to determine the possible coined word character string as a coined word if the dependency-relation sentence exists. The display unit is configured to display a message including the coined word. | 09-29-2011 |
20110246179 | SIGNAL PROCESSING APPROACH TO SENTIMENT ANALYSIS FOR ENTITIES IN DOCUMENTS - A document can be processed to provide sentiment values for phrases in the document. The sequence of sentiment values associated with the sequence of phrases in a document can be handled as if they were a sampled discrete time signal. For phrases which have been identified as entities, a filtering operation can be applied to the sequence of sentiment values around each entity to determine a sentiment value for the entity. | 10-06-2011 |
20110246180 | ENHANCING LANGUAGE DETECTION IN SHORT COMMUNICATIONS - A method, system, and computer usable program product for enhancing language detection in short communications are provided in the illustrative embodiments. A short communication is stored in an element of a line cache. The line cache is accessible to an application executing in a data processing system. The element is an element in a set of elements in the line cache. A compound text is assembled from contents of a subset of the elements of the line cache. A language identifier (language ID) is received for the compound text from a language detection algorithm. The language ID is stored in a language cache element of a language ID cache. The language ID cache is accessible to the application and includes a set of language cache elements. A language of the short communication is determined using the contents of a subset of language cache elements. | 10-06-2011 |
20110246181 | NLP-BASED SYSTEMS AND METHODS FOR PROVIDING QUOTATIONS - Techniques for providing quotations obtained from text documents using natural language processing techniques are described. Some embodiments provide a content recommendation system (“CRS”) configured to provide quotations by extracting quotations from a corpus of text documents, and providing access to the extracted quotations in response to search requests received from users. The CRS may extract quotations by using natural language processing-based techniques to identify one or more entities, such as people, places, objects, concepts, or the like, that are referenced by the extracted quotations. The CRS may then store the extracted quotations along with identified entities, such as quotation speakers and subjects, for later access via search requests. | 10-06-2011 |
20110246182 | SYSTEMS FOR DYNAMICALLY GENERATING AND PRESENTING NARRATIVE CONTENT - In some embodiments, a non-transitory processor-readable medium stores code representing instructions that when executed cause a processor to select a narrative content template based at least in part on a predetermined content type associated with a real-world and/or virtual event. The code further represents instructions that when executed cause the processor to select a narrative tone type. The code further represents instructions that when executed cause the processor to, for each phrase included in an ordered set of phrases associated with the narrative content template, select, based at least in part on the narrative tone type, a phrase variation from a set of phrase variations associated with that phrase, and define, based on the selected phrase variation and at least one datum from a set of data, a narrative content portion associated with the real-world event. The code further represents instructions that when executed cause the processor to output, at a display, the narrative content portion. | 10-06-2011 |
20110246183 | TOPIC TRANSITION ANALYSIS SYSTEM, METHOD, AND PROGRAM - The present invention provides a topic transition analysis system that determines a position on a primary media stream leading to a certain statement made in a language communication carried out in a secondary channel associated with the primary media stream. The topic transition analysis system includes a statement trigger string determination unit receiving a primary media stream and one or a plurality of language communication streams (hereinafter, language streams) executed in parallel with the media stream and determining whether or not a certain statement on the one or plurality of language streams has been made newly in response to contents of the media stream. | 10-06-2011 |
20110246184 | SYSTEM AND METHOD FOR INCREASING ACCURACY OF SEARCHES BASED ON COMMUNICATION NETWORK - Disclosed are systems, methods and computer-readable media for using a local communication network to generate a speech model. The method includes retrieving for an individual a list of numbers in a calling history, identifying a local neighborhood associated with each number in the calling history, truncating the local neighborhood associated with each number based on the at least one parameter, retrieving a local communication network associated with each number in the calling history and each phone number in the local neighborhood, and creating a language model for the individual based on the retrieved local communication network. The generated language model may be used for improved automatic speech recognition for audible searches as well as other modules in a spoken dialog system. | 10-06-2011 |
20110251839 | METHOD AND SYSTEM FOR INTERACTIVELY FINDING SYNONYMS USING POSITIVE AND NEGATIVE FEEDBACK - Determining synonyms of words in a set of documents. Particularly, when provided with a word or phrase as input, in exemplary embodiments there is afforded the return of a predetermined number of “top” synonym words (or phrases) for an input word (or phrase) in a specific collection of text documents. Further, a user is able to provide ongoing and iterative positive or negative feedback on the returned synonym words, by manually accepting or rejecting such words as the process is underway. | 10-13-2011 |
20110257960 | METHOD AND APPARATUS FOR CONTEXT-INDEXED NETWORK RESOURCE SECTIONS - Techniques to provide context-indexed network resource sections include, in response to receiving first data that describes a network resource, determining a section of a plurality of sections included in the network resource. A section context token that indicates a probability in the section of a topic from a context vocabulary is determined. The context vocabulary includes concepts describing temporal, spatial, environmental or activity circumstances of consumers. Second data that indicates the section in association with the section context token is stored. | 10-20-2011 |
20110257961 | SYSTEM AND METHOD FOR GENERATING QUESTIONS AND MULTIPLE CHOICE ANSWERS TO ADAPTIVELY AID IN WORD COMPREHENSION - An adaptive learning system and method provides for automatically generating question types to a user for word comprehension and selecting multiple choice answers for display. Questions are developed for the user by obtaining online content and indexing the content into individual sentences and questions. The system provides questions in a series of rounds to the user and then adaptively tracks the progress of the user based on the categorization of each question. | 10-20-2011 |
20110257962 | SYSTEM AND METHOD FOR AUTOMATICALLY GENERATING SENTENCES OF A LANGUAGE - A system and method for automatically generating sentences in a language is disclosed. The system comprising a grammar processor for converting an input grammar into a hierarchical representation, and a grammar explorer module for traversing the grammar hierarchy based on an explore specification, which defines what nodes of the hierarchy should be explored. The explorer module takes the exploration specification as input and traverses the hierarchy according to the exploration types specified in the exploration specification. The system and method can be used to automatically generate assembly instructions for a microprocessor given its assembly language grammar, to generate sentences of a natural language like English from its grammar and to generate programs in a high-level programming language like C. | 10-20-2011 |
20110257963 | METHOD AND SYSTEM FOR SEMANTIC SEARCHING - A method comprising a preliminary automated analysis of at least one corpus of natural language text is disclosed. For each sentence of a corpus, the method includes performing a syntactic analysis using linguistic descriptions to generate at least one syntactic structure for the sentence, building a semantic structure for the sentence, associating each generated syntactic and semantic structure with the sentence, and saving each structure. For each corpus text that was preliminary analyzed, performing an indexing operation to index lexical meanings and values of linguistic parameters of each syntactic structure and each semantic structure associated with sentences in the corpus text. A semantic search includes at least one automatic preliminary analyzed corpus of sentences comprising searched values of linguistic, syntactic and semantic parameters. Due to a deep semantic analysis of a corpus, the search may be executed in various languages, in resources of various languages, and in the text of corpora of various languages regardless of the language of the query. | 10-20-2011 |
20110264441 | NAVLIPI - Articles, surfaces, media or educational material containing a universal script, comprised of glyphs derived almost entirely from the Roman script and with only a few new glyphs, for transcription of all the world's languages, with particular attention to a means for expression of the phonemic idiosyncrasies within and between languages and language families are provided. | 10-27-2011 |
20110264442 | VISUALLY EMPHASIZING PREDICTED KEYS OF VIRTUAL KEYBOARD - A computing system includes a touch display and a virtual keyboard visually presented by the touch display. The virtual keyboard includes a plurality of touch-selectable keys each having a visual appearance that dynamically changes. A touch-selectable key has a deemphasized visual appearance if the touch-selectable key is not predicted to be a next selected key, and the touch-selectable key has a prediction-emphasized visual appearance if the touch-selectable key is predicted to be a next selected key. | 10-27-2011 |
20110264443 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM - Disclosed herein is an information processing device including: a data acquirer configured to acquire a sentence collection having a plurality of sentences and a plurality of phrases included in the sentence collection; a phrase feature decider configured to decide phrase features each representing a characteristic of a respective one of the phrases acquired by the data acquirer; a collection feature decider configured to decide a collection feature representing a characteristic of the sentence collection; and a compressor configured to generate compressed phrase features by using the phrase features and the collection feature, the compressed phrase features having a dimension lower than a dimension of the phrase features and each representing a characteristic of the respective one of the phrases acquired by the data acquirer. | 10-27-2011 |
20110264444 | KEYWORD DISPLAY SYSTEM, KEYWORD DISPLAY METHOD, AND PROGRAM - The present invention is a keyword display system that includes a speaker specifier for specifying a speaker; a weight determinator for determining a weight of the specified speaker; a keyword extractor for extracting keywords from a speech of the aforementioned speaker; a keyword relation degree calculator for calculating a relation degree between the aforementioned extracted keywords, carrying out a weighting for this calculated relation degree by using the weight of the speaker having spoken the aforementioned keywords, and calculating a keyword relation degree between the keywords; and a keyword display controller for displaying a relevancy between the aforementioned extracted keywords responding to the aforementioned keyword relation degree. | 10-27-2011 |
20110264445 | METHOD OF USING VISUAL SEPARATORS TO INDICATE ADDITIONAL CHARACTER COMBINATION CHOICES ON A HANDHELD ELECTRONIC DEVICE AND ASSOCIATED APPARATUS - A method and associated apparatus for using visual separators to indicate additional character combination choices from a disambiguation function on a handheld electronic device. | 10-27-2011 |
20110270603 | Method and Apparatus for Language Processing - A method for language enhancement, including receiving text, identifying grammatical constructs within the text, and suggesting at least one alternate text portion for at least one original portion of the text, the alternate text portion being consistent with the grammatical constructs of the original portion and having substantially the same meaning as the original portion but conveying a different impression. Apparatus is also described and claimed. | 11-03-2011 |
20110270604 | SYSTEMS AND METHODS FOR SEMI-SUPERVISED RELATIONSHIP EXTRACTION - Systems and methods are disclosed to perform relation extraction in text by applying a convolution strategy to determine a kernel between sentences; applying one or more semi-supervised strategies to the kernel to encode syntactic and semantic information to recover a relational pattern of interest; and applying a classifier to the kernel to identify the relational pattern of interest in the text in response to a query. | 11-03-2011 |
20110270605 | ASSESSING SPEECH PROSODY - A method, system and computer readable storage medium for assessing speech prosody. The method includes the steps of: receiving input speech data; acquiring a prosody constraint; assessing prosody of the input speech data according to the prosody constraint; and providing an assessment result, where at least one of the steps is carried out using a computer device. | 11-03-2011 |
20110270606 | SYSTEMS AND METHODS FOR SEMANTIC SEARCH, CONTENT CORRELATION AND VISUALIZATION - Methods and systems for searching over large (i.e., Internet scale) data to discover relevant information artifacts based on similar content and/or relationships are disclosed. Improvements over simple keyword and phrase based searching over Internet scale data are shown. Search engines providing accurate and contextually relevant search results are disclosed. Users are enabled to identify related documents and information artifacts and quickly ascertain, via visualization, which of these documents are original, which are derived (or copied) from a source document or information artifact, and which subset is independently generated (i.e., an original document or information artifact). | 11-03-2011 |
20110270607 | METHOD AND SYSTEM FOR SEMANTIC SEARCHING OF NATURAL LANGUAGE TEXTS - A method and system comprising an automated analysis of at least one corpus of natural language text is disclosed. For each sentence of a corpus, the analysis includes performing a syntactic analysis using linguistic descriptions to generate at least one syntactic structure for the sentence, building a semantic structure for the sentence, associating each generated syntactic and semantic structure with the sentence, and saving each generated syntactic and semantic structure. For each corpus text that was preliminarily analyzed, an indexing operation is performed to index lexical meanings and values of linguistic parameters of each syntactic structure and each semantic structure associated with sentences in the corpus text. A semantic search as disclosed herein includes searching at least one automatically and preliminarily analyzed corpus for sentences comprising searched values of linguistic, syntactic and semantic parameters. Due to deep semantic analysis of one or more corpora, the search may be executed in various languages, in resources of various languages, and in text corpora of various languages, regardless of the language of the query. | 11-03-2011 |
20110276322 | TEXTUAL ENTAILMENT METHOD FOR LINKING TEXT OF AN ABSTRACT TO TEXT IN THE MAIN BODY OF A DOCUMENT - Aspects of the exemplary embodiment relate to a system and method for processing a document which enables assessment of the coherence of an abstract of the document. The method includes storing the document in memory and, for each sentence of the abstract, comparing the sentence with sentences of a main body of the document using textual entailment techniques to identify whether the sentence of the abstract entails a sentence in the main body of the document. Links can then be generated between the entailing sentences of the abstract and the corresponding entailed sentences of the document. The document and generated links are output. The links enable the coherence of the abstract to be evaluated, either manually or automatically, using an evaluation component of the system. | 11-10-2011 |
20110282651 | GENERATING SNIPPETS BASED ON CONTENT FEATURES - Systems, methods, and computer storage media having computer-executable instructions embodied thereon that facilitate generation of snippets. In embodiments, text features within a keyword-sentence window are identified. The text features are utilized to determine break features that indicate favorability of breaking at a particular location of the keyword-sentence window. The break features are used to recognize features of partial snippets such that a snippet score to indicate the strength of the partial snippet can be calculated. Snippet scores associated with partial snippets are compared to select an optimal snippet, that is, the snippet having the highest snippet score. | 11-17-2011 |
20110282652 | MAPPING OF RELATIONSHIP ENTITIES BETWEEN ONTOLOGIES - Methods, apparatus and systems, including computer program products, for reducing an error rate when mapping entities between a first ontology and a second ontology. One or more of a general language dictionary and an industry-specific dictionary are provided. Natural language processing of the first ontology is performed to identify one or more candidate relationship entities in the first ontology. Each candidate relationship entity includes a compound name having two or more semantic labels, and each candidate relationship entity has a name that exists in neither the general language dictionary nor the industry-specific dictionary. Each of the one or more candidate relationship entities in the first ontology is mapped to one or more entities in the second ontology using one or more configurable computer-implemented mapping algorithms. | 11-17-2011 |
20110282653 | TEXT PROCESSING APPARATUS, TEXT PROCESSING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - A text processing apparatus is provided with a segment determination unit | 11-17-2011 |
20110288854 | HUMAN READABLE SENTENCES TO REPRESENT COMPLEX COLOR CHANGES - Methods and a system for a natural language control interface are provided to enable a user to modify various properties of a document. The modifications comprise building sentences from modification words, and combining them together in one display. The modifications are displayed in real time for a user to observe as they are inputted. The order of the modifications is managed by the user and is configured to be changed, added and/or removed. | 11-24-2011 |
20110288855 | MULTI-PHONEME STREAMER AND KNOWLEDGE REPRESENTATION SPEECH RECOGNITION SYSTEM AND METHOD - A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. Applications include improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command/response scenarios. | 11-24-2011 |
20110288856 | HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION AND SELECTIVE DISABLING OF FREQUENCY LEARNING - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The learning function is disabled, however, when the relevant words are found to be in a special category for which frequency learning, i.e., frequency revision, is not employed. | 11-24-2011 |
20110295591 | SYSTEM AND METHOD TO ACQUIRE PARAPHRASES - An automatic paraphrase acquisition technique is provided. A common theme of the various embodiments described herein resides in careful design of simple tasks that can elicit the necessary information for the automated process. These tasks are performed quickly and inexpensively. By gathering the results produced, paraphrases can be generated automatically using the method and/or system. | 12-01-2011 |
20110295592 | Survey Analysis and Categorization Assisted by a Knowledgebase - The disclosure generally relates to knowledge retrieval using a knowledgebase storing general and/or expert knowledge. In particular, the disclosure relates to using an enhanced knowledgebase to implement a tool for analysis and categorization of surveys. | 12-01-2011 |
20110295593 | AUTOMATED MESSAGE ATTACHMENT LABELING USING FEATURE SELECTION IN MESSAGE CONTENT - Embodiments are directed towards an automated machine learning framework to extract keywords within a message that are relevant to an attachment to the message. The machine learning model finds a set of relevant sentences within the message determined to be relevant to the one or more attachments based on identification of one or more sentence level features within a given sentence. The sentence level features include, for example, anchor features, noisy sentence features, short message features, threading features, anaphora detections, and lexicon features. From the set of relevant sentences, useful keywords may be extracted using a sequence of heuristics to convert the sentence set into the set of useful keywords. The set of useful keywords may then be associated to at least one attachment such that the keywords may subsequently be used to perform various indexing, searching, sorting, and to provide further context to the attachment. | 12-01-2011 |
20110295594 | SYSTEM, METHOD, AND PROGRAM FOR PROCESSING TEXT USING OBJECT COREFERENCE TECHNOLOGY - System, method and program product for text processing using object coreference technology. In particular, the invention provides a text processing method which includes acquiring text to be processed; extracting subject words and entity words corresponding to the subject words from the text; grouping the subject words; determining entity words that reference a same concerned object according to the grouped subject words; and generating a processing policy for entity words that reference a same concerned object. The invention also includes a system with means for carrying out the method. The invention generally realizes automatic, more comprehensive, accurate and efficient analysis and processing of text data. The invention can be used to mine a large amount of comment data about an entity, and it can also be used to suggest a place in an article where an embedded advertisement may be inserted. | 12-01-2011 |
20110295595 | DOCUMENT PROCESSING, TEMPLATE GENERATION AND CONCEPT LIBRARY GENERATION METHOD AND APPARATUS - The present invention relates to a document processing method and apparatus which can edit a natural language and generate a machine-processable document; a template generating method and apparatus which can be used for the document processing method and apparatus; and a concept library generating method and apparatus which can be used for the document processing method and apparatus and the template generating method and apparatus. The present invention provides a possibility for semantic interaction of documents in different systems and enhances efficiency. | 12-01-2011 |
20110301940 | FREE TEXT VOICE TRAINING - A system and method provide acoustic training of a voice or speech recognition engine and/or voice or speech recognition software application. Instead of requiring a user to read from a prepared or predetermined script, the system and method described herein enable acoustic training using any free text spoken phrases provided by the user directly, or by a previously recorded speech, presentation, or the like, performed by the user. | 12-08-2011 |
20110301941 | NATURAL LANGUAGE PROCESSING METHOD AND SYSTEM - A computer implemented natural language processing method, the method including the steps of: analysing a sentence string within textual information to determine sub-components of the sentence string, assigning one or more unique tokens to each determined sub-component, determining a probability of use that a determined sub-component has one or more specific meanings, based on the determined probability of use, creating a valid set of unique tokens that are associated with the sentence string, and linking verb sub-components associated with one or more of the unique tokens in the valid set of unique tokens to a pre-defined limited sub-set of verbs to create an identification tuple that maps onto the sub-set of verbs. | 12-08-2011 |
20110301942 | Method and Apparatus for Full Natural Language Parsing - The method and apparatus for discriminative natural language parsing use a deep convolutional neural network adapted for text and a structured tag inference in a graph. In the method and apparatus, a trained recursive convolutional graph transformer network, formed by the deep convolutional neural network and the graph, predicts “levels” of a parse tree based on predictions of previous levels. | 12-08-2011 |
20110301943 | SYSTEM AND METHOD OF DICTATION FOR A SPEECH RECOGNITION COMMAND SYSTEM - In embodiments of the present invention, a system and computer-implemented method for enabling dictation may include parsing standard reports in order to identify a plurality of logical phrases in the report used for discrete sections and descriptions. In the report method, the phrases may be parsed and identifier words throughout the report may be compared to eliminate ambiguities. The method may then involve constructing text macros that follow the parsed text, thereby enabling the user to speak the identifiers to indicate full, formatted text. Finally, the report method may involve constructing a mnemonic document so both beginner and experienced users can easily read the identifiers out loud to produce a report. The result of the method is an intuitive, notes-style way to use speech commands to quickly produce a standard, formatted report. | 12-08-2011 |
20110307246 | Methods And Systems For Changing A Communication Quality Of A Communication Session Based On A Meaning Of Speech Data - Methods and systems are described for changing a communication quality of a communication session based on a meaning of speech data. Speech data exchanged between clients participating in a communication session is parsed. A meaning of the parsed speech data is determined. An action is performed to change a communication quality of the communication session based on the meaning of the parsed speech data. | 12-15-2011 |
20110313756 | Text sizer (TM) - This invention, called Text Sizer ™, is a method and system for changing the length of a body of text. It may be embodied in the following steps. First, a first text segment may be selected in a body of text. Second, alternative text segments are automatically identified, wherein each alternative text segment may be substituted for the first text segment in the body of text without causing a grammatical error. Third, a second text segment with a length that is different than the length of the first text segment is selected from among the alternative text segments. Finally, the second text segment is substituted for the first text segment in the body of text. This method has many applications. One might wish to reduce the length of a body of text so that it fits within a constrained space. For example, a report or proposal may have page limits. Alternatively, one might wish to expand the length of a selected portion of a body of text. For example, one might wish to elaborate or include additional information on topics covered in a particular segment of text. Text Sizer ™ provides users with this capability. | 12-22-2011 |
20110313757 | SYSTEMS AND METHODS FOR ADVANCED GRAMMAR CHECKING - In embodiments of the present invention improved capabilities are described for a method of grammar checking, comprising providing a first level of grammar checking through a computer-based grammar checking facility to grammar check a body of text provided by a source in order to improve the grammatical correctness of the text; providing an excerpt of the body of text containing an identified grammatical error as a result of the first level of automated grammar checking to a second level of human augmented grammar checking consisting of at least one human proofreader for review; incorporating the results of the human proofreader review to contribute to at least one corrected version of the provided body of text; and sending the at least one corrected version back to the source. The method of grammar checking may provide for automatic grammar correction and text enrichment, such as when text is entered via a computer device with input limitations. | 12-22-2011 |
20110320186 | ENTITY RECOGNITION - The invention relates to a method of querying technical domains that recognises the concepts represented by strings of characters, rather than merely comparing strings. It can be used to compute conceptual similarity between terms. The method employs string distance metrics and a cyclic progression of lexical processing to recognise constituent term concepts that are then combined to form full-term concepts by means of a grammar. Terms can be extracted and identified as being conceptually similar (or dissimilar) to other terms even if they have never previously been encountered. | 12-29-2011 |
20110320187 | Natural Language Question Answering System And Method Based On Deep Semantics - In a computer system, systems and methods for automatically answering natural language questions using deep semantics are provided. Methods include receiving a natural language question, mapping it into one or more deductive database queries that capture one or more intents behind the question, computing one or more result sets of the question using one or more deductive database queries and a deductive database, and providing one or more result sets. Systems include natural language question compilers and deductive databases. The natural language question compiler is configured to receive a natural language question and map it into one or more deductive database queries that capture one or more intents behind the question. The deductive database is configured to receive the mapped one or more deductive database queries, compute one or more result sets of the question using the one or more deductive database queries, and provide one or more result sets. | 12-29-2011 |
20110320188 | WEB-BASED SPEECH RECOGNITION WITH SCRIPTING AND SEMANTIC OBJECTS - The present invention is a system and method for creating and implementing transactional speech applications (SAs) using Web technologies, without reliance on server-side standard or custom services. A transactional speech application may be any application that requires interpretation of speech in conjunction with a speech recognition (SR) system, such as, for example, consumer survey systems. A speech application in accordance with the present invention is represented within a Web page, as an application script that interprets semantic objects according to a context. Any commonly known scripting language can be used to write the application script, such as JavaScript (or ECMAScript), PerlScript, and VBscript. The present invention is “Web-based” to the extent that it implements Web technologies, but it need not include or access the World Wide Web. | 12-29-2011 |
20110320189 | SYSTEMS AND METHODS FOR FILTERING DICTATED AND NON-DICTATED SECTIONS OF DOCUMENTS - A system and method for filtering documents to determine section boundaries between dictated and non-dictated text. The system and method identify portions of a text report that correspond to an original dictation and, correspondingly, those portions that are not part of the original dictation. The system and method include comparing tokenized and normalized forms of the original dictation and the final report, determining mismatches between the two forms, and applying machine-learning techniques to identify document headers, footers, page turns, macros, and lists automatically and accurately. | 12-29-2011 |
20110320190 | AUTOMATED SENTENCE PLANNING IN A TASK CLASSIFICATION SYSTEM - The invention relates to a task classification system ( | 12-29-2011 |
20110320191 | TEXT CREATION SYSTEM AND METHOD - A text creation system and method is described. Input text is provided in an authoring language and may be provided in one or more rendering languages and/or writing styles. The input text is analyzed to determine the semantic content of the input, and the semantic information is stored in a database. In one embodiment, templates are provided to acquire input and information about the template and the semantic data of the input, along with a database of equivalent meanings in different languages, are utilized to instantly generate output in a number of languages. One embodiment is a method and system of presenting advertisements in multiple languages. | 12-29-2011 |
20120004901 | PHONETIC KEYS FOR THE JAPANESE LANGUAGE - Various embodiments of phonetic keys for the Japanese language are described herein. A Kana rule set is applied to Kana characters provided by a user. The Kana characters are defined in an alphabetic language based on the sound of the Kana characters. A full phonetic key is then generated based on the defined Kana characters. A replaced-vowel phonetic key is generated by replacing a vowel in the full phonetic key and a no-vowel phonetic key is generated by removing the vowel in the full phonetic key. Kana records in a database are then processed to determine a relevant Kana record that has a phonetic key identical to at least one of the full phonetic key, the replaced-vowel phonetic key, and the no-vowel phonetic key. The relevant Kana records are then presented to the user. | 01-05-2012 |
20120004902 | Computerized Selection for Healthcare Services - A method for producing healthcare data records from graphical inputs by computer users. The method includes generating a plurality of user input categories, displaying on a graphical display icons that correspond to a first of the user input categories and receiving a first user selection of a first icon of the plurality of icons, and displaying on the graphical display a plurality of icons that correspond to a second of the user input categories and receiving a second user selection of a second icon of the plurality of icons. The method also includes displaying icons that correspond to a physical target on which the medical action or observation is performed and receiving a third user selection of the physical target, and applying a syntax to populate a data record of the action using the at least two of the first, second, and third user selections. | 01-05-2012 |
20120004903 | RULE GENERATION - A method for implementing at least one rule for an application is described. The method includes receiving an input rule. Based on the input rule, a program executable code is generated. The generated program executable code can then be associated with the application. | 01-05-2012 |
20120004904 | METHOD AND SYSTEM FOR PROVIDING REPRESENTATIVE PHRASE - A method and system for providing a representative phrase corresponding to a real time (current time) popular keyword. The method and system may extend a representative criterion word, determined by analyzing morphemes of words in documents grouped into a cluster, and may combine the extended representative criterion word and the popular keyword, thereby providing the representative phrases. The method and system may display the popular keyword and the representative phrases on a web page, or the like. | 01-05-2012 |
20120004905 | TECHNIQUES FOR CREATING COMPUTER GENERATED NOTES - Text is extracted from an information resource such as documents, emails, relational database tables and other digitized information sources. The extracted text is processed using a decomposition function to create nodes. Nodes are a particular data structure that stores elemental units of information. The nodes can convey meaning because they relate a subject term or phrase to an attribute term or phrase. Removed from the node data structure, the node contents are or can become a text fragment which conveys meaning, i.e., a note. The notes generated from each digital resource are associated with the digital resource from which they are captured. The notes are then stored, organized and presented in several ways which facilitate knowledge acquisition and utilization by a user. | 01-05-2012 |
20120010872 | Method and System for Semantic Searching - In one embodiment, there is provided a computer-implemented method and system for implementing the method. The method comprises: preliminarily analyzing at least one corpus of natural language text, comprising, for each sentence of each natural language text of the corpus, performing syntactic analysis using linguistic descriptions to generate at least one syntactic structure for the sentence; building a semantic structure for the sentence; associating each generated syntactic and semantic structure with the sentence; and saving each generated syntactic and semantic structure; for each corpus of natural language text that was preliminarily analyzed, performing an indexing operation to index lexical meanings and values of linguistic parameters of each syntactic structure and each semantic structure associated with sentences in the corpus; and searching in at least one preliminarily analyzed corpus for sentences comprising searched values for the linguistic parameters. | 01-12-2012 |
20120010873 | SENTENCE TRANSLATION APPARATUS AND METHOD - Disclosed herein are a sentence translation apparatus and method. The sentence translation apparatus includes a voice recognition unit, a morphemic part-of-speech tagging unit, a pause extraction unit, and a sentence separation unit. The voice recognition unit creates a sentence in a first language based on results of recognition of a voice in a first language. The morphemic part-of-speech tagging unit tags morphemic parts of speech from the sentence in the first language. The pause extraction unit extracts pause information from the voice in the first language. The sentence separation unit separates the sentence in the first language based on information about the morphemic parts of speech tagged by the morphemic part-of-speech tagging unit and the pause information extracted by the pause extraction unit. | 01-12-2012 |
20120010874 | METHOD AND SYSTEM FOR PROVIDING A REPRESENTATIVE PHRASE BASED ON KEYWORD SEARCHES - Provided is a method and system for providing a representative phrase with respect to a real time popular keyword, which may determine programs including a popular keyword from broadcast information, and may generate a representative phrase with respect to the popular keyword using the determined programs, thereby providing the representative phrase by combining the generated representative phrase and the popular keyword. | 01-12-2012 |
20120010875 | CLASSIFYING TEXT VIA TOPICAL ANALYSIS, FOR APPLICATIONS TO SPEECH RECOGNITION - An assignment device ( | 01-12-2012 |
20120010876 | VOICE INTEGRATION PLATFORM - A voice integration platform and method provide for integration of a voice interface with a data system that includes stored data. The voice integration platform comprises one or more generic software components, the generic software components being configured to enable development of a specific voice user interface that is designed to interact with the data system in order to present the stored data to a user. | 01-12-2012 |
20120016661 | SYSTEM, METHOD AND DEVICE FOR INTELLIGENT TEXTUAL CONVERSATION SYSTEM - A method of intelligent textual markup in an information exchange includes: determining semantic elements in said information exchange; determining relations between said semantic elements; representing said semantic elements as nodes in a directed graph; and representing said relations as edges connecting said nodes. A data processing system for enabling a visual representation of semantic relations in an information exchange includes: a semantic analysis engine adapted to determine semantic elements of said information exchange; a relation analysis engine adapted to determine relations between said semantic elements; and a presentation engine adapted to present said semantic elements as nodes and said relations as edges in a directed graph representing said information exchange. | 01-19-2012 |
20120016662 | METHOD AND APPARATUS FOR PROCESSING BIOMETRIC INFORMATION USING DISTRIBUTED COMPUTATION - An approach is provided for providing biometric information processing using distributed computation. A biometric information processing infrastructure determines to receive an input including, at least in part, biometric information. The biometric information processing infrastructure selects one or more analyses for processing the input. The biometric information processing infrastructure also determines one or more processes associated with the one or more analyses. The biometric information processing infrastructure further determines to derive one or more computation closures from the one or more processes. The biometric information processing infrastructure determines to decompose the one or more computation closures for distribution in one or more computation spaces. | 01-19-2012 |
20120016663 | IDENTIFYING RELATED NAMES - Provided are techniques for identifying related names. A collection of names from different languages is stored, wherein each of the names has a native orthographic form and a romanized form. An input name is received in a known encoding scheme. An alphabet of the input name is determined based on the known encoding scheme. One or more romanized names are generated based on the query name and the determined query name alphabet. Culture-sensitive regularization rules are applied to create an additional romanized name. The one or more romanized names and the additional romanized name are matched against the romanized names in the collection of names from the different languages. Data store records that have romanized names that match the one or more romanized names or the additional romanized name are returned. | 01-19-2012 |
20120016664 | LANGUAGE ANALYSIS APPARATUS, LANGUAGE ANALYSIS METHOD, AND LANGUAGE ANALYSIS PROGRAM - A language analysis apparatus of the invention includes division rules, each of which is classified into one of levels according to the degree of risk of causing analysis accuracy problems when applied; a division point candidate generation unit | 01-19-2012 |
20120022854 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM - An apparatus and method provide logic for processing information. In one implementation, an apparatus includes a receiving unit configured to receive a selection of displayed content from a user. An obtaining unit is configured to obtain data corresponding to the selection. The data includes text data. An identification unit is configured to identify a keyword within the text data, and a storage unit is configured to generate a command to transmit the keyword to a device. | 01-26-2012 |
20120022855 | Searching and Browsing of Contextual Information - Systems and methods for searching and browsing a data store of contextually related data objects. The system includes a search/browse module that receives a search query. The search/browse module identifies data objects that match the search query and generates sentences from data objects that are contextually related to the matching data objects. The sentences are human-readable sentences, for example in subject-verb-object format, where each sentence represents the relationship between two data objects. The sentences are output for display as a hierarchy of sentences. Additionally, a user can browse the data store of contextually related data objects by selecting a sentence that is displayed to the user. The search/browse module then outputs attributes of the data object represented by the sentence for display in two separate regions of a user interface. | 01-26-2012 |
20120022856 | Browsing of Contextual Information - Systems and methods for searching and browsing a data store of contextually related data objects. The system includes a search/browse module that receives a search query. The search/browse module identifies data objects that match the search query and generates sentences from data objects that are contextually related to the matching data objects. The sentences are human-readable sentences, for example in subject-verb-object format, where each sentence represents the relationship between two data objects. The sentences are output for display as a hierarchy of sentences. Additionally, a user can browse the data store of contextually related data objects by selecting a sentence that is displayed to the user. The search/browse module then outputs attributes of the data object represented by the sentence for display in two separate regions of a user interface. | 01-26-2012 |
20120022857 | SYSTEM AND METHOD FOR A COOPERATIVE CONVERSATIONAL VOICE USER INTERFACE - A cooperative conversational voice user interface is provided. The cooperative conversational voice user interface may build upon short-term and long-term shared knowledge to generate one or more explicit and/or implicit hypotheses about an intent of a user utterance. The hypotheses may be ranked based on varying degrees of certainty, and an adaptive response may be generated for the user. Responses may be worded based on the degrees of certainty and to frame an appropriate domain for a subsequent utterance. In one implementation, misrecognitions may be tolerated, and conversational course may be corrected based on subsequent utterances and/or responses. | 01-26-2012 |
20120022858 | HANDHELD ELECTRONIC DEVICE AND ASSOCIATED METHOD EMPLOYING A MULTIPLE-AXIS INPUT DEVICE AND PROVIDING A LEARNING FUNCTION IN A TEXT DISAMBIGUATION ENVIRONMENT - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. | 01-26-2012 |
20120029907 | myMedicalpen - A digital pen designed to assist users in spelling words as they write. The invention is an electronic pen with a speaker located near the top of the device. A microphone may be located directly under the speaker in the form of a small screened concave or convex aperture. A switch on the back of the pen allows the user to choose between three settings: Medical Dictionary (D), Off (O), and Prescription Drug List (P). The device works by the user speaking the desired word into the microphone. The word will then appear on the illuminated digital display screen which lights up. The pen asks the user to confirm or deny the displayed word. The user says “yes” or “no” into the microphone. If denied, the pen displays another word until the correct word is located. Once confirmed, the pen will audibly and visibly spell the word one letter at a time as the user writes. The pen may be switched to the prescription drug list mode as needed. | 02-02-2012 |
20120029908 | INFORMATION PROCESSING DEVICE, RELATED SENTENCE PROVIDING METHOD, AND PROGRAM - There is provided an information processing device including an information providing unit that provides related information related to main information, a related sentence generation unit that generates a sentence indicating a relation between the main information and the related information and a related sentence providing unit that provides the sentence generated by the related sentence generation unit. | 02-02-2012 |
20120029909 | SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT FOR SPEECH PROCESSING - According to one embodiment, a speech processing device includes an utterance error occurrence determination information storage unit that stores utterance error occurrence determination information; a related word information storage unit that stores related word information including words; an utterance error occurrence determining unit that compares each of the divided words with the condition, gives the error pattern to the word corresponding to the condition, and determines that the word which does not correspond to the condition does not cause the utterance error; and a phoneme string generating unit that generates a phoneme string of the utterance error. The one of the error patterns associated with one of the conditions is the speech error, the utterance error occurrence determining unit further gives an incorrectly spoken word from the related word information, and the phoneme string generating unit generates a phoneme string of the incorrectly spoken word. | 02-02-2012 |
20120029910 | System and Method for Inputting Text into Electronic Devices - The present invention provides a system comprising a user interface configured to receive text input by a user, a text prediction engine comprising a plurality of language models and configured to receive the input text from the user interface and to generate concurrently text predictions using the plurality of language models, and wherein the text prediction engine is further configured to provide text predictions to the user interface for display and user selection. An analogous method and an interface for use with the system and method are also provided. | 02-02-2012 |
20120035914 | System and method for handling multiple languages in text - A system and method for processing text are disclosed. The method includes receiving text to be processed. A main language of the text is identified. At least one unknown sequence in the text is identified, each unknown sequence comprising at least one word that is unknown in the main language. For a secondary language, for each of the at least one unknown sequence, the method includes determining whether the unknown sequence includes a first word recognized in the secondary language and, if so, identifying a sequence of words in the secondary language which includes at least the first word. The identifying of the sequence of words in the secondary language includes applying an algorithm for determining whether the sequence of words in the secondary language is expandable beyond the first word to include adjacent words. The text is labeled based on the identified sequences of words in the secondary language. | 02-09-2012 |
20120035915 | LANGUAGE MODEL CREATION DEVICE, LANGUAGE MODEL CREATION METHOD, AND COMPUTER-READABLE STORAGE MEDIUM - The present invention uses a language model creation device | 02-09-2012 |
20120035916 | Handheld Electronic Device and Associated Method Employing a Multiple-Axis Input Device and Learning a Context of a Text Input For Use by a Disambiguation Routine - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In addition to identifying and outputting representations of language objects that are stored in the memory and that correspond with a text input, the device is able to employ contextual data in certain circumstances to prioritize output and to learn new contextual data. | 02-09-2012 |
20120041756 | VOICE RECOGNITION DEVICE - Voice recognition is realized by a pattern matching with a voice pattern model, and when a large number of paraphrased words are required for one facility, such as a name of a hotel or a tourist facility, the pattern matching needs to be performed with the voice pattern models of all the paraphrased words, resulting in an enormous amount of calculation. Further, it is difficult to generate all the paraphrased words, and a large amount of labor is required. A voice recognition device includes: voice recognition means for applying the voice recognition to an input voice by using a language model and an acoustic model, and outputting a predetermined number of recognition results each including a set of a recognition score and a text representation; and N-best candidate rearrangement means for: comparing the recognition result to a morpheme dictionary held in a morpheme dictionary memory; checking whether a representation of the recognition result can be expressed by any one of combinations of the morphemes of the morpheme dictionaries; correcting the recognition score when the representation can be expressed; and rearranging an order according to the corrected recognition score so as to acquire recognition results. | 02-16-2012 |
20120041757 | HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device. | 02-16-2012 |
20120046936 | System and method for distributed audience feedback on semantic analysis of media content - A system and computer implemented method of distributed audience feedback of media content in real time or substantially real time, including: semantically analyzing, at a semantic speech analysis engine, media content from a media program and identifying relevant topic data; distributing, at a topic data publisher, the identified relevant topic data to an audience of the media program; collecting, at a server, audience opinions on the identified relevant topic data; and processing the collected audience opinions. Other embodiments are disclosed. | 02-23-2012 |
20120046937 | SEMANTIC CLASSIFICATION OF VARIABLE DATA CAMPAIGN INFORMATION - A method and system for semantically classifying variable data campaign information. The method and system include loading, by a processing device, a variable data campaign from a computer readable storage medium operably connected to the processing device; extracting, by the processing device, variable data from the campaign; semantically classifying, by the processing device, the variable data to produce semantically classified variable data; building, by the processing device, a variable data campaign model based upon the semantically classified variable data; and storing, by the processing device, the variable data campaign model in the computer readable storage medium. | 02-23-2012 |
20120046938 | LARGE-SCALE SENTIMENT ANALYSIS - A method for determining a sentiment associated with an entity includes inputting a plurality of texts associated with the entity, labeling seed words in the plurality of texts as positive or negative, determining a score estimate for the plurality of words based on the labeling, re-enumerating paths of the plurality of words and determining a number of sentiment alternations, determining a final score for the plurality of words using only paths whose number of alternations is within a threshold, converting the final scores to corresponding z-scores for each of the plurality of words, and outputting the sentiment associated with the entity. | 02-23-2012 |
20120046939 | SYSTEMS AND METHODS FOR GENERATING WEIGHTED FINITE-STATE AUTOMATA REPRESENTING GRAMMARS - A context-free grammar can be represented by a weighted finite-state transducer. This representation can be used to efficiently compile that grammar into a weighted finite-state automaton that accepts the strings allowed by the grammar with the corresponding weights. The rules of a context-free grammar are input. A finite-state automaton is generated from the input rules. Strongly connected components of the finite-state automaton are identified. An automaton is generated for each strongly connected component. A topology that defines a number of states, and that uses active ones of the non-terminal symbols of the context-free grammar as the labels between those states, is defined. The topology is expanded by replacing a transition, and its beginning and end states, with the automaton that includes, as a state, the symbol used as the label on that transition. The topology can be fully expanded or dynamically expanded as required to recognize a particular input string. | 02-23-2012 |
20120059647 | Touchless Texting Exercise - A method, system, and computer program product are provided for touchless texting that enhances user activity. A plurality of graphical images are displayed on a computer display. An exercise motion is detected using a camera, and the motion is resolved to a selected graphical image from the plurality of graphical images. The selected graphical image is entered into an application. | 03-08-2012 |
20120065959 | WORD GRAPH - One example embodiment includes a method for constructing a word graph. The method includes obtaining a subject text and dividing the subject text into one or more units. The method also includes dividing the units into one or more sub-units and recording each of the one or more sub-units. | 03-15-2012 |
20120065960 | GENERATING PARSER COMBINATION BY COMBINING LANGUAGE PROCESSING PARSERS - A computer implemented method, a computer system, and a program for generating a parser combination. The method includes: generating a parser combination by combining parsers each associated with at least one grammar description, where the step is carried out using (i) at least one grammar description means and (ii) a computer device. The computer system includes: a processor, a memory connected to the processor, and a parser generator for generating a parser combination in the memory by combining parsers each associated with at least one grammar description, and at least one grammar description type means. | 03-15-2012 |
20120065961 | SPEECH MODEL GENERATING APPARATUS, SPEECH SYNTHESIS APPARATUS, SPEECH MODEL GENERATING PROGRAM PRODUCT, SPEECH SYNTHESIS PROGRAM PRODUCT, SPEECH MODEL GENERATING METHOD, AND SPEECH SYNTHESIS METHOD - According to one embodiment, a speech model generating apparatus includes a spectrum analyzer, a chunker, a parameterizer, a clustering unit, and a model training unit. The spectrum analyzer acquires a speech signal corresponding to text information and calculates a set of spectral coefficients. The chunker acquires boundary information indicating a beginning and an end of linguistic units and chunks the speech signal into linguistic units. The parameterizer calculates a set of spectral trajectory parameters for a trajectory of the spectral trajectory parameters of the linguistic unit on the basis of the spectral coefficients. The clustering unit clusters the spectral trajectory parameters calculated for each of the linguistic units into clusters on the basis of linguistic information. The model training unit obtains a trained spectral trajectory model indicating a characteristic of a cluster based on the spectral trajectory parameters belonging to the same cluster. | 03-15-2012 |
20120065962 | Systems and Methods of Building and Using Custom Word Lists - Standard word lists that are often used for such operations as predictive text, spell checking, and word completion are based on general linguistic data that might not accurately reflect actual text usage patterns of particular users. Systems and methods of building and using a custom word list for use in text operations on an electronic device are provided. A collection of text items associated with a user of the electronic device is scanned to identify words in the text items. A weighting is then assigned to each identified word, and the words and corresponding weightings are stored. | 03-15-2012 |
20120065963 | System And Method Of Generating Responses To Text-Based Messages - In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message is disclosed. The method includes identifying a first selected input clause in a sentence in the text-based natural language message. Also, assigning a semantic tag to the first selected input clause and matching the semantic tag to a historical input tag. The historical input tag associated with a first previously generated response clause. Further, generating an output response message based on the historical response clause, the output response message derived from the historical input tag and a second previously generated response clause. The system includes means for performing the method steps. | 03-15-2012 |
20120072204 | SYSTEMS AND METHODS FOR NORMALIZING INPUT MEDIA - A method and system for processing input media for provision to a text to speech engine comprising: a rules engine configured to maintain and update rules for processing the input media; a pre-parsing filter module configured to determine one or more metadata attributes using pre-parsing rules; a parsing filter module configured to identify content component from the input media using the parsing rules; a context and language detector configured to determine a default context and a default language; a learning agent configured to divide the content component into units of interest; a tagging module configured to iteratively assign tags to the units of interest using the tagging rules, wherein each tag is associated with a post-parsing rule; a post-parsing filter module configured to modify the content component by executing the post-parsing rules identified by the tags assigned to the phrases and strings. The context and language detector, tagging module, learning agent and post-parsing filter module are configured to iteratively process the content component and modifications thereto until there are no further modifications or a threshold number of iterations are performed. | 03-22-2012 |
20120072205 | HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT AND THAT EMPLOYS N-GRAM DATA TO LIMIT GENERATION OF LOW-PROBABILITY COMPOUND LANGUAGE SOLUTIONS - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to analyze the combinations of language objects in light of N-gram data stored on the device to avoid proposing low-probability compound language solutions. | 03-22-2012 |
20120078610 | DETERMINING OFFER TERMS FROM TEXT - Systems, methods, and machine readable and executable instructions are provided for determining offer terms from text. A method for determining offer terms from text can include mapping keywords to a domain of a procurement event, and receiving, to a computing device, an offer text associated with the procurement event. Event-specific entities are identified, by the computing device, in the offer text. The computing device determines the domain of the procurement event from the identified event-specific entities, and using the mapped keywords corresponding to the determined domain, determines offer components from the offer text, extracts offer parameters from the offer text, and constructs the offer structure using the identified event-specific entities, derived offer components, and extracted offer parameters. | 03-29-2012 |
20120078611 | CONTEXT-AWARE CONVERSATIONAL USER INTERFACE - An input handler may receive natural language input associated with a command from a user through a user interface, and a language parser may parse the natural language input to determine parsed natural language input. A context monitor may receive context information associated with the user, and a context parser may parse the context information to obtain parsed context information associated with the natural language input and with the command. A command interpreter may interpret the parsed natural language input, using the parsed context information, to thereby determine the command. | 03-29-2012 |
20120078612 | SYSTEMS AND METHODS FOR NAVIGATING ELECTRONIC TEXTS - Disclosed herein are systems and methods for navigating electronic texts. According to an aspect, a method may include determining text subgroups within an electronic text. The method may also include selecting a text seed within one of the text subgroups. Further, the method may include determining a similarity relationship between the text seed and one or more adjacent text subgroups that do not include the selected text seed. The method may also include associating the text seed with the one or more adjacent text subgroups based on the similarity relationship to create a text cluster. | 03-29-2012 |
20120078613 | METHOD, SYSTEM, AND COMPUTER READABLE MEDIUM FOR GRAPHICALLY DISPLAYING RELATED TEXT IN AN ELECTRONIC DOCUMENT - Disclosed herein are systems and methods for navigating electronic texts. According to an aspect, a method may include receiving search criteria for searching an electronic text. Further, the method may include determining text subgroups within the electronic text. The method may also include determining, for each text subgroup, a similarity relationship between the search criteria and the text subgroup. Further, the method may include presenting, for each text subgroup, a graphic representing the similarity relationship between the text subgroup and the search criteria. | 03-29-2012 |
20120078614 | Virtual keyboard for a non-tactile three dimensional user interface - A method, including presenting, by a computer system executing a non-tactile three dimensional user interface, a virtual keyboard on a display, the virtual keyboard including multiple virtual keys, and capturing a sequence of depth maps over time of a body part of a human subject. On the display, a cursor is presented at positions indicated by the body part in the captured sequence of depth maps, and one of the multiple virtual keys is selected in response to an interruption of a motion of the presented cursor in proximity to the one of the multiple virtual keys. | 03-29-2012 |
20120078615 | Multiple Touchpoints For Efficient Text Input - Methods and systems for using multiple simultaneous touchpoints of a touch-sensitive keyboard, such as an on-screen keyboard, for more efficient text input are provided. A method for generating text using a touch-sensitive keyboard may include receiving touch input from multiple simultaneous touchpoints. The method may also include determining a text character for each respective simultaneous touchpoint based on the touch input. The method may further include generating a text word based on the text characters determined from the multiple simultaneous touchpoints. A system for generating text using a touch-sensitive keyboard may include a touch input receiver, a slide detector and a text word generator. | 03-29-2012 |
20120078616 | Handheld Electronic Device and Associated Method Enabling Spell Checking in a Text Disambiguation Environment - An improved handheld electronic device and associated method enable spell checking in a reduced keyboard and disambiguation environment. The improved spell checking routine converts a misspelled word into a canonical version thereof and receives from a dictionary | 03-29-2012 |
20120084074 | Association Of Semantic Meaning With Data Elements Using Data Definition Tags - Technology is described for associating semantic meaning with data elements. The system can include a messaging module configured to receive a message having data elements. A storage module can store the data elements from the message in a structured format. A message dictionary can be configured to identify a type of the message received and to lexically identify data elements of the message using the message dictionary and the type of message. In addition, a taxonomy module can be configured to provide a semantic meaning for the data elements of the lexically identified portions of the message. Further, a data definition tag repository can store data definition tags and link the message dictionary, the taxonomy, and storage location of the data elements in the storage module. The data definition tags can enable the semantic meaning of data elements to be queried. | 04-05-2012 |
20120084075 | CHARACTER INPUT APPARATUS EQUIPPED WITH AUTO-COMPLETE FUNCTION, METHOD OF CONTROLLING THE CHARACTER INPUT APPARATUS, AND STORAGE MEDIUM - A character input apparatus which makes it possible to suppress degradation of use-friendliness in a case where a visually disabled user inputs characters using an auto-complete function. In the character string input apparatus, a character string to be input as a portion following a character string input by a user is predicted based on the character string input by the user, and the character string input by the user is completed using the predicted character string as a portion complementary thereto. In a voice guidance mode, information associated with a key selected by the user is read aloud by voice. When the voice guidance mode is enabled, the character string input apparatus disables the auto-complete function and performs control such that a character string cannot be automatically completed. | 04-05-2012 |
20120084076 | CONTEXT-BASED DISAMBIGUATION OF ACRONYMS AND ABBREVIATIONS - Context-based disambiguation of acronyms and/or abbreviations may determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more words. A contextual search query including the target abbreviation and said one or more keywords may be generated. A pseudo document index may be searched for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing an index of one or more pseudo documents, associated one or more abbreviations and associated context keywords. One or more pseudo documents associated with the target abbreviation may be returned based on the searching of the pseudo document index. | 04-05-2012 |
20120089387 | GENERAL PURPOSE CORRECTION OF GRAMMATICAL AND WORD USAGE ERRORS - Architecture that detects and corrects writing errors in a human language based on the utilization of three different stages: error detection, correction candidate generation, and correction candidate ranking. The architecture is a generic framework for generating fluent alternatives to non-grammatical word sequences in a written sample. Error detection is addressed by a suite of language model related scores and other scores such as parse scores that can identify a particularly unlikely sequence of words. Correction candidate generation is addressed by a lookup in a very large corpus of “correct” English that looks for alternative arrangements of the same or similar words or subsequences of these words in the same context. Correction candidate ranking is addressed by a language model ranker. | 04-12-2012 |
20120089388 | Segmenting Words Using Scaled Probabilities - Systems, methods, and apparatuses including computer program products for segmenting words using scaled probabilities. In one implementation, a method is provided. The method includes receiving a probability of an n-gram identifying a word, determining a number of atomic units in the corresponding n-gram, identifying a scaling weight depending on the number of atomic units in the n-gram, and applying the scaling weight to the probability of the n-gram identifying a word to determine a scaled probability of the n-gram identifying a word. | 04-12-2012 |
20120095750 | PARSING OBSERVABLE COLLECTIONS - Parsing technology is applied to observable collections. More specifically, a parser, such as a combinator parser, can be employed to perform syntactic analysis over one or more observable collections. Further, multiple observable collections can be combined into a single collection and time can be captured by annotating collection items or generating time items. | 04-19-2012 |
20120095751 | TEXT SEGMENTATION AND LABEL ASSIGNMENT WITH USER INTERACTION BY MEANS OF TOPIC SPECIFIC LANGUAGE MODELS AND TOPIC-SPECIFIC LABEL STATISTICS - The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user being capable to select alternative segmentations and alternative labels as well as to enter a user defined segmentation and user defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated incorporating the re-segmentation and re-labeling of successive parts of the document or the entire document. | 04-19-2012 |
20120095752 | LEVERAGING BACK-OFF GRAMMARS FOR AUTHORING CONTEXT-FREE GRAMMARS - A system and method of refining context-free grammars (CFGs). The method includes deriving back-off grammar (BOG) rules from an initially developed CFG and utilizing the initial CFG and the derived BOG rules to recognize user utterances. Based on a response of the initial CFG and the derived BOG rules to the user utterances, at least a portion of the derived BOG rules are utilized to modify the initial CFG and thereby produce a refined CFG. The above method can be carried out iteratively, with each new iteration utilizing a refined CFG from preceding iterations. | 04-19-2012 |
20120101805 | METHOD AND APPARATUS FOR DETECTING A SENTIMENT OF SHORT MESSAGES - A method, computer readable medium and apparatus for detecting a sentiment for a short message are disclosed. For example, the method receives the short message, and obtains an abstraction of the short message. The method then determines the sentiment of the short message based upon the abstraction. | 04-26-2012 |
20120101806 | SEMANTICALLY GENERATING PERSONALIZED RECOMMENDATIONS BASED ON SOCIAL FEEDS TO A USER IN REAL-TIME AND DISPLAY METHODS THEREOF - Systems and methods of selecting recommendations for a user in an online environment are disclosed. In one aspect, embodiments of the present disclosure include a method, which may be implemented on a system, of performing semantic analysis on a content item associated with the user, online interactions of the user, and profile information related to the user to identify associated content metadata and keywords, assigning a weight to the content metadata and keywords based on semantic-type categories, comparing the content metadata and keywords to target metadata and keywords to identify recommendation matches, and selecting one or more recommendations to be provided to the user based on the recommendation matches. | 04-26-2012 |
20120101807 | QUESTION TYPE AND DOMAIN IDENTIFYING APPARATUS AND METHOD - A question type and domain identifying apparatus includes: a question type identifier for recognizing the number of words of a user's question to identify whether the user's question is a query for performing information searching or a question for performing a question and answer (Q&A); a question domain distributor for distributing one of plural preset domain specialized Q&A engines, as a Q&A engine of the user's question based on the recognized word number; and a Q&A engine block, including the domain specialized Q&A engines, for selectively performing information searching or a Q&A with respect to the user's question in response to the distribution of the question domain distributor. | 04-26-2012 |
20120101808 | SENTIMENT ANALYSIS FROM SOCIAL MEDIA CONTENT - Methods and systems for extracting and analyzing user-generated content (UGC) in order to provide opinion-bearing information concerning different categories of a product. Harvested Web pages are examined for keywords to identify categories to which they pertain. Opinion-bearing information regarding those categories is then extracted and analyzed to determine its orientation and, optionally, its strength. The resulting sentiment determinations can be aggregated across multiple product reviews and the like to develop a sentiment summary, which can be reported and used as the basis for advertising, marketing and purchasing decisions, among others. | 04-26-2012 |
20120101809 | SYSTEM AND METHOD FOR DYNAMICALLY GENERATING A RECOGNITION GRAMMAR IN AN INTEGRATED VOICE NAVIGATION SERVICES ENVIRONMENT - The system and method described herein may dynamically generate a recognition grammar associated with a conversational voice user interface in an integrated voice navigation services environment. In particular, in response to receiving a natural language utterance that relates to a navigation context at the voice user interface, a conversational language processor may generate a dynamic recognition grammar that organizes grammar information based on one or more topological domains. For example, the one or more topological domains may be determined based on a current location associated with a navigation device, whereby a speech recognition engine may use the grammar information organized in the dynamic recognition grammar according to the one or more topological domains to generate one or more interpretations associated with the natural language utterance. | 04-26-2012 |
20120101810 | SYSTEM AND METHOD FOR PROVIDING A NATURAL LANGUAGE VOICE USER INTERFACE IN AN INTEGRATED VOICE NAVIGATION SERVICES ENVIRONMENT - A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment. | 04-26-2012 |
20120109636 | SUBSTITUTION, INSERTION, AND DELETION (SID) DISTANCE AND VOICE IMPRESSIONS DETECTOR (VID) DISTANCE - A device may receive user input, select two strings to compare based on the user input, obtain a first set of keyboard codes for a first of the two strings, obtain a second set of keyboard codes for a second of the two strings, and determine a distance between the two strings based on the first and the second set of keyboard codes. In addition, the device may send a result associated with determining the distance to another device, store the result in a storage device, or display the result. | 05-03-2012 |
20120109637 | EXTRACTING RICH TEMPORAL CONTEXT FOR BUSINESS ENTITIES AND EVENTS - Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored. | 05-03-2012 |
20120109638 | ELECTRONIC DEVICE AND METHOD FOR EXTRACTING COMPONENT NAMES USING THE SAME - A method for extracting component names from a document reads text content of the document, searches for component labels in the text content, and stores a position of each component label in the text content in a storage device. The method further extracts a component name corresponding to each component label in the text content according to the position of each component label, and creates a component table according to the component label and the component name. | 05-03-2012 |
20120109639 | METHOD, COMPUTER PROGRAM AND APPARATUS FOR ANALYZING SYMBOLS IN A COMPUTER SYSTEM - The present invention provides a computer-implemented method of analyzing messages in a computer system to allow workflows constituted by the messages to be identified, the method comprising: analyzing a sequence of messages in a computer system in order to classify the messages, thereby producing a corresponding sequence of classifications of the messages; and, applying sequence induction to the sequence of classifications of the messages to produce (i) a set of sub-sequences of the classifications of the messages and (ii) a sequence grammar for the sub-sequences, from which a workflow constituted by the sequence of messages can be identified. | 05-03-2012 |
20120109640 | METHOD AND SYSTEM FOR ANALYZING AND TRANSLATING VARIOUS LANGUAGES WITH USE OF SEMANTIC HIERARCHY - A method and computer system for analyzing sentences of various languages and constructing a language-independent semantic structure are provided. On the basis of comprehensive knowledge about languages and semantics, exhaustive linguistic descriptions are created, and lexical, morphological, syntactic, and semantic analyses for one or more sentences of a natural or artificial language are performed. A computer system is also provided to implement, analyze and store various linguistic structures and to perform lexical, morphological, syntactic, and semantic analyses. As a result, a generalized data structure, such as a semantic structure, is generated and used to describe the meaning of one or more sentences in language-independent form, applicable to automated abstracting, machine translation, control systems, Internet information retrieval, etc. | 05-03-2012 |
20120109641 | METHOD, SYSTEM, AND APPARATUS FOR VALIDATION - In a method for validating data, a text of a document is received. At least one fact is extracted from the text. At least one expert refinement is merged with the at least one fact to create at least one modified fact. The at least one modified fact is provided for a review. An expert refinement to the at least one modified fact is captured in response to the review. A superset document based on the at least one pre-existing refinement and the expert refinement is stored. | 05-03-2012 |
20120109642 | COMPUTER-IMPLEMENTED PATENT PORTFOLIO ANALYSIS METHOD AND APPARATUS - A computer-implemented apparatus and method for performing patent portfolio analysis. The patent portfolio analysis apparatus and method clusters a group of patents based upon one or more techniques. The clustering techniques include linguistic clustering techniques (e.g., eigenvector analysis), claim meaning, and patent classification techniques. Different aspects of the clusters are analyzed, including financial, claim breadth, and assignee patent comparisons. Moreover, patents and/or their clusters are linked to the Internet in order to determine what products might be covered by the claims of the patents or whether materials on the Internet might render patent claims invalid. | 05-03-2012 |
20120123767 | AUTOMATICALLY ASSESSING DOCUMENT QUALITY FOR DOMAIN-SPECIFIC DOCUMENTATION - Methods and arrangements for document quality assessment. Documents are accepted and a quality specification containing predetermined quality criteria is assimilated. Each document is assessed based on the predetermined quality criteria, and a quality score is assigned to each document, the quality score being a function of positive and negative attributes assessed for each document. | 05-17-2012 |
20120123768 | METHOD AND APPARATUS FOR DETERMINING TEXT PASSAGE SIMILARITY - According to one embodiment of the invention, a method includes classifying a number of noun phrases in a first text passage and a second text passage into a number of classifications. The method also includes determining a similarity between a noun phrase from the first text passage and a noun phrase from the second text passage for each of the noun phrases of a same classification. Additionally, a similarity between a sentence from the first text passage and a sentence from the second text passage is determined for each of the sentences in the first and second text passages based on similarities between the noun phrases. The method also includes determining a similarity between the first text passage and the second text passage based on a similarity between sentences. | 05-17-2012 |
20120130705 | TEXT SEGMENTATION WITH MULTIPLE GRANULARITY LEVELS - Text processing includes: segmenting received text based on a lexicon of smallest semantic units to obtain medium-grained segmentation results; merging the medium-grained segmentation results to obtain coarse-grained segmentation results, the coarse-grained segmentation results having coarser granularity than the medium-grained segmentation results; looking up in the lexicon of smallest semantic units respective search elements that correspond to segments in the medium-grained segmentation results; and forming fine-grained segmentation results based on the respective search elements, the fine-grained segmentation results having finer granularity than the medium-grained segmentation results. | 05-24-2012 |
20120130706 | SYSTEMS AND METHODS FOR CHARACTER CORRECTION IN COMMUNICATION DEVICES - A system and method for character error correction is provided, useful for a user of mobile appliances to produce written text with reduced errors. The system includes an interface, a word prediction engine, a statistical engine, an editing distance calculator, and a selector. A string of characters, known as the inputted word, may be entered into the mobile device via the interface. The word prediction engine may generate word candidates similar to the inputted word using fuzzy logic and user preferences generated from past user behavior. The statistical engine may generate variable error costs determined by the probability of erroneously inputting any given character. The editing distance calculator may determine the editing distance between the inputted word and each of the word candidates by grid comparison using the variable error costs. The selector may choose one or more preferred candidates from the word candidates using the editing distances. | 05-24-2012 |
20120130707 | Linguistic Assistance Systems And Methods - Systems and methods determine a linguistic preference between two or more phrases. Each of the phrases is submitted to at least one search engine as a search string. Search results are retrieved from each of the at least one search engine for each submitted search string and total hit values of each search result are compared. The one of the two or more phrases associated with the greatest total hit value is displayed to a user as the preferred phrase. | 05-24-2012 |
20120130708 | INFORMATION PROCESSOR - An information processor includes a keyword registration means for accepting an input of a keyword composed of a predetermined character string and storing the accepted keyword in a storage device; and a content display means for displaying externally acquired content on a display device. The content display means is configured to display the content on the display device by replacing a character string in a preset range containing the keyword with other display data if the keyword stored in the storage device exists in character information contained in the content. | 05-24-2012 |
20120136649 | Natural Language Interface - The present disclosure involves systems, software, and computer implemented methods for providing a natural language interface for searching a database. One process includes operations for receiving a natural language query. One or more tokens contained in the natural language query are identified. A set of sentences is generated based on the identified tokens, each sentence representing a possible logical interpretation of the natural language query and including a combination of at least one of the identified tokens. At least one sentence in the set of sentences is selected for searching a database based on the identified tokens. | 05-31-2012 |
20120136650 | SUGGESTING SPELLING CORRECTIONS FOR PERSONAL NAMES - Personal name spelling correction suggestion technique embodiments are presented which provide suggestions for alternate spellings of a personal name. This involves creating a personal name directory which can be queried to suggest spelling corrections for personal names. A hash function that maps any personal name in a particular language and misspellings thereof to similar binary codewords is used to produce one or more binary codewords for each personal name in the directory. The same hash function is used to produce one or more binary codewords from a personal name presented in a query. The personal name directory is employed to identify up to a prescribed number of personal names, each of which has one or more associated binary codewords that are similar to one or more of the binary codewords produced from the personal name query. The identified personal names are suggested as alternate names for the query personal name. | 05-31-2012 |
20120136651 | ONE-ROW KEYBOARD AND APPROXIMATE TYPING - In one aspect, the present invention comprises an apparatus for character entry on an electronic device, comprising: a keyboard with one row of keys; and an electronic display device in communication with the keyboard; wherein one or more keys on the keyboard has a correspondence with a plurality of characters, and wherein the correspondence enables QWERTY-based typing. In another aspect, the invention comprises an apparatus for character entry on an electronic device, comprising: a keyboard with a plurality of keys; and an electronic display device in communication with the keyboard; wherein one or more keys on the keyboard has a correspondence with a plurality of characters, and wherein, for each of the one or more keys, the plurality of characters comprises: (a) a home row character associated with a particular finger when touch typing; and (b) a non-home-row character associated with the particular finger when touch typing. | 05-31-2012 |
20120136652 | METHOD, A COMPUTER PROGRAM AND APPARATUS FOR ANALYZING SYMBOLS IN A COMPUTER - The invention provides a computer-implemented method of analyzing symbols in a computer system, the symbols conforming to a specification for the symbols, in which the specification has been codified into a set of computer-readable rules; and, the symbols analyzed using the computer-readable rules to obtain patterns of the symbols by determining the path that is taken by the symbols through the rules that successfully terminates, and grouping the symbols according to said paths, the method comprising: upon receipt of a message at a computer, performing a lexical analysis of the message; and, in dependence on lexical analysis of the message, assigning the message to one of the groups identified according to said paths. The invention also provides a computer programmed to perform the method and a computer program comprising program instructions for causing a computer to perform the method. | 05-31-2012 |
20120143595 | FAST TITLE/SUMMARY EXTRACTION FROM LONG DESCRIPTIONS - Techniques are described herein for automatic generation of a title or summary from a long body of text. A grammatical tree representing one or more sentences of the long body of text is generated. One or more nodes from the grammatical tree are selected to be removed. According to one embodiment, a particular node is selected to be removed based on its position in the grammatical tree and its node-type, where the node type represents a grammatical element of the sentence. Once the particular node is selected, a branch of the tree is cut at the node. After the branch has been cut, one or more sub-sentences are generated from the remaining nodes in the grammatical tree. The one or more sub-sentences may be returned as a title or summary. | 06-07-2012 |
20120143596 | Voice Communication Management - A method, a computer program product, and an apparatus for managing a voice communication are provided. In one illustrative embodiment, an audio phrase produced by a first user is identified in the voice communication between the first user and a second user. A determination is made whether the audio phrase is present in a policy which prohibits the transmission of a set of undesired audio phrases. Responsive to a determination that the audio phrase is present in the policy which prohibits the transmission of the set of undesired audio phrases, a communication of the audio phrase is modified. | 06-07-2012 |
20120143597 | System and Methods for Evaluating Feature Opinions for Products, Services, and Entities - A system for evaluating a review having unstructured text comprises a segment splitter for separating at least a portion of the unstructured text into one or more segments, each segment comprising one or more words; a segment parser coupled to the segment splitter for assigning one or more lexical categories to one or more of the one or more words of each segment; an information extractor coupled to the segment parser for identifying a feature word and an opinion word contained in the one or more segments; and a sentiment rating engine coupled to the information extractor for calculating an opinion score based upon an opinion grouping, the opinion grouping including at least the feature word and the opinion word identified by the information extractor. | 06-07-2012 |
20120150531 | SYSTEM AND METHOD FOR LEARNING LATENT REPRESENTATIONS FOR NATURAL LANGUAGE TASKS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for learning latent representations for natural language tasks. A system configured to practice the method analyzes, for a first natural language processing task, a first natural language corpus to generate a latent representation for words in the first corpus. Then the system analyzes, for a second natural language processing task, a second natural language corpus having a target word, and predicts a label for the target word based on the latent representation. In one variation, the target word is one or more words such as a rare word and/or a word not encountered in the first natural language corpus. The system can optionally assign the label to the target word. The system can operate according to a connectionist model that includes a learnable linear mapping that maps each word in the first corpus to a low dimensional latent space. | 06-14-2012 |
20120150532 | SYSTEM AND METHOD FOR FEATURE-RICH CONTINUOUS SPACE LANGUAGE MODELS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for predicting probabilities of words for a language model. An exemplary system configured to practice the method receives a sequence of words and external data associated with the sequence of words and maps the sequence of words to an X-dimensional vector, corresponding to a vocabulary size. Then the system processes each X-dimensional vector, based on the external data, to generate respective Y-dimensional vectors, wherein each Y-dimensional vector represents a dense continuous space, and outputs at least one next word predicted to follow the sequence of words based on the respective Y-dimensional vectors. The X-dimensional vector, which is a binary sparse representation, can be higher dimensional than the Y-dimensional vector, which is a dense continuous space. The external data can include part-of-speech tags, topic information, word similarity, word relationships, a particular topic, and succeeding parts of speech in a given history. | 06-14-2012 |
20120150533 | PROVIDING DEFINITIONS THAT ARE SENSITIVE TO THE CONTEXT OF A TEXT - Systems and techniques for providing definitions to a user. The provision embodies the context of a text in which the defined term appears. In one aspect, a system includes an electronic device that includes one or more data processing devices programmed to respond to receipt of the user selection of the first term by performing operations. The operations include accessing, from the one or more persistent data storage devices, the characterizations of the contexts of the texts, comparing the accessed characterizations of the contexts of the texts with one or more characteristics of the context of the textual content of a media file, and ranking the definitions of the first term according to respective likelihoods that the definitions appropriately characterize the usage of the first term within the textual content of the media file. | 06-14-2012 |
20120150534 | Computer-Implemented Systems and Methods for Determining a Difficulty Level of a Text - Systems and methods are provided for determining a difficulty level of a text. A determination is made as to a number of cohesive devices present in a text. A further determination is made as to a number of cohesive devices expected in the text. A cohesiveness metric is calculated based on the number of cohesive devices present in the text and the number of cohesive devices expected in the text, where the cohesiveness metric is used to identify a difficulty level of the text. | 06-14-2012 |
20120158399 | SAMPLE CLUSTERING TO REDUCE MANUAL TRANSCRIPTIONS IN SPEECH RECOGNITION SYSTEM - Techniques for grouping a plurality of samples automatically transcribed from a plurality of utterances. The method comprises forming clusters from the plurality of samples, wherein the clusters include two or more of the plurality of samples. One or more samples are selected from a cluster and manually-processed data samples for the one or more samples are obtained. A weighting factor may be assigned to the data samples based, at least in part, on the number of samples in the cluster associated with the selected data sample. | 06-21-2012 |
20120158400 | METHODS AND SYSTEMS FOR KNOWLEDGE DISCOVERY - In an aspect, provided is a Natural Language Processing (NLP) workflow engine to analyze text. The engine can combine one or more independent NLP components (e.g. Tokenization, Part of Speech Tagging, Named Entity Recognition) into a meaningful processing workflow. | 06-21-2012 |
20120166177 | SYSTEMS AND METHODS FOR ACCESSING APPLICATIONS BASED ON USER INTENT MODELING - In one embodiment, the present invention includes a computer-implemented method comprising storing information in a datastore, the information corresponding to a plurality of computer applications, wherein the plurality of computer applications have associated annotations, receiving an input from a user, providing a first verb and a first noun corresponding to a user intent based on said input, and specifying one or more of said plurality of applications based on the verb and noun annotations for the plurality of applications and the first verb and first noun corresponding to the user intent. The annotations comprise a verb describing one or more activities performed by an associated application and a noun describing work objects on which the activities are performed. Users access the applications in the datastore. | 06-28-2012 |
20120166178 | SYSTEMS AND METHODS FOR MODEL-BASED PROCESSING OF LINGUISTIC USER INPUTS - The present invention includes model-based processing of linguistic user inputs. In one embodiment, the present invention includes a computer-implemented method comprising receiving linguistic inputs, parsing the linguistic inputs, mapping the linguistic inputs to a formal representation used by a model, applying the formal representation against the model, where the model comprises said formal representation, and where the model specifies relationships between the elements of the formal representation and defines process information, and accessing software resources based on the formal representation of the user input and the relationships and process information in said model. | 06-28-2012 |
20120166179 | SYSTEM AND METHOD FOR CLASSIFYING COMMUNICATIONS THAT HAVE LOW LEXICAL CONTENT AND/OR HIGH CONTEXTUAL CONTENT INTO GROUPS USING TOPICS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for identifying document topics. A system configured to practice the method receives a document from a corpus of documents, learns interpersonal relationships of users associated with the document, performs a lexical analysis of the document, and, based on the interpersonal relationships of the users and the lexical analysis, identifies a topic for the document. The approaches disclosed herein can integrate user-people relationships to identify topics for documents with low lexical or high contextual content. The system can learn this user-people relationship from context. The system uses this learned behavior to identify communication documents correctly. Another aspect is the separation of the two phases. The system overlays the learned model on the lexical topic analysis, allowing the system to capture user-defined topics and user behavior that is learned from other factors such as medium (calls, events, etc.) or user preferences. | 06-28-2012 |
20120166180 | Compassion, Variety and Cohesion For Methods Of Text Analytics, Writing, Search, User Interfaces - The present invention increases precision and recall of search engines, while decreasing hardware resources needed, using musical rhythmic analysis to detect sentiment and emotion, using poetic and metaphoric resonances with dictionary meanings, to annotate, distinguish and summarize n-grams of word meanings, then intersecting n-grams to locate mutually salient sentences, using metaphor salience analysis to cluster sentences and paragraphs into automatically named concepts, automatically characterizing quality and depth to which documents describe concepts, using editorial metrics of compassion, variety of perspectives and logical cohesion, to automatically set pricing for written works and their copyrights, and to monitor blogs and social media for newly important concepts, and provide advanced user interfaces. | 06-28-2012 |
20120166181 | Method For Locating Line Breaks In Text - A method for locating line breaks in text, carried out by a computer device having a processor and system memory, includes the steps of creating a probabilistic model of a paragraph of text, parameterized by inter-word spacing, and running an inference on the model to find a sequence of line breaks that maximizes the joint probability of line break positions with minimum deviation of inter-word spacing from an ideal value. | 06-28-2012 |
20120166182 | Autocompletion for Partially Entered Query - A server system receives, respectively, a first character string from a first user and a second character string from a second user. There are one or more differences between the first and second character strings. The server system obtains from a plurality of previously submitted complete queries, respectively, a first set of predicted complete queries corresponding to the first character string and a second set of predicted complete queries corresponding to the second character string. There are one or more identical queries in both the first and second sets. The server system conveys at least a first subset of the first set to the first user and at least a second subset of the second set to the second user. Both the first subset and the second subset include a respective identical query. | 06-28-2012 |
20120166183 | SYSTEM AND METHOD FOR THE LOCALIZATION OF STATISTICAL CLASSIFIERS BASED ON MACHINE TRANSLATION - A system and method for localizing a spoken dialog system is disclosed. Source data from a source language spoken dialog system is accessed, including semantic annotations and transcriptions of a plurality of utterances. The transcriptions are machine-translated into a target language. Semantic classifiers are trained on the machine translated transcriptions and the source language semantic annotations. | 06-28-2012 |
20120173227 | METHOD, TERMINAL, AND COMPUTER-READABLE RECORDING MEDIUM FOR SUPPORTING COLLECTION OF OBJECT INCLUDED IN THE IMAGE - The present invention relates to a method for supporting a collection of an object included in a created image. The method includes the steps of: (a) creating an image of an object; (b) automatically creating and providing a combined sentence correct under the grammar of a language for the object on a first area on a screen of the terminal by using at least part of recognition information on what an identity of the object is, a place where the image was created and a time when the image was created, and automatically getting and providing a thumbnail corresponding to the recognized object on a second area on the screen of the terminal; and (c) if a Collection button is selected, storing data provided on the first and the second areas onto a storage space, to thereby complete the collection of the object. | 07-05-2012 |
20120179453 | PREPROCESSING OF TEXT - Performance of statistical machine learning techniques, particularly classification techniques applied to the extraction of attributes and values concerning products, is improved by preprocessing a body of text to be analyzed to remove extraneous information. The body of text is split into a plurality of segments. In an embodiment, sentence identification criteria are applied to identify sentences as the plurality of segments. Thereafter, the plurality of segments are clustered to provide a plurality of clusters. One or more of the resulting clusters are then analyzed to identify segments having low relevance to their respective clusters. Such low relevance segments are then removed from their respective clusters and, consequently, from the body of text. As the resulting relevance-filtered body of text no longer includes portions of the body of text containing mostly extraneous information, the reliability of any subsequent statistical machine learning techniques may be improved. | 07-12-2012 |
20120179454 | APPARATUS AND METHOD FOR AUTOMATICALLY GENERATING GRAMMAR FOR USE IN PROCESSING NATURAL LANGUAGE - Provided is an apparatus and method for automatically generating grammar for use in the processing of natural language. The apparatus may extract a corpus relevant to a target domain from a collection of corpora and may generate grammar for use in the target domain based on the extracted corpus. The apparatus may set one domain out of a plurality of domains as a target domain to be processed by an intention analysis system. The apparatus may extract a corpus relevant to the target domain from a collection of corpora and generate grammar based on the extracted corpus. | 07-12-2012 |
20120185238 | Auto Generation of Social Media Content from Existing Sources - The invention provides for the automatic creation of custom content for social media based on an existing text source and a set of preferences and parameters, including automatically preparing the input material from the existing text source, automatically generating the social media content of said input material, automatically generating the published content of said social media content, and automatically producing the analysis and report of said published content and its consumption. | 07-19-2012 |
20120191446 | System and method for creating a parser generator and associated computer program - A system is provided for building a parser generator. The system includes a grammar input module for inputting in the parser generator a grammar expressed in a given formalism. A checking module formally verifies that a given grammar belongs to a predetermined class of grammars for which a translation to a correct, terminating parser is feasible. A checking module formally verifies that a grammar expressed in the formalism is well-formed. A semantic action module defines a parsing result depending on semantic actions embedded in the grammar. The semantic action module ensures in a formal way that all semantic actions of the grammar are terminating semantic actions. A formal module generates a parser with total correctness guarantees, using the modules to verify that the grammar is well-formed, belongs to a certain class of feasible, terminating grammars and all its semantic actions are terminating. | 07-26-2012 |
20120197630 | METHODS AND SYSTEMS TO SUMMARIZE A SOURCE TEXT AS A FUNCTION OF CONTEXTUAL INFORMATION - Methods and systems to summarize a source text as a function of contextual information, including to fit a summary within a context-based allotted time. The context-based allotted time may be apportioned amongst multiple portions of the source text, such as by relevance. The context-based allotted time and/or relevance may be user-specified and/or determined, such as by look-up, rule, computation, inference, and/or machine learning. During summary presentation, one or more portions of the source text may be re-summarized, such as to adjust a level of detail. A presentation rate may be user-controllable. Where new and/or changed contextual information affects an available time to review a remaining portion of the summary, the summary presentation may be automatically adjusted, and/or one or more portions of the source text may be re-summarized based on a revised context-based allotted time. | 08-02-2012 |
20120197631 | System for Identifying Textual Relationships - A computer-implemented method identifies textual statement relationships. Textual statement pairs including a first and second textual statement are identified, and parsed word group pairs are extracted from first and second textual statements. The parsed word groups are compared, and a parsed word score for each statement pair is calculated. Word vectors for the first and second textual statements are created and compared. A word vector score is calculated based on the comparison of the word vectors for the first and second textual statements. A match score is determined for the textual statement pair, with the match score being representative of at least one of the parsed word score and the word vector score. | 08-02-2012 |
20120197632 | SYSTEM AND METHOD FOR THE TRANSFORMATION AND CANONICALIZATION OF SEMANTICALLY STRUCTURED DATA - A method of transforming and canonicalizing semantically structured data includes obtaining data from a network of computers, applying text patterns to the obtained data and placing the data in a first data file, providing a second data file containing the obtained data in a uniform format, and generating interface specific sentences from the data in the second data file. | 08-02-2012 |
20120203543 | Method for Analyzing Message Archives and Corresponding Computer Program - A method for analysing a large number of messages, wherein the number of messages is reduced based on pattern recognition and pattern simplification, rules for the pattern recognition and pattern simplification are based on a regular grammatical structure, and patterns are sought in the remaining messages, or directly, i.e., without previous simplification. Syntactic pattern recognition is used for each type of pattern search, and a finite machine is derivable using the regular grammatical structure underlying each pattern recognition by transforming the mapping rules into a transfer function, such that structural connections between the messages can be displayed graphically. | 08-09-2012 |
20120203544 | CORRECTING TYPING MISTAKES BASED ON PROBABILITIES OF INTENDED CONTACT FOR NON-CONTACTED KEYS - Systems and methods for identifying word candidates based on a sequence of contact events within one or more keys on a keyboard. In some examples, the system identifies a probability of intended contact for keys adjacent to a contacted key, and returns the identified probabilities to a typing correction system that identifies likely word candidates that correspond to text input sequences. | 08-09-2012 |
20120203545 | SYSTEM AND METHOD FOR COMPUTERIZED PSYCHOLOGICAL CONTENT ANALYSIS OF COMPUTER AND MEDIA GENERATED COMMUNICATIONS TO PRODUCE COMMUNICATIONS MANAGEMENT SUPPORT, INDICATIONS AND WARNINGS OF DANGEROUS BEHAVIOR, ASSESSMENT OF MEDIA IMAGES, AND PERSONNEL SELECTION SUPPORT - At least one computer-mediated communication produced by or received by an author is collected and parsed to identify categories of information within it. The categories of information are processed with at least one analysis to quantify at least one type of information in each category. A first output communication is generated regarding the at least one computer-mediated communication, describing the psychological state, attitudes or characteristics of the author of the communication. A second output communication is generated when a difference between the quantification of at least one type of information for at least one category and a reference for the at least one category is detected involving a psychological state, attitude or characteristic of the author to which a responsive action should be taken. | 08-09-2012 |
20120209592 | STATISTICAL STEMMING - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating suffix rewriting rules. A method includes obtaining a plurality of canonical suffix-rewriting rules each associated with one or more words, generating a suffix tree from the words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of final suffix-rewriting rules from the nodes in the minimum colored subset. Another method includes receiving applicable and non-applicable words for a suffix-rewriting rule, generating a suffix tree from the applicable words and the non-applicable words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of suffix-rewriting rules, wherein each rule corresponds to a node in the minimum colored subset with a valid status. | 08-16-2012 |
20120209593 | HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING QUICK TEXT ENTRY IN A MESSAGE - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message. | 08-16-2012 |
20120209594 | METHOD AND AN APPARATUS TO DISAMBIGUATE REQUESTS - A method and an apparatus to disambiguate requests are presented. In one embodiment, the method includes receiving a request for information from a user. Then data is retrieved from a back-end database in response to the request. Based on a predetermined configuration of a disambiguation system and the data retrieved, the ambiguity within the request is dynamically resolved. | 08-16-2012 |
20120215522 | HANDHELD ELECTRONIC DEVICE PROVIDING A LEARNING FUNCTION TO FACILITATE CORRECTION OF ERRONEOUS TEXT ENTRY, AND ASSOCIATED METHOD - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In addition to identifying and outputting representations of language objects that are stored in the memory and that correspond with a text input, the device provides a learning function which facilitates providing proposed corrected output by the device in certain circumstances of erroneous input. | 08-23-2012 |
20120215523 | TIME-SERIES ANALYSIS OF KEYWORDS - Processing for a time-series analysis of keywords comprises clustering or classifying pieces of document data, each of which is description of a phenomenon in a natural language, on the basis of frequencies of occurrence of keywords in the pieces of document data, individual keywords being also clustered or classified by clustering or classifying the pieces of document data, and performing a time-series analysis of frequencies of occurrence of pieces of document data containing individual keywords in clusters or classes into which the pieces of document data are clustered or classified or a time-series analysis of frequencies of occurrence of pieces of document data containing clusters or classes into which the individual keywords are clustered or classified. Frequency distribution showing variation of the frequencies of occurrence of the pieces of document data is acquired by the time-series analysis. | 08-23-2012 |
20120221324 | Document Processing Apparatus - In a text document processing apparatus, there is provided standard knowledge network data composed of networked phrases having strong mutual relation to each other, the phrases being selected from a knowledge field including contents of a text document to be examined. In addition, there is provided a document knowledge preparing function that prepares knowledge network data of the document to be examined, the knowledge network data being composed of networked phrases having strong mutual relation to each other, the phrases being selected from the text document. Further, a processing unit that checks a specified word constituting the knowledge network data of the document to be examined and a standard knowledge network data, and in a case when information of phrases which are networked to the specified word are different from each other, outputs difference information including information of the specified word. | 08-30-2012 |
20120226492 | INFORMATION PROCESSING APPARATUS, NATURAL LANGUAGE ANALYSIS METHOD, PROGRAM AND RECORDING MEDIUM - An apparatus and method for calculating a score of matching a sentence with a query pattern having a dependency structure. The apparatus includes: an input unit acquiring an analysis target sentence, a query pattern and an index value indexing how a linguistic unit in the sentence tends to modify another; and a score calculation unit calculating a matching score indexing the degree of matching of the sentence with the query pattern. The matching score is represented by a function having an index value with which a dependency relation included in the query pattern is associated. The score is calculated by attempting association between a substructure of the query pattern and a range in the sentence and by performing recursive calculation in the substructure and the range while storing partial calculation result of the function in a memory area for reuse. | 09-06-2012 |
20120226493 | System and Methods for Using Short-Hand Interpretation Dictionaries in Collaboration Environments - A method for creating and using a short-hand interpretation dictionary in a collaboration environment includes creating or editing a document in a collaboration environment, said document comprising at least one short-hand notation; and replacing the at least one short-hand notation with an interpretation from at least one short-hand dictionary. | 09-06-2012 |
20120232885 | SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy, wherein the visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model. | 09-13-2012 |
20120232886 | COMPUTER NETWORK, COMPUTER-IMPLEMENTED METHOD, COMPUTER PROGRAM PRODUCT, CLIENT, AND SERVER FOR NATURAL LANGUAGE-BASED CONTROL OF A DIGITAL NETWORK - The present application relates to a computer network, a computer-implemented method, a computer program product, a client, and a server for natural language-based control of a digital network. In one aspect, the computer network for natural language-based control of a digital network may comprise: a digital network operable to provide sharing of access to a network between a plurality of devices connected in the digital network; a client installed in the digital network and operable to provide a unified natural language interface to a user to control the digital network using natural language; a server connected to the client over the network and operable to process a user request of the user performed through the unified natural language interface; and one or more software agents operable to execute at least one action on at least one of the plurality of devices based on the processed user request. | 09-13-2012 |
20120232887 | INFORMATION EXTRACTION ACROSS MULTIPLE EXPERTISE-SPECIFIC SUBJECT AREAS - Techniques are disclosed for bridging terminology differences between at least two subject areas. By way of example, a computer-implemented method includes executing the following steps on a computer. A first affinity measure is computed between a first term in a first corpus, corresponding to a first subject area, and a bridge term. A second affinity measure is computed between a second term in a second corpus, corresponding to a second subject area, and the bridge term. A third affinity measure is computed between the first term and the second term based on the first affinity measure and the second affinity measure. The bridge term is a term that appears in both the first corpus and the second corpus. | 09-13-2012 |
20120239380 | Classification-Based Redaction in Natural Language Text - When redacting natural language text, a classifier is used to provide a sensitive concept model according to features in natural language text and in which the various classes employed are sensitive concepts reflected in the natural language text. Similarly, the classifier is used to provide a utility concepts model based on utility concepts. Based on these models, and for at least one identified sensitive concept and at least one identified utility concept, at least one feature in the natural language text is identified that implicates the at least one identified sensitive concept more than the at least one identified utility concept. At least some of the features thus identified may be perturbed such that the modified natural language text may be provided as at least one redacted document. In this manner, features are perturbed to maximize classification error for sensitive concepts while simultaneously minimizing classification error in the utility concepts. | 09-20-2012 |
20120239381 | SEMANTIC PHRASE SUGGESTION ENGINE - A semantic phrase suggestion engine that provides term and sentence suggestions based on context-specific user groups. Knowledge domains within a semantic network may be automatically derived from user software applications, and each term within the knowledge domain includes meta-data about the terms, e.g., term type and an importance indicator. The indicators may be defined within the context of specific user groups and relate to how many times that group has used the term (e.g., in documents, emails, etc.) The semantic phrase suggestion engine may also include spelling conditions and grammar conditions, which can then provide phrase suggestions according to the conditions and importance indicators, specific to a user group. | 09-20-2012 |
20120239382 | RECOMMENDATION METHOD AND RECOMMENDER COMPUTER SYSTEM USING DYNAMIC LANGUAGE MODEL - A recommendation method and a recommender computer system using a dynamic language model are provided. The recommender computer system using the dynamic language model includes a language model constructing computer module, a language model adapting computer module, a sentence selecting computer module and a sentence recommendation computer module. The language model constructing computer module is used for constructing a language model. The language model adapting computer module is used for dynamically merging different language models to construct a dynamic language model. The sentence selecting computer module generates a plurality of recommended sentences from a database according to a search keyword. The sentence recommendation computer module analyzes the difference level between the recommended sentences and the dynamic language model and sorts the recommended sentences to provide a recommendation list. | 09-20-2012 |
20120239383 | SYSTEM AND METHOD OF SPOKEN LANGUAGE UNDERSTANDING IN HUMAN COMPUTER DIALOGS - A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify. | 09-20-2012 |
20120245923 | CORPUS-BASED SYSTEM AND METHOD FOR ACQUIRING POLAR ADJECTIVES - A system, method, and computer program product for generating a polar vocabulary are provided. The method includes extracting textual content from each review in a corpus of reviews. Each of the reviews includes an author's rating, e.g., of a specific product or service to which the textual content relates. A set of frequent nouns is identified from the textual content of the reviews. Adjectival terms are extracted from the textual content of the reviews. Each adjectival term is associated in the textual content with one of the frequent nouns. A polar vocabulary including at least some of the extracted adjectival terms is generated. A polarity measure is associated with each adjectival term in the vocabulary which is based on the ratings of those reviews from which the adjectival term was extracted. | 09-27-2012 |
20120245924 | CUSTOMER REVIEW AUTHORING ASSISTANT - An authoring assistant includes a parser which automatically identifies opinion expressions in input text. The text may include an author's review of an item, such as a product or service. A computer-implemented opinion review component generates an analysis of the text, which is based on the identified opinion expressions. The opinion review component computes an effective opinion of the text as a function of a measure of polarity associated with the identified opinion expressions. A representation generator generates a representation of the analysis for display on an associated user interface. The representation of the analysis includes a representation of the effective opinion. In the case of a review, the authoring assistant may allow the author to modify the review to reduce incoherence with a rating of the item. | 09-27-2012 |
20120245925 | METHODS AND DEVICES FOR ANALYZING TEXT - A method, operating model, system, computer program, application, online service, or application program interface (API), and computer program product for analyzing any email message or text, online post, online web pages, social media sites, and online news sites to detect predefined and actionable events and intent. A method for detecting important emails or messages, and actionable emails or messages that signify intent including questions or promises. A method for detecting past or possible future events in any online posts where the event is defined a priori. | 09-27-2012 |
20120245926 | METHODS AND APPARATUS FOR FORMATTING TEXT FOR CLINICAL FACT EXTRACTION - An original text that is a representation of a narration of a patient encounter provided by a clinician may be received and re-formatted to produce a formatted text. One or more clinical facts may be extracted from the formatted text. A first fact of the clinical facts may be extracted from a first portion of the formatted text, and the first portion of the formatted text may be a formatted version of a first portion of the original text. A linkage may be maintained between the first fact and the first portion of the original text. | 09-27-2012 |
20120253788 | Augmented Conversational Understanding Agent - An augmented conversational understanding agent may be provided. Upon receiving, by an agent, at least one natural language phrase from a user, a context associated with the at least one natural language phrase may be identified. The natural language phrase may be associated, for example, with a conversation between the user and a second user. An agent action associated with the identified context may be performed according to the at least one natural language phrase and a result associated with performing the action may be displayed. | 10-04-2012 |
20120253789 | Conversational Dialog Learning and Correction - Conversational dialog learning and correction may be provided. Upon receiving a natural language phrase from a first user, at least one second user associated with the natural language phrase may be identified. A context state may be created according to the first user and the at least one second user. The natural language phrase may then be translated into an agent action according to the context state. | 10-04-2012 |
20120253790 | Personalization of Queries, Conversations, and Searches - Personalization of user interactions may be provided. Upon receiving a phrase from a user, a plurality of semantic concepts associated with the user may be loaded. If the phrase is determined to comprise at least one of the plurality of semantic concepts associated with the user, a first action may be performed according to the phrase. If the phrase is determined not to comprise at least one of the plurality of semantic concepts associated with the user, a second action may be performed according to the phrase. | 10-04-2012 |
20120253791 | Task Driven User Intents - Identification of user intents may be provided. A plurality of network applications may be identified, and an ontology associated with each of the plurality of applications may be defined. If a phrase received from a user is associated with at least one of the defined ontologies, an action associated with the network application may be executed. | 10-04-2012 |
20120253792 | Sentiment Classification Based on Supervised Latent N-Gram Analysis - A method for sentiment classification of a text document using high-order n-grams utilizes a multilevel embedding strategy to project n-grams into a low-dimensional latent semantic space where the projection parameters are trained in a supervised fashion together with the sentiment classification task. Using, for example, a deep convolutional neural network, the semantic embedding of n-grams, the bag-of-occurrence representation of text from n-grams, and the classification function from each review to the sentiment class are learned jointly in one unified discriminative framework. | 10-04-2012 |
20120253793 | System for natural language understanding - A general-purpose apparatus for analyzing natural language text that allows for the implementation of a broad range of natural language understanding applications. The apparatus for natural language understanding analyzes a source text and transforms the source text into a semantically-interpretable syntactic representation (SISR), comprising a syntax template and semantic clause annotations. The general-purpose apparatus for natural language understanding is adaptable to various source text natural languages and is adaptable to various natural language understanding applications, such as query answering, translation, summarization, information extraction, disambiguation, and parsing. A natural language query answering apparatus for answering questions about a source text, whereby the query answering apparatus utilizes the general-purpose apparatus for transforming the natural language query into SISR format. | 10-04-2012 |
20120259615 | TEXT PREDICTION - One or more techniques and/or systems are provided for suggesting a word and/or phrase to a user based at least upon a prefix of one or more characters that the user has inputted. Words in a database are respectively assigned a unique identifier. Generally, the unique identifiers are assigned sequentially and contiguously, beginning with a first word alphabetically and ending with a last word alphabetically. When a user inputted prefix is received, a range of unique identifiers corresponding to words respectively having a prefix that matches the user inputted prefix are identified. Typically, the range of unique identifiers corresponds to substantially all of the words that begin with the given prefix and does not correspond to words that do not begin with the given prefix. The unique identifiers may then be compared to a probability database to identify which words have a higher probability of being selected by the user. | 10-11-2012 |
20120259616 | SYSTEMS, METHODS AND DEVICES FOR GENERATING AN ADJECTIVE SENTIMENT DICTIONARY FOR SOCIAL MEDIA SENTIMENT ANALYSIS - Embodiments generally relate to systems and methods for generating a sentiment dictionary and calculating sentiment scores of adjectives within the sentiment dictionary. A set of seed words can be identified and expanded using synonyms and antonyms of the set of seed words. Social media data can be parsed to identify adjectives that link to the set of seed words with the words “and” or “but.” Matrices representing the attraction and repulsion among the linked adjectives can be generated. A factorization algorithm can be minimized to determine an output matrix that comprises positive and negative sentiment scores for each of the adjectives. In embodiments, a sentiment score for part or all of the social media data can be calculated using the output matrix, and one or more parts of the social media data can be classified as a positive or negative sentiment. | 10-11-2012 |
20120259617 | SYSTEM AND METHOD FOR SLANG SENTIMENT CLASSIFICATION FOR OPINION MINING - The present disclosure describes a method of classifying sentiment oriented slang for opinion mining. With increasing use of the internet, many users can submit their review comments directly to companies; these comments can be automatically processed and summarized with critical issues from time to time, helping a company get real time feedback from its customers. The method comprises receiving at least one document comprising a plurality of sentiment oriented slang. The next step of the method comprises identifying the plurality of sentiment oriented slang in the at least one document. Further, a polarity score of each identified slang word is determined and sentiment information is displayed on an output device as an output. | 10-11-2012 |
20120259618 | COMPUTING DEVICE AND METHOD FOR COMPARING TEXT DATA - A method for comparing text data reads two patent documents comprising varying text sections. The method compares characters of a first text section in a first patent document with a corresponding second text section in a second patent document, and acquires a same sub-character string that has a maximum matching length and matching positions of the first and second text sections. The method marks characters before the matching positions of the first and second text sections as different characters. The method displays a comparison result list of the comparison between the first patent document and the second patent document on a display device. | 10-11-2012 |
20120259619 | SHORT MESSAGE AGE CLASSIFICATION - Systems and methods for short message age classification in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, classifying messages using a classifier includes determining keyword feature information for a message using the classifier, classifying the determined feature information using the classifier, and estimating user age using the classifier. | 10-11-2012 |
20120259620 | MESSAGE OPTIMIZATION - The present invention provides a system and method for optimizing a message. Components of a starting message are identified, and at least one rule is applied for modifying at least one message component to create at least one variation of the starting message. Message variants are tested by sending each variant to a sample of people and measuring a response rate for each sent message variant. The measured response rates are used to create an optimal version of the message. In one embodiment, message variants may be created and tested in multiple rounds. | 10-11-2012 |
20120259621 | Translating Texts Between Languages - Methods and computer systems for translating sentences between languages from an intermediate language-independent semantic representation are provided. Based on a comprehensive understanding about languages and semantics, exhaustive linguistic descriptions are used to analyze sentences, build syntactic structures and language independent semantic structures and representations, and synthesize one or more sentences in a natural or artificial language. A computer system is also provided to analyze and synthesize various linguistic structures and perform translation of a wide spectrum of various sentence types. As a result, a generalized data structure, such as a semantic structure, is generated from a sentence of an input language and can be transformed into a natural sentence expressing its meaning correctly in an output language. The methods and systems can be applied to automated abstracting, machine translation, natural language processing, control systems, Internet information retrieval, etc. | 10-11-2012 |
20120265519 | SYSTEM AND METHOD FOR OBJECT DETECTION - A system and method for object detection is provided, which system and method combines parsing and classification technologies for extracting objects, e.g., events, entities or the like, from text. In exemplary embodiment, the output of a parsing technique is transformed into a model suitable as input for classification in order to provide event or entity detection results. | 10-18-2012 |
20120265520 | TEXT PROCESSOR AND METHOD OF TEXT PROCESSING - A text processor and a method of text processing comprises obtaining a plurality of word groups each comprising a sequence of words from a text, determining a frequency of occurrence of each of the word groups within a text corpus, by interrogating a database including the frequency information, and indicating word groups that have a frequency of occurrence that is below a threshold value. | 10-18-2012 |
20120265521 | Methods and systems relating to information extraction - The invention relates to information extraction systems having discriminative models which utilize hierarchical cluster trees and active learning to enhance training. | 10-18-2012 |
20120271624 | PROCESSING GEOGRAPHICAL LOCATION DATA IN A DOCUMENT - Techniques for processing geographical location data in a document comprise: obtaining geographical location data in the document; grading the geographical location data according to a predetermined condition to determine an associated relationship between the geographical location data; marking on an electronic map the associated relationship between the geographical location data; and presenting the marked electronic map. | 10-25-2012 |
20120271625 | MULTIMODAL NATURAL LANGUAGE QUERY SYSTEM FOR PROCESSING AND ANALYZING VOICE AND PROXIMITY BASED QUERIES - The present disclosure provides a natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries. The natural language query system includes a Web-enabled device including a speech input module for receiving a voice-based query in natural language form from a user and a location/proximity module for receiving location/proximity information from a location/proximity device. The query system also includes a speech conversion module for converting the voice-based query in natural language form to text in natural language form and a natural language processing module for converting the text in natural language form to text in searchable form. The query system further includes a semantic engine module for converting the text in searchable form to a formal database query and a database-look-up module for using the formal database query to obtain a result related to the voice-based query in natural language form from a database. | 10-25-2012 |
20120271626 | APPARATUS AND METHOD FOR LINGUISTIC SCORING - In embodiments of the invention, a system receives selections from a user based on a list of pre-defined monitoring categories and/or optionally receives custom category definitions from the user. The option for custom category definitions may be advantageous due to the flexibility provided to a system administrator or other user. In embodiments of the invention, the pre-defined and/or custom monitoring categories may be or include complex hierarchical behavior. Such an approach provides monitoring algorithms that can achieve improved accuracy compared to known methods. In embodiments of the invention, the order of computations used in resolving a monitoring category may be re-ordered, statically and/or dynamically, to improve the efficiency of monitoring operations. | 10-25-2012 |
20120271627 | CROSS-LANGUAGE TEXT CLASSIFICATION - Methods are described for performing classification (categorization) of text documents written in various languages. Language-independent semantic structures are constructed before classifying documents. These structures reflect lexical, morphological, syntactic, and semantic properties of documents. The methods suggested are able to perform cross-language text classification which is based on document properties reflecting their meaning. The methods are applicable to genre classification, topic detection, news analysis, authorship analysis, etc. | 10-25-2012 |
20120271628 | METHOD OF USING VISUAL SEPARATORS TO INDICATE ADDITIONAL CHARACTER COMBINATIONS ON A HANDHELD ELECTRONIC DEVICE AND ASSOCIATED APPARATUS - A method and associated apparatus for using visual separators to indicate additional character combination choices from a disambiguation function on a handheld electronic device. | 10-25-2012 |
20120278064 | SYSTEM AND METHOD FOR DETERMINING SENTIMENT FROM TEXT CONTENT - A system and method for determining sentiment from user-generated text content is provided. A sentiment score is determined for one or more terms in a user-generated text content. A sentiment value is determined for the text content that is based at least in part on the sentiment score for the one or more terms. | 11-01-2012 |
20120278065 | GENERATING SNIPPET FOR REVIEW ON THE INTERNET - A method and system for generating snippet for review on the Internet. The method includes the steps of: receiving a review and a set of feedbacks corresponding to the review, where the review includes a plurality of evaluating sentences that evaluates product features of a product; calculating support degrees of each of the plurality of evaluating sentences by using the set of feedbacks; extracting, by relying on calculated support degrees of each of the evaluating sentences, at least one of the evaluating sentences from the plurality of evaluating sentences; and designating extracted evaluating sentence as a snippet of the review; where at least one of the steps is carried out by using a computer device. | 11-01-2012 |
20120278066 | COMMUNICATION INTERFACE APPARATUS AND METHOD FOR MULTI-USER AND SYSTEM - A communication interface apparatus for a system and a plurality of users is provided. The communication interface apparatus for the system and the plurality of users includes a first process unit configured to receive voice information and face information from at least one user, and determine whether the received voice information is voice information of at least one registered user based on user models corresponding to the respective received voice information and face information; a second process unit configured to receive the face information, and determine whether the at least one user's attention is on the system based on the received face information; and a third process unit configured to receive the voice information, analyze the received voice information, and determine whether the received voice information is substantially meaningful to the system based on a dialog model that represents conversation flow on a situation basis. | 11-01-2012 |
20120284016 | TEXT MINING METHOD, TEXT MINING DEVICE AND TEXT MINING PROGRAM - Disclosed are a text mining method, device, and program capable of performing text mining with a specific topic as an object with high precision. An element identification unit calculates a feature degree, which is an index for indicating a degree that within a text set of interest, which is a set of text that is to be analyzed, an element of the text appears. An output unit identifies distinctive elements within the text set of interest on the basis of the calculated feature degree and outputs the identified elements. The element identification unit corrects the feature degree on the basis of a topic relatedness degree, which is a value indicating a degree related to a topic of analysis, which is a topic for which each text portion of the text being analyzed has been partitioned into predetermined units that are to be analyzed. | 11-08-2012 |
20120284017 | Systems, Methods, and Programs for Detecting Unauthorized Use of Text Based Communications - Systems, methods, and programs for generating an authorized profile for a text communication device or account, may sample a text communication generated by the text communication device or account during communication and may store the text sample. The systems, methods, and programs may extract a language pattern from the stored text sample and may create an authorized profile based on the language pattern. Systems, methods, and programs for detecting unauthorized use of a text communication device or account may sample a text communication generated by the device or account during communication, may extract a language pattern from the text sample, and may compare the extracted language pattern of the sample with an authorized user profile. | 11-08-2012 |
20120284018 | HANDHELD ELECTRONIC DEVICE AND METHOD EMPLOYING LOGICAL PROXIMITY OF CHARACTERS IN SPELL CHECKING - An improved handheld electronic device and associated method employing an improved spell checking routine enable proposed spelling corrections having a close logical proximity to an active input to be output at a position of preference for easy selection by the user. By way of example, a base character and the various accented forms thereof can be said to have a logical proximity to one another that is closer than their logical proximity to any character having a different base character, whether additionally having a diacritical element or not. | 11-08-2012 |
20120284019 | HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING IMPROVED DISAMBIGUATION WITH REDUCED DEGRADATION OF DEVICE PERFORMANCE - In view of the foregoing, an improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. Incoming data, such as the text of a message, can be scanned for proper nouns, for instance, since such proper nouns might not already be stored in memory and might be expected to be entered by the user when, for example, forwarding or responding to the message. A proper noun can be identified, for instance, on the basis that it begins with an upper case letter. The proper nouns can be stored, for example, in memory that may, by way of further example, be a temporary dictionary. | 11-08-2012 |
20120290288 | Parsing of text using linguistic and non-linguistic list properties - A system and method are disclosed for extracting information from text which can be performed without prior knowledge as to whether the text includes a list. The method applies parser rules to a sentence spanning lines of text to identify a set of candidate list items in the sentence. Each candidate list item is assigned a set of features including one or more non-linguistic feature and a linguistic feature. The linguistic feature defines a syntactic function of an element of the candidate list item that is able to be in a dependency relation with an element of an identified candidate list introducer in the same sentence. When two or more candidate list items are found with compatible sets of features, a list is generated which links these as list items of a common list introducer. Dependency relations are extracted between the list introducer and list items and information based on the extracted dependency relations is output. | 11-15-2012 |
20120290289 | METHOD AND APPARATUS FOR SUMMARIZING COMMUNICATIONS - A method, apparatus and computer program are provided for summarizing one or more communications. The method, apparatus and computer program process and/or facilitate a processing of one or more communications to generate at least one summary. The method, apparatus and computer program further cause, at least in part, a transformation of the at least one summary based, at least in part, on at least one narrative viewpoint. The method, apparatus and computer program further cause, at least in part, a presentation of the transformation. | 11-15-2012 |
20120290290 | Sentence Simplification for Spoken Language Understanding - Sentence simplification may be provided. A spoken phrase may be received and converted to a text phrase. An intent associated with the text phrase may be identified. The text phrase may then be reformatted according to the identified intent and a task may be performed according to the reformatted text phrase. | 11-15-2012 |
20120290291 | INPUT PROCESSING FOR CHARACTER MATCHING AND PREDICTED WORD MATCHING - A mobile computing device that operates a method that processes handwritten user input for character matching and predictive word matching. A user inputs handwritten input on a touch-sensitive display using, for example, a stylus. The method determines and displays a set of candidate character matches for the handwritten input. The user then selects a character from the candidate character matches. The method determines and displays a set of candidate predicted word matches based on the user selected character match. The user can then select to input a desired candidate predicted word match. | 11-15-2012 |
20120290292 | UNSTRUCTURED DATA SUPPORT WITH AUTOMATIC RULE GENERATION - A system to process unstructured data is provided. An example system to process unstructured data comprises a receiver to access a source of unstructured data, an entity type module to determine an entity type, a rules generator to automatically generate a linguistic rule based on the determined entity type, and an entity extractor to obtain an entity from the source of unstructured data, using the linguistic rule. The entity comprises an alpha-numeric string. | 11-15-2012 |
20120290293 | Exploiting Query Click Logs for Domain Detection in Spoken Language Understanding - Domain detection training in a spoken language understanding system may be provided. Log data associated with a search engine, each associated with a search query, may be received. A domain label for each search query may be identified and the domain label and link data may be provided to a training set for a spoken language understanding model. | 11-15-2012 |
20120296634 | SYSTEMS AND METHODS FOR CATEGORIZING AND MODERATING USER-GENERATED CONTENT IN AN ONLINE ENVIRONMENT - Exemplary embodiments provide systems, devices and methods for computer-based categorization and moderation of user-generated content for publication of the content in an online environment. Exemplary embodiments automatically determine a probability value indicating that the user-generated content is either a positive example or a negative example of one or more unsuitable categories. If the user-generated content is determined to be a positive example of any of the unsuitable categories to a predefined degree of certainty, exemplary embodiments may automatically exclude the content from publication in the online environment. | 11-22-2012 |
20120296635 | USER-MODIFIABLE WORD LATTICE DISPLAY FOR EDITING DOCUMENTS AND SEARCH QUERIES - An “Interactive Word Lattice” provides a user interface for interacting with and selecting user-modifiable paths through a lattice-based representation of alternative suggested text segments in response to a user's text segment input, such as phrases, sentences, paragraphs, entire documents, etc. More specifically, the user input is provided to a trained paraphrase generation model that returns a plurality of alternative text segments having the same or similar meaning as the original user input. An interactive graphical lattice-based representation of the alternative text segments is then presented to the user. One or more words of each alternative text segment represents a “node” of the lattice, while each connection between nodes represents a lattice “edge.” Both nodes and edges are user modifiable. Each possible path through the lattice corresponds to a different alternative text segment. Users select a path through the lattice to select an alternative text to the original input. | 11-22-2012 |
20120296636 | TAXONOMY AND APPLICATION OF LANGUAGE ANALYSIS AND PROCESSING - Words can be identified in text. Membership numerical values for the words can be determined in categories, or in communication types generated using those categories. The membership numerical values for the words can then be used to generate a signature. The signature can then be used to identify documents with a similar attitude. | 11-22-2012 |
20120296637 | METHOD AND APPARATUS FOR CALCULATING TOPICAL CATEGORIZATION OF ELECTRONIC DOCUMENTS IN A COLLECTION - A computer implemented method calculates topical categorization of electronic documents in a collection. A processor applies a metric to categorize semantic distance between two sections of a document or between two documents. The processor executes a topic algorithm using the categorization provided by the metric to determine topic boundaries. Topics are extracted based upon the topic boundaries; and the extracted topics are compared for similarity with topics in other documents for organizational and research purposes. | 11-22-2012 |
20120296638 | METHOD AND SYSTEM FOR QUICKLY RECOGNIZING AND RESPONDING TO USER INTENTS AND QUESTIONS FROM NATURAL LANGUAGE INPUT USING INTELLIGENT HIERARCHICAL PROCESSING AND PERSONALIZED ADAPTIVE SEMANTIC INTERFACE - In embodiments of the present invention, capabilities are described for understanding and responding to user intents and questions quickly, wherein the understanding is based on supervised system learning, intelligent layered semantic and syntactic information processing, and a personalized adaptive semantic interface. Supervised system learning creates a reference pattern set for the intent repository and possible question categories. Each layer in the layered processing increases the probability of the intent/question recognition. The personalized adaptive voice interface learns from the user's interactions over time by enriching the pattern sets and personal index for successfully resolved user intents and questions. Collectively, all these technologies improve the response time for correctly recognizing and responding to the user's intents and questions. | 11-22-2012 |
20120296639 | Verification of Extracted Data - Facts are extracted from speech and recorded in a document using codings. Each coding represents an extracted fact and includes a code and a datum. The code may represent a type of the extracted fact and the datum may represent a value of the extracted fact. The datum in a coding is rendered based on a specified feature of the coding. For example, the datum may be rendered as boldface text to indicate that the coding has been designated as an “allergy.” In this way, the specified feature of the coding (e.g., “allergy”-ness) is used to modify the manner in which the datum is rendered. A user inspects the rendering and provides, based on the rendering, an indication of whether the coding was accurately designated as having the specified feature. A record of the user's indication may be stored, such as within the coding itself. | 11-22-2012 |
20120303355 | Method and System for Text Message Normalization Based on Character Transformation and Web Data - A method for generating non-standard tokens that correspond to standard tokens used in speech synthesis systems has been developed. The method includes selecting a standard token from a plurality of standard tokens stored in memory, using a random field model to select a predetermined operation to perform on each character in the selected token, performing the selected operation on each character to generate an output token, and storing the output token in the memory in association with the selected token. The output token is different from each token in the plurality of standard tokens. | 11-29-2012 |
20120303356 | AUTOMATED SELF-SERVICE USER SUPPORT BASED ON ONTOLOGY ANALYSIS - A method for providing information to a user in response to a received user query. A natural language analysis generates substrings relevant to the user query. An ontology analysis outputs: terms of an ontology matching the relevant generated substrings; and relationships between the terms. A query analysis analyzes the user query regarding the outputted terms and relationships, including ascertaining whether the user query is more suitable for service than for an information search. If it is so ascertained, then service actions for the user to perform are identified to the user. If it is not so ascertained, then: the user query is refined based on the outputted terms and relationships; a search query is generated based on the refined user query, a search is initiated based on the search query, and results of the search are provided to the user. | 11-29-2012 |
20120303357 | SELF-LEARNING METHODS FOR AUTOMATICALLY GENERATING A SUMMARY OF A DOCUMENT, KNOWLEDGE EXTRACTION AND CONTEXTUAL MAPPING - Advanced Machine Learning or Unsupervised Machine Learning Techniques are provided that relate to Self-learning processes by which a machine generates a sensible automated summary, extracts knowledge, and extracts contextually related Topics along with the justification that explains “why they are related” automatically without any human intervention or guidance (backed ontologies) during the process. Such processes also relate to generating a 360-Degree Contextual Result (360-DCR) using Auto-summary, Knowledge Extraction and Contextual Mapping. | 11-29-2012 |
20120303358 | SEMANTIC TEXTUAL ANALYSIS - A method of comparing the semantic similarity of two different text phrases in which the grammatical structure of the two different text phrases is analysed and a keyword set for each of the different text phrases is derived. The semantic similarity of the phrases can be determined in accordance with the grammatical structure of the two different text phrases and the contents of the two keyword sets. | 11-29-2012 |
20120310627 | DOCUMENT CLASSIFICATION WITH WEIGHTED SUPERVISED N-GRAM EMBEDDING - Methods and systems for document classification include embedding n-grams from an input text in a latent space, embedding the input text in the latent space based on the embedded n-grams and weighting said n-grams according to spatial evidence of the respective n-grams in the input text, classifying the document along one or more axes, and adjusting weights used to weight the n-grams based on the output of the classifying step. | 12-06-2012 |
20120310628 | METHOD AND SYSTEM FOR PROVIDING ACCESS TO INFORMATION OF POTENTIAL INTEREST TO A USER - The present invention provides a method and system for providing access to information of potential interest to a user. Closed-caption information is analyzed to find related information on the Internet. User interactions with a TV which receives programming including closed-caption information are monitored to determine user interests or topics. | 12-06-2012 |
20120310629 | SYSTEMS AND METHODS FOR AUTOMATICALLY DETERMINING CULTURE-BASED BEHAVIOR IN CUSTOMER SERVICE INTERACTIONS - Systems and methods are provided to automatically determine culture-based behavioral tendencies and preferences of individuals in the context of customer service interactions. For example, systems and methods are provided to process natural language dialog input of an individual to detect linguistic features indicative of individualistic and collectivistic behavioral tendencies and predict whether such individual will be cooperative or uncooperative with automated customer service. | 12-06-2012 |
20120310630 | TOKENIZATION PLATFORM - A tokenization platform and method is described for accurately tokenizing character strings, including but not limited to non-delimited character strings of the type commonly used in Internet domain names and computer filenames, to accurately identify words and phrases occurring therein. In one embodiment, a phased tokenization approach is used in which the final phase is a lexical analysis-based tokenization using a dictionary. The dictionary may be advantageously created and updated based upon one or more query logs associated with respective information retrieval systems, thereby ensuring that the dictionary accurately reflects currently-used terminology and captures alternative spellings and presentations of words and phrases submitted by users. | 12-06-2012 |
20120310631 | HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT AND THAT EMPLOYS N-GRAM DATA TO LIMIT GENERATION OF LOW-PROBABILITY COMPOUND LANGUAGE SOLUTIONS - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to analyze the combinations of language objects in light of N-gram data stored on the device to avoid proposing low-probability compound language solutions. | 12-06-2012 |
20120310632 | COMPUTER SYSTEM WITH SECOND TRANSLATOR FOR VEHICLE PARTS - Described are computer-based methods and apparatuses, including computer program products, for automation of auditing claims. Data indicative of an insurance company name is received, the data comprising one or more words. The data is processed through one or more processing steps to generate processed data comprising one or more processed words. One or more candidate word strings are selected based on the one or more processed words. Matching information is associated with each of the one or more candidate word strings. Analysis information is generated for each of the one or more candidate word strings based on the associated matching information. An insurance company identifier is associated with received data based on the analysis information and one or more matching rules. | 12-06-2012 |
20120310633 | FILTERING DEVICE AND FILTERING METHOD - A filtering device includes: a table storage unit that stores an allowed word table in which a plurality of morphemes and the number of appearances thereof are associated with each other; a program stream acquiring unit that acquires a program stream generated according to a broadcasting code of ethics; a table update unit that extracts caption data or program information, which is a first text data item related to the content of a program, from the program stream when the acquired program stream includes the caption data or the program information, divides the extracted caption data; a data acquiring unit that acquires an arbitrary second text data item; and a data processing unit that divides the second text data item into morphemes, replaces a divided morpheme with a predetermined symbol when the divided morpheme has not been registered in the allowed word table. | 12-06-2012 |
20120316864 | READING ORDER DETERMINATION APPARATUS, METHOD, AND PROGRAM FOR DETERMINING READING ORDER OF CHARACTERS - A method and apparatus for determining a reading order of characters. The method includes preparing a list of character information, which is character information extracted from image data by character recognition processing and preparing a list of line information, which is made up of a line box surrounding a set of characters which are continuously aligned in the same direction in image data and an alignment direction of characters in the line box. In response to a request for adding character information to the list of character information, extracting a line box containing a character region of the character to be added, obtaining all character information having the character region contained in the concerned line box from the list of character information and rearranging according to the position with respect to the alignment direction of characters corresponding to the line box to determine a new reading order of characters. | 12-13-2012 |
20120316865 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - An information processing apparatus performs topic analysis on one or more collected documents to calculate a probability indicating the degree of fitness of each sentence constituting the collected document for each item of a local topic, performs linguistic analysis on the collected document to detect a unique expression pattern in each item of the local topic, sets topic usefulness for each sentence constituting the collected document on the basis of evaluation of the sentence by an evaluator, sets a total evaluation value with respect to each item of the local topic on the basis of the topic analysis result and the topic usefulness, selects an item of the local topic on the basis of the total evaluation values, and extracts an appropriate sentence for a unique expression pattern in the selected item of the local topic from the collected document as a profound text candidate. | 12-13-2012 |
20120316866 | SYSTEM AND METHOD OF PROVIDING A SPOKEN DIALOG INTERFACE TO A WEBSITE - Disclosed is a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method. | 12-13-2012 |
20120316867 | COMPUTER SYSTEM WITH SECOND TRANSLATOR FOR VEHICLE PARTS - Described are computer-based methods and apparatuses, including computer program products, for automation of auditing claims. Data indicative of an insurance company name is received, the data comprising one or more words. The data is processed through one or more processing steps to generate processed data comprising one or more processed words. One or more candidate word strings are selected based on the one or more processed words. Matching information is associated with each of the one or more candidate word strings. Analysis information is generated for each of the one or more candidate word strings based on the associated matching information. An insurance company identifier is associated with received data based on the analysis information and one or more matching rules. | 12-13-2012 |
20120323558 | METHOD AND APPARATUS FOR CREATING A PREDICTING MODEL - A method for creating a predictive model is disclosed herein, including the steps of determining trends and patterns in electronic data, using at least a first machine language algorithm, refining the determination of the algorithm, searching for social models that describe the identified trends and patterns using at least a second machine language algorithm, verifying causal links, constructing at least one model about human node behavior and interactions, utilizing the social models to do at least one of the following: validate hypotheses, predict future behavior, and examine hypothetical scenarios, and automatically updating predictions when new data is introduced. | 12-20-2012 |
20120323559 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - An apparatus is provided for determining a lyric importance level, comprising a memory and a processor executing instructions stored in the memory. The processor executes instructions stored in the memory to acquire lyric information, the lyric information identifying: lyrics of a song; and lyric location information indicating locations of the lyrics within the song. The processor further executes instructions stored in the memory to acquire section information, the section information identifying: sections of the song; section importance levels corresponding to the sections; and section location information indicating locations of the sections within the song. The processor still further executes instructions stored in the memory to identify, based on the lyric location information and the section location information, one or more sections corresponding to a subset of the lyrics; and determine, based on the section importance levels, a lyric importance level of the subset. | 12-20-2012 |
20120323560 | METHOD FOR SYMBOLIC CORRECTION IN HUMAN-MACHINE INTERFACES - Disclosed embodiments include methods and systems for symbolic correction in human-machine interfaces that comprise (a) implementing a language model; (b) implementing a hypothesis model; (c) implementing an error model; and (d) processing a symbolic input message based on weighted finite-state transducers to encode 1) a set of input hypotheses using the hypothesis model, 2) the language model, and 3) the error model to perform correction on the sequential pre-segmented symbolic input message in the human-machine interface. According to a particular embodiment, the processing step comprises a combination of the language model, the hypothesis model, and the error model performed without parsing by employing a composition operation between the transducers and a lowest cost path search, exact or approximate, on the composed transducer. | 12-20-2012 |
20120323561 | HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device. | 12-20-2012 |
20120323562 | METHOD AND SYSTEM FOR CONVERTING IMAGE TEXT DOCUMENTS IN BIT-MAPPED FORMATS TO SEARCHABLE TEXT AND FOR SEARCHING THE SEARCHABLE TEXT - A system and method for searching optical character recognition results of image text documents includes an image text transformer that linguistically analyzes the optical character recognition results within a context of multiple lexicons to form edited text results and creates a reflection repository having reflection files therein corresponding to the image documents from the optical character recognition results. A search engine searches the reflection files and a user device displays a first reflection file from the reflection files or a first image document from the image documents in response to searching. The files are displayed on a display associated with a user device. | 12-20-2012 |
20120323563 | GENERATING SNIPPET FOR REVIEW ON THE INTERNET - A method and system for generating snippet for review on the Internet. The method includes the steps of: receiving a review and a set of feedbacks corresponding to the review, where the review includes a plurality of evaluating sentences that evaluates product features of a product; calculating support degrees of each of the plurality of evaluating sentences by using the set of feedbacks; extracting, by relying on calculated support degrees of each of the evaluating sentences, at least one of the evaluating sentences from the plurality of evaluating sentences; and designating extracted evaluating sentence as a snippet of the review; where at least one of the steps is carried out by using a computer device. | 12-20-2012 |
20120323564 | PROGRAM SEARCH DEVICE AND PROGRAM SEARCH METHOD - A program search device includes: a table storage unit that stores an allowed word table; a program stream acquiring unit that acquires a program stream generated according to a broadcasting code of ethics; a table update unit that extracts caption data or program information, which is a first text data item related to the content of a program; a program storage unit that stores a program included in the acquired program stream; a data acquiring unit that acquires a second text data item; a data processing unit that divides the second text data item into morphemes, replaces the divided morpheme with a predetermined symbol; an index giving unit that gives a set of the recombined third text data item; and a program extracting unit that extracts the program stored in the program storage unit. | 12-20-2012 |
20120330647 | HIERARCHICAL MODELS FOR LANGUAGE MODELING - The described implementations relate to natural language processing, and more particularly to training a language prior model using a model structure. The language prior model can be trained using parameterized representations of lexical structures such as training sentences, as well as parameterized representations of lexical units such as words or n-grams. During training, the parameterized representations of the lexical structures and the lexical units can be adjusted using the model structure. When the language prior model is trained, the parameterized representations of the lexical structures can reflect how the lexical units were used in the lexical structures. | 12-27-2012 |
20120330648 | CONTEXT-BASED DISAMBIGUATION OF ACRONYMS AND ABBREVIATIONS - Context-based disambiguation of acronyms and/or abbreviations may determine a target abbreviation and one or more keywords appearing in context with the target abbreviation in a received passage, the target abbreviation representing a shortened form of one or more words. A contextual search query including the target abbreviation and said one or more keywords may be generated. A pseudo document index may be searched for one or more expansions of the target abbreviation by invoking the contextual search query, the pseudo document index containing an index of one or more pseudo documents, associated one or more abbreviations and associated context keywords. One or more pseudo documents associated with the target abbreviation may be returned based on the searching of the pseudo document index. | 12-27-2012 |
20120330649 | SYSTEMS AND METHODS FOR EXTRACTING PATTERNS FROM GRAPH AND UNSTRUCTURED DATA - A computing system receives input data having both graph and unstructured data and computes a current log likelihood of the input data. The computing system compares the current log likelihood with a previous log likelihood of the input data. If the current log likelihood is larger than the previous log likelihood, the computing system updates topic modeling parameters, community modeling parameters, and the link generation parameter until the computing system obtains a maximal value of the log likelihood of the input data. Then, the computing system creates a graph indicating topic similarity between the input data based on the topic modeling parameters, creates another graph indicating community similarity between entities associated with the input data based on the community modeling parameters, and predicts a link existence between input data or entities based on the link generation parameter, the topic modeling parameter and the community modeling parameter. | 12-27-2012 |
20130006608 | Generating Complex Event Processing Rules - Techniques for generating complex event processing rules in a controlled natural language are provided. The techniques include obtaining one or more vocabularies that encompass a set of one or more noun and verb concepts, dynamically building an inheritance hierarchy of one or more named vocabulary concepts from the one or more vocabularies, parsing a controlled natural language input textual statement by using one or more names and the inheritance hierarchy to identify one or more temporal concepts and one or more complex event processing concepts, and converting the controlled natural language input textual statement to a complex event processing language statement by generating a representation of a lexical structure of the controlled natural language input textual statement that contains a reference to each identified temporal and complex event processing concept. | 01-03-2013 |
20130006609 | METHOD, SYSTEM AND PROGRAM STORAGE DEVICE FOR AUTOMATIC INCREMENTAL LEARNING OF PROGRAMMING LANGUAGE GRAMMAR - The embodiments provide for automatic incremental learning of programming language grammar. A corpus (i.e., a text file of software code written in a particular programming language) is parsed based on a set of grammar rules. An unparsed statement from the corpus is identified along with a section thereof, which did not match any of the grammar rules in the set. A subset of the set of grammar rules at fault for the parsing failure is identified. Groups of new grammar rules are developed such that each group comprises at least one new grammar rule, such that each group can parse the unparsed statement, and such that each new grammar rule is a modification of grammar rule(s) in the subset. One specific group can then be selected for possible incorporation into the set of grammar rules. Optionally, before a specific group is selected, the groups can be heuristically pruned and/or ranked. | 01-03-2013 |
20130006610 | SYSTEMS AND METHODS FOR PROCESSING DATA - A method for processing at least partially unstructured data is provided. The method includes receiving, at a data processing tool, at least partially unstructured data from at least one data source, and processing the at least partially unstructured data to generate at least partially structured data that includes tagged data, wherein processing the at least partially unstructured data includes at least one of processing the at least partially unstructured data using an associative memory application, and processing the at least partially unstructured data using a regular expression processing program. The method further includes transmitting the at least partially structured data to a main application, and incorporating the at least partially structured data into the main application based at least in part on the tagged data, wherein incorporating the at least partially structured data includes at least one of including and excluding data based on the existence, content and/or type of a tag. | 01-03-2013 |
20130006611 | METHOD AND SYSTEM FOR EXTRACTING SHADOW ENTITIES FROM EMAILS - One embodiment provides a system for extracting shadow entities from emails. During operation, the system receives a number of document corpora. The system then calculates word-collocation statistics associated with different n-gram sizes for the document corpora. Next, the system receives an email and identifies shadow entities in the email based on the calculated word-collocation statistics for the document corpora. | 01-03-2013 |
20130006612 | TRAINING ACOUSTIC MODELS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models. Speech data and data identifying a transcription for the speech data are received. A phonetic representation for the transcription is accessed. Training sequences are identified for a particular phone in the phonetic representation. Each of the training sequences includes a different set of contextual phones surrounding the particular phone. A partitioning key is identified based on a sequence of phones that occurs in each of the training sequences. A processing module to which the identified partitioning key is assigned is selected. Data identifying the training sequences and a portion of the speech data are transmitted to the selected processing module. | 01-03-2013 |
20130006613 | AUTOMATIC CONTEXT SENSITIVE LANGUAGE CORRECTION USING AN INTERNET CORPUS PARTICULARLY FOR SMALL KEYBOARD DEVICES - A computer-assisted language correction system particularly suitable for use with small keyboard devices including spelling correction functionality, misused word correction functionality and grammar correction functionality utilizing contextual feature-sequence functionality employing an internet corpus. | 01-03-2013 |
20130006614 | AUTOMATION OF AUDITING CLAIMS - Described are computer-based methods and apparatuses, including computer program products, for automation of auditing claims. A data file is received comprising one or more auditable items, each auditable item comprising a word string having one or more words. Each word string for each auditable item is translated using one or more translation steps into a translated item description. Each translated item description is compared to a plurality of terms to generate matching information. Each translated item description is associated with an item identifier based on the matching information. Each auditable item is accepted or rejected based on the item identifier and one or more rules associated with the data file. | 01-03-2013 |
20130006615 | MOBILE WIRELESS COMMUNICATIONS DEVICE PROVIDING ENHANCED PREDICTIVE WORD ENTRY AND RELATED METHODS - A mobile wireless communications device may include a portable, handheld housing, and a display and keyboard carried by the portable, handheld housing. The keyboard may include a plurality of multi-symbol keys each having indicia of a plurality of respective symbols thereon, and a predetermined multi-symbol key may have a comma/apostrophe symbol and at least one other symbol thereon. A controller may be used for generating, in response to an ambiguous input including an ambiguous punctuations input, a menu of possible desired words including at least one word with a comma and at least one word with an apostrophe and at least one other character additional to the ambiguous input. The device further includes a multiple-axis input device that is operable to provide movement inputs to move to, for example, a variant output and is further operable to provide a selection input as to, for example, the variant output. | 01-03-2013 |
20130013289 | Method of Extracting Experience Sentence and Classifying Verb in Blog - Provided are a method of extracting an experience-revealing sentence from a blog document and a method of classifying verbs into activity verbs and state verbs in a sentence recorded in a blog document. The method of extracting an experience sentence from a blog document includes generating a sentence classifier using a machine learning algorithm based on grammatical features, and classifying experience sentences that represent actual experiences of users and non-experience sentences that represent no experience in the blog document using the sentence classifier. By classifying sentences in a blog document into experience sentences and non-experience sentences, it is possible to extract experiences that a user has actually had or that have actually happened to a user from the document. | 01-10-2013 |
20130013290 | LANGUAGE PROCESSOR - A referring expression processor which uses a probabilistic model and in which referring expressions including descriptive, anaphoric and deictic expressions are understood and generated in the course of dialogue is provided. The referring expression processor according to the present invention includes: a referring expression processing section which performs at least one of understanding and generation of referring expressions using a probabilistic model constructed with a referring expression Bayesian network, each referring expression Bayesian network representing relationships between a reference domain (D) which is a set of possible referents, a referent (X) in the reference domain, a concept (C) concerning the referent and a word (W) which represents the concept; and a memory which stores data necessary for constructing the referring expression Bayesian network. | 01-10-2013 |
20130013291 | SYSTEMS AND METHODS FOR SENTENCE COMPARISON AND SENTENCE-BASED SEARCH - Systems and methods for performing logical semantic sentence comparisons and sentence-based searches. Training is performed by running an NLP pipeline on unstructured text comprising sentences and creating sentence matrix representations on the unstructured text; storing the matrix representations in an indexed database; combining the stored matrix representations; running an SVD on the combined matrix; storing the SVD components in the indexed database; reiterating through the output of the NLP pipeline the sentences of the unstructured training text to form a low-dimensional matrix conversion for each sentence for storage in the database based on the calculated SVD components. Subsequent query statements are run through the same process and converted into low-dimensional matrix representations using the SVD components from training; the low-dimensionality query matrix is compared to the stored low-dimensional matrices to determine the closest relevant documents, which are returned to the user. | 01-10-2013 |
20130013292 | HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A method for disambiguating user inputs through a handheld mobile device is disclosed. According to the method, an ambiguous input sequence is received from an input device. A list including one or more disambiguated character sequences is generated on a display device corresponding to the ambiguous input sequence. An additional input is received from the input device. A processor determines that the additional input is an operational input associated with one of a plurality of operations on the disambiguated character sequences. The processor processes the disambiguated character sequences according to the one of the plurality of operations associated with the operational input. | 01-10-2013 |
20130013293 | HANDHELD ELECTRONIC DEVICE PROVIDING A LEARNING FUNCTION TO FACILITATE CORRECTION OF ERRONEOUS TEXT ENTRY, AND ASSOCIATED METHOD - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In addition to identifying and outputting representations of language objects that are stored in the memory and that correspond with a text input, the device provides a learning function which facilitates providing proposed corrected output by the device in certain circumstances of erroneous input. | 01-10-2013 |
20130013294 | HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING QUICK TEXT ENTRY IN A MESSAGE - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message. | 01-10-2013 |
20130013295 | METHOD AND SYSTEM FOR PROVIDING INITIAL PATENT CLAIM ANALYSIS - Information relating to intellectual property, across one or more intellectual property applications having various types of intellectual property data, can be provided and/or accessed in an integrated manner. Commonality(ies) are determined between disparate intellectual property applications, that may be applied by the intellectual property applications in accessing the intellectual property information. Responsive to a user request, which may include a specified commonality, stored information regarding the disparate data corresponding to the disparate intellectual property applications is retrieved. The commonality is utilized in bridging the gap to the intellectual property data for the disparate intellectual property applications. The bridging is provided by use of a commonality and by an IP engine. | 01-10-2013 |
20130013296 | HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING IMPROVED DISAMBIGUATION WITH REDUCED DEGRADATION OF DEVICE PERFORMANCE - In view of the foregoing, an improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. Incoming data, such as the text of a message, can be scanned for proper nouns, for instance, since such proper nouns might not already be stored in memory and might be expected to be entered by the user when, for example, forwarding or responding to the message. A proper noun can be identified, for instance, on the basis that it begins with an upper case letter. The proper nouns can be stored, for example, in memory that may, by way of further example, be a temporary dictionary. | 01-10-2013 |
20130018649 | System and a Method for Generating Semantically Similar Sentences for Building a Robust SLM - A system and method are described for generating semantically similar sentences for a statistical language model. A semantic class generator determines for each word in an input utterance a set of corresponding semantically similar words. A sentence generator computes a set of candidate sentences each containing at most one member from each set of semantically similar words. A sentence verifier grammatically tests each candidate sentence to determine a set of grammatically correct sentences semantically similar to the input utterance. Also note that the generated semantically similar sentences are not restricted to be selected from an existing sentence database. | 01-17-2013 |
20130018650 | Selection of Language Model Training Data - An intelligent selection system selects language model training data to obtain in-domain training datasets. The selection is accomplished by estimating a cross-entropy difference for each candidate text segment from a generic language dataset. The cross-entropy difference is a difference between the cross-entropy of the text segment according to the in-domain language model and the cross-entropy of the text segment according to a language model trained on a random sample of the data source from which the text segment is drawn. If the difference satisfies a threshold condition, the text segment is added as an in-domain text segment to a training dataset. | 01-17-2013 |
20130018651 | PROVISION OF USER INPUT IN SYSTEMS FOR JOINTLY DISCOVERING TOPICS AND SENTIMENTS - A generative model is used to develop at least one topic model and at least one sentiment model for a body of text. The at least one topic model is displayed such that, in response, a user may provide user input indicating modifications to the at least one topic model. Based on the received user input, the generative model is used to provide at least one updated topic model and at least one updated sentiment model based on the user input. Thereafter, the at least one updated topic model may again be displayed in order to solicit further user input, which further input is then used to once again update the models. The at least one updated topic model and the at least one updated sentiment model may be employed to analyze target text in order to identify topics and associated sentiments therein. | 01-17-2013 |
20130018652 | EVIDENCE DIFFUSION AMONG CANDIDATE ANSWERS DURING QUESTION ANSWERING - Diffusing evidence among candidate answers during question answering may identify a relationship between a first candidate answer and a second candidate answer, wherein the candidate answers are generated by a question-answering computer process, the candidate answers have associated supporting evidence, and the candidate answers have associated confidence scores. All or some of the evidence may be transferred from the first candidate answer to the second candidate answer based on the identified relationship. A new confidence score may be computed for the second candidate answer based on the transferred evidence. | 01-17-2013 |
20130018653 | HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF COMPOUND TEXT INPUT EMPLOYING DIFFERENT GROUPINGS OF DATA SOURCES TO DISAMBIGUATE DIFFERENT PARTS OF INPUT - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to generate compound language solutions by employing different groupings of data sources to generate different portions of the compound language solutions. | 01-17-2013 |
20130024184 | DATA PROCESSING SYSTEM AND METHOD FOR ASSESSING QUALITY OF A TRANSLATION - The invention provides a data processing system and method for analysing text. The invention uses statistical text classification techniques to assist with the quality assurance of translated texts by using a one pass analysis technique and calculating and ranking probed texts with a dissimilarity score. The ranked items are used to direct, inform, guide and assist human reviewers, auditors, proof-readers, post-editors and evaluators of the accuracy of the translation. The invention provides a significant time saving and improved accuracy in assessing a document's adherence to an enterprise's corporate messaging and authoring standards and provides for a level of automated quality assurance within automated translation workflows. | 01-24-2013 |
20130024185 | Automatic Dynamic Contextual Date Entry Completion - A method performed in a computer device having associated therewith a plurality of unstructured documents having words therein, the method involves accessing at least some of the plurality of unstructured documents, extracting a multiset of words, forming a matrix from the documents in which each word in the multiset is represented in a column and each document from which the words came is represented in a row, treating each document as a vector in a multidimensional Euclidean space, uniquely pairing the unique documents, measuring the similarity between the pairs as a cosine of the angle between vectors, comparing the cosines to a specified threshold to determine relatedness among the documents, and based upon the relatedness, when an input is received by the computer device representing a string of a threshold number of characters, the computer device will provide at least one word that would complete the character string. | 01-24-2013 |
20130024186 | Deep Model Statistics Method for Machine Translation - In one embodiment, the invention provides a method for machine translation of a source document in an input language to a target document in an output language, comprising generating translation options corresponding to at least portions of each sentence in the input language; and selecting a translation option for the sentence based on statistics associated with the translation options. | 01-24-2013 |
20130030792 | Customization of a Natural Language Processing Engine - A method, an apparatus and an article of manufacture for customizing a natural language processing engine. The method includes enabling selection of one or more parameters of a desired natural language processing task, the one or more parameters intended for use by a trained and an untrained user, mapping the one or more selected parameters to a collection of one or more intervals of an input parameter to an optimization algorithm, and applying the optimization algorithm with the collection of one or more intervals of an input parameter to a model used by a natural language processing engine to produce a customized model. | 01-31-2013 |
20130030793 | LINGUISTIC ERROR DETECTION - Potential linguistic errors within a sequence of words of a sentence are identified based on analysis of a configurable sliding window. The analysis is performed based on an assumption that if a sequence of words occurs frequently enough within a large, well-formed corpus, its joint probability for occurring in a sentence is very likely to be greater than the same words randomly ordered. | 01-31-2013 |
20130030794 | APPARATUS AND METHOD FOR CLUSTERING SPEAKERS, AND A NON-TRANSITORY COMPUTER READABLE MEDIUM THEREOF - According to one embodiment, a speaker clustering apparatus includes a clustering unit, an extraction unit, and an error detection unit. The clustering unit is configured to extract acoustic features for speakers from an acoustic signal, and to cluster utterances included in the acoustic signal into the speakers by using the acoustic features. The extraction unit is configured to acquire character strings representing contents of the utterances, and to extract linguistic features of the speakers by using the character strings. The error detection unit is configured to decide that, when one of the character strings does not fit with a linguistic feature of a speaker into which an utterance of the one is clustered, the utterance is erroneously clustered by the clustering unit. | 01-31-2013 |
20130035928 | TxtAnalizer - Text to motion pictures software program working on sentences that can have visual interpretation: you type in the sentence, the program understands the meaning of the sentence (that means one can say the same thing in other words) and starts an external process, a video file. The video file closes itself after the video finishes and the program is ready to process the next sentence. | 02-07-2013 |
20130035929 | INFORMATION PROCESSING APPARATUS AND METHOD - According to one embodiment, an information processing apparatus includes an acquisition unit, an analysis unit, and a generation unit. The acquisition unit is configured to acquire a status of a user while the user is working with a resource. The analysis unit is configured to acquire text information included in the resource by analyzing the resource. The generation unit is configured to generate at least one work label from the status of the user and the text information, and to generate a work history including a part of the text information, to which the work label is assigned. | 02-07-2013 |
20130035930 | PREDICTING LEXICAL ANSWER TYPES IN OPEN DOMAIN QUESTION AND ANSWERING (QA) SYSTEMS - In an automated Question Answer (QA) system architecture for automatic open-domain Question Answering, a system, method and computer program product for predicting the Lexical Answer Type (LAT) of a question. The approach is completely unsupervised and is based on a large-scale lexical knowledge base automatically extracted from a Web corpus. This approach for predicting the LAT can be implemented as a specific subtask of a QA process, and/or used for general purpose knowledge acquisition tasks such as frame induction from text. | 02-07-2013 |
20130035931 | PREDICTING LEXICAL ANSWER TYPES IN OPEN DOMAIN QUESTION AND ANSWERING (QA) SYSTEMS - In an automated Question Answer (QA) system architecture for automatic open-domain Question Answering, a system, method and computer program product for predicting the Lexical Answer Type (LAT) of a question. The approach is completely unsupervised and is based on a large-scale lexical knowledge base automatically extracted from a Web corpus. This approach for predicting the LAT can be implemented as a specific subtask of a QA process, and/or used for general purpose knowledge acquisition tasks such as frame induction from text. | 02-07-2013 |
20130035932 | SYSTEM AND METHOD OF GENERATING RESPONSES TO TEXT-BASED MESSAGES - A system to generate a response to a text-based natural language message includes a user interface, processing device, and a computer-readable storage medium storing executable instructions to generate the response to the text-based natural language message. The instructions and a method for generating the response include identifying a sentence in the text-based natural language message, identifying an input clause in the sentence, and parsing the input clause, thereby defining a relationship between words in the input clause. The instructions and method also include assigning a semantic tag to the parsed input clause, comparing the input clause to a previously received clause, the previously received clause being correlated with a previously generated response clause, and generating an output response message derived from the previously generated response clause. | 02-07-2013 |
20130041653 | Coefficients Attribution for Different Objects Based on Natural Language Processing - In one embodiment, a system includes one or more computing systems that implement a social networking environment and is operable to parse users' actions that include free form text to determine and store objects and affinities contained in the text string through natural-language processing. The method comprises accessing a text string, identifying objects and affinity declarations via natural-language processing, assessing the combination of objects and context data to determine an instance of a broader concept, and determining an affinity coefficient through a natural-language processing dictionary. Once a database of stored instances and affinities has been generated and stored, it may be leveraged to push suggestions to members of the social network to enhance their social networking experience. | 02-14-2013 |
20130041654 | AUTOMATED SENTENCE PLANNING IN A TASK CLASSIFICATION SYSTEM - Disclosed is a task classification system that interacts with a user. The task classification system may include a recognizer that may recognize symbols in the user's input communication, and a natural language understanding unit that may determine whether the user's input communication can be understood. If the user's input communication can be understood, the natural language understanding unit may generate understanding data. The system may also include a communicative goal generator that may generate communicative goals based on the symbols recognized by the recognizer and understanding data from the natural language understanding unit. The generated communicative goals may be related to information needed to be obtained from the user. The system may further include a sentence planning unit that may automatically plan one or more sentences based on the communicative goals generated by the communicative goal generator, with at least one of the sentence plans being output to the user. | 02-14-2013 |
20130041655 | Systems and Methods for Word Offensiveness Detection and Processing Using Weighted Dictionaries and Normalization - Computer-implemented systems and methods are provided for identifying language that would be considered obscene or otherwise offensive to a user or proprietor of a system. A plurality of offensive words are received, where each offensive word is associated with a severity score identifying the offensiveness of that word. A string of words is received. A distance between a candidate word and each offensive word in the plurality of offensive words is calculated, and a plurality of offensiveness scores for the candidate word are calculated, each offensiveness score based on the calculated distance between the candidate word and the offensive word and the severity score of the offensive word. A determination is made as to whether the candidate word is an offender word, where the candidate word is deemed to be an offender word when the highest offensiveness score in the plurality of offensiveness scores exceeds an offensiveness threshold value. | 02-14-2013 |
20130046531 | PSYCHO-LINGUISTIC STATISTICAL DECEPTION DETECTION FROM TEXT CONTENT - An apparatus and method for determining whether a text is deceptive may comprise analyzing a body of textual content known to be one of text containing true content and text containing deceptive content; identifying psycho-linguistic cues that are indicative of a text being deceptive; statistically analyzing, via a computing device, a given text based upon the psycho-linguistic cues to determine if the text is deceptive. The apparatus and method may further comprise weighting the psycho-linguistic cues and statistically analyzing based on the weighted psycho-linguistic cues. The statistically analyzing step may be performed using one of a cue matching analysis, a weighted cue matching analysis, a Markov chain analysis, and a sequential probability ratio testing binary hypothesis analysis. The psycho-linguistic cues may be separated into categories, including increasing trend cues and decreasing trend cues and analyzed according to presence in a category from within the categories. | 02-21-2013 |
20130046532 | SYSTEM AND METHOD FOR PROVIDING DEFINITIONS - A system and method for providing definitions is described. A phrase to be defined is received. One or more documents, which each contain at least one definition, are determined. The phrase is matched to at least one of the definitions. One or more definitions for the phrase are presented. | 02-21-2013 |
20130054227 | Phonetic Symbol System - A phonetic symbol system formed by phonetic symbols using letters of English Alphabet is described. The cases or the styles of the letters do not affect the sounds of the phonetic symbols. The phonetic symbols are systematically and logically defined. The phonetic symbol system can be used where language is involved. The phonetic symbol system provides convenient ways to represent languages. In some embodiments, the phonetic symbol system provides ways to represent English language. | 02-28-2013 |
20130054228 | SYSTEM AND METHOD FOR PROCESSING MULTI-MODAL DEVICE INTERACTIONS IN A NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT - A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction. | 02-28-2013 |
20130060560 | SERVER-BASED SPELL CHECKING - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for server-based spell check. One aspect of the subject matter described in this specification can be embodied in methods performed by a server. The methods include the actions of receiving a request to spell check text; dividing the text into multiple segments, each segment including no more than a predetermined number of terms; providing each segment to a spell checker programmed to spell check an input including no more than the predetermined number of terms; receiving, from the spell checker, one or more spelling correction suggestions, each spelling correction suggestion corresponding to a term in a segment, the term being designated as misspelled by the spell checker; and assembling the received one or more spelling correction suggestions into a response to the request to spell check the text. | 03-07-2013 |
20130060561 | Encoding and Decoding of Small Amounts of Text - Text is encoded using a predetermined dictionary not unique to the encoded text to substitute codes for words and phrases thereby obviating transmission of the dictionary along with transmitted encoded text. The codes of the dictionary are made of one or more text characters such that the message, once encoded, continues to be a legitimate text message and can travel through any data transport medium through which a conventional text message can travel. Non-word characters delimit codes and unencoded words in an encoded message. Any phrase that can be confused with a code is flagged to indicate that it is not a code. | 03-07-2013 |
20130060562 | INFORMATION PROCESSING APPRATUS, NATURAL LANGUAGE ANALYSIS METHOD, PROGRAM AND RECORDING MEDIUM - An apparatus and method for calculating a score of matching a sentence with a query pattern having a dependency structure. The apparatus includes: an input unit acquiring an analysis target sentence, a query pattern and an index value indexing how a linguistic unit in the sentence tends to modify another; and a score calculation unit calculating a matching score indexing the degree of matching of the sentence with the query pattern. The matching score is represented by a function having an index value with which a dependency relation included in the query pattern is associated. The score is calculated by attempting association between a substructure of the query pattern and a range in the sentence and by performing recursive calculation in the substructure and the range while storing partial calculation result of the function in a memory area for reuse. | 03-07-2013 |
20130060563 | HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING IMPROVED DISAMBIGUATION - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message. | 03-07-2013 |
20130066625 | TEXT SEGMENTATION AND LABEL ASSIGNMENT WITH USER INTERACTION BY MEANS OF TOPIC SPECIFIC LANGUAGE MODELS AND TOPIC-SPECIFIC LABEL STATISTICS - The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user being capable to select alternative segmentations and alternative labels as well as to enter a user defined segmentation and user defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated incorporating the re-segmentation and re-labeling of successive parts of the document or the entire document. | 03-14-2013 |
20130073277 | METHODS AND SYSTEMS FOR COMPILING COMMUNICATION FRAGMENTS AND CREATING EFFECTIVE COMMUNICATION - Methods and systems compile communication fragments based on user-chosen variables to create a communication. The communication, which can be immediately displayed to the user for approval, may be manipulated in real time by selecting among a collection of variables to arrive at the final communication. An exemplary system gathers personal information and profile information from users of the communication service. This information is used when compiling a communication to allow for more customized and personalized communications. Information is used by an empirical database to gain intelligence on which communication has the highest likelihood of satisfying a receiver for a particular communication type. These communications, with the highest likelihood of success, are recommended by the communication service to potential senders of communication. | 03-21-2013 |
20130073278 | METHODS AND SYSTEMS FOR COMPILING COMMUNICATION FRAGMENTS AND CREATING EFFECTIVE COMMUNICATION - Methods and systems for forming a communication. At least one variable identifying an area of communication is received. Then, information associated with one or more users is received. A plurality of variables, each variable relevant to forming a communication within a communication category are received. A communication structure is generated based on at least two of the received variable identifying the area of communication, the received user information or the received plurality of variables. Communication fragments are identified based on the generated communication structure. Communication fragments are selected from those identified. A communication is formed based upon the selected communication fragments. Then, the formed communication is outputted. | 03-21-2013 |
20130073279 | METHODS AND SYSTEMS FOR COMPILING COMMUNICATION FRAGMENTS AND CREATING EFFECTIVE COMMUNICATION - A method compiles communication fragments whereby the user is required to choose among variables to create a communication. The communication, which can be immediately displayed to the user for approval, may be manipulated in real time by selecting among a collection of variables to arrive at the final communication. An exemplary system gathers personal information and profile information from users of the communication service. This information is used when compiling a communication to allow for more customized and personalized communications. Information is used by an empirical database to gain intelligence on which communication has the highest likelihood of satisfying a receiver for a particular communication type. These communications, with the highest likelihood of success, are recommended by the communication service to potential senders of communication. | 03-21-2013 |
20130073280 | DYNAMIC SENTENCE FORMATION FROM STRUCTURED OBJECTS AND ACTIONS IN A SOCIAL NETWORKING SYSTEM - A social networking system includes a mechanism for integrating user actions on objects outside of the social networking system in the social graph. External system operators include widgets that, when executed by user devices, record user interactions that correspond to a defined structure of actions and objects. Third party operators utilize a tool provided by the social networking system to define the structure of actions and objects, verb tenses of action types, and noun forms of object types. External actions are recorded by the social networking system for publishing to the social graph in dynamically generated sentences formed using the structure of the actions and objects. | 03-21-2013 |
20130080149 | SYSTEM AND METHOD FOR EXTRACTING CATEGORIES OF DATA - Lines of data that are from a historical document, such as a digitized city directory, have information extracted and stored in searchable data fields. Words and phrases within the lines of data are identified and tagged. Rendering rules are applied to the tagged words and phrases to extract names, addresses, occupations, spouse information and other data, and store that data in the searchable fields. | 03-28-2013 |
20130080150 | Automatic Semantic Evaluation of Speech Recognition Results - A semantic error rate calculation may be provided. After receiving a spoken query from a user, the spoken query may be converted to text according to a first speech recognition hypothesis. A plurality of results associated with the converted query may be received and compared to a second plurality of results associated with the converted query. | 03-28-2013 |
20130080151 | Systems and Methods for Teaching Phonemic Awareness - A system to teach phonemic awareness uses a plurality of phonemes and a plurality of graphemes. Each phoneme is a unique sound and an indivisible unit of sound in a spoken language, and each grapheme is a written representation of one of the plurality of phonemes. A plurality of distinct graphical images and a plurality of unique names are provided where each unique name is associated with one of the graphical images and represents a grouping of graphemes selected from the plurality of graphemes. The system uses a plurality of sets of display pieces having a plurality of individual display pieces. Each individual display piece includes at least a portion of one of the graphical images and the graphemes from the grouping of graphemes constituting the associated unique name. A predefined instructional environment defines a predefined spatial context and predefined rules governing the acquisition and utilization of individual display pieces. | 03-28-2013 |
20130080152 | LINGUISTICALLY-ADAPTED STRUCTURAL QUERY ANNOTATION - A system and method for natural language processing of queries are provided. A lexicon includes text elements that are recognized as being a proper noun when capitalized. A natural language query includes a sequence of text elements including words. The query is processed. The processing includes a preprocessing step, in which part of speech features are assigned to the text elements in the query. This includes identifying, from a lexicon, a text element in the query which starts with a lowercase letter and assigning recapitalization information to the text element in the query, based on the lexicon. This information includes a part of speech feature of the capitalized form of the text element. Then parts of speech for the text elements in the query are disambiguated, which includes applying rules for recapitalizing text elements based on the recapitalization information. | 03-28-2013 |
20130080153 | INFORMATION PROCESSING APPARATUS, NON-TRANSITORY COMPUTER READABLE MEDIUM STORING INFORMATION PROCESSING PROGRAM, AND INFORMATION PROCESSING METHOD - An information processing apparatus includes a receiving unit that receives character sequences, a sorting unit that sorts the character sequences received by the receiving unit into known words and unknown words, and a detecting unit that detects character sequences sorted as unknown words by the sorting unit as incorrect words and detects a third character sequence between a first character sequence and a second character sequence, which have been sorted as unknown words by the sorting unit, as incorrect words when the third character sequence includes words sorted as known words by the sorting unit and the number of the known words is less than or equal to a predetermined number. | 03-28-2013 |
20130080154 | NETWORK BASED RESTORATIVE JUSTICE - Systems and methods herein provide for resolution between at least two parties through a network. In one embodiment, a system includes an interface operable to communicatively couple to the network. The system also includes a processor operable to establish secure communications between at least two client terminals and a facilitator terminal through the network. The processor provides dialog interfaces to the client terminals and to the facilitator terminal, receives dialog of the parties from the client terminals via their respective dialog interfaces, and provides the dialog of the parties to the facilitator terminal. The processor receives dialog from the facilitator terminal via the facilitator's dialog interface to manage the dialog between the parties, generates a file detailing an agreement between the parties based on the dialog between the parties, transfers the file to the parties via their respective dialog interfaces, and stores the file for subsequent access by the parties. | 03-28-2013 |
20130085745 | SEMANTIC-BASED APPROACH FOR IDENTIFYING TOPICS IN A CORPUS OF TEXT-BASED ITEMS - A method of identifying topics in a corpus that includes a plurality of text-based items begins by extracting keytext from each of the plurality of text-based items, resulting in sets of keytext. The method continues by processing the keytext sets to generate a respective semantic footprint for each of the text-based items, resulting in a plurality of semantic footprints. The semantic footprints are used to calculate similarity values for the text-based items, wherein the similarity values indicate commonality between pairs of the text-based items. The method continues by clustering the text-based items into a number of topic groups, wherein the clustering is influenced by the similarity values, and by generating a topic heading for each of the number of topic groups, resulting in a number of topic headings. Next, the text-based items are grouped into accessible topic groups associated with the topic headings. | 04-04-2013 |
20130085746 | PROOF READING OF TEXT DATA GENERATED THROUGH OPTICAL CHARACTER RECOGNITION - A novel system includes: a first proof reading tool for performing carpet proof reading on text data; a second proof reading tool for performing side-by-side proof reading on the text data; a storage unit configured to store a log of proof reading operations having been performed by using the first and second proof reading tools; and an analysis unit configured to determine, for each attribute serving as units in which carpet proof reading is performed with the first proof reading tool, whether or not to use the first proof reading tool in proof reading of the attribute, by comparing a first estimated value of a time taken when proof reading is performed by using the first proof reading tool with a second estimated value of a time taken when proof reading is performed by using the second proof reading tool without using the first proof reading tool, the first and second estimated values being calculated on the basis of the log. | 04-04-2013 |
20130090916 | System and Method for Detecting and Correcting Mismatched Chinese Character - A system and method for detecting and correcting mismatched Chinese characters in a phrase. The system comprises a database for the look-up of characters and Chinese phrases, a module to compare the input phrases with the look-up data retrieved from the database and a module to correct the mismatched characters. The database contains correct phrases as well as attributes associated with each character, such as pronunciation and radical composition. The modules input a Chinese phrase that has at least two characters and compare it with the data retrieved from the database to determine if there are incorrect characters. The spell checking method includes two groups of steps: mismatched character detection and mismatched character correction. Whether there is any mismatched character to be corrected is determined by the edit distance, the phrase length and comparisons of the pronunciation and radical composition of the mismatched characters. | 04-11-2013 |
20130090917 | FILTERING PROHIBITED LANGUAGE FORMED INADVERTENTLY VIA A USER-INTERFACE - Some embodiments of the inventive subject matter are directed to detecting that a text string is subject to an algorithmic function that would modify one or more parts of the text string to be at least one proposed text substring for presentation via a user interface, wherein the at least one proposed text substring is a portion of the text string. Some embodiments are further directed to evaluating the at least one proposed text substring against one or more prohibited text strings prohibited for presentation via the user interface and detecting, in response to the evaluating of the at least one proposed text substring against the one or more prohibited text strings, that the at least one proposed text substring is one of the one or more prohibited text strings. Some embodiments are further directed to modifying the at least one proposed text substring, in response to detecting that the at least one proposed text substring is one of the one or more prohibited text strings. | 04-11-2013 |
20130090918 | SYSTEM, METHOD AND APPARATUS FOR DETECTING RELATED TOPICS AND COMPETITION TOPICS BASED ON TOPIC TEMPLATES AND ASSOCIATION WORDS - A system for detecting related topics and competition topics for a target topic includes an information extracting apparatus configured to create topic templates and association words from documents created online to generate topic templates and association words. The system also includes a related topic detecting apparatus configured to detect and trace related topics and competition topics for the target topic based on the topic templates and the association words. | 04-11-2013 |
20130090919 | ELECTRONIC DEVICE AND DICTIONARY DATA DISPLAY METHOD - An electronic device includes a display module and a dictionary storage module which stores dictionary data that causes a plurality of entry words including compound words obtained by connecting a plurality of words to correspond to explanatory information on the entry words. When the user retrieves a dictionary, entry words for compound words are retrieved from the entry words in the dictionary storage module and words common to the retrieved compound words are listed and displayed on the display module. Entry words for compound words connecting with a word specified by a user operation in the displayed list are read from the dictionary data and displayed in list form on the display module. | 04-11-2013 |
20130090920 | SYSTEMS AND METHODS FOR ACCESSING WEB PAGES USING NATURAL LANGUAGE - Systems and methods for building an interface that receives and responds to varied natural language expressions. In an embodiment, the system receives a natural language expression in text or audio, and translates it by building at least one data structure which reflects the concepts expressed in the natural language expression. The data structure may comprise a symbol representing each concept. In an embodiment, a parser utilizes the data structure to parse language expressions to single concept symbols that represent the meaning of the expressions. Response actions may also be performed in response to the parsed language expressions. In addition, a parser may receive a single concept symbol, and generate one or many natural language expressions of the meaning of the concept symbol. Furthermore, the system may be configured to understand the local meaning of words and phrases. | 04-11-2013 |
20130096909 | SYSTEM AND METHOD FOR SUGGESTION MINING - A system and method for extraction of suggestions for improvement from a corpus of documents, such as customer reviews, are disclosed. A structured terminology provided for a topic includes a set of semantic classes, each including a set of terms. A thesaurus of terms relating to suggestions of improvement is provided. Text elements of text strings in the documents which are instances of terms in the structured terminology are labeled with the corresponding semantic class and text elements which are instances of terms in the thesaurus are also labeled. A set of patterns is applied to the labeled text strings to identify suggestions of improvement expressions. The patterns define syntactic relations between text elements, some of which are required to be instances of one of the terms in a particular semantic class or thesaurus. A set of suggestions for improvements is output based on the identified suggestions of improvement expressions. | 04-18-2013 |
20130096910 | METHOD AND SYSTEM FOR ADAPTING TEXT CONTENT TO THE LANGUAGE BEHAVIOR OF AN ONLINE COMMUNITY - A method for adapting a piece of text content to the language behavior of an online community, comprising the following steps: | 04-18-2013 |
20130096911 | NORMALISATION OF NOISY TYPEWRITTEN TEXTS - Described herein is a method and system for normalising a SMS sequence in which the sequence is pre-processed to identify noisy segments in the sequence, normalising those noisy segments and normalising the rest of the SMS sequence in accordance with predefined rules. A morphosyntactic analysis is carried out on the normalised text before an output is provided either as a typewritten text or as a synthetic speech signal. | 04-18-2013 |
20130103385 | PERFORMING SENTIMENT ANALYSIS - There is provided a computer-implemented method of performing sentiment analysis. An exemplary method comprises performing a first sentiment analysis on microblogging data based on a method using an opinion lexicon. The method also includes training a classifier using training data from the first sentiment analysis. Additionally, the method includes identifying a new opinion term in the microblogging data by performing a statistical test. The new opinion term is not in the opinion lexicon. The method also includes identifying new microblogging data based on the new opinion term. Further, the method includes performing a second sentiment analysis on the new microblogging data using the classifier. | 04-25-2013 |
20130103386 | PERFORMING SENTIMENT ANALYSIS - There is provided a computer-implemented method of performing sentiment analysis. An exemplary method comprises identifying one or more sentences in a microblog. The microblog comprises an entity. The method further includes identifying one or more opinion words in the sentences based on an opinion lexicon. Additionally, the method includes determining, for each of the sentences, an opinion value for the entity. The opinion value is determined based on an opinion value for each of the opinion words in an opinion lexicon. | 04-25-2013 |
20130103387 | COMPUTER PROCESSES FOR ANALYZING AND IMPROVING DOCUMENT READABILITY - Computer-based processes are disclosed for analyzing and improving document readability. Document readability is improved by using rules and associated logic to automatically detect various types of writing problems and to make and/or suggest edits for eliminating such problems. Many of the rules seek to generate more concise formulations of the analyzed sentences, such as by eliminating unnecessary words, rearranging words and phrases, and making various other types of edits. | 04-25-2013 |
20130103388 | DOCUMENT ANALYZING APPARATUS - A document analyzing apparatus includes a document analyzer and a comparator. The document analyzer is used for deconstructing a text file of a document stored in a data storage device to obtain a plurality of model sentences, and then storing the model sentences in the data storage device. The document analyzer further applies a position index to each of the model sentences, wherein the position index points to the storing position of the document having the model sentence in data storage device. The comparator is used for comparing a processing sentence and each of the model sentences for the similarity. The document analyzing apparatus in the present invention is capable of deconstructing text files into small units such as sentences so as to facilitate the user to search or classify the documents. | 04-25-2013 |
20130103389 | Selecting Terms in a Document - Determining a mapping between a textual representation in a document and a concept is disclosed. A document is received. A set of candidate textual representations in the document is identified. For at least one candidate textual representation included in the set, an associated concept included in a taxonomy of concepts is determined. The candidate textual representation and the associated concept are provided as output. | 04-25-2013 |
20130103390 | METHOD AND APPARATUS FOR PARAPHRASE ACQUISITION - A computer based natural language processing method for identifying paraphrases in corpora using statistical analysis comprises deriving a set of starting paraphrases (SPs) from a parallel corpus, each SP having at least two phrases that are phrase aligned; generating a set of paraphrase patterns (PPs) by identifying shared terms within two aligned phrases of an SP, and defining a PP having slots in place of the shared terms, in right hand side (RHS) and left hand side (LHS) expressions; and collecting output paraphrases (OPs) by identifying instances of the PPs in a non-parallel corpus. By using the reliably derived paraphrase information from a small parallel corpus to generate the PPs, and extending the range of instances of the PPs over the large non-parallel corpus, better coverage of the paraphrases in the language and fewer errors are encountered. | 04-25-2013 |
20130103391 | NATURAL LANGUAGE PROCESSING FOR SOFTWARE COMMANDS - A system and method for facilitating user access to software functionality. An example method includes receiving natural language input; determining an identity of a user providing the input; employing the identity to facilitate selecting a software command to associate with the received natural language input; and employing software to act on the command. In a more specific embodiment, the method further includes determining an initial set of available software commands, and narrowing the initial set of available software commands based on the identity of a user and enterprise data associated with the identity of the user, resulting in a narrowed set of software commands in response thereto. Example enterprise data includes enterprise organizational chart information (e.g., corporate hierarchy information) and user access privilege information maintained by an ERP system. | 04-25-2013 |
20130110496 | Calculating Term Similarity Using A Meta-Model Semantic Network | 05-02-2013 |
20130110497 | Functionality for Normalizing Linguistic Items | 05-02-2013 |
20130110498 | PHRASE-BASED DATA CLASSIFICATION SYSTEM | 05-02-2013 |
20130110499 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND INFORMATION RECORDING MEDIUM | 05-02-2013 |
20130110500 | METHOD, SYSTEM, AND APPARATUS FOR SELECTING AN ACRONYM EXPANSION | 05-02-2013 |
20130110501 | PERPLEXITY CALCULATION DEVICE | 05-02-2013 |
20130110502 | System And Method For Internet Radio Station Program Discovery | 05-02-2013 |
20130110503 | Method for Automatically Preferring a Diacritical Version of a Linguistic Element on a Handheld Electronic Device Based on Linguistic Source and Associated Apparatus | 05-02-2013 |
20130110504 | Method and system for natural language dictionary generation | 05-02-2013 |
20130110505 | Using Event Alert Text as Input to an Automated Assistant | 05-02-2013 |
20130117012 | KNOWLEDGE BASED PARSING - The subject disclosure generally relates to parsing unstructured data based on knowledge of domains related to the unstructured data. A domain identification component can identify a set of domains related to a term in a data set. An inspection component can identify unmatched words, and unmatched related domains. A correlation component can compare the unmatched words to known values for the unmatched domains, and a manager component can match the unmatched words with the unmatched domains based on the comparison. In addition, combinations of the words can be generated based on a set of predetermined rules, and compared to the unmatched domains. Furthermore, delimiter based parsing can be employed to augment the knowledge based parsing. | 05-09-2013 |
20130117013 | PRONOUNCEABLE DOMAIN NAMES - Embodiments of the present teachings relate to systems and methods for generating pronounceable domain names. The method includes providing a list of character strings; filtering the list of character strings through a first filter based on a phonetic model to produce a first filtered list of character strings; filtering the list of character strings through a second filter based on a character order mode to produce a second filtered list of character strings; and generating, by a processor, a list of pronounceable domain names based on the first filtered list of character strings and the second filtered list of character strings. | 05-09-2013 |
20130124189 | NETWORK-BASED BACKGROUND EXPERT - A system and methodology that provides a network-based, e.g., cloud-based, background expert for predicting and/or accomplishing a user's goals is disclosed herein. Moreover, the system monitors, in the background, user generated data and/or publicly available data to determine and/or infer a user's goal, with or without an active indication/request from the user. Typically, the user-generated data can include user conversations, such as, but not limited to, speech data in a voice call, text messages, chat dialogues, etc. Further, the system identifies an action or task that facilitates accomplishment of the user goal in real-time. Moreover, the system can automatically perform the action/task and/or request user authorization prior to performing the action/task. | 05-16-2013 |
20130124190 | SYSTEM AND METHODOLOGY THAT FACILITATES PROCESSING A LINGUISTIC INPUT - Aspects for teaching processing linguistic expressions are disclosed, which include apparatuses, methods, and computer-readable storage media to facilitate such processing. In a particular aspect, modifying a linguistic expression includes receiving an input that includes the linguistic expression and a selection of a target vernacular, and retrieving a phonetic scheme corresponding to the target vernacular, which includes a set of accentuation rules associated with the target vernacular. An audible equivalent of the linguistic expression is then generated in the target vernacular according to the phonetic scheme. In another aspect, phonetic schemes are generated by aggregating linguistic information corresponding to a plurality of vernaculars, and analyzing the linguistic information to ascertain a plurality of accentuation rules. A phonetic scheme is then generated for each of the plurality of vernaculars, which includes a set of accentuation rules associated with the corresponding vernacular. | 05-16-2013 |
20130124191 | MICROBLOG SUMMARIZATION - Various embodiments provide summarization techniques that can be applied to blogs or microblogs to present information that is determined to be useful, in a shortened form. In one or more embodiments, a procedure is utilized to automatically acquire a set of concepts from various sources, such as free text. These acquired concepts are then used to guide a clustering process. Clusters are ranked and then summarized by incorporating sentiment and the frequency of words. | 05-16-2013 |
20130124192 | ALERT NOTIFICATIONS IN AN ONLINE MONITORING SYSTEM - An online monitoring system assists parents or other individuals in monitoring social networking activity and/or mobile phone usage of their children or others. The online monitoring system may gather data corresponding with monitored social networking and/or mobile phone accounts. The data may be analyzed to provide summarized information and alert notifications to parents or other individuals. The analyses provided by the online monitoring service may include several text-based analyses: keyword analysis, sentiment analysis, and structure analysis. The keyword analysis may include analyzing text to determine whether it includes any blacklisted or whitelisted words. The sentiment analysis may include determining an overall sentiment of text based on the sentiment of words within the text. The structure analysis may include analyzing the sentence structure of the text to identify grammatical parts. An overall structure score is determined based on the sentiment of the grammatical parts. | 05-16-2013 |
20130124193 | System and Method Implementing a Text Analysis Service - One embodiment includes a computer implemented method of processing documents. The method includes generating a text analysis task object that includes instructions regarding a document processing pipeline and a document identifier. The method further includes accessing, by a worker system, the text analysis task object and generating the document processing pipeline according to the instructions. The method further includes performing text analysis using the document processing pipeline on a document identified by the document identifier. | 05-16-2013 |
20130124194 | SYSTEMS AND METHODS FOR MANIPULATING DATA USING NATURAL LANGUAGE COMMANDS - Systems and methods for manipulating data using natural language commands in accordance with embodiments of the invention are disclosed. In one embodiment, a natural language enterprise system includes a database configured to store a natural language index, where the natural language index maps keywords to actions to data, a natural language application server configured to communicate with the database, wherein the natural language application server is configured to receive a command statement, parse the received command statement to identify at least one keyword in the command statement, query the database using at least one keyword to identify at least one action to data using the natural language index, locate at least one piece of enterprise data to which at least one action to data may be performed, and initiate at least one action to data that is applied to at least one of the located pieces of enterprise data. | 05-16-2013 |
20130124195 | Phrase-Based Dialogue Modeling With Particular Application to Creating a Recognition Grammar - The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). The invention enables phrase-based modeling of generic structures of verbal interaction to be used for the purpose of automating part of the design of such grammar networks. Most particularly, the invention enables such grammar networks to be used in providing a voice-controlled user interface to human readable text data that is also machine-readable (such as a Web page, a word processing document, a PDF document, or a spreadsheet). | 05-16-2013 |
20130132070 | Computer-Based Construction of Arbitrarily Complex Formal Grammar Expressions - A method, system and computer program product for building an expression, including utilizing any formal grammar of a context-free language, displaying an expression on a computer display via a graphical user interface, replacing at least one non-terminal display object within the displayed expression with any of at least one non-terminal display object and at least one terminal display object, and repeating the replacing step a plurality of times for a plurality of non-terminal display objects until no non-terminal display objects remain in the displayed expression, wherein the non-terminal display objects correspond to non-terminal elements within the grammar, and wherein the terminal display objects correspond to terminal elements within the grammar. | 05-23-2013 |
20130132071 | Method and Apparatus for Automatically Analyzing Natural Language to Extract Useful Information - An automatic language-processing system uses a human-curated lexicon to associate words and word groups with broad sentiments such as fear or anger, and topics such as accounting fraud or earnings projections. Grammar processing further characterizes the sentiments or topics with logical (“is” or “is not”), conditional (probability), temporal (past, present, future), quantitative (larger/smaller, higher/lower, etc.), and speaker identification (“I” or “He” or “Alan Greenspan”) measures. Information about the characterized sentiments and topics found in electronic messages is stored in a database for further analysis, display, and use in automatic trading systems. | 05-23-2013 |
20130132072 | ENGINE FOR HUMAN LANGUAGE COMPREHENSION OF INTENT AND COMMAND EXECUTION - The invention provides a computer system for interacting with a user. A set of concepts initially forms a target set of concepts. An input module receives a language input from the user. An analysis system executes a plurality of narrowing cycles until a concept packet having at least one concept has been identified. Each narrowing cycle includes identifying at least one portion of the language and determining a subset of concepts from the target set of concepts to form a new target subset. An action item identifier identifies an action item from the action items based on the concept packet. An action executer executes an action based on the action item that has been identified. | 05-23-2013 |
20130138423 | Contextual search for modeling notations - A method, an apparatus, and a computer program product for contextual-based search of modeling notations to be used in a model. The method comprises obtaining a contextual property of a notation to be used in a diagram, wherein the contextual property defines a context of a usage of the notation in the diagram; and searching in a notation-base for notations, whereby a search result set is obtained, wherein the search result set comprises notations that were previously used in a similar context to the contextual property, wherein the notation-base is stored in a data storage. | 05-30-2013 |
20130138424 | Context-Aware Interaction System Using a Semantic Model - The subject disclosure is directed towards detecting symbolic activity within a given environment using a context-dependent grammar. In response to receiving sets of input data corresponding to one or more input modalities, a context-aware interactive system processes a model associated with interpreting the symbolic activity using context data for the given environment. Based on the model, related sets of input data are determined. The context-aware interactive system uses the input data to interpret user intent with respect to the input and thereby, identify one or more commands for a target output mechanism. | 05-30-2013 |
20130138425 | MULTIPLE RULE DEVELOPMENT SUPPORT FOR TEXT ANALYTICS - Methods, computer program products and systems are provided for applying text analytics rules to a corpus of documents. The embodiments facilitate selection of a document from the corpus within a graphical user interface (GUI), where the GUI opens the selected document to display text of the selected document and also a token parse tree that lists tokens associated with text components of the document, facilitate construction of a text analytics rule, via the GUI, by user selection of one or more tokens from the token parse tree, and, in response to a user selecting one or more tokens from the token parse tree, provide a list of hits via the GUI, the hits including a listing of text components from documents of the corpus that are associated with tokens that comply with the constructed text analytics rule. | 05-30-2013 |
20130138426 | AUTOMATED CONTENT GENERATION - Described are computer-based methods and apparatuses, including computer program products, for automated content generation. In some examples, the method includes generating content metadata from document content via natural language processing based on one or more context parameters associated with the document content. The method can further include receiving user feedback about the content metadata from a computing device associated with a user associated with the document content. The method can further include modifying the one or more context parameters based on the received user feedback. | 05-30-2013 |
20130138427 | Fraud Detection Using Text Analysis - In one embodiment, a method executed by at least one processor includes receiving text submitted by a user. The method also includes determining a text score for the received text by comparing a first set of phrases included in the received text to a second set of phrases. The second set of phrases includes phrases from stored text. The stored text includes stored text known to be genuine and stored text known to be fraudulent. The method also includes determining that the received text is fraudulent based on the text score. | 05-30-2013 |
20130138428 | SYSTEMS AND METHODS FOR AUTOMATICALLY DETECTING DECEPTION IN HUMAN COMMUNICATIONS EXPRESSED IN DIGITAL FORM - An apparatus and method for determining whether text is deceptive has a computer programmed with software that automatically analyzes text in digital form by at least one of statistical analysis of psycho-linguistic cues, IP geo-location, gender analysis, authorship analysis, and analysis to detect coded/camouflaged messages. The computer has truth data against which the text message can be compared and a graphical user interface. The computer may be connectable to the Internet and may obtain the text automatically. Speech-to-text software may be used to convert verbal messages to text for analysis. The system may be made available on a webpage, web service, on a computer or by a wireless device. The text may be emails, website content, tweets. In one embodiment, the system detects coded messages (FIG. | 05-30-2013 |
20130138429 | Method and Apparatus for Information Searching - Techniques for performing searches using synonym pairs generated from data mining are described herein. These techniques may include receiving, by a server, a query including a keyword. The server may generate multiple synonym pairs associated with the keyword by mining multiple item descriptions under a certain context, and then calculate a comprehensive relevance for each individual synonym pair. If the comprehensive relevance is greater than a predetermined value, the server may perform searches based on the individual synonym pair. | 05-30-2013 |
20130138430 | METHODS AND APPARATUS TO CLASSIFY TEXT COMMUNICATIONS - Methods and apparatus to classify text communications are disclosed. An example method includes determining a first score indicating a likelihood that a text belongs to a first classification mode by combining a first sentence score and a second sentence score retrieved from an index, the first sentence score indicating a probability that a first sentence in the text belongs to the first classification mode, the second sentence score indicating that a second sentence following the first sentence belongs to the first classification mode, determining a second score indicating a likelihood that the text belongs to a second classification mode, comparing the first score to the second score, classifying the text as the first classification mode when the first score is greater than the second score, and determining a confidence level that the text belongs to the first classification mode by dividing the first score by the second score. | 05-30-2013 |
20130144602 | Quantitative Type Data Analyzing Device and Method for Quantitatively Analyzing Data - A method for quantitatively analyzing data is applied to a computer system for determining whether a document under test is sensitive. The method obtains a sample message from the computer system and partitions the content of the sample message to derive at least one original paragraph. The method then partitions the original paragraph to derive original sentences and to derive a plurality of original sentence characteristics from the original sentences. After that, the method produces a feature vector according to the derived sentence characteristics. | 06-06-2013 |
20130144603 | ENHANCED VOICE CONFERENCING WITH HISTORY - Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to enhance voice conferencing among multiple speakers. Some embodiments of the AEFS enhance voice conferencing by recording and presenting voice conference history information based on speaker-related information. The AEFS receives data that represents utterances of multiple speakers who are engaging in a voice conference with one another. The AEFS then determines speaker-related information, such as by identifying a current speaker, locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS records conference history information (e.g., a transcript) based on the determined speaker-related information. The AEFS then informs a user of the conference history information, such as by presenting a transcript of the voice conference and/or related information items on a display of a conferencing device associated with the user. | 06-06-2013 |
20130144604 | SYSTEMS AND METHODS FOR EXTRACTING ATTRIBUTES FROM TEXT CONTENT - Systems and methods for extracting attributes from text content are described. Example embodiments may include a computer implemented method for extracting attributes from text data, wherein the text data is obtained from at least one information source. As described, the implementation may include receiving, from a user, an address for the at least one information source and an attribute name, creating a tagged information file by associating a part of speech tag to text data obtained from the at least one information source, identifying a location of the attribute name in the tagged information file using an approximate text matching technique and determining at least one attribute descriptor from the tagged information file wherein the tagged information file is parsed based on a part of speech tag associated with the attribute name to determine a conclusion of the attribute descriptor. | 06-06-2013 |
20130144605 | Text Mining Analysis and Output System - A natural language authoring system that organizes technical, financial, legal and market information into Point of View specific analytical, visual and narrative decision-support content. The expert system transforms a user's point of view into a tailored narrative and/or visualization report. Expert rules embed interactive advertising, such as affiliate URL links, into analytical, visual, narrative and statistical content. The rules may be modified by one or more users, thereby capturing knowledge as the rules are utilized by users of the system. | 06-06-2013 |
20130144606 | System and Method for Using Data and Derived Features to Automatically Generate a Narrative Story - A system and method for automatically generating a narrative story receives data and information pertaining to a domain event. The received data and information and/or one or more derived features are then used to identify a plurality of angles for the narrative story. The plurality of angles is then filtered, for example through use of parameters that specify a focus for the narrative story, length of the narrative story, etc. Points associated with the filtered plurality of angles are then assembled and the narrative story is rendered using the filtered plurality of angles and the assembled points. | 06-06-2013 |
20130144607 | CHARACTER-BASED AUTOMATED TEXT SUMMARIZATION - Methods, devices, systems and tools are presented that allow the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms. High-content media files are shortened in a manner that preserves important details, by splitting the files into segments, rating the segments, and reassembling preferred segments into a final abridged piece. Summarization of media can be customized by user selection of criteria, and opens new possibilities for delivering entertainment, news, and information in the form of dense, information-rich content that can be viewed by means of broadcast or cable distribution, “on-demand” distribution, internet and cell phone digital video streaming, or can be downloaded onto an iPod™ and other portable video playback devices. | 06-06-2013 |
20130144608 | Incorporation of Variables Into Textual Content - Embodiments of the invention provide techniques for incorporating variable values into textual content. In one embodiment, an abstract phrase including a text phrase and a variable at a particular position in the text phrase is received. The abstract phrase may include multiple variables. A text value for the variable is received. The text phrase of the abstract phrase is combined with the text value according to the particular position of the variable. An integration rule is applied at a boundary of the text phrase of the abstract phrase and the text value, where the integration rule is based on a language rule. The integration rule modifies a portion of the text phrase of the abstract phrase or a portion of the text value to produce an integrated phrase. | 06-06-2013 |
20130144609 | TEXT PROCESSING SYSTEM, TEXT PROCESSING METHOD, AND TEXT PROCESSING PROGRAM - Provided is a text processing system capable of avoiding declining processing efficiency in analyses of text that does not contain breaks. | 06-06-2013 |
20130151235 | LINGUISTIC KEY NORMALIZATION - Systems, methods, and apparatuses including computer program products are provided for training machine learning systems. In some implementations, a method is provided. The method includes receiving a collection of phrases, normalizing a plurality of phrases of the collection of phrases, the normalizing being based at least in part on lexicographic normalizing rules, and generating a normalized phrase table including a plurality of key-value pairs, each key value pair includes a key corresponding to a normalized phrase and a value corresponding to one or more un-normalized phrases associated with the normalized key, each un-normalized phrase having one or more parameters. | 06-13-2013 |
20130151236 | COMPUTER IMPLEMENTED SEMANTIC SEARCH METHODOLOGY, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR DETERMINING INFORMATION DENSITY IN TEXT - A method, computer program product and system are disclosed for determining the semantic density of textualized digital media (a measure of how much information is conveyed in a sentence or clause relative to its length). The more semantically dense text is, the more information it conveys in a given space. Users input a topic, a timeline, and one or more target web media sources for analysis. Text in the target media sources is deconstructed to determine density, and a density rating is assigned to the web media source. Over time, users can track trends in the density of text media relative to a given topic, and determine how much information is being conveyed in connection with the topic, such as a political campaign. Line graphs, pie charts, and other time-elapsed output graphic representations of the semantic density are generated and rendered for the user. | 06-13-2013 |
20130151237 | DYNAMIC METHOD FOR EMOTICON TRANSLATION - A vehicle communication system is provided and may include at least one communication device that audibly communicates information within the vehicle. A controller may receive a character string from an external device and may determine if the character string represents an emoticon. The controller may translate the character string into a face description if the character string represents an emoticon and may audibly communicate the face description via the at least one communication device. | 06-13-2013 |
20130151238 | Generation of Natural Language Processing Model for an Information Domain - Embodiments relate to a method, apparatus and program product for generating a natural language processing model for an information domain. The method derives a skeleton of a natural language lexicon from a source model and uses it to form a dictionary. It also applies a set of syntactical rules defining concepts and relationships to the dictionary and expands the skeleton of the natural language lexicon based on a plurality of reference documents from the information domain. Using the expanded skeleton of the natural language lexicon, it also provides a natural language processing model for the information domain. | 06-13-2013 |
20130151239 | ORTHOGRAPHICAL VARIANT DETECTION APPARATUS AND ORTHOGRAPHICAL VARIANT DETECTION PROGRAM - Provided is an orthographical variant detection apparatus which detects orthographical variant candidates with high precision. The orthographical variant detection apparatus includes a term extraction unit that extracts terms from document data, a similarity computation unit that computes similarity of an arbitrary pair of the extracted terms, an orthographical variant candidate determination unit that determines, based on the similarity, whether or not the terms in the pair of terms are orthographical variant candidates, and a group classification unit that groups the orthographical variant candidates based on a character string commonly included in the pair of terms determined to be orthographical variant candidates. | 06-13-2013 |
20130151240 | INTERACTIVE FACT CHECKING SYSTEM - A fact checking system is able to verify the correctness of information and/or characterize information by comparing the information with one or more sources. The fact checking system automatically monitors, processes, and fact checks information and indicates a status of the information. The fact checking system is able to be interactive with a user, so that a user is able to respond to a fact check result and receive additional information. | 06-13-2013 |
20130158977 | System and Method for Evaluating Speech Exposure - Systems and methods are provided for detecting and analyzing speech spoken in the vicinity of a user. The detected speech may be analyzed to determine the quality, volume, complexity, language, and other attributes. A value metric may be calculated for the received speech, such as to inform parents of a child's progress related to learning to speak, or to provide feedback to a foreign language learner. A corresponding device may display the number of words, the value metric, or other information about speech received by the device. | 06-20-2013 |
20130158978 | Adaptation of Vocabulary Levels for Enhanced Collaboration - A mechanism is provided for adapting vocabulary levels in a collaborative session. A vocabulary level indicator is received for a first user in the collaborative session. During generation of an electronic communication by a second user in the collaborative session, text entered in the electronic communication is scanned in order to identify a vocabulary level associated with text. The vocabulary level associated with the text is compared to the vocabulary level indicator for the first user. Responsive to the text exceeding the vocabulary level indicator for the first user thereby indicating violating text, an indication is provided to the second user that the violating text is above a vocabulary level of the first user. | 06-20-2013 |
20130158979 | System and Method for Identifying Phrases in Text - A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens. | 06-20-2013 |
20130158980 | SUGGESTING INTENT FRAME(S) FOR USER REQUEST(S) - Techniques are described herein that are capable of suggesting intent frame(s) for user request(s). For instance, the intent frame(s) may be suggested to elicit a request from a user. An intent frame is a natural language phrase (e.g., a sentence) that includes at least one carrier phrase and at least one slot. A slot in an intent frame is a placeholder that is identified as being replaceable by one or more words that identify an entity and/or an action to indicate an intent of the user. A carrier phrase in an intent frame includes one or more words that suggest a type of entity and/or action that is to be identified by the one or more words that may replace the corresponding slot. In accordance with these techniques, the intent frame(s) are suggested in response to determining that natural language functionality of a processing system is activated. | 06-20-2013 |
20130158981 | LINKING NEWSWORTHY EVENTS TO PUBLISHED CONTENT - Methods, systems, and computer programs are presented for linking newsworthy events in a document to published content. One method includes an operation for receiving features by a classifier that is operable to determine a probability of the availability of news for a sentence. When the features are found in the sentence, the probability of the availability of news for the sentence increases, where the sentence includes one or more noun phrases and ends in a full stop. The classifier determines which sentences in a document are candidate sentences for being linked to news articles, and for each candidate sentence, the method includes an operation for finding an associated news article when there is an associated news article exceeding a relevance threshold. Further, the method includes operations for adding links in the document to the found associated news articles, and for displaying the document with the added links. | 06-20-2013 |
20130158982 | Computer-Implemented Systems and Methods for Content Scoring of Spoken Responses - Systems and methods are provided for scoring a non-scripted speech sample. A system includes one or more data processors and one or more computer-readable mediums. The computer-readable mediums are encoded with a non-scripted speech sample data structure, where the non-scripted speech sample data structure includes: a speech sample identifier that identifies a non-scripted speech sample, a content feature extracted from the non-scripted speech sample, and a content-based speech score for the non-scripted speech sample. The computer-readable mediums further include instructions for commanding the one or more data processors to extract the content feature from a set of words automatically recognized in the non-scripted speech sample and to score the non-scripted speech sample by providing the extracted content feature to a scoring model to generate the content-based speech score. | 06-20-2013 |
20130158983 | System and Method for Identifying Phrases in Text - A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of parts of speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens. | 06-20-2013 |
20130158984 | METHOD OF AND SYSTEM FOR VALIDATING A FACT CHECKING SYSTEM - A fact checking system is able to verify the correctness of information and/or characterize information by comparing the information with one or more sources. The fact checking system automatically monitors, processes, fact checks information and indicates a status of the information. Fact checking results are able to be validated by re-fact checking the fact check results. | 06-20-2013 |
20130158985 | System and Method for Converting Graphical Call Flows Into Finite State Machines - A method, system and module for automatically converting a call flow into a state-based representation are disclosed. The method comprises walking a call flow and converting each page of the call flow into a rule of a higher level representation of the call flow, augmenting the higher level representation with terminal symbols representing state variable assignments and comparisons associated with decision and computation shapes in the call flow and converting the higher level representation into a state-based representation. | 06-20-2013 |
20130158986 | COMMUNICATIONS ANALYSIS SYSTEM AND PROCESS - A communications analysis process, including: accessing communications data representing communications of one or more persons; processing the communications data to determine similarity data representing similarities between concepts expressed by the one or more persons at different times during said communications; and processing the similarity data to determine one or more metrics of said communications. | 06-20-2013 |
20130166280 | Concept Search and Semantic Annotation for Mobile Messaging - A textual message processing system and method are described for use in a mobile environment. A user messaging application processes at least one user textual message during a user messaging session. A semantic annotation module identifies one or more semantically salient terms in the user textual message, and annotates the user textual message with annotation terms having a low semantic distance to the semantically salient terms. A user message history stores the annotated textual messages. The semantic annotation module may further annotate the user textual message with situational meta-data characterizing the user textual message. There may be a message search module for using one or more keywords to search the user message history including the annotation terms, and identifying as a search match any annotated textual messages within a semantic distance threshold of the one or more keywords. | 06-27-2013 |
20130166281 | OPTIMALLY SORTING A LIST OF ELEMENTS BASED ON THEIR SINGLETONS - A method provides a non-optimized list of elements, with some of the elements having multiple terms. A table of sub-elements is generated from the elements list, with each sub-element having one term only and with a number of times a sub-element appears in the elements list being weighted in the sub-elements table. A weighted singleton histogram table is generated using a singleton dictionary, and a total popularity score of each singleton is computed from the sub-elements table. For each element from the elements list, an elements score is generated based on the total popularity score of each singleton within the element. An optimally sorted list of the elements list is generated based on the elements scores. | 06-27-2013 |
20130166282 | METHOD AND APPARATUS FOR RATING DOCUMENTS AND AUTHORS - Methods and apparatus for determining a competence rating of an author relating to one or more topics is disclosed. An exemplary method comprises determining semantic information associated with one or more documents related to the one or more topics, determining amplification information associated with the one or more documents, determining occurrence information associated with the author; and determining a competence rating for the author based at least in part on the semantic information associated with the one or more documents, the amplification information associated with the one or more documents, and the occurrence information associated with the author. A document rating for at least one of the one or more documents may also be determined based at least in part on the one or more weighted semantic features and the amplification information. | 06-27-2013 |
20130166283 | METHOD AND APPARATUS FOR GENERATING PHONEME RULE - A phoneme rule generating apparatus includes a spectrum analyzer configured to analyze pronunciation patterns of voices included in a plurality of voice data, a clusterer configured to cluster the plurality of voice data based on the analyzed pronunciation patterns, a voice group generator configured to generate voice groups from the clustered voice data, a phoneme rule generator configured to generate a phoneme rule corresponding to each respective voice group from among the generated voice groups and a group mapping DB configured to store the generated voice groups and the generated phoneme rules for an accurate voice recognition. | 06-27-2013 |
20130166284 | System and Method of Spoken Language Understanding in Human Computer Dialogs - A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify. | 06-27-2013 |
20130173248 | LEVERAGING LANGUAGE STRUCTURE TO DYNAMICALLY COMPRESS A SHORT MESSAGE SERVICE (SMS) MESSAGE - A message within a message queue can be identified. The message queue can be within a software entity of a computing device. The message can be analyzed to determine an encoding scheme to apply to the message. The message can be encoded using the encoding scheme to create an encoded message. The encoding scheme can be a word level encoding scheme, a language-based encoding scheme, or a grammar encoding scheme. | 07-04-2013 |
20130173249 | Natural Language Processing ('NLP') - Natural language processing (‘NLP’) including: receiving text specifying predetermined evidence; receiving a text passage to process, the text passage including conditions and logical operators, the text passage comprising criteria for evidence; decomposing the text passage into coarse grained text fragments, including grouping text segments in dependence upon the logical operators; analyzing each coarse grained text fragment to identify conditions; evaluating each identified condition in accordance with the predetermined evidence and predefined condition evaluation rules; evaluating each coarse grained text fragment in dependence upon the condition evaluations and the logical operators; and calculating, in dependence upon the evaluations of each text fragment, a truth value indicating a degree to which the evidence meets the criteria of the text passage. | 07-04-2013 |
20130173250 | STATISTICAL STEMMING - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating suffix rewriting rules. A method includes obtaining a plurality of canonical suffix-rewriting rules each associated with one or more words, generating a suffix tree from the words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of final suffix-rewriting rules from the nodes in the minimum colored subset. Another method includes receiving applicable and non-applicable words for a suffix-rewriting rule, generating a suffix tree from the applicable words and the non-applicable words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of suffix-rewriting rules, wherein each rule corresponds to a node in the minimum colored subset with a valid status. | 07-04-2013 |
20130173251 | ELECTRONIC DEVICE AND NATURAL LANGUAGE ANALYSIS METHOD THEREOF - A natural language analysis method for an electronic device is provided. The language analysis method includes the steps of: receiving user inputs and generating signals; converting signals into textual information; segmenting the textual information into a number of vocabulary segments, each vocabulary segment including a number of separated vocabularies; retrieving the use frequency of each vocabulary, sorting the vocabulary segments, and obtaining a first sorting of the number of vocabulary segments into descending order; segmenting the textual information into a number of sentence segmentations; obtaining a second sorting of the vocabulary segmentations, according to the number of sentence segmentations and the number of vocabulary segment results; and determining a reply to the textual information, according to the topmost result after the second sorting. An electronic device using the language analysis method is also provided. | 07-04-2013 |
20130173252 | ELECTRONIC DEVICE AND NATURAL LANGUAGE ANALYSIS METHOD THEREOF - A language analysis method for an electronic device storing a basic corpus and a temporary corpus is provided. The language analysis method includes steps of receiving user inputs and generating signals; converting signals into textualized information; analyzing the textualized information; obtaining a first understanding result according to the basic corpus, the vocabulary segmentation results, and the sentence segmentation results; determining whether the first understanding result is an appropriate understanding according to the context; determining one or more anaphoric vocabularies when the first understanding result is an inappropriate understanding; determining a temporary understanding result of the one or more anaphoric vocabularies and a second understanding result of the textualized information according to the context; and determining a reply for the textualized information, according to the second understanding result, the basic corpus, and the temporary corpus. An electronic device using the language analysis method is also provided. | 07-04-2013 |
20130173253 | SPEECH EFFECTS - A method of complementing a spoken text. The method including receiving text data representative of a natural language text, receiving effect control data including at least one effect control record, each effect control record being associated with a respective location in the natural language text, receiving a stream of audio data, analyzing the stream of audio data for natural language utterances that correlate with the natural language text at a respective one of the locations, and outputting, in response to a determination by the analyzing that a natural language utterance in the stream of audio data correlates with a respective one of the locations, at least one effect control signal based on the effect control record associated with the respective location. | 07-04-2013 |
20130173254 | Sentiment Analyzer - A sentiment analysis tool receives a t-gram from an electronic device. The t-gram comprises gram(s), each of the gram(s) representing a word in a collection of words. A polarity is set for the t-gram. Possible smaller-gram combinations are generated from the t-gram. Until a condition is met, iterative actions are taken. A likelihood ratio is calculated for the largest of the smaller-gram combinations employing the training set. A determination is made of whether the likelihood ratio meets a minimum replication threshold. If satisfied: the smaller-gram combinations most distant from an undefined polarity value are selected; the smaller-gram combinations employed in calculating the likelihood ratio are excluded; the polarity value for the t-gram is increased in proportion to the likelihood ratio; and the training set is reduced to v-grams that include the t-gram. Otherwise, the size of the smaller-gram is reduced by 1. | 07-04-2013 |
20130173255 | Methods for Creating A Phrase Thesaurus - The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). More specifically, the invention concerns a phrase-based modeling of generic structures of verbal interaction and use of these models for the purpose of automating part of the design of such grammar networks. | 07-04-2013 |
20130173256 | NATURAL LANGUAGE PROCESSING ('NLP') - Natural language processing (‘NLP’) including: receiving text specifying predetermined evidence; receiving a text passage to process, the text passage including conditions and logical operators, the text passage comprising criteria for evidence; decomposing the text passage into coarse grained text fragments, including grouping text segments in dependence upon the logical operators; analyzing each coarse grained text fragment to identify conditions; evaluating each identified condition in accordance with the predetermined evidence and predefined condition evaluation rules; evaluating each coarse grained text fragment in dependence upon the condition evaluations and the logical operators; and calculating, in dependence upon the evaluations of each text fragment, a truth value indicating a degree to which the evidence meets the criteria of the text passage. | 07-04-2013 |
20130173257 | Systems and Processes for Identifying Features and Determining Feature Associations in Groups of Documents - Systems and computer-implemented processes for identification of features and determination of feature associations in a group of documents can involve providing a plurality of keywords identified among the terms of at least some of the documents. A value measure can be calculated for each keyword. High-value keywords are defined as those keywords having value measures that exceed a threshold. For each high-value keyword, term-document associations (TDA) are accessed. The TDA characterize measures of association between each term and at least some documents in the group. A processor quantifies similarities between unique pairs of high-value keywords based on the TDA for each respective high-value keyword and generates a similarity matrix that indicates one or more sets that each comprise highly associated high-value keywords. | 07-04-2013 |
20130173258 | Broad-Coverage Normalization System For Social Media Language - A method for identification of a standard text token in a dictionary that corresponds to a non-standard token identified in text includes identification of a first standard token that is associated with the non-standard token using a predetermined conditional random field (CRF) model and identification of a second standard token that is associated with the non-standard token using a spell checker. The method further includes identification of noisy channel scores using data from the CRF model and the spell checker for the first standard token and the second standard token, respectively. The method further includes presentation of one of the first and second standard tokens having the greatest identified noisy channel score to a user with a user interface device. | 07-04-2013 |
20130179148 | METHOD AND APPARATUS FOR DATABASE AUGMENTATION AND MULTI-WORD SUBSTITUTION - A method and communication device are provided for database augmentation using linguistic data stored on a device, and utilizing a database stored on a device to perform multi-word substitution. A database may be augmented by monitoring other databases that contain linguistic data, such as contact databases containing linguistic data regarding entities that a device may communicate with, and updating the database with linguistic data in the other databases. The linguistic data in the augmented database may be compared with words received from an input apparatus to determine whether any of the received words should be replaced with linguistic data from the augmented database. The augmented database may contain one word entries and multi-word entries to allow for multi-word substitution. | 07-11-2013 |
20130179149 | COMMUNICATION PROCESSING - Disclosed are methods and apparatus for processing linguistic expressions (e.g., opinionated text documents). The linguistic expressions are processed by, firstly, detecting topics of interest discussed in the linguistic expressions. The sentiment, or sentiments, of an originator with respect to each of the topics detected in the linguistic expressions is then assessed. The originators are then grouped (or clustered) into one or more groups based on the similarities between the originators' respective sets of detected topics and corresponding sentiments. Semantic information is then associated with a given group. Finally, for a given member of a given group, a profile is created or updated. This profile comprises attributes that may be based on a degree of membership of the given member to the given group and the semantic information associated with the given group. | 07-11-2013 |
20130179150 | NOTE COMPILER INTERFACE - A computer implemented method is performed at an electronic device adapted to receive text-based input and having a display. The method comprises receiving text-based input, analyzing the inputted text, and performing, in dependence on the analyzing, at least one of two actions. The first action comprises comparing at least some of the analyzed text with other text accessed by the device; if a match is found between said at least some of the analyzed text and said accessed text, retrieving data associated with the matching accessed text, and associating the retrieved data with the inputted text for subsequent provision to the user. The second action comprises providing data associated with at least some of the analyzed text to a module of the device. In this way, data pull and push actions are performed in dependence on the analyzing. An electronic device and computer program product are also provided. | 07-11-2013 |
20130179151 | METHOD AND SYSTEM FOR CONSTRUCTING A LANGUAGE MODEL - Disclosed herein are various embodiments of methods and systems for constructing a first language model for use by a first Language Processing (LP) application of a plurality of LP applications. Each LP application of the plurality of LP applications receives one or more of a language based input, a derivative of the language based input, a response to the language based input and a derivative of the response. The method includes processing at least one input by a second LP application of the plurality of LP applications. Based on the processing of the second LP application, at least one output is generated. Subsequently, at least a portion of the first language model is constructed based on the at least one output. | 07-11-2013 |
20130179152 | Computer Implemented Method, Apparatus, Network Server And Computer Program Product - A computer implemented method for generating user element explanations for elements of a formal language may include: identifying at least one element of said formal language, selecting according to a target domain a respective mapping rule for each identified element, wherein the mapping rule refers to at least one word of a natural language and/or at least one audio file, generating at least one user element explanation for each identified element by automatically combining the respective at least one word of a natural language and/or at least one audio file according to predefined grammar rules, wherein the predefined grammar rules form part of the selected mapping rule and specify how to combine the at least one word of a natural language and/or audio file into said user element explanation, and linking the generated at least one user element explanation with the respective identified element of said formal language. | 07-11-2013 |
20130179153 | Computer-Implemented Systems and Methods for Detecting Punctuation Errors - Systems and methods are provided for detecting punctuation errors in a text including one or more sentences. A sentence including a plurality of words is received, the sentence including one or more preexisting punctuation marks. One or more punctuation marks to be inserted in the sentence are determined with a statistical classifier based on a set of rules. The determined punctuation marks are compared with the preexisting punctuation marks. A report of punctuation errors is output based on the comparison. | 07-11-2013 |
20130185054 | TECHNIQUES FOR INSERTING DIACRITICAL MARKS TO TEXT INPUT VIA A USER DEVICE - A computer-implemented method for assisting a user to input Vietnamese text to a user device lacking a subset of characters in a Vietnamese alphabet includes receiving a character input by a user, determining three words previously input by the user, the three words having already had diacritical marks inserted, transmitting the three words and the character to a server via a network, receiving first and second information corresponding to the character from the server via the network, the first and second information generated at the server based on a context of the three words, the context determined at the server using a language model, the first and second information indicating whether the character requires a diacritical mark and a specific diacritical mark, respectively, generating a modified character comprising a character in the Vietnamese alphabet based on the character and the first and second information, and displaying the modified character. | 07-18-2013 |
20130185055 | System and Method for Performing Analysis on Information, Such as Social Media - A system for analyzing text-based information is presented. Each datum of information includes an author, a description and a timestamp. A fetcher fetches the raw information according to keywords. A parser parses the raw information to refine the results. A lexicon management module extracts lemmas from the raw information, and creates an edited lexicon containing the raw data and the lemmas for each datum. A data manager correlates lemmas in the edited lexicon and identifies clusters of lemmas that are correlated between each other. The results can be visually displayed to a user, and clusters of lemmas that are less correlated than the other clusters can be visually identified. In one aspect, the user is able to excise the less correlated clusters, in order to further refine the results of the keyword search. | 07-18-2013 |
20130185056 | SYSTEM FOR GENERATING TEST SCENARIOS AND TEST CONDITIONS AND EXPECTED RESULTS - A requirements testing system facilitates the review and analysis of requirement statements for software applications. The requirements testing system automatically generates test artifacts from the requirement statements, including test scenarios, test conditions, test hints, and expected results. These test artifacts characterize the requirements statements to provide valuable analysis information that aids understanding what the intentions of the requirement statements are. The automation of the generation of these test artifacts produces numerous benefits, including fewer errors, objectivity, and no dependency on the skills and experience of a creator. | 07-18-2013 |
20130185057 | Computer-Implemented Systems and Methods for Scoring of Spoken Responses Based on Part of Speech Patterns - Systems and methods are provided for scoring a speech sample. Automatic speech recognition is performed on the speech sample using an automatic speech recognition system to generate a transcription of the sample. Words in the transcription are associated with parts of speech, and part of speech sequences are extracted from the parts of speech associations. A grammar metric is generated based on the part of speech sequences, and the speech sample is scored based on the grammar metric. | 07-18-2013 |
20130185058 | FORMAT FOR DISPLAYING TEXT ANALYTICS RESULTS - A system can receive text. The text can be divided into various portions. One or more significance indicators can be associated with each portion of text: these significance indicators can also be received by the system. The system can then display a portion of text and the associated significance indicators to the user. | 07-18-2013 |
20130185059 | Method and System for Automatically Detecting Morphemes in a Task Classification System Using Lattices - The invention concerns a method and corresponding system for building a phonotactic model for domain independent speech recognition. The method may include recognizing phones from a user's input communication using a current phonotactic model, detecting morphemes (acoustic and/or non-acoustic) from the recognized phones, and outputting the detected morphemes for processing. The method also updates the phonotactic model with the detected morphemes and stores the new model in a database for use by the system during the next user interaction. The method may also include making task-type classification decisions based on the detected morphemes from the user's input communication. | 07-18-2013 |
20130185060 | PHRASE BASED DOCUMENT CLUSTERING WITH AUTOMATIC PHRASE EXTRACTION - Meaningful phrases are distinguished from chance word sequences statistically, by analyzing a large number of documents and using a statistical metric such as a mutual information metric to distinguish meaningful phrases from groups of words that co-occur by chance. In some embodiments, multiple lists of candidate phrases are maintained to optimize the storage requirement of the phrase-identification algorithm. After phrase identification, a combination of words and meaningful phrases can be used to construct clusters of documents. | 07-18-2013 |
20130191113 | USER OPINION EXTRACTION METHOD USING SOCIAL NETWORK - A user opinion extraction method that includes: searching for social network groups that respectively have made one or more connections to a site having a domain that relates to software to be developed using a search module; analyzing structures of social networks for the retrieved social network groups using an analysis module; selecting a social network group from which user opinions are to be extracted based on a result of the analysis and collecting user opinions from SNS sentences that are mutually transmitted and received between a plurality of nodes within the selected social network group using a collection module; calculating degrees of influences of the collected user opinions on the social network group using a calculation module; and extracting at least one user opinion from among the collected user opinions in order of a higher degree of the influence on the social network group using an extraction module. | 07-25-2013 |
20130191114 | SYSTEM AND METHOD FOR PROVIDING UNIVERSAL COMMUNICATION - Provided is a system for providing universal communication, by generating a universal communication signal including a frequency component including light, a sound, a language, a dialect, an electromagnetic wave, and a vibration, by recording/storing the generated universal communication signal, and by converting an input signal into a universal communication signal, to enable communication between a human and a communication media or a non-human entity. | 07-25-2013 |
20130191115 | Methods and Systems for Transcribing or Transliterating to an Iconphonological Orthography - Described are methods of developing an iconographical, phonological, orthography for any spoken language. Such “iconophonological” orthographies can be applied to languages for which no written form exists, or can be used to supplement or replace extant writing systems. The iconicity of the orthographies represents features of the vocal tract, which limits the number of icons to easily learned sets. This simplification, and the phonological correspondence between the icons and spoken language, makes the orthographies easy to learn. The orthographies can use letters that represent the linguistic characteristics of the spoken language. By incorporation of cultural aesthetics, some embodiments bring a sense of ethnic belonging, and thus create an immediate emotional bond with the orthography. | 07-25-2013 |
20130197899 | SYSTEM AND METHOD FOR CONTEXTUALIZING DEVICE OPERATING PROCEDURES - A system and method for contextualizing operating procedures are provided. A set of procedures is provided, each including text describing user actions which are to be performed on a physical device to implement the procedure. A device model refers to components of the device on which user actions are performable and provides state charts which link an action performable on the respective component with states assumed by it. The text of each procedure is segmented to form a sequence of steps. Each step includes an action to be performed on one of the components of the device that is referred to in the device model. When a request for one of the procedures is received, the corresponding sequence of instruction steps is retrieved. A current one of the instruction steps is contextualized, based on device data received from the device and the state chart of the respective component. | 08-01-2013 |
20130197900 | Method and System for Determining Word Senses by Latent Semantic Distance - The invention relates to methods and systems for semantic disambiguation of a plurality of words. A representative method comprises providing a dataset of words associated by meaning into sets of synonyms; locating said sets at respective vertices of a graph according to semantic similarity and semantic relationship; transforming the graph into a Euclidean vector space comprising vectors indicative of respective locations of said sets; identifying a first group of said sets which include a first of said pair of words; identifying a second group of said sets which include a second of said pair of words; determining a closest pair in said vector space of said sets taken from said first and second groups of sets respectively; and outputting a meaning of said plurality of words based on said closest pair of said sets and at least one of said semantic relationships between said closest pair of said sets. | 08-01-2013 |
20130204608 | IMAGE ANNOTATIONS ON WEB PAGES - An image in a web page may be annotated after deriving information about an image when the image may be displayed on multiple web pages. The web pages that show the image may be analyzed in light of each other to determine metadata about the image, then various additional content may be added to the image. The additional content may be hyperlinks to other webpages. The additional content may be displayed as annotations on top of the images and in other manners. Many embodiments may perform searching, analysis, and classification of images prior to the web page being served. | 08-08-2013 |
20130204609 | LANGUAGE INDEPENDENT PROBABILISTIC CONTENT MATCHING - Content is received and compared against rules for identifying a type of content. Each rule has both segmented and unsegmented patterns. The content is matched against the patterns and assigned a confidence score that is higher if the content matches a segmented pattern and lower if the content matches an unsegmented pattern. | 08-08-2013 |
20130204610 | Quasi Natural Language Man-Machine Conversation Device Base on Semantic Logic - Presented is a tool and method for language presentation, browsing, editing, translation and communication based on the Semantic Web, to be utilized as an interface for collaborating software products and services or human-machine interaction. The conceptual system is extended to further include such objects as language components, sentence patterns or syntax rules, to get solutions for semantic logic representation devices, language presentation devices, semantic-language converting devices, the registry and delegation system, in forming a language-component-based system for browsing, editing, conversion and communication. It is always allowed to bring need-based control over the conceptual system and the registry with their scope and scale being kept at an appropriate level; with widespread community participation, the establishment of a semantic-language converting device ecosystem will be an important guarantee of a flexible and diversified language expression system, and will therefore constitute the core of those pragmatic standards or specifications for machine translation, human-machine interface and the web system. | 08-08-2013 |
20130204611 | TEXTUAL ENTAILMENT RECOGNITION APPARATUS, TEXTUAL ENTAILMENT RECOGNITION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - A textual entailment recognition apparatus ( | 08-08-2013 |
20130204612 | INTERACTIVE ENVIRONMENT FOR PERFORMING ARTS SCRIPTS - One or more embodiments present blocking information associated with a manuscript to a user. In one embodiment, a determination is made that at least one line from a digital representation of a manuscript has been selected. Another determination is made that the line is associated with a set of blocking information. The set of blocking information is presented on a digital representation of a stage. | 08-08-2013 |
20130204613 | LARGE-SCALE SENTIMENT ANALYSIS - A method for determining a sentiment associated with an entity includes inputting a plurality of texts associated with the entity, labeling seed words in the plurality of texts as positive or negative, determining a score estimate for the plurality of words based on the labeling, re-enumerating paths of the plurality of words and determining a number of sentiment alternations, determining a final score for the plurality of words using only paths whose number of alternations is within a threshold, converting the final scores to corresponding z-scores for each of the plurality of words, and outputting the sentiment associated with the entity. | 08-08-2013 |
20130204614 | REQUEST ACQUISITION SUPPORT SYSTEM IN SYSTEM DEVELOPMENT, REQUEST ACQUISITION SUPPORT METHOD AND RECORDING MEDIUM - A request pick-up assisting system includes: a question information registering unit registering a question item and attributes of a questionee; a basic connection word candidate extracting unit referring to information that includes words of the question, and extracting words that coexist with the words of the question; an attribute connection word candidate extracting unit referring to information including words of the question, and information that includes words constituting attributes of the questionee, and extracting, for each attribute, words in which the words of the question and the attributes coexist; an attribute specificity calculating unit calculating an attribute specificity based on dissimilarity between groups of words; an effective attribute extracting unit comparing attribute specificity for each attribute, and extracting a suitable attribute; a connection word extracting unit extracting a connection word about the effective attribute; and an association chart creating unit generating an association chart, by referencing the extracted connection word. | 08-08-2013 |
20130211821 | User Experience with Customized User Dictionary - In one embodiment, constructing one or more customized dictionaries for a particular user, each of the customized dictionaries comprising a different blending of one or more frequently used words collected from texts submitted by one or more users; and in response to the user inputting text to an electronic device, selecting one of the customized dictionaries and utilizing it to aid the particular user in inputting text. | 08-15-2013 |
20130211822 | SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - A speech recognition apparatus | 08-15-2013 |
20130211823 | CONCEPTUAL WORLD REPRESENTATION NATURAL LANGUAGE UNDERSTANDING SYSTEM AND METHOD - A Natural Language Understanding system is provided for indexing of free text documents. The system according to the invention utilizes typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language independent domain knowledge concepts and language dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method. | 08-15-2013 |
20130218553 | INFORMATION NOTIFICATION SUPPORTING DEVICE, INFORMATION NOTIFICATION SUPPORTING METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, an information notification supporting device includes an analyzer configured to analyze an input voice so as to identify voice information indicating information related to speech; a storage unit configured to store therein a history of the voice information; an output controller configured to determine, using the history of the voice information, whether a user is able to listen to a message of which the user should be notified; and an output unit configured to output the message when it is determined that the user is in a state in which the user is able to listen to the message. | 08-22-2013 |
20130218554 | Multi-Concept Latent Semantic Analysis Queries - A method includes accessing text, identifying a plurality of terms from the text, determining a plurality of term vectors associated with the identified plurality of terms, and clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors. The method further includes creating a first pseudo-document according to the first cluster, creating a second pseudo-document according to the second cluster, identifying a first set of terms associated with the first cluster using latent semantic analysis (LSA) of the first pseudo-document, identifying a second set of terms associated with the second cluster using LSA of the second pseudo-document, and combining the first and second sets of terms into a list of output terms. | 08-22-2013 |
20130218555 | DEVICE FOR ANALYZING TEXT DOCUMENTS - An analysis device for analyzing a text document is provided. The analysis device includes a context storage unit configured to store context information that shows a position of a character set of a predetermined context in the text document. The analysis device also includes an index storage unit configured to store index information that shows a position of a word in the text document, for each word of a plurality of words contained in the text document. An input unit is configured to input a target word. A position detection unit is configured to detect from the index information a position of the target word contained in the text document. A frequency detection unit is configured to detect an appearance frequency of the target word per each type of context in the text document based on the position of the target word and on the context information. | 08-22-2013 |
20130226558 | Language Informed Source Separation - Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. The modeling may be constrained according to high level information. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source separation/extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications. | 08-29-2013 |
20130226559 | APPARATUS AND METHOD FOR PROVIDING INTERNET DOCUMENTS BASED ON SUBJECT OF INTEREST TO USER - The present invention provides an apparatus for providing Internet documents based on a subject of interest to a user, including a subject reception unit configured to receive information on a subject of interest from a user terminal; a relevant document collection unit configured to collect relevant documents related to the information on the subject of interest using search engines; a similar sentence classification unit configured to extract a core sentence from the relevant documents, calculate similarity of sentences peripheral to the core sentence, and classify sentences similar to the core sentence into similar sentence sets based on the calculated similarity; and a similar sentence providing unit configured to provide the core sentence and the similar sentence sets to the user terminal. | 08-29-2013 |
20130226560 | SYSTEM AND METHOD FOR DISCOVERING STORY TRENDS IN REAL TIME FROM USER GENERATED CONTENT - A method for identifying story trends includes identifying a set of words in a fixed size data stream based on a subword cache, and electronically determining at least one story trend associated with the set of words and electronically generating a story hash associated with the set of words. The method also includes storing the story hash in a story trend cache and updating the story trend cache according to the story hash, and retrieving one or more popular story topics according to the story trend cache. Machine readable media including program code that causes execution of a method for generating search results also are described. | 08-29-2013 |
20130226561 | INTELLIGENT EMOTION-INFERRING APPARATUS, AND INFERRING METHOD THEREFOR - The present disclosure provides an intelligent emotion-inferring apparatus and inferring method therefor. The intelligent emotion-inferring apparatus includes: an emotional-word-storing unit, which classifies emotional words into items including at least one among similarity, positivity or negativity, and emotional intensity, using classes of emotion comprising a basic emotion group which classifies human emotions and a detailed emotion group which classifies the basic emotion group, and stores the words in an emotional-word dictionary; a sentence-converting unit which ascertains the words and phrases of a sentence logged by a user and converts the words and phrases into a basic format; a match-checking unit which checks the converted words and phrases for words and phrases matching those in the emotional-word dictionary; and an emotion-inferring unit, which applies a probabilistic model on the basis of co-occurrence of the converted words and phrases, and infers emotions on the basis of the probabilistic model. | 08-29-2013 |
20130226562 | SYSTEM AND METHOD FOR SEARCHING FUNCTIONS HAVING SYMBOLS - A system and method for searching through functions and expressions with symbols. Moreover, the system can be used to recognize and further analyze notations of this nature and use this in order to translate, transform into audio, or solve the mathematical problems. According to at least some embodiments, the functions comprise mathematical equations which are defined by symbols and mathematical notation. The system and method enable a user to enter a mathematical equation in a WYSIWYG environment to a search engine, and to find similar or identical equations, first and foremost according to theoretical similarity, and secondly, according to visual similarity. The engine does this by understanding the meaning behind the visual symbols of the equation using a Dynamic Hidden Markov Model (hereafter DHMM). The system enables the user to insert the equation with no prior knowledge of LaTeX, or any computing language, and with no need to follow a predefined generic protocol in order to insert the query. | 08-29-2013 |
20130226563 | RELATED-WORD REGISTRATION DEVICE, INFORMATION PROCESSING DEVICE, RELATED-WORD REGISTRATION METHOD, PROGRAM FOR RELATED-WORD REGISTRATION DEVICE, AND RECORDING MEDIUM - A related-word candidate group ( | 08-29-2013 |
20130231917 | SYSTEMS AND METHODS FOR NAME PRONUNCIATION - Systems and methods are provided for associating a phonetic pronunciation with a name by receiving the name, mapping the name to a plurality of monosyllabic components that are combinable to construct the phonetic pronunciation of the name, receiving a user input to select one or more of the plurality, and combining the selected one or more of the plurality of monosyllabic components to construct the phonetic pronunciation of the name. | 09-05-2013 |
20130231918 | SPLITTING TERM LISTS RECOGNIZED FROM SPEECH - In an embodiment, a method comprises analyzing a string of text that was generated based on audio input, identifying a plurality of text segments, wherein each text segment of the plurality of text segments comprises one or more words in the string of text, wherein at least one of the plurality of segments comprises a plurality of words, and organizing the plurality of text segments into a list of items, wherein each segment is a separate item in the list. | 09-05-2013 |
20130231919 | DISAMBIGUATING SYSTEM AND METHOD - A disambiguating method includes providing a storage unit storing a first database and a second database. The first database includes a dictionary of ambiguous language data, the second database includes a collection of disambiguating algorithms, and each piece of ambiguous language data in the dictionary is associated with at least one of the disambiguating algorithms. A sentence input is received from the application system via the interface, and it is recognized whether the sentence comprises a piece of ambiguous language data which is defined in the dictionary. The recognized piece of ambiguous language data in the sentence is disambiguated using the at least one associated disambiguating algorithm, and results of disambiguating are generated. An interpretation is selected from the results and output to the application system via the interface. A disambiguating system is also provided. | 09-05-2013 |
20130231920 | APPARATUS FOR IDENTIFYING ROOT CAUSE USING UNSTRUCTURED DATA - A system and method of identifying root cause of an observation by leveraging features from unstructured data is disclosed. A report generation component may be configured to generate a report. A report presentation component may be configured to allow an operator to select an observation from the report. A root cause component may be configured to determine one or more causal factors associated with the observation. | 09-05-2013 |
20130231921 | Automatic Sound Level Control - A method includes identifying, at a computing device, a plurality of words in data. Each of the plurality of words corresponds to a particular word of a written language. The method includes determining a sound output level based on a location of the computing device. The method includes generating sound data based on the sound output level and the plurality of words identified in the data. | 09-05-2013 |
20130231922 | INTELLIGENT EMOTIONAL WORD EXPANDING APPARATUS AND EXPANDING METHOD THEREFOR - The present disclosure provides an intelligent emotional word expanding apparatus and an expanding method therefor. The intelligent emotional word expanding apparatus includes a word dictionary storing module, an emotion inferring module and a word expanding module. The word dictionary storing module classifies emotional words into similarity, positivity or negativity, and emotional intensity using emotion classes including a basic emotion group classifying human emotions and a detailed emotion group classifying the basic emotion group, storing the classified emotional words in an emotional word dictionary, and storing neutral words together with the number of calls thereof in a neutral word dictionary. The emotion inferring module captures words and phrases of a sentence logged by a user, converting the words and phrases into basic formats, and inferring emotions. The word expanding module determines whether a word or a phrase is neutral on the basis of the neutral word dictionary when emotions are not inferred by the emotion inferring module. | 09-05-2013 |
20130238313 | DOMAIN SPECIFIC NATURAL LANGUAGE NORMALIZATION - Embodiments of the present invention provide a method, system and computer program product for the domain specific normalization of a corpus of text. In an embodiment of the invention, a method for domain specific normalization of a corpus of text is provided, including an industrial, organization, demographic or geographic domain. The method includes loading a corpus of text in memory of a computer and determining a domain for the corpus of text. The method also includes retrieving a lexicon of replacement words for the determined domain. Finally, the method includes text simplifying the corpus of text using the retrieved lexicon. In one aspect of the embodiment, the domain is determined through inference based upon words already present in the corpus of text. In another aspect of the embodiment, the domain is determined based upon meta-data provided with the corpus of text. | 09-12-2013 |
20130238314 | METHODS AND SYSTEMS FOR PROVIDING AUDITORY MESSAGES FOR MEDICAL DEVICES - Methods and systems for providing auditory messages for medical devices are provided. One method includes receiving semantic rating scale data corresponding to a plurality of sounds and medical message descriptions and performing semantic mapping using the received semantic rating scale data. The method also includes determining profiles for audible medical messages based on the semantic mapping and generating audible medical messages based on the determined profiles. | 09-12-2013 |
20130238315 | METHOD, APPARATUS AND SYSTEM FOR FINDING SYNONYMS - A method and system are provided for finding synonyms which are more contextually relevant to the intended use of a particular word. The system finds a list of synonyms for the input word and also finds a list of synonyms for an additional word entered by the user to approximate the intended usage of the input word. These two lists of synonyms are compared to find words common to both lists, and the common words are presented to the user as potential synonyms which are appropriate for the intended use. | 09-12-2013 |
20130238316 | System and Method for Identifying Text in Legal documents for Preparation of Headnotes - A method for generating feature graphs employed for creation of a head note in a legal document is provided. The method enables identifying one or more predetermined features in a plurality of legal documents. The one or more predetermined features are based on grammatical constituents of text in the legal document. The plurality of legal documents is manually identified as headnote and non headnote. The method further enables obtaining data related to the availability of the one or more identified predetermined features in the sentences manually identified as headnote and non headnote in the plurality of legal documents. Furthermore, the method enables computing likelihood of a sentence being a headnote based on the obtained data. The method further enables generating feature graphs corresponding to each predetermined feature based on the computed likelihood and obtained data and storing the generated feature graphs in a repository. | 09-12-2013 |
20130238317 | VOCABULARY LOOK UP SYSTEM AND METHOD USING SAME - A vocabulary look up system in an electronic apparatus includes a voiceprint acquiring module, a matching module, a reminding module, and a spelling module. The voiceprint acquiring module analyzes sounds collected by an audio collector to acquire voiceprints from the collected sounds. The matching module retrieves vocabulary information of a word associated with an acquired voiceprint in a lookup table. The reminding module displays a pop-up window to get the user to spell out the word to be looked up when more than one word is associated with the acquired voiceprint in the lookup table. The spelling module acquires the voiceprints of the letters constituting the word to be looked up to determine and display the word to be looked up. | 09-12-2013 |
20130238318 | Method for Detecting Negative Opinions in Social Media, Computer Program Product and Computer - A method, device, and computer program product for detecting negative opinions in social media. Negative opinions in social media can be precisely detected at an early stage. A method for processing, with a computer, a plurality of messages sent by a plurality of users over time includes the following steps: obtaining a plurality of messages, each including a specific proper noun; determining a politeness level of each of the plurality of messages, each including the specific proper noun; and calculating a proportion of messages having a politeness level lower than a certain threshold with respect to the plurality of messages, each including the specific proper noun. | 09-12-2013 |
20130238319 | INFORMATION PROCESSING APPARATUS AND MESSAGE EXTRACTION METHOD - A storage unit stores first filter information specifying the formats of messages and second filter information specifying weights for words or phrases. A first search unit selects messages matching the formats specified by the first filter information from a plurality of messages as messages to be extracted. A second search unit calculates the importance level of each message unselected by the first search unit, based on the words or phrases included in the message and the second filter information, and selects messages to be extracted, according to the calculated importance levels from the messages unselected by the first search unit. | 09-12-2013 |
20130238320 | SYSTEMS AND METHODS FOR GENERATING MARKUP-LANGUAGE BASED EXPRESSIONS FROM MULTI-MODAL AND UNIMODAL INPUTS - When using finite-state devices to perform various functions, it is beneficial to use finite state devices representing regular grammars with terminals having markup-language-based semantics. By using markup-language-based symbols in the finite state devices, it is possible to generate valid markup-language expressions by concatenating the symbols representing the result of the performed function. The markup-language expression can be used by other applications and/or devices. Finite-state devices are used to convert strings of words and gestures into valid markup-language, for example, XML, expressions that can be used, for example, to provide an application program interface to underlying system applications. | 09-12-2013 |
20130238321 | DIALOG TEXT ANALYSIS DEVICE, METHOD AND PROGRAM - A dialog text analysis device generates data for text processing from a dialog text. A negative judging means | 09-12-2013 |
20130246045 | Identification and Extraction of New Terms in Documents - A method and apparatus that can extract new terms from documents for inclusion in a vocabulary collection is disclosed. A document may be parsed to obtain an n-gram phrase indicative of a new term. The phrase may include a plurality of words. The n-gram phrase may be decomposed into a series of bi-gram phrases each including a first and a second phrase part. The first and second phrase parts each include at least one word. It may then be determined whether the first or second phrase part is in a vocabulary collection. If not, it may be estimated as to the probability that the bi-gram phrase should be in the vocabulary collection. The bi-gram phrase may be added to the vocabulary collection if the probability that the bi-gram phrase should be in the vocabulary collection exceeds a minimum threshold level. | 09-19-2013 |
20130246046 | RELATION TOPIC CONSTRUCTION AND ITS APPLICATION IN SEMANTIC RELATION EXTRACTION - Systems and methods automatically collect training data from manually created semantic relations, automatically extract rules from the training data to produce extracted rules, and automatically characterize existing semantic relations in the training data based on co-occurrence of the extracted rules in the existing semantic relations. Such systems and methods automatically construct semantic relation topics based on the characterization of the existing semantic relations, and group instances of the training data into the semantic relation topics to detect new semantic relations. | 09-19-2013 |
20130246047 | Identification and Extraction of Acronym/Definition Pairs in Documents - A method and apparatus that can extract domain-specific acronyms and their definitions from large documents is disclosed. Strings of characters indicative of candidate acronyms within a portion of a document may be identified and extracted. Definitions for each selected string of characters may be extracted from text within the document proximal to that string of characters. Candidate acronym/definition pairs may be created for each selected string of characters based on the string of characters and their definitions. A classification system may be iteratively applied to the candidate acronym/definition pairs to create or update an acronym/definition pair dictionary for the document. | 09-19-2013 |
20130246048 | TEXT PROOFREADING APPARATUS AND TEXT PROOFREADING METHOD - A Japanese proofreading apparatus has a correction history corpus, a proofreading candidate generation unit, a proofreading availability determination unit, and an automatic proofreading unit. The correction history corpus stores negative example sentences as pre-proofreading sentences and positive example sentences as post-proofreading sentences, in association with each other. The proofreading candidate generation unit acquires the post-proofreading sentences corresponding to the pre-proofreading sentences from the correction history corpus, according to characteristics of a proofreading target sentence. The proofreading availability determination unit selects, from the post-proofreading sentences acquired by the proofreading candidate generation unit, post-proofreading sentences with degrees of similarity between the proofreading target sentence and the post-proofreading sentences equal to or more than a predetermined threshold value, as proofreading candidates. The automatic proofreading unit proofreads the proofreading target sentence, using, out of the post-proofreading sentences selected by the proofreading availability determination unit, a post-proofreading sentence with the highest degree of similarity. | 09-19-2013 |
20130246049 | METHOD AND SYSTEM FOR TEXT UNDERSTANDING IN AN ONTOLOGY DRIVEN PLATFORM - Embodiments of methods and systems for informatics systems are disclosed. Such informatics systems may utilize a unifying format to represent text to facilitate linking between data from the text and one or more ontologies, and the commensurate ability to mine such data. | 09-19-2013 |
20130246050 | VOICE CONTROL OF APPLICATIONS BY ASSOCIATING USER INPUT WITH ACTION-CONTEXT IDENTIFIER PAIRS - A method is provided for enabling or enhancing a use of voice control in a voice controlled application (VCA) via a development framework. The method includes: providing in the framework a plurality of action-context pairs—also called framework action-context pairs—usable in a memory of an application development device, which includes a processor, that serve to direct execution of the VCA, wherein the framework context defines a list of parameters related to the action and their respective value types; providing at least one of a voice recognition engine (VRE) and a natural language library to match each action-context pair with semantically related vocabulary; providing in the framework a registration mechanism that permits an association to be formed between an action-context pair and a handler in the voice controlled application. An associated development system for developing the VCA and user equipment that executes the VCA are provided as well. | 09-19-2013 |
20130253906 | ENVIRONMENT SENSITIVE PREDICTIVE TEXT ENTRY - Environmental factors may be used in a predictive text system provided in a device. The device may receive one or more characters entered by a user and determine, based on the one or more characters, words that are predicted to be words being entered by the user of the device, where the words are determined using grammar-based predictive techniques. The device may determine confidence scores, corresponding to the words. The device may refine the scores based on environmental data that includes data that describes an environment associated with the user of the device. The device may select, based on the refined plurality of scores, a subset of the plurality of words and output the subset of the plurality of words. | 09-26-2013 |
20130253907 | INTENTION STATEMENT VISUALIZATION - An example system includes an extraction module, an intention processing module, and an intention visualization module. The extraction module is configured to ingest textual data from a text source. The intention processing module is configured to process the textual data and identify one or more intention statements within the textual data. The intention visualization module is configured to provide an interactive interface that facilitates filtering and visualization of aspects of the one or more intention statements. | 09-26-2013 |
20130253908 | Method and System For Predicting Words In A Message - A method may include receiving a context comprising data that is indicative of one or more characters input by a user at a first computing device, sending information comprising at least a portion of the context and determining a first predicted word based at least in part on the context. The determining may be based at least in part on a local language model. The method may include receiving a second predicted word from a second computing device within a time period. The second predicted word may be determined based at least in part on the context and a remote language model, and the local language model and the remote language model may be different. The method may include identifying one of the first predicted word and the second predicted word as a final predicted word, and outputting the final predicted word at a display. | 09-26-2013 |
20130253909 | SECOND LANGUAGE ACQUISITION SYSTEM - Method(s) and system(s) for speech processing of second language speech are described. According to the present subject matter, the system(s) implement the described method(s) for speech processing of Oriya English. The method for speech processing includes receiving a plurality of speech samples of Oriya English to form a speech corpus, where the plurality of speech samples comprise sounds of both vowels and consonants and a plurality of speech parameters are associated with each of the plurality of speech samples. The method also includes determining values of the plurality of speech parameters for each of the plurality of speech samples and identifying a difference between the values of each of the plurality of speech parameters and a corresponding value of accent neutral English. Further, the method includes articulating governing language rules based on the identifying to assess phonetic variation and mother tongue influence in sounds of vowels and consonants of Oriya English. | 09-26-2013 |
20130253910 | Systems and Methods for Analyzing Digital Communications - Systems and methods are provided for analyzing text within a digital document. In some cases the analysis can include receiving and/or generating a digital document with processing circuitry and determining a distribution of each of a plurality of document terms based on occurrences of the document terms within a text sample and occurrences of sample terms within the text sample. Processing circuitry may be further used to determine a distribution characteristic for each of the plurality of document terms. The distribution characteristic for each document term can provide a measure of a characteristic of each respective document term's distribution. In some cases a characterization is provided of the text in the digital document with the processing circuitry based on the distribution characteristic of at least one of the plurality of document terms. | 09-26-2013 |
20130253911 | Real-time Data Localization - A method, apparatus, and system are provided for performing a real-time or a near real-time localization of data. The method comprises monitoring an input string and comparing a semantic associated with the input string to a semantic associated with at least one stored string. The method further comprises providing the stored string as an alternative to the input string. | 09-26-2013 |
20130253912 | SYSTEM AND METHOD FOR INPUTTING TEXT INTO ELECTRONIC DEVICES - A text prediction engine, a system comprising a text prediction engine, and a method for generating sequence predictions. The text prediction engine, system and method generate a final set of sequence predictions, each with an associated probability value. | 09-26-2013 |
20130253913 | KNOWLEDGE STORAGE AND RETRIEVAL SYSTEM AND METHOD - A system and method for representing, storing and retrieving real-world knowledge on a computer or network of computers is disclosed. Knowledge is broken down into permanent atomic “facts” which can be stored in a standard relational database and processed very efficiently. It also provides for the efficient querying of a knowledge base, efficient inference of new knowledge and translation into and out of natural language. Queries can also be processed with full natural language explanations of where the answers came from. The method can also be used in a distributed fashion enabling the system to be a large network of computers and the technology can be integrated into a web browser adding to the browser's functionality. | 09-26-2013 |
20130253914 | NET MODERATOR - A method and an apparatus for moderating an inappropriate relationship between two parties by analyzing a dialog between the two parties. The method and apparatus creates an alert depending upon the nature of the dialog between the two parties. The alert is sent to a third party who can moderate the relationship between the two parties. The third party can ban or block the dialog between the two parties based upon the inappropriate relationship between the two parties. A banning or block of the dialog between the two parties can also be automated. | 09-26-2013 |
20130253915 | HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device. | 09-26-2013 |
20130253916 | EXTRACTING TERMS FROM DOCUMENT DATA INCLUDING TEXT SEGMENT - A computer system, method, and article of manufacture for extracting a term from electronic document data that includes a text segment. The system includes: a first extraction unit that uses a first text processing information to extract a noun word from the document data; a second extraction unit that uses a second text processing information to extract a term candidate in relation to the noun word or a corpus that includes text data described in the same language used in the document data; a weight assignment unit that uses a third text processing information to select which type to assign a weight from the plurality of types and assigns the weight to the selected type for each noun word and term candidate; a determination unit that determines the type to which the noun word and term candidate belong; and an output unit to output the noun word and term candidate. | 09-26-2013 |
20130262082 | NATURAL LANGUAGE INCIDENT RESOLUTION - A natural language incident report resolution method and system are provided. Natural language incident reports received from a user are analyzed to determine a category associated with the incident. A database of existing incidents is analyzed to determine whether a report for the incident has already been submitted. The current status or state of the device associated with the incident is then ascertained and the incident, if new, is added to an incident database. If the incident is preexisting, the incident in the database is updated with the current status. A solution database is then queried to determine any solutions, automatic or manual workflows, that may correct the error or fault associated with the incident. The determined solution is communicated to the device associated with the incident for implementation. | 10-03-2013 |
20130262083 | Method and Apparatus for Processing Text with Variations in Vocabulary Usage - Text is processed to construct a model of the text. The text has a shared vocabulary. The text is partitioned into sets and subsets of texts. The usage of the shared vocabulary in two or more sets is different, and the topics of two or more subsets are different. A probabilistic model is defined for the text. The probabilistic model considers each word in the text to be a token having a position and a word value, and the usage of the shared vocabulary, topics, subtopics, and word values for each token in the text are represented using distributions of random variables in the probabilistic model, wherein the random variables are discrete. Parameters are estimated for the model corresponding to the vocabulary usages, the word values, the topics, and the subtopics associated with the words. | 10-03-2013 |
20130262084 | ITERATIVE FORWARD ERROR CORRECTION (FEC) ON SEGMENTED WORDS USING A SOFT-METRIC ARITHMETIC SCHEME - A system is to receive a word on which to perform error correction; obtain segments, from the word, each segment including a respective subset of samples; update, on a per segment basis, the word based on extrinsic information associated with a previous word; identify sets of least reliable positions (LRPs) associated with the segments; create a subset of LRPs based on a subset of samples within the sets of LRPs; generate candidate words based on the subset of LRPs; identify errors within the word or the candidate words; update, using the extrinsic information, a segment of the word that includes an error; determine distances between the candidate words and the updated word that includes the updated segment; identify best words associated with shortest distances; and perform error correction, on a next word, using other extrinsic information that is based on the best words. | 10-03-2013 |
20130262085 | NATURAL LANGUAGE PROCESSING APPARATUS, NATURAL LANGUAGE PROCESSING METHOD, NATURAL LANGUAGE PROCESSING PROGRAM, AND COMPUTER-READABLE RECORDING MEDIUM STORING NATURAL LANGUAGE PROCESSING PROGRAM - A natural language processing apparatus includes a result acquisition unit that acquires a plurality of analysis results indicating parts of speech of morphemes contained in one or more common sentences from a plurality of types of morphological analyzers, a pattern acquisition unit that detects a common segmentation point in the plurality of analysis results, extracts one or more parts of speech corresponding to a character string segmented at the common segmentation point from each of the analysis results, and acquires a set of the parts of speech as a part-of-speech differing pattern, and a candidate specifying unit that extracts the part-of-speech differing pattern with the number of appearances being equal to or less than a predetermined threshold and specifies the character string corresponding to the extracted part-of-speech differing pattern as a character string containing a candidate for an unknown word. | 10-03-2013 |
20130262086 | GENERATION OF A SEMANTIC MODEL FROM TEXTUAL LISTINGS - A corpus of textual listings is received and main concept words and attribute words therein are identified via an iterative process of parsing listings and expanding a semantic model. During the parsing phase, the corpus of textual listings is parsed to tag one or more head noun words and/or one or more identifier words in each listing based on previously identified main concept words or using a head noun identification rule. Once substantially each listing in the corpus has been parsed in this manner, the expansion phase assigns head noun words as main concept words and modifier words as attribute words, where possible. During the next iteration, the newly identified main concept words and/or attribute words are used to further parse the listings. These iterations are repeated until a termination condition is reached. Remaining words in the corpus are clustered based on the main concept words and attribute words. | 10-03-2013 |
20130262087 | SPEECH SYNTHESIS APPARATUS, SPEECH SYNTHESIS METHOD, SPEECH SYNTHESIS PROGRAM PRODUCT, AND LEARNING APPARATUS - According to one embodiment, a speech synthesis apparatus includes a language analyzer, statistical model storage, model selector, parameter generator, basis model storage, and filter processor. The language analyzer analyzes text data and outputs language information data that represents linguistic information of the text data. The statistical model storage stores statistical models prepared by statistically modeling acoustic information included in speech. The model selector selects a statistical model from the models based on the language information data. The parameter generator generates speech parameter sequences using the statistical model selected by the model selector. The basis model storage stores a basis model including basis vectors, each of which expresses speech information for each limited frequency range. The filter processor outputs synthetic speech by executing filter processing of the speech parameter sequences and the basis model. | 10-03-2013 |
20130262088 | Computer-Implemented Method, Program, and System for Identifying Non-Self-Descriptive Terms in Electronic Documents - A computer-implemented method, program, and system for identifying non-self-descriptive terms in electronic documents. The computer-implemented method for identifying a non-self-descriptive term in an electronic document, includes a memory and a processor communicatively coupled to the memory and configured to execute the steps of a method. The method includes acquiring a noun included in the corpus data. The method further includes calculating a qualifying level and a qualified level in the corpus data related to each noun in the corpus data. The method further includes identifying one or more nouns included in the corpus data as having a qualifying level and/or qualified level satisfying a predetermined condition. The method further includes presenting a term related to one or more of the nouns in the electronic document as a candidate for the non-self-descriptive term in the electronic document. | 10-03-2013 |
20130262089 | NAMED ENTITY EXTRACTION FROM A BLOCK OF TEXT - A data processing method, program, and apparatus for identifying a document within a block of text. A block of text is tokenized into a plurality of text tokens according to at least one rule parser. Each of the plurality of text tokens is sequentially compared to a plurality of document tokens to determine if the text token matches one of the plurality of document tokens. The plurality of document tokens correspond to a plurality of documents which have been tokenized according to the one or more rule parsers. Each matched text token is filtered according to predetermined filtering criteria to generate one or more candidate text tokens. It is then determined whether a sequence of candidate text tokens that occur in sequential order within the block of text matches a sequence of document tokens. If so, then it is determined that the document has been identified within the block of text. The document can correspond to an artist, a song name, and misspellings and aliases thereof. | 10-03-2013 |
20130262090 | SYSTEM AND METHOD FOR REDUCING SEMANTIC AMBIGUITY - A semantic ambiguity reduction system deconstructs a sentence into a number of basic word units according to predetermined word definitions and semantic logic rules. The semantic ambiguity reduction system acquires the semantic judgments based on the basic word units and the semantic logic rules, stores the semantic judgment if only one semantic judgment of the sentence is acquired, and determines a number of keywords of a semantic ambiguity if more than one semantic judgment is acquired. The semantic ambiguity reduction system determines critical information by searching the keywords in the word definitions and the semantic judgments being stored, and selects one semantic judgment from the more than one semantic judgment about the sentence according to the critical information. | 10-03-2013 |
20130262091 | AUTOMATED EXTRACTION OF BIO-ENTITY RELATIONSHIPS FROM LITERATURE - Automated, standardized and accurate extraction of relationships within text. Automatic extraction of such relationships/information allows the information to be stored in structured form so that it can be easily and accurately retrieved when needed. Such information can be used to build online search engines for highly specific and accurate information retrieval. The current invention discloses a novel approach to extract such information from raw text based on natural language processing (NLP) and graph theoretic algorithm. The novel method can be applied, for example, to extract protein-protein relationships in biomedical literature. The method can be easily extended to extract other biological relationships between biological terms such as proteins, genes, pathways, diseases and drugs. The method can also be applied to other information domains to extract other relationships. | 10-03-2013 |
20130262092 | Narrative Generator - A narrative generator includes a processor configured to: implement a plurality of writers to create a plurality of narrative blocks related to selected topic paragraph creators, each writer including a plurality of text options from which a narrative block is constructed, wherein text options selected for inclusion in a given narrative block are based at least in part on reference to a narrative companion array and a grammar companion array, wherein the narrative companion array includes semantic values corresponding to the data elements included in any of the plurality of narrative blocks, wherein the grammar companion array includes grammar values associated with the text options included in the plurality of narrative blocks. | 10-03-2013 |
20130262093 | INFORMATION EXTRACTION IN A NATURAL LANGUAGE UNDERSTANDING SYSTEM - A method of extracting information from text within a natural language understanding system can include processing a text input through at least one statistical model for each of a plurality of features to be extracted from the text input. For each feature, at least one value can be determined, at least in part, using the statistical model associated with the feature. One value for each feature can be combined to create a complex information target. The complex information target can be output. | 10-03-2013 |
20130262094 | HANDHELD ELECTRONIC DEVICE INCLUDING INDICATION OF A SELECTED DATA SOURCE, AND ASSOCIATED METHOD - A method of enabling input into a handheld electronic device having stored therein a number of language objects includes detecting a selection of a language, making a determination that the language is a default language or a non-default language, detecting as an ambiguous input an actuation of one or more input members, outputting at least a portion of a number of the language objects that corresponds to the ambiguous input, and outputting an indication representative of the language. | 10-03-2013 |
20130268261 | SEMANTIC ENRICHMENT BY EXPLOITING TOP-K PROCESSING - Proper representation of the meaning of texts is crucial to enhancing many data mining and information retrieval tasks, including clustering, computing semantic relatedness between texts, and searching. Representation of texts in the concept-space derived from Wikipedia has received growing attention recently, due to its comprehensiveness and expertise. This concept-based representation is capable of extracting semantic relatedness between texts that cannot be deduced with the bag of words model. A key obstacle, however, for using Wikipedia as a semantic interpreter is that the sheer size of the concepts derived from Wikipedia makes it hard to efficiently map texts into concept-space. An efficient algorithm is provided which is able to represent the meaning of a text by using the concepts that best match it. In particular, this approach first computes the approximate top-k concepts that are most relevant to the given text. These concepts are then leveraged to represent the meaning of the given text. | 10-10-2013 |
20130268262 | System and Method for Analysing Natural Language - A computer implemented method for analysing natural language to determine a sentiment between two entities discussed in the natural language, comprising the following steps: receiving the natural language at a processing circuitry; analysing the natural language to determine a syntactic representation which shows syntactic constituents of the analysed natural language and to determine a sentiment score of each constituent; determining which constituents link the two entities; and calculating an overall sentiment score for the sentiment between the two entities by processing the sentiment score of each constituent of the constituents determined to link the two entities. | 10-10-2013 |
20130268263 | METHOD FOR PROCESSING NATURAL LANGUAGE AND MATHEMATICAL FORMULA AND APPARATUS THEREFOR - The present disclosure provides an apparatus and method for processing a natural language and a mathematical formula. The apparatus includes a natural language and mathematical formula input unit configured to receive a natural language and a mathematical formula inputted; an information generation unit configured to generate parsing semantic information of the mathematical formula from combined data composed of the natural language combined with the mathematical formula; an operation information extraction unit configured to extract operation information generated by using a logical condition from the combined data; a natural language and mathematical formula structuralizing unit configured to analyze, classify in terms of specific meaning and recombine the combined data; an operation structuralizing unit configured to structuralize the operation information; and a natural language and mathematical formula indexing unit configured to index the combined data. | 10-10-2013 |
20130275119 | INPUT METHOD, INPUT APPARATUS, AND TERMINAL - The present invention discloses an input method, including: receiving input end indication information sent by an input module, where the input end indication information indicates that input of a character or a word ends; obtaining a location of a cursor; identifying the input character or word forward from the location of the cursor until a first punctuation input before the character or the word is identified; using the identified character or word as a previous text, and querying a word library for a next text associated with the previous text; and outputting the associated next text to a display module for displaying. The input method provided in embodiments of the present invention is capable of associating a next text according to an input previous text for a user to select after the user presses an input end key, for example, the space key, to end the input of a character or a word, so that the input efficiency is increased. | 10-17-2013 |
20130275120 | Process for a Signified Correct Contextual Meaning Sometimes Interspersed with Complementary Related Trivia - A process for displaying a signified correct contextual meaning for a word or phrase having two or more meanings. A user selects the word in a sentence on a computer screen which instantly triggers a nearby pop-up space. The pop-up space presents the two or more meanings of the selected word and one of the meanings is the correct contextual meaning and is encircled, for example. The signified correct contextual meaning is from the work and expertise, for example, of writers or editors. The two or more meanings in the pop-up space can be definitions, other reference materials, pictures, videos, or other meanings. Sometimes a complementary related trivia is presented to enliven the learning experience. The complementary trivia can be related, for example, to the selected word, to the associated information displayed on the screen, or to the pop-up space meanings. The educational process helps a person learn about the correct contextual meaning for words or phrases with two or more meanings, and the fun trivia enlivens the experience. | 10-17-2013 |
20130275121 | KNOWLEDGE REPOSITORY - A knowledge storage system is described. A specific embodiment is a computer system comprising a knowledge base of general knowledge in structured form which can be added to and queried by untrained users. Various embodiments include the facility for remote computers to access the knowledge stored in the system, natural language questions to be answered, profile screens giving general knowledge about an object in the system, and methods for distinguishing between reliable and unreliable facts. | 10-17-2013 |
20130275122 | METHOD FOR EXTRACTING SEMANTIC DISTANCE FROM MATHEMATICAL SENTENCES AND CLASSIFYING MATHEMATICAL SENTENCES BY SEMANTIC DISTANCE, DEVICE THEREFOR, AND COMPUTER READABLE RECORDING MEDIUM - A method of extracting the semantic distance from the mathematical sentence and classifying the mathematical sentence by the semantic distance includes: receiving a user query; extracting at least one keyword included in the received user query; and extracting a semantic distance by indexing one or more of natural language tokens and mathematical equation tokens including semantic information, extracting the semantic distance between the at least one extracted keyword and the indexed semantic information by referring to indexed information, and acquiring a similarity of the received user query and the semantic information. | 10-17-2013 |
20130275123 | HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device. | 10-17-2013 |
20130275124 | GENERATION OF PICTORIAL REPORTING DIAGRAMS OF LESIONS IN ANATOMICAL STRUCTURES - The invention relates to a system (SYS) for automatically extracting a location of an abnormality with respect to an anatomical structure from a report, the system comprising a tokenizer (U | 10-17-2013 |
20130282361 | OBTAINING DATA FROM ELECTRONIC DOCUMENTS - Techniques for obtaining information from an electronic document include accessing a set of related electronic documents; identifying a product page associated with the set of related electronic documents using a page recognition model, the product page comprising a plurality of terms; filtering the plurality of terms into a first set of terms and a second set of terms, the first set of terms and the second set of terms including different terms of the plurality of terms, each term in the first set of terms identified as potentially being associated with a product name, and each term in the second set of terms identified as not being associated with a product name; and identifying each term in the first set of terms as being associated with a product name or not being associated with a product name with a name recognition model. | 10-24-2013 |
20130282362 | IDENTIFYING CULTURAL BACKGROUND FROM TEXT - Diaculture of text can be determined or analyzed by tokenizing words of the text according to a rule set to generate tokenized text, the rule set defining: a first set of grammatical types of words, which are words that are replaced with tokens that respectively indicate a grammatical type of a respective word, and a second set of grammatical types of words, which are words that are passed as tokens without changing. Grams can be constructed from the tokenized text, each gram including one or more of consecutive tokens from the tokenized text. The grams can be compared to a training data set that corresponds to a known diaculture to obtain a comparison result that indicates how well the text matches the training data set for the known diaculture. | 10-24-2013 |
20130282363 | LEXICAL ANSWER TYPE CONFIDENCE ESTIMATION AND APPLICATION - A system, method and computer program product for automatically estimating the confidence of a detected LAT to provide a more accurate overall score for an obtained candidate answer. A confidence “score” or value of each detected LAT is obtained, and the system and method combine the confidence score with a degree of match between a LAT and an AnswerType of the candidate answer to provide an improved overall score for the candidate answer. | 10-24-2013 |
20130282364 | HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION AND SELECTIVE DISABLING OF FREQUENCY LEARNING - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The learning function is disabled, however, when the relevant words are found to be in a special category for which frequency learning, i.e., frequency revision, is not employed. | 10-24-2013 |
20130282365 | ADAPTING LANGUAGE USE IN A DEVICE - In several non-English languages and cultures, such as Dutch and German, there is a formal and informal language form used to address a person. A device having a user interface is adapted for use with both formal and informal language. A user's preferred language form can change over time, and is determined directly or indirectly from characteristics of the user based on their use of the device, including how long the device has been used, a role of the user and/or his or her location. Another way of determining the characteristics of the user is to monitor the user's online behavior, including such data as social networking traffic, web sites visited, email and chat use, and the like. An application's user interface can be dynamically changed to use the current preferred language form. | 10-24-2013 |
20130289975 | ELECTRONIC DEVICE AND METHOD FOR A BIDIRECTIONAL CONTEXT-BASED TEXT DISAMBIGUATION - A system and method for a bidirectional context-based text disambiguation is provided. | 10-31-2013 |
20130289976 | METHODS AND SYSTEMS FOR A LOCALLY AND TEMPORALLY ADAPTIVE TEXT PREDICTION - An electronic device is provided, having a locally and temporally adaptive prediction database. | 10-31-2013 |
20130289977 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM - There is provided an information processing device including an acquisition unit that acquires a first word input by a user, and a presentation unit that presents second words for replacing the first word when the first word is acquired by the acquisition unit. | 10-31-2013 |
20130289978 | METHOD FOR CLASSIFYING PIECES OF TEXT ON BASIS OF EVALUATION POLARITY, COMPUTER PROGRAM PRODUCT, AND COMPUTER - A computer-implemented method, program product, and system, for extracting pieces of text from a plurality of pieces of text. The method includes: primarily evaluating a measure of positive expressions and a measure of negative expressions included in each of pieces of text; secondarily evaluating each of the pieces of text on the basis of a plurality of evaluation functions, where certain evaluation functions among the plurality of evaluation functions include, as variables, the measure of positive expressions and the measure of negative expressions; and extracting a piece of text having an evaluation result with a higher rating in preference to a piece of text having an evaluation result with a lower rating, where the individual evaluation results are based on the same evaluation function among the plurality of evaluation functions. | 10-31-2013 |
20130289979 | HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF TEXT INPUT PROVIDING SUPPRESSION OF LOW PROBABILITY ARTIFICIAL VARIANTS - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In addition to identifying and outputting representations of language objects that are stored in the memory and that correspond with a text input, the device is able to generate artificial variants in certain circumstances. Each artificial variant is compared with N-gram data on the handheld electronic device and is suppressed from being output if the artificial variant is determined to have a low probability of being the input intended by a user. | 10-31-2013 |
20130289980 | HANDHELD ELECTRIC DEVICE AND ASSOCIATED METHOD EMPLOYING A MULTIPLE-AXIS INPUT DEVICE AND ELEVATING THE PRIORITY OF CERTAIN TEXT DISAMBIGUATION RESULTS WHEN ENTERING TEXT INTO A SPECIAL INPUT FIELD - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. If a field into which text is being entered is determined to be a special input field, a disambiguated result can be sought first from a predetermined data source prior to seeking results from other data sources on the device. | 10-31-2013 |
20130297290 | AUTOMATIC ACCURACY ESTIMATION FOR AUDIO TRANSCRIPTIONS - Embodiments of the present invention provide an approach for estimating the accuracy of a transcription of a voice recording. Specifically, in a typical embodiment, each word of a transcription of a voice recording is checked against a customer-specific dictionary and/or a common language dictionary. The number of words not found in either dictionary is determined. An accuracy number for the transcription is calculated from the number of said words not found and the total number of words in the transcription. | 11-07-2013 |
20130297291 | CONFIDENCE LEVEL ASSIGNMENT TO INFORMATION FROM AUDIO TRANSCRIPTIONS - Embodiments of the present invention provide an approach for automatically assigning a confidence level to information extracted from a transcription of a voice recording. Specifically, in a typical embodiment, an axiom is extracted from a source associated with the text of the transcription. A confidence level of the source is determined. A confidence level is assigned to the axiom based on the confidence level of the source. | 11-07-2013 |
20130297292 | High Bandwidth Parsing of Data Encoding Languages - A mechanism is provided for accelerating data exchange language parsing. An input data stream is loaded into a first in, first out (FIFO) memory. A tokenization bit corresponding to a next byte to be read is extracted from a FIFO. A determination is made as to whether the tokenization bit corresponding to the next byte to be read from the FIFO indicates a control character or a non-control character located in an associated FIFO memory location in the FIFO. Responsive to the tokenization bit indicating the control character, the control character that causes a state change in a state machine is processed. Responsive to the tokenization bit indicating the non-control character, a length associated with the tokenized bit is identified and a set of non-control characters that do not cause a state change in the state machine are processed based on the length associated with the tokenized bit. | 11-07-2013 |
20130297293 | SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command. | 11-07-2013 |
20130297294 | Computer-Implemented Systems and Methods for Non-Monotonic Recognition of Phrasal Terms - Systems and methods are provided for non-monotonic recognition of phrasal terms. Phrasal terms are identified from a corpus of written materials and ranked based on, for example, a mutual rank ratio. The phrasal terms are sequentially selected and a determination is made as to whether to accept or reject the selected phrasal term based on at least one predetermined criterion. The ranking of the phrasal terms may also rely on linguistic support to reduce duplication of phrasal terms and to distinguish different confidence levels for identified and accepted phrasal terms. | 11-07-2013 |
20130304453 | Automated Extraction of Semantic Content and Generation of a Structured Document from Speech - Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document. | 11-14-2013 |
20130304454 | MEDIATION COMPUTING DEVICE AND ASSOCIATED METHOD FOR GENERATING SEMANTIC TAGS - A computing device, computer system and associated method are provided to mediate a conversation in a manner that facilitates the inclusion of semantic tags within the conversation. In the context of a method, user input may be received relating to maintenance of a system. The method also determines, with processing circuitry, a candidate tag based upon semantic context of the user input. Additionally, the method provides an indication of the candidate tag to the user and receives a response from the user regarding validity of the candidate tag with respect to the semantic context of the user input. The method may also store the maintenance report including the user input and an associated tag. A corresponding mediation computing device and an associated computer system are also provided. | 11-14-2013 |
20130304455 | MANAGEMENT OF LANGUAGE USAGE TO FACILITATE EFFECTIVE COMMUNICATION - Provided are techniques for providing annotations for revising a message. A message to be sent from a sender to a recipient is received. A meaning map associated with the sender and a meaning map associated with the recipient are obtained. The message is parsed into sub-constructs. The sub-constructs are compared in the meaning map associated with the sender and the meaning map associated with the recipient. Alternative language for the sub-constructs is identified. Annotations are provided based on the alternative language. | 11-14-2013 |
20130304456 | Systems and Methods for Word Offensiveness Processing Using Aggregated Offensive Word Filters - Computer-implemented systems and methods are provided for identifying language that would be considered obscene or otherwise offensive to a user or proprietor of a system. A first plurality of offensive words are received, and a second plurality of offensive words are received. A string of words are received, where one or more detected offensive words are selected from the string of words that matches words from the first plurality of offensive words or the second plurality of offensive words. The string of words is processed based upon the detection of offensive words in the string of words. | 11-14-2013 |
20130311167 | METHODS AND DEVICES FOR GENERATING AN ACTION ITEM SUMMARY - Methods and devices for generating an action item summary are described. In one example embodiment, the present application describes a processor-implemented method. The method includes: receiving a request for creation of an action item, the action item comprising a record of a proposed future action; obtaining context information associated with the action item; storing the action item and context information; and generating a sentence describing the action item based on the context information associated with the action item. | 11-21-2013 |
20130311168 | SYSTEMS AND METHODS TO ENABLE INTERACTIVITY AMONG A PLURALITY OF DEVICES - Methods and systems to exchange and display data among a plurality of devices in response to one or more of user input and context-based information. User input may include one or more of motion, speech, text, pointing, and touch-selecting. Context-based information may include one or more of user location, which may be relative to one or more devices, background audio, information related to one or more products and/or services, and user-based context information. User context-based information may correspond to one or more of prior transactions, prior activities, prior content exposure, and demographic information. Also disclosed herein are methods and systems to correlate user speech to one or more of commands and data objects, with respect to context-based information. Methods and systems to recognize speech may be implemented in combination with methods and systems to exchange and/or display data among a plurality of devices, and in other environments. | 11-21-2013 |
20130311169 | METHOD AND SYSTEM RELATING TO SALIENT CONTENT EXTRACTION FOR ELECTRONIC CONTENT - Individuals receive an overwhelming barrage of information which must be filtered, processed, analysed, reviewed, consolidated and distributed or acted upon. Automatic approaches to “scraping” salient content from sources of content are provided, allowing the salient content to be provided to the user or subjected to further processing such as clustering or sentiment analysis, for example. | 11-21-2013 |
20130311170 | Methods and Systems for Natural Language Understanding Using Human Knowledge and Collected Data - Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data. | 11-21-2013 |
20130311171 | HANDHELD ELECTRONIC DEVICE WITH TEXT DISAMBIGUATION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided. Additionally, the device can facilitate the selection of variants by displaying a graphic of a special key of the keypad that enables a user to progressively select variants generally without changing the position of the user's hands on the device. | 11-21-2013 |
20130317804 | Method of Text Classification Using Discriminative Topic Transformation - Text is classified by determining text features from the text, and transforming the text features to topic features. Scores are determined for each topic feature using a discriminative topic model. The model includes a classifier that operates on the topic features, wherein the topic features are determined by the transformation from the text features, and the transformation is optimized to maximize the scores of a correct class relative to the scores of incorrect classes. Then, a class label with a highest score is selected for the text. In situations where the classes are organized in a hierarchical structure, the discriminative topic models apply to classes at each level conditioned on previous levels and scores are combined across levels to evaluate the highest scoring class labels. | 11-28-2013 |
20130317805 | SYSTEMS AND METHODS FOR DETECTING REAL NAMES IN DIFFERENT LANGUAGES - Systems and methods for detecting real names in different languages are described, including receiving a candidate name; determining a human language of the candidate name; disassembling a structure of the candidate name by applying a rule base for at least one of a character set, a meaning, and a format of the candidate name, wherein the rule base is unique to the determined human language; verifying at least a part of the disassembled structure of the candidate name with respect to actual real name information to generate a degree of confidence that the candidate name is an actual real name; and performing an action based on the generated degree of confidence that the candidate name is the actual real name. | 11-28-2013 |
20130317806 | ENTITY VARIANT GENERATION AND NORMALIZATION - Determining variants of a text entity comprises parsing the text entity into semantic components and generating variants for each of the semantic components. The entity is recomposed in different morphological forms from the different variants of the semantic components. | 11-28-2013 |
20130317807 | ENTITY VARIANT GENERATION AND NORMALIZATION - Determining variants of a text entity comprises parsing the text entity into semantic components and generating variants for each of the semantic components. The entity is recomposed in different morphological forms from the different variants of the semantic components. | 11-28-2013 |
20130317808 | SYSTEM FOR AND METHOD OF ANALYZING AND RESPONDING TO USER GENERATED CONTENT - A computer implemented system and method for automatically generating a response to a user generated content, the system comprises an interface configured to receive, via a communication network, user generated content from at least one social networking source; a natural language processor configured to process one or more terms from the user generated content to identify the user generated content; a programmed computer processor configured to match the identified user generated content with at least one resource provided by a content provider; an electronic storage component configured to store a reference to the at least one resource; a programmed computer processor configured to generate a response to the user generated content, wherein the resource comprises the reference to the at least one resource; and a programmed computer processor configured to provide, via a communication network, the response to the social networking source. | 11-28-2013 |
20130325436 | Large Scale Distributed Syntactic, Semantic and Lexical Language Models - A composite language model may include a composite word predictor. The composite word predictor may include a first language model and a second language model that are combined according to a directed Markov random field. The composite word predictor can predict a next word based upon a first set of contexts and a second set of contexts. The first language model may include a first word predictor that is dependent upon the first set of contexts. The second language model may include a second word predictor that is dependent upon the second set of contexts. Composite model parameters can be determined by multiple iterations of a convergent N-best list approximate Expectation-Maximization algorithm and a follow-up Expectation-Maximization algorithm applied in sequence, wherein the convergent N-best list approximate Expectation-Maximization algorithm and the follow-up Expectation-Maximization algorithm extract the first set of contexts and the second set of contexts from a training corpus. | 12-05-2013 |
20130325437 | Computer-Implemented Systems and Methods for Mood State Determination - Computer-implemented systems and methods are provided for determining an overall mood score of a document. For example, the document is received from a computer-readable medium. A text segment in a document is identified to be indicative of a mood of the document. The text segment is mapped to a mood scale among a predetermined set of mood scales. A mood weight associated with the mood scale for the text segment is generated. An overall mood score of the document is determined based at least in part on the mood weight. | 12-05-2013 |
20130325438 | Touchscreen Keyboard with Corrective Word Prediction - The present disclosure provides a touchscreen keyboard with corrective word prediction. A method for correcting text input on an electronic device is described. The method comprises: displaying a virtual keyboard on a touchscreen, the virtual keyboard including a plurality of keys; receiving input from the virtual keyboard; generating one or more predicted sets of characters in accordance with the received input; and displaying a predicted set of characters at a designated location when the received input does not match one of the predicted sets of characters. | 12-05-2013 |
20130325439 | DISAMBIGUATING WORDS WITHIN A TEXT SEGMENT - Determining a subject type for an entity in a text segment. A text segment is selected, which includes one or more single-word or multi-word entities. Natural language processing is performed on the selected text segment to identify entities that constitute subjects of the selected text segment. One entity is selected. A variant annotation is associated with the selected entity. The variant annotation reflects multiple subject types for the selected entity and a value for each subject type. The most probable subject type is determined for the selected entity, based on a combination of natural language processing rules and dictionary listings. The value of the annotation is incremented for the subject type corresponding to the most probable subject type for the selected entity, so that the highest value of the annotation indicates the most probable subject type for the selected entity within the selected text segment. | 12-05-2013 |
20130325440 | GENERATION OF EXPLANATORY SUMMARIES - A method for generating summaries of text is described. The method includes the step of extracting features from text of text lists from summaries. The explanatoriness of the text is then evaluated, wherein evaluating the explanatoriness of text includes evaluating the features of the text, including at least the step of evaluating the discriminativeness of the features of the text by comparing the text to a first text data set, wherein the first text data set is derived from a topic label. The evaluated text is then ranked based on the explanatoriness evaluation. | 12-05-2013 |
20130325441 | METHODS AND SYSTEMS FOR MANAGING ADAPTATION DATA - Computationally implemented methods and systems include managing adaptation data, wherein the adaptation data is correlated to at least one aspect of speech of a particular party, facilitating transmission of the adaptation data to a target device, in response to an indicator related to a speech-facilitated transaction of a particular party, wherein the adaptation data is correlated to at least one aspect of speech of the particular party, and determining whether to update the adaptation data, said determination at least partly based on a result of at least a portion of the speech-facilitated transaction. In addition to the foregoing, other aspects are described in the claims, drawings, and text. | 12-05-2013 |
20130325442 | Methods and Systems for Automated Text Correction - The present embodiments demonstrate systems and methods for automated text correction. In certain embodiments, the methods and systems may be implemented through analysis according to a single text correction model. In a particular embodiment, the single text correction model may be generated through analysis of both a corpus of learner text and a corpus of non-learner text. | 12-05-2013 |
20130325443 | Library of Existing Spoken Dialog Data for Use in Generating New Natural Language Spoken Dialog Systems - A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase. | 12-05-2013 |
20130325444 | HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING IMPROVED DISAMBIGUATION - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message. | 12-05-2013 |
20130332145 | ONTOLOGY DRIVEN DICTIONARY GENERATION AND AMBIGUITY RESOLUTION FOR NATURAL LANGUAGE PROCESSING - A computer implemented method and system for natural language processing ambiguity resolution includes storing an ontology specifying a set of grammatical rules. A phrase comprising at least one current word to be processed is retrieved. A current word from the phrase is annotated with possible ontological classes according to the ontology. Any ontological rules associated with the possible ontological classes are retrieved. Ontological classes are eliminated based on the ontological rules. A surviving possible ontological class is determined to be an accurate ontological class for the current word. In another aspect of this disclosure, an ontology is stored in computer memory, the ontology having multiple ontological classifications, and word instances, each word instance associated with at least one of the ontological classifications. All word instances belonging to the selected ontological classification are retrieved. | 12-12-2013 |
20130338998 | PROGRAMMABLE REGULAR EXPRESSION AND CONTEXT FREE GRAMMAR MATCHER - A regular expression matcher system, including: a deterministic finite state machine (DFSM); a ternary content addressable memory (TCAM) matcher to compare a word stored at the TCAM matcher to an input stream, wherein the word determines a state-to-state transition of the DFSM from a comparison result; a programmable logic connected to an output of the TCAM matcher to identify a next state in the DFSM based on the comparison result; a state register to update a current state of the DFSM to the next state; and a collection data structure coupled to the TCAM matcher and the programmable logic to store a sequence of required state transitions for the DFSM, wherein the programmable logic determines a next required state transition to be matched from the sequence. | 12-19-2013 |
20130338999 | JOINT ALGORITHM FOR SAMPLING AND OPTIMIZATION AND NATURAL LANGUAGE PROCESSING APPLICATIONS OF SAME - In rejection sampling of a function or distribution p over a space X, a proposal distribution q | 12-19-2013 |
20130339000 | IDENTIFYING COLLOCATIONS IN A CORPUS OF TEXT IN A DISTRIBUTED COMPUTING ENVIRONMENT - Technologies pertaining to computing a metric that is indicative of whether an n-gram in a large corpus of text is a collocation are described herein. The metric is computed in connection with a distributed computing framework, wherein n-grams of varying lengths can be analyzed in a single input data pass, and wherein secondary sorting functionality of the distributed computing framework need not be invoked. | 12-19-2013 |
20130339001 | SPELLING CANDIDATE GENERATION - Methods, systems, and media are provided for generating one or more spelling candidates. A query log is received, which contains one or more user-input queries. The user-input queries are divided into one or more common context groups. Each term of the user-input queries is ranked within a common context group according to a frequency of occurrence to form a ranked list for each of the one or more common context groups. A chain algorithm is implemented to the respective ranked lists to identify a base word and a set of one or more subordinate words paired with the base word. The base word and all sets of the subordinate words from all of the respective ranked lists are aggregated to form one or more chains of spelling candidates for the base word. | 12-19-2013 |
20130339002 | IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM - An image processing device comprises: an input part for inputting image data; a word extracting part for extracting a word from texts contained in the image data; a synonym obtaining part for obtaining a synonym corresponding to the word, and for associating the obtained synonym with the word; a position identifying part for identifying a display position on the image data of the word with which the synonym is associated; a layer creating part for creating an accompanying layer to add to an original layer, which is the image data containing the word, and for embedding the synonym associated with the word within a position on the accompanying layer the same as the display position identified by the position identifying part; and an output image generating part for generating output image data including the original layer containing the word and the accompanying layer within which the synonym is embedded. | 12-19-2013 |
20130339003 | Assisted Free Form Decision Definition Using Rules Vocabulary - A method of decision definition using a rules vocabulary includes: receiving free form input; identifying terms contained within the free form input; searching the rules vocabulary objects for terms; responsive to the term being found, obtaining input from a user as to whether to use the found term; responsive to the term not being found, searching the rules vocabulary attributes for terms having attributes corresponding to the term; responsive to the term being found, obtaining input from a user as to whether to use the found term; and refactoring the free form input with the found term accepted by the user. The method also includes updating the rules vocabulary with the term identified in the free form input as a synonym for the term found in said rules vocabulary. One embodiment further provides a method of determining semantic equivalence between a plurality of rules using a rules database having preferred terms. | 12-19-2013 |
20130339004 | HANDHELD ELECTRONIC DEVICE AND METHOD FOR DISAMBIGUATION OF TEXT INPUT AND PROVIDING SPELLING SUBSTITUTION - A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. The device is structured to identify and output representations of language objects that are stored in the memory and that correspond with a text input. The device is additionally structured to identify and output representations of language objects that are stored in the memory and that correspond with a known spelling substitution particular to a language active on the handheld electronic device. | 12-19-2013 |
20130339005 | Automated Extraction of Bio-Entity Relationships from Literature - Automated, standardized and accurate extraction of relationships within text. Automatic extraction of such relationships/information allows the information to be stored in structured form so that it can be easily and accurately retrieved when needed. Such information can be used to build online search engines for highly specific and accurate information retrieval. The current invention discloses a novel approach to extract such information from raw text based on natural language processing (NLP) and graph theoretic algorithm. The novel method can be applied, for example, to extract protein-protein relationships in biomedical literature. The method can be easily extended to extract other biological relationships between biological terms such as proteins, genes, pathways, diseases and drugs. The method can also be applied to other information domains to extract other relationships. | 12-19-2013 |
20130339006 | EFFICIENT STRING SEARCH - Some embodiments of an efficient string search have been presented. In one embodiment, a string of bytes representing content written in a non-delimited language is received, wherein the content has been classified into a predetermined category. In a single pass through the string of bytes, a set of N-grams is searched for simultaneously. Statistical information on occurrences of the N-grams, if any, in the string of bytes is collected. In some embodiments, a model is generated based on the statistical information, where the model is usable by a content filter to classify content. | 12-19-2013 |
20130346066 | Joint Decoding of Words and Tags for Conversational Understanding - Joint decoding of words and tags may be provided. Upon receiving an input from a user comprising a plurality of elements, the input may be decoded into a word lattice comprising a plurality of words. A tag may be assigned to each of the plurality of words and a most-likely sequence of word-tag pairs may be identified. The most-likely sequence of word-tag pairs may be evaluated to identify an action request from the user. | 12-26-2013 |
20130346067 | REAL-TIME MESSAGE SENTIMENT AWARENESS - Provided are techniques for determining a sentiment of an electronic message. The electronic message is parsed to identify one or more sub-constructs. For at least one of the sub-constructs that is not false-positive, a sentiment indicator is assigned from a set of types of sentiment indicators, and a score is assigned for the sentiment indicator. A final score is obtained for at least one type of sentiment indicator in the electronic message by summing scores for that type of sentiment indicator. Based on the final score for the at least one type of sentiment indicator, a sentiment of the electronic message is identified. | 12-26-2013 |
20130346068 | Voice-Based Image Tagging and Searching - The electronic device with one or more processors and memory provides a digital photograph of a real-world scene. The electronic device provides a natural language text string corresponding to a speech input associated with the digital photograph. The electronic device performs natural language processing on the text string to identify one or more terms associated with an entity, an activity, or a location. The electronic device tags the digital photograph with the one or more terms and their associated entity, activity, or location. | 12-26-2013 |
20130346069 | METHOD AND APPARATUS FOR IDENTIFYING A MENTIONED PERSON IN A DIALOG - This application relates to a method and apparatus for identifying a mentioned person in a dialog. A method for identifying a mentioned person in a dialog, comprising: identifying at least one person name entity associated with a mentioned person name which is acquired from the dialog; acquiring a group of candidate identifiers associated with the mentioned person name; acquiring at least one relation feature for each of the candidate identifiers from internal resources and external resources, wherein the relation feature refers to the relation between the candidate identifier and the at least one person name entity; and selecting an identifier from the group of candidate identifiers as the identifier of the mentioned person name based on the at least one relation feature. According to the method and the apparatus of the present invention, a mentioned person can be accurately identified. | 12-26-2013 |
20140006010 | PARSING RULES FOR DATA | 01-02-2014 |
20140006011 | CREATING, RENDERING AND INTERACTING WITH A MULTI-FACETED AUDIO CLOUD | 01-02-2014 |
20140006012 | Learning-Based Processing of Natural Language Questions | 01-02-2014 |
20140006013 | TEXT MINING FOR LARGE MEDICAL TEXT DATASETS AND CORRESPONDING MEDICAL TEXT CLASSIFICATION USING INFORMATIVE FEATURE SELECTION | 01-02-2014 |
20140006014 | COMPUTER SYSTEM FOR AUTOMATICALLY COMBINING REFERENCE INDICIA TO A COMMON NOUN DIFFERENTIATED BY ADJECTIVES IN A DOCUMENT | 01-02-2014 |
20140012567 | Text Auto-Correction via N-Grams - An input text string is received that contains characters or words. The input text string can be completed or corrected using contact scores based on n-grams. In addition, a subsequent text string and a preceding text string for the input text string are also identified, again using n-gram scores. A corrected text string is created by inserting the preceding text string before the input text string and appending the subsequent text string after the input text string. | 01-09-2014 |
20140012568 | Text Auto-Correction via N-Grams - An input text string is received that contains characters or words. The input text string can be completed or corrected using contact scores based on n-grams. In addition, a subsequent text string and a preceding text string for the input text string are also identified, again using n-gram scores. A corrected text string is created by inserting the preceding text string before the input text string and appending the subsequent text string after the input text string. | 01-09-2014 |
20140012569 | System and Method Using Data Reduction Approach and Nonlinear Algorithm to Construct Chinese Readability Model - The invention constructs a Chinese readability model with data reduction and a smart/advanced artificial intelligence algorithm. The model contains: 1) a word segmentation unit, which segments words and tags the part of speech of the words; 2) a readability indicator unit, which analyzes readability features based on the word segmentation and part-of-speech tagging; and 3) an evolution algorithm unit, which constructs a Chinese text readability model using a data reduction approach and a smart/advanced artificial intelligence algorithm. The present invention assesses the readability of Chinese texts, based on a small amount of Chinese text, and identifies the adequate readers. | 01-09-2014 |
20140019117 | RESPONSE COMPLETION IN SOCIAL MEDIA - Embodiments are directed towards providing word-by-word message completion for an incomplete response message, wherein the response message is composed in response to a received stimulus message. The message completion is based on a Response Completion Model (RCM) that may model both the language used in the incomplete response message and the contextual information in the received stimulus message. The RCM may be determined based on conversational stimulus-response data including stimulus-response message pairs. The RCM may be a mixture model and include a generic response language model based on an N-gram model, a Stimulus Model based on a Selection Model or a Topic Model, and a mixture parameter. In some embodiments, at least one candidate next word for the incomplete response message is determined based on the RCM. The at least one candidate next word may be selected and included in the incomplete response message. A complete response message may be generated and provided to a user. | 01-16-2014 |
20140019118 | COMPUTER ARRANGEMENT FOR AND COMPUTER IMPLEMENTED METHOD OF DETECTING POLARITY IN A MESSAGE - The present invention relates to automatic sentiment analysis by a computer arrangement and a computer implemented method. A message is presented to the computer arrangement which stores a set of patterns. Each pattern has a word and an associated part-of-speech tag. The message is compared against the patterns as stored in memory rendering a set of matching patterns. The set of matching patterns is then processed in accordance with a set of rules taking into account presence of patterns in the message that may add to a positive polarity and negative polarity, and patterns that may amplify, attenuate or flip such positive polarity or negative polarity. | 01-16-2014 |
20140019119 | TEMPORAL TOPIC SEGMENTATION AND KEYWORD SELECTION FOR TEXT VISUALIZATION - Visualizing content change of a data collection over time. A topic may be split into multiple linear, non-overlapping sub-topics along a timeline by satisfying a diverse set of semantic, temporal, and visualization constraints simultaneously. For each derived sub-topic, a set of representative keywords may be automatically selected to summarize the main content of the sub-topic. | 01-16-2014 |
20140019120 | LIST DISPLAY APPARATUS AND LIST DISPLAY METHOD - A list display apparatus that displays multiple pieces of character string data in a list on a display unit includes a storage unit that stores the multiple pieces of character string data; a sorting unit that sorts the multiple pieces of character string data stored in the storage unit in a character code order; a ligature decomposing unit that decomposes a ligature included in the multiple pieces of character string data into multiple original letters on which the ligature is based in the sorting by the sorting unit; and a sorting control unit that controls the sorting unit so that the multiple letters resulting from the decomposition by the ligature decomposing unit are used in the sorting, instead of the ligature. | 01-16-2014 |
20140019121 | DATA PROCESSING METHOD, PRESENTATION METHOD, AND CORRESPONDING APPARATUSES - A data processing method includes obtaining text information corresponding to a presented content, the presented content comprising a plurality of areas; performing text analysis on the text information to obtain a first keyword sequence, the first keyword sequence including area keywords associated with at least one area of the plurality of areas; obtaining speech information related to the presented content, the speech information at least comprising a current speech segment; and using a first model network to perform analysis on the current speech segment to determine the area corresponding to the current speech segment, wherein the first model network comprises the first keyword sequence. | 01-16-2014 |
20140019122 | Method for Parsing Natural Language Text - A parser for natural language text is provided. The parser is trained by accessing a corpus of labeled utterances. The parser extracts details of the syntactic tree structures and part of speech tags from the labeled utterances. The details extracted from the tree structures include Simple Links which are the key to the improved efficiency of this new approach. The parser creates a language model using the details that were extracted from the corpus. The parser then uses the language model to parse utterances. | 01-16-2014 |
20140019123 | METHOD AND DEVICE FOR GENERATING VOCAL ORGANS ANIMATION USING STRESS OF PHONETIC VALUE - Disclosed are a method and a device for generating a vocal organ animation using a stress of a phonetic value, the method and the device which generate a more accurate and a more natural vocal organ animation by applying a pronunciation form of a native speaker, which changes according to the stress of the phonetic values constituting a word. The proposed device for generating a vocal organ animation using the stress of a phonetic value: generates phonetic value configuration data having applied thereto a detailed phonetic value for each of the stresses by detecting from voice information, and allocating to a corresponding phonetic value, a phonation length and stress information of each of the phonetic values included in text information; and generates a vocal organ animation corresponding to the words included in the text information by assigning pronunciation form information detected on the basis of the phonetic value configuration data. | 01-16-2014 |
20140025367 | PREDICTIVE TEXT ENGINE SYSTEMS AND RELATED METHODS - Predictive text engine systems and related methods are provided. In this regard, a representative system includes: a mobile device operative to communicate information via a communications network, the mobile device having a user interface, a context analysis system and a predictive text engine; the user interface being operative to receive user input to generate a text-based communication; the context analysis system being operative to determine a context of the text-based communication and to request, via the communications network, information corresponding to the context to enhance performance of the predictive text engine; the predictive text engine being operative to predict text corresponding to the user input based, at least in part, on the information corresponding to the context and received responsive to the request. | 01-23-2014 |
20140025368 | Fixing Broken Tagged Words - Embodiments of the invention relate to a method, system, and computer program product to identify broken tag words of a data item and to replace the broken tag words with a compound word. Data items that have at least two tag words are examined to determine if the tag words are broken elements of a compound word. A computational assessment is conducted to determine a relationship between a set of compound words and an examined data item. Based upon the computational assessment a set of broken tag words may be replaced with a related compound word. | 01-23-2014 |
20140025369 | SYSTEM AND METHOD FOR PHRASE MATCHING WITH ARBITRARY TEXT - A system and method for matching phrases having arbitrary text. A first data structure stores a list of common phrases having multiple words. Each unique word is indexed in a hash table and mapped to one or more values that describe attributes of using the word in one or more of the common phrases. Using the hash table and the list of common phrases, a temporary array is defined to keep track of possible matches between words in an input string and the list of common phrases. | 01-23-2014 |
20140025370 | DATA DETECTION - A method for detecting data in a sequence of characters or text using both a statistical engine and a pattern engine. The statistical engine is trained to recognize certain types of data and the pattern engine is programmed to recognize the grammatical pattern of certain types of data. The statistical engine may scan the sequence of characters to output first data, and the pattern engine may break down the first data into subsets of data. Alternatively, the statistical engine may output items that have a predetermined probability or greater of being a certain type of data and the pattern engine may then detect the data from the output items and/or remove incorrect information from the output items. | 01-23-2014 |
20140025371 | METHOD AND APPARATUS FOR RECOMMENDING TEXTS - Text recommendation, which offers next texts that may be entered by a user, is performed by executing a communication application; collecting context information associated with the communication application; predicting a user's intention by analyzing the context information; retrieving recommended texts corresponding to the user's intention; and displaying the recommended texts. | 01-23-2014 |
20140025372 | TEXT ANALYZING DEVICE, PROBLEMATIC BEHAVIOR EXTRACTION METHOD, AND PROBLEMATIC BEHAVIOR EXTRACTION PROGRAM - The present invention provides a text analyzing device which can extract the great amount of problematic behavior at low cost. A punishment action text extraction means | 01-23-2014 |
20140025373 | Fixing Broken Tagged Words - Embodiments of the invention relate to a method for identifying broken tag words of a data item and replacing the broken tag words with a compound word. Data items that have at least two tag words are examined to determine if the tag words are broken elements of a compound word. A computational assessment is conducted to determine a relationship between a set of compound words and an examined data item. Based upon the computational assessment a set of broken tag words may be replaced with a related compound word. | 01-23-2014 |
20140032206 | GENERATING STRING PREDICTIONS USING CONTEXTS - In a mobile device, a context is determined for the mobile device. The context is determined based on a variety of characteristics of the mobile device environment including, for example, the current application being used, any contacts that a user of the mobile device is interacting with or having a conversation with, the current date and/or time, a current topic of the conversation, a current style of the conversation, etc. Based on a set of strings associated with the determined context and user generated text, one or more string predictions are generated for the user generated text. The string predictions may be presented to the user as suggested completions of the user generated text. | 01-30-2014 |
20140032207 | Information Classification Based on Product Recognition - The present disclosure provides an example information classification method and system based on product recognition. When a request for product recognition is received, one or more candidate product words of product profile information for recognition are determined. One or more characteristics of the product profile information are extracted based on the determined candidate product words respectively. Based on the candidate product words and their corresponding characteristics, the learning sub-model and the comprehensive learning model determine a product word corresponding to the product profile information. The product profile information is classified based on the product word. The present techniques implement automatic classification of the product profile information and improve an efficiency of information classification. | 01-30-2014 |
20140032208 | Labeling Context Slices To Produce a Storyline from Mobile Device Data - Embodiments create and label context slices from observation data that together define a storyline of a user's movements. A context is a (possibly partial) specification of what a user was doing in the dimensions of time, place, and activity. Contexts can vary in their specificity, their semantic content, and their likelihood. A storyline is composed of a time-ordered sequence of contexts that partition a given span of time. A storyline is created through a process of data collection, slicing and labeling. Raw context data can be collected from a variety of observation sources with various error characteristics. Slicing refines the chaotic collection of contexts produced by data collection into a single consistent storyline composed of a sequence of contexts representing homogeneous time intervals. Labeling adds more specific and semantically meaningful data (e.g., geography, venue, activity) to the storyline produced by slicing. | 01-30-2014 |
20140032209 | OPEN INFORMATION EXTRACTION - A system for identifying relational tuples is provided. The system extracts a relation phrase from a sentence by identifying a verb in the sentence and then identifying a relation phrase of the sentence as a phrase in the sentence starting with the identified verb that satisfies both a syntactic constraint and a lexical constraint. The system also identifies arguments for a relation phrase. To extract the arguments, the system applies a left-argument-left-bound classifier, a left-argument-right-bound classifier, and a right-argument-right-bound classifier to identify a left argument and right argument for the relation phrase such that the left argument, the relation phrase, and the right argument form a relational tuple. | 01-30-2014 |
20140039875 | VISUAL ANALYSIS OF PHRASE EXTRACTION FROM A CONTENT STREAM - A system may include an extraction engine to extract candidate phrases from a content stream, and an analysis engine to assign the candidate phrases visual cues and display the visual cues to an operator. | 02-06-2014 |
20140039876 | EXTRACTING RELATED CONCEPTS FROM A CONTENT STREAM USING TEMPORAL DISTRIBUTION - A system may include an analysis engine to generate a set of candidate phrases from a content stream based on the temporal resolution, the interestingness, and/or the correlation of the candidate phrases. | 02-06-2014 |
20140039877 | Systems and Methods for Semantic Information Retrieval - A semantic tagging method may add context to a sentence in order to increase search efficiency. Regardless of an author's writing style, translating semantic concepts into tags may increase search efficiency. Automatic semantic tagging of documents may allow semantic search and reasoning. Text for semantic tagging may include an email, a website chat room, an internet forum, or a text message. Additional texts may include aggregating general consensus of an emailed topic across multiple emails, whether in the same email chain or separate emails. To increase search efficiency, the analysis of prior communications within the body of text may comprise analyzing structured contextual information to facilitate homophora resolution. The structured contextual information may include at least one of a sender email address, one or more recipient email addresses, a subject field, a message date and time stamp, and an attachment title. | 02-06-2014 |
20140039878 | Symbolic-To-Natural Language Conversion - Techniques are described for converting characters that represent a mathematical expression, according to mathematical conventions, into natural language that communicates the mathematical expression based on the rules of the natural language for communicating mathematical expressions. A mathematical expression parser parses the characters representing the mathematical expression into a syntax tree. A visitor function visits each node of the syntax tree and produces natural language for the nodes based, at least in part, on types of the syntax tree nodes and, potentially, contexts of syntax tree nodes. The natural language produced for the syntax tree is assembled into a string based, at least in part, on the structure of the syntax tree. The resulting natural language string may be displayed via a graphical user interface, used by a text-to-speech mechanism to produce a spoken communication of the natural language for the mathematical expression, etc. | 02-06-2014 |
20140039879 | GENERIC SYSTEM FOR LINGUISTIC ANALYSIS AND TRANSFORMATION - A system providing a set of natural language processing functionalities, such as named entity extraction, domain extraction, sense disambiguation, automatic translation between different natural languages, morphological analysis, and tokenization, via a unified process of analysis and transformation, using an underlying linguistic database. The invention can accept text input and can be used to translate text, find out the correct sense of a word, obtain the main subject of a text, obtain the grammatical attributes of a word, paraphrase a text, and search for specific entities within the input text. | 02-06-2014 |
20140039880 | Applying Service Levels to Transcripts - Speech is transcribed to produce a draft transcript of the speech. Portions of the transcript having a high priority are identified. For example, particular sections of the transcript may be identified as high-priority sections. As another example, portions of the transcript requiring human verification may be identified as high-priority sections. High-priority portions of the transcript are verified at a first time, without verifying other portions of the transcript. Such other portions may or may not be verified at a later time. Limiting verification, either initially or entirely, to high-priority portions of the transcript limits the time required to perform such verification, thereby making it feasible to verify the most important portions of the transcript at an early stage without introducing an undue delay into the transcription process. Verifying the other portions of the transcript later ensures that early verification of the high-priority portions does not sacrifice overall verification accuracy. | 02-06-2014 |
20140046653 | METHOD AND SYSTEM FOR BUILDING ENTITY HIERARCHY FROM BIG DATA - The various embodiments herein provide a method and a system for building an entity hierarchy. The method comprises extracting a plurality of entities from big data, determining a parent entity by understanding a context in which the entity is used, resolving the entities by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context and building a hierarchical structure of entities using knowledge repositories, ontologies and language repositories along with natural language processing techniques. The method of extracting entities from the structured data comprises identifying each data point as an entity and identifying entities based on a relationship defined with other entities. The method of extracting entities from unstructured data includes a self-learning process and training based learning process to learn new parent entities from domain specific documents using new entity recognition models. | 02-13-2014 |
20140046654 | TEXT PROCESSING METHOD, SYSTEM AND COMPUTER PROGRAM - A method includes hierarchically identifying occurrences of some of the words in the set of sentences; creating a first index for each of some of the words based on the upper hierarchy of occurrences identified for each word; receiving input of a queried word; hierarchically identifying occurrences of the queried word in the set of sentences; creating a second index based on the upper hierarchy of occurrences identified for the queried word; comparing the first index and the second index to calculate an estimated value for the number of occurrences of a word in the neighborhood of the queried word; and calculating the actual value of the number of occurrences of a word in the neighborhood of the queried word based on an upper hierarchy and lower hierarchy of the occurrences on condition that the estimated value is equal to or greater than a predetermined number. | 02-13-2014 |
20140046655 | SYSTEMATIC PRESENTATION OF THE CONTENTS OF ONE OR MORE DOCUMENTS - A method of systematically presenting the contents of at least one document, comprising: (a) a user providing an electronic version of at least one document to a computer; (b) a user accepting or modifying noise words generated by a computer module; (c) generating a list of every non-noise word by means of a computer module wherein the list indicates every page on which a non-noise word appears; and (d) displaying the entire list of non-noise words. In some embodiments, the list of non-noise words further indicates the number of times a word occurs on a page. In some embodiments, the list of non-noise words further indicates each line on which a non-noise word appears. | 02-13-2014 |
20140052436 | SYSTEM AND METHOD FOR UTILIZING MULTIPLE ENCODINGS TO IDENTIFY SIMILAR LANGUAGE CHARACTERS - Described herein are systems and methods for identifying the similarity between language characters. As described herein, a pair of language characters is received at a language character match engine. The language character match engine is adapted to receive encoding configuration information from each of a plurality of encoding components, and is adapted to encode the pair of language characters based on the unique structure of each language character to generate a pair of string identification characters for each encoding component. Thereafter, each pair of string identification characters is compared to one another to generate a similarity score, and the similarity score for each pair of string identification characters is combined to create a composite similarity score. The composite similarity score represents a similarity between the pair of language characters, and is used to identify the similarity between the pair of language characters. | 02-20-2014 |
20140052437 | VIRTUAL KEYBOARD SYSTEM WITH AUTOMATIC CORRECTION - There is disclosed an enhanced text entry system which uses word-level analysis to automatically correct inaccuracies in user keystroke entries on reduced keyboards such as those implemented on a touch-sensitive panel or display screen, or on mechanical keyboard systems. A method and system are defined which determine one or more alternate textual interpretations of each sequence of inputs detected within a designated auto-correcting keyboard region. | 02-20-2014 |
20140058721 | REAL TIME STATISTICS FOR CONTACT CENTER MOOD ANALYSIS METHOD AND APPARATUS - Methods and systems for providing a graphical depiction of a determined sentiment for a contact received at a contact center are provided. Moreover, the determined sentiment can be displayed for a grouping of contacts as an aggregated sentiment. The sentiment or aggregated sentiment can be displayed in real time or near real time. | 02-27-2014 |
20140058722 | Word Detection and Domain Dictionary Recommendation - New word detection and domain dictionary recommendation are provided. When text content is received according to a given language, for example, Chinese language, words are extracted from the content by analyzing the content according to a variety of rules. The words then are ranked for inclusion into one or more lexicons or domain dictionaries for future use for such functionalities as text input methods, spellchecking, grammar checking, auto entry completion, definition, and the like. In addition, when a user is entering or editing text according to one or more prescribed domain dictionaries, a determination may be made as to whether more helpful domain dictionaries may be available. When entered words have a high degree of association with a given domain dictionary, that domain dictionary may be recommended to the user to increase the accuracy of the user's input of additional text and editing of existing text. | 02-27-2014 |
20140058723 | METHOD AND SYSTEM FOR DISCOVERING SUSPICIOUS ACCOUNT GROUPS - In one exemplary embodiment, a system for discovering suspicious account groups establishes a language model according to the post contents from each account of a first group of accounts during a first time interval, to describe the speech of the account, and compares the similarity among a plurality of language models of the first group of accounts to cluster the first group of accounts; and for a plurality of newly added data during a second time interval, discovers near-synonyms of at least a monitored vocabulary set, and updates the near-synonyms to a plurality of language models of a second group of accounts. The system further integrates the first and the second groups of accounts, and re-clusters an integrated group of accounts. | 02-27-2014 |
20140058724 | Method of and System for Using Conversation State Information in a Conversational Interaction System - A method of using conversation state information in a conversational interaction system is disclosed. A method of inferring a change of a conversation session during continuous user interaction with an interactive content providing system includes receiving input from the user including linguistic elements intended by the user to identify an item, associating a linguistic element of the input with a first conversation session, and providing a response based on the input. The method also includes receiving additional input from the user and inferring whether or not the additional input from the user is related to the linguistic element associated with the conversation session. If related, the method provides a response based on the additional input and the linguistic element associated with the first conversation session. Otherwise, the method provides a response based on the second input without regard for the linguistic element associated with the first conversation session. | 02-27-2014 |
20140058725 | KEYBOARD SYSTEM WITH AUTOMATIC CORRECTION - Alternative textual interpretations of each sequence of inputs detected within an auto-correcting keyboard region are determined. Actual keystroke contact locations may occur outside the boundaries of specific keyboard key regions associated with the actual characters of word interpretations proposed for selection. The distance from each contact location to each corresponding intended character may increase with the expected frequency of the intended word. An intended word is selected from among generated interpretations and is automatically accepted for output. | 02-27-2014 |
20140067368 | DETERMINING SYNONYM-ANTONYM POLARITY IN TERM VECTORS - A document-term matrix may be generated based on a corpus. A term representation matrix may be generated based on modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix. | 03-06-2014 |
20140067369 | METHODS AND SYSTEMS FOR ACQUIRING USER RELATED INFORMATION USING NATURAL LANGUAGE PROCESSING TECHNIQUES - Systems and methods for acquiring information associated with a user by using NLP techniques are disclosed. One or more phrases are classified in one or more categories at least partly on the basis of a period for which a product has been used by the user, the user's experience with the product, preferences of the user, or needs of the user by applying one or more natural language processing (NLP) techniques. The one or more phrases are extractable from an electronic publication at least partly on the basis of a predefined set of verbs, a predefined set of domain-specific terms, and terms indicative of temporal information. One or more terms from the classified phrases are extracted, in which the one or more terms are indicative of the information about the user. | 03-06-2014 |
20140067370 | LEARNING OPINION-RELATED PATTERNS FOR CONTEXTUAL AND DOMAIN-DEPENDENT OPINION DETECTION - A method for extracting opinion-related patterns includes receiving a corpus of reviews, the reviews each including an explicit rating of a topic. The reviews are partitioned among a predefined plurality of classes, based on the rating. Syntactic relations are identified in each review. The syntactic relations may each include an adjective and a noun. A set of patterns is generated, each of the patterns having at least one of the identified syntactic relations as an instance and the patterns clustered into a set of clusters based on a set of features. At least one of the features is based on occurrences, in the predefined classes, of the instances of the patterns. A polarity is assigned to ones of the clusters and propagated to patterns in the respective clusters. The polarity-labeled patterns can each be instantiated as a contextual rule for opinion mining. | 03-06-2014 |
20140067371 | CONTEXT SENSITIVE AUTO-CORRECTION - Methods, systems, and computer program products are provided for adaptively autocorrecting text according to context. Text may be received at a mobile electronic device that was input by a user. The received text may be displayed at a display component of the mobile electronic device. An auto-correct dictionary is selected from a plurality of auto-correct dictionaries. The auto-correct dictionary may be selected based at least on usage information that is representative of a usage context of the mobile electronic device. The displayed text is auto-corrected according to the selected auto-correct dictionary. | 03-06-2014 |
20140067372 | SCORING PREDICTIONS BASED ON PREDICTION LENGTH AND TYPING SPEED - A method that includes receiving an input, determining, by the processor, a likelihood that a predicted string associated with the received input matches an intended input string, where the determination is a function of at least one of a length of the predicted string and a typing speed associated with the received input, and displaying the predicted string. | 03-06-2014 |
20140067373 | METHOD AND APPARATUS FOR ENHANCED PHONETIC INDEXING AND SEARCH - The subject matter discloses a method of two-phase phonetic indexing and search comprising: receiving a digital representation of an audio signal; producing a phonetic index of the audio signal; producing a phonetic N-gram sequence from the phonetic index by segmenting the phonetic index into a plurality of phonetic N-grams; and producing an inverted index of the plurality of phonetic N-grams. | 03-06-2014 |
20140067374 | SYSTEM AND METHOD FOR PHONETIC SEARCHING OF DATA - A method for phonetically searching media including a plurality of audio tracks is disclosed where each audio track is indexed to provide a phonetic representation of the audio track. The method comprises obtaining a text search query and searching for the text query against a set of reference documents to obtain a sub-set of pseudo-relevant documents. The pseudo-relevant documents are examined for a set of search expressions characterizing the pseudo-relevant documents. A phonetic representation corresponding to at least some of the set of search expressions is provided and for each of the phonetic representations of the search expressions, the indexed phonetic representations for one or more of the plurality of audio tracks are phonetically searched to provide any indicators of the incidence of the search expression within the one or more audio tracks. | 03-06-2014 |
20140067375 | Human-to-human Conversation Analysis - Customer support, and other types of activities in which there is a dialogue between two humans, can generate large volumes of conversation records. Automated analysis of these records can provide information about high-level features of, for example, the workings of a customer service department. Analysis of these conversations between a customer and a customer-support agent may also allow identification of customer support activities that can be provided by virtual agents instead of actual human agents. The analysis may evaluate conversations in terms of complexity, duration, and sentiment of the participants. Additionally, the conversations may also be analyzed to identify the existence of selected concepts or keywords. Workflow characteristics, the extent to which the conversation represents a multi-step process intended to accomplish a task, may also be determined for the conversations. Characteristics of individual conversations may be combined to obtain generalized or representative features for a set of conversation records. | 03-06-2014 |
20140067376 | Method and system for understanding text - A query answer engine converts a query and answers into a non-natural language forming units that correspond to words, symbols, numbers and spaces and combinations of the above, and establishes the contextual meaning of the units utilizing read, recognize and relate processes to populate a query matrix and an answer matrix with 3-valued points corresponding to the read, recognize and relate values of a unit. The two matrices are cross correlated to yield a robust answer to a query taking into account context. The conversion of text to units uses contextual meaning parametric tables, databases, libraries, rules and an engine for determining the focal point of a query or an answer. | 03-06-2014 |
20140067377 | METHOD AND APPARATUS FOR SITUATIONAL ANALYSIS TEXT GENERATION - Methods, apparatuses, and computer program products are described herein that are configured to generate a situational analysis text. In some example embodiments, a method is provided that comprises generating a set of messages based on one or more key events in a primary data channel and one or more significant events in one or more related data channels in response to an alert condition. The method of this embodiment may also include generating a situational analysis text based on the set of messages and the relationships between them. In some example embodiments, the situational analysis text is configured to linguistically express the one or more key events, the one or more significant events, and the relationships between the one or more key events and the one or more significant events. | 03-06-2014 |
20140067378 | Expert Conversation Builder - An expert conversation builder contains a knowledge database that includes a plurality of dialogues having nodes and edges arranged as directed acyclic graphs. Users and authors of the system interface with the knowledge database through a graphical interface to author dialogues and to create expert conversations as threads traversing the node in the dialogues. | 03-06-2014 |
20140067379 | AUTOMATIC SENTENCE EVALUATION DEVICE USING SHALLOW PARSER TO AUTOMATICALLY EVALUATE SENTENCE, AND ERROR DETECTION APPARATUS AND METHOD OF THE SAME - An automatic sentence evaluating device using a shallow parser. A simple grammatical error and an error in sentence structure are detected by generating a string of parts of speech using n-gram for a composed input sentence and parsing the generated string of parts of speech on the basis of a rule (shallow parsing) defined according to a connective relationship between adjacent parts of speech, and a corrected draft is proposed for the detected errors to thereby increase accuracy of sentence evaluation, and an error detection apparatus and a method for the same. | 03-06-2014 |
20140067380 | METHOD TO ASSIGN WORD CLASS INFORMATION - An assignment device ( | 03-06-2014 |
20140074454 | Conversational Virtual Healthcare Assistant - A conversation user interface enables patients to better understand their healthcare by integrating diagnosis, treatment, medication management, and payment, through a system that uses a virtual assistant to engage in conversation with the patient. The conversation user interface conveys a visual representation of a conversation between the virtual assistant and the patient. An identity of the patient, including preferences and medical records, is maintained throughout all interactions so that each aspect of this integrated system has access to the same information. The conversation user interface allows the patient to interact with the virtual assistant using natural language commands to receive information and complete tasks related to his or her healthcare. | 03-13-2014 |
20140074455 | METHOD AND SYSTEM FOR MOTIF EXTRACTION IN ELECTRONIC DOCUMENTS - A method, system, and computer program product for extracting text motifs from the electronic documents is disclosed. A user provides a largest-maximal repeat or a super-maximal repeat as a first text block. The occurrences of the first text block are detected to identify the second text blocks in the vicinity of the occurrences of the first text block on the basis of pre-defined parameters. The text motifs are determined by combining the first text block and the second text block. Finally, the text motifs are extracted from the electronic documents. | 03-13-2014 |
20140074456 | REFINING HIERARCHIES IN OBJECT-ORIENTED MODELS - Embodiments are directed to refining hierarchies in object-oriented models. A method includes providing a business object model in the form of an object-oriented model having one or more members with multiple distinct verbalizations and identifying distinct verbalizations of a given business object model member. The method also includes reviewing existing rules of the business object model to produce mappings of the distinct verbalizations and any attributes or operations used in conjunction with the distinct verbalizations of members of the business object model and analysing the mappings to identify patterns of use of the distinct verbalizations. The method further includes categorising a distinct verbalization as a superclass or subclass. | 03-13-2014 |
20140074457 | REPORT GENERATING SYSTEM, NATURAL LANGUAGE PROCESSING APPARATUS, AND REPORT GENERATING APPARATUS - A report generating system includes a service information acquiring unit that acquires service information, a natural language processing unit that performs natural language processing on the acquired service information to extract a word, a replacing unit that replaces the word extracted from the service information with word identification information by referring to a word list in which the word and the word identification information are associated, a word identification information output unit that outputs the replaced word identification information, a word identification information acquiring unit that acquires the output word identification information, a counting unit that counts a total number of pieces of word identification information with a same value among acquired pieces of the word identification information, for each of values of the pieces of the word identification information, a generating unit that generates a report based on a counting result. | 03-13-2014 |
20140074458 | PROBABILITY-BASED APPROACH TO RECOGNITION OF USER-ENTERED DATA - A method for entering keys in a small key pad is provided. The method comprises the steps of: providing at least a part of a keyboard having a plurality of keys; and predetermining a first probability of a user striking a key among the plurality of keys. The method further uses a dictionary of selected words associated with the key pad and/or a user. | 03-13-2014 |
20140081623 | METHOD FOR PROCESSING MEDICAL REPORTS - A method for processing medical reports is disclosed. An initial segmentation of the textual contents of a medical report into information units is performed using natural language processing methods, thereby addressing the challenge of identifying information units, such as text fragments, sentences or text passages, covered within medical reports that are of relevance for a particular context. The information units are then classified into at least one context class to determine their appropriate context classes for a particular situation or application. The context classifications are created using a grammar, e.g., a context-free grammar, and can then be used for automatically assigning each information unit to an appropriate context meta-information. The medical report is then annotated by assigning at least one of said information units to at least one context meta-information determined by said context class. The context meta-information may be then used by other applications. | 03-20-2014 |
20140081624 | Methods, Systems, and Program Products for Navigating Tagging Contexts - Methods and systems are described for navigating tagging contexts. In an aspect, in a first tagging context, a first tagging is identified of a first resource with a first tag. The first tagging is determined to be in a second tagging context. In the second tagging context and in response to identifying the first tagging, a second tagging that is not in the first tagging context is detected. | 03-20-2014 |
20140081625 | Natural Language Image Spatial and Tonal Localization - Natural language image spatial and tonal localization techniques are described. In one or more implementations, a natural language input is processed to determine spatial and tonal localization of one or more image editing operations specified by the natural language input. Performance is initiated of the one or more image editing operations on image data using the determined spatial and tonal localization. | 03-20-2014 |
20140081626 | Natural Language Vocabulary Generation and Usage - Natural language vocabulary generation and usage techniques are described. In one or more implementations, one or more search results are mined for a domain to determine a frequency at which words occur in the one or more search results, respectively. A set of the words is selected based on the determined frequency. A sense is assigned to each of the selected set of the words that identifies a part-of-speech for a respective word. A vocabulary is then generated that includes the selected set of the words and a respective said sense, the vocabulary configured for use in natural language processing associated with the domain. | 03-20-2014 |
20140088952 | SYSTEMS AND METHODS FOR AUTOMATIC PROGRAM RECOMMENDATIONS BASED ON USER INTERACTIONS - Methods and systems are provided for generating automatic program recommendations based on user interactions. In some embodiments, control circuitry processes verbal data received during an interaction between a user of a user device and a person with whom the user is interacting. The control circuitry analyzes the verbal data to automatically identify a media asset referred to during the interaction by at least one of the user and the person with whom the user is interacting. The control circuitry adds the identified media asset to a list of media assets associated with the user of the user device. The list of media assets is transmitted to a second user device of the user. | 03-27-2014 |
20140088953 | LINGUISTICAL ANALYTIC CONSOLIDATION FOR MOBILE CONTENT - A method for linguistical analytic consolidation is described. The method includes displaying a user interface on a mobile device. The method also includes receiving source text content to display in the user interface. The method also includes scanning the source text content for a specific element. The method also includes flagging the specific element of the source text content to be modified according to a set of linguistic rules. Modifying the specific element according to the set of linguistic rules results in a consolidated form of the source text content. | 03-27-2014 |
20140088954 | APPARATUS AND METHOD PERTAINING TO AUTOMATICALLY-SUGGESTED EMOTICONS - These teachings provide for automatically using content from a received text-based message to identify at least one context-relevant emoticon and then automatically displaying that context-relevant emoticon such that a user can select the context-relevant emoticon to include in a text-based response to that received message. | 03-27-2014 |
20140088955 | MOBILE TERMINAL AND CONTROLLING METHOD THEREOF - A mobile terminal and controlling method thereof are disclosed, by which a feedback matching a meaning of a natural language is outputted in the course of outputting a sound corresponding to the natural language by sentence unit. The present invention includes a display unit configured to display a text by sentence unit, an audio output module configured to output a synthetic sound generated from converting the text to a sound, and a controller configured to generate the synthetic sound, extract a meaning of the text, and control a feedback matching the meaning of the text to be outputted while the synthetic sound is outputted via the audio output module. | 03-27-2014 |
20140088956 | HANDHELD ELECTRONIC DEVICE WITH REDUCED KEYBOARD AND ASSOCIATED METHOD OF PROVIDING QUICK TEXT ENTRY IN A MESSAGE - An improved handheld electronic device having a reduced keyboard provides facilitated language entry by making available to a user certain words that a user may reasonably be expected to enter. In some situations, certain words can be stored, for example, in a temporary dictionary for use in particular situations. For instance, the names of the recipients of an electronic message might be stored in a temporary dictionary for rapid retrieval when entering a salutation in the message. As another example, a number of the words in an existing electronic message may be stored in a temporary dictionary and made available to a user when replying to or forwarding the message since the existing message might include words that the user might reasonably be expected to type in the reply message or the forwarded message. | 03-27-2014 |
20140095145 | RESPONDING TO NATURAL LANGUAGE QUERIES - Disclosed herein are a system, non-transitory computer-readable medium, and method for responding to natural language queries. Keywords likely to appear in a natural language query are determined and each likely keyword is associated with a module. A response to a natural language query comprises information generated by each module associated with a likely keyword appearing in the natural language query. | 04-03-2014 |
20140095146 | DOCUMENTATION OF SYSTEM MONITORING AND ANALYSIS PROCEDURES - A method, computer system, and computer program product to document system analysis procedures. The method includes a computer receiving text in a text editor and determining that the received text is a command relevant to a system under analysis. The method further includes the computer receiving a request to execute the command, and then requesting, from the system under analysis, the output data from the executed command. The output data is then inserted into the text editor. | 04-03-2014 |
20140095147 | Situation Aware NLU/NLP - An arrangement and corresponding method are described for natural language processing. A natural language understanding (NLU) arrangement processes a natural language input to determine a corresponding sentence-level interpretation. A user state component maintains user context data that characterizes an operating context of the NLU arrangement. Operation of the NLU arrangement is biased by the user context data. | 04-03-2014 |
20140095148 | EMOTION IDENTIFICATION SYSTEM AND METHOD - A system and method for identifying emotion in text that connotes authentic human expression, and training an engine that produces emotional analysis at various levels of granularity and numerical distribution across a set of emotions at each level of granularity. The method may include determining similarity between textual data and an emotion, and classifying emotions as similar emotions. | 04-03-2014 |
20140095149 | EMOTION IDENTIFICATION SYSTEM AND METHOD - A system and method for identifying emotion in text that connotes authentic human expression, and training an engine that produces emotional analysis at various levels of granularity and numerical distribution across a set of emotions at each level of granularity. The method may include classifying textual data as emotional textual data or non-emotional textual data, and determining duration of an emotional state. | 04-03-2014 |
20140095150 | EMOTION IDENTIFICATION SYSTEM AND METHOD - A system and method for identifying emotion in text that connotes authentic human expression, and training an engine that produces emotional analysis at various levels of granularity and numerical distribution across a set of emotions at each level of granularity. The method may include producing a chart of data transmissions referenced against time, comparing filtered data transmissions to a database, and selecting a database based on a demographic class of an author. | 04-03-2014 |
20140095151 | EXPRESSION TRANSFORMATION APPARATUS, EXPRESSION TRANSFORMATION METHOD AND PROGRAM PRODUCT FOR EXPRESSION TRANSFORMATION - According to one embodiment, an expression transformation apparatus includes a processor; an input unit configured to input a sentence of a speaker as a source expression; a detection unit configured to detect a speaker attribute representing a feature of the speaker; a normalization unit configured to transform the source expression to a normalization expression including an entry and a feature vector representing a grammatical function of the entry; an adjustment unit configured to adjust the speaker attribute to a relative speaker relationship between the speaker and another speaker, based on another speaker attribute of the other speaker; and a transformation unit configured to transform the normalization expression based on the relative speaker relationship. | 04-03-2014 |
20140095152 | DIALOGUE SYSTEM USING EXTENDED DOMAIN AND NATURAL LANGUAGE RECOGNITION METHOD AND COMPUTER-READABLE MEDIUM THEREOF - A dialogue system uses an extended domain in order to have a dialogue with a user using natural language. If a dialogue pattern actually input by the user is different from a dialogue pattern predicted by an expert, an extended domain generated in real time based on user input is used and an extended domain generated in advance is used to have a dialogue with the user. | 04-03-2014 |
20140100846 | NATURAL LANGUAGE METRIC CONDITION ALERTS GENERATION - Enterprise data sources can be monitored to detect metric conditions via rules, and alerts can be generated. The alerts can be presented as natural language descriptions of business metric conditions. From an alert, the reader can navigate to a story page that presents additional detail and allows further navigation within the data. Additional detail presented can include a drill down synopsis, strategies for overcoming a negative condition, links to discussions within the organization about the condition, options for sharing or collaborating about the condition, or the like. | 04-10-2014 |
20140108004 | TEXT/CHARACTER INPUT SYSTEM, SUCH AS FOR USE WITH TOUCH SCREENS ON MOBILE PHONES - A system and method for receiving character input from a user includes a programmed processor that receives inputs from the user and disambiguates the inputs to present character sequence choices corresponding to the input characters. In one embodiment, a first character input is received and a corresponding first recognized character is stored in a temporary storage buffer and displayed to the user for editing. After a predetermined number of subsequent input characters and/or predetermined amount of time without being edited, the system determines that the first recognized character is the intended character input by the user and removes the first recognized character from the buffer, thereby inhibiting future editing. | 04-17-2014 |
20140108005 | Universal Language Classification Devices, Systems, and Methods - A computer-implemented method, implemented, at least in part, by hardware in combination with software, the method includes (A) obtaining text from a document; (B) parsing said text using at least one parallel sentence parsing process to obtain sentence data from said text; (C) parsing said sentence data using at least one parallel noun parsing process to obtain text data from said sentence data; (D) scoring said text data using at least one term scorer process and a known word list to obtain scored terms corresponding to said text data; and (E) determining known word scores corresponding to said text data, using said known word list, wherein said known word scores comprise base scores and category penetration scores; wherein steps (B), (C), (D), and (E) operate in parallel for at least some of the text from the document. | 04-17-2014 |
20140108006 | SYSTEM AND METHOD FOR ANALYZING AND MAPPING SEMIOTIC RELATIONSHIPS TO ENHANCE CONTENT RECOMMENDATIONS - A system and method described in this disclosure seeks to create new ways of defining and mapping relationships between content items in order to create more relevant content recommendations. Semiotic analysis, unlike semantic analysis, looks at how words mean rather than what words mean. Semiotics can define an emotional context for content items, which may be leveraged into content recommendations to users, creating more personalized and meaningful recommendations. The system and method analyze the semiotic context by analyzing the semiotic nature of the content itself through analysis of the writing style or genre of the content item, and the tone in which the content item is written; by analyzing the semiotic nature of the entities extracted from content items; and by analyzing the semiotic nature of the publisher or author who created the content item. | 04-17-2014 |
20140114643 | Autocaptioning of images - The description relates to sentence autocaptioning of images. One example can include a set of information modules and a set of sentence generation modules. The set of information modules can include individual information modules configured to operate on an image or metadata associated with the image to produce image information. The set of sentence generation modules can include individual sentence generation modules configured to operate on the image information to produce a sentence caption for the image. | 04-24-2014 |
20140114644 | METHOD AND APPARATUS FOR SIMULATED FAILOVER TESTING - Implementations of the present disclosure involve a system and method for simulating a storage cluster testing system. The method and system include a processor configured to execute instructions stored on a memory to produce a simulation interface. The simulation interface includes an abstraction layer that receives verbs from a test driver and passes the verbs to one of two or more plugins. The plugins may include a synthetic plugin configured to translate the verbs into one or more commands and send the commands to a simulated storage appliance that is a computing device with relatively lower performance than an actual storage appliance. The simulated storage appliance may act in place of two storage appliances clustered to form a storage cluster. The simulated storage appliance forms a simulated storage cluster. The simulated storage cluster simulates the performance of the verb by the storage cluster. | 04-24-2014 |
20140114645 | INFORMATION MANAGEMENT SYSTEMS AND METHODS - Management of information includes analyzing an inquiry with a language processor, and generating a keyword associated with the inquiry based on the analysis. Further, it may be determined whether an inquiry with the generated keyword previously has been received and stored in a memory and, when it is determined that such an inquiry previously has been received, the inquiry is retrieved from the memory. A recipient of the inquiry is assigned based on an inputted recipient or the keyword, and a recipient of an answer to the inquiry is designated. The inquiry is transmitted to the assigned recipient of the inquiry. An answer to the inquiry is received, the inquiry is marked as answered, and the answer is transmitted to the designated recipient of the answer. The inquiry, the keyword associated with the inquiry, and the answer to the inquiry are stored in the memory. | 04-24-2014 |
20140114646 | CONVERSATION ANALYSIS SYSTEM FOR SOLUTION SCOPING AND POSITIONING - A system receives vocal input from one or more persons, and extracts one or more keywords from the vocal input. The system then generates a query using the one or more keywords, searches a database of products and services using the query, and identifies a product or service as a function of the query. | 04-24-2014 |
20140114647 | SYSTEMS FOR DYNAMICALLY GENERATING AND PRESENTING NARRATIVE CONTENT - In some embodiments, a non-transitory processor-readable medium stores code representing instructions that when executed cause a processor to select a narrative content template based at least in part on a predetermined content type associated with a real-world and/or virtual event. The code further represents instructions that when executed cause the processor to select a narrative tone type. The code further represents instructions that when executed cause the processor to, for each phrase included in an ordered set of phrases associated with the narrative content template, select, based at least in part on the narrative tone type, a phrase variation from a set of phrase variations associated with that phrase, and define, based on the selected phrase variation and at least one datum from a set of data, a narrative content portion associated with the real-world event. The code further represents instructions that when executed cause the processor to output, at a display, the narrative content portion. | 04-24-2014 |
20140114648 | METHOD FOR DETERMINING A SENTIMENT FROM A TEXT - A method for determining a sentiment, including determining, from a text including formatting information related to parts of the text, a sentiment expressed by at least one of the parts, wherein the sentiment is determined automatically using a microprocessor and depends on formatting information related to the at least one of the parts. | 04-24-2014 |
20140114649 | METHOD AND SYSTEM FOR SEMANTIC SEARCHING - A method and system for facilitating a semantic search based on one or more corpuses of natural language texts are provided. One or more corpuses of natural language texts are received including indexed linguistic parameters and semantic structures of lexical units. The linguistic parameters and semantic structures are generated during a preliminary syntactico-semantic analysis. Searching for text fragments satisfying a query in the one or more corpuses is performed. Relevance of the search results is estimated. | 04-24-2014 |
20140122056 | CHATBOT SYSTEM AND METHOD WITH ENHANCED USER COMMUNICATION - A chatbot system and method with enhanced user communication. A termination mark signifies that a chatbot input message sentence is complete. The chatbot system responds to the input message before a plurality of sentences are entered along with the input message sentence. | 05-01-2014 |
20140122057 | TECHNIQUES FOR INPUT METHOD EDITOR LANGUAGE MODELS USING SPATIAL INPUT MODELS - A computer-implemented technique includes receiving, at a computing device including one or more processors, a touch input. The technique includes determining, at the computing device, one or more characters and one or more first probability scores using a spatial model and a position of the touch input with respect to a virtual keyboard displayable at the computing device, the one or more characters being from the virtual keyboard, the one or more first probability scores being associated with the one or more characters, respectively. The technique includes determining, at the computing device, a word based on the one or more characters and the one or more first probability scores using a language model. The technique also includes displaying, at the computing device, the word. | 05-01-2014 |
20140122058 | Automatic Transcription Improvement Through Utilization of Subtractive Transcription Analysis - A mechanism is provided for subtractive transcript improvement. The mechanism identifies a set of corrections made to a previous transcript, where the set of corrections comprise, for each correction in the set of corrections, an erred phrase and a correction made to the erred phrase. For each erred phrase in a set of erred phrases in a current transcript, the mechanism determines whether the erred phrase in the current transcript matches an erred phrase in the set of corrections made to the previous transcript. Responsive to the erred phrase in the current transcript matching an erred phrase in the set of corrections made to the previous transcript, the mechanism corrects the erred phrase in the current transcript with the correction made to the erred phrase in the previous transcript. | 05-01-2014 |
20140122059 | METHOD AND SYSTEM FOR VOICE BASED MEDIA SEARCH - Voice-based input is used to operate a media device and/or to search for media content. Voice input is received by a media device via one or more audio input devices and is translated into a textual representation of the voice input. The textual representation of the voice input is used to search one or more cache mappings between input commands and one or more associated device actions and/or media content queries. One or more natural language processing techniques may be applied to the translated text and the resulting text may be transmitted as a query to a media search service. A media search service returns results comprising one or more content item listings and the results may be presented on a display to a user. | 05-01-2014 |
20140122060 | HYBRID COMPRESSION OF TEXT-TO-SPEECH VOICE DATA - Recorded or synthesized speech segments of text-to-speech (TTS) systems may be compressed through the use of both time domain compression and perceptual compression techniques. The twice-compressed recording may be separated into speech segments corresponding to words or subword units for use in a TTS system. The compression rate of time domain compression, and the ratio of time domain compression to perceptual compression, may be modified for any speech segment. The compression amount or ratio may be determined based on linguistic or acoustic features of the word or subword unit that the speech segment represents. Differing compression amounts and ratios may be applied to portions of a single speech segment. | 05-01-2014 |
20140122061 | REGULAR EXPRESSION WORD VERIFICATION - The present disclosure is directed to a method of verifying a compound word. The method includes receiving an input signal indicative of a textual input and accessing a rule and a lexical data structure from data stores. The rule is applied to the textual input to determine whether the textual input is a valid compound word. An output signal is provided that is indicative of whether the textual input is a compound word. | 05-01-2014 |
20140122062 | AUTOMATIC CONTEXT SENSITIVE LANGUAGE GENERATION, CORRECTION AND ENHANCEMENT USING AN INTERNET CORPUS - A computer-assisted language generation system including sentence retrieval functionality, operative on the basis of an input text containing words, to retrieve from an internet corpus a plurality of sentences containing words which correspond to the words in the input text and sentence generation functionality operative using a plurality of sentences retrieved by the sentence retrieval functionality from the internet corpus to generate at least one correct sentence giving expression to the input text. | 05-01-2014 |
20140129210 | System And Method For Extracting And Reusing Metadata To Analyze Message Content - A system and method for extracting and reusing metadata to analyze messages is provided. A stream of messages is monitored. Those messages with a predetermined message component pointing to a referent are identified. Words that are related to the referent are extracted from each of the messages. A local similarity of the identified messages is determined by comparing the extracted words of each message. A global similarity of the identified messages is determined by combining the extracted words from all the identified messages and by comparing the combined extracted words with extracted words from all messages that include a different referent. A determination is made as to whether one or more of the extracted words from the identified messages are descriptive of the referent based on the local and global comparisons. | 05-08-2014 |
20140129211 | SVO-BASED TAXONOMY-DRIVEN TEXT ANALYTICS - Organizing textual data into statement clusters. Sentences are extracted from textual data and parsed. A verb usage pattern is identified and an SVO triplet is determined. The SVO triplet is compared to a taxonomy associated with the domain of the data and a sentiment is derived. A statement cluster is constructed comprising a higher level SVO triplet sensitive to the taxonomy and verb usage pattern, as well as the derived sentiment. Accordingly, the statement clusters may be organized by grouping. | 05-08-2014 |
20140129212 | Universal Difference Measure - Described herein are methods for finding substantially similar/different sources (files and documents), and estimating similarity or difference between given sources. Similarity and difference may be found across a variety of formats. Sources may be in one or more languages such that similarity and difference may be found across any number and types of languages. A variety of characteristics may be used to arrive at an overall measure of similarity or difference including determining or identifying syntactic roles, semantic roles and semantic classes in reference to sources. | 05-08-2014 |
20140129213 | SVO-BASED TAXONOMY-DRIVEN TEXT ANALYTICS - Organizing textual data into statement clusters. Sentences are extracted from textual data and parsed. A verb usage pattern is identified and an SVO triplet is determined. The SVO triplet is compared to a taxonomy associated with the domain of the data and a sentiment is derived. A statement cluster is constructed comprising a higher level SVO triplet sensitive to the taxonomy and verb usage pattern, as well as the derived sentiment. Accordingly, the statement clusters may be organized by grouping. | 05-08-2014 |
20140136183 | Distributed NLU/NLP - An arrangement and corresponding method are described for distributed natural language processing. A set of local data sources is stored on a mobile device. A local natural language understanding (NLU) match module on the mobile device performs natural language processing of a natural language input with respect to the local data sources to determine one or more local interpretation candidates. A local NLU ranking module on the mobile device processes the local interpretation candidates and one or more remote interpretation candidates from a remote NLU server to determine a final output interpretation corresponding to the natural language input. | 05-15-2014 |
20140136184 | TEXTUAL AMBIGUITY RESOLVER - A textual ambiguity resolver system for disambiguating textual elements in information transferred over a communications network comprising a database; and a disambiguation processor adapted to perform a parsing operation on the transferred information, including an ambiguous mapping extractor module to identify at least one ambiguous textual element in the transferred information and to map said ambiguous textual element to at least one interpretation candidate in an ontology, a lexical resolver module to determine a relationship between said ambiguous textual element and an idiom phrase, a named-entity resolver module to determine a relationship between said ambiguous textual element and a named-entity element, a syntactic resolver module to determine a relationship between said ambiguous textual element and a syntactic compound, and a classification resolver module to determine a relationship between said ambiguous textual element and a linguistic pattern. | 05-15-2014 |
20140136185 | SENTIMENT ANALYSIS BASED ON DEMOGRAPHIC ANALYSIS - A method, apparatus and article of manufacture for analyzing product or service reviews is disclosed. In one embodiment, the method comprises the steps of performing a demographic text analysis on a product or service review generated by a reviewer, wherein the demographic text analysis examines the product or service review to determine demographic information of the reviewer. A sentiment text analysis is performed on the product or service review, wherein the sentiment text analysis examines the product or service review to determine a sentiment of the product or service review. The sentiment of the product or service review is categorized based on the demographic information of the reviewer. | 05-15-2014 |
20140136186 | METHOD AND SYSTEM FOR GENERATING AN ALTERNATIVE AUDIBLE, VISUAL AND/OR TEXTUAL DATA BASED UPON AN ORIGINAL AUDIBLE, VISUAL AND/OR TEXTUAL DATA - A computer implemented method and system for generating an alternative audible, visual and/or textual data based upon an original audible, visual and/or textual data comprising the step of inputting to a processor original audible, visual and/or textual data having an original plot, extracting a plurality of basic segments from the original audible, visual and/or textual data, defining a vocabulary of intermediate-level semantic concepts based on the plurality of basic segments and/or the original plot, inputting to the processor at least an alternative plot based upon the original plot, modifying the alternative plot in terms of the vocabulary of intermediate-level semantic concepts for generating a modified alternative plot, and modifying the plurality of basic segments of the original audible, visual and/or textual data in terms of said vocabulary of intermediate-level semantic concepts for generating a modified plurality of basic segments. | 05-15-2014 |
20140136187 | VEHICLE PERSONAL ASSISTANT - A vehicle personal assistant to engage a user in a conversational dialog about vehicle-related topics, such as those commonly found in a vehicle owner's manual, includes modules to interpret spoken natural language input, search a vehicle knowledge base and/or other data sources for pertinent information, and respond to the user's input in a conversational fashion. The dialog may be initiated by the user or more proactively by the vehicle personal assistant based on events that may be currently happening in relation to the vehicle. The vehicle personal assistant may use real-time inputs obtained from the vehicle and/or non-verbal inputs from the user to enhance its understanding of the dialog and assist the user in a variety of ways. | 05-15-2014 |
20140136188 | NATURAL LANGUAGE PROCESSING SYSTEM AND METHOD - A natural language processing system is disclosed herein. Embodiments of the NLP system perform hand-written rule-based operations that do not rely on a trained corpus. Rules can be added or modified at any time to improve accuracy of the system, and to allow the same system to operate on unstructured plain text from many disparate contexts (e.g. articles as well as Twitter contexts as well as medical articles) without harming accuracy for any one context. Embodiments also include a language decoder (LD) that generates information which is stored in a three-level framework (word, clause, phrase). The LD output is easily leveraged by various software applications to analyze large quantities of text from any source in a more sophisticated and flexible manner than previously possible. A query language (LDQL) for information extraction from NLP parsers' output is disclosed, with emphasis on its embodiment implemented for LD. The use of LDQL for knowledge extraction is also presented, using an application named Knowledge Browser as an example. | 05-15-2014 |
20140136189 | Phrase-Based Dialogue Modeling With Particular Application to Creating a Recognition Grammar - The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). The invention enables phrase-based modeling of generic structures of verbal interaction to be used for the purpose of automating part of the design of such grammar networks. Most particularly, the invention enables such grammar networks to be used in providing a voice-controlled user interface to human readable text data that is also machine-readable (such as a Web page, a word processing document, a PDF document, or a spreadsheet). | 05-15-2014 |
20140142920 | Method and apparatus for Utilizing Structural Information in Semi-Structured Documents to Generate Candidates for Question Answering Systems - An approach to candidate answer generation by leveraging structural information in semi-structured resources, such as the title of a document and anchor texts in a document. | 05-22-2014 |
20140142921 | AUTOMATED STATISTICS CONTENT PREPARATION - Various embodiments are generally directed to automated searching and comparison of game statistics to identify, rank and present statistically significant events related to game play during and/or after a game in automatically generated sentences. An apparatus comprises a processor circuit and storage storing instructions operative on the processor circuit to receive signals conveying a first set of statistical information closely related to play of a first game; search the first set of statistical information for a first set of statistical anomalies; and in response to the first set of statistical anomalies comprising an insufficient number of statistical anomalies, search a second set of statistical information less closely related to play of the first game for a second set of statistical anomalies, and transmit a multitude of sentences describing statistical anomalies of the first and second sets of statistical anomalies to a computing device. Other embodiments are described and claimed herein. | 05-22-2014 |
20140142922 | NLP-BASED ENTITY RECOGNITION AND DISAMBIGUATION - Methods and systems for entity recognition and disambiguation using natural language processing techniques are provided. Example embodiments provide an entity recognition and disambiguation system (ERDS) and process that, based upon input of a text segment, automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text. In at least some embodiments, supplemental or related information that can be used to assist in the recognition and/or disambiguation process can be retrieved from knowledge repositories such as an ontology knowledge base. In one embodiment, the ERDS comprises a linguistic analysis engine, a knowledge analysis engine, and a disambiguation engine that cooperate to identify candidate entities from a knowledge repository and determine which of the candidates best matches the one or more detected entities in a text segment using context information. | 05-22-2014 |
20140142923 | TEXT PREDICTION USING ENVIRONMENT HINTS - Provided are techniques for text prediction using environment hints. A list of words is received, wherein each word in the list of words has an associated weight. For at least one word in the list of words, an environment weight is obtained from an environment dictionary. The associated weight of the at least one word is updated using the obtained environment weight. The words in the list of words are ordered based on the updated, associated weight of each of the words. | 05-22-2014 |
20140142924 | SYSTEM AND METHOD FOR LANGUAGE EXTRACTION AND ENCODING - Improved systems and methods for extracting information from medical and natural-language text data. | 05-22-2014 |
20140149105 | IDENTIFYING PRODUCT REFERENCES IN USER-GENERATED CONTENT - Systems and methods are disclosed herein for extracting products referenced in a document. A document is analyzed to identify a product type that is referenced in the document. Attributes are extracted from the document. A set of candidate products are identified corresponding to the extracted attributes. A score is calculated for the candidate products and the products are further selected or filtered based on the score, whitelist rules, and blacklist rules in order to identify one or more inferred products referenced by the document. The whitelist and blacklist rules may take as inputs a domain, a user identifier, and keywords included in the document. A set of sufficient attributes may be identified for each product type. Selection of a candidate product may be based at least in part on the document including all of the attributes in the set of sufficient attributes. | 05-29-2014 |
20140149106 | Categorization Based on Word Distance - Examples disclosed herein relate to categorizing a target word based on word distance. A processor may determine a difference level threshold for a category based on difference levels between words associated with the category and determine difference levels between a target word and the words associated with the category. If one of the difference levels of the target word is below the threshold associated with the category, the processor outputs the category. | 05-29-2014 |
20140149107 | SYSTEMS AND METHODS FOR NATURAL LANGUAGE GENERATION - A method includes receiving a corpus comprising a set of pre-segmented texts. The method further includes creating a plurality of modified pre-segmented texts for the set of pre-segmented texts by extracting a set of semantic terms for each pre-segmented text within the set of pre-segmented texts and applying at least one domain tag for each pre-segmented text within the set of pre-segmented texts. The method further includes clustering the plurality of modified pre-segmented texts into one or more conceptual units, wherein each of the one or more conceptual units is associated with one or more templates, wherein each of the one or more templates corresponds to one of the plurality of modified pre-segmented texts. | 05-29-2014 |
20140149108 | ADAPTIVE CONSTRUCTION OF A STATISTICAL LANGUAGE MODEL - A statistical language model (SLM) may be iteratively refined by considering N-gram counts in new data, and blending the information contained in the new data with the existing SLM. A first group of documents is evaluated to determine the probabilities associated with the different N-grams observed in the documents. An SLM is constructed based on these probabilities. A second group of documents is then evaluated to determine the probabilities associated with each N-gram in that second group. The existing SLM is then evaluated to determine how well it explains the probabilities in the second group of documents, and a weighting parameter is calculated from that evaluation. Using the weighting parameter, a new SLM is then constructed as a weighted average of the existing SLM and the new probabilities. | 05-29-2014 |
20140149109 | SYSTEM, METHODS AND AUTOMATED TECHNOLOGIES FOR TRANSLATING WORDS INTO MUSIC AND CREATING MUSIC PIECES - Systems, methods and computer program products are provided for translating a natural language into music. Through systematic parsing, music compositions can be created. These compositions can be created by one or more persons who do not speak the same natural language. | 05-29-2014 |
20140149110 | NLP-BASED SYSTEMS AND METHODS FOR PROVIDING QUOTATIONS - Techniques for providing quotations obtained from text documents using natural language processing techniques are described. Some embodiments provide a content recommendation system (“CRS”) configured to provide quotations by extracting quotations from a corpus of text documents, and providing access to the extracted quotations in response to search requests received from users. The CRS may extract quotations by using natural language processing-based techniques to identify one or more entities, such as people, places, objects, concepts, or the like, that are referenced by the extracted quotations. The CRS may then store the extracted quotations along with identified entities, such as quotation speakers and subjects, for later access via search requests. | 05-29-2014 |
20140156259 | Generating Stimuli for Use in Soliciting Grounded Linguistic Information - A processing system is described which generates stimulus information (SI) having one or more stimulus components (SCs) selected from an inventory of such components. The processing system then presents the SI to a group of human recipients, inviting those recipients to provide linguistic descriptions of the SI. The linguistic information that is received thereby has an implicit link to the SCs. Further, each linguistic component is associated with at least one feature of a target environment, such as a target computer system. Hence, the linguistic information also maps to the features of the target environment. These relationships allow applications to use the linguistic information to interact with the target environment in different ways. In one case, the processing system uses a challenge-response authentication task presentation to convey the stimulus information to the recipients. | 06-05-2014 |
20140156260 | GENERATING SENTENCE COMPLETION QUESTIONS - The subject disclosure is directed towards automated processes for generating sentence completion questions based at least in part on a language model. Using the language model, a sentence is located, and alternates for a focus word (or words) in the sentence are automatically provided. Also described are automated filtering of candidate sentences to locate the sentence, filtering the alternates based upon elimination criteria, scoring sentences with the correct word and as modified with the alternates, and ranking the alternates. Manual selection may be used along with the automated processes. | 06-05-2014 |
20140156261 | DETERMINING SIMILARITY OF UNFIELDED NAMES USING FEATURE ASSIGNMENTS - Provided are techniques for comparing names. A first phrase score is obtained by comparing a name phrase in a first name to a name phrase in a second name. A second phrase score is obtained by comparing another name phrase in the first name to another name phrase in the second name. An overall score is generated based on the obtained first phrase score and the obtained second phrase score. The overall score is updated based on comparing features of the first name with features of the second name. | 06-05-2014 |
20140156262 | Systems and Methods for Character String Auto-Suggestion Based on Degree of Difficulty - In one embodiment, a method includes receiving one or more characters of a character string as a user enters the character string into a graphical user interface (GUI) of a computing device. The method also includes determining a degree of difficulty of the user entering the character string into the GUI of the computing device. The method further includes, if the degree of difficulty is at least approximately equal to or exceeds a pre-determined threshold, providing for display to the user an auto-suggestion for completing the character string for the user. | 06-05-2014 |
20140156263 | DETERMINING SIMILARITY OF UNFIELDED NAMES USING FEATURE ASSIGNMENTS - Provided are techniques for comparing names. A first phrase score is obtained by comparing a name phrase in a first name to a name phrase in a second name. A second phrase score is obtained by comparing another name phrase in the first name to another name phrase in the second name. An overall score is generated based on the obtained first phrase score and the obtained second phrase score. The overall score is updated based on comparing features of the first name with features of the second name. | 06-05-2014 |
20140156264 | OPEN LANGUAGE LEARNING FOR INFORMATION EXTRACTION - A system for extracting relational tuples from sentences is provided. The system includes a bootstrapper, an open pattern learner, and a pattern matcher. The bootstrapper generates training data by, for each of a plurality of seed tuples, identifying sentences of a corpus that contain the words of the seed tuple. The open pattern learner learns, from the seed tuples and sentence pairs, open patterns that encode ways in which relational tuples may be expressed in a sentence. The pattern matcher matches the open patterns to a dependency parse of a sentence, identifies base nodes of the dependency parse for the arguments and relation for the relational tuple that the open pattern encodes, and expands the arguments and relation of the relational tuple. | 06-05-2014 |
20140156265 | METHOD AND SYSTEM FOR CONVEYING AN EXAMPLE IN A NATURAL LANGUAGE UNDERSTANDING APPLICATION | 06-05-2014 |
20140156266 | SYSTEM AND METHOD FOR ENHANCING COMPREHENSION AND READABILITY OF TEXT - The present invention is a text display system with speech output that uses a method of text segmentation in which segments of text are presented one after another for reading text sequentially. To indicate the location of text a user is currently reading, the current sentence is emphasized by presenting the surrounding text in faded colors. The current sentence is segmented into phrases where the points of segmentation are chosen by a series of grammatical rules and the desired number of words in each segment. When the text is presented sequentially, each segment is highlighted within the current sentence. With the use of a text-to-speech output system, each segment is spoken out with a pause before the next segment is presented. In a non-linear/selective reading scenario, a user can select a text segment, for which the span of the segment can be automatically generated or manually selected by the user. | 06-05-2014 |
20140163953 | Automatic Dynamic Contextual Data Entry Completion - A method performed in a character entry system involves receiving user input and using a Generalized Lexicographic Ordering (GLO) process to determine an order for presentation of one or more completion candidates to the user for selection. | 06-12-2014 |
20140163954 | COMMUNICATION CONTEXT BASED PREDICTIVE-TEXT SUGGESTION - Disclosed herein are representative embodiments of tools and techniques for determining predicted-text suggestions based on communication contexts. According to one exemplary technique, text that recurs in one or more past communications is determined. The one or more past communications being associated with at least one context attribute. Also, a text entry is stored in a text suggestion dictionary. The text entry comprising the text and metadata associating the text with the at least one context attribute. Additionally, using the text suggestion dictionary, at least one predicted-text suggestion that includes the text is determined for a current communication associated with a communication context. | 06-12-2014 |
20140163955 | System and Method For Extracting Ontological Information From A Body Of Text - A system for extracting ontological information from a body of text includes an input module configured to receive a verb phrase. The system also includes a parsing module configured to parse one or more sentences from the body of text into parse tree format to generate a set of parsed sentences. The system further includes a named-entity-recognition module configured to identify a subset of parsed sentences from the set of parsed sentences, identify a subset of noun phrases from the subset of parsed sentences, classify a first noun phrase in the subset of noun phrases as an entity, and classify a second noun phrase in the subset of noun phrases as a property. The system also includes a concept-extraction module configured to identify and output a conceptual relationship between the first entity and the first property based at least partially on a grammatical relationship of the first entity and the first property. | 06-12-2014 |
20140163956 | MESSAGE COMPOSITION OF MEDIA PORTIONS IN ASSOCIATION WITH CORRELATED TEXT - Disclosed are systems, devices and techniques that generate a set of media clips associated with a set of text inputs for a message called a cinegram. A user, for example, provides a textual word or phrase, or a vocal word or phrase as input, in which an apparatus, in response to the input, associates a set of media clips with the input. The user can modify the sequence of the media clips, modify which media clips are associated with which word or phrase, and/or modify a set of classification criteria for modifying the type of media clips generated for a message. The set of media clips is presented in a defined order and formulates segments (e.g., words, phrases and the like) of the textual overlay in order to present a continuous stream of video comprising multiple different media clips associated with words and/or phrases of the message. | 06-12-2014 |
20140163957 | MULTIMEDIA MESSAGE HAVING PORTIONS OF MEDIA CONTENT BASED ON INTERPRETIVE MEANING - Disclosed are systems, devices and techniques that generate a set of media portions associated with a set of message inputs for a multimedia message based on a meaning of the words or phrases in the message inputs. A semantic component can determine segments of media content according to words or phrases that are different from words or phrases received. The semantic component is configured to determine portions of media content that convey the same meaning as the words or phrases received, and a multimedia message is then generated with the media content portions. | 06-12-2014 |
20140163958 | APPROXIMATE NAMED-ENTITY EXTRACTION - According to one embodiment, approximate named-entity extraction from a dictionary that includes entries is provided, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names. | 06-12-2014 |
20140163959 | Multi-Domain Natural Language Processing Architecture - An arrangement and corresponding method are described for multi-domain natural language processing. Multiple parallel domain pipelines are used for processing a natural language input. Each domain pipeline represents a different specific subject domain of related concepts. Each domain pipeline includes a mention module that processes the natural language input using natural language understanding (NLU) to determine a corresponding list of mentions, and an interpretation generator that receives the list of mentions and produces a rank-ordered domain output set of sentence-level interpretation candidates. A global evidence ranker receives the domain output sets from the domain pipelines and produces an overall rank-ordered final output set of sentence-level interpretations. | 06-12-2014 |
20140163960 | REAL-TIME EMOTION TRACKING SYSTEM - Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A plurality of segments of the audio signal is received, with the plurality of segments being sequential. Each segment of the plurality of segments is analyzed, and, for each segment, an emotional state and a confidence score of the emotional state are determined. The emotional state and the confidence score of each segment are sequentially analyzed, and a current emotional state of the audio signal is tracked throughout each of the plurality of segments. For each segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and the confidence score of the segment. | 06-12-2014 |
20140163961 | System and Method for Predicting Customer Satisfaction - A system includes a memory and a processor communicatively coupled to the memory. The memory stores interaction data associated with an interaction between a customer and an associate of an entity. The processor is operable to determine, from the interaction data, one or more keywords in the interaction between the customer and the associate, determine an order of the one or more keywords, and determine a grouping of the one or more keywords. The processor determines, based on the determined keywords, order, and grouping, a perception of the entity by the customer, the determination of the perception of the entity occurring in real-time after the interaction between the customer and the associate. | 06-12-2014 |
20140163962 | DEEP ANALYSIS OF NATURAL LANGUAGE QUESTIONS FOR QUESTION ANSWERING SYSTEM - Creating training data for a natural language processing system may comprise obtaining natural language input, the natural language input annotated with one or more important phrases; and generating training instances comprising a syntactic parse tree of nodes representing elements of the natural language input augmented with the annotated important phrases. In another aspect, a classifier may be trained based on the generated training instances. The classifier may be used to predict one or more potential important phrases in a query. | 06-12-2014 |
20140163963 | Methods and Systems for Automated Text Correction - The present embodiments demonstrate systems and methods for automated text correction. In certain embodiments, the methods and systems may be implemented through analysis according to a single text correction model. In a particular embodiment, the single text correction model may be generated through analysis of both a corpus of learner text and a corpus of non-learner text. | 06-12-2014 |
20140163964 | APPROXIMATE NAMED-ENTITY EXTRACTION - According to one embodiment, a method is provided for approximate named-entity extraction from a dictionary that includes entries, where each of the entries includes one or more words. Words are read from the entries of the dictionary, and network resources are searched to determine a frequency of occurrence of the words on the network resources. In view of the frequency of occurrence of the words located on the network resources, domain relevancy of the words in the entries of the dictionary is determined. A domain repository is created using top-ranked words as determined by the domain relevancy of the words. In view of the domain repository, signatures for both the entries of the dictionary and strings of an input document are computed. The strings of the input document are filtered by comparing the signatures of the strings against the signatures of the entries to identify approximate-match entity names. | 06-12-2014 |
20140163965 | Method of and System for Using Conversation State Information in a Conversational Interaction System - A method of using conversation state information in a conversational interaction system is disclosed. A method of inferring a change of a conversation session during continuous user interaction with an interactive content providing system includes receiving input from the user including linguistic elements intended by the user to identify an item, associating a linguistic element of the input with a first conversation session, and providing a response based on the input. The method also includes receiving additional input from the user and inferring whether or not the additional input from the user is related to the linguistic element associated with the conversation session. If related, the method provides a response based on the additional input and the linguistic element associated with the first conversation session. Otherwise, the method provides a response based on the second input without regard for the linguistic element associated with the first conversation session. | 06-12-2014 |
20140163966 | IDENTIFYING GLOSSARY TERMS FROM NATURAL LANGUAGE TEXT DOCUMENTS - A device may obtain text to be analyzed to identify glossary terms. The device may analyze a linguistic unit to generate multiple linguistic units related to the linguistic unit. The device may analyze the multiple linguistic units to generate potential glossary terms. The device may perform a glossary term analysis on the potential glossary terms to generate glossary terms that include a subset of the potential glossary terms. The device may identify included terms that are included in the glossary terms. The device may identify excluded terms that are excluded from the glossary terms. The device may determine a semantic relatedness score between at least one excluded term and at least one included term. The device may selectively add the excluded linguistic term to the glossary terms to form a final set of glossary terms based on the semantic relatedness score, and may output the final set of glossary terms. | 06-12-2014 |
20140163967 | VERIFYING THE TERMS OF USE FOR ACCESS TO A SERVICE - Provided are techniques in which a document accompanying a service is acquired, a natural language analysis is performed on the acquired document, a determination is made from the results of the natural language analysis whether an item defined in the access control policy is found in the acquired document and, when the item defined in the access control policy is found in the acquired document, the access control policy is referenced and access to the service controlled accordingly. | 06-12-2014 |
20140163968 | Phrase-Based Dialogue Modeling with Particular Application to creating Recognition Grammars for Voice-Controlled User Interfaces - The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). More specifically, the invention concerns a phrase-based modeling of generic structures of verbal interaction and use of these models for the purpose of automating part of the design of such grammar networks. | 06-12-2014 |
20140163969 | METHOD AND SYSTEM FOR DIFFERENTIATING TEXTUAL INFORMATION EMBEDDED IN STREAMING NEWS VIDEO - The application provides a method and system for differentiating textual information embedded in a streaming news video. The application enables a method and system for differentiating textual information embedded in a streaming news video for simplified indexing and annotation of the said news video. | 06-12-2014 |
20140172412 | ACTION BROKER - Among other things, one or more techniques and/or systems are provided for building an action catalogue, generating an action frame for an action within the action catalogue, and/or executing an action. In an example, an action may be included within the action catalogue based upon descriptive text associated with an application indicating that the application is capable of performing the action (e.g., a movie app may be capable of performing an order movie tickets action). A parameter (e.g., a movie name) and/or an execution endpoint (e.g., a uniform resource identifier used to access movie ticket ordering functionality) may be used to generate an action frame for the action. In this way, user intent to perform an action may be identified from user input (e.g., a spoken command), and the action may be performed (e.g., on behalf of the user with minimal additional user input) by using the action frame. | 06-19-2014 |
20140172413 | SHORT PHRASE LANGUAGE IDENTIFICATION - A computer receives a short phrase. The short phrase is transmitted in a query to a search engine. The computer receives one or more search results from the search engine in response to the query, and parses one or more longer phrases that include the short phrase from each of the one or more search results. The computer transmits the one or more longer phrases to a language identification engine for identification of the language of the one or more longer phrases, and receives from the language identification engine the language of each of the one or more longer phrases. The computer then determines the most likely language of the short phrase, based at least in part on the language of each of the one or more longer phrases. | 06-19-2014 |
20140172414 | SYSTEM SUPPORT FOR EVALUATION CONSISTENCY - A system and computer product for validating the consistency between quantitative and natural language textual evaluations. An example method involves computing a numeric score for a textual evaluation, comparing the numeric score to a quantitative evaluation, and producing a rating based on the similarity of the two evaluations. | 06-19-2014 |
20140172415 | APPARATUS, SYSTEM, AND METHOD OF PROVIDING SENTIMENT ANALYSIS RESULT BASED ON TEXT - Disclosed are an apparatus, a system, and a method of providing a sentiment analysis result based on a text. An apparatus for providing a sentiment analysis result based on a text according to the present invention includes: an input unit configured to receive a keyword for a target for which a sentiment is desired to be analyzed from a user; a control unit configured to request a sentiment analysis for the received keyword to a service server and receive a sentiment analysis result as a result of the request; a display unit configured to display an attribute for the target according to the received sentiment analysis result, and display a text corresponding to an attribute value for each displayed attribute; and a storage unit configured to store the received sentiment analysis result. | 06-19-2014 |
20140172416 | SYSTEM SUPPORT FOR EVALUATION CONSISTENCY - A system and computer product for validating the consistency between quantitative and natural language textual evaluations. An example method involves computing a numeric score for a textual evaluation, comparing the numeric score to a quantitative evaluation, and producing a rating based on the similarity of the two evaluations. | 06-19-2014 |
20140172417 | VITAL TEXT ANALYTICS SYSTEM FOR THE ENHANCEMENT OF REQUIREMENTS ENGINEERING DOCUMENTS AND OTHER DOCUMENTS - A Vital Text Analytics System (VTAS), incorporating a repository of enterprise terms or concepts, is one that improves the readability and fidelity of technical specifications, instructions, training manuals, requirements engineering documents and other related engineering documents, typically from a single organization or workgroup. The system stresses ontological analysis of a corpus of related documents, and applies a suite of computational tools that supports the identification and assessment of risk in evaluating the content of the documents, as well as providing statistical measures reflecting the frequency and severity of document features that threaten comprehension. | 06-19-2014 |
20140180672 | METHOD AND APPARATUS FOR CONDUCTING CONTEXT SENSITIVE SEARCH WITH INTELLIGENT USER INTERACTION FROM WITHIN A MEDIA EXPERIENCE - In some embodiments, the invention involves a context based search engine using a user selected term within a media experience. A natural language processor module is configured to provide context based keywords related to the search term and from within the media experience. In some embodiments, a proximity based statistical analysis is used to derive the keywords. The keywords are provided to at least one content browser or other search engine(s) to effect the search. In some embodiments, a machine learning module is communicatively coupled to the natural language processor to further refine the context for selecting relevant keywords. The context search engine, natural language processor module, machine learning module and search engine may reside on the same computing device or be distributed among a variety of local, remote and cloud devices for processing. Other embodiments are described and claimed. | 06-26-2014 |
20140180673 | Audio Processing Techniques for Semantic Audio Recognition and Report Generation - System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. Extracted audio features that are most similar to one or more templates from the comparison are identified according to the tagged information. The tags are used to determine the semantic audio data that includes genre, instrumentation, style, acoustical dynamics, and emotive descriptor for the audio signal. | 06-26-2014 |
20140180674 | AUDIO MATCHING WITH SEMANTIC AUDIO RECOGNITION AND REPORT GENERATION - System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. The semantic information may be associated with audio signature data. | 06-26-2014 |
20140180675 | Audio Decoding with Supplemental Semantic Audio Recognition and Report Generation - System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. The semantic information may be associated with audio codes to determine changing characteristics of identified media during a time period. | 06-26-2014 |
20140180676 | NAMED ENTITY VARIATIONS FOR MULTIMODAL UNDERSTANDING SYSTEMS - Click logs are automatically mined to assist in discovering candidate variations for named entities. The named entities may be obtained from one or more sources and include an initial list of named entities. A search may be performed within one or more search engines to determine common phrases that are used to identify the named entity in addition to the named entity initially included in the named entity list. Click logs associated with results of past searches are automatically mined to discover what phrases determined from the searches are candidate variations for the named entity. The candidate variations are scored to assist in determining the variations to include within an understanding model. The variations may also be used when delivering responses and displayed output in the SLU system. For example, instead of using the listed named entity, a popular and/or shortened name may be used by the system. | 06-26-2014 |
20140180677 | Analogy Finder - A user provides a query that includes at least two of a subject, a predicate, and an object. A computer system identifies synonyms of one or more of the subject, predicate, and object, and forms new queries from the identified synonyms. The system searches a dataset using the new queries, and possibly also using the user-provided query, to produce search results. The system may process the search results, such as by filtering and/or sorting them. The system provides output representing the search results to the user. The user may use the search result output to identify answers that are analogous to answers to the query originally provided by the user. | 06-26-2014 |
20140188456 | Dictionary Markup System and Method - A method for providing the appropriate meaning of an entry in a text is described. The method includes the steps of determining if there are alternative meanings of the entry in an electronic dictionary and if there are alternative meanings determining the dictionary markup theme associated with each of the alternative meanings of the entry. Also, the theme associated with the text is determined. For a hierarchical structure associated with themes of entries in the electronic dictionary, the distance between the theme of the text with the dictionary markup theme of the alternative meanings of the entry is compared. Based on the distance between the theme of the text and the dictionary markup theme of the alternative meanings of the entry, the appropriate meaning is selected. | 07-03-2014 |
20140188457 | REAL-TIME SENTIMENT ANALYSIS FOR SYNCHRONOUS COMMUNICATION - A lexical annotator that identifies a chunk of a communication and an associated sentiment is created. In real time, while monitoring a communication from a user, the lexical annotator is used to identify the sentiment for the chunk of the communication, and the sentiment for the chunk of the communication is provided. | 07-03-2014 |
20140188458 | SYSTEM AND METHOD FOR DATA ENTRY BY ASSOCIATING STRUCTURED TEXTUAL CONTEXT TO IMAGES - A system for data entry by associating structured textual context to images comprising a computer apparatus having a display device to facilitate interaction with a user. The system has electronics, data, tokens, grammatical rules, and an interface. The data is comprised of records, images, and template images, the template images having hotspots and sub-template images. The hotspots have a selection, and the sub-template images have sub-template hotspots, with the sub-template hotspots having a selection. The selections are preferably associated with diagnostic images, of which there are both general and template-specific. | 07-03-2014 |
20140188459 | INTERACTIVE DASHBOARD BASED ON REAL-TIME SENTIMENT ANALYSIS FOR SYNCHRONOUS COMMUNICATION - Provided is a technique for providing an interactive dashboard based on real-time sentiment analysis for synchronous communication. In real time, while monitoring communications between a first user and a second user, a cumulative sentiment score is generated representing a sentiment of the first user during a period of time and an instantaneous sentiment score representing a sentiment of the first user at a current time. In real time, an interactive dashboard is displayed with a visual representation for the first user having a first indicator representing the cumulative sentiment score and a second indicator representing the instantaneous sentiment score. | 07-03-2014 |
20140188460 | FEATURE-BASED AUTOCORRECTION - A computing device is described that outputs for display at a presence-sensitive screen, a graphical keyboard having keys. The computing device receives an indication of a selection of one or more of the keys. Based on the selection the computing device determines a character string from which the computing device determines one or more candidate words. Based at least in part on the candidate words and a plurality of features, the computing device determines a spelling probability that the character string represents an incorrect spelling of at least one candidate word. The plurality of features includes a spatial model probability associated with at least one of the candidate words. If the spelling probability satisfies a threshold, the computing device outputs for display the at least one candidate word. | 07-03-2014 |
20140188461 | OPTIMIZED CLOUD COMPUTING FACT CHECKING - A fact checking system is able to verify the correctness of information and/or characterize information by comparing the information with one or more sources. The fact checking system automatically monitors, processes, fact checks information and indicates a status of the information. Fact checking results are able to be validated by re-fact checking the fact check results. | 07-03-2014 |
20140188462 | Methods and Systems for Applications for Z-numbers - Specification covers new algorithms, methods, and systems for artificial intelligence, soft computing, and deep learning/recognition, e.g., image recognition (e.g., for action, gesture, emotion, expression, biometrics, fingerprint, facial, OCR (text), background, relationship, position, pattern, and object), large number of images (“Big Data”) analytics, machine learning, training schemes, crowd-sourcing (using experts or humans), feature space, clustering, classification, similarity measures, optimization, search engine, ranking, question-answering system, soft (fuzzy or unsharp) boundaries/impreciseness/ambiguities/fuzziness in language, Natural Language Processing (NLP), Computing-with-Words (CWW), parsing, machine translation, sound and speech recognition, video search and analysis (e.g. tracking), image annotation, geometrical abstraction, image correction, semantic web, context analysis, data reliability (e.g., using Z-number (e.g., “About 45 minutes; Very sure”)), rules engine, control system, autonomous vehicle, self-diagnosis and self-repair robots, system diagnosis, medical diagnosis, biomedicine, data mining, event prediction, financial forecasting, economics, risk assessment, e-mail management, database management, indexing and join operation, memory management, and data compression. | 07-03-2014 |
20140195221 | Utilizing semantic analysis to determine how to measure affective response - A semantic analyzer receives a segment of content, analyzes it utilizing semantic analysis, and outputs an indication regarding whether a value related to a predicted emotional response to the segment reaches a predetermined threshold. Based on the indication, a controller selects a measuring rate, from amongst at least first and second measuring rates, at which a device is to take measurements of affective response of a user to the segment. The first rate may be selected when the value does not reach the predetermined threshold, while the second rate may be selected when the value does reach it. The device takes significantly fewer measurements while operating at the first measuring rate, compared to the number of measurements it takes while operating at the second measuring rate. | 07-10-2014 |
20140200879 | Method and System for Rating Food Items - A networked computer system for calculating a rating of a food. The computer system includes a database comprising a plurality of records and a personal computer system. Each record in the database includes a natural language description of a respective food and a rating of the respective food. The database further comprises an index comprising indexed natural language descriptions corresponding to the natural language descriptions in the records. The personal computer system includes an input module configured for receiving a request to calculate a rating for a requested food item and for extracting inputted natural language descriptions of the requested food item in the request. Also included are an input processing module configured for identifying tokens in the inputted natural language descriptions, a query module configured for querying the database, and a rating module configured for calculating the rating for the requested food item based on the query result. | 07-17-2014 |
20140200880 | Patent Analyzing System - A patent analyzing system for efficiently reviewing and analyzing a patent document (e.g. patent application, published patent document or patent). The patent analyzing system includes providing a patent document, wherein said patent document includes text data having a claims section, identifying a first element name within a first claim in said claims section, and emphasizing said first element name within said first claim. | 07-17-2014 |
20140207440 | LANGUAGE RECOGNITION BASED ON VOCABULARY LISTS - A method is implemented at a computer to determine that certain information content is composed or compiled in a specific language selected among two or more similar languages. The computer integrates a first vocabulary list of a first language and a second vocabulary list of a second language into a comprehensive vocabulary list. The integrating includes analyzing the first vocabulary list in view of the second vocabulary list to identify a first vocabulary sub-list that is used in the first language, but not in the second language. The computer then identifies, in the information content, a plurality of expressions that are included in the comprehensive vocabulary list, and a subset of expressions that are included in the first vocabulary sub-list. Upon a determination that a total frequency of occurrence of the subset of expressions meets predetermined occurrence criteria, the computer determines that the information content is composed in the first language. | 07-24-2014 |
20140207441 | Semantic Clustering And User Interfaces - Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, for example to prioritize topics whose handling by the conversational agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on. | 07-24-2014 |
20140214402 | IMPLEMENTATION OF UNSUPERVISED TOPIC SEGMENTATION IN A DATA COMMUNICATIONS ENVIRONMENT - A method is provided in one example embodiment and includes extracting sentences from data, which comprises a speech transcript; tokenizing the plurality of sentences to develop for each of the plurality of sentences a sentence vector and at least one feature vector; and performing topic segmentation on the speech transcript using the sentence vectors and feature vectors, the topic segmentation resulting in a listing of segments corresponding to the speech transcript. In certain embodiments, the feature vector may be at least one of a cue word feature vector, a speaker change feature vector, and a scene change feature vector. | 07-31-2014 |
20140214403 | SYSTEM AND METHOD FOR IMPROVING VOICE COMMUNICATION OVER A NETWORK - Systems and methods for improving communication over a network are provided. A system for improving communication over a network, comprises a detection module capable of detecting data indicating a problem with a communication between at least two participants communicating via communication devices over the network, a management module capable of analyzing the data to determine whether a participant is dissatisfied with the communication, wherein the management module includes a determining module capable of determining that the participant is dissatisfied, and identifying an event causing the dissatisfaction, and a resolution module capable of providing a solution for eliminating the problem. | 07-31-2014 |
20140214404 | IDENTIFYING TASKS AND COMMITMENTS - An example of identifying tasks and commitments can include receiving a communication message. A task and a parameter can be identified in the communication message. Information related to the task can be extracted from the communication message using natural language processing (NLP) and machine learning (ML). A commitment related to the task can be identified using NLP extracted information. A state of the commitment can be identified using NLP and ML based on the extracted information. | 07-31-2014 |
20140214405 | CHARACTER AND WORD LEVEL LANGUAGE MODELS FOR OUT-OF-VOCABULARY TEXT INPUT - A computing device determines, based at least in part on indications of user input, scores for a first set of candidate strings and a second set of candidate strings. Each candidate string from the first set of candidate strings is in a lexicon. Candidate strings from the second set of candidate strings are not necessarily in the lexicon. The computing device determines the scores for the first set of candidate strings based on probabilities of the candidate strings being entered. For each candidate string from the second set of candidate strings, the computing device determines the scores for the candidate string based on probabilities of characters of the candidate string being entered. The computing device selects a candidate string based on the scores for the first and second sets of candidate strings and outputs, for display at the display device, the selected candidate string. | 07-31-2014 |
20140214406 | METHOD AND SYSTEM OF ADDING PUNCTUATION AND ESTABLISHING LANGUAGE MODEL - A method of processing information content based on a language model is performed at a computer, the method including the following steps: identifying a plurality of expressions in the information content that is queued to be processed; dividing the plurality of expressions into a plurality of characteristic units according to semantic features and predetermined characteristics associated with each of the plurality of characteristic units, each characteristic unit including a subset of the plurality of expressions and the predetermined characteristics at least including a respective integer number of expressions that are included in the characteristic unit; extracting, from the language model, a plurality of probabilities for a plurality of punctuation marks associated with each of the plurality of characteristic units; and in accordance with the extracted probabilities, associating a respective punctuation mark with each of the plurality of characteristic units included in the information content. | 07-31-2014 |
20140214407 | SYSTEM AND METHOD FOR KEYWORD SPOTTING USING REPRESENTATIVE DICTIONARY - Methods and systems for keyword spotting, i.e., for identifying textual phrases of interest in input data. In the embodiments described herein, the input data comprises communication packets exchanged in a communication network. The disclosed keyword spotting techniques can be used, for example, in applications such as Data Leakage Prevention (DLP), Intrusion Detection Systems (IDS) or Intrusion Prevention Systems (IPS), and spam e-mail detection. A keyword spotting system holds a dictionary of textual phrases for searching input data. In a communication analytics system, for example, the dictionary defines textual phrases to be located in communication packets—such as e-mail addresses or Uniform Resource Locators (URLs). | 07-31-2014 |
20140214408 | SENTIMENT ANALYSIS BASED ON DEMOGRAPHIC ANALYSIS - A method, apparatus and article of manufacture for analyzing product or service reviews is disclosed. In one embodiment, the method comprises the steps of performing a demographic text analysis on a product or service review generated by a reviewer, wherein the demographic text analysis examines the product or service review to determine demographic information of the reviewer. A sentiment text analysis is performed on the product or service review, wherein the sentiment text analysis examines the product or service review to determine a sentiment of the product or service review. The sentiment of the product or service review is categorized based on the demographic information of the reviewer. | 07-31-2014 |
20140214409 | Systems and Methods for Identifying and Suggesting Emoticons - Various embodiments provide a method that comprises receiving a set of segments from a text field, analyzing the set of segments to determine at least one of a target subtext or a target meaning associated with the set of segments, and identifying a set of candidate emoticons where each candidate emoticon in the set of candidate emoticons has an association between the candidate emoticon and at least one of the target subtext or the target meaning. The method may further comprise presenting the set of candidate emoticons for entry selection at a current position of an input cursor, receiving an entry selection for a set of selected emoticons from the set of candidate emoticons, and inserting the set of selected emoticons into the text field at the current position of the input cursor. | 07-31-2014 |
20140222416 | SEARCHING AND MATCHING OF DATA - Described herein is a technology for facilitating searching and matching of data. In accordance with one implementation, first and second feature sets are extracted. The first feature set is associated with an input data string including one or more first ideographic elements, while the second feature set is associated with a candidate string including one or more second ideographic elements. A match score of the candidate string is determined based on the first and second feature sets. | 08-07-2014 |
20140222417 | METHOD AND DEVICE FOR ACOUSTIC LANGUAGE MODEL TRAINING - A method and a device for training an acoustic language model, include: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels; using the first word segmentation data containing word class labels to train a first language model containing word class labels; using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model. | 08-07-2014 |
20140236569 | Disambiguation of Dependent Referring Expression in Natural Language Processing - A system and computer program product for disambiguation of dependent referring expressions in natural language processing are provided in the illustrative embodiments. A portion of a document in a set of documents is selected, the portion including a set of dependent referring expression instances. The portion is filtered to identify an instance from the set of dependent referring expression instances by using a linguistic characteristic of the instance, the instance of dependent referring expression referring to a full expression occurring in the set of documents. The full expression is located in one member document in the set of documents by locating where the dependent referring expression is defined to be a stand-in for the full expression. The instance is resolved using the full expression such that information about the full expression is available at a location of the instance. | 08-21-2014 |
20140236570 | EXPLOITING THE SEMANTIC WEB FOR UNSUPERVISED SPOKEN LANGUAGE UNDERSTANDING - An unsupervised training approach for Spoken Language Understanding (SLU) systems uses the structure of content sources (e.g. semantic knowledge graphs, relational databases, . . . ) to automatically specify a semantic representation for SLU. The semantic representation is used when creating entity-relation patterns that are used to mine natural language (NL) examples (e.g. NL surface forms from the web and search query click logs). The structure of the content source (e.g. semantic graph) is enriched with the mined NL examples. The NL examples and patterns may be used to automatically train SLU systems in an unsupervised manner that covers the knowledge represented in the structured content. | 08-21-2014 |
20140236571 | Inducing and Applying a Subject-Targeted Context Free Grammar - A processing system is described which induces a context free grammar (CFG) based on a set of descriptions. The descriptions pertain to a particular subject. Thus, the CFG targets the particular subject, and is accordingly referred to as a subject-targeted context free grammar (ST-CFG). The processing system can use the ST-CFG to determine whether a new description is a proper description of the subject. The processing system also provides synthesizing functionality for building an ST-CFG based on one or more smaller component ST-CFGs. | 08-21-2014 |
20140236572 | System Apparatus Circuit Method and Associated Computer Executable Code for Natural Language Understanding and Semantic Content Discovery - Disclosed are systems, apparatuses, circuits and methods for extrapolating meaning from vocalized speech or otherwise obtained text. Speech of a speaking user is sampled and digitized, the digitized speech is converted into a text stream, the text stream derived from speech or otherwise obtained is analyzed syntactically and semantically, a knowledgebase in the specific context domain of the text stream is utilized to construct one or more semantic/syntactic domain specific query analysis constraints/rule-sets, and a “Domain Specific Knowledgebase Query” (DSKQ) or set of queries is built at least partially based on the domain specific query analysis constraints/rule-sets. | 08-21-2014 |
20140236573 | Automatic Semantic Rating and Abstraction of Literature - Deep semantic analysis is performed on an electronic literary work in order to detect plot elements and optional other storyline elements such as characters within the work. Multiple levels of abstraction are generated into a model representing the literary work, wherein each element in each abstraction level may be independently rated for preference by a user. Through comparison of multiple abstraction models and one or more user rating preferences, one or more alternative literary works may be automatically recommended to the user. | 08-21-2014 |
20140236574 | AUTOMATED CONTEXT-BASED UNIQUE LETTER GENERATION - The automated generation of a unique letter or unique letters using one or more context variables for the letter. The contextual variables may represent author characteristics, audience characteristics, tone, word diversification, letter type, and so forth. Different entropy may be used for each letter to thereby generate a unique letter even if the context for the letters is the same. Nevertheless, each unique letter is suitable for the given context. If desired, the automatically generated letter may be further edited, for example, for grammatical, word choice, or legal content. Thus, the letter may appear to be custom drafted by a human for the context, whereas the letter was entirely or substantially computer-generated. | 08-21-2014 |
20140236575 | EXPLOITING THE SEMANTIC WEB FOR UNSUPERVISED NATURAL LANGUAGE SEMANTIC PARSING - Structured web pages are accessed and parsed to obtain implicit annotation for natural language understanding tasks. Search queries that hit these structured web pages are automatically mined for information that is used to semantically annotate the queries. The automatically annotated queries may be used for automatically building statistical unsupervised slot filling models without using a semantic annotation guideline. For example, tags that are located on a structured web page that are associated with the search query may be used to annotate the query. The mined search queries may be filtered to create a set of queries that is in a form of a natural language query and/or remove queries that are difficult to parse. A natural language model may be trained using the resulting mined queries. Some queries may be set aside for testing and the model may be adapted using in-domain sentences that are not annotated. The models may be tested using these implicitly annotated natural-language-like queries in an unsupervised fashion. | 08-21-2014 |
20140236576 | Automatic Text Skimming Using Lexical Chains - Automatic text skimming using lexical chains may be provided. First, at least one lexical chain may be created from an electronic document. Next, a list of positions within the electronic document may be created. The positions may include where at least one concept represented by one of the at least one lexical chain is mentioned. In addition, a list of the position where the at least one concept is mentioned may be assembled. A selection of at least one concept may be received from the list. | 08-21-2014 |
20140236577 | Semantic Representations of Rare Words in a Neural Probabilistic Language Model - Systems and methods are disclosed for representing a word by extracting n dimensions for the word from an original language model; if the word has been previously processed, using the values previously chosen to define an (n+m) dimensional vector, and otherwise randomly selecting m values to define the (n+m) dimensional vector; and applying the (n+m) dimensional vector to represent words that are not well-represented in the language model. | 08-21-2014 |
20140236578 | Question-Answering by Recursive Parse Tree Descent - Systems and methods are disclosed to answer free form questions using a recursive neural network (RNN) by defining feature representations at every node of the parse trees of questions and supporting sentences, when applied recursively, starting with token vectors from a neural probabilistic language model; and extracting answers to arbitrary natural language questions from supporting sentences. | 08-21-2014 |
20140236579 | Method and Device for Performing Natural Language Searches - A digital device and a method for parsing a query, in particular a natural language query, and retrieving results from possibly multiple data sources such as relational databases or the Semantic Web. The method includes a parsing procedure for generating a graph-based logical representation of the query using semantically structured resources, consisting of a tokenizer, a node generator, a relationship generator, and a focus identificator. The digital device realizes a modularized architecture, consisting of a parser enabling the processing of a query with possibly multiple vocabularies, a query performer retrieving data of knowledge sources independently from their database management system, and a result processor merging the results. | 08-21-2014 |
20140236580 | TEXT SEGMENTATION AND LABEL ASSIGNMENT WITH USER INTERACTION BY MEANS OF TOPIC SPECIFIC LANGUAGE MODELS, AND TOPIC-SPECIFIC LABEL STATISTICS - The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user being capable to select alternative segmentations and alternative labels as well as to enter a user defined segmentation and user defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated incorporating the re-segmentation and re-labeling of successive parts of the document or the entire document. | 08-21-2014 |
20140244239 | IDENTIFYING WORDS FOR A CONTEXT - For identifying words for a context, a monitor module monitors first communications at a digital processing system and determines usage frequencies of a plurality of words in one or more contexts. An identity module identifies a first word in response to a usage frequency for the first word exceeding a use threshold. | 08-28-2014 |
20140244240 | Determining Explanatoriness of a Segment - A technique may include generating a segment from a sentence using a probabilistic model or structure. The probabilistic model/structure may be based on a Hidden Markov Model (HMM). The technique may further include determining an explanatoriness score of the segment using the probabilistic model/structure. | 08-28-2014 |
20140244241 | AUTOMATED CLASSIFICATION OF BUSINESS RULES FROM TEXT - The present subject matter relates to an automated classification of business rules. In one embodiment, a method for automated classification of the business rules comprises identifying a business rule from a text document, wherein the business rule comprises one or more rule intents. Further, the method comprises comparing the one or more rule intents in the business rule with rule intents associated with a plurality of rule types in a rule repository. Furthermore, the method comprises classifying the business rule under at least one of the rule types based on the comparison. | 08-28-2014 |
20140244242 | CONTEXT BASED DOCUMENT ANALYSIS - A method, computer program product, and computer system for identifying, by a computing device, content in a document, wherein the content includes a language expression. A context of the language expression is determined from a defined range of the content in the document. An action item associated with the language expression is generated based upon, at least in part, the context of the language expression. | 08-28-2014 |
20140244243 | APPARATUS AND METHOD FOR PROVIDING INPUT PREDICTION SERVICE - A method for providing an input prediction service in a first mobile terminal includes receiving a message, from a second mobile terminal, at the first mobile terminal using a first application program, extracting specific text included in the received message, and identifying prediction information based on the extracted text. An apparatus to provide an input prediction service in a first mobile terminal includes a first application program to receive a message from a second mobile terminal, and a recognition unit to extract specific text included in the received message and to identify prediction information based on the extracted text. | 08-28-2014 |
20140249799 | RELATIONAL SIMILARITY MEASUREMENT - Relational similarity measuring embodiments are presented that generally involve creating a relational similarity model that, given two pairs of words, is used to measure a degree of relational similarity between the two relations respectively exhibited by these word pairs. In one exemplary embodiment this involves creating a combined relational similarity model from a plurality of relational similarity models. This is generally accomplished by first selecting a plurality of relational similarity models, each of which measures relational similarity between two pairs of words, and each of which is trained or created using a different method or linguistic/textual resource. The selected models are then combined to form the combined relational similarity model. The combined model inputs two pairs of words and outputs a relational similarity indicator representing a measure of the degree of relational similarity between the word pairs. | 09-04-2014 |
20140249800 | LANGUAGE PROCESSING METHOD AND ELECTRONIC DEVICE - A language processing method is provided comprising forming a feature from at least one word from an input sequence of words; generating an address of a memory cell storing a weight for the feature based on a hash function using the feature as argument; retrieving the weight for the feature from the memory cell with the address; and generating a dependency tree for the input sequence based on the weight and a second order dependency parsing algorithm. A corresponding electronic device is provided as well. | 09-04-2014 |
20140249801 | SYSTEMS AND METHODS FOR IMPROVING THE EFFICIENCY OF SYNTACTIC AND SEMANTIC ANALYSIS IN AUTOMATED PROCESSES FOR NATURAL LANGUAGE UNDERSTANDING - A natural language understanding system may be given the capability to construct a semantically detailed parse tree for each acceptable interpretation of an input natural language expression (or fewer such parse trees than interpretations) by independently solving sub-trees corresponding to various series of post nominal modifiers and associating those partial solutions with corresponding nodes in the overall parse tree. The argument order in predicate calculus atomic formulas may be standardized in a manner that supports the use of a chart parser applied to a head-driven phrase structure grammar and that permits a simplified more tractable grammar that in turn can be used as a domain general semantic grammar. | 09-04-2014 |
20140249802 | SYSTEMS AND METHODS FOR IMPROVING THE EFFICIENCY OF SYNTACTIC AND SEMANTIC ANALYSIS IN AUTOMATED PROCESSES FOR NATURAL LANGUAGE UNDERSTANDING USING ARGUMENT ORDERING - A natural language understanding system may be given the capability to construct a semantically detailed parse tree for each acceptable interpretation of an input natural language expression (or fewer such parse trees than interpretations) by independently solving sub-trees corresponding to various series of post nominal modifiers and associating those partial solutions with corresponding nodes in the overall parse tree. The argument order in predicate calculus atomic formulas may be standardized in a manner that supports the use of a chart parser applied to a head-driven phrase structure grammar and that permits a simplified more tractable grammar that in turn can be used as a domain general semantic grammar. | 09-04-2014 |
20140249803 | SYSTEMS AND METHODS FOR IMPROVING THE EFFICIENCY OF SYNTACTIC AND SEMANTIC ANALYSIS IN AUTOMATED PROCESSES FOR NATURAL LANGUAGE UNDERSTANDING USING TRAVELING FEATURES - A natural language understanding system may be given the capability to construct a semantically detailed parse tree for each acceptable interpretation of an input natural language expression (or fewer such parse trees than interpretations) by independently solving sub-trees corresponding to various series of post nominal modifiers and associating those partial solutions with corresponding nodes in the overall parse tree. The argument order in predicate calculus atomic formulas may be standardized in a manner that supports the use of a chart parser applied to a head-driven phrase structure grammar and that permits a simplified more tractable grammar that in turn can be used as a domain general semantic grammar. | 09-04-2014 |
20140249804 | SYSTEMS AND METHODS FOR IMPROVING THE EFFICIENCY OF SYNTACTIC AND SEMANTIC ANALYSIS IN AUTOMATED PROCESSES FOR NATURAL LANGUAGE UNDERSTANDING USING GENERAL COMPOSITION - A natural language understanding system may be given the capability to construct a semantically detailed parse tree for each acceptable interpretation of an input natural language expression (or fewer such parse trees than interpretations) by independently solving sub-trees corresponding to various series of post nominal modifiers and associating those partial solutions with corresponding nodes in the overall parse tree. The argument order in predicate calculus atomic formulas may be standardized in a manner that supports the use of a chart parser applied to a head-driven phrase structure grammar and that permits a simplified more tractable grammar that in turn can be used as a domain general semantic grammar. | 09-04-2014 |
20140257791 | APPARATUS AND METHOD FOR AUTO-GENERATION OF JOURNAL ENTRIES - Various aspects of an apparatus and method for auto-generation of journal entries may include an electronic device. The electronic device receives information associated with a user from one or more sources. The electronic device analyzes the received information to determine information to be included in the journal entry. The electronic device determines a writing style of the user based on the received information. The electronic device generates one or more sentences for the journal entry based on the determined journal information, the determined writing style of the user, and one or more pre-determined parameters associated with the user. | 09-11-2014 |
20140257792 | Anaphora Resolution Using Linguisitic Cues, Dialogue Context, and General Knowledge - An automatic conversational system has multiple computer-implemented dialogue components for conducting an automated dialogue process with a human user. A user client delivers dialogue output prompts to the human user and receives dialogue input responses from the human user including speech inputs. An automatic speech recognition engine processes the speech inputs to determine corresponding sequences of representative text words. A natural language understanding (NLU) processing arrangement processes the dialogue input responses and the text words to determine corresponding semantic interpretations. The NLU processing arrangement includes an anaphora processor that accesses different information sources characterizing dialogue context, linguistic features, and NLU features to identify unresolved anaphora in the text words needing resolution in order to determine a semantic interpretation. A dialogue manager manages the dialogue process with the human user based on the semantic interpretations. | 09-11-2014 |
20140257793 | Communicating Context Across Different Components of Multi-Modal Dialog Applications - A human-machine dialogue system is described which has multiple computer-implemented dialogue components. A user client delivers output prompts to a human user and receives dialogue inputs including speech inputs from the human user. An automatic speech recognition (ASR) engine processes the speech inputs to determine corresponding sequences of representative text words. A natural language understanding (NLU) engine processes the text words to determine corresponding semantic interpretations. A dialogue manager (DM) generates the output prompts and responds to the semantic interpretations so as to manage a dialogue process with the human user. The dialogue components share context information with each other using a common context sharing mechanism such that the operation of each dialogue component reflects available context information. | 09-11-2014 |
20140257794 | Semantic Re-Ranking of NLU Results in Conversational Dialogue Applications - A human-machine dialogue system is described which has multiple computer-implemented dialogue components. A user client delivers output prompts to a human user and receives dialogue inputs from the human user including speech inputs. An automatic speech recognition (ASR) engine processes the speech inputs to determine corresponding sequences of representative text words. A natural language understanding (NLU) engine processes the text words to determine corresponding NLU-ranked semantic interpretations. A semantic re-ranking module re-ranks the NLU-ranked semantic interpretations based on at least one of dialogue context information and world knowledge information. A dialogue manager responds to the re-ranked semantic interpretations and generates the output prompts so as to manage a dialogue process with the human user. | 09-11-2014 |
20140257795 | Linguistic Expression of Preferences in Social Media for Prediction and Recommendation - Disclosed herein are systems, methods and computer readable storage media for determining tags or labels from natural language expressions expressing a preference or choice, determining attributes from natural language expressions and other data, and predicting preferences from natural language expressions and other data. | 09-11-2014 |
20140257796 | SIGNAL PROCESSING APPROACH TO SENTIMENT ANALYSIS FOR ENTITIES IN DOCUMENTS - A document can be processed to provide sentiment values for phrases in the document. The sequence of sentiment values associated with the sequence of phrases in a document can be handled as if they were a sampled discrete time signal. For phrases which have been identified as entities, a filtering operation can be applied to the sequence of sentiment values around each entity to determine a sentiment value for the entity. | 09-11-2014 |
20140257797 | METHODS AND SYSTEMS FOR A LOCALLY AND TEMPORALLY ADAPTIVE TEXT PREDICTION - An electronic device is provided, having a locally and temporally adaptive prediction database. | 09-11-2014 |
20140278351 | DETECTING AND EXECUTING DATA RE-INGESTION TO IMPROVE ACCURACY IN A NLP SYSTEM - In some NLP systems, queries are compared to different data sources stored in a corpus to provide an answer to the query. However, the best data sources for answering the query may not currently be contained within the corpus or the data sources in the corpus may contain stale data that provides an inaccurate answer. When receiving a query, the NLP system may evaluate the query to identify a data source that is likely to contain an answer to the query. If the data source is not currently contained within the corpus, the NLP system may ingest the data source. If the data source is already within the corpus, however, the NLP may determine a time-sensitivity value associated with at least some portion of the query. This value may then be used to determine whether the data source should be re-ingested—e.g., the information contained in the corpus is stale. | 09-18-2014 |
20140278352 | IDENTIFYING A STALE DATA SOURCE TO IMPROVE NLP ACCURACY - In some NLP systems, queries are compared to different data sources stored in a corpus to provide an answer to the query. However, the best data sources for answering the query may not currently be contained within the corpus or the data sources in the corpus may contain stale data that provides an inaccurate answer. When receiving a query, the NLP system may evaluate the query to identify a data source that is likely to contain an answer to the query. If the data source is not currently contained within the corpus, the NLP system may ingest the data source. If the data source is already within the corpus, however, the NLP may determine a time-sensitivity value associated with at least some portion of the query. This value may then be used to determine whether the data source should be re-ingested—e.g., the information contained in the corpus is stale. | 09-18-2014 |
20140278353 | Systems and Methods for Language Classification - Systems and methods are provided for classifying text based on language using one or more computer servers and storage devices. In general, the systems and methods can include a language classification module for classifying text of an input data set using the output of a training module. In an exemplary embodiment, a bootstrapping step feeds the output of the language classification module back into the training module to increase the accuracy of the language classification module. By iterating the language classification and training modules with input data having certain features, a user can tailor the language classification module for use with text having those or similar features. | 09-18-2014 |
20140278354 | IDENTIFYING CORRESPONDING POSITIONS IN DIFFERENT REPRESENTATIONS OF A TEXTUAL WORK - Described herein are techniques for determining corresponding positions between different representations of a textual work. In some of the techniques, portions of one or more representations may be processed. A determination of a corresponding position may be made in response to a request received from a user, such as a reader that desires to switch between representations. The request may indicate a position in one representation and the representation to which the user would like to switch. In response to receiving the request, one or more portions of one or more representations of a textual work may be processed. In some techniques, a corresponding position between different representations may be determined without processing the entirety of one or more representations of the textual work. For example, a corresponding position may be determined without processing an entire audio representation. | 09-18-2014 |
20140278355 | USING HUMAN PERCEPTION IN BUILDING LANGUAGE UNDERSTANDING MODELS - An understanding model is trained to account for human perception of the perceived relative importance of different tagged items (e.g. slot/intent/domain). Instead of treating each tagged item as equally important, human perception is used to adjust the training of the understanding model by associating a perceived weight with each of the different predicted items. The relative perceptual importance of the different items may be modeled using different methods (e.g. as a simple weight vector, a model trained using features (lexical, knowledge, slot type, . . . ), and the like). The perceptual weight vector and/or model are incorporated into the understanding model training process where items that are perceptually more important are weighted more heavily as compared to the items that are determined by human perception as less important. | 09-18-2014 |
20140278356 | SMART POSTING WITH DATA ANALYTICS - Provided are techniques for smart posting with data analytics. A message is received before the message is posted to a social media service. The message is analyzed using data analytics to obtain analysis results. The obtained analysis results are compared to similar analysis results stored for at least one pre-existing message. For one or more correlations between the message and at least one pre-existing message, one or more contributing terms that have semantic meaning within a context of the social media service are determined. Based on the one or more contributing terms, one or more suggestions for improving the message are generated. The message is modified based on the one or more suggestions. | 09-18-2014 |
20140278357 | WORD GENERATION AND SCORING USING SUB-WORD SEGMENTS AND CHARACTERISTIC OF INTEREST - Methods for scoring a word or generating new words according to a characteristic of interest can use a computer system to access a corpus of words exemplifying a characteristic of interest. Each word is broken into a type of subword segment. Each subword segment in the corpus of words has a value score. The word to be scored is broken into the same type of subword segment, and a value score for each segment is determined and used to create a characteristic of interest score for the word. For generating new words, a number of first subword segments are chosen based at least in part upon value scores. At least one additional subword segment is combined with the first subword segments to create a set of potential new words, a second value score is generated for each, and a new word is selected and provided to a user. | 09-18-2014 |
20140278358 | Adapting Tabular Data for Narration - A system, and computer program product for adapting tabular data for narration are provided in the illustrative embodiments. A set of categories used to organize data is identified in a first tabular portion of a document. A structure of the categories is analyzed. An inference is drawn about data in a first cell in the first tabular portion based on a position of the first cell in the structure. The first tabular portion of the document is transformed into a first narrative form using the inference. | 09-18-2014 |
20140278359 | METHOD AND SYSTEM FOR CONVERTING DOCUMENT SETS TO TERM-ASSOCIATION VECTOR SPACES ON DEMAND - Disclosed herein is a method and system for producing a term association vector space on demand for a client given a document set in electronic form. The method extracts terms from the document set, stripping out words that do not convey meaning and adding important phrases within the context of the document set to the terms. Associations between terms are calculated, subjected to further analytical processes, and collected in a matrix, whose rows are vectors defining the vector space. Additional associational data can be added by matrix arithmetic, and documents can be rendered as further vectors in the space. | 09-18-2014 |
20140278360 | PRESENTING KEY DIFFERENCES BETWEEN RELATED CONTENT FROM DIFFERENT MEDIUMS - System, method, and computer program product to identify differences between different media formats of a media title, by identifying at least one component of each of the different media formats of the media title, the at least one component comprising a unit of the media title, annotating a respective text transcription of each of the different media formats of the media title to include at least one attribute of the respective at least one component, computing a difference score for a first component of a first media format of the media title relative to each of the remaining different media formats of the media title, and upon determining that the difference score for the first component relative to a second media format of the media title exceeds a predefined threshold, creating an indication that the first component of the first media format is different from the second media format. | 09-18-2014 |
20140278361 | AUTOCORRECTING TEXT FOR THE PURPOSE OF MATCHING WORDS FROM AN APPROVED CORPUS - System, method, and computer program product to autocorrect text for the purpose of matching words from an approved corpus, by: responsive to receiving a text input comprising a first word, validating the first word against a content corpus, wherein the content corpus comprises a plurality of words approved for display in an online chat, validating the first word against a set of rules, and upon determining that the first word of the text input is not validated against at least one of the content corpus and the set of rules, modifying the text input by replacing the first word with a first approved word from the content corpus. | 09-18-2014 |
20140278362 | Entity Recognition in Natural Language Processing Systems - Mechanisms are provided for generating a dictionary data structure for analytical operations. A source terminology resource is ingested to generate a hierarchical representation of the source terminology resource comprising nodes for terms related to concepts in the source terminology resource. For a node of the nodes in the hierarchical representation of the source terminology resource, a permutation of a corresponding term associated with the node is generated. An expanded hierarchical representation of the source terminology resource is generated based on the generated permutation. An enhanced dictionary data structure is generated based on the expanded hierarchical representation and output to an analytics engine to perform analysis of a corpus of information using the enhanced dictionary data structure. | 09-18-2014 |
20140278363 | Enhanced Answers in DeepQA System According to User Preferences - A semantic search engine is enhanced to employ user preferences to customize answer output by, for a first user, extracting user preferences and sentiment levels associated with a first question; receiving candidate answer results of a semantic search of the first question; weighting the candidate answer results according to the sentiment levels for each of the user preferences; and producing the selected candidate answers to the first user. Optionally, user preferences and sentiment levels may be accumulated over different questions for the same user, or over different users for similar questions. And, supplemental information may be retrieved relative to a user preference in order to further tune the weighting per the preferences and sentiment levels. | 09-18-2014 |
20140278364 | BUSINESS INTELLIGENCE DATA MODELS WITH CONCEPT IDENTIFICATION USING LANGUAGE-SPECIFIC CLUES - Techniques are described for modeling information from a data source. In one example, a method for modeling information from a data source includes comparing, with one or more computing devices, a data item heading from the data source with concept keywords in a concept library, the concept library comprising a plurality of concepts and one or more of the concept keywords in at least one language associated with each of one or more of the concepts. The method further includes identifying, with one or more computing devices, one or more matches between the data item heading and one or more concept keywords associated with a particular concept from among the concepts comprised in the concept library. The method further includes identifying, with one or more computing devices, the data item heading as being associated with the particular concept. | 09-18-2014 |
20140278365 | SYSTEM AND METHODS FOR DETERMINING SENTIMENT BASED ON CONTEXT - System and methods are disclosed for determining the connotation or sentiment type of a text unit comprising multiple terms and with a grammatical structure, such as subject+verb, verb+object, adjective+noun, noun+noun, noun+preposition+noun. The connotation or sentiment type of the text unit is determined by applying context rules where the context of the grammatical structure may change the inherent or default connotations of individual terms in the text unit. The methods provide a solution to the challenge of accurately determining the sentiment type of various linguistic structures under different contexts, and an alternative to the simplistic approach of using the inherent or default connotation of individual terms for the linguistic structure containing such terms. | 09-18-2014 |
20140278366 | FEATURE EXTRACTION FOR ANONYMIZED SPEECH RECOGNITION - Various of the disclosed embodiments relate to systems and methods for extracting audio information, e.g. a textual description of speech, from a speech recording while retaining the anonymity of the speaker. In certain embodiments, a third party may perform various aspects of the anonymization and speech processing. Certain embodiments facilitate anonymization in compliance with various legislative requirements even when third parties are involved. | 09-18-2014 |
20140278367 | COMPREHENSIVE SAFETY SCHEMA FOR ENSURING APPROPRIATENESS OF LANGUAGE IN ONLINE CHAT - A method is disclosed for evaluating a chat message sent between users of an online environment. The method may include associating each word in the chat message with metadata. The metadata identifies a word type and usage for each word in the chat message. This method may also include identifying one or more safety rules associated with the metadata. Each safety rule identifies an ordered sequence of one or more sets of words. This method may also include applying the safety rule to the chat message to determine whether a sequence of words in the chat message is present in the ordered sequence of sets of words. Upon determining a word, from each set of words in the ordered sequence of sets of words, matches a respective one of the words in the chat message, the chat message is blocked from being sent to a message recipient. | 09-18-2014 |
20140278368 | MORPHEME-LEVEL PREDICTIVE GRAPHICAL KEYBOARD - In one example, a method includes determining, by a computing device and based at least in part on an initial character string, one or more candidate morpheme sequences, wherein each of the candidate morpheme sequences includes the initial character string and one or more candidate morphemes. The method further includes outputting, for display, the one or more candidate morpheme sequences. The method further includes receiving an indication of a user input detected at a presence-sensitive input device. The method further includes selecting, based on the indication of the user input, at least one of the candidate morphemes from one of the candidate morpheme sequences to define a selected morpheme sequence that includes the initial character string and the selected candidate morpheme from the one of the candidate morpheme sequences. The method further includes outputting, for display, the selected morpheme sequence. | 09-18-2014 |
20140278369 | METHOD AND SYSTEM FOR USING NATURAL LANGUAGE TECHNIQUES TO PROCESS INPUTS - Systems and methods are provided for utilizing natural language to process queries. The method for analyzing a linguistic input may include receiving the linguistic input, the linguistic input including at least one word, accessing prestored language data for a language corresponding to the linguistic input, converting the linguistic input into a text-possibility representation based on the received language data, determining a meaning of the text possibility based on the prestored language data, generating at least one semantic structure corresponding to the determined meaning, and determining an action to perform based on the generated at least one semantic structure. The prestored language data may be converted from multiple formats into one or more formats that can be algorithmically processed by a computational device. | 09-18-2014 |
20140278370 | Systems and Methods for Customizing Text in Media Content - Various embodiments are disclosed for facilitating automatic media editing. Media content is obtained and semantic analysis is performed on text in at least a portion of the media content to obtain at least one semantic textual segment each corresponding to a text section of the media content, wherein the text section comprises at least one word in the text in the at least a portion of the media content. At least one context token corresponding to the at least one semantic textual segment is generated. The text section is visually accentuated according to the context token. | 09-18-2014 |
20140278371 | METHOD AND SYSTEM FOR GENERATING A PARSER AND PARSING COMPLEX DATA - Computer-implemented systems and methods are disclosed for constructing a parser that parses complex data. In some embodiments, a method is provided for receiving a parser definition as an input to a parser generator and generating a parser at least in part from the parser definition. In some embodiments, the generated parser comprises two or more handlers forming a processing pipeline. In some embodiments, the parser receives as input a first string into the processing pipeline. In some embodiments, the parser generates a second string by a first handler and inputs the second string regeneratively into the parsing pipeline, if the first string matches an expression specified for the first handler in the parser definition. | 09-18-2014 |
20140278372 | AMBIENT SOUND RETRIEVING DEVICE AND AMBIENT SOUND RETRIEVING METHOD - An ambient sound retrieving device includes a sound input unit receiving a sound signal, a sound recognition unit performing a speech recognition process on the sound signal and generating an onomatopoeic word, a sound data storage unit storing an ambient sound and an onomatopoeic word corresponding to the ambient sound, a correlation information storage unit storing correlation information in which a first onomatopoeic word, a second onomatopoeic word, and a frequency of selecting the second onomatopoeic word are correlated with each other, a conversion unit converting the first onomatopoeic word into the second onomatopoeic word corresponding to the first onomatopoeic word using the correlation information, and a retrieval and extraction unit extracting the ambient sound corresponding to the second onomatopoeic word from the sound data storage unit and ranking and presenting a plurality of candidates of the extracted ambient sound. | 09-18-2014 |
20140278373 | NATURAL LANGUAGE PROCESSING (NLP) PORTAL FOR THIRD PARTY APPLICATIONS - A method for generating a natural language processing (NLP) model including obtaining tags, obtaining actions to be implemented by a third party application, obtaining a training corpus including sentences, where at least one word in each of the sentences is associated with one of the tags, and wherein each of the sentences is associated with one of the actions. The method further includes generating features for the NLP model for the third party application using the tags, the actions, and the training corpus, training the NLP model using the features and the training corpus to obtain a trained NLP model, and generating an APIKey for use by the third party application, where the API provides the third party application access to the trained NLP model. | 09-18-2014 |
20140278374 | SYSTEM AND METHOD FOR IMPROVING TEXT INPUT IN A SHORTHAND-ON-KEYBOARD INTERFACE - A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word. | 09-18-2014 |
20140278375 | METHODS AND SYSTEM FOR CALCULATING AFFECT SCORES IN ONE OR MORE DOCUMENTS - A method and a computer-implemented system are provided for calculating an affect score for a text corpus, so as to facilitate analysis of the sentiment inherent to that text corpus. A list of a plurality of words is provided, wherein each word is expressing affect. Words in the text corpus are matched with words contained in the list, and a frequency of the matched words is computed. Affect words are associated along at least one semantic dimension, and those derived from the semantic dimensions and the respective frequencies are aggregated using a Choquet integral function into an affect score. | 09-18-2014 |
20140278376 | Systems and Methods for Generating Recitation Items - Computer-implemented systems and methods are provided for automatically generating recitation items. For example, a computer performing the recitation item generation can receive one or more text sets that each includes one or more texts. The computer can determine a value for each text set using one or more metrics, such as a vocabulary difficulty metric, a syntactic complexity metric, a phoneme distribution metric, a phonetic difficulty metric, and a prosody distribution metric. Then the computer can select a final text set based on the value associated with each text set. The selected final text set can be used as the recitation items for a speaking assessment test. | 09-18-2014 |
20140278377 | AUTOMATIC NOTE TAKING WITHIN A VIRTUAL MEETING - Arrangements relate to automatically taking notes in a virtual meeting. The virtual meeting has meeting content that includes a plurality of meeting content streams. One or more of the meeting content streams is in a non-text format. The one or more meeting content streams in a non-text format can be converted into text. As a result, the plurality of meeting content streams is in text format. The text of the plurality of meeting content streams can be analyzed to identify a key element within the text. Consolidated system notes that include the key element can be generated. | 09-18-2014 |
20140278378 | CONTENT TO TEST CONVERTER SYSTEM (CTTCS) - This disclosure is drawn to methods, systems, devices and/or apparatus related to converting content to questions and/or tests. Specifically, the disclosed methods, systems, devices and/or apparatus relate to converting the content (e.g., website, text, audio, video) into one or more tests to test an audience's understanding of the content. Generally, the present disclosure includes converting any content such as a website, web page or other source material into a test. Example content may be remotely and locally stored content. In some examples, tests may be displayed via a third party hosted system website, directly on the website of the source material being converted, and/or in an application on the user's computing device. | 09-18-2014 |
20140288920 | ASSISTED UPDATE OF KNOWLEDGE BASE FOR PROBLEM SOLVING - A system and method for proposing candidate solutions for updating a knowledge base are disclosed. In the method, knowledge base solutions in a natural language are each processed to generate a first action sequence of atomic steps, each including a verb and an object including a noun which is in a syntactic dependency with the respective verb. A recorded solution, expressed in a natural language, is received which includes actions performed on a device in the device class. The recorded solution is processed to generate a second action sequence of atomic steps, as for the first action sequence. The second action sequence is compared with the first action sequences to determine whether the recorded solution corresponds to one of the knowledge base solutions. Based on the comparison, provision is made for proposing an update to the knowledge base, based on the recorded solution. | 09-25-2014 |
20140288921 | SYSTEM AND METHOD FOR THE AUTOMATIC VALIDATION OF DIALOG RUN TIME SYSTEMS - A method, system and module for automatically validating dialogs associated with a spoken dialog service. The method comprises extracting key data from a dialog call detail record associated with a spoken dialog service, transmitting the key data as a dialog to a state-based representation (such as a finite-state machine) associated with a call-flow for the spoken dialog service and determining whether the dialog associated with the key data is a valid dialog for the call-flow. | 09-25-2014 |
20140288922 | METHOD AND APPARATUS FOR MAN-MACHINE CONVERSATION - The present disclosure is applied to the field of computer technology and provides a method and apparatus for man-machine conversation, including a method for man-machine conversation which is applied to a server, comprising: receiving conversation preceding data transmitted by a first client; acquiring conversation succeeding data matched with the conversation preceding data, the conversation succeeding data including first data collected from at least one second client by forwarding the conversation preceding data to the at least one second client; and returning the conversation succeeding data to the first client. In the present disclosure, for conversation preceding data from a client, the man-machine conversation is completed by collecting data from other client(s) to match corresponding conversation succeeding data and returning the conversation succeeding data to the client transmitting the conversation preceding data. Thereby, a machine's capability of responding to a user's complicated expression and expression fault-tolerance is significantly improved. | 09-25-2014 |
20140297260 | Detect and Automatically Hide Spoiler Information in a Collaborative Environment - An approach is provided to detect and hide spoiler information. In the approach, potential spoiler content included in user text entries submitted to a collaborative environment is automatically detected. The system inhibits display of the potential spoiler content from the collaborative environment in response to the detection. | 10-02-2014 |
20140297261 | SYNONYM DETERMINATION AMONG N-GRAMS - A technique includes obtaining a plurality of n-grams from a plurality of messages, determining a temporal histogram for each n-gram, and determining synonyms among the n-grams based on a combination of a correlation of the histograms and a distance measure between n-grams. | 10-02-2014 |
20140297262 | ACCELERATED REGULAR EXPRESSION EVALUATION USING POSITIONAL INFORMATION - Methods and arrangements for evaluating a regular expression. Text strings are received. A regular expression is also received, the regular expression comprising a pattern for specifying and recognizing at least one text string from among the received text strings. There is generated, with respect to the received text strings, a data structure containing grams with positional information. The data structure is employed to evaluate the regular expression via identifying a subset of the text strings comprising at least one match for the given regular expression. Other variants and embodiments are broadly contemplated herein. | 10-02-2014 |
20140297263 | METHOD AND APPARATUS FOR VERIFYING TRANSLATION USING ANIMATION - A translation verification method using an animation may include the processes of analyzing an originally input sentence in a first language using a translation engine so that the sentence in the first language is converted into a second language, generating an animation capable of representing the meaning of the sentence in the first language based on information on the results of the analysis of the sentence in the first language, and providing the original and the generated animation to a user who uses the original in order for the user to check for errors in the translation. | 10-02-2014 |
20140297264 | OPEN LANGUAGE LEARNING FOR INFORMATION EXTRACTION - Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, state-of-the-art Open IE systems such as R | 10-02-2014 |
20140297265 | TERMINAL DEVICE, CONVERSION WORD CANDIDATE SYNCHRONIZATION METHOD, AND CONVERSION WORD CANDIDATE SYNCHRONIZATION PROGRAM - A terminal device includes: a memory that stores a candidate group including a plurality of conversion word candidates for an input character; and a processor that, if a display rank of any candidate in the candidate group is changed, determines whether the changed display rank is included in a predetermined range of display ranks and, if the changed display rank is included in the predetermined range of display ranks, lets a communication unit transmit the candidate group including the changed display rank to another terminal device. | 10-02-2014 |
20140297266 | SYSTEMS AND METHODS FOR EXTRACTING KEYWORDS IN LANGUAGE LEARNING - Disclosed are systems, methods, and products for language learning that may extract text from various resources having text, using various natural-language processing features, which can be combined with custom-designed learning activities to offer a needs-based, adaptive learning methodology. The system may receive a resource, extract keywords pedagogically valuable to non-native language learning and academic exercises. Metadata describing various aspects of resources from which keywords are extracted may be associated with keywords. Metadata describing various aspects of keywords may also be associated with keywords. Extracted keywords may be stored into a keyword store along with any metadata associated with keywords. | 10-02-2014 |
20140297267 | SYSTEM AND METHOD FOR INPUTTING TEXT INTO ELECTRONIC DEVICES - Systems comprising a user interface configured to receive text input by a user and a text prediction engine configured to receive the input text and generate text predictions. The text prediction engine may comprise a general language model and a context-specific language model. The text prediction engine is configured to generate text predictions from the general language model and the context-specific language model and combine the text predictions. The text prediction engine may comprise first and second language models and a first context-specific weighting factor associated with the first language model. The text prediction engine is configured to generate text predictions using the first and second language models, generate weighted probabilities of the text predictions from the first language model using the first context-specific weighting factor; and generate final text predictions from the weighted predictions generated from the first language model and the predictions generated by the second language model. | 10-02-2014 |
20140297268 | Advanced System and Method for Automated-Context-Aware-Dialog with Human Users - Apparatus for conducting a dialog with a user of at least one computerized enterprise system, the apparatus comprising an ontological topic definer using at least one ontological entity to define user dialog topics, each topic including an item, a block identifying executable computer code operative to resolve the item; and at least one input parameter passed to the block; and a dialog server operative for conducting a dialog with a user of at least one computerized enterprise system about an individual topic from among said user dialog topics. | 10-02-2014 |
20140297269 | ASSOCIATING PARTS OF A DOCUMENT BASED ON SEMANTIC SIMILARITY - A system for processing at least one document ( | 10-02-2014 |
20140303962 | Ordering a Lexicon Network for Automatic Disambiguation - Described systems and methods allow a computer system to employ a lexicon network for word sense disambiguation (WSD). In an exemplary embodiment, each node of the lexicon network represents a gloss of a lexicon entry, while an edge represents a lexical definition relationship between two glosses. The lexicon network is ordered prior to WSD, wherein ordering the lexicon network comprises arranging the nodes of the lexicon network in an ordered sequence, and removing a set of edges to eliminate loops. In some embodiments, the position of each node within the ordered sequence is determined according to a direction and a weight of an edge connected to the respective node. The weight may represent a semantic importance of the respective edge relative to other edges of the network. | 10-09-2014 |
20140303963 | NATURAL LANGUAGE PARSING METHOD TO PROVIDE CONCEPTUAL FLOW - A method for parsing the flow of natural human language to convert a flow of machine recognizable language into a conceptual flow includes, first, recognizing the lexical structure and then, a basic semantic grouping is determined for the language flow in the lexical structure. The basic semantic grouping is then determined that denotes the main action, occurrence or state of being for the language flow. The responsibility of the main action, occurrence or state of being for the language flow is then determined within the lexical structure followed by semantically parsing the lexical structure. Thereafter, any ambiguities in the responsibilities are resolved in a recursive manner by applying a predetermined set of rules thereto. | 10-09-2014 |
20140309984 | GENERATING A REGULAR EXPRESSION FOR ENTITY EXTRACTION - A computer receives a formatted query having a plain text word. The computer selects each character in the plain text word. The computer identifies a group of characters from a confusion matrix that are commonly confused with the character selected. The computer generates a set of characters for each character selected, wherein the set of characters begin with one of the each character selected followed by and ending with the group of characters from the confusion matrix. The computer generates a regular expression by concatenating each of the set of characters. | 10-16-2014 |
20140309985 | OPTIMIZING GENERATION OF A REGULAR EXPRESSION - A computer determines whether performance optimization parameters are configured to optimize performance of generating a regular expression. In response to the computer determining the one or more performance optimization parameters are configured to optimize performance of generating the regular expression, the computer identifies syllables within a plain text word that have a high probability of spelling errors. The computer selects each character in the syllables identified. The computer identifies a group of characters from a confusion matrix that are commonly confused with the character selected. The computer generates a set of characters for each character selected, wherein the set of characters begin with one of the each character selected followed by and ending with the group of characters from the confusion matrix. The computer generates a regular expression by concatenating each of the set of characters. | 10-16-2014 |
20140309986 | WORD BREAKER FROM CROSS-LINGUAL PHRASE TABLE - Automatically creating word breakers which segment words into morphemes is described, for example, to improve information retrieval, machine translation or speech systems. In embodiments a cross-lingual phrase table, comprising source language (such as Turkish) phrases and potential translations in a target language (such as English) with associated probabilities, is available. In various examples, blocks of source language phrases from the phrase table are created which have similar target language translations. In various examples, inference using the target language translations in a block enables stem and affix combinations to be found for source language words without the need for input from human-judges or prior knowledge of source language linguistic rules or a source language lexicon. | 10-16-2014 |
20140309987 | RECONCILING DETAILED TRANSACTION FEEDBACK - Reconciling detailed transaction feedback by detecting a rating of a transaction, where the rating indicates a negative experience, mining the sentiment of words in feedback text that is included with or as part of the rating to detect whether the words indicate positive sentiment or negative sentiment, responsive to determining that the words in the feedback text indicate that the feedback text connotes a positive sentiment, adjusting the rating of the transaction. The mining may include testing words in the feedback text to detect whether the words indicate positive sentiment or negative sentiment by calculating a sentiment score. | 10-16-2014 |
20140309988 | CPW method with application in an application system - The Cognitive Process Workflow (CPW) method with its grammar, syntax and semantics is a process method, a process modeling method and a workflow method applicable to different technical systems. It also covers, among others, the application of the CPW Method in a process tool, workflow tool, workflow engine, enterprise architecture framework and enterprise architecture engine. The CPW Process is represented as a simple sentence with a CPW Subject, a CPW Predicate and a CPW Object. The CPW Method, with CPW Process, CPW Dialog and CPW Workflow and in addition with CPW Context Diagrams (CPW Subject Context Diagram, CPW Object Context Diagram and CPW Subject Object Context Diagram), can be applied to the business areas of financial services (banking, assurance and financial approval), chemistry, pharmacy, medicine, transport, travel, film industry, politics, psychology, legal practice and other business areas. | 10-16-2014 |
20140309989 | METHOD AND SYSTEM FOR ANALYZING TEXT - An apparatus for providing a control input signal for an industrial process or technical system having one or more controllable elements includes elements for generating a semantic space for a text corpus, and elements for generating a norm from one or more reference words or texts, the or each reference word or text being associated with a defined respective value on a scale, and the norm being calculated as a reference point or set of reference points in the semantic space for the or each reference word or text with its associated respective scale value. The apparatus further includes elements for reading at least one target word included in the text corpus, elements for predicting a value of a variable associated with the target word based on the semantic space and the norm, and elements for providing the predicted value in a control input signal to the industrial process or technical system. A method for predicting a value of a variable associated with a target word is also disclosed together with an associated system and computer readable medium. | 10-16-2014 |
20140309990 | SEMANTIC RE-RANKING OF NLU RESULTS IN CONVERSATIONAL DIALOGUE APPLICATIONS - Multiple natural language understanding (NLU) interpretation selection models may be generated. The NLU interpretation selection models may include a generic NLU interpretation selection model that is not specialized for a specific set of NLU interpretations type and one or more specialized NLU interpretation selection models, each of which may be specific to a particular set of NLU interpretations type. The specialized NLU interpretation selection model(s) may be utilized to process natural language input data comprising data corresponding to their respective sets of NLU interpretations type(s). The generic NLU interpretation selection model may be utilized to process natural language input data comprising data that does not correspond to the sets of NLU interpretations type(s) associated with the specialized NLU interpretation selection model(s). | 10-16-2014 |
20140316764 | CLARIFYING NATURAL LANGUAGE INPUT USING TARGETED QUESTIONS - A dialog assistant embodied in a computing system can present a clarification question based on a machine-readable version of human-generated conversational natural language input. Some versions of the dialog assistant identify a clarification target in the machine-readable version, determine a clarification type relating to the clarification target, present the clarification question in a conversational natural language manner, and process a human-generated conversational natural language response to the clarification question. | 10-23-2014 |
20140316765 | PREVENTING FRUSTRATION IN ONLINE CHAT COMMUNICATION - Monitoring an internet chat in which a text transcript is generated by at least two chat participants, by: (i) performing a simple check on the text transcript for existence of a potential frustration precondition; and (ii) on condition that a frustration precondition is found, performing text analytics type analysis on the text transcript to determine whether potential frustration is evidenced by the text transcript. If it is determined that potential frustration is evidenced by the chat transcript then responsive action is taken to prevent and/or stem the frustration. | 10-23-2014 |
20140316766 | METHODS AND SYSTEMS FOR GENERATION OF FLEXIBLE SENTENCES IN A SOCIAL NETWORKING SYSTEM - A method and system for providing flexible sentences are disclosed. The system includes a developer interface for providing options to define actor, edge, target and aggregation of a flexible sentence syntax. In one embodiment, tokens are provided to define property expressions of the edge and/or target of the flexible sentence syntax. Based on the defined edge and target, the developer interface may generate a plurality of flexible sentence syntaxes for a developer to select. In some embodiments, the developer can add additional property expressions to further define the edge and/or target of the flexible sentence syntax. In some instances, the plurality of flexible sentence syntaxes may be prioritized based on a percentage coverage, which is determined by the impressions received over a given time frame. | 10-23-2014 |
20140316767 | PREVENTING FRUSTRATION IN ONLINE CHAT COMMUNICATION - Monitoring an internet chat in which a text transcript is generated by at least two chat participants, by: (i) performing a simple check on the text transcript for existence of a potential frustration precondition; and (ii) on condition that a frustration precondition is found, performing text analytics type analysis on the text transcript to determine whether potential frustration is evidenced by the text transcript. If it is determined that potential frustration is evidenced by the chat transcript then responsive action is taken to prevent and/or stem the frustration. | 10-23-2014 |
20140316768 | SYSTEMS AND METHODS FOR NATURAL LANGUAGE PROCESSING - Methods, systems and computer programs for automatic, highly accurate machine comprehension of a plurality of segments of free form unstructured text in a natural language. The system answers a plurality of complex, free-form questions asked in a natural language, based on the totality of input text. The system further uses a multi-dimensional data model to measure the total effects of actions/verbs acting on various unique nouns present in the input text. The system may convert the questions into another multi-dimensional data model and may then compare the two data models in program memory to derive the answers to the posed questions. The system may then automatically detect unknown words and optionally look them up in digital information sources, such as online dictionaries and encyclopedias, to fill in the gaps in knowledge to answer the questions with expert-like reliability. | 10-23-2014 |
20140316769 | GAME PLAY FACT CHECKING - A fact checking system is able to verify the correctness of information and/or characterize information by comparing the information with one or more sources. The fact checking system automatically monitors, processes, fact checks information and indicates a status of the information. Fact checking results are able to be validated by re-fact checking the fact check results. | 10-23-2014 |
20140316770 | PROCESSING A REPORT - A system for processing a report, comprising a natural language processing unit ( | 10-23-2014 |
20140324413 | SYSTEM, METHOD, AND COMPUTER-READABLE MEDIUM FOR PLAGIARISM DETECTION - A system, method, and computer-readable medium for detecting plagiarism in a set of constructed responses by accessing and pre-processing the set of constructed responses to facilitate the pairing and comparing of the constructed responses. The similarity value generated from the comparison of a pair of constructed responses serves as an indicator of possible plagiarism. | 10-30-2014 |
20140324414 | METHOD AND APPARATUS FOR DISPLAYING EMOTICON - Various embodiments provide methods and apparatus for displaying an emoticon. In an exemplary method, an instant message can be received and a pre-set keyword contained in the instant message can be determined. An emoticon corresponding to the pre-set keyword contained in the instant message can then be obtained and displayed on a chat interface displaying the instant message. Accordingly, an exemplary apparatus for displaying an emoticon can include a first obtaining module, a second obtaining module, and/or a displaying module. | 10-30-2014 |
20140324415 | EFFICIENT STRING SEARCH - Some embodiments of an efficient string search have been presented. In one embodiment, a string of bytes representing content written in a non-delimited language is received, wherein the content has been classified into a predetermined category. In a single pass through the string of bytes, a set of N-grams is searched for simultaneously. Statistical information on occurrences of the N-grams, if any, in the string of bytes is collected. In some embodiments, a model is generated based on the statistical information, where the model is usable by a content filter to classify content. | 10-30-2014 |
20140324416 | METHOD OF AUTOMATED ANALYSIS OF TEXT DOCUMENTS - Automated analysis of text documents is used to scan text documents in order to find phrases or text fragments from other documents, or modifying the existing ones. A comparatively fast and universally applicable method finds phrases, sentences or even text fragments from other documents. The method includes: all electronic files containing model documents are converted to a given format; meaningful fragments, called “clauses”, are extracted from them; the converted files containing model documents are stored in the database; each electronic file containing a document to be analyzed is converted to the given format; clauses extracted from analyzed documents are compared with clauses extracted from model documents; fractions of clauses from an analyzed document matching clauses from each model document are calculated; fractions found are then compared with a pre-set threshold value in order to find out whether there are text fragments from a model document in the analyzed one. | 10-30-2014 |
20140330553 | VERIFYING THE TERMS OF USE FOR ACCESS TO A SERVICE - Provided are techniques in which a document accompanying a service is acquired, a natural language analysis is performed on the acquired document, a determination is made from the results of the natural language analysis whether an item defined in the access control policy is found in the acquired document and, when the item defined in the access control policy is found in the acquired document, the access control policy is referenced and access to the service controlled accordingly. | 11-06-2014 |
20140330554 | System and Method for Generating Manually Designed and Automatically Optimized Spoken Dialog Systems - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for generating a natural language spoken dialog system. The method includes nominating a set of allowed dialog actions and a set of contextual features at each turn in a dialog, and selecting an optimal action from the set of nominated allowed dialog actions using a machine learning algorithm. The method includes generating a response based on the selected optimal action at each turn in the dialog. The set of manually nominated allowed dialog actions can incorporate a set of business rules. Prompt wordings in the generated natural language spoken dialog system can be tailored to a current context while following the set of business rules. A compression label can represent at least one of the manually nominated allowed dialog actions. | 11-06-2014 |
20140330555 | Methods and Systems for Natural Language Understanding Using Human Knowledge and Collected Data - Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data. | 11-06-2014 |
20140337009 | ENHANCING TEXT-BASED ELECTRONIC COMMUNICATIONS USING PSYCHO-LINGUISTICS - Embodiments of the present invention relate to enhancing text-based electronic communications using psycho-linguistics. In one embodiment, a first repository that includes a predetermined general personality profile and/or a dictionary that includes words, phrases, and/or sentences that are correlated with the personality profile is generated. A second repository that includes a predetermined participant personality profile for a participant and/or a dictionary of words, phrases, and/or sentences that are correlated with the predetermined participant personality profile is generated. An analysis on the electronic communication using the first repository and/or the second repository is performed. An alternative suggestion for a word, phrase, and/or sentence included in the electronic communication that is correlated with a predetermined participant personality profile and/or a predetermined general personality profile is generated. Words included in the electronic communication that have a correlation with the predetermined participant personality profile of the participant are determined. | 11-13-2014 |
20140337010 | INTERACTIVE ACQUISITION OF REMOTE SERVICES - A natural language specification of at least one high level information technology services requirement is obtained from a user, via a conversational interface; the same is parsed into first pre-defined semi-structured data, using a conversation parser. Based on the first pre-defined semi-structured data, a subset of candidate information technology services is identified, with a dialog engine, from a plurality of candidate information technology services provided by a plurality of vendors; the dialog engine is used to formulate a response including second pre-defined semi-structured data. The response is reverse-parsed into a natural language response, using the conversation parser. The natural language response includes a question for the user to assist in further refining the subset of candidate information technology services; the natural language response is presented to the user via the conversational interface. | 11-13-2014 |
20140337011 | CONTROLLING LANGUAGE TENSE IN ELECTRONIC CONTENT - Controlling language tense in electronic content includes determining that an age of language in electronic content exceeds a language change time threshold and changing the language to reflect a current time in response to exceeding the language change time threshold. | 11-13-2014 |
20140337012 | CONTROLLING LANGUAGE TENSE IN ELECTRONIC CONTENT - Controlling language tense in electronic content includes determining that an age of language in electronic content exceeds a language change time threshold and changing the language to reflect a current time in response to exceeding the language change time threshold. | 11-13-2014 |
20140337013 | PROCESSING MESSAGES IN A COMMUNICATION NETWORK - A method and a system for processing messages in a communication network are described herein. In one implementation, the method includes organizing a plurality of messages in an input queue of a communication terminal ( | 11-13-2014 |
20140343920 | METHOD AND SYSTEM TO DETERMINE PART-OF-SPEECH - A computer-implemented method to determine a part-of-speech (POS) category associated with a word in a text. The method includes determining a first set of candidate POS categories associated with the word based on a dictionary. The method further includes determining one or more contexts in which the word is used in the text based on a first set of rules. The method further includes determining a second set of candidate POS categories from the first set of POS categories based on the one or more contexts. The method furthermore includes determining the POS category from the second set of candidate POS categories based on a second set of rules. | 11-20-2014 |
20140343921 | ANALYZING DOCUMENTS CORRESPONDING TO DEMOGRAPHICS - Embodiments of the present invention disclose a method, computer program product, and system for analyzing documents corresponding to demographics. A computer determines whether a first text analysis algorithm corresponds to a demographic of a document, wherein Natural Language Processing (NLP) utilizes text analysis algorithms to produce an analysis of the document and provide annotations. Responsive to determining that the first text analysis algorithm does correspond to the demographic of the document, the computer analyzes the document utilizing the determined corresponding first text analysis algorithm. In another embodiment, the computer determines whether a second text analysis algorithm is available. Responsive to determining that a second text analysis algorithm is not available, the computer provides information from the analysis of the document utilizing one or more text analysis algorithms. | 11-20-2014 |
20140343922 | DEVICE, METHOD AND PROGRAM FOR ASSESSING SYNONYMOUS EXPRESSIONS - A synonymous expression assessment device includes: synonymy assessment means for receiving input of binary relations each of which includes a nominal and a predicate, and assessing whether or not the input binary relations are synonymous using a similarity between input nominals and a similarity between input predicates; and inter-predicate similarity computation means for, when computing the similarity between the input predicates based on a distribution of occurrence frequencies of nominals that are in binary relations to the input predicate in a document set, performing the computation using a distribution of only nominals that are used in the same type of concept as the input nominal. | 11-20-2014 |
20140343923 | Systems and Methods for Assessing Constructed Recommendations - A computer-implemented method of training an assessment model for assessing constructed texts expressing opinions on subjects includes accessing a plurality of training texts, which are constructed texts. The training texts are analyzed with the processing system to derive values of a plurality of linguistic features of an assessment model. At least one of the plurality of linguistic features relates to sentiment and at least one of the plurality of linguistic features relates to specificity. The assessment model is trained with the processing system based on the values of the plurality of linguistic features. Based on the training, a weight for each of the plurality of linguistic features is determined. The assessment model is calibrated to include the weights for at least some of the plurality of linguistic features such that the assessment model is configured to generate assessment measures for constructed texts expressing opinions on subjects. | 11-20-2014 |
20140343924 | Active Lab - Various embodiments provide a tool, referred to herein as “Active Lab” that can be used to develop, debug, and maintain knowledge bases. These knowledge bases (KBs) can then engage various applications, technology, and communications protocols for the purpose of task automation, real time alerting, system integration, knowledge acquisition, and various forms of peer influence. In at least some embodiments, a KB is used as a virtual assistant that any real person can interact with using their own natural language. The KB can then respond and react however the user wants: answering questions, activating applications, or responding to actions on a web page. | 11-20-2014 |
20140343925 | TEXT ANALYSIS SYSTEM - A text analysis system is described. A natural language input unit ( | 11-20-2014 |
20140343926 | LANGUAGE MODEL GENERATING DEVICE, METHOD THEREOF, AND RECORDING MEDIUM STORING PROGRAM THEREOF - A text in a corpus including a set of world wide web (web) pages is analyzed. At least one word appropriate for a document type set according to a voice recognition target is extracted based on an analysis result. A word set is generated from the extracted at least one word. A retrieval engine is caused to perform a retrieval process using the generated word set as a retrieval query of the retrieval engine on the Internet, and a link to a web page from the retrieval result is acquired. A language model for voice recognition is generated from the acquired web page. | 11-20-2014 |
20140343927 | SYSTEM AND METHOD FOR MEANING DRIVEN PROCESS AND INFORMATION MANAGEMENT TO IMPROVE EFFICIENCY, QUALITY OF WORK AND OVERALL CUSTOMER SATISFACTION - A customer service system for providing enhanced guidance and resources to service agents and providing an enhanced ability to select service agents that are best suited to address specific customers and specific customer needs, and methods for manufacturing and using same. The customer service system includes a system server with a semantic engine configured to analyze customer contact events and customize service agent interfaces based on determined customer needs. | 11-20-2014 |
20140343928 | Wearable-Based Virtual Agents - Virtual agents may be implemented on a wearable device. The wearable device may include an input device to receive input and a communication component to send the input to a computing device for processing and to receive a response for the input. The wearable device may also include an output device to output the response via the virtual agent as part of a conversation with a user. | 11-20-2014 |
20140350917 | IDENTIFYING REPEAT SUBSEQUENCES BY LEFT AND RIGHT CONTEXTS - A system and method of identifying repeat subsequences having at least a value of x for threshold of different left contexts and a value of y for a threshold of different right contexts for an input sequence are disclosed. The method may include generating a lexicographically sorted suffix array for the input sequence and a longest common prefix array. The suffix array is traversed in lexicographic order comparing the longest common prefix values between consecutive suffixes. Suffixes with the same longest common prefix are representative of occurrence of the same repeat, a higher longest common prefix indicates a new occurrence of a longer repeat, and a lower longest common prefix indicates the last occurrence of a repeat. | 11-27-2014 |
20140350918 | METHOD AND SYSTEM FOR ADDING PUNCTUATION TO VOICE FILES - A method and system for adding punctuation to a voice file is disclosed. The method includes: utilizing silence or pause duration detection to divide a voice file into a plurality of speech segments for processing, wherein the voice file includes a plurality of feature units; identifying all feature units that appear in the voice file according to every term or expression, and the semantic features of every term or expression, that form each of the plurality of speech segments; using a linguistic model to determine a sum of weight of various punctuation modes in the voice file according to all the feature units, wherein the linguistic model is built upon semantic features of various terms or expressions parsed out from a body text of a spoken sentence according to a language library; and adding punctuation to the voice file based on the determined sum of weight of the various punctuation modes. | 11-27-2014 |
20140350919 | METHOD AND APPARATUS FOR WORD COUNTING - A method for word counting is described, including: obtaining initial letter combinations for word combinations in a target text, wherein there is a one-to-one correspondence relationship between the initial letter combinations and the word combinations; counting occurrence frequencies of the initial letter combinations, and determining one or more initial letter combinations as target initial letter combinations; and determining target word combinations, each of which corresponds to a respective one of the target initial letter combinations according to the one-to-one correspondence relationship between the word combinations and the initial letter combinations, wherein the occurrence frequency of a target initial letter combination serves as the occurrence frequency of the corresponding target word combination. Further, an apparatus for word counting is described. In the method and the apparatus, memory consumption of the device can be reduced in the process of counting occurrence frequencies of words. | 11-27-2014 |
20140350920 | SYSTEM AND METHOD FOR INPUTTING TEXT INTO ELECTRONIC DEVICES - The present invention provides a system comprising a user interface configured to receive text input by a user, a text prediction engine comprising a plurality of language models and configured to receive the input text from the user interface and to generate concurrently text predictions using the plurality of language models, and wherein the text prediction engine is further configured to provide text predictions to the user interface for display and user selection. An analogous method and an interface for use with the system and method are also provided. | 11-27-2014 |
20140350921 | PRESENTATION OF WRITTEN WORKS BASED ON CHARACTER IDENTITIES AND ATTRIBUTES - A method is provided for presenting a written work. A character identity is recognized within a written work. Presentation information for the written work, such as a graphical scheme or an electronic voice, is determined based on the character identity. The presentation information is provided to a user computing device. The user computing device renders the written work or a portion thereof using the presentation information. | 11-27-2014 |
20140358520 | REAL-TIME ONLINE AUDIO FILTERING - Audio from online, real-time activity is routed through a filter to remove inappropriate language associated with parameters received by a user interface. The filter automatically removes audio based on the parameters and/or derived parameters. The parameters can be directly input by a user and/or a list can be provided to the user from which they select their desired parameters. | 12-04-2014 |
20140358521 | CAPTURE SERVICES THROUGH COMMUNICATION CHANNELS - Techniques and systems are presented for capturing content for a note through various communication channels including those for email, text, and voice. One technique includes receiving a message from a communication channel; parsing the message and determining semantic structure of the message; determining a presentation form for how the content is to be presented and used in a note from elements in the message; and inserting the message into the note according to the presentation form. Receipt of a message addressed to a uniform address may be used to indicate that the message is to be inserted into a note. | 12-04-2014 |
20140358522 | INFORMATION SEARCH APPARATUS AND INFORMATION SEARCH METHOD - A processor of an information search apparatus receives an input of information that includes a plurality of search words. The processor separates two search words from the received information. The processor searches for and extracts, from a storage unit, two words that correspond to the two search words and semantic information of the two words, the storage unit storing a plurality of words included in a search target sentence and semantic information in association with the search target sentence, the semantic information stored in the storage unit indicating a relationship established within the search target sentence between the plurality of words and another word. An output unit outputs the extracted semantic information. This allows an intended search result to be obtained efficiently. | 12-04-2014 |
20140358523 | TOPIC-SPECIFIC SENTIMENT EXTRACTION - One or more embodiments of techniques or systems for sentiment extraction are provided herein. From a corpus or group of social media data which includes one or more expressions pertaining to a topic, target topic, or a target, one or more candidate expressions may be extracted. Relationships between one or more pairs of candidate expressions may be identified or evaluated. For example, a consistency relationship or an inconsistency relationship between a pair may be determined. A root word database may include one or more root words which facilitate identification of candidate expressions. Among one or more of the root words may be seed words, which may be associated with a predetermined polarity. To this end, polarities may be determined based on a formulation which assigns polarities to a sentiment expression, candidate expressions, or an expression as a constrained optimization problem. | 12-04-2014 |
20140358524 | MACHINE TRANSLATION QUALITY MEASUREMENT - A method, an apparatus, and a computer program for measuring a quality of the machine translation. An original segment, for example one sentence in English, is translated to a target language, for example Spanish. The translated sequence is then back-translated with several machine translation engines to the original language, for example English. The resulting translations and back-translations are compared, possibly to each other, and to the original sequence. This gives a measurement value of the quality of the back-translations. At least one measured value from the above steps of the process can be used to output information about the quality of the machine translation. | 12-04-2014 |
20140365206 | Method and system for idea spotting in idea-generating social media platforms - A computer-implemented system and method provide for identifying the core of an idea. The method includes receiving an idea submission which includes a textual description of an idea. The textual description of the idea is natural language processed to identify dependencies (syntactic and/or semantic relations between text elements) in at least a part of the textual description. Provision is made for identifying directive illocutionary acts in the textual description, based on the identified dependencies. The method further includes providing for identifying an idea core of the idea submission, based on an identified directive illocutionary act, where present, and outputting information based on the identified idea core. | 12-11-2014 |
20140365207 | Method and system for classifying reviewers' comments and recommending related actions in idea-generating social media platforms - A system and method for classifying comments are disclosed. The method includes receiving a collection of comments. Each of the comments in the collection includes text in a natural language and is associated with a previously-submitted idea submission which includes a description of an idea. The method further includes natural language processing each of the comments to identify dependencies (syntactic and/or semantic relations between text elements) in at least a part of the comment. Based on the identified dependencies, the comments are each automatically classified into one (or more) of a plurality of comment classes. The comment classes may include a first class for reaction to the content of the idea, a second class for expression of a commenter's judgment of an idea's value, and a third class for reaction to an idea generation process in which the associated idea submission is made. Information based on the assigned comment classes is output. For example, actions are proposed to use the comments according to their class. | 12-11-2014 |
20140365208 | CLASSIFICATION OF AFFECTIVE STATES IN SOCIAL MEDIA - Affective state classification embodiments are described which train and use a classifier to identify an affect exhibited by a segment of text. The affect being identified is chosen from a group of affects, each of which corresponds to a different emotion or sentiment being expressed by a person authoring the segment of text. In addition, each affect in the group of affects relates to more than the valence of the emotion or sentiment being expressed. In other words, the identified affect is more than just an indication of the positive or negative nature of the text segment. Rather, in one embodiment, the classifier is trained to identify whether a segment of text exhibits one of the following affects: fear, sadness, guilt, hostility, joviality, self-assurance, attentiveness, shyness, fatigue, surprise, and serenity. | 12-11-2014 |
20140365209 | SYSTEM AND METHOD FOR INFERRING USER INTENT FROM SPEECH INPUTS - A text string with a first and a second portion is provided. A domain of the text string is determined by applying a first word-matching process to the first portion of the text string. It is then determined whether the second portion of the text string matches a word of a set of words associated with the domain by applying a second word-matching process to the second portion of the text string. Upon determining that the second portion of the text string matches the word of the set of words, a user intent is determined from the text string based at least in part on the domain and the word of the set of words. | 12-11-2014 |
20140365210 | Systems and Methods for Processing Patient Information - Systems and methods described herein are for transforming narrative content into structured output. In some embodiments the narrative content is processed using a natural language processing (NLP) engine and a clinical model. The structured output can include a section, a clinical assertion, and a plurality of elements, wherein the elements may include section elements and clinical assertion elements that annotate the section and clinical assertions respectively. The elements can be labeled based on the clinical model. | 12-11-2014 |
20140372101 | STYLE-BASED SPELLCHECKER TOOL - Systems and methods for providing style-based spellchecking are provided. An example system includes a dictionary that includes one or more words in a language. At least one word in the dictionary is based on one or more styles of the language. The system also includes a language module that receives a user selection of the language in which to spellcheck a document. The system further includes a style module that receives a user selection of one or more selected styles of the language. The system also includes a spellchecking module that identifies in the document a word applicable to the one or more selected styles and that determines whether the dictionary includes one or more spellings of the word based on the one or more selected styles. | 12-18-2014 |
20140372102 | COMBINING TEMPORAL PROCESSING AND TEXTUAL ENTAILMENT TO DETECT TEMPORALLY ANCHORED EVENTS - A method for extraction of events includes performing linguistic processing on a collection of text documents to identify predicates and respective arguments of the predicates and performing temporal processing on the collection of documents to normalize referential dates. A query is received which includes a topic and date information which defines a date range. A collection of excerpts from the collection of documents is identified, each excerpt including an argument which is based on the topic and a normalized reference to a date which matches the defined date range. A plurality of sets of events in the collection of excerpts is identified, each set of events including a plurality of the excerpts in the collection that are linked together by entailment relationships. | 12-18-2014 |
20140372103 | DATA DETECTION - A method of processing a sequence of characters, the method comprising converting the sequence of characters into a sequence of tokens so that each token comprises a lexeme and one of a plurality of token types. Each of the plurality of token types relates to at least one of a plurality of predetermined functions, wherein at least one said token type relates to multiple functions of the plurality of predetermined functions. | 12-18-2014 |
20140372104 | METHOD AND APPARATUS FOR SITUATIONAL ANALYSIS TEXT GENERATION - Methods, apparatuses, and computer program products are described herein that are configured to generate a situational analysis text. In some example embodiments, a method is provided that comprises generating a set of messages based on one or more key events in a primary data channel and one or more significant events in one or more related data channels in response to an alert condition. The method of this embodiment may also include generating a situational analysis text based on the set of messages and the relationships between them. In some example embodiments, the situational analysis text is configured to linguistically express the one or more key events, the one or more significant events, and the relationships between the one or more key events and the one or more significant events. | 12-18-2014 |
20140372105 | Submatch Extraction - A method for submatch extraction may include receiving an input string, receiving a regular expression, and converting the regular expression with capturing groups into a plurality of finite automata to extract submatches. The method further includes using a first automaton to determine whether the input string is in a language described by the regular expression, and to process the input string, and using states of the first automaton in a second automaton to extract the submatches. | 12-18-2014 |
20140372106 | Assisted Free Form Decision Definition Using Rules Vocabulary - A method of decision definition using a rules vocabulary includes: receiving free form input; identifying terms contained within the free form input; searching the rules vocabulary objects for terms; responsive to the term being found, obtaining input from a user as to whether to use the found term; responsive to the term not being found, searching the rules vocabulary attributes for terms having attributes corresponding to the term; responsive to the term being found, obtaining input from a user as to whether to use the found term; and refactoring the free form input with the found term accepted by the user. The method also includes updating the rules vocabulary with the term identified in the free form input as a synonym for the term found in said rules vocabulary. One embodiment further provides a method of determining semantic equivalence between a plurality of rules using a rules database having preferred terms. | 12-18-2014 |
20140379323 | ACTIVE LEARNING USING DIFFERENT KNOWLEDGE SOURCES - Different knowledge sources are automatically accessed to identify and obtain additional data to update a conversational dialog system. One of the knowledge sources is initially selected as a seed source. Seed data from the seed source are used to identify related data in at least one other knowledge source. For example, query click logs may be accessed and searched to determine popular queries that use the seed data. A structured knowledge source may be accessed to determine related nodes to the seed data. A query click log, or some other knowledge source, may be used to determine when a node is related to the seed data. Data that is identified to be related may be used to train a language understanding model or update a schema for the SLU system. The data may be automatically annotated or manually annotated. | 12-25-2014 |
20140379324 | PROVIDING WEB-BASED ALTERNATE TEXT OPTIONS - Systems, methods, and computer-readable storage media are provided for mining web content for synonyms for selected words and/or phrases and presenting such web-based synonyms in the context of applications that permit text editing. As the synonyms are mined from web content, they have potentially more expansive and accurate coverage than a fixed, and often dated, thesaurus. Web content for synonyms of selected words and/or phrases may be mined taking into account at least a portion of the surrounding context in which the selected words and/or phrases appear. Further, web content for synonyms of selected words and/or phrases may be mined taking into account user behaviors that might provide clues as to the intended meaning of the selected words and/or phrases. | 12-25-2014 |
20140379325 | TEXT ENTRY AT ELECTRONIC COMMUNICATION DEVICE - Typed input is received at a text field from a keyboard, such as a virtual keyboard displayed at an electronic communication device. Text prediction using the typed input can be performed to obtain at least one text-predicted candidate word and an associated confidence value. Speech recognition can be performed on audio input received via a microphone to obtain at least one speech-recognized candidate word and an associated confidence value, which can be adjusted based on the recency of the audio input. A candidate word having a highest confidence value can be selected from the text-predicted and speech-recognized candidate words for display as a suggestion to the user for selection by the user. The suggested candidate word can be displayed on a fret of the virtual keyboard. | 12-25-2014 |
20140379326 | BUILDING CONVERSATIONAL UNDERSTANDING SYSTEMS USING A TOOLSET - Tools are provided to allow developers to enable applications for Conversational Understanding (CU) using assets from a CU service. The tools may be used to select functionality from existing domains, extend the coverage of one or more domains, as well as to create new domains in the CU service. A developer may provide example Natural Language (NL) sentences that are analyzed by the tools to assist the developer in labeling data that is used to update the models in the CU service. For example, the tools may assist a developer in identifying domains, determining intent actions, determining intent objects and determining slots from example NL sentences. After the developer tags all or a portion of the example NL sentences, the models in the CU service are automatically updated and validated. For example, validation tools may be used to determine an accuracy of the model against test data. | 12-25-2014 |
20140379327 | APPARATUS AND METHOD FOR HELPING IN THE READING OF AN ELECTRONIC MESSAGE - An apparatus and method for determining whether the meaning of a word included in an electronic message needs to be presented to a user, according to a dynamic determination whether the user currently knows the meaning of the word. In a client, a communication control unit receives a message sent from a user A to a user B, a morphological analysis unit extracts a word from the message, and a history acquisition unit acquires history information on viewing, usage, or the like of the word by the user B. A display determination unit determines whether the meaning of the word needs to be displayed, according to the acquired history information, the language level of the user B stored in a user level storage unit, and the difficulty level of the word stored in a dictionary storage unit. An input/output control unit performs control such that the meaning of the word is presented to the user B according to the determination result. | 12-25-2014 |
20140379328 | APPARATUS AND METHOD FOR OUTPUTTING IMAGE ACCORDING TO TEXT INPUT IN REAL TIME - An apparatus and method for outputting an image according to text input in real time are provided. The apparatus for outputting an image in real time includes: a text receiving unit configured to extract unit text from input text; a syntax analyzing unit configured to analyze syntax of the unit text to generate state information corresponding to the unit text; a text reference database (DB) matching unit configured to search a reference DB to generate an image corresponding to the state information; a change necessity determining unit configured to determine whether an image corresponding to previous unit text needs to be changed; and an output unit configured to generate an output image by using any one or more of the image corresponding to the state information and the image corresponding to the previous unit text according to the necessity for the change. | 12-25-2014 |
20140379329 | METHODS AND APPARATUSES FOR MINING SYNONYMOUS PHRASES, AND FOR SEARCHING RELATED CONTENT - The present disclosure is related to a method and an apparatus of mining synonymous phrases. The method comprises: obtaining, according to a parallel text corpus, a first phrase-alignment relationship from phrases of a current language to phrases of an intermediate language, and a second phrase-alignment relationship from the phrases of the intermediate language to the phrases of the current language; obtaining, for a target phrase of current language, a first set of aligned phrases of the intermediate language that are aligned with the target phrase of the current language based on the first phrase-alignment relationship; obtaining a second set of aligned phrases of the current language that are aligned with selected phrase(s) in the first set of aligned phrases based on the second phrase-alignment relationship; and obtaining synonymous phrases for the target phrase from the second set of aligned phrases. | 12-25-2014 |
20140379330 | NATURAL LANGUAGE QUESTION EXPANSION AND EXTRACTION - Methods, computer program products and systems for generating at least one factual question from a set of seed questions and answer pairs. One method includes: obtaining at least one seed question and answer pair from the set of seed question and answer pairs; extracting a set of features associated with the at least one seed question and answer pair using at least one common analysis system (CAS) in a set of CASs and a specific knowledge base; generating a set of candidate questions from the extracted set of features using a logistic regression algorithm and the specific knowledge base; and ranking each candidate question relative to a remainder of candidate questions in the set of candidate questions based on the extracted set of features and the at least one seed question and answer pair. | 12-25-2014 |
20140379331 | Automatic Semantic Rating and Abstraction of Literature - Deep semantic analysis is performed on an electronic literary work in order to detect plot elements and optional other storyline elements such as characters within the work. Multiple levels of abstraction are generated into a model representing the literary work, wherein each element in each abstraction level may be independently rated for preference by a user. Through comparison of multiple abstraction models and one or more user rating preferences, one or more alternative literary works may be automatically recommended to the user. | 12-25-2014 |
20150012260 | APPARATUS AND METHOD FOR RECOGNIZING VOICE AND TEXT - A method for recognizing a voice includes receiving, as an input, a voice involving multiple languages, recognizing a first voice of the voice by using a voice recognition algorithm matched to a preset primary language, identifying the preset primary language and a non-primary language different from the preset primary language, which are included in the multiple languages, determining a type of the non-primary language based on context information, recognizing a second voice of the voice in the non-primary language by applying a voice recognition algorithm, which is matched to the non-primary language of the determined type, to the second voice, and outputting a result of recognizing the voice which is based on a result of recognizing the first voice and a result of recognizing the second voice. | 01-08-2015 |
20150012261 | Method for phonetizing a data list and voice-controlled user interface - A method for phonetizing a data list having text-containing list entries, each list entry in the data list being subdivided into at least two data fields for provision to a voice-controlled user interface, includes: converting a list entry from a text representation into phonetics; storing the phonetics as phonemes in a phonetized data list; inserting a separating character into the text of a list entry between the respective data fields of the list entry, concomitantly converting the inserted separating character into phonetics and concomitantly storing the converted separating character as a phoneme symbol; and storing the phonemes in a phonetic database, the phonetized data list being produced from the phonemes stored in the phonetic database. | 01-08-2015 |
20150012262 | METHOD AND SYSTEM FOR GENERATING NEW ENTRIES IN NATURAL LANGUAGE DICTIONARY - A method and computer system for analyzing a text corpus in a natural language is provided. An initial morphological description having word inflection rules for various groups of words in the natural language is created by a linguist. A plurality of text corpuses are analyzed to obtain information on the occurrence of a plurality of word forms for each word token in each text corpus. A morphological dictionary which contains information about each base form and word inflection rules for each word token with verified hypothesis is generated. | 01-08-2015 |
20150012263 | SYSTEM AND METHOD FOR SEMANTIC ANALYSIS OF CANDIDATE INFORMATION TO DETERMINE COMPATIBILITY - A computer includes a taxonomy, mapping grammatical patterns to qualities. A scanner on the computer can scan content to identify phrases that correspond to the grammatical patterns in the taxonomy. The computer can then calculate percentages of occurrences for the grammatical patterns, and also for combinations of grammatical patterns. The calculated percentages of occurrences can then be output. | 01-08-2015 |
20150019202 | Ontology and Annotation Driven Grammar Inference - Inferring a natural language grammar is based on providing natural language understanding (NLU) data with concept annotations according to an application ontology characterizing a relationship structure between application-related concepts for a given NLU application. An application grammar is then inferred from the concept annotations and the application ontology. | 01-15-2015 |
20150019203 | REAL-TIME NATURAL LANGUAGE PROCESSING OF DATASTREAMS - Systems and methods for identifying and locating related content using natural language processing are generally disclosed herein. One embodiment includes an HTML5/JavaScript user interface configured to execute scripting commands to perform natural language processing and related content searches, and to provide a dynamic interface that enables both user-interactive and automatic methods of obtaining and displaying related content. The natural language processing may extract one or more context-sensitive key terms of text associated with a set of content. Related content may be located and identified using keyword searches that include the context-sensitive key terms. For example, text associated with video of a first content, such as text originating from subtitles or closed captioning, may be used to perform searches and locate related content such as a video of a second content, or text of a third content. | 01-15-2015 |
20150019204 | FEATURE COMPLETION IN COMPUTER-HUMAN INTERACTIVE LEARNING - A collection of data that is extremely large can be difficult to search and/or analyze. Relevance may be dramatically improved by automatically classifying queries and web pages in useful categories, and using these classification scores as relevance features. A thorough approach may require building a large number of classifiers, corresponding to the various types of information, activities, and products. Creation of classifiers and schematizers is provided on large data sets. Exercising the classifiers and schematizers on hundreds of millions of items may expose value that is inherent to the data by adding usable meta-data. Some aspects include active labeling exploration, automatic regularization and cold start, scaling with the number of items and the number of classifiers, active featuring, and segmentation and schematization. | 01-15-2015 |
20150019205 | METHOD AND SYSTEM FOR GENERATING GRAMMAR RULES - An information retrieval system, including a natural language parser ( | 01-15-2015 |
20150019206 | METADATA EXTRACTION OF NON-TRANSCRIBED VIDEO AND AUDIO STREAMS - A system and computer based method for transcribing and extracting metadata from a source media. A processor-based server extracts audio and video streams from the source media. A speech recognition engine processes the audio stream to transcribe the audio stream into a time-aligned textual transcription, thereby providing a time-aligned machine transcribed media. The video frame engine processes the video stream to extract time-aligned video frames. A database stores the time-aligned machine transcribed media and time-aligned video frames. A server processor processes the time-aligned machine transcribed media to extract time-aligned textual metadata associated with the source media, and processes the time-aligned video frames to extract time-aligned visual metadata associated with the source media. | 01-15-2015 |
20150019207 | Detecting Semantic Errors in Text Using Ontology-Based Extraction Rules - Semantic errors in a natural language text document are automatically detected by matching sentences in the document with stored ontology-based extraction rules that express both logically correct and logically incorrect relationships between the classes and properties of an ontology for a predefined knowledge domain of relevance to the natural language text document. The matching identifies logically correct and incorrect statements in the document which may be used for various applications such as automatic grading. | 01-15-2015 |
20150019208 | METHOD FOR IDENTIFYING A SET OF SENTENCES IN A DIGITAL DOCUMENT, METHOD FOR GENERATING A DIGITAL DOCUMENT, AND ASSOCIATED DEVICE - A method for generating a digital summary, the method including: a parameterisation step for defining a first degree of summarisation of a first digital document defining a first ratio between a first number representing the quantity of data contained in the desired digital abstract and a second number representing the quantity of data contained in the first document; an analysis step for analysing the first digital document, including the definition of a set of terms, known as TAG; a segmentation step for (i) determining a first set of sentences in the first document or (ii) associating a weighting with each of the sentences; an extraction step for extracting a number of sentences according to the degree of condensation; and a generation step for generating a digital abstract including a set of ordered sentences. | 01-15-2015 |
20150019209 | ATTRIBUTION USING SEMANTIC ANALYSIS - A method and system for semantic attribution of a request. Source data statements for the request are received. A selection of a domain for the received source data statements is received. The received source data statements are semantically analyzed, which includes matching elements in the received source data statements to respective one or more entries in an ontology associated with the selected domain. The ontology includes items and relationships that define the selected domain. Each element in the received source data statements is a word or a phrase. The one or more entries are assigned to the matched elements, respectively, to annotate each matched element with a respective annotation consisting of the respective one or more entries. The annotated elements are saved with the respective annotations. The annotations are used to generate a search query for the request. | 01-15-2015 |
20150019210 | SYSTEM AND METHOD OF EXTRACTING CLAUSES FOR SPOKEN LANGUAGE UNDERSTANDING - A clausifier and method of extracting clauses for spoken language understanding are disclosed. The method relates to generating a set of clauses from speech utterance text and comprises inserting at least one boundary tag in speech utterance text related to sentence boundaries, inserting at least one edit tag indicating a portion of the speech utterance text to remove, and inserting at least one conjunction tag within the speech utterance text. The result is a set of clauses that may be identified within the speech utterance text according to the inserted at least one boundary tag, at least one edit tag and at least one conjunction tag. The disclosed clausifier comprises a sentence boundary classifier, an edit detector classifier, and a conjunction detector classifier. The clausifier may comprise a single classifier or a plurality of classifiers to perform the steps of identifying sentence boundaries, editing text, and identifying conjunctions within the text. | 01-15-2015 |
20150025875 | SEMANTICS-ORIENTED ANALYSIS OF LOG MESSAGE CONTENT - Processing a log message is disclosed. A log message is received. One or more portions of the log message to be separately extracted are identified. A value is extracted from each identified portion. Extracting the value includes using an extraction rule. The extraction rule is associated with the identified portion. | 01-22-2015 |
20150025876 | INTEGRATED KEYPAD SYSTEM - A data entry system using a number of different input signals provided by interacting with input means such as keys wherein to at least some of the different input signals a number of symbols including substantially all of the letters of the alphabet of one language are distributively assigned such that at least two of the letters are ambiguously assigned to at least one of the some input signals. The data entry system includes a database of words and uses a word predictive system such that in order to enter a word of the database the user provides a first input information consisting of providing the input signals corresponding to the characters, generally the letters, of the word, and an additional input information corresponding to at least one of the characters of the word and its corresponding input signal provided through the first input information, and wherein the system precisely recognizes the character among the ambiguous characters assigned to the input signal and provides a word of the dictionary that corresponds to the first and the additional input information. | 01-22-2015 |
20150025877 | CHARACTER INPUT DEVICE, CHARACTER INPUT METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a character input device includes a first obtainer, a determiner, a first generator, and an outputter. The first obtainer receives an input of characters from a user and obtains an input character string. The determiner infers, from the input character string, word notations intended by the user and relations of connection between the word notations, and determines routes each of which represents the relation of connection having a high likelihood of serving as a notation candidate intended by the user. The first generator extracts, from a group of word notations included in the routes, the word notations to be output and generates layout information used in outputting the extracted word notations as the notation candidates. The outputter outputs the layout information. | 01-22-2015 |
20150032441 | Initializing a Workspace for Building a Natural Language Understanding System - Designing a natural language understanding (NLU) model for an application from scratch can be difficult for non-experts. A system can simplify the design process by providing an interface allowing a designer to input example usage sentences and build an NLU model based on presented matches to those example sentences. In one embodiment, a method for initializing a workspace for building an NLU system includes parsing a sample sentence to select at least one candidate stub grammar from among multiple candidate stub grammars. The method can include presenting, to a user, respective representations of the candidate stub grammars selected by the parsing of the sample sentence. The method can include enabling the user to choose one of the respective representations of the candidate stub grammars. The method can include adding to the workspace a stub grammar corresponding to the representation of the candidate stub grammar chosen by the user. | 01-29-2015 |
20150032442 | METHOD AND APPARATUS FOR SELECTING AMONG COMPETING MODELS IN A TOOL FOR BUILDING NATURAL LANGUAGE UNDERSTANDING MODELS - Selecting a grammar for use in a machine question-answering system, such as a Natural Language Understanding System, can be difficult for non-experts in such grammars. A tool, according to an example embodiment, can compare annotations of sample sentences, performed correctly by a human, the annotations having intents and mentions, against annotations performed by multiple grammars. Each grammar can be scored, and the system can select the best scored grammar for the user. In one embodiment, a method of selecting a grammar includes comparing manually-generated annotations against machine-generated annotations as a function of a given grammar among multiple grammars. The method can further include applying scores to the machine-generated annotations that are a function of weightings of the intents and mentions. The method can additionally include recommending whether to employ the given grammar based on the scores. | 01-29-2015 |
20150032443 | SELF-LEARNING STATISTICAL NATURAL LANGUAGE PROCESSING FOR AUTOMATIC PRODUCTION OF VIRTUAL PERSONAL ASSISTANTS - Technologies for natural language request processing include a computing device having a semantic compiler to generate a semantic model based on a corpus of sample requests. The semantic compiler may generate the semantic model by extracting contextual semantic features or processing ontologies. The computing device generates a semantic representation of a natural language request by generating a lattice of candidate alternative representations, assigning a composite weight to each candidate, and finding the best route through the lattice. The composite weight may include semantic weights, phonetic weights, and/or linguistic weights. The semantic representation identifies a user intent and slots associated with the natural language request. The computing device may perform one or more dialog interactions based on the semantic request, including generating a request for additional information or suggesting additional user intents. The computing device may support automated analysis and tuning to improve request processing. Other embodiments are described and claimed. | 01-29-2015 |
20150032444 | CONTEXTUAL ANALYSIS DEVICE AND CONTEXTUAL ANALYSIS METHOD - According to an embodiment, a contextual analysis device includes a generator, a predictor, and a processor. The generator is configured to generate, from a target document for analysis, a predicted sequence in which some elements of a sequence having elements arranged therein are obtained by prediction. Each element is a combination of a predicate having a common argument, word sense identification information of the predicate, and case classification information indicating a type of the common argument. The predictor is configured to predict an occurrence probability of the predicted sequence based on a probability of appearance of the sequence that is acquired in advance from an arbitrary group of documents and that matches the predicted sequence. The processor is configured to perform contextual analysis with respect to the target document by using the predicted occurrence probability of the predicted sequence. | 01-29-2015 |
20150039289 | Systems and Methods for Representing, Diagnosing, and Recommending Interaction Sequences - Systems and methods for representing and diagnosing interaction sequences in accordance with embodiments of the invention are disclosed. In one embodiment of the invention, a group interaction diagnosis and recommendation server system includes a processor and a memory configured to store a set of reference interaction data, where the reference interaction data includes a set of reference interaction sequences, wherein a group interaction diagnosis application configures the processor to obtain a set of group interaction data, generate an interaction model based on the group interaction data and an interaction dynamics language, determine at least one interaction sequence within the set of group interaction data based on the generated interaction model, identify at least one matching interaction sequence within the determined at least one interaction sequence, and recommend at least one improved interaction sequence based on the identified at least one matching interaction sequence and the set of reference interaction data. | 02-05-2015 |
20150039290 | KNOWLEDGE-RICH AUTOMATIC TERM DISAMBIGUATION - Embodiments of the invention relate to ambiguity detection. In one embodiment, an object and a topical domain associated with the object are obtained. In this embodiment, the object includes at least one term. At least one of a plurality of information sources is analyzed based on the at least one term and the topical domain. A determination is made that the object is one of ambiguous and unambiguous based on analyzing at least one of the plurality of information sources. | 02-05-2015 |
20150039291 | Using a group of CVs and Job Descriptions in a database to establish a library of contextual words and phrases against which documents (CVs or Job Descriptions) can be matched, scored, and ranked. - The present invention relates to using a group of CVs and Job Descriptions (Documents) in a database to establish a library of contextual phrases (References), against which Documents (internal in the database, and external from the database) can be matched, scored, and ranked. | 02-05-2015 |
20150039292 | METHOD AND SYSTEM OF CLASSIFICATION IN A NATURAL LANGUAGE USER INTERFACE - A method and system are provided for processing natural language user queries for commanding a user interface to perform functions. Individual user queries are classified in accordance with the types of functions and a plurality of user queries may be related to define a particular command. To assist with classification, a query type for each user query is determined where the query type is one of a functional query requesting a particular new command to perform a particular type of function, an entity query relating to an entity associated with the particular new command having the particular type of function and a clarification query responding to a clarification question posed to clarify a prior user query having the particular type of function. Functional queries may be processed using a plurality of natural language processing techniques and scores from each technique combined to determine which type of function is commanded. | 02-05-2015 |
20150039293 | SYSTEM AND METHOD FOR DETECTING THE OCCURENCES OF IRRELEVANT AND/OR LOW-SCORE STRINGS IN COMMUNITY BASED OR USER GENERATED CONTENT - A system and method can detect and/or prevent profane/objectionable content in forums/communities with community based or user generated content. The system generates and provides a disallowed variants dictionary, which can be constructed based on a misuse table, wherein the disallowed variants dictionary contains a plurality of variants of one or more disallowed words in a community based or user generated content. Furthermore, the system checks each word in an incoming message against the disallowed variants dictionary, and determines that one or more words in the incoming message are disallowed when there is a hit. | 02-05-2015 |
20150039294 | METHOD AND SYSTEM FOR SIMPLIFYING IMPLICIT RHETORICAL RELATION PREDICTION IN LARGE SCALE ANNOTATED CORPUS - The present invention provides a method and system directed to predicting implicit rhetorical relations between two spans of text, e.g., in a large annotated corpus, such as the Penn Discourse Treebank (“PDTB”), Rhetorical Structure Theory corpus, and the Discourse Graph Bank, and particularly directed to determining a rhetorical relation in the absence of an explicit discourse marker. Surface level features may be used to capture pragmatic information encoded in the absent marker. In one manner a simplified feature set based only on raw text and semantic dependencies is used to improve performance for all relations. By using surface level features to predict implicit rhetorical relations for the large annotated corpus the invention approaches a theoretical maximum performance, suggesting that more data will not necessarily improve performance based on these and similarly situated features. | 02-05-2015 |
20150039295 | NATURAL LANGUAGE PROCESSOR - Disclosed is a method for converting a plurality of words or sign language gestures into one or more sentences. The method involves the steps of: obtaining a plurality of words; assigning a part of speech tag to each of said words; assigning a sentence structure tag to said plurality of words; and parsing said words into one or more sentences based on a predefined sentence structure. The method can be implemented by a computer to provide a translator that more accurately reflects the natural language of the original text. | 02-05-2015 |
20150039296 | PREDICATE TEMPLATE COLLECTING DEVICE, SPECIFIC PHRASE PAIR COLLECTING DEVICE AND COMPUTER PROGRAM THEREFOR - A predicate template collector allowing efficient and automatic recognition of predicate templates is adapted to include: a noun pair collector | 02-05-2015 |
20150039297 | Guided Article Authorship - A method includes determining a target publication, identifying one or more content suggestions associated with the target publication, and causing a user to be prompted to input content. The input content satisfies at least a portion of the one or more content suggestions. | 02-05-2015 |
20150046150 | IDENTIFYING AND AMALGAMATING CONDITIONAL ACTIONS IN BUSINESS PROCESSES - Methods and systems for identifying conditional actions in a business process are disclosed. In accordance with one such method, text fragments are extracted from input documents. In addition, a plurality of pairs of the text fragments that respectively include text fragments that are similar according to a pre-defined similarity standard are determined. For each pair of at least a subset of the pairs, at least one difference between the text fragments of the corresponding pair is determined. Further, at least two particular pairs of the subset of the pairs are merged in response to determining that the particular pairs have at least one of the determined differences in common. Additionally, the merged particular pairs are output to indicate the conditional actions in the business process. | 02-12-2015 |
20150046151 | SYSTEM AND METHOD FOR IDENTIFYING AND VISUALISING TOPICS AND THEMES IN COLLECTIONS OF DOCUMENTS - Method and systems for estimating and visualising a plurality of topics in a collection of documents, wherein the collection of documents comprises a plurality of words and each document comprises one or more of the plurality of words, the method comprising: performing two rounds of topic modelling to the collection of documents, wherein the first round of topic modelling estimates a plurality of topics associated with the collection of documents and each topic comprises one or more words, and the second round identifies a plurality of themes associated with the topics, wherein each theme comprises one or more topics; and visually representing the topics and themes to a user. | 02-12-2015 |
20150046152 | DETERMINING CONCEPT BLOCKS BASED ON CONTEXT - A method for generating a set of concept blocks is presented, wherein the concept blocks are words in a corpus of documents that can be processed to extract trends, build an efficient inverted search index, or generate a summary report of the content. The method entails generating a plurality of target words from the corpus, determining context strings for the target words, obtaining pattern types that are based on number of words and position of words relative to the target words, and assigning weights to each of the context strings having a particular pattern type. The target words are then expressed as vectors that reflect the weights of the context strings. The vectors are compared and grouped into clusters based on similarity. Target words in the resulting clusters are concept blocks. A subgroup of clusters may be selected for another iteration of the process to catch new concept blocks. | 02-12-2015 |
20150046153 | MACRO REPLACEMENT OF NATURAL LANGUAGE INPUT - In a method of creating a natural language (NL) macro, a first term/phrase and a second term/phrase in an imprecise syntax are obtained, and an association between the first and the second terms/phrases is created. The association is stored as an NL macro. In a method of using an NL macro in an NL query, it is determined that an original NL query includes an NL macro, and the NL macro is replaced with its corresponding NL value to form a revised NL query. The revised NL query is processed to generate one or more answers. | 02-12-2015 |
20150046154 | NATIVE-SCRIPT AND CROSS-SCRIPT CHINESE NAME MATCHING - Techniques for Chinese name matching are described. A Chinese name is received and is romanized into a Mandarin Pinyin representation. The Mandarin Pinyin representation of the Chinese name is matched against a set of Romanized Chinese names originating from several different Chinese character names. In response to finding a potential match between the Mandarin Pinyin representation and Romanized Chinese name, the original Chinese script for the Romanized Chinese name is retrieved. A native script comparison is applied between the received Chinese name and the original Chinese script for the Romanized Chinese name to obtain a match score. The native script comparison includes character-by-character comparison, character variant look-up, and/or consideration of name component misalignments. The obtained match score is used as a filter to reduce false positives that are generated in the matching of the Mandarin Pinyin representation against the set of Romanized Chinese names. | 02-12-2015 |
20150051899 | CORRECTING N-GRAM PROBABILITIES BY PAGE VIEW INFORMATION - Methods and a system for calculating N-gram probabilities in a language model. A method includes counting N-grams in each page of a plurality of pages or in each document of a plurality of documents to obtain respective N-gram counts therefor. The method further includes applying weights to the respective N-gram counts based on at least one of view counts and rankings to obtain weighted respective N-gram counts. The view counts and the rankings are determined with respect to the plurality of pages or the plurality of documents. The method also includes merging the weighted respective N-gram counts to obtain merged weighted respective N-gram counts for the plurality of pages or the plurality of documents. The method additionally includes calculating a respective probability for each of the N-grams based on the merged weighted respective N-gram counts. | 02-19-2015 |
20150051900 | UNSUPERVISED LEARNING OF DEEP PATTERNS FOR SEMANTIC PARSING - Using exemplary sentences, usage patterns and thematic roles ascribed in VerbNet to generate “deep pattern trees” for the exemplary sentences. Then, when an arbitrary natural language subject sentence is input, these deep pattern trees can be matched to the natural language subject sentence in order to assign thematic roles to at least some of the “grammatical portions” of the natural language subject sentence. | 02-19-2015 |
20150051901 | METHODS AND DEVICES FOR PROVIDING PREDICTED WORDS FOR TEXTUAL INPUT - A computer-implemented method for use in an electronic device includes obtaining a set of candidate words without character-based input. A display of the electronic device displays a first virtual keyboard that presents a first subset of the candidate words and information identifying a plurality of categories associated with the candidate words. The first subset of the candidate words is associated with a first one of the categories. A first input indicative of at least one of a selection of a second one of the categories is received, and a second subset of candidate words is identified based on the received input. The display of the electronic device displays a second virtual keyboard presenting the second subset of the candidate words. | 02-19-2015 |
20150051902 | CORRECTING N-GRAM PROBABILITIES BY PAGE VIEW INFORMATION - Methods and a system for calculating N-gram probabilities in a language model. A method includes counting N-grams in each page of a plurality of pages or in each document of a plurality of documents to obtain respective N-gram counts therefor. The method further includes applying weights to the respective N-gram counts based on at least one of view counts and rankings to obtain weighted respective N-gram counts. The view counts and the rankings are determined with respect to the plurality of pages or the plurality of documents. The method also includes merging the weighted respective N-gram counts to obtain merged weighted respective N-gram counts for the plurality of pages or the plurality of documents. The method additionally includes calculating a respective probability for each of the N-grams based on the merged weighted respective N-gram counts. | 02-19-2015 |
20150051903 | INFORMATION PROCESSING DEVICE, STORAGE MEDIUM, AND METHOD - There is provided an information processing device including an input unit to which words are input, an analysis unit configured to analyze meanings of the respective words, an image generation unit configured to generate single images corresponding to the respective words, and a display control unit configured to control the images generated by the image generation unit to be displayed on a display unit. | 02-19-2015 |
20150057996 | TEXT PROCESSING APPARATUS AND TEXT DISPLAY SYSTEM - A text processing apparatus includes an environmental information acquisition unit configured to acquire environmental information, a text acquisition unit configured to acquire text, a word extraction unit configured to extract a word from the text, and a joint indication unit configured to convert the word extracted from the text into a converted word using a dictionary that is accessed according to the environmental information acquired by the environmental information acquisition unit and indicate the converted word along with the word extracted from the text. | 02-26-2015 |
20150057997 | CONCEPT SEARCH AND SEMANTIC ANNOTATION FOR MOBILE MESSAGING - A textual message processing system and method are described for use in a mobile environment. A user messaging application processes at least one user textual message during a user messaging session. A semantic annotation module identifies one or more semantically salient terms in the user textual message, and annotates the user textual message with annotation terms having a low semantic distance to the semantically salient terms. A user message history stores the annotated textual messages. The semantic annotation module may further annotate the user textual message with situational meta-data characterizing the user textual message. There may be a message search module for using one or more keywords to search the user message history including the annotation terms, and identifying as a search match any annotated textual messages within a semantic distance threshold of the one or more keywords. | 02-26-2015 |
20150066475 | Method For Detecting Plagiarism In Arabic - The present invention provides a method for detecting plagiarism in Arabic texts including any rewording, reordering of words and phrases, and any pronoun changes. Such detection is achieved by returning all the Arabic words in the text to its original root using a stemmer, then comparing all the sentences in the submitted document with every sentence in all original documents. In the method of the present invention, the user has the ability to choose the source of plagiarism, wherein such source comprises a database, a web, or a direct matching. | 03-05-2015 |
20150066476 | Methods and Systems of Four Valued Analogical Transformation Operators Used in Natural Language Processing and Other Applications - A system for the dynamic encoding in a semantic network of both syntactic and semantic information into a common four valued logical notation. The encoding of new information being benign to prior syntactic constructions, tests for N conditionals in time O(C) and allows for the proper quantification of variables at each recursive step. The query/inference engine constructed from such an implementation is able to optimize short term memory for maximizing long term storage in the automaton. In a parallel context this can be viewed as optimizing communication and memory allocation between processes. The self-referencing system is capable of analogically extending knowledge from one knowledge source to another linearly. Disclosed embodiments include machine translation, text summarization, natural language speech recognition natural language. | 03-05-2015 |
20150066477 | SYSTEM AND METHOD FOR PROCESSING NATURAL LANGUAGE - A method for processing natural language includes generating a first layer of a multi-layer knowledge network including a plurality of word nodes arranged to represent a word or an entity name, generating a second layer of the multi-layer knowledge network with a natural language dataset, the second layer including one or more instance nodes arranged to represent a word or an entity of the natural language dataset, each of the instance nodes being linked by one or more semantic or syntactic relations to form one or more sub-graphs, and, referencing the first layer of the multi-layer knowledge network with the second layer of the multi-layer knowledge network by establishing a reference between each of the word nodes and each of the instance nodes when the word or the entity name represented by each word node is associated with the word or the entity represented by the instance node. | 03-05-2015 |
20150066478 | SYNONYM RELATION DETERMINATION DEVICE, SYNONYM RELATION DETERMINATION METHOD, AND PROGRAM THEREOF - A synonym relation determination device comprises: a synonym expression candidate storage unit which associates and stores a synonym candidate (EW) with the synonym source (OW); a text gathering unit which associates and gathers text with an issuing time; a synonym candidate search unit which calculates from the issuing time of the text a time interval (PD) in which the synonym candidate is searched in a text set (TX); a synonym source search unit which searches for a synonym source from the text set of a period which overlaps with the time interval in which the synonym candidate is searched for and calculates an occurrence of the synonym source; and synonym relation extraction unit which, when the occurrence of the synonym source is present in the time interval in which the synonym candidate is searched for, extracts a synonym relation between the synonym candidate and the synonym source. | 03-05-2015 |
20150066479 | CONVERSATIONAL AGENT - A method, system, and computer program product provide a conversation agent to process natural language queries expressed by a user and perform commands according to the derived intention of the user. A natural language processing (NLP) engine derives intent using conditional random fields to identify a domain and at least one task embodied in the query. The NLP may further identify one or more subdomains, and one or more entities related to the identified command. A template system creates a data structure for information relevant to the derived intent and passes a template to a services manager for interfacing with one or more services capable of accomplishing the task. A dialogue manager may elicit more entities from the user if required by the services manager and otherwise engage in conversation with the user. In one embodiment, the conversational agent allows a user to engage in multiple conversations simultaneously. | 03-05-2015 |
20150066480 | NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM FOR STORING ACRONYM-MANAGEMENT PROGRAM, ACRONYM-MANAGEMENT DEVICE, NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM FOR STORING EXPANDED-DISPLAY PROGRAM, AND EXPANDED-DISPLAY DEVICE - An acronym-management program causes a computer to extract an acronym from a list-updating reference for updating an acronym list; extract, from the list-updating reference, a candidate for an expanded form corresponding to the extracted acronym; and, when the likelihood of the extracted acronym being used in the list-updating reference to refer to the extracted expanded-form candidate reaches or exceeds a specific level, increase, in the acronym list, the frequency corresponding to the acronym and the expanded form. | 03-05-2015 |
20150066481 | SYSTEM AND METHOD FOR PERFORMING AUTOMATIC AUDIO PRODUCTION USING SEMANTIC DATA - There is described a computer implemented method for performing automatic audio production, comprising: receiving an audio signal to be processed; receiving semantic information; determining at least one semantic-based rule using the received semantic information, the semantic-based rule comprising production data that defines how the audio signal to be processed should be produced; processing the audio signal to be processed using the production data, thereby obtaining a produced audio signal; outputting the produced audio signal. | 03-05-2015 |
20150066482 | SYTEM AND METHOD FOR USE OF SEMANTIC UNDERSTANDING IN STORAGE, SEARCHING, AND PROVIDING OF DATA OR OTHER CONTENT INFORMATION - A system and method for using semantic understanding in storing and searching data and other information. A linearized tuple-based version of a conceptual graph can be created from a user input. A plurality of conceptual graphs, or portions thereof, can be compared to determine matches. An associative database can be created and/or searched using a hierarchy of conceptual graphs in tuple format, so that the data storage and searching of such database is optimized. The associative database can be used to integrate data from multiple different sources; form part of an Internet or other search engine; or used in other implementations. Also disclosed herein is a system and method for use of semantic understanding in searching and providing of content. In accordance with an embodiment, the system comprises a Syntactic Parser (SP) or statistical word tokenizer for data retrieval and parsing; a Syntax To Semantics (STS) transformational algebra-based semantic rule set, and an Associative Database (ADB) of linearized tuple conceptual graphs (TCG), utilizing a conceptual graph formalism. Data can be represented within the ADB, enabling both fast data retrieval in the form of semantic objects and a broad ranging taxonomy of content. | 03-05-2015 |
20150066483 | AUTOMATED EXTRACTION OF BIO-ENTITY RELATIONSHIPS FROM LITERATURE - Automated, standardized and accurate extraction of relationships within text. Automatic extraction of such relationships/information allows the information to be stored in structured form so that it can be easily and accurately retrieved when needed. Such information can be used to build online search engines for highly specific and accurate information retrieval. The current invention discloses a novel approach to extract such information from raw text based on natural language processing (NLP) and graph theoretic algorithm. The novel method can be applied, for example, to extract protein-protein relationships in biomedical literature. The method can be easily extended to extract other biological relationships between biological terms such as proteins, genes, pathways, diseases and drugs. The method can also be applied to other information domains to extract other relationships. | 03-05-2015 |
20150066484 | SYSTEMS AND METHODS FOR AN AUTONOMOUS AVATAR DRIVER - The autonomous avatar driver is useful in association with language sources. A sourcer may receive dialog from the language source. It may also, in some embodiments, receive external data from data sources. A segmentor may convert characters, represent particles and split dialog. A parser may then apply a link grammar, analyze grammatical mood, tag the dialog and prune dialog variants. A semantic engine may lookup token frames, generate semantic lexicons and semantic networks, and resolve ambiguous co-references. An analytics engine may filter common words from dialog, analyze N-grams, count lemmatized words, and analyze nodes. A pragmatics analyzer may resolve slang, generate knowledge templates, group proper nouns and estimate affect of dialog. A recommender may generate tag clouds, cluster the language sources into neighborhoods, recommend social networking to individuals and businesses, and generate contextual advertising. Lastly, a response generator may generate responses for the autonomous avatar using the analyzed dialog. The response generator may also incorporate the generated recommendations. | 03-05-2015 |
20150073773 | DEFECT RECORD CLASSIFICATION - An approach to classify different defect records by mapping plain language phrases to a taxonomy. The approach includes a method that includes receiving, by at least one computing device, a defect record associated with a defect. The method further includes receiving, by the at least one computing device, a plain language phrase or word. The method further includes mapping, by the at least one computing device, the plain language phrase or word to a taxonomy. The method further includes classifying, by the at least one computing device, how the defect was at least one of detected and resolved using the taxonomy. | 03-12-2015 |
20150073774 | Automatic Domain Sentiment Expansion - Methods and systems for automatically extending a sentiment dictionary are provided. Starting with an initial set of elements (e.g., words, emoticons, etc.) having a known sentiment, messages can be analyzed for words frequently appearing in association with such words. As a result the frequently appearing words may then be associated with a sentiment and used to help determine the sentiment of a message. | 03-12-2015 |
20150073775 | UNSPOKEN SENTIMENT - The sentiment of a message may not be obtainable from the message itself. However, many messages have an associated context that provides information useful in determining the sentiment of a message. Messages may include links to other resources, such as graphics or videos, which in turn include titles, comments, viewer ratings or other attributes that may provide a sentiment of the message. | 03-12-2015 |
20150073776 | CHECKING DOCUMENTS FOR SPELLING AND/OR GRAMMATICAL ERRORS AND/OR PROVIDING RECOMMENDED WORDS OR PHRASES BASED ON PATTERNS OF COLLOQUIALISMS USED AMONG USERS IN A SOCIAL NETWORK - A method, system and computer program product for checking documents using colloquialisms. Colloquialisms used in messages by users in a social network are tracked. The relationships (e.g., co-worker) between the senders and recipients of these messages are identified. A social graph is then generated to depict the relations between the users in the social network based on these identified relationships. Furthermore, usage patterns of colloquialisms (e.g., a particular colloquialism is used only with close friends as opposed to co-workers) are formulated. A rule set is generated using the social graph and formulated usage patterns. By using the rule set to check documents, documents may be more accurately checked for spelling and/or grammatical errors by taking into consideration the appropriate usage of colloquialisms based on the context (e.g., communicating with a friend). Furthermore, alternative words or phrases may be appropriately recommended based on the context using such a rule set. | 03-12-2015 |
20150073777 | SYSTEM AND METHOD FOR DETERMINING SEMANTICS AND THE PROBABLE MEANING OF WORDS - Provided are a system and method to determine semantics and the probable meaning and/or context of words. The method includes for at least one First Entity, gathering Metadata from at least one posting by a First User on a First Social Network to define at least one First Field associated with the First Entity, provided by the at least one First User and occurring in the at least one posting. Each First Field associated with the First Entity has an initial system generated value. The method continues by evaluating Responses to the posting by at least one Third Party, and in response to the Third Party using one or more of the First Fields associated with the First Entity in the Response, incrementing the value of each used First Field associated with the First Entity by the addition of a system generated value. The method provides an indication of relevance for each First Field in relation to at least one Second Field associated with each First Entity, the indication of relevance permitting a determination of semantics for each associated Field of the First Entity. An associated system is also provided. | 03-12-2015 |
20150073778 | TECHNIQUES FOR AUTOMATICALLY GENERATING TEST DATA - Techniques for automatically generating test data solve various problems in test data generation. A technique of automatically generating test data includes receiving a signature to be embedded in at least one character string to be generated and determining a total sum of attribute values intrinsic to characters in the character string. The sum is associated with each element of the signature. At least one of the characters in the character string may be selected from a character table describing characters prepared to create the test data so as to achieve the determined total sum for each element of the signature. The generated test data contains the character string including the selected character. | 03-12-2015 |
20150073779 | METHOD OF CONVERTING USER HANDWRITING INTO TEXT INFORMATION AND ELECTRONIC DEVICE FOR PERFORMING THE SAME - A method, computer-readable storage medium, and device for converting user handwriting into text information by reflecting a paragraph form of the user handwriting in an electronic device is provided. The method includes receiving a handwriting input from a user, recognizing the received handwriting input and converting the recognized handwriting input into text information, recognizing a paragraph form of the received handwriting input, and applying the recognized paragraph form to the converted text information. | 03-12-2015 |
20150081275 | COMPRESSING DATA FOR NATURAL LANGUAGE PROCESSING - Data pertaining to a subject matter domain, a set of text strings forming a set of seeds, a description of a linguistic structure present in a language of the domain-related data, and a statistical model applicable to the domain-related data are received. A set of portions of the domain-related data is extracted, a portion in the set of portions forming a nugget. A nugget matches the statistical model according to a criterion, and conforms to the linguistic structure within a threshold degree. The nugget is scored according to a subset of a set of features found in the nuggets. A subset of nuggets is selected. A score of each nugget included in the subset of nuggets exceeds a score threshold. The subset of nuggets is combined to form a pseudo-document. The pseudo-document is submitted to an application for answering a question related to the domain. | 03-19-2015 |
20150081276 | USING NATURAL LANGUAGE PROCESSING (NLP) TO CREATE SUBJECT MATTER SYNONYMS FROM DEFINITIONS - Methods, apparatus and systems, including computer program products, for creating subject matter synonyms from definitions extracted from a subject matter glossary. Confidence scores, each representing a likelihood that two terms defined in the subject matter glossary are synonyms, are determined by applying natural language processing (e.g., passage term matching, lexical matching, and syntactic matching) to the extracted definitions. A subject matter thesaurus is built based on the confidence scores. In one embodiment, a statement containing a first term is created based on an extracted definition of the first term, a modified statement is created by substituting a second term in the statement in lieu of the first term, a corpus is searched, and a confidence score is determined based on evidence in the corpus that the modified statement is accurate. The first and second terms are marked as synonyms if the confidence score is greater than a threshold. | 03-19-2015 |
20150081277 | System and Method for Automatically Classifying Text using Discourse Analysis - The present invention is a textual discourse analysis with the purpose of analyzing and visualizing complex text. The invention operates and functions based on conceptual relations, both logical and axiological, among grammatical components of a sentence and across sentences of a given text. Thus, three basic grammatical units, namely Agent/s, Topic/s and Object/s, have been utilized, in order to build a tripartite structure. Discursive analysis of text based on this invention provides a novel approach for automatically classifying positions of Agent/s within particular textual databases vis-a-vis Topic/s and Object/s, and vice versa. Therefore, as illustrated above, a computer program method of the present invention starts by creating a conceptual map of a given text, classifying semantic macro-areas, positions of Agents, Topics and objects and then correlates such positions with other components in the database. In the next step of the invention, the computer assigns a reference system, provided for analyzing denotative content of discourse. The system is based upon a database of terms of words and phrases and their associated denotative as well as connotative meanings followed by generation of a database, axiologically categorizing subject-matters. | 03-19-2015 |
20150081278 | ELECTRONIC DEVICE, CHARACTER CONVERSION METHOD, AND STORAGE MEDIUM - An electronic device includes a processor configured to execute: accepting a character input; causing a display module to display a character or a character string, which has been input by the character input; causing the display module to display a plurality of words which are conversion candidates corresponding to the input character or character string; causing the display module to display, responding to designation by a user of a word of the plurality of words which are conversion candidates, at least one synonym corresponding to the designated word; and causing the display module to display, responding to selection by the user of a synonym of the displayed at least one synonym, the selected synonym in place of the input character or character string. | 03-19-2015 |
20150081279 | HYBRID NATURAL LANGUAGE PROCESSOR - Methods and a natural language processor for processing a natural language query are provided. The processor includes a classifier, a rule-based pre-processor, a rule-based post-processor, a named entity recognizer, and an output module. The method involves receiving a text representation of the natural language query, pre-processing the text representation, applying a classification statistical model to the text representation when pre-processing fails, applying a post-processing rule, and performing named entity recognition. | 03-19-2015 |
20150081280 | Processing Text with Domain-Specific Spreading Activation Methods - A method for performing natural language processing of free text using domain-specific spreading activation. Embodiments of the present invention ontologize free text using an algorithm based on neurocognitive theory by simulating human recognition, semantic, and episodic memory approaches. Embodiments of the invention may be used to process clinical text for assignment of billing codes, analyze suicide notes or legal discovery materials, and for processing other collections of text. Further, embodiments of the invention may be used to more effectively search large databases, such as a database containing a large number of medical publications. | 03-19-2015 |
20150088489 | SYSTEMS AND METHODS FOR PROVIDING MAN-MACHINE COMMUNICATIONS WITH ETIQUETTE - Systems and methods are disclosed for performing man-machine interaction with a user by capturing audible signals and video signals from an environment; detecting a communication context from the audio and video signals; looking up the context in an etiquette database; communicating without disrupting the user and if not possible, determining an appropriate time and fashion to interrupt the user; and communicating with the user at the appropriate time in the appropriate fashion. | 03-26-2015 |
20150088490 | SYSTEM AND METHOD FOR CONTEXT BASED KNOWLEDGE RETRIEVAL - A system and method are presented for context based knowledge retrieval. In one embodiment, such retrieval pertains to pattern recognition for data related to interactions between users, configuration and organization of systems data in an enterprise, and the quality of calculations of communication interactions. In one embodiment, a communication may be analyzed and a user interface created of potential sources of information related to the communication. This information may be from an internal source, such as a knowledge base, or an external source, such as an internet connected information source, for example. The user interface may comprise entities with associated links and actions, which may be configured based on a user's preference. | 03-26-2015 |
20150088491 | KEYWORD EXTRACTION APPARATUS AND METHOD - According to one embodiment, a keyword extraction apparatus includes a separation unit, a generation unit, a calculation unit, a first update unit, and a second update unit. The separation unit separates a first annotation from each of a plurality of documents. The generation unit generates one or more document clusters by calculating a score of keywords and performing clustering on documents having a correlation value higher than a threshold. The calculation unit calculates a characteristic quantity in accordance with a type of a second annotation. The first update unit updates the score of the keyword to which the second annotation is added, based on the characteristic quantity. The second update unit updates the one or more document clusters in accordance with the updated score to obtain an updated document cluster. | 03-26-2015 |
20150088492 | AUTOMATICALLY CREATING A HIERARCHICAL STORYLINE FROM MOBILE DEVICE DATA - Embodiments create and label contextual slices from observation data and aggregate slices into a hierarchical storyline for a user. A context is a (possibly partial) specification of what a user was doing in the dimensions of time, place, and activity. A storyline is composed of a time-ordered sequence of contexts that partition a given span of time that are arranged in groups at one or more hierarchical levels. A storyline is created through a process of data collection, slicing, labeling, and aggregating. Raw context data can be collected from a variety of observation sources with various error characteristics. Slicing refines the raw context data into a consistent storyline composed of a sequence of contexts representing homogeneous time intervals. Labeling adds more specific and semantically meaningful data (e.g., geography, venue, activity) to the slices. Aggregation identifies groups of slices that correspond to a single semantic concept. | 03-26-2015 |
20150095013 | Extending Concept Labels of an Ontology - Concept labels of an ontology are extended. An ontology includes concepts at least partially authored in a source language; a corpus at least partially authored in a target language is provided; said corpus is processed by a linguistical analysis and receiving a list of first terms as a result of said linguistical analysis, said first terms in said list ordered by a linguistical relevancy; within said list, at least one of said first terms is associated with an associated second term, said second term being a translation of said first term into said source language; a retrieval is conducted by using at least one of said second terms for identifying a matching concept within said ontology; and a concept label of said matching concept is extended by a first term associated to said second term of a matching retrieval. | 04-02-2015 |
20150095014 | METHOD AND APPARATUS FOR PROVIDING CONTENT SCREENING AND RATING - An approach is provided for content screening and rating. A content rating platform processes content directed to or originating from a user to determine one or more elements in the content. The content rating platform calculates a content rating for the content by comparing the one or more elements against at least one user-configurable dictionary, and then selects an operation for processing the content based on the content rating. | 04-02-2015 |
20150095015 | Method and System for Presenting Statistical Data in a Natural Language Format - A computer-implemented method for presenting statistical analysis in a natural language textual output comprising: receiving data to be analyzed by the processor; processing the data according to at least one of a plurality of pre-established statistical analysis types, thereby providing processed data; interpreting the processed data by converting the processed data to a pre-determined natural language text, thereby providing interpreted data; and generating a natural language textual output for the interpreted data according to at least one pre-established rule for converting the interpreted data to a natural language textual output. | 04-02-2015 |
20150095016 | ONTOLOGICALLY DRIVEN PROCEDURE CODING - Computer implemented systems and methods of processing clinical documentation for a multi-axial coding scheme include inputting clinical documentation from memory operatively coupled with a computer system, and executing a natural language processor configured to process narrative text in the clinical documentation. The processor segments the narrative text based on boundaries defined in the clinical documentation, sequences words in the narrative text based on the segmentation, and maps the sequenced words to semantic objects in an ontology database. The ontology defines classes of semantic objects and relationships between them, corresponding to the multi-axial coding scheme. The semantic objects are converted into characters and output into slots in a medical code, with the characters positioned in the slots based on the multi-axial coding scheme. | 04-02-2015 |
20150095017 | SYSTEM AND METHOD FOR LEARNING WORD EMBEDDINGS USING NEURAL LANGUAGE MODELS - A system and method are provided for learning natural language word associations using a neural network architecture. A word dictionary comprises words identified from training data consisting of a plurality of sequences of associated words. A neural language model is trained using data samples selected from the training data defining positive examples of word associations, and a statistically small number of negative samples defining negative examples of word associations that are generated from each selected data sample. A system and method of predicting a word association is also provided, using a word association matrix including data defining representations of words in a word dictionary derived from a trained neural language model, whereby a word association query is resolved without applying a word position-dependent weighting. | 04-02-2015 |
20150095018 | Methods and Systems for Automated Generation of Nativized Multi-Lingual Lexicons - An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. The computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language. | 04-02-2015 |
20150095019 | FRAUD DETECTION USING TEXT ANALYSIS - In one embodiment, a method executed by at least one processor includes receiving text submitted by a user. The method also includes determining a text score for the received text by comparing a first set of phrases included in the received text to a second set of phrases. The second set of phrases includes phrases from stored text. The stored text includes stored text known to be genuine and stored text known to be fraudulent. The method also includes determining that the received text is fraudulent based on the text score. | 04-02-2015 |
20150095020 | Systems and Methods for Identifying and Suggesting Emoticons - Various embodiments provide a method that comprises receiving a set of segments from a text field, analyzing the set of segments to determine at least one of a target subtext or a target meaning associated with the set of segments, and identifying a set of candidate emoticons where each candidate emoticon in the set of candidate emoticons has an association between the candidate emoticon and at least one of the target subtext or the target meaning. The method may further comprise presenting the set of candidate emoticons for entry selection at a current position of an input cursor, receiving an entry selection for a set of selected emoticons from the set of candidate emoticons, and inserting the set of selected emoticons into the text field at the current position of the input cursor. | 04-02-2015 |
20150095021 | MACHINE-BASED CONTENT ANALYSIS AND USER PERCEPTION TRACKING OF MICROCONTENT MESSAGES - A system and a method for microcontent natural language processing are presented. The method comprises the steps of receiving a microcontent message from a social networking server, tokenizing the microcontent message into one or more text tokens, performing a topic extraction on the microcontent message to extract topic metadata, generating sentiment metadata for the microcontent message, analyzing co-occurrence of all available metadata in the plurality of microcontent messages, producing a list that ranks the plurality of microcontent messages based on all available topic metadata, and compiling a trend database that reveals how perception of users of the social networking server on a given topic changes by tracking how the list changes over time. | 04-02-2015 |
20150100302 | SYSTEM AND METHOD FOR ANALYZING AND CLASSIFYING CALLS WITHOUT TRANSCRIPTION VIA KEYWORD SPOTTING - A facility and method for analyzing and classifying calls without transcription via keyword spotting is disclosed. The facility uses a group of calls having known outcomes to generate one or more domain- or entity-specific grammars containing keywords and related information that are indicative of a particular outcome. The facility monitors telephone calls by determining the domain or entity associated with the call, loading the appropriate grammar or grammars associated with the determined domain or entity, and tracking keywords contained in the loaded grammar or grammars that are spoken during the monitored call, along with additional information. The facility performs a statistical analysis on the tracked keywords and additional information to determine a classification for the monitored telephone call. | 04-09-2015 |
20150100303 | ONLINE CLASSROOM ANALYTICS SYSTEM AND METHODS - The methods, apparatus, and systems described herein facilitate decision-making by providing predictions of student outcomes and behaviors. The methods include receiving a communication posted by a student, identifying keywords in text of the communication associated with one or more student metrics, scoring the communication for at least one student metric, and predicting a likelihood of a student outcome based on the score. | 04-09-2015 |
20150100304 | INCREMENTAL COMPUTATION OF REPEATS - A method of updating a suffix tree includes providing an initial suffix tree based on a first sequence of symbols drawn from an alphabet. The suffix tree includes existing nodes representing respective subsequences occurring in the first sequence of symbols. The existing nodes are associated with information relating to membership of the subsequences in at least one class of repeat subsequences. A second sequence of symbols is received and the initial suffix tree is updated to form an updated suffix tree by adding new nodes representing subsequences occurring in the second sequence of symbols that are not represented by the existing nodes. The subsequences represented by the new nodes are ordered in a new node data structure which is processed to update the information relating to the at least one class of repeat subsequences associated with at least some of the nodes in the updated suffix tree. | 04-09-2015 |
20150100305 | COMPUTER PROCESSES FOR ANALYZING AND SUGGESTING IMPROVEMENTS FOR TEXT READABILITY - Computer-based processes are disclosed for analyzing and improving document readability. Document readability is improved by using rules and associated logic to automatically detect various types of writing problems and to make and/or suggest edits for eliminating such problems. Many of the rules seek to generate more concise formulations of the analyzed sentences, such as by eliminating unnecessary words, rearranging words and phrases, and making various other types of edits. Proposed edits can be conveyed, e.g., through a word processing platform, by changing the visual appearance of text to indicate how the text would appear with (or with and without) the edit. | 04-09-2015 |
20150100306 | DETECTING DANGEROUS EXPRESSIONS BASED ON A THEME - Embodiments relate to detecting a dangerous expression based on a particular theme. An aspect includes acquiring, by an electronic apparatus, from text data for learning, a subset of the text data associated with the particular theme and with particular time period information. Another aspect includes extracting text data containing negative information from the acquired subset of the text data. Another aspect includes extracting a word or phrase having a high correlation with the extracted text data or a word or phrase having a high appearance frequency in the extracted text data from the extracted text data. Yet another aspect includes determining that the extracted word or phrase is the dangerous expression based on the particular theme. | 04-09-2015 |
20150100307 | TEXT SEGMENTATION WITH MULTIPLE GRANULARITY LEVELS - Text processing includes: segmenting received text based on a lexicon of smallest semantic units to obtain medium-grained segmentation results; merging the medium-grained segmentation results to obtain coarse-grained segmentation results, the coarse-grained segmentation results having coarser granularity than the medium-grained segmentation results; looking up in the lexicon of smallest semantic units respective search elements that correspond to segments in the medium-grained segmentation results; and forming fine-grained segmentation results based on the respective search elements, the fine-grained segmentation results having finer granularity than the medium-grained segmentation results. | 04-09-2015 |
20150106078 | CONTEXTUAL ANALYSIS ENGINE - A contextual analysis engine systematically extracts, analyzes and organizes digital content stored in an electronic file such as a webpage. Content can be extracted using a text extraction module which is capable of separating the content which is to be analyzed from less meaningful content such as format specifications and programming scripts. The resulting unstructured corpus of plain text can then be passed to a text analytics module capable of generating a structured categorization of topics included within the content. This structured categorization can be organized based on a content topic ontology which may have been previously defined or which may be developed in real-time. The systems disclosed herein optionally include an input/output interface capable of managing workflows of the text extraction module and the text analytics module, administering a cache of previously generated results, and interfacing with other applications that leverage the disclosed contextual analysis services. | 04-16-2015 |
20150106079 | ONTOLOGY-DRIVEN ANNOTATION CONFIDENCE LEVELS FOR NATURAL LANGUAGE PROCESSING - An approach for determining a combination of terms that represents subject matter of a natural language sentence is provided. Numbers of words from a beginning of the sentence to terms in the sentence that match terms in the combination of terms are determined. The sentence is divided into natural language phrases including a complex phrase and first and second simple phrases extracted from the complex phrase. Based in part on (a) the numbers of words from the beginning of the sentence to the terms in the sentence that match terms in the combination of terms, (b) whether all terms of the combination are contained in the first and/or second simple phrases, and (c) whether all terms of the combination are contained in the complex phrase but not contained in the first and/or second simple phrases, how well the combination of terms represents the subject matter is determined. | 04-16-2015 |
20150106080 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM - An information processing apparatus includes a receiving unit, a determining unit, a first assigning unit, an extracting unit, a second assigning unit, a modeling unit, and an output unit. The receiving unit receives a target character string. The determining unit determines whether a sentiment character string is included in the received character string. A first assigning unit assigns a label corresponding to a sentiment character string to the character string when the sentiment character string is included, and assigns plural labels to the character string when no sentiment character string is included. The extracting unit extracts a word from the character string. The second assigning unit assigns to the extracted word a label which has been assigned to the character string that includes the word. The modeling unit performs supervised topic modeling for the character string. The output unit outputs a result of a process by the modeling unit. | 04-16-2015 |
20150106081 | COMPUTER-BASED ANALYSIS OF VIRTUAL DISCUSSIONS FOR PRODUCTS AND SERVICES - A method for analyzing a virtual discussion may include identifying, with a processing device, a first concept relevant to a first subdiscussion associated with an online discussion, identifying a second concept relevant to the first subdiscussion, and determining a relation between the first concept and the second concept. | 04-16-2015 |
20150112664 | SYSTEM AND METHOD FOR GENERATING A TRACTABLE SEMANTIC NETWORK FOR A CONCEPT - Computer implemented natural language processing systems and methods for generating a semantic network for a specific concept of interest. The method includes identifying co-reference relationships between sentences or clusters of a corpus of documents so as to determine one or more clusters of co-referential sentences. One or more concepts or events are determined from the clauses or sentences of the clusters and relationship identification rules are processed to determine relationships between concepts or events identified in the clusters. Subsequently, the semantic network of the determined relationships is generated. | 04-23-2015 |
20150112665 | SHORT MESSAGE PROCESSING METHOD AND APPARATUS - A short message processing method and apparatus, which analyzes a short message received from a mobile communication network and provides via a packet data service node (PDSN) a supplementary service such as a credit card settlement details notifying service, a contact point registration service, a spam filtering service, a schedule registration service, a message history management service, and so forth, based on the result of the analysis. The short message processing method and apparatus can execute a supplementary service corresponding to the short message received through a PDSN, in cooperation with a platform such as WIPI or BREW. | 04-23-2015 |
20150120281 | AUTOMATIC SENTENCE PUNCTUATION - An aspect provides a method, including: receiving, at an information handling device input component, user input comprising a sentence; identifying, using a processor, the sentence; determining, using the processor, correct punctuation for the sentence identified; determining, using the processor, a confidence level for the correct punctuation determined; and responsive to the confidence level exceeding a predetermined threshold, automatically modifying, using the processor, the sentence based on the correct punctuation determined. Other embodiments are described and claimed. | 04-30-2015 |
20150120282 | PRESERVING EMOTION OF USER INPUT - An aspect provides a method, including: receiving, at an input component of an information handling device, user input comprising one or more words; identifying, using a processor of the information handling device, an emotion associated with the one or more words; creating, using the processor, an emotion tag including the emotion associated with the one or more words; and storing the emotion tag in a memory. Other embodiments are described and claimed. | 04-30-2015 |
20150120283 | COMPUTER-IMPLEMENTED SYSTEMS AND METHODS FOR MOOD STATE DETERMINATION - Computer-implemented systems and methods are provided for determining an overall mood score of a document. For example, the document is received from a computer-readable medium. A text segment in a document is identified to be indicative of a mood of the document. The text segment is mapped to a mood scale among a predetermined set of mood scales. A mood weight associated with the mood scale for the text segment is generated. An overall mood score of the document is determined based at least in part on the mood weight. | 04-30-2015 |
20150127323 | REFINING INFERENCE RULES WITH TEMPORAL EVENT CLUSTERING - A method for computing similarity between paths includes extracting corpus statistics for triples from a corpus of text documents, each triple comprising a predicate and respective first and second arguments of the predicate. Documents in the corpus are clustered to form a set of clusters based on textual similarity and temporal similarity. An event-based path similarity is computed between first and second paths, the first path comprising a first predicate and first and second argument slots, the second path comprising a second predicate and first and second argument slots, the event-based path similarity being computed as a function of a corpus statistics-based similarity score which is a function of the corpus statistics for the extracted triples which are instances of the first and second paths, and a cluster-based similarity score which is a function of occurrences of the first and second predicates in the clusters. | 05-07-2015 |
20150127324 | NATURAL LANGUAGE PARSERS TO NORMALIZE ADDRESSES FOR GEOCODING - The present invention provides a technique for building natural language parsers by implementing a country and/or jurisdiction specific set of training data that is automatically converted during a build phase to a respective predictive model, i.e., an automated country specific natural language parser. The predictive model can be used without the training data to quantify any input address. This model may be included as part of a larger Geographic Information System (GIS) data-set or as a stand alone quantifier. The build phase may also be run on demand and the resultant predictive model kept in temporary storage for immediate use. | 05-07-2015 |
20150127325 | METHODS AND SYSTEMS FOR NATURAL LANGUAGE COMPOSITION CORRECTION - The present disclosure relates to methods and systems for improving the probability of detection of grammatical errors. In one aspect, a method for improving probability of detection of grammatical errors is based on one or more linguistic algorithms that rely on demographic information of the writer. Examples of types of demographic information that may be used to improve the probability of detection of grammatical errors include a native language of the writer, a country of origin of the writer, and the writer's age and gender, amongst others. In another aspect, methods and systems for evaluating a user's level of competence in a natural language are provided. According to yet another aspect, methods and systems for detecting grammatical errors using a set of error detection rules are provided. | 05-07-2015 |
20150134324 | Systems, Computer-Program Products and Methods for Annotating Multiple Controlled Vocabulary-Defined Concepts In Single Noun Phrases - Systems, computer-program products and methods for annotating electronic text documents with multiple entities defined in a controlled vocabulary extracted from a compound noun phrase are disclosed. In one embodiment, a method of annotating an electronic text document includes searching, by a computing device, the electronic text document for instances of congruent compound noun phrases including a head and a modifier. If a congruent compound noun phrase is found, the method further includes determining a preceding word that precedes the modifier of the congruent compound noun phrase, and searching a controlled vocabulary for a second full term having the preceding word and the head of the congruent compound noun phrase. If the second full term is found in the controlled vocabulary, the method further includes annotating the electronic text document with the second full term having the preceding word and the head of the congruent compound noun phrase. | 05-14-2015 |
20150134325 | Deep Language Attribute Analysis - Contact centers may benefit from routing messages to agents who have similar, or complementary, attributes as the customer of the message. In a text message, certain message attributes provide artifacts that may be common to one particular customer attribute. Messages containing that particular message attribute provide a derived customer attribute, and the message is routed accordingly. In addition, agents responding to a customer may be provided with guidance to ensure their response is appropriate for the derived customer attribute of the customer. | 05-14-2015 |
20150134326 | MECHANISM FOR SYNCHRONISING DEVICES, SYSTEM AND METHOD - There is provided a mechanism for synchronising a plurality of dynamic language models residing in a plurality of devices associated with a single user, each device comprising a dynamic language model. The mechanism is configured to: receive text data representing text that has been input by a user into one or more of the plurality of devices; train at least one language model on the text data; and provide the at least one language model for synchronising the devices. There is also provided a system comprising the mechanism and a plurality of devices, and a method for synchronising a plurality of dynamic language models residing in a plurality of devices associated with a single user. | 05-14-2015 |
20150142418 | Error Correction in Tables Using a Question and Answer System - Mechanisms are provided for performing tabular data correction in a document. The mechanisms receive a natural language document comprising a portion of content and analyze the portion of content within the natural language document to identify an erroneous sub-portion comprising an erroneous or missing item of information. The mechanisms generate a semantic signature for the erroneous sub-portion and generate a query based on the semantic signature. The mechanisms apply the query to a knowledge base to identify a candidate sub-portion of content. The mechanisms correct the erroneous sub-portion using the identified candidate sub-portion of content to generate a corrected natural language document. | 05-21-2015 |
20150142419 | CONTEXTUAL VALIDATION OF SYNONYMS IN ONTOLOGY DRIVEN NATURAL LANGUAGE PROCESSING - Embodiments described herein provide approaches for validating synonyms in ontology driven natural language processing. Specifically, an approach is provided for receiving a user input containing a token, structuring the user input into a semantic model comprising a set of classes each containing a set of related permutations of the token, designating the token as a synonym of one of the set of related permutations, annotating the token with a class from the set of classes corresponding to the one of the set of related permutations, and validating the annotation of the token by determining an accuracy of the designation of the token as a synonym of the one of the set of related permutations. In one embodiment, the accuracy is determined by quantifying a linear distance between the token and a contextual token also within the user input, and comparing the linear distance to a pre-specified linear distance limit. | 05-21-2015 |
20150142420 | DIALOGUE EVALUATION VIA MULTIPLE HYPOTHESIS RANKING - In language evaluation systems, user expressions are often evaluated by speech recognizers and language parsers, and among several possible translations, a highest-probability translation is selected and added to a dialogue sequence. However, such systems may exhibit inadequacies by discarding alternative translations that may initially exhibit a lower probability, but that may have a higher probability when evaluated in the full context of the dialogue, including subsequent expressions. Presented herein are techniques for communicating with a user by formulating a dialogue hypothesis set identifying hypothesis probabilities for a set of dialogue hypotheses, using generative and/or discriminative models, and repeatedly re-ranking the dialogue hypotheses based on subsequent expressions. Additionally, knowledge sources may inform a model-based with a pre-knowledge fetch that facilitates pruning of the hypothesis search space at an early stage, thereby enhancing the accuracy of language parsing while also reducing the latency of the expression evaluation and economizing computing resources. | 05-21-2015 |
20150142421 | PROVIDING ASSISTANCE WITH REPORTING - A system for maintaining corresponding information in a structured document and in a report is disclosed. The structured document comprises structured data elements and the report comprises text in a natural language. An associating unit ( | 05-21-2015 |
20150142422 | AUTOMATIC CONTEXT SENSITIVE LANGUAGE CORRECTION AND ENHANCEMENT USING AN INTERNET CORPUS - A computer-assisted language correction system including spelling correction functionality, misused word correction functionality, grammar correction functionality and vocabulary enhancement functionality utilizing contextual feature-sequence functionality employing an internet corpus. | 05-21-2015 |
20150142423 | PHRASE-BASED DATA CLASSIFICATION SYSTEM - A method of classifying data is disclosed. Text data items are received. A set of classes into which the text data items are to be classified is received. A phrase-based classifier to classify the text data items into the set of classes is selected. The phrase-based classifier is applied to classify the text data items into the classes. Here, the applying includes creating a controlled vocabulary pertaining to classifying the text data items into the set of classes, building phrases based on the text data items and the controlled vocabulary, and classifying the text data items into the set of classes based on the phrases. | 05-21-2015 |
20150149151 | PROCEDURE FOR BUILDING A MAX-ARPA TABLE IN ORDER TO COMPUTE OPTIMISTIC BACK-OFFS IN A LANGUAGE MODEL - Each entry of an ARPA table for a modeled language includes an n-gram Az, an associated backoff value Az.p equal to the conditional probability p(z|A) that symbol z follows context A in the modeled language, and an associated backoff weight value Az.b for the context A. A method comprises: (1) computing and adding for each entry of the ARPA table in descending n-gram order an associated maximum backoff weight product value Az.m; (2) after performing operation (1), computing and adding for each entry of the ARPA table in descending n-gram order an associated max-backoff value Az.w=max | 05-28-2015 |
20150149152 | METHOD FOR COMBINING A QUERY AND A COMMUNICATION COMMAND IN A NATURAL LANGUAGE COMPUTER SYSTEM - A method for processing a natural language input to a computerized system. The method parses the input to identify a query portion and a communication portion of the input. The system then determines an answer to the query portion, including identifying communication parameters from the communication portion. Upon determining the answer, the system prepares an answer to the communication and transmits that answer. If the answer requires information from a remote source, the system creates a subsidiary query to obtain that information and then submits the subsidiary query to the remote source. A response to the query is used to compose the answer to the query from the answer to the subsidiary query. If the system concludes that the query portion does not require information from a remote source, the system analyzes and answers the query locally. | 05-28-2015 |
20150149153 | SYSTEMS AND METHODS FOR IDENTIFYING AND RECORDING THE SENTIMENT OF A MESSAGE, POSTING, OR OTHER ONLINE COMMUNICATION USING AN EXPLICIT SENTIMENT IDENTIFIER - In some embodiments, a user expresses his/her sentiment in a message, blog post, social media post, or other online communication and explicitly identifies that sentiment with a symbol (such as an asterisk). This explicitly identified sentiment is recorded in a database of individual and public opinions. | 05-28-2015 |
20150149154 | SEMANTIC PHRASE SUGGESTION ENGINE - A semantic phrase suggestion engine that provides term and sentence suggestions based on context-specific user groups. Knowledge domains within a semantic network may be automatically derived from user software applications, and each term within the knowledge domain includes meta-data about the terms, e.g., term type and an importance indicator. The indicators may be defined within the context of specific user groups and relate to how many times that group has used the term (e.g., in documents, emails, etc.) The semantic phrase suggestion engine may also include spelling conditions and grammar conditions, which can then provide phrase suggestions according to the conditions and importance indicators, specific to a user group. | 05-28-2015 |
20150149155 | Methods and Systems for Applications for Z-numbers - Specification covers new algorithms, methods, and systems for artificial intelligence, soft computing, and deep learning/recognition, e.g., image recognition (e.g., for action, gesture, emotion, expression, biometrics, fingerprint, facial, OCR (text), background, relationship, position, pattern, and object), large number of images (“Big Data”) analytics, machine learning, training schemes, crowd-sourcing (using experts or humans), feature space, clustering, classification, similarity measures, optimization, search engine, ranking, question-answering system, soft (fuzzy or unsharp) boundaries/impreciseness/ambiguities/fuzziness in language, Natural Language Processing (NLP), Computing-with-Words (CWW), parsing, machine translation, sound and speech recognition, video search and analysis (e.g. tracking), image annotation, geometrical abstraction, image correction, semantic web, context analysis, data reliability (e.g., using Z-number (e.g., “About 45 minutes; Very sure”)), rules engine, control system, autonomous vehicle, self-diagnosis and self-repair robots, system diagnosis, medical diagnosis, biomedicine, data mining, event prediction, financial forecasting, economics, risk assessment, e-mail management, database management, indexing and join operation, memory management, and data compression. | 05-28-2015 |
20150293899 | MOBILE BASED LEXICON AND FORECASTING - An approach is provided for ranking candidate answers to a natural language question. A natural language question is received from a user of a mobile device. Candidate answers to the received natural language question are generated. A lexicon is generated based on contextual information of the user determined by the mobile device. Contextual information of the user is forecasted based on usage of the mobile device. Based in part on the lexicon and the forecasted contextual information, the candidate answers to the natural language question are ranked. | 10-15-2015 |
20150293900 | INFORMATION RETRIEVAL SYSTEM BASED ON A UNIFIED LANGUAGE MODEL - Embodiments of the invention provide systems and methods for representing a plurality of languages in a lexicon based on a unified language model. More specifically, embodiments of the present invention utilize a language model that focuses on commonalities within a particular language and between languages. Such commonalities may be based on rhyming of the words, other patterns within the words, and more generally, prosody of the words, phrases, and/or language overall. Prosody is commonly defined as the rhythm, stress, and intonation of the language when spoken. Using a language model defining such characteristics for one or more languages, embodiments of the present invention can define a lexicon for those one or more languages. In this lexicon, words and phrases of the one or more languages can be represented and classified into groups based on the commonalities between them. | 10-15-2015 |
20150293902 | METHOD FOR AUTOMATED TEXT PROCESSING AND COMPUTER DEVICE FOR IMPLEMENTING SAID METHOD - The method includes combining words into syntagmas, putting stresses at the ends of the syntagmas and, subsequently, transcribing the syntagmas for the purpose of obtaining syntagma transcriptions in terms of phonemes and allophones. In addition, a database of reference allophones is formed. Coincidences between the syntagma transcription allophones are compared to reference allophones, and syntagma transcription allophones that do not coincide with reference allophones are excluded. Balanced text syntagmas, i.e., those having a greatest number of coincidences between the syntagma transcription allophones and reference allophones, are formed from syntagma transcription allophones coinciding with reference allophones. The device includes a text input unit, an analysis unit, a database unit, and a result submission unit. A parameter input unit and a balanced syntagma forming unit are added. | 10-15-2015 |
20150293903 | TEXT ANALYSIS - A method of processing text having an associated source type to generate data indicative of a property associated with said text, said text comprising a plurality of tokens. The method comprises generating a plurality of metrics of said text based upon said plurality of tokens, the plurality of metrics comprising token count data for said plurality of tokens, part of speech data for said plurality of tokens, semantic field data for said plurality of tokens and at least one metric indicative of a property of the text; selecting reference data from a plurality of reference data based upon the source type associated with the text; processing each of said plurality of metrics of said text based upon the reference data to generate data indicating a relationship between said plurality of metrics and said reference data; and combining the data indicating a relationship between the respective ones of the plurality of metrics and said reference data to generate the data indicative of a property associated with said text. The method may be applied to author profiling. | 10-15-2015 |
20150293904 | INTELLIGENT CONTEXTUALLY AWARE DIGITAL ASSISTANTS - One embodiment of the present invention provides a system for providing context-based web services for a user. During operation, the system receives a sentence as input from a user. The system performs natural language processing on the sentence to determine one or more parameters. The system retrieves data from a foreground knowledge graph containing contextual data for the user and from a background knowledge graph containing background information corresponding to the parameters. The system determines a set of arguments based on the parameters and/or data from the foreground knowledge graph and/or data from the background knowledge graph. The system then selects an action module based on results of the natural language processing and/or the set of arguments. The system passes the arguments to the action module. The action module then uses the arguments to respond to a question or interact with web services to perform an action for the user. | 10-15-2015 |
20150293905 | Summarization of a Document - A method for summarizing a document is provided. A concept is detected for each sentence in said document. Relevance measures between the sentences are computed according to the detected concepts. And then a concept-aware graph is constructed, wherein a node in said graph represents a sentence in the document and an edge between two nodes represents a relevance measure between these two sentences. | 10-15-2015 |
20150293906 | COMPUTER-BASED ANALYSIS OF VIRTUAL DISCUSSIONS FOR PRODUCTS AND SERVICES - A method for analyzing a virtual discussion. The method may include identifying, with a processing device, a first concept relevant to a first subdiscussion associated with an online discussion. The method may also include identifying a second concept relevant to the first subdiscussion, and determining a relation between the first concept and the second concept. | 10-15-2015 |
20150293907 | CALCULATING CORRELATIONS BETWEEN ANNOTATIONS - An apparatus for calculating a correlation between annotations includes a first obtaining unit configured to provide an annotator with a first data group capable of being evaluated to determine whether or not to attach annotations thereto, and obtaining a plurality of first confidence levels indicating certainty of the annotations in the first data group, the annotator outputting confidence levels indicating certainty of annotations to be attached to data when the data is given; a second obtaining unit configured to provide the annotator with a second data group used to calculate a correlation between the plurality of annotations, and thereby obtaining a plurality of second confidence levels indicating the certainty of the annotations in the second data group; and a computing unit configured to compute an estimated value of the correlation between the plurality of annotations based on the plurality of first and second confidence levels. | 10-15-2015 |
20150301795 | CROWD SOURCED BASED TRAINING FOR NATURAL LANGUAGE INTERFACE SYSTEMS - A crowdsourcing based community platform includes a natural language configuration system that predicts a user's desired function call based on a natural language input (speech or text). The system provides a collaboration platform to quickly configure and optimize natural language systems to leverage the work and data of other developers, thus minimizing the time and data required to improve the quality and accuracy of one single system and providing a network effect to quickly reach a critical mass of data. An application developer can provide training data for training a model specific to the developer's application. The developer can also obtain training data by forking one or more other applications so that the training data provided for the forked applications is used to train the model for the developer's application. | 10-22-2015 |
20150301999 | METHOD OF TAKING A COMPUTER ARCHITECTURE REPRESENTATION AND GENERATING MANUFACTURING COMPUTER SYSTEMS CONTAINED IN A SPECIFICATION - Techniques and a system for creating a vendor independent computer language and compiling the language into an architecture specification language allowing for taking a source data stream (file, WSDL, XML) and passing thru a language parser, populating a storage medium with a plurality of technical inputs and vendor technical specifications for generic technologies and probable technologies required for desired architectures generated by the language parser, and optimizing the inputs and creating relationships between technologies and groups of technologies and storing results in the storage medium. | 10-22-2015 |
20150302000 | A METHOD AND A TECHNICAL EQUIPMENT FOR ANALYSING MESSAGE CONTENT - The present embodiments relate to a method and a technical equipment for implementing the method. The method comprises receiving a message; identifying an action request from the message; determining a keyword from the message; monitoring the requested action, and detecting a data item relating to the action; and associating the keyword with the detected data item. | 10-22-2015 |
20150302001 | Method and device for phonetizing data sets containing text - A method for phonetizing text-containing data records that include graphemes includes: phonetizing the data records by converting the graphemes in the data records into phonemes, and storing the phonemes as phonetized data records; and preprocessing to condition the graphemes for the phonetization by modifying the graphemes on a language-defined and/or user-defined basis. The preprocessing of the graphemes and the conversion of the graphemes into phonemes are performed in parallel on different computation units or different portions of the computation units. | 10-22-2015 |
20150302002 | ARCHITECTURE FOR MULTI-DOMAIN NATURAL LANGUAGE PROCESSING - Features are disclosed for processing a user utterance with respect to multiple subject matters or domains, and for selecting a likely result from a particular domain with which to respond to the utterance or otherwise take action. A user utterance may be transcribed by an automatic speech recognition (“ASR”) module, and the results may be provided to a multi-domain natural language understanding (“NLU”) engine. The multi-domain NLU engine may process the transcription(s) in multiple individual domains rather than in a single domain. In some cases, the transcription(s) may be processed in multiple individual domains in parallel or substantially simultaneously. In addition, hints may be generated based on previous user interactions and other data. The ASR module, multi-domain NLU engine, and other components of a spoken language processing system may use the hints to more efficiently process input or more accurately generate output. | 10-22-2015 |
20150302003 | GENERIC VIRTUAL PERSONAL ASSISTANT PLATFORM - A method for assisting a user with one or more desired tasks is disclosed. For example, an executable, generic language understanding module and an executable, generic task reasoning module are provided for execution in the computer processing system. A set of run-time specifications is provided to the generic language understanding module and the generic task reasoning module, comprising one or more models specific to a domain. A language input is then received from a user, an intention of the user is determined with respect to one or more desired tasks, and the user is assisted with the one or more desired tasks, in accordance with the intention of the user. | 10-22-2015 |
20150309965 | METHODS, SYSTEMS, AND DEVICES FOR OUTCOME PREDICTION OF TEXT SUBMISSION TO NETWORK BASED ON CORPORA ANALYSIS - Computationally implemented methods and systems include acquiring a message that is configured to be submitted to a network for publication, performing text-based analysis on the acquired message to determine an objective message prediction, wherein the text-based analysis is at least partially based on a corpus of one or more related texts, and transmitting the objective message prediction to a destination device, wherein the objective message prediction is configured to be presented on the destination device prior to submission of the acquired message to the network. In addition to the foregoing, other aspects are described in the claims, drawings, and text. | 10-29-2015 |
20150309982 | GRAMMATICAL ERROR CORRECTING SYSTEM AND GRAMMATICAL ERROR CORRECTING METHOD USING THE SAME - Provided are a grammatical error correcting system and a grammatical error correcting method using the same, and in detail, the grammatical error correcting system includes: a learning unit configured to acquire a plurality of context features according to a linguistic characteristic from a plurality of corpuses and generate a primary learning classification model and a secondary learning classification model which are references of diagnosing a grammatical error from the context features; and an executing unit configured to predict the grammatical error with respect to a corpus which is input by a learner by using the primary learning classification model, predict the grammatical error by using a primary prediction result of the grammatical error and the secondary learning classification model, and correct the grammatical error, in which the secondary learning classification model is generated by an iterative learning technique by using the plurality of context features extracted from the plurality of corpuses based on the primary prediction result. | 10-29-2015 |
20150309983 | SYSTEMS AND METHODS FOR ADVANCED GRAMMAR CHECKING - In embodiments of the present invention improved capabilities are described for methods and systems of grammar checking providing a web-based writing checking facility integrated into a computing environment to analyze text for writing errors, wherein a user initiates the analysis of the text through a single-action review button displayed to the user in proximity with a text box containing the text, the depressing of the single-action review button initiating writing checking of the text with the writing checking facility. | 10-29-2015 |
20150309985 | METHODS, SYSTEMS, AND DEVICES FOR MACHINES AND MACHINE STATES THAT FACILITATE MODIFICATION OF DOCUMENTS BASED ON VARIOUS CORPORA - Computationally implemented methods and systems include receiving a document that includes at least one particular lexical unit, acquiring potential readership data that includes data about a potential readership for the received document, and selecting at least one replacement lexical unit that is configured to replace at least a portion of the at least one particular lexical unit, wherein selection of the at least one replacement lexical unit is at least partly based on the acquired potential readership data. In addition to the foregoing, other aspects are described in the claims, drawings, and text. | 10-29-2015 |
20150309987 | Classification of Offensive Words - A computer-implemented method can include identifying a first set of text samples that include a particular potentially offensive term. Labels can be obtained for the first set of text samples that indicate whether the particular potentially offensive term is used in an offensive manner. A classifier can be trained based at least on the first set of text samples and the labels, the classifier being configured to use one or more signals associated with a text sample to generate a label that indicates whether a potentially offensive term in the text sample is used in an offensive manner in the text sample. The method can further include providing, to the classifier, a first text sample that includes the particular potentially offensive term, and in response, obtaining, from the classifier, a label that indicates whether the particular potentially offensive term is used in an offensive manner in the first text sample. | 10-29-2015 |
20150309988 | Evaluating Crowd Sourced Information Using Crowd Sourced Metadata - An approach is provided for utilizing crowd sourced data to score, or weigh, candidate answers in a question/answer (QA) system. In the approach, a question is received from a user and the system identifies question keywords and a context in the question using natural language processing (NLP). The system mines crowd sourced data sets for crowd sourced information, the mining being based on the identified question keywords and context. The crowd sourced data sets have stored therein a collective opinion of a crowd of individuals. The system evaluates the mined crowd sourced information based on crowd sourced metadata. The evaluation results in a most likely answer that is returned to the user, with the most likely answer incorporating a portion of the crowd sourced information. | 10-29-2015 |
20150309989 | METHODS, SYSTEMS, AND DEVICES FOR LEXICAL CLASSIFICATION, GROUPING, AND ANALYSIS OF DOCUMENTS AND/OR DOCUMENT CORPORA - Computationally implemented methods and systems include selecting a target portion of a source document, presenting a representation of the target portion of the source document to a client, accepting input from the client that is configured to separate the target portion of the source document into a set of one or more designated lexical units, receiving association input from the client, said association input configured to associate at least one designated lexical unit with a further portion of the source document that is different than the target portion, and providing an output structure that represents the set of one or more designated lexical units. In addition to the foregoing, other aspects are described in the claims, drawings, and text. | 10-29-2015 |
20150309990 | Producing Insight Information from Tables Using Natural Language Processing - Mechanisms for generating insight statements from table data are provided. A portion of content comprising a table data structure and text associated with the table data structure is received and at least one of key terms or semantic relationships in the table data structure and the associated text are identified. Fields of an insight statement template are populated with information obtained from the key terms and semantic relationships to generate an insight statement data structure. The insight statement data structure is then output. The insight statement data structure is a natural language statement describing an aspect of the table data structure. | 10-29-2015 |
20150309992 | AUTOMATED COMPREHENSION OF NATURAL LANGUAGE VIA CONSTRAINT-BASED PROCESSING - A consistent meaning framework (CMF) graph including a plurality of nodes linked by a plurality of edges is maintained in data storage of a data processing system. Multiple nodes among the plurality of nodes are meaning nodes corresponding to different word meanings for a common word spelling of a natural language. Each of the multiple word meanings has a respective one of a plurality of associated constraints. A natural language communication is processed by reference to the CMF graph. The processing includes selecting, for a word in the natural language communication, a selected word meaning from among the multiple word meanings based on which of the plurality of associated constraints is satisfied by the natural language communication. An indication of the selected word meaning is stored in data storage. | 10-29-2015 |
20150317300 | METHOD FOR FAST INPUTTING A RELATED WORD - The present invention provides a text input method, which is integrated in a text input program or device supporting word input (e.g., software/hardware keyboard, input method, etc.) and assists a user in easily inputting a word or a phrase (e.g., various tense forms of a verb, etc.) relating to a certain word. The user may fast input a specific word relating to the certain word by a specific operation (e.g., clicking a software or hardware key, moving a screen contact point, etc.) or by a combination of a plurality of operations. | 11-05-2015 |
20150317301 | NLP-BASED SYSTEMS AND METHODS FOR PROVIDING QUOTATIONS - Techniques for providing quotations obtained from text documents using natural language processing techniques are described. Some embodiments provide a content recommendation system (“CRS”) configured to provide quotations by extracting quotations from a corpus of text documents, and providing access to the extracted quotations in response to search requests received from users. The CRS may extract quotations by using natural language processing-based techniques to identify one or more entities, such as people, places, objects, concepts, or the like, that are referenced by the extracted quotations. The CRS may then store the extracted quotations along with identified entities, such as quotation speakers and subjects, for later access via search requests. | 11-05-2015 |
20150317302 | TRANSFERRING INFORMATION ACROSS LANGUAGE UNDERSTANDING MODEL DOMAINS - Aspects of the present invention provide a technique to validate the transfer of intents or entities between existing natural language model domains (hereafter “domain” or “NLU”) using click logs, a knowledge graph, or both. At least two different types of transfers are possible. Intents from a first domain may be transferred to a second domain. Alternatively or additionally, entities from the second domain may be transferred to an existing intent in the first domain. Either way, additional intent/entity pairs can be generated and validated. Before the new intent/entity pair is added to a domain, aspects of the present invention validate that the intent or entity is transferable between domains. Validation techniques that are consistent with aspects of the invention can use a knowledge graph, search query click logs, or both to validate a transfer of intents or entities from one domain to another. | 11-05-2015 |
20150324347 | METHOD AND APPARATUS FOR AGGREGATING WITH INFORMATION GENERALIZATION - Methods, apparatuses, and computer program products are described herein that are configured to perform aggregation of phrase specifications. In some example embodiments, a method is provided that comprises identifying two or more generalized phrase specifications. In some example embodiments, the two or more generalized phrase specifications contain at least one aggregatable constituent. The method of this embodiment may also include generating an aggregated phrase specification from the two or more generalized phrase specifications. In some example embodiments, the aggregated phrase specification comprises a combined noun phrase generated from the aggregatable constituents and one or more additional constituents based on a determined level of generalization. | 11-12-2015 |
20150324348 | ASSOCIATING AN IMAGE THAT CORRESPONDS TO A MOOD - For associating an image that corresponds to a mood, code identifies the mood of a digital message. The code further associates the image that corresponds to the mood to the digital message. | 11-12-2015 |
20150324349 | AUTOMATED READING COMPREHENSION - Methods and apparatus are disclosed for determining similarities and/or differences between entities in a segment of text based on various signals, and for determining one or more likelihoods that one or more subjects found in a segment of text are capable of performing one or more associated actions based on various signals. | 11-12-2015 |
20150324350 | Identifying Content Relationship for Content Copied by a Content Identification Mechanism - A mechanism is provided, in a data processing system comprising a processor and a memory configured to implement a natural language processing (NLP) system, for identifying content relationship for content copied by a content identification mechanism. The content identification mechanism identifies content from a website and then identifies relationship content information associated with a current web page where the content is found. The content identification mechanism modifies a file structure associated with the content with the relationship content information. The content identification mechanism identifies one or more classification identifiers in order to classify the content. Finally, the content identification mechanism transmits the content and the file structure to a specific corpus based on the one or more classification identifiers. | 11-12-2015 |
20150324351 | METHOD AND APPARATUS FOR EXPRESSING TIME IN AN OUTPUT TEXT - Methods, apparatuses, and computer program products are described herein that are configured to express a time in an output text. In some example embodiments, a method is provided that comprises identifying a time period to be described linguistically in an output text. The method of this embodiment may also include identifying a communicative context for the output text. The method of this embodiment may also include determining one or more temporal reference frames that are applicable to the time period and a domain defined by the communicative context. The method of this embodiment may also include generating a phrase specification that linguistically describes the time period based on the descriptor that is defined by a temporal reference frame of the one or more temporal reference frames. In some examples, the descriptor specifies a time window that is inclusive of at least a portion of the time period to be described linguistically. | 11-12-2015 |
20150324353 | TRANSLATING APPLICATION RESOURCES - According to one general aspect, a system includes an identification module, a translation module, and a display module. The identification module being configured to identify when an application running within the system attempts to display a word to a user of the application in a first language. The translation module being configured to translate the word from the first language to a second language different than the first language. The display module being configured to display the word in the second language to the user. | 11-12-2015 |
20150331845 | TABLE NARRATION USING NARRATION TEMPLATES - A computer system for narrating a table using at least one narration template, wherein the table is extracted from a data source is provided. The computer system may include parsing the extracted table. The computer system may also include performing structural analysis on the parsed extracted table. The computer system may further include selecting at least one structural template based on the structural analysis of the parsed extracted table. Additionally, the computer system may include selecting the at least one narration template based on the at least one selected structural template. The computer system may also include applying the at least one selected narration template to the extracted table. The computer system may further include narrating the extracted table based on the applying of the at least one selected narration template to the extracted table. | 11-19-2015 |
20150331846 | TABLE NARRATION USING NARRATION TEMPLATES - A method for narrating a table using at least one narration template, wherein the table is extracted from a data source is provided. The method may include parsing the extracted table. The method may also include performing structural analysis on the parsed extracted table. The method may further include selecting at least one structural template based on the structural analysis of the parsed extracted table. Additionally, the method may include selecting the at least one narration template based on the at least one selected structural template. The method may also include applying the at least one selected narration template to the extracted table. The method may further include narrating the extracted table based on the applying of the at least one selected narration template to the extracted table. | 11-19-2015 |
20150331849 | System and Method for Enhancing Personalized Conversation within the Social Network - Disclosed is a system and method for enhancing personalized conversation across a plurality of users within the network. The method enhances the personalized conversation in terms of efficiency, effectiveness, and service satisfaction provided to one or more customers. Further, the method enhances the personalized conversation by generating one or more conversation decision trees using one or more extracted keywords/phrases from the active conversation occurring across the plurality of users. Further, the method optimizes and sends the standardized response to the active conversation by analyzing the number of responses and the level of responses received from the users for the active conversation. | 11-19-2015 |
20150331850 | SYSTEM FOR SEMANTIC INTERPRETATION - A semantic database is generated to provide answers to questions by users. Text processors can receive text from text sources, and can convert the text into intermediate logical statements. The text processors can then convert these statements into unambiguous semantic representations. A semantic database connected to the text processors can store the semantic representations. Query processors connected to the semantic database can receive a question from a computing device operated by a user, and can convert the question into intermediate logical subqueries. The query processors can then use a disambiguation table to generate unambiguous semantic subqueries from these intermediate logical subqueries. Using the semantic database, the query processors can match each semantic subquery to the stored semantic representation, and join results of the matching as appropriate, to determine one or more answers to the question. The query processors can send the one or more answers to the computing device. | 11-19-2015 |
20150331851 | Assisted input of rules into a knowledge base - There is disclosed a method implemented by computer for manipulating a sentence expressed in natural language comprising the determination of one or more destination locations in an initial sentence in response to the selecting of one or more words, said destination locations or “ghost locations” being determined by syntactic analysis of the sentence. Various developments include the display of the destination location or locations in response to the selecting of the word or words, and the display of one or more suggestions of words corresponding to the one or more ghost locations. Semantic validations by logical verification, by similarity and by graph analysis are described. System aspects are described, including the employment of a touch-sensitive tablet. | 11-19-2015 |
20150331852 | FINDING AN APPROPRIATE MEANING OF AN ENTRY IN A TEXT - Disclosed are systems, computer-readable mediums, and methods for providing a meaning of an entry in a text. A lexico-morphological analysis is performed on the text. A syntactical analysis is performed on the text. A semantic analysis is performed on the text. A syntactical structure and a semantic structure for the entry are chosen. One or more syntactic links between each alternative meaning of words in proximity to the entry are determined. A weight is determined. One or more semantic links between each word in proximity to the entry are determined. For each semantic link, a weight associated with each semantic link is determined; and based on the weights associated with each semantic and syntactic link, the meaning of the entry is determined. | 11-19-2015 |
20150331853 | AUTOMATED MULTI-GRAMMAR LANGUAGE PROCESSING SYSTEM TO OPTIMIZE REQUEST HANDLING IN CONTACT CENTERS - An automated multi-grammar language processing system provides optimized request handling in contact centers. It enables a contact center to receive and analyze requests from users in the form of text messages, such as sms, email, instant messages, voice messages converted to text, etc., and to understand in real time if the request is to be managed by an automated system or queued for processing by a human operator. | 11-19-2015 |
20150331854 | DOMAIN SPECIFIC NATURAL LANGUAGE NORMALIZATION - Embodiments of the present invention provide a method, system and computer program product for the domain specific normalization of a corpus of text. In an embodiment of the invention, a method is provided for domain specific normalization of a corpus of text, where the domain may be an industrial, organizational, demographic or geographic domain. The method includes loading a corpus of text in memory of a computer and determining a domain for the corpus of text. The method also includes retrieving a lexicon of replacement words for the determined domain. Finally, the method includes simplifying the corpus of text using the retrieved lexicon. In one aspect of the embodiment, the domain is determined through inference based upon words already present in the corpus of text. In another aspect of the embodiment, the domain is determined based upon meta-data provided with the corpus of text. | 11-19-2015 |
20150332152 | Systems And Methods For Generating Models For Physical Systems Using Sentences In A Formal Grammar - A human expert creates sentences in a formal grammar to describe the state of a physical system through aspects of the behavior of such systems. A software process combines these sentences with historical data about physical systems of the same type and uses machine learning to generate a model that detects this state in such systems. These models are able to detect important states of physical systems, such as states that are predictive of future failures, without needing precise guidance from a human user. | 11-19-2015 |
20150332670 | Language Modeling For Conversational Understanding Domains Using Semantic Web Resources - Systems and methods are provided for training language models using in-domain-like data collected automatically from one or more data sources. The data sources (such as text data or user-interactional data) are mined for specific types of data, including data related to style, content, and probability of relevance, which are then used for language model training. In one embodiment, a language model is trained from features extracted from a knowledge graph modified into a probabilistic graph, where entity popularities are represented and the popularity information is obtained from data sources related to the knowledge. Embodiments of language models trained from this data are particularly suitable for domain-specific conversational understanding tasks where natural language is used, such as user interaction with a game console or a personal assistant application on a personal device. | 11-19-2015 |
20150339287 | MAINTAINING COVERSATIONAL CADENCE IN AN ONLINE SOCIAL RELATIONSHIP - A method for maintaining conversational cadence may include determining, by a processor, a conversational cadence associated with a user in a social network. The conversational cadence may be determined based on a plurality of messages previously transmitted by the user. The method may also include detecting, by the processor, a reduction in the conversational cadence of the user. The method may further include providing, by the processor, a set of fill-in messages that create an appearance to another user in the social network that there is no reduction in the conversational cadence. | 11-26-2015 |
20150339288 | Systems and Methods for Generating Summaries of Documents - Systems and methods for summarizing online articles for consumption on a user device are disclosed herein. The system extracts the main body of an article's text from the HTML code of an online article. The system may then classify the extracted article into one of several different categories and remove duplicate articles. The system breaks down the article into its component sentences, and each sentence is classified into one of three categories: (1) potential candidate sentences that may be included in the generated summary; (2) weakly rejected sentences that will not be included in the summary but may be used to generate the summary; and (3) strongly rejected sentences that are not included in the summary. Finally, the system applies a document summarizer to generate quickly readable article summaries, for viewing on the user device, using relevant sentences from the article while maintaining the coherence of the article. | 11-26-2015 |
20150339289 | MEDIA EVENT STRUCTURE AND CONTEXT IDENTIFICATION USING SHORT MESSAGES - The present disclosure is descriptive of discovering structure, content, and context of a media event, e.g., a live media event, using real-time discussions that unfold through short messaging services. Generally, a sampling of short messages of a plurality of users is obtained. The sampling of short messages corresponds to a media event. A segment in the media event is identified using the sampling of short messages, and at least one term taken from the sampling of short messages is identified. The at least one term is indicative of a context of the identified segment. | 11-26-2015 |
20150339290 | Context Based Synonym Filtering for Natural Language Processing Systems - Mechanisms are provided for performing context based synonym filtering for natural language processing. Content is parsed into one or more conceptual units, wherein each conceptual unit comprises a portion of text of the content that is associated with a single concept. For each conceptual unit, a term in the conceptual unit is identified that has a synonym to be utilized during natural language processing of the content. A first measure of relatedness of the term to at least one other term in the conceptual unit is determined. A second measure of relatedness of the synonym of the term to the at least one other term in the conceptual unit is determined. A determination whether or not to utilize the synonym when performing natural language processing on the conceptual unit is made based on the first and second measures of relatedness and natural language processing on the content is performed accordingly. | 11-26-2015 |
20150339292 | SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy, wherein the visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model. | 11-26-2015 |
20150347375 | AUTOMATED QUALITY ASSURANCE CHECKS FOR IMPROVING THE CONSTRUCTION OF NATURAL LANGUAGE UNDERSTANDING SYSTEMS - Aspects described herein provide quality assurance checks for improving the construction of natural language understanding grammars. An annotation module may obtain a set of annotations for a set of text samples based, at least in part, on an ontology and a grammar. A quality assurance module may automatically perform one or more quality assurance checks on the set of annotations, the ontology, the grammar, or combinations thereof. The quality assurance module may generate a list of flagged annotations during performance of a quality assurance check. The list of flagged annotations may be presented at an annotation review interface displayed at a display device. One of the flagged annotations may be selected and presented at an annotation interface displayed at the display device. Responsive to presentation of the flagged annotation, the ontology, the grammar, the flagged annotation selected, or combinations thereof may be updated based on user input received. | 12-03-2015 |
20150347381 | ENTROPY-GUIDED TEXT PREDICTION USING COMBINED WORD AND CHARACTER N-GRAM LANGUAGE MODELS - Systems and processes are disclosed for predicting words in a text entry environment. Candidate words and probabilities associated therewith can be determined by combining a word n-gram language model and a character m-gram language model. Based on entered text, candidate word probabilities from the word n-gram language model can be integrated with the corresponding candidate character probabilities from the character m-gram language model. A reduction in entropy can be determined from integrated candidate word probabilities before entry of the most recent character to integrated candidate word probabilities after entry of the most recent character. If the reduction in entropy exceeds a predetermined threshold, candidate words with high integrated probabilities can be displayed or otherwise made available to the user for selection. Otherwise, displaying candidate words can be deferred (e.g., pending receipt of an additional character from the user leading to reduced entropy in the candidate set). | 12-03-2015 |
20150347382 | PREDICTIVE TEXT INPUT - Systems and processes for predictive text input are provided. In one example process, a text input can be received. The text input can be associated with an input context. A frequency of occurrence of an m-gram with respect to a subset of a corpus can be determined using a language model. The subset can be associated with a context. A weighting factor can be determined based on a degree of similarity between the input context and the context. A weighted probability of a predicted text given the text input can be determined based on the frequency of occurrence of the m-gram and the weighting factor. The m-gram can include at least one word in the text input and at least one word in the predicted text. | 12-03-2015 |
20150347383 | TEXT PREDICTION USING COMBINED WORD N-GRAM AND UNIGRAM LANGUAGE MODELS - Systems and processes are disclosed for predicting words in a text entry environment. Candidate words and probabilities associated therewith can be determined by combining a word n-gram language model and a unigram language model. Using the word n-gram language model, based on previously entered words, candidate words can be identified and a probability can be calculated for each candidate word. Using the unigram language model, based on a character entered for a new word, candidate words beginning with the character can be identified along with a probability for each candidate word. In some examples, a geometry score can be included in the unigram probability related to typing geometry on a virtual keyboard. The probabilities of the n-gram language model and unigram model can be combined, and the candidate word or words having the highest probability can be displayed for a user. | 12-03-2015 |
20150347384 | Systems and Methods for Identifying and Suggesting Emoticons - Various embodiments provide a method that comprises receiving a set of segments from a text field, analyzing the set of segments to determine at least one of a target subtext or a target meaning associated with the set of segments, and identifying a set of candidate emoticons where each candidate emoticon in the set of candidate emoticons has an association between the candidate emoticon and at least one of the target subtext or the target meaning. The method may further comprise presenting the set of candidate emoticons for entry selection at a current position of an input cursor, receiving an entry selection for a set of selected emoticons from the set of candidate emoticons, and inserting the set of selected emoticons into the text field at the current position of the input cursor. | 12-03-2015 |
20150347385 | Systems and Methods for Determining Lexical Associations Among Words in a Corpus - Systems and methods are provided for identifying one or more target words of a corpus that have a lexical relationship to a plurality of provided cue words. The cue words and statistical lexical information derived from a corpus of documents are analyzed to determine candidate words that have a lexical association with the cue words. The statistical information includes numerical values indicative of probabilities of word pairs appearing together as adjacent words in a well-formed text or appearing together within a paragraph of a well-formed text. For each candidate word, a statistical association score between the candidate word and each of the cue words is determined. An aggregate score for each of the candidate words is determined based on the statistical association scores. One or more of the candidate words are selected to be the one or more target words based on the aggregate scores. | 12-03-2015 |
20150347387 | ADJUSTING RANGES OF DIRECTED GRAPH ONTOLOGIES ACROSS MULTIPLE DIMENSIONS - A method, system, and/or computer program product constructs and utilizes an ontological graph. A seed term and an expansion signal are received from a user. An ontological graph is constructed based on the expansion signal as applied to the seed term. The ontological graph includes nodes representing the seed term plus other terms that are located in accordance with instructions derived from the first expansion signal, such that the seed term and the other terms share a common trait. Terms from the ontological graph are displayed as string literals in a dictionary, wherein the dictionary contains related other terms at a resolution level that is controlled by the first expansion signal from the user and the seed term. | 12-03-2015 |
20150347388 | Digital Content Genre Representation - A digital media store may receive content from a content creator and/or distributor thereof. The distributor may provide the content with a label genre. A first mapping rule may be applied to the label genre to convert it to a canonical genre and the canonical genre may be converted to a regional genre based on the application of a second mapping rule. The regional genre may be presented to a consumer's electronic device along with a translation, if necessary. | 12-03-2015 |
20150347390 | Compliance Standards Metadata Generation - Compliance standard documents can be automatically processed to generate meta data that can simplify the application of these documents to products, for example, to devise a compliance testing strategy for the products. The meta data can include the relevancy of the compliance standard documents to aspects of standard compliance, which can be established based on a characteristic of keywords found in the documents. The meta data can include clauses in the compliance standard documents that relate to aspects of standard compliance, which can be established based on a presence of the keywords in the clauses. | 12-03-2015 |
20150347391 | PERSONA MANAGEMENT SYSTEM FOR COMMUNICATIONS - A system to apply persona styles to written communications. The system includes a communication analyzer and a modification engine. The communication analyzer receives an element of original content of a written communication. The communication analyzer also receives a selection of a persona style. The selected persona style defines a communication style. The modification engine presents a substitute element to a user in response to a determination that the element of the original content of the written communication is incompatible with the selected persona style. The substitute element is compatible with the selected persona style. | 12-03-2015 |
20150347392 | REAL-TIME FILTERING OF MASSIVE TIME SERIES SETS FOR SOCIAL MEDIA TRENDS - A method for determining significant words or phrases within social media data includes receiving a stream of data from at least one social media source. The stream includes one or more words or phrases along with corresponding time stamps indicating when each word/phrase was used. One or more words/phrases to be analyzed are determined from the stream. A time period of interest is identified. The time period is divided into a plurality of non-overlapping time windows. The stream is analyzed within the time period of interest to determine how many instances of each word/phrase have timestamps within each time window. One or more of the words/phrases are identified as significant based on a level of co-occurrence of the words/phrases related to the determination as to how many instances of each word/phrase have timestamps within each window. | 12-03-2015 |
20150347393 | EXEMPLAR-BASED NATURAL LANGUAGE PROCESSING - Systems and processes for exemplar-based natural language processing are provided. In one example process, a first text phrase can be received. It can be determined whether editing the first text phrase to match a second text phrase requires one or more of inserting, deleting, and substituting a word of the first text phrase. In response to determining that editing the first text phrase to match the second text phrase requires one or more of inserting, deleting, and substituting a word of the first text phrase, one or more of an insertion cost, a deletion cost, and a substitution cost can be determined. A semantic edit distance between the first text phrase and the second text phrase in a semantic space can be determined based on one or more of the insertion cost, the deletion cost, and the substitution cost. | 12-03-2015 |
20150347400 | METHOD AND APPARATUS FOR MOTION DESCRIPTION - A method, apparatus, and computer program product for describing motion. The method may include receiving a set of eventualities ( | 12-03-2015 |
20150356057 | NLU TRAINING WITH MERGED ENGINE AND USER ANNOTATIONS - Techniques for training a natural language understanding (NLU) engine may include generating a first annotation of free-form text documenting a healthcare patient encounter and a link between the first annotation and a corresponding portion of the text, using the NLU engine. A second annotation of the text and a link between the second annotation and a corresponding portion of the text may be received from a human user. The first annotation and its corresponding link may be merged with the second annotation and its corresponding link. Training data may be provided to the engine in the form of the text and the merged annotations and links. | 12-10-2015 |
20150356072 | Method and Apparatus of Matching Text Information and Pushing a Business Object - Methods and apparatuses of matching text information and pushing a business object are disclosed. The method of matching text information includes: acquiring a first text information set and a second text information set to be matched, the first text information set including a finite amount of first text information and the second text information set including a finite amount of second text information; and finding one or more pieces of the finite amount of second text information that match with each piece of the finite amount of first text information according to a preset rule. The embodiments of the present disclosure abandon an open-ended expansion approach of directly searching extended words from the first text information and turn to a closed interval to find one or more pieces of the finite amount of second text information which match with each piece of the finite amount of first text information, thus avoiding an unnecessary amount of matching computation, reducing waste of system resources and improving the efficiency of matching computation. | 12-10-2015 |
20150356073 | METHOD AND APPARATUS FOR PROVIDING SEMANTIC DATA ARCHITECTURE - A method of providing a semantic data architecture includes providing a data model layer. The data model layer is formed by a processor, a storage device, a memory, and a communication interface in combination with a data model application program stored in the storage device. The processor, running the data model application program, is configured to use the storage device, memory, and communication interface to selectively receive source data from a source device, process the corresponding source data based on pre-defined data types and filtering terms to form semantic data arranged in a binary tree structure, and store the semantic data in the storage device. The method also includes providing a data filtering layer. The method may also include providing memory model, general purpose parser, backward inference, primitive functions, rewriting engine, and reasoning engine layers in various combinations. An apparatus for providing a semantic data architecture is also provided. | 12-10-2015 |
20150363363 | GENERATING LANGUAGE SECTIONS FROM TABULAR DATA - A computer implemented method of generating a language section from tabular data in an electronic document may include identifying, in a first tabular portion of the electronic document, a set of categories used to organize tabular data. The method may include identifying a content characteristic for each category of the set of categories in the first tabular portion. And the method may include generating a first language section from at least two distinct categories of the set of categories, wherein a format of the first language section is based on the content characteristics for the at least two distinct categories. | 12-17-2015 |
20150363382 | GENERATING LANGUAGE SECTIONS FROM TABULAR DATA - A computer implemented method of generating a language section from tabular data in an electronic document may include identifying, in a first tabular portion of the electronic document, a set of categories used to organize tabular data. The method may include identifying a content characteristic for each category of the set of categories in the first tabular portion. And the method may include generating a first language section from at least two distinct categories of the set of categories, wherein a format of the first language section is based on the content characteristics for the at least two distinct categories. | 12-17-2015 |
20150363384 | SYSTEM AND METHOD OF GROUPING AND EXTRACTING INFORMATION FROM DATA CORPORA - A system for annotating words of a data corpus based upon their particular concept and their corresponding grammatical sense with Conceptual Numerical Identifiers (CNIs) from a Conceptual Dictionary, pairing the words based on conceptual inter-relating network (CIRN) rules, and determining if a selected plurality of paired words are grammatically, syntactically, and linguistically correct by matching CNIs from each pair of words. | 12-17-2015 |
20150363385 | SYSTEM AND METHOD FOR COMPUTERIZED PSYCHOLOGICAL CONTENT ANALYSIS OF COMPUTER AND MEDIA GENERATED COMMUNICATIONS TO PRODUCE COMMUNICATIONS MANAGEMENT SUPPORT, INDICATIONS AND WARNINGS OF DANGEROUS BEHAVIOR, ASSESSMENT OF MEDIA IMAGES, AND PERSONNEL SELECTION SUPPORT - At least one computer-mediated communication produced by or received by an author is collected and parsed to identify categories of information within it. The categories of information are processed with at least one analysis to quantify at least one type of information in each category. A first output communication is generated regarding the at least one computer-mediated communication, describing the psychological state, attitudes or characteristics of the author of the communication. A second output communication is generated when a difference between the quantification of at least one type of information for at least one category and a reference for the at least one category is detected involving a psychological state, attitude or characteristic of the author to which a responsive action should be taken. | 12-17-2015 |
20150363386 | Domain Knowledge Driven Semantic Extraction System - A semantic extraction system leverages domain expert knowledge, to impart meaningful business information aiding ordinary knowledge consumers in understanding large/complex data volumes and models thereof. Certain embodiments may comprise a layered structure comprising an information uplifting layer, a semantic processing layer, and a visual representation layer. Referencing domain knowledge model(s) created by human domain experts, the information uplifting layer extracts and maintains meaningful information in a semantic structure. The semantic processing layer then processes this extracted information for various different business analysis purposes. Finally, the visual representation layer allows the analyzed and aggregated information to be arranged and visualized via a range of interactive tools. The overall layered structure is powered by the domain knowledge models, which capture specialized knowledge from experts in different domains. Such domains can include industry and enterprise characteristics, data visualization, and model structure and function. | 12-17-2015 |
20150363387 | Systems And Methods of Detecting, Measuring, And Extracting Signatures of Signals Embedded in Social Media Data Streams - A system for scoring micro-blogging messages is provided, including an extractor, an evaluator, a calculator, and a publisher. The extractor may be configured to receive micro-blogging messages, to detect messages containing terms of interest, to extract raw data, and to store the data in a database. The evaluator may be configured to access and parse the stored data into tokenized data, and to store the tokenized data in a database. The evaluator may also be configured to identify relevant micro-blogging messages; to tag messages as indicative; and to filter messages from low-volume or malicious sources before they are tagged as indicative. The calculator may be configured to access a sentiment dictionary, to calculate a sentiment score of the tokenized data, and to calculate a sentiment signature for a term of interest. The publisher may be configured to provide access to clients of the system. | 12-17-2015 |
20150363390 | SOLVING AND ANSWERING ARITHMETIC AND ALGEBRAIC PROBLEMS USING NATURAL LANGUAGE PROCESSING - A computer system for solving and answering an arithmetic or algebraic problem using natural language processing (NLP) is provided. The computer system may include receiving an input statement associated with the arithmetic or algebraic problem. The computer system may also include determining whether each sentence within a plurality of sentences associated with the input statement is a well-formed sentence from a mathematical perspective. The computer system may further include converting each statement into a well-formed sentence based on the determining whether each sentence within a plurality of sentences associated with the input statement is a well-formed sentence from a mathematical perspective. Additionally, the computer system may include converting each well-formed sentence into a mathematical equation to form a set of equations. Also, the computer system may include solving the set of equations to compute a mathematical result. The computer system may include narrating the mathematical result in natural language. | 12-17-2015 |
20150363391 | SOLVING AND ANSWERING ARITHMETIC AND ALGEBRAIC PROBLEMS USING NATURAL LANGUAGE PROCESSING - A method for solving and answering an arithmetic or algebraic problem using natural language processing (NLP) is provided. The method may include receiving an input statement associated with the arithmetic or algebraic problem. The method may also include determining whether each sentence within a plurality of sentences associated with the input statement is a well-formed sentence from a mathematical perspective. The method may further include converting each statement into a well-formed sentence based on the determining whether each sentence within a plurality of sentences associated with the input statement is a well-formed sentence from a mathematical perspective. Additionally, the method may include converting each well-formed sentence into a mathematical equation to form a set of equations. Also, the method may include solving the set of equations to compute a mathematical result. The method may include narrating the mathematical result in natural language. | 12-17-2015 |
20150370778 | Syntactic Parser Assisted Semantic Rule Inference - Natural language understanding (NLU) engines perform better when they are trained with large amounts of data. However, a large amount of data is not always available. Embodiments of the present invention overcome this problem by generating annotated data for use in a NLU system. An example embodiment generates annotated data by parsing an input annotated phrase, generating a syntactic tree reflecting a grammatical structure of the parsed phrase, and generating one or more alternative versions of the input annotated phrase based on the syntactic tree. Alignment between expressions and corresponding annotations in the annotated phrase are preserved in the one or more alternative versions generated to ensure intention of the input annotated phrase is maintained. | 12-24-2015 |
20150370780 | PREDICTIVE CONVERSION OF LANGUAGE INPUT - Systems and processes for predictive conversion of language input are provided. In one example process, text composed by a user can be obtained. Input comprising a sequence of symbols of a first symbolic system can be received from the user. Candidate word strings corresponding to the sequence of symbols can be determined. Each candidate word string can comprise two or more words of a second symbolic system. The candidate word strings can be ranked based on a probability of occurrence of each candidate word string in the obtained text. Based on the ranking, a portion of the candidate word strings can be displayed for selection by the user. | 12-24-2015 |
20150370781 | EXTENDED-CONTEXT-DIVERSE REPEATS - A method for identifying repeat subsequences based on a diversity of their extended contexts includes identifying repeat subsequences of symbols in a sequence that are left and/or right maximal and which have at least a threshold value of different left and/or right contexts. The different right contexts are all right-maximal repeats with respect to subsequences of the symbols that immediately follow an occurrence of the respective repeat subsequence and similarly, the different left contexts are all left-maximal repeats with respect to subsequences of the symbols that immediately precede an occurrence of the respective repeat subsequence. This class of repeat subsequences, referred to as extended-context diverse repeats since the contexts are not limited to a single symbol, can be output or used for characterizing the sequence or a collection of sequences, such as a document or collection of documents. | 12-24-2015 |
20150370782 | RELATION EXTRACTION USING MANIFOLD MODELS - According to an aspect, relation extraction using manifold models includes identifying semantic relations to be modeled in a selected domain. Data is collected from at least one unstructured data source based on the identified semantic relations. Labeled and unlabeled data that were both generated from the collected data is received. The labeled data includes indicators of validity of the identified semantic relations in the labeled data. Training data that includes both the labeled and unlabeled data is created. A manifold model is trained based on the training data. The manifold model is applied to new data, and a semantic relation is extracted from the new data based on the applying. | 12-24-2015 |
20150378985 | METHOD AND SYSTEM FOR PROVIDING SEMANTICS BASED TECHNICAL SUPPORT - A method and system for providing semantics based technical support. The embodiments herein relate to providing semantics based technical support, and more particularly to providing semantics based technical support based on available knowledge sources and similarity of technical support issues. Embodiments disclosed herein provide users with requisite information in real time while an issue is being reported. | 12-31-2015 |
20150378986 | CONTEXT-AWARE APPROACH TO DETECTION OF SHORT IRRELEVANT TEXTS - Systems and methods are disclosed for determining whether a short amount of text is irrelevant. Initially, an article is selected having one or more comments of varying length. Depending on the number of comments available, a native context may be constructed based on a given comment and other neighboring comments. In other embodiments, a transferred context may be constructed from the given comment and topically similar comments extracted from other, topically similar articles. A native context-aware feature may be determined from the constructed native context and a transferred context-aware feature may be determined from the constructed transferred context. These features may be leveraged by a language classifier to determine whether a given comment is irrelevant. | 12-31-2015 |
20150378987 | INSIGHT ENGINE - Embodiments of the invention provide systems and methods for generating natural language insights about a set of data. More specifically, embodiments of the present invention are directed to methods and systems that transform data into insights or actionable information. The output generated by embodiments of the present invention would be equivalent to that of an observation made or insights gathered by a qualified data scientist presented with the same data. Embodiments as described herein can include an insight engine that can analyze both structured and unstructured data and generate information in a natural language of the user's choice. Insights provided by embodiments described herein can be supported by an ability to drilldown to graphs/tables and atomic data and provide a good starting point for further analysis. | 12-31-2015 |
20150378988 | AUTOMATIC QUESTION DETECTION IN NATURAL LANGUAGE - Systems and methods may provide for separating a sentence into a plurality of clauses and applying a set of question detection rules to each of the plurality of clauses. Additionally, the sentence may be automatically designated as a question if the question detection rules indicate that at least one of the plurality of clauses is a question. In one example, at least one of the question detection rules defines an order of a plurality of parts of speech. | 12-31-2015 |
20160004690 | SYSTEM AND METHOD FOR LEARNING LATENT REPRESENTATIONS FOR NATURAL LANGUAGE TASKS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for learning latent representations for natural language tasks. A system configured to practice the method analyzes, for a first natural language processing task, a first natural language corpus to generate a latent representation for words in the first corpus. Then the system analyzes, for a second natural language processing task, a second natural language corpus having a target word, and predicts a label for the target word based on the latent representation. In one variation, the target word is one or more words, such as a rare word and/or a word not encountered in the first natural language corpus. The system can optionally assign the label to the target word. The system can operate according to a connectionist model that includes a learnable linear mapping that maps each word in the first corpus to a low dimensional latent space. | 01-07-2016 |
20160012020 | METHOD AND SYSTEM FOR ROBUST TAGGING OF NAMED ENTITIES IN THE PRESENCE OF SOURCE OR TRANSLATION ERRORS | 01-14-2016 |
20160012033 | Method and System for Linear Generalized LL Recognition and Context-Aware Parsing | 01-14-2016 |
20160012034 | ENTAILMENT EVALUATION DEVICE, ENTAILMENT EVALUATION METHOD, AND RECORDING MEDIUM | 01-14-2016 |
20160012037 | LANGUAGE INDEPENDENT PROBABILISTIC CONTENT MATCHING | 01-14-2016 |
20160012038 | SEMANTIC TYPING WITH N-GRAM ANALYSIS | 01-14-2016 |
20160012040 | DATA PROCESSING DEVICE AND SCRIPT MODEL CONSTRUCTION METHOD | 01-14-2016 |
20160012041 | Identifying Unchecked Criteria in Unstructured and Semi-Structured Data | 01-14-2016 |
20160019200 | SYSTEMS FOR DYNAMICALLY GENERATING AND PRESENTING NARRATIVE CONTENT - In some embodiments, a non-transitory processor-readable medium stores code representing instructions that when executed cause a processor to select a narrative content template based at least in part on a predetermined content type associated with a real-world and/or virtual event. The code further represents instructions that when executed cause the processor to select a narrative tone type. The code further represents instructions that when executed cause the processor to, for each phrase included in an ordered set of phrases associated with the narrative content template, select, based at least in part on the narrative tone type, a phrase variation from a set of phrase variations associated with that phrase, and define, based on the selected phrase variation and at least one datum from a set of data, a narrative content portion associated with the real-world event. The code further represents instructions that when executed cause the processor to output, at a display, the narrative content portion. | 01-21-2016 |
20160019201 | TONE MARK BASED TEXT SUGGESTIONS FOR CHINESE OR JAPANESE CHARACTERS OR WORDS - For suggesting input text based on tone mark information for Chinese or Japanese characters or words, an apparatus, system, method, and computer program product are disclosed. The apparatus may include a processor, a handwriting input unit operatively coupled to the processor, an input text module that receives input text comprising at least one character, a tone mark module that identifies a tone mark associated with the input text, and a suggestion module that proposes at least one next character based on the identified tone mark. The input text module may receive a user selection of the at least one next character. The input text may include characters selected from the group consisting of: Chinese characters and Japanese characters. | 01-21-2016 |
20160019202 | SYSTEM, METHOD, AND APPARATUS FOR REVIEW AND ANNOTATION OF AUDIOVISUAL MEDIA CONTENT - An apparatus for reviewing and annotating audiovisual media content includes a transcript parser, syncing module, video module, transcript viewer module, annotation module, and data module. The transcript parser parses a transcript of a video. The syncing module synchronizes the video with the transcript. The video module streams the video to a user. The transcript viewer module displays the transcript to the user in sync with the video. The annotation module assigns a tag to a portion of the video in response to the user marking with the tag a portion of the transcript in sync with the portion of the video. The data module stores information pertaining to the portion of the transcript marked by the user. | 01-21-2016 |
20160019204 | MATCHING LARGE SETS OF WORDS - Word phrases are stored in a phrase structure. Each word is stored as a keyword in a keyword structure. Each keyword is associated with usage attributes identifying use of a word in a word phrase. Any preceding words associated with a keyword, and a mapping from any preceding words to a word phrase, is stored for each word. A word string is input. Match attributes are updated in a match structure if a word in the word string matches any keyword and if any preceding words associated with any matching keyword includes a preceding word which precedes the word in the word string. The match attributes indicate use of the matching word in the word string and in a word phrase. Whether a word phrase is present in the word string is determined based on the usage attributes and the match attributes associated with multiple matching words. | 01-21-2016 |
20160019885 | WORD CLOUD DISPLAY - Machine learning-based methods improve the knowledge extraction process in a specific domain or business environment, and then provide that extracted knowledge in a word cloud user interface display capable of summarizing and conveying a vast amount of information to a user very quickly. Based on the self-training mechanism developed by the inventors, the ontology programming automatically trains itself to understand the domain or environment of the communication data by processing and analyzing a defined corpus of communication data. The developed ontology can be applied to process a dataset of communication information to create a word cloud that can provide a quick view into the content of the dataset, including information about the language used by participants in the communications, such as identifying for a user key phrases and terms, the frequency of those phrases, the originator of the terms or phrases, and the confidence levels of such identifications. | 01-21-2016 |
20160026617 | SYSTEM AND METHOD DETECTING HIDDEN CONNECTIONS AMONG PHRASES - A system and method for identifying hidden connections among non-sentiment phrases are presented. The method includes identifying all connections among a plurality of non-sentiment phrases based on at least one proximity rule; determining direct connections among the identified connections, wherein each direct connection meets a predetermined correlation; filtering out the determined direct connections from the identified connections to yield hidden connections among the identified connections; analyzing the hidden connections to identify a common phrase, wherein the common phrase is associated with at least two hidden connections; generating a new hidden connection among the plurality of non-sentiment phrases based on the common phrase; and associating a sentiment phrase with at least two non-sentiment phrases having a hidden connection, wherein the association is a term taxonomy. | 01-28-2016 |
20160026621 | INFERRING TYPE CLASSIFICATIONS FROM NATURAL LANGUAGE TEXT - A device may obtain text to be processed to infer type classifications associated with terms in the text. The type classifications may indicate types of values that the terms are intended to represent. The device may infer type classifications corresponding to terms in the text by performing a type classification technique. The type classification technique may include a name-based analysis, a context-based analysis, a synonym-based analysis, or a value-based analysis. These analyses may compare information, associated with the terms in the text, to type indicators that indicate the type classifications. The device may provide information that identifies a type relationship between a particular type classification and a particular term based on inferring the one or more type classifications. | 01-28-2016 |
20160026622 | HYBRID MACHINE-USER LEARNING SYSTEM AND PROCESS FOR IDENTIFYING, ACCURATELY SELECTING AND STORING SCIENTIFIC DATA - A process for identifying, accurately selecting, and storing scientific data that is present in textual formats. The process includes providing scientific data located in a text document and searching the text document using a computer and selecting a plurality of key words and phrases using an algorithm. The selected key words and phrases are matched with a plurality of semantic definitions and a plurality of semantic definition-key words and phrase pairs are created. The created plurality of semantic definition-key words and phrase pairs are displayed to a user via a computer user interface and the user selects which of the created plurality of semantic definition-key words and phrase pairs are accurate. The process also includes storing the selected and accurate semantic definition-key words and phrase pairs in computer memory. | 01-28-2016 |
20160026624 | CUSTOMIZABLE AND LOW-LATENCY INTERACTIVE COMPUTER-AIDED TRANSLATION - Methods and systems for computer-aided translation include receiving a document having one or more sentences to be translated; generating a suggestion pool of possible translations for each sentence in the document; providing a best suggestion from the suggestion pool to a user for a sentence being translated; updating the suggestion pool based on the user's input of a translation prefix; and providing an updated best suggestion from the updated suggestion pool to the user for the sentence being translated. | 01-28-2016 |
20160027433 | METHOD OF SELECTING TRAINING TEXT FOR LANGUAGE MODEL, AND METHOD OF TRAINING LANGUAGE MODEL USING THE TRAINING TEXT, AND COMPUTER AND COMPUTER PROGRAM FOR EXECUTING THE METHODS - Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods. The present invention provides for selecting training text for a language model that includes: generating a template for selecting training text from a corpus in a first domain according to generation techniques of: (i) replacing one or more words in a word string selected from the corpus in the first domain with a special symbol representing any word or word string, and adopting the word string after replacement as a template for selecting the training text; and/or (ii) adopting the word string selected from the corpus in the first domain as the template for selecting the training text; and selecting text covered by the template as the training text from a corpus in a second domain different from the first domain. | 01-28-2016 |
20160034445 | METHOD AND SYSTEM FOR IMPLEMENTING SEMANTIC TECHNOLOGY - Disclosed is an approach for allowing an entity to perform semantic analysis upon private data possessed by an enterprise, and to automatically perform categorization of that data for processing within the enterprise. A semantic API can be provided to allow the enterprise to provide the private data to a semantic analysis system, even when the semantic analysis system is configured as a multi-tenant system that handles other items of public or private data. A rules-based routing architecture may be provided to facilitate analysis and routing of analyzed messages to the appropriate destination within the organization. | 02-04-2016 |
20160034446 | ESTIMATION OF TARGET CHARACTER TRAIN - A desired character train included in a predefined reference character train, such as lyrics, is set as a target character train, and a user designates a target phoneme train that is indirectly representative of the target character train by use of a limited plurality of kinds of particular phonemes, such as vowels and particular consonants. A reference phoneme train indirectly representative of the reference character train by use of the particular phonemes is prepared in advance. Based on a comparison between the target phoneme train and the reference phoneme train, a sequence of the particular phonemes in the reference phoneme train that matches the target phoneme train is identified, and a character sequence in the reference character train that corresponds to the identified sequence of the particular phonemes is identified. The thus-identified character sequence estimates the target character train. | 02-04-2016 |
20160041967 | System for Natural Language Understanding - A general-purpose apparatus for analyzing natural language text that allows for the implementation of a broad range of natural language understanding applications. The apparatus for natural language understanding analyzes a source text and transforms the source text into a semantically-interpretable syntactic representation (SISR), comprising a syntax template and semantic clause annotations. The general-purpose apparatus for natural language understanding is adaptable to various source text natural languages and is adaptable to various natural language understanding applications, such as query answering, translation, summarization, information extraction, disambiguation, and parsing. Also described is a natural language query answering apparatus for answering questions about a source text, whereby the query answering apparatus utilizes the general-purpose apparatus for transforming the natural language query into SISR format. | 02-11-2016 |
20160041980 | ANSWERING TIME-SENSITIVE QUESTIONS - A method for providing an answer to an input question containing at least one time-sensitive word or at least one time-sensitive phrase using natural language processing (NLP) is provided. The method may include receiving the input question. The method may also include performing NLP analysis on the input question to extract a required value phrase. The method may further include forming at least one mathematical equation based on the extracted required value phrase. Additionally, the method may include forming at least one interim question based on the extracted required value phrase. The method may further include solving the at least one formed mathematical equation and the at least one formed interim question. The method may also include narrating the answer to the input question in natural language based on the solved at least one interim question or the solved at least one mathematical equation. | 02-11-2016 |
20160042748 | VOICE APPLICATION ARCHITECTURE - A voice-based system may comprise a local speech interface device and a remote control service. A user may interact with the system using speech to obtain services and perform functions. The system may allow a user to install applications to provide enhanced or customized functionality. Such applications may be installed on either the speech interface device or the control service. The control service receives user speech and determines user intent based on the speech. If an application installed on the control service can respond to the intent, that application is called. Otherwise, the intent is provided to the speech interface device which responds by invoking one of its applications to respond to the intent. | 02-11-2016 |
20160047670 | METHOD AND APPARATUS FOR NAVIGATION - A method for generating a next valid character tree may comprise: providing a first name, the first name comprising a first character represented by a first character code; providing a second name, the second name comprising a second character represented by a second character code, wherein the first character and the second character are homoglyphs with respect to each other; and generating a next valid character tree according to the names such that the next valid character tree comprises a combined node, which is a place holder for the first character and for the second character. | 02-18-2016 |
20160048499 | SYSTEMATIC TUNING OF TEXT ANALYTIC ANNOTATORS - A data structure is generated containing enumerators for data types of a domain, text forms of the enumerators and context patterns for the text forms. The data structure also includes information extraction rules that are associated with the enumerators. The data structure is updated with additional context patterns and text forms that are identified within a set of documents to which text analytic annotators are to be tuned. The set of documents are analyzed against the updated data structure and additional extraction rules are generated based on the analysis. | 02-18-2016 |
20160048500 | Concept Identification and Capture - Disclosed methods and systems are directed to concept identification and capture. The methods and systems may include receiving, by a device, a first natural language input comprising one or more terms, and analyzing the first natural language input via a natural language processing engine to identify one or more named entities associated with the one or more terms, wherein each of the one or more named entities is associated with at least one category of a plurality of categories. The methods and systems may also include detecting a text field configured to receive text, the text field being associated with one of the plurality of categories, and inputting into the text field one of the one or more identified named entities based on the text field being associated with a same category as the one of the one or more named entities. | 02-18-2016 |
20160048501 | SYSTEMATIC TUNING OF TEXT ANALYTIC ANNOTATORS - A data structure is generated containing enumerators for data types of a domain, text forms of the enumerators and context patterns for the text forms. The data structure also includes information extraction rules that are associated with the enumerators. The data structure is updated with additional context patterns and text forms that are identified within a set of documents to which text analytic annotators are to be tuned. The set of documents are analyzed against the updated data structure and additional extraction rules are generated based on the analysis. | 02-18-2016 |
20160048504 | CONVERSION OF INTERLINGUA INTO ANY NATURAL LANGUAGE - The embodiments herein achieve a natural language generation system and mechanisms for converting an interlingua into any set of natural languages. The system is capable of converting a large class of generic, semantically-oriented interlingua into any natural language. The system may be incorporated on PCs, mobile devices or may be an application running on a remote system which allows for language-independent messages to be constructed, which can be de-constructed into any language on the receiver's side. Mechanisms of implementation would also be of assistance in allowing people with speech, communication or language disabilities, language difficulties, language-independent or precise human-human or human-machine communication to communicate effectively. | 02-18-2016 |
20160049149 | METHOD AND DEVICE FOR PROACTIVE DIALOGUE GUIDANCE - In a method for the proactive guidance of an information system with a user, information potentially to be transmitted to the user by the information system, which is built into a vehicle or another mobile technical system, passes through a relevance and/or plausibility check based on contextual knowledge about the user and, on detection of information to be transmitted, the information system proactively initiates a dialogue with the user. A corresponding system is also described. | 02-18-2016 |
20160055141 | STRING COMPARISON RESULTS FOR CHARACTER STRINGS USING FREQUENCY DATA - A similarity between character strings is assessed by identifying first and second character strings as candidate similar character strings, determining a frequency of occurrence for at least one of the first and second character strings from a collection of character strings, and designating the first and second character strings as similar based on the determined frequency of occurrence. | 02-25-2016 |
20160055144 | STRING COMPARISON RESULTS FOR CHARACTER STRINGS USING FREQUENCY DATA - A similarity between character strings is assessed by identifying first and second character strings as candidate similar character strings, determining a frequency of occurrence for at least one of the first and second character strings from a collection of character strings, and designating the first and second character strings as similar based on the determined frequency of occurrence. | 02-25-2016 |
20160055145 | ESSAY MANAGER AND AUTOMATED PLAGIARISM DETECTOR - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for essay managing and plagiarism detecting are disclosed. A method includes receiving one or more essay drafts in response to an essay prompt that is provided by an online college application. The method includes determining one or more subject-verb pairs and one or more adjective-noun pairs for the one or more essay drafts by parsing the one or more essay drafts. The method includes storing the one or more essay drafts and the one or more subject-verb pairs and one or more adjective-noun pairs for the one or more essay drafts. The method includes receiving an additional essay draft in response to an additional essay prompt that is provided by an additional online college application. The method includes determining one or more additional subject-verb pairs and one or more additional adjective-noun pairs for the additional essay draft. | 02-25-2016 |
20160055146 | DOCUMENT PROCESSING DEVICE, DOCUMENT PROCESSING METHOD, PROGRAM, AND INFORMATION STORAGE MEDIUM - Displaying supplemental information for an element in a document based on changes in a user's ability to read the document. A document processing device is configured to: acquire information on a document including a plurality of words; acquire pieces of supplemental information being linked with the plurality of words; decide whether or not a piece of supplemental information linked with a corresponding one of the plurality of words is to be displayed based on a frequency with which each of the plurality of words has appeared; and control displaying the plurality of words and the pieces of supplemental information. In the deciding, it is decided whether or not the corresponding one of the pieces of supplemental information is to be displayed based on a frequency with which each of the plurality of words has been displayed along with the piece of supplemental information. | 02-25-2016 |
20160055147 | METHOD AND SYSTEM FOR PROCESSING SEMANTIC FRAGMENTS - The present invention discloses a method and system for processing semantic fragments. Some embodiments of the present invention provide a method for processing semantic fragments. The method comprises: obtaining a plurality of groups of semantic fragments, the plurality of groups of semantic fragments at least including a first group of semantic fragments generated from a first data processing flow and a second group of semantic fragments generated from a second data processing flow, the first data processing flow being different from the second data processing flow; and merging the first group of semantic fragments and the second group of semantic fragments based on semantic equivalence. A corresponding system is also disclosed. | 02-25-2016 |
20160055187 | INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD AND STORAGE MEDIUM - Setting or verification of a monitoring rule in response to a monitoring target environment is supported. An information processing system includes a situation information receiving unit that receives an input of situation information indicating a situation in a monitoring target environment. The information processing system further includes a normal situation storage unit. The normal situation storage unit stores environment information indicating the monitoring target environment in association with a set of situation information indicating a situation that is not abnormal in the monitoring target environment. The information processing system further includes a retrieval unit. The retrieval unit refers to the normal situation storage unit upon receiving the input of the situation information indicating the information in the monitoring target environment. The retrieval unit then retrieves the environment information associated with the set of the situation information that does not include the input situation information. | 02-25-2016 |
20160062965 | GENERATION OF PARSABLE DATA FOR DEEP PARSING - One or more processors identify one or more character errors in a document. The one or more processors replace a character having the identified one or more character errors with a replacement character. The replacement of the character error with the replacement character allows deep parsing of the document to complete. The one or more processors apply to the document one or both of a deep parsing and natural language processing after the replacing. | 03-03-2016 |
20160062969 | METHODS AND APPARATUS RELATED TO AUTOMATICALLY REWRITING STRINGS OF TEXT - Methods and apparatus related to automatically rewriting a string of text utilizing one or more rewrite rules. Some implementations are directed to scoring rewrite rules based at least in part on user interactions with rewrites that are generated by applying the rewrite rules. Some implementations are directed to determining the effectiveness of a rewrite generated based on applying one or more rewrite rules to a string of text. In some of those implementations, the determination may be based at least in part on one or more characteristics of the string of text, one or more characteristics of the rewrite, and/or scores associated with the rewrite rules. | 03-03-2016 |
20160062979 | WORD CLASSIFICATION BASED ON PHONETIC FEATURES - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining a textual term; determining, by one or more computers, a vector representing a phonetic feature of the textual term; comparing the vector representing the phonetic feature of the textual term with a reference vector representing a phonetic feature of a reference textual term; and classifying the textual term based on the comparing the vector with the reference vector. | 03-03-2016 |
20160062981 | METHODS AND APPARATUS RELATED TO DETERMINING EDIT RULES FOR REWRITING PHRASES - Methods and apparatus related to determining an edit rule based on a plurality of edits. Some implementations are directed to identifying the plurality of edits in one or more documents and determining an edit rule based on the pre-edit and post-edit phrases of the edits. Some implementations are directed to identifying the edits from one or more mature documents. The determined edit rule may be utilized to determine one or more candidate rephrasings of a subsequent phrase. | 03-03-2016 |
20160062982 | NATURAL LANGUAGE PROCESSING SYSTEM AND METHOD - A natural language processing system is disclosed herein. Embodiments of the NLP system perform hand-written rule-based operations that do not rely on a trained corpus. Rules can be added or modified at any time to improve accuracy of the system, and to allow the same system to operate on unstructured plain text from many disparate contexts (e.g. articles as well as Twitter contexts as well as medical articles) without harming accuracy for any one context. Embodiments also include a language decoder (LD) that generates information which is stored in a three-level framework (word, clause, phrase). The LD output is easily leveraged by various software applications to analyze large quantities of text from any source in a more sophisticated and flexible manner than previously possible. A query language (LDQL) for information extraction from NLP parsers' output is disclosed, with emphasis on its embodiment implemented for LD. The use of LDQL for knowledge extraction is also presented, using the example of an application named Knowledge Browser. | 03-03-2016 |
20160062983 | ELECTRONIC DEVICE AND METHOD FOR RECOGNIZING NAMED ENTITIES IN ELECTRONIC DEVICE - A method for operating an electronic device is provided. The method includes analyzing text to recognize at least one named entity, comparing the recognized at least one named entity with at least one piece of reference information to determine the similarity, as a result of the determination, selecting at least one piece of reference information of which the similarity with respect to the recognized at least one named entity is greater than or equal to a reference value, and executing a predetermined function, based on the selected at least one piece of reference information. | 03-03-2016 |
20160062984 | DEVICES AND METHODS FOR DETERMINING A RECIPIENT FOR A MESSAGE - In one aspect, a device includes a processor, a touch-enabled display accessible to the processor, and a memory accessible to the processor. The memory bears instructions executable by the processor to receive first input pertaining to at least a first recipient to which a message is to be sent, receive second input pertaining to a body of the message, and parse at least a portion of the body of the message to determine, based on at least a portion of the body of the message, whether the first recipient is a correct recipient for the message. | 03-03-2016 |
20160062985 | Clustering Classes in Language Modeling - This document describes, among other things, a computer-implemented method. The method can include obtaining a plurality of text samples that each include one or more terms belonging to a first class of terms. The plurality of text samples can be classified into a plurality of groups of text samples. Each group of text samples can correspond to a different sub-class of terms. For each of the groups of text samples, a sub-class context model can be generated based on the text samples in the respective group of text samples. Particular ones of the sub-class context models that are determined to be similar can be merged to generate a hierarchical set of context models. Further, the method can include selecting particular ones of the context models and generating a class-based language model based on the selected context models. | 03-03-2016 |
20160062986 | SYSTEMS AND METHODS FOR ANALYZING DOCUMENT COVERAGE - A system including a memory storing a meaning taxonomy is provided. The meaning taxonomy includes meaning loaded entities and associations between meaning loaded entities and syntactic structures. Each association links a meaning loaded entity to a syntactic structure. The system includes a processor coupled with the memory and components executable by the processor configured to receive content generated by a source, the content including syntactic structures, identify meaning loaded entities that are linked to the syntactic structures by associations, calculate a content summary indicating a level of coverage of the meaning loaded entities within the content, and provide a representation of the summary to an external entity. | 03-03-2016 |
20160062988 | GENERATING RESPONSES TO ELECTRONIC COMMUNICATIONS WITH A QUESTION ANSWERING SYSTEM - Text is received from a first client. The text is associated with an electronic communication tool for communication to a second client. Candidate answers are generated based on the text using a question answering system. The question answering system generates the candidate answers based on a plurality of data sources, including at least one personalized data source and at least one informational data source. At least one of the candidate answers is provided to the second client. Each of the candidate answers provided to the second client is selectable. | 03-03-2016 |
20160062989 | APPARATUS AND METHOD FOR PREDICTING THE PLEASANTNESS-UNPLEASANTNESS INDEX OF WORDS USING RELATIVE EMOTION SIMILARITY - An apparatus and a method for predicting the pleasantness-unpleasantness index of words are disclosed. The disclosed apparatus includes: a computing unit configured to compute an emotion correlation between a word and one or more comparison words, compute emotion correlations between multiple reference words included in a reference word set and the one or more comparison words, compute multiple first absolute emotion similarity values between the word and the multiple reference words, and compute at least one second absolute emotion similarity value between a reference word and another reference word for all of the reference words included in the reference word set; and a prediction unit configured to predict the pleasantness-unpleasantness index of the word by using the multiple first absolute emotion similarity values, the at least one second absolute emotion similarity value, and a preset pleasantness-unpleasantness index of the multiple reference words. | 03-03-2016 |
20160070692 | DETERMINING SEGMENTS FOR DOCUMENTS - A document is received for segmentation. The document includes multiple atomic textual units in a sequence. These units may correspond to sentences, phrases, paragraphs, concept phrases, chapters, etc. A distance function is selected that determines a distance between one set of atomic textual units and another set of atomic textual units. The distance between the sets is large for sets that are dissimilar, and small for sets that are similar. The distance function is applied to the atomic textual units to separate each of the atomic textual units into multiple segments, while maintaining the sequence of the atomic textual units. | 03-10-2016 |
20160070696 | TASK SWITCHING IN DIALOGUE PROCESSING - Disclosed methods and systems are directed to task switching in dialogue processing. The methods and systems may include activating a primary task, receiving one or more ambiguous natural language commands, and identifying a first candidate task for each of the one or more ambiguous natural language commands. The methods and systems may also include identifying, for each of the one or more ambiguous natural language commands and based on one or more rules, a second candidate task of the plurality of tasks corresponding to the ambiguous natural language command, determining whether to modify at least one of the one or more rules-based task switching rules based on whether a quality metric satisfies a threshold quantity, and when the second quality metric satisfies the threshold quantity, changing the task switching rule for the corresponding candidate task from a rules-based model to the optimized statistical based task switching model. | 03-10-2016 |
20160070697 | LANGUAGE MODEL WITH STRUCTURED PENALTY - A penalized loss is optimized using a corpus of language samples respective to a set of parameters of a language model. The penalized loss includes a function measuring predictive accuracy of the language model respective to the corpus of language samples and a penalty comprising a tree-structured norm. The trained language model with optimized values for the parameters generated by the optimizing is applied to predict a symbol following a sequence of symbols of the language modeled by the language model. In some embodiments the penalty comprises a tree-structured l | 03-10-2016 |
20160071517 | Evaluating Conversation Data based on Risk Factors - This disclosure describes techniques and architectures for evaluating conversations. In some instances, conversations with users, virtual assistants, and others may be analyzed to identify potential risks within a language model that is employed by the virtual assistants and other entities. The potential risks may be evaluated by administrators, users, systems, and others to identify potential issues with the language model that need to be addressed. This may allow the language model to be improved and enhance user experience with the virtual assistants and others that employ the language model. | 03-10-2016 |
20160072903 | ASSOCIATION OF AN EMOTIONAL INFLUENCER TO A POST IN A SOCIAL MEDIUM - A method for associating an emotional influencer to a post may include determining, by a processor, an emotional baseline for a user and detecting, by the processor, a post by the user on a social medium. The method may also include analyzing the content of the post to determine an emotion of the user based on the content of the post and determining a difference between the emotion of the user associated with the post and the emotional baseline of the user. The method may additionally include determining an emotional influencer of the post in response to the difference between the emotion of the user associated with the post and the emotional baseline of the user exceeding a preset threshold. The method may further include tagging the emotional influencer to the post based on the emotional influencer being related to the post. | 03-10-2016 |
20160078014 | RULE DEVELOPMENT FOR NATURAL LANGUAGE PROCESSING OF TEXT - In a computing device that defines a rule for natural language processing of text, annotated text is selected from a first document of a plurality of annotated documents. An entity rule type is selected from a plurality of entity rule types. An argument of the selected entity rule type is identified. A value for the identified argument is randomly selected based on the selected annotated text to generate a rule instance. The generated rule instance is applied to remaining documents of the plurality of annotated documents. A rule performance measure is computed based on application of the generated rule instance. The generated rule instance and the computed rule performance measure are stored for application to other documents. | 03-17-2016 |
20160078017 | System and Method for Integrated Development Environments for Dynamically Generating Narrative Content - The present invention is a method and apparatus for narrative content generation using narrative frameworks by receiving a first phrase variation and a second phrase variation and displaying an error indication when the first phrase variation fails to satisfy a criterion relative to the second phrase variation. If there is an error indication, alternate phrase variations are received and compared against the first phrase variation until an alternate phrase variation is selected that has no error indication. Additionally, multiple sets of operators for updating one or more narrative phrases selected for inclusion in the narrative content framework may be utilized to update selected phrases after inclusion in the narrative framework but prior to finalizing the narrative content to be output. | 03-17-2016 |
20160078018 | Method for Identifying Verifiable Statements in Text - A method, system and computer-usable medium are disclosed for identifying verifiable statements in a corpus of text. A training corpus of text containing manually annotated instances of verifiable and non-verifiable statements is processed to parse the text into segmented statements, which are in turn processed to extract features. The extracted features and the annotated statements are then processed with a machine learning algorithm to generate a verifiable statement classification model. In turn, the verifiable statement classification model is referenced by a verifiable statement classification system to distinguish verifiable and non-verifiable statements contained within an input corpus of text. | 03-17-2016 |
20160078020 | SPEECH TRANSLATION APPARATUS AND METHOD - According to one embodiment, a speech translation apparatus includes a recognizer, a detector, a convertor and a translator. The recognizer recognizes a speech in a first language to generate a recognition result. The detector detects translation segments suitable for machine translation from the recognition result to generate translation-segmented character strings that are obtained by dividing the recognition result based on the detected translation segments. The convertor converts the translation-segmented character strings into converted character strings which are expressions suitable for the machine translation. The translator translates the converted character strings into a second language which is different from the first language to generate translated character strings. | 03-17-2016 |
20160078127 | AUTOMATIC DATA INTERPRETATION AND ANSWERING ANALYTICAL QUESTIONS WITH TABLES AND CHARTS - A method providing an answer to at least one analytical question containing at least one table or at least one chart is provided. The method may include receiving an input question. The method may also include extracting a plurality of information from the input question based on a natural language analysis. The method may further include forming a well-defined sentence. The method may include extracting at least one table or at least one chart associated with the input question. The method may include forming at least one mathematical equation. The method may also include solving the at least one mathematical equation. The method may include determining the answer to the input question in natural language based on the solved at least one mathematical equation. The method may further include narrating the determined answer to the input question in natural language. | 03-17-2016 |
20160078861 | Discriminative Training of Document Transcription System - A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript. | 03-17-2016 |
20160078866 | PLATFORM FOR CREATING CUSTOMIZABLE DIALOG SYSTEM ENGINES - Provided are systems and methods for creating custom dialog system engines. The system comprises a dialog system interface installed on a first server or a user device and a platform deployed on a second server. The platform is configured to receive dialog system entities and intents associated with a developer profile and associate the dialog system entities with the dialog system intents to form a custom dialog system engine associated with the dialog system interface. The web platform receives a user request from the dialog system interface, activates the custom dialog system engine based on identification, and retrieves the dialog system entities and intents. The user request is processed by applying the dialog system entities and intents to generate a response to the user request. The response is sent to the dialog system interface. | 03-17-2016 |
20160078868 | SUGGESTING INTENT FRAME(S) FOR USER REQUEST(S) - Techniques are described herein that are capable of suggesting intent frame(s) for user request(s). For instance, the intent frame(s) may be suggested to elicit a request from a user. An intent frame is a natural language phrase (e.g., a sentence) that includes at least one carrier phrase and at least one slot. A slot in an intent frame is a placeholder that is identified as being replaceable by one or more words that identify an entity and/or an action to indicate an intent of the user. A carrier phrase in an intent frame includes one or more words that suggest a type of entity and/or action that is to be identified by the one or more words that may replace the corresponding slot. In accordance with these techniques, the intent frame(s) are suggested in response to determining that natural language functionality of a processing system is activated. | 03-17-2016 |
20160080165 | SMART HOME AUTOMATION SYSTEMS AND METHODS - A smart home interaction system is presented. It is built on a multi-modal, multithreaded conversational dialog engine. The system provides a natural language user interface for the control of household devices, appliances or household functionality. The smart home automation agent can receive input from users through sensing devices such as a smart phone, a tablet computer or a laptop computer. Users interact with the system from within the household or from remote locations. The smart home system can receive input from sensors or any other machines with which it is interfaced. The system employs interaction guide rules for processing reaction to both user and sensor input and driving the conversational interactions that result from such input. The system adaptively learns based on both user and sensor input and can learn the preferences and practices of its users. | 03-17-2016 |
20160085740 | GENERATING TRAINING DATA FOR DISAMBIGUATION - A method for generating training data for disambiguation of an entity comprising a word or word string related to a topic to be analyzed includes acquiring sent messages by a user, each including at least one entity in a set of entities; organizing the messages and acquiring sets, each containing messages sent by each user; identifying a set of messages including different entities, greater than or equal to a first threshold value, and identifying a user corresponding to the identified set as a hot user; receiving an instruction indicating an object entity to be disambiguated; determining a likelihood of co-occurrence of each keyword and the object entity in sets of messages sent by hot users; and determining training data for the object entity on the basis of the likelihood of co-occurrence of each keyword and the object entity in the sets of messages sent by the hot users. | 03-24-2016 |
20160085741 | ENTITY EXTRACTION FEEDBACK - Techniques associated with entity extraction feedback are described in various implementations. In one example implementation, a method may include generating a proposed entity extraction result associated with a document, the proposed entity extraction result being generated based on a ruleset applied to the document. The method may also include receiving feedback about the proposed entity extraction result, the feedback including an actual entity associated with the document and a feature of the document that is indicative of the actual entity. The method may also include determining a proposed modification to the ruleset based on the feedback. | 03-24-2016 |
20160085742 | AUTOMATED COLLECTIVE TERM AND PHRASE INDEX - Knowledge automation techniques may include selecting a knowledge element from a knowledge corpus of an enterprise for extraction of n-grams, and deriving a term vector comprising terms in the knowledge element. Based at least on a frequency of occurrence of each term in the knowledge element, key terms are identified in the term vector. Thereafter, the identified key terms are used to extract one or more n-grams from the knowledge element. Each of the extracted n-grams is scored as a function of at least a frequency of occurrence of each of the n-grams across the knowledge corpus of the enterprise, and based on the scoring, one or more of the n-grams is added to a collective term and phrase index. | 03-24-2016 |
20160085743 | SYSTEM FOR KNOWLEDGE ACQUISITION - A system and method that translates sentences of natural language text into sets of axioms of formal logic that are consistent with parses resulting from NLP and acquired constraints as they accumulate. The system and method further present these axioms so as to facilitate further disambiguation of such sentences and produces axioms of formal logic suitable for processing by automated reasoning technologies, such as first-order or description logic suitable for processing by various reasoning algorithms, such as logic programs, inference engines, theorem provers, and rule-based systems. | 03-24-2016 |
20160085745 | PERSPECTIVE DATA ANALYSIS AND MANAGEMENT - A system and computer implemented method for managing perspective data is disclosed. The method may include collecting a first lot of perspective data for an item. The method may include introducing a variant feature to the item to constitute a modified item. The method may include collecting a second lot of perspective data for the modified item. The method may also include evaluating the first and second lots of perspective data to ascertain a sentiment fluctuation based on information relevant to the variant feature. | 03-24-2016 |
20160092426 | GENERATIVE GRAMMAR MODELS FOR EFFECTIVE PROMOTION AND ADVERTISING - A system comprising a computer-readable storage medium storing at least one program and a computer-implemented method for creating messages using generative grammar models is presented. Consistent with some embodiments, the method may include receiving a request to generate a message, which in an example embodiment is to be published to a social network platform. In response to receiving the request, a generative grammar model defining the structure of the message is accessed. The generative grammar model may include a number of blanks and may specify a source along with a grammatical constraint for a term to populate each blank. The method may further include generating the message in accordance with the generative grammar model, and causing the generated message to be published. | 03-31-2016 |
20160092427 | Language Identification - A plurality of documents in each of a plurality of languages can be received. A Latent Semantic Indexing (LSI) index can be created from the plurality of documents. A language classification model can be trained from the LSI index. A document to be identified by language can be received. A vector in the LSI index can be generated for the document to be identified by language. The vector can be evaluated against the language classification model. | 03-31-2016 |
20160092434 | INTEGRATED WORD N-GRAM AND CLASS M-GRAM LANGUAGE MODELS - Systems and processes for discourse input processing are provided. In one example process, a discourse input can be received from a user. An integrated probability of a candidate word in the discourse input and one or more subclasses associated with the candidate word can be determined based on a conditional probability of the candidate word given one or more words in the discourse input, a probability of the candidate word within a corpus, and a conditional probability of the candidate word given one or more classes associated with the one or more words. A text string corresponding to the discourse input can be determined based on the integrated probability. An output based on the text string can be generated. | 03-31-2016 |
20160093300 | LIBRARY OF EXISTING SPOKEN DIALOG DATA FOR USE IN GENERATING NEW NATURAL LANGUAGE SPOKEN DIALOG SYSTEMS - A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase. | 03-31-2016 |
20160093301 | PARSIMONIOUS HANDLING OF WORD INFLECTION VIA CATEGORICAL STEM + SUFFIX N-GRAM LANGUAGE MODELS - Systems and processes are disclosed for predicting words using a categorical stem and suffix word n-gram language model. A word prediction includes determining a stem probability using a stem language model. The word prediction also includes determining a suffix probability using a suffix language model decoupled from the stem model, in view of one or more stem categories. The word prediction also includes determining a probability of the stem belonging to the stem category. A joint probability is determined based on the foregoing, along with one or more word predictions having sufficient likelihood. In this way, the categorical stem and suffix language model constrains predicted suffixes to those that would be grammatically valid with predicted stems, thereby producing word predictions with grammatically valid stem and suffix combinations. | 03-31-2016 |
20160098386 | SYSTEM AND METHOD FOR UNSUPERVISED TEXT NORMALIZATION USING DISTRIBUTED REPRESENTATION OF WORDS - A system, method and computer-readable storage devices for providing unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors to the vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify a word associated with the best path as the canonical version. | 04-07-2016 |
20160098387 | Natural Language Processing Utilizing Propagation of Knowledge through Logical Parse Tree Structures - Mechanisms are provided for processing logical relationships in natural language content. A logical parse of a first parse of a natural language content is generated by identifying latent logical operators within the first parse indicative of logical relationships between elements of the natural language content. The logical parse comprises nodes and edges linking nodes. At least one knowledge value is associated with each node in the logical parse. The at least one knowledge value of at least a subset of the nodes in the logical parse is propagated to one or more other nodes in the logical parse based on propagation rules. A reasoning operation is performed on the logical parse to generate a knowledge output indicative of knowledge associated with one or more of the logical relationships between elements of the natural language content. | 04-07-2016 |
20160098389 | Natural Language Processing Utilizing Transaction Based Knowledge Representation - Mechanisms are provided for processing logical relationships in natural language content. A logical parse of a first parse of the natural language content is generated by identifying latent logical terms within the first parse indicative of logical relationships between elements of the natural language content. The logical parse comprises nodes and edges linking nodes. At least one knowledge value is associated with each node in the logical parse. The at least one knowledge value associated with at least a subset of the nodes in the logical parse is propagated to one or more other nodes in the logical parse based on propagation rules. The propagating of the at least one knowledge value generates transaction records in a transaction knowledgebase data structure. A reasoning operation is executed based on the transaction knowledgebase data structure. | 04-07-2016 |
20160098393 | NATURAL LANGUAGE UNDERSTANDING (NLU) PROCESSING BASED ON USER-SPECIFIED INTERESTS - Methods and apparatus for natural language understanding (NLU) processing based on user-specified interests. Information specifying a weight for each of a plurality of domains is received via a user interface. The plurality of domains each relates to a potential area of interest for the user, and the weight for a domain from among the plurality of domains indicates a level of interest for the user in the domain. A ranking classifier used to rank NLU hypotheses generated by an NLU engine is trained using training data from which features are, at least in part, based on the information specifying a weight for each of the plurality of domains. | 04-07-2016 |
20160098394 | Natural Language Processing Utilizing Logical Tree Structures - Mechanisms are provided for processing logical relationships in natural language content. Natural language content is received, upon which a reasoning operation is to be performed. A first parse representation of the natural language content is generated, by a parser, by performing natural language processing on the natural language content. A logical parse of the first parse is generated by identifying latent logical operators within the first parse indicative of logical relationships between elements of the natural language content. A reasoning operation on the logical parse is executed to generate a knowledge output indicative of knowledge associated with one or more of the logical relationships between elements of the natural language content. | 04-07-2016 |
20160098994 | CROSS-PLATFORM DIALOG SYSTEM - Provided are systems and methods for operating a dialog system in a cross-platform environment. The method comprises receiving, by a server comprising at least one processor and a memory storing processor-executable codes, a first request from a first client device to initiate operation of the dialog system. The first client device is identified based at least on the first request. Based on the identification, a first set of predetermined settings associated with a user and the first client device is applied to the dialog system. The operation of the dialog system according to the first set of predetermined settings is initiated and the dialog system is connected to the first client device. | 04-07-2016 |
20160103822 | NATURAL LANGUAGE CONSUMER SEGMENTATION - Techniques are disclosed for using natural language processing techniques to define, manipulate, and interact with consumer segmentations. In such embodiments a content consumption analytics engine can be configured to receive and process a natural language segmentation query. The query may comprise, for example, a command that defines a new segmentation, a command that manipulates existing segmentations, or a command that solicits information relating to existing consumer segmentations. The query is parsed to identify individual grammatical tokens which are then correlated with specific segment token types through the use of a token repository. A custom thesaurus is used to identify synonymous terms for grammatical tokens which may not exist in the token repository. User feedback enables the custom thesaurus to learn additional synonyms for future use. Once the grammatical tokens are mapped onto the identified segment token types, a formal segment definition can be constructed based on a segment definition structure. | 04-14-2016 |
20160103823 | Machine Learning Extraction of Free-Form Textual Rules and Provisions From Legal Documents - Disclosed herein is a system and method for machine learning extraction of free-form textual rules and provisions from legal documents. The method comprising electronically receiving, by the legal rules extraction engine, a document, processing the document using a first trained model executed by the legal rules extraction engine to classify the document into a document class, processing the document using a second trained model executed by the legal rules extraction engine to extract rules within the document conditional on the document class identified by the first trained model, extracting a plurality of data variables from the document by processing the classified features in the document using a third trained model executed by the legal rules extraction engine, generating by the legal rules extraction engine an output vector based on the plurality of data variables, and displaying the output vector by the legal rules extraction engine at the user interface. | 04-14-2016 |
20160103824 | METHOD AND SYSTEM FOR TRANSFORMING UNSTRUCTURED TEXT TO A SUGGESTION - A system for transforming unstructured text into at least one suggestion for content creation, the system having: a tagging module having instructions in memory, said instructions executable by a processor to receive unstructured text from external sources having at least one sentence; disassemble said at least one sentence into individual words; and tag said individual words by determining a speech type for each of said individual words; a chunking module having instructions in memory, said instructions executable by a processor to group said individual words together into phrases to form a tree-like structure of the text, when said individual words are tagged correctly; and a suggestion module having instructions in memory, said instructions executable by a processor to generate said at least one suggestion based on said chunking. | 04-14-2016 |
20160103873 | Enhanced Answers in DeepQA System According to User Preferences - A semantic search engine is enhanced to employ user preferences to customize answer output by, for a first user, extracting user preferences and sentiment levels associated with a first question; receiving candidate answer results of a semantic search of the first question; weighting the candidate answer results according to the sentiment levels for each of the user preferences; and producing the selected candidate answers to the first user. Optionally, user preferences and sentiment levels may be accumulated over different questions for the same user, or over different users for similar questions. And, supplemental information may be retrieved relative to a user preference in order to further tune the weighting per the preferences and sentiment levels. | 04-14-2016 |
20160104481 | PHRASE-BASED DIALOGUE MODELING WITH PARTICULAR APPLICATION TO CREATING RECOGNITION GRAMMARS FOR VOICE-CONTROLLED USER INTERFACES - The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). More specifically, the invention concerns a phrase-based modeling of generic structures of verbal interaction and use of these models for the purpose of automating part of the design of such grammar networks. | 04-14-2016 |
20160104485 | Cognitive Security for Voice Phishing Activity - An approach is provided in which a question answer system monitors a voice conversation between a first entity and a second entity. During the conversation, the question answer system parses the conversation into information phrases, and constructs the information phrases into a current conversation pattern. The question answer system identifies deceptive conversation properties of the current conversation by analyzing the current conversation pattern against domain-based conversation patterns. The question answer system, in turn, sends an alert message to the first entity to notify the first entity of the identified deceptive conversation properties. | 04-14-2016 |
20160110327 | TEXT CORRECTION BASED ON CONTEXT - An embodiment provides a method, including: accessing, using a processor of an electronic device, a data store; determining, using a processor, a predetermined context based on the data store; receiving, at an input device of the electronic device, a user text input; analyzing, using a processor, the user text input based on the predetermined context; and offering, using a processor, a suggested modification of the user text input based on the predetermined context. Other aspects are described and claimed. | 04-21-2016 |
20160110339 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM - An information processing apparatus may include a control device to determine a plurality of candidate texts to correct a target text which is from an input text string, based on preceding text located at a position preceding the target text and succeeding text located at a position succeeding the target text. | 04-21-2016 |
20160110340 | Systems and Methods for Language Detection - Implementations of the present disclosure are directed to a method, a system, and a computer program storage device for detecting a language in a text message. A plurality of different language detection tests are performed on a message associated with a user. Each language detection test determines a set of scores representing a likelihood that the message is in one of a plurality of different languages. One or more combinations of the score sets are provided as input to one or more distinct classifiers. Output from each of the classifiers includes a respective indication that the message is in one of the different languages. The language in the message may be identified as being the indicated language from one of the classifiers, based on a confidence score and/or an identified linguistic domain. | 04-21-2016 |
20160110342 | System and Method for Performing Analysis on Information, Such as Social Media - A system for analyzing text-based information is presented. Each datum of information includes an author, a description and a timestamp. A fetcher fetches the raw information according to keywords. A parser parses the raw information to refine the results. A lexicon management module extracts lemmas from the raw information, and creates an edited lexicon containing the raw data and the lemmas for each datum. A data manager correlates lemmas in the edited lexicon and identifies clusters of lemmas that are correlated with each other. The results can be visually displayed to a user, and clusters of lemmas that are less correlated than the other clusters can be visually identified. In one aspect, the user is able to excise the less correlated clusters, in order to further refine the results of the keyword search. | 04-21-2016 |
20160110343 | UNSUPERVISED TOPIC MODELING FOR SHORT TEXTS - Topics are determined for short text messages using an unsupervised topic model. In a training corpus created from a number of short text messages, a vocabulary of words is identified, and for each word a distributed vector representation is obtained by processing windows of the corpus having a fixed length. The corpus is modeled as a Gaussian mixture model in which Gaussian components represent topics. To determine a topic of a sample short text message, a posterior distribution over the corpus topics is obtained using the Gaussian mixture model. | 04-21-2016 |
20160110345 | PERSPECTIVE DATA MANAGEMENT FOR COMMON FEATURES OF MULTIPLE ITEMS - A computer-implemented method of managing perspective data associated with a common feature in items is disclosed. The method can include identifying a common feature in a first item and a second item, the first item having a set of perspective data and establishing a subset of perspective data associated with the common feature. The method can include associating the subset of perspective data with the second item. The method can include determining a set of relevancy scores for the subset of perspective data associated with the common feature and establishing a set of relevant perspective data from the subset of perspective data. The set of relevant perspective data can have relevancy scores outside of a relevancy threshold. The method can include associating the set of relevant perspective data with the second item. | 04-21-2016 |
20160110347 | SYSTEM AND METHOD FOR PROVIDING FOLLOW-UP RESPONSES TO PRIOR NATURAL LANGUAGE INPUTS OF A USER - In certain implementations, follow-up responses may be provided for prior natural language inputs of a user. As an example, a natural language input associated with a user may be received at a computer system. A determination of whether information sufficient for providing an adequate response to the natural language input is currently accessible to the computer system may be effectuated. A first response to the natural language input (that indicates that a follow-up response will be provided) may be provided based on a determination that information sufficient for providing an adequate response to the natural language input is not currently accessible. Information sufficient for providing an adequate response to the natural language input may be received. A second response to the natural language input may then be provided based on the received sufficient information. | 04-21-2016 |
20160110399 | PERSPECTIVE DATA MANAGEMENT FOR COMMON FEATURES OF MULTIPLE ITEMS - A computer-implemented method of managing perspective data associated with a common feature in items is disclosed. The method can include identifying a common feature in a first item and a second item, the first item having a set of perspective data and establishing a subset of perspective data associated with the common feature. The method can include associating the subset of perspective data with the second item. The method can include determining a set of relevancy scores for the subset of perspective data associated with the common feature and establishing a set of relevant perspective data from the subset of perspective data. The set of relevant perspective data can have relevancy scores outside of a relevancy threshold. The method can include associating the set of relevant perspective data with the second item. | 04-21-2016 |
20160117311 | Method and Device for Performing Story Analysis - A method and apparatus for performing story analysis are described, including accepting a story, segmenting the received story into a plurality of scenes, detecting characters for each scene in the story, analyzing a relationship of each set of characters in each scene of the story by parsing, tagging and filtering descriptive text and words of dialog between each set of characters to calculate a number of dialogs between each set of characters and a number of words in each dialog between each set of characters, determining an importance of each character in each scene of the story, determining an interaction characterization for each character in each scene of the story using the importance of each character, and generating character relationship data responsive to the importance of each character in each scene of the story and the interaction characterization for each character in each scene of the story. | 04-28-2016 |
20160117312 | NATURAL LANGUAGE PROCESSING FOR EXTRACTING CONVEYANCE GRAPHS - Provided is a process for extracting conveyance records from unstructured text documents, the process including: obtaining, with one or more processors, a plurality of documents describing, in unstructured form, one or more conveyances of interest in real property; determining, with one or more processors, for each of the documents, a respective jurisdiction; selecting, with one or more processors, from a plurality of language processing models for the English language, a respective language processing model for each of the documents based on the respective determined jurisdiction; extracting, with one or more processors, for each of the documents, a plurality of structured conveyance records from each of the plurality of documents by applying the language processing model selected for the respective document based on the jurisdiction associated with the document; and storing, with one or more processors, the extracted, structured conveyance record in memory. | 04-28-2016 |
20160117313 | DISCOVERING TERMS USING STATISTICAL CORPUS ANALYSIS - Software that extracts contextually relevant terms from a text sample (or corpus) by performing the following steps: (i) identifying a first term from a corpus, based, at least in part, on a set of initial contextual characteristic(s), where each initial contextual characteristic of the set of initial contextual characteristic(s) relates to the contextual use of at least one category related term of a set of category related term(s) in the corpus; (ii) adding the first term to the set of category related term(s), thereby creating a revised set of category related term(s) and a set of first term contextual characteristic(s), where each first term contextual characteristic of the set of first term contextual characteristic(s) relates to the contextual use of the first term in the corpus; and (iii) identifying a second term from the corpus, based, at least in part, on the set of first term contextual characteristic(s). | 04-28-2016 |
20160117314 | Automatic Question Generation from Natural Text - A mechanism is provided for generating a natural language question for a given input text. The input text is parsed using a minimal recursion semantics (MRS) generating grammar to obtain a minimal recursion semantics (MRS) representation of the input text. Semantic role labelling transforms the input text into at least one semantic association of a verb and semantic arguments of the verb, the semantic arguments of the verb being fragments of the input text. A question type is received for at least one verb/semantic argument association. The MRS representation of the input text is transformed into a MRS representation of one or more questions based on the at least one semantic association of the verb and respective question types. At least one question of the one or more questions is generated based on the MRS representation of the at least one question using the MRS generating grammar. | 04-28-2016 |
20160117316 | NEURAL MACHINE TRANSLATION SYSTEMS WITH RARE WORD PROCESSING - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural translation systems with rare word processing. One of the methods is a method training a neural network translation system to track the source in source sentences of unknown words in target sentences, in a source language and a target language, respectively and includes deriving alignment data from a parallel corpus, the alignment data identifying, in each pair of source and target language sentences in the parallel corpus, aligned source and target words; annotating the sentences in the parallel corpus according to the alignment data and a rare word model to generate a training dataset of paired source and target language sentences; and training a neural network translation model on the training dataset. | 04-28-2016 |
20160124908 | FACILITATING A MEETING USING GRAPHICAL TEXT ANALYSIS - Embodiments relate to facilitating a meeting. A method for facilitating a meeting of a group of participants is provided. The method generates a graph of words from speeches of the participants as the words are received from the participants. The method partitions the group of participants into a plurality of subgroups of participants. The method performs a graphical text analysis on the graph to identify a cognitive state for each participant and a cognitive state for each subgroup of participants. The method informs at least one of the participants about the identified cognitive state of a participant or a subgroup of participants. | 05-05-2016 |
20160124936 | GRAMMAR COMPILING METHODS, SEMANTIC PARSING METHODS, DEVICES, COMPUTER STORAGE MEDIA, AND APPARATUSES - A corresponding grammar description file and a word category description file are defined based on a logical grammar by manifest language LGML according to a common sentence expression of a semantic meaning, in the grammar description file, the description for a common sentence is composed by operators, word categories, and functions, the word category description file is used to describe specific values for the word categories; the grammar description file and the word category description file are generated, with a reduction method according to a defined sequence, into a grammar tree for the grammar description file and word category trees for the word category description file; the word category trees are grafted at the positions of corresponding word categories on the grammar tree, forming a grammar tree for the semantic meaning, in this way, grammar compiling is accomplished. Based on the grammar tree of a semantic meaning with aforementioned ways, semantic parsing is carried out with whole-sentence matching, semantic mapping matching, or a combination of whole-sentence matching and semantic mapping matching. | 05-05-2016 |
20160124937 | NATURAL LANGUAGE EXECUTION SYSTEM, METHOD AND COMPUTER READABLE MEDIUM - Disclosed is a method, system, and computer readable medium for natural language execution. The method includes, in a processing system: receiving input data indicative of natural language text; using a natural language processor to generate natural language parse information; generating, using the natural language parse information, an input object composite including objects; determining, for the objects of the input object composite and using an object knowledge network, a plurality of interpretation object composites that represent interpretation functions; executing each interpretation function; determining, for the objects of the input object composite and using the object knowledge network, executable object composites that represent executable functions; executing the executable functions thereby generating an output object composite; updating the object knowledge network based on the input and output object composites and the execution of each interpretation and execution function; and outputting, based on the output object composite, output data indicative of natural language text. | 05-05-2016 |
20160124939 | DISAMBIGUATION IN MENTION DETECTION - Disambiguation in mention detection. The method includes: determining at least one location in a text at which a target surface form in the text appears; obtaining an overall word-bag context of the target surface form in the text, the word-bag context at each of the at least one location including words within a predetermined neighborhood of the location; obtaining an overall resource context of the target surface form in the text, the resource context at each of the at least one location including resources corresponding to a further surface form within a predetermined neighborhood of the location; and determining a similarity between the target surface form and a candidate resource for the target surface form based on the overall word-bag context and the overall resource context. A system for disambiguation in mention detection is also provided. | 05-05-2016 |
20160124940 | FACILITATING A MEETING USING GRAPHICAL TEXT ANALYSIS - Embodiments relate to facilitating a meeting. A method for facilitating a meeting of a group of participants is provided. The method generates a graph of words from speeches of the participants as the words are received from the participants. The method partitions the group of participants into a plurality of subgroups of participants. The method performs a graphical text analysis on the graph to identify a cognitive state for each participant and a cognitive state for each subgroup of participants. The method informs at least one of the participants about the identified cognitive state of a participant or a subgroup of participants. | 05-05-2016 |
20160125753 | SYSTEM AND METHODS FOR TRANSFORMING LANGUAGE INTO INTERACTIVE ELEMENTS - A computer operable method is described for transforming phonemes, graphemes, and other language structures into interactive elements. The method may comprise, receiving a word, wherein the word consists of a group of phonemes; forming a group of graphemes, wherein the group of graphemes is constructed using information relating to the group of phonemes; and forming a group of manipulatives, wherein the group of manipulatives is constructed using information relating to the group of phonemes or the group of graphemes. | 05-05-2016 |
20160125881 | Mobile Device for Speech Input and Text Delivery - Aspects of the disclosure provide systems and methods for facilitating dictation. Speech input may be provided to an audio input device of a computing device. A speech recognition engine at the computing device may obtain text corresponding to the speech input. The computing device may transmit the text to a remotely-located storage device. A login webpage that includes a session identifier may be accessed from a target computing device also located remotely relative to the storage device. The session identifier may be transmitted to the storage device and, in response, a text display webpage may be received at the target computing device. The text display webpage may include the speech-derived text and may be configured to automatically copy the text to a copy buffer of the target computing device. The speech-derived text may also be provided to native applications at target computing devices or NLU engines for natural language processing. | 05-05-2016 |
20160132482 | AUTOMATIC ONTOLOGY GENERATION FOR NATURAL-LANGUAGE PROCESSING APPLICATIONS - A method of generating ontologies for a Virtual Assistant across different languages may include extracting a plurality of tokens in a first language from a plurality of web resources in a web domain that includes the Virtual Assistant. The web resources may be made available in a first language and a second language. The method may also include determining a first part-of-speech (POS) for each of the plurality of tokens, where the first POS may be specific to the first language. The method may additionally include mapping the first POS to a second POS from a standardized set of POS's that are general across the first language and the second language, and generating a plurality of lemmas from the plurality of tokens. The method may further include displaying a network representing the ontology. | 05-12-2016 |
20160132483 | SYSTEMS AND METHODS FOR SEMANTIC INFORMATION RETRIEVAL - A semantic tagging method may add context to a sentence in order to increase search efficiency. Regardless of an author's writing style, translating semantic concepts into tags may increase search efficiency. Automatic semantic tagging of documents may allow semantic search and reasoning. Text for semantic tagging may include an email, a website chat room, an internet forum, or a text message. Additional texts may include aggregating general consensus of an emailed topic across multiple emails, whether in the same email chain or separate emails. To increase search efficiency, the analysis of prior communications within the body of text may comprise analyzing structured contextual information to facilitate homophora resolution. The structured contextual information may include at least one of a sender email address, one or more recipient email addresses, a subject field, a message date and time stamp, and an attachment title. | 05-12-2016 |
20160132484 | AUTOMATIC GENERATION OF N-GRAMS AND CONCEPT RELATIONS FROM LINGUISTIC INPUT DATA - A method of automatically generating a lemma dictionary from a web resource may include extracting a plurality of tokens from text-based documents within the web resource, and generating a plurality of N-grams from the plurality of tokens. The method may additionally include receiving one or more filter definitions that identify valid N-grams, and filtering the plurality of N-grams using the one or more filter definitions to generate a lemma dictionary. The method may further include generating an ontology that comprises the lemma dictionary. | 05-12-2016 |
20160132488 | COGNITIVE MATCHING OF NARRATIVE DATA - A cognitive matching of narrative data mechanism may include a collecting module configured to collect a set of data for a party and a determining module configured to determine, by analyzing the set of data, an identifiable event for the party. The mechanism may also include an identifying module configured to identify, using the identifiable event, a relevant feature of a corpus, and a providing module configured to provide an output corresponding to the relevant feature. | 05-12-2016 |
20160132489 | METHOD AND APPARATUS FOR CONFIGURABLE MICROPLANNING - Methods, apparatuses, and computer program products are described herein that are configured to be embodied as a configurable microplanner. In some example embodiments, a method is provided that comprises accessing a document plan containing one or more messages. The method of this embodiment may also include generating a text specification containing one or more phrase specifications that correspond to the one or more messages in the document plan. The method of this embodiment may also include applying a set of lexicalization rules to each of the one or more messages to populate the one or more phrase specifications. In some example embodiments, the set of lexicalization rules are specified using a microplanning rule specification language that is configured to hide linguistic complexities from a user. In some example embodiments, genre parameters may also be used to specify constraints that provide default behaviors for the realization process. | 05-12-2016 |
20160132490 | WORD COMFORT/DISCOMFORT INDEX PREDICTION APPARATUS AND METHOD THEREFOR - Disclosed are a word comfort/discomfort index prediction apparatus and method therefor. The word comfort/discomfort index prediction apparatus includes: a calculation unit calculating emotional associations between the word and one or more respective comparative words, calculating emotional associations between at least one predefined reference word and the one or more respective comparative words, and calculating an emotional similarity between the word and each of the at least one reference word; and a prediction unit predicting the comfort/discomfort index of the word by using the at least one emotional similarity and a predefined comfort/discomfort index of the at least one reference word. | 05-12-2016 |
20160132492 | TEXT SEGMENTATION WITH MULTIPLE GRANULARITY LEVELS - Text processing includes: segmenting received text based on a lexicon of smallest semantic units to obtain medium-grained segmentation results; merging the medium-grained segmentation results to obtain coarse-grained segmentation results, the coarse-grained segmentation results having coarser granularity than the medium-grained segmentation results; looking up in the lexicon of smallest semantic units respective search elements that correspond to segments in the medium-grained segmentation results; and forming fine-grained segmentation results based on the respective search elements, the fine-grained segmentation results having finer granularity than the medium-grained segmentation results. | 05-12-2016 |
20160132796 | CPW method with application in an application system - The Cognitive Process Workflow (CPW) method with its grammar, syntax and semantics is a process method, a process modeling method and a workflow method applicable to different technical systems. Its applications include, among others, use of the CPW Method in a process tool, workflow tool, workflow engine, enterprise architecture framework and enterprise architecture engine. The CPW Process is represented as a simple sentence with a CPW Subject, a CPW Predicate and a CPW Object. The CPW Method, with CPW Process, CPW Dialog and CPW Workflow and in addition with CPW Context Diagrams (CPW Subject Context Diagram, CPW Object Context Diagram and CPW Subject Object Context Diagram), can be applied to the business areas of financial services (banking, assurance and financial approval), chemistry, pharmacy, medicine, transport, travel, film industry, politics, psychology, legal practice and other business areas. | 05-12-2016 |
20160132809 | IDENTIFYING AND AMALGAMATING CONDITIONAL ACTIONS IN BUSINESS PROCESSES - Methods and systems for identifying conditional actions in a business process are disclosed. In accordance with one such method, text fragments are extracted from input documents. In addition, a plurality of pairs of the text fragments that respectively include text fragments that are similar according to a pre-defined similarity standard are determined. For each pair of at least a subset of the pairs, at least one difference between the text fragments of the corresponding pair is determined. Further, at least two particular pairs of the subset of the pairs are merged in response to determining that the particular pairs have at least one of the determined differences in common. Additionally, the merged particular pairs are output to indicate the conditional actions in the business process. | 05-12-2016 |
20160133251 | PROCESSING OF AUDIO DATA - Examples of processing audio data are described. In certain examples, a transcript language model is based on text data representative of a transcript associated with the audio data. The audio data is processed to determine at least a set of confidence values for language elements in a text output of the processing, wherein the processing uses the transcript language model. The set of confidence values enables a determination to be made. The determination relates to whether the text data is associated with said audio data based on said set of confidence values. | 05-12-2016 |
20160140104 | METHODS AND SYSTEMS RELATED TO INFORMATION EXTRACTION - The invention relates to information extraction systems having discriminative models which utilize hierarchical cluster trees and active learning to enhance training. | 05-19-2016 |
20160140107 | PREDICTING INDIVIDUAL OR CROWD BEHAVIOR BASED ON GRAPHICAL TEXT ANALYSIS OF POINT RECORDINGS OF AUDIBLE EXPRESSIONS - Embodiments relate to determining a crowd behavior. A method of determining a crowd behavior is provided. The method collects, at one or more recording points in a crowd of individuals, audible expressions that the individuals of the crowd make. The method generates a graph of the audible expressions as the audible expressions are collected from the individuals. The method determines a crowd behavior by performing a graphical text analysis on the graph. The method outputs an indication of the crowd behavior to trigger a crowd control measure. | 05-19-2016 |
20160140109 | GENERATION OF A SEMANTIC MODEL FROM TEXTUAL LISTINGS - A corpus of textual listings is received and main concept words and attribute words therein are identified via an iterative process of parsing listings and expanding a semantic model. During the parsing phase, the corpus of textual listings is parsed to tag one or more head noun words and/or one or more identifier words in each listing based on previously identified main concept words or using a head noun identification rule. Once substantially each listing in the corpus has been parsed in this manner, the expansion phase assigns head noun words as main concept words and modifier words as attribute words, where possible. During the next iteration, the newly identified main concept words and/or attribute words are used to further parse the listings. These iterations are repeated until a termination condition is reached. Remaining words in the corpus are clustered based on the main concept words and attribute words. | 05-19-2016 |
20160140858 | Grading Ontological Links Based on Certainty of Evidential Statements - Mechanisms for evaluating a link between information concept entities are provided. A set of evidential data specifying a plurality of information concept entities is received and a link between at least two information concept entities in the set of evidential data is generated. The set of evidential data is evaluated with regard to whether or not the set of evidential data supports or refutes the link. The evaluation of the set of evidential data comprises analyzing language of natural language statements in the set of evidential data to identify certainty terms within the natural language statements. A confidence value for the link is calculated based on results of the evaluation of the set of evidential data and a knowledge output is generated based on the link and the confidence value associated with the link. | 05-19-2016 |
20160140958 | NATURAL LANGUAGE QUESTION ANSWERING SYSTEM AND METHOD, AND PARAPHRASE MODULE - A natural language question answering system and method, and a paraphrase module are provided. The natural language question answering system includes a conversion module configured to generate a plurality of modified questions by paraphrasing a user's question; a plurality of question answering engines configured to receive each of the user's question and the modified questions, and select candidate answers corresponding to each of the user's question and the modified questions; and a detection module configured to detect at least one among the selected candidate answers as an answer. | 05-19-2016 |
20160140965 | MULTI-LEVEL CONTENT ANALYSIS AND RESPONSE - Predetermined services are provided using preset instructions. A transcript of audible content provided over an electronic network and received at a communications device is analyzed to determine whether a trigger is present in the audible content. When the trigger is present in the audible content, preset instructions correlated with the trigger and instructing how to provide a predetermined service are identified. The predetermined service is provided by following the preset instructions. | 05-19-2016 |
20160147733 | Pattern Identification and Correction of Document Misinterpretations in a Natural Language Processing System - An approach is provided in which a knowledge manager analyzes multiple document phrases using a natural language processing model and generates multiple interpretations based upon the analysis. The knowledge manager identifies misinterpretation patterns by comparing the multiple interpretations with multiple corrections that include corrections to the multiple interpretations. In turn, the knowledge manager generates interpretation rules based upon the identified patterns and applies the interpretation rules to the natural language processing model. | 05-26-2016 |
20160147734 | Pattern Identification and Correction of Document Misinterpretations in a Natural Language Processing System - An approach is provided in which a knowledge manager analyzes multiple document phrases using a natural language processing model and generates multiple interpretations based upon the analysis. The knowledge manager identifies misinterpretation patterns by comparing the multiple interpretations with multiple corrections that include corrections to the multiple interpretations. In turn, the knowledge manager generates interpretation rules based upon the identified patterns and applies the interpretation rules to the natural language processing model. | 05-26-2016 |
20160147736 | CREATING ONTOLOGIES BY ANALYZING NATURAL LANGUAGE TEXTS - Systems and methods for creating ontologies by analyzing natural language texts. An example method comprises: receiving a plurality of semantic structures associated with a text corpus; identifying a first semantic structure and a second semantic structure, wherein the first semantic structure comprises a first substructure and a second substructure, wherein the second semantic structure comprises a third substructure and a fourth substructure, and wherein the first substructure is similar to the third substructure in view of a first similarity criterion; and responsive to determining that the second substructure is similar to the fourth substructure in view of a second similarity criterion, associating, with a certain concept of an ontology associated with the text corpus, objects represented by the second substructure and the fourth substructure. | 05-26-2016 |
20160147737 | QUESTION ANSWERING SYSTEM AND METHOD FOR STRUCTURED KNOWLEDGEBASE USING DEEP NATUAL LANGUAGE QUESTION ANALYSIS - Disclosed are a question answering system for structured knowledgebase using deep natural language question analysis, and a method thereof. The question answering system includes a deep natural language question analysis unit configured to create a structure of a semantic frame by analyzing a natural language question that is input, a question-intermediate expression creation unit configured to create a question-intermediate expression of a lexicon level based on the semantic frame, a knowledgebase-specialized query creation unit configured to create a query used to search in a knowledgebase that is a subject of search, based on the question-intermediate expression, and a knowledgebase search unit configured to find a correct answer in the knowledgebase that is the subject of search based on the query, and to provide an accuracy of the correct answer, a confidence of the correct answer and an evidence for the correct answer. | 05-26-2016 |
20160147739 | APPARATUS AND METHOD FOR UPDATING LANGUAGE ANALYSIS RESULT - An apparatus and method for updating a language analysis result are provided. The apparatus includes a storage unit configured to store language analysis result and language analysis metadata to be used for update of the language analysis result, and an update unit configured to reanalyze the language analysis metadata based on language knowledge which is added to language knowledge resources, and update the language analysis result based on the reanalyzed result. | 05-26-2016 |
20160154782 | CALL FLOW AND DISCOURSE ANALYSIS | 06-02-2016 |
20160154784 | AUTOMATIC MESSAGE PRESENTATION BASED ON PAST MESSAGES | 06-02-2016 |
20160154785 | OPTIMIZING GENERATION OF A REGULAR EXPRESSION | 06-02-2016 |
20160154786 | Patent Analyzing System | 06-02-2016 |
20160154787 | Inter Thread Anaphora Resolution | 06-02-2016 |
20160154790 | NATURAL LANGUAGE PROCESSING METHOD | 06-02-2016 |
20160154791 | APPARATUS AND METHOD FOR DYNAMICALLY UPDATING LANDMARKS IN A SPACE DURING EXECUTION OF NATURAL LANGUAGE INSTRUCTIONS | 06-02-2016 |
20160154792 | CONTEXTUAL LANGUAGE UNDERSTANDING FOR MULTI-TURN LANGUAGE TASKS | 06-02-2016 |
20160162445 | System and Method for Using Data and Angles to Automatically Generate a Narrative Story - A system and method for automatically generating a narrative story receives data and information pertaining to a domain event. The received data and information and/or one or more derived features are then used to identify a plurality of angles for the narrative story. The plurality of angles is then filtered, for example through use of parameters that specify a focus for the narrative story, length of the narrative story, etc. Points associated with the filtered plurality of angles are then assembled and the narrative story is rendered using the filtered plurality of angles and the assembled points. | 06-09-2016 |
20160162456 | METHODS FOR GENERATING NATURAL LANGUAGE PROCESSING SYSTEMS - Methods are presented for generating a natural language model. The method may comprise: ingesting training data representative of documents to be analyzed by the natural language model, generating a hierarchical data structure comprising at least two topical nodes into which the training data is to be subdivided by the natural language model, selecting a plurality of documents among the training data to be annotated, generating an annotation prompt for each document configured to elicit an annotation about said document indicating which node among the at least two topical nodes said document is to be classified into, receiving the annotation based on the annotation prompt; and generating the natural language model using an adaptive machine learning process configured to determine patterns among the annotations for how the documents in the training data are to be subdivided according to the at least two topical nodes of the hierarchical data structure. | 06-09-2016 |
20160162464 | TECHNIQUES FOR COMBINING HUMAN AND MACHINE LEARNING IN NATURAL LANGUAGE PROCESSING - Methods, apparatuses and computer readable medium are presented for generating a natural language model. A method for generating a natural language model comprises: receiving more than one annotation of a document; calculating a level of agreement among the received annotations; determining that a criterion among a first criterion, a second criterion, and a third criterion is satisfied based at least in part on the level of agreement; determining an aggregated annotation representing an aggregation of information in the received annotations and training a natural language model using the aggregated annotation, when the first criterion is satisfied; generating at least one human readable prompt configured to receive additional annotations of the document, when the second criterion is satisfied; and discarding the received annotations from use in training the natural language model, when the third criterion is satisfied. | 06-09-2016 |
20160162465 | INTENTION DETECTION IN DOMAIN-SPECIFIC INFORMATION - A new information in a language and relating to a subject matter domain is parsed into a constituent set of complete grammatical constructs. In a subset of the complete grammatical constructs, a set of linguistic styles of the language is identified according to a subset of a set of word-style associations related to the language and independent of the subject matter domain. A first weight is assigned to a first linguistic style and a second weight to a second linguistic style from the set of linguistic styles. A first intention information is mapped to the first style using a first style-intention rule, and a second intention information to the second style using a second style-intention rule. A complete grammatical construct in the subset is tagged with the first intention information responsive to a weight associated with the first intention information exceeding an intention selection threshold. | 06-09-2016 |
20160162466 | INTELLIGENT SYSTEM THAT DYNAMICALLY IMPROVES ITS KNOWLEDGE AND CODE-BASE FOR NATURAL LANGUAGE UNDERSTANDING - Systems, methods, and apparatuses are presented for a novel natural language tokenizer and tagger. In some embodiments, a method for tokenizing text for natural language processing comprises: generating from a pool of documents, a set of statistical models comprising one or more entries each indicating a likelihood of appearance of a character/letter sequence in the pool of documents; receiving a set of rules comprising rules that identify character/letter sequences as valid tokens; transforming one or more entries in the statistical models into new rules that are added to the set of rules when the entries indicate a high likelihood; receiving a document to be processed; dividing the document to be processed into tokens based on the set of statistical models and the set of rules, wherein the statistical models are applied where the rules fail to unambiguously tokenize the document; and outputting the divided tokens for natural language processing. | 06-09-2016 |
20160162467 | METHODS AND SYSTEMS FOR LANGUAGE-AGNOSTIC MACHINE LEARNING IN NATURAL LANGUAGE PROCESSING USING FEATURE EXTRACTION - Methods, apparatuses, and systems are presented for generating natural language models using a novel system architecture for feature extraction. A method for extracting features for natural language processing comprises: accessing one or more tokens generated from a document to be processed; receiving one or more feature types defined by a user; receiving selection of one or more feature types from a plurality of system-defined and user-defined feature types, wherein each feature type comprises one or more rules for generating features; receiving one or more parameters for the selected feature types, wherein the one or more rules for generating features are defined at least in part by the parameters; generating features associated with the document to be processed based on the selected feature types and the received parameters; and outputting the generated features in a format common among all feature types. | 06-09-2016 |
20160162468 | METHODS AND SYSTEMS FOR PROVIDING UNIVERSAL PORTABILITY IN MACHINE LEARNING - Systems, methods, and apparatuses are presented for a trained language model to be stored in an efficient manner such that the trained language model may be utilized in virtually any computing device to conduct natural language processing. Unlike other natural language processing engines that may be computationally intensive to the point of being capable of running only on high performance machines, the organization of the natural language models according to the present disclosures allows for natural language processing to be performed even on smaller devices, such as mobile devices. | 06-09-2016 |
20160162473 | LOCALIZATION COMPLEXITY OF ARBITRARY LANGUAGE ASSETS AND RESOURCES - A “Linguistic Complexity Tool” uses Machine Learning (ML) based techniques to predict “source complexity scores” for localization of source language assets or resources (i.e., “source content”), or subsections of that content, to provide users with predicted levels of difficulty in localizing source content into target languages, dialects, or linguistic styles. These predicted source complexity scores provide a number of advantages, including but not limited to, improved user efficiency and user interaction performance by identifying source content, or subsections of that content, that are likely to be difficult or time consuming for users to localize. Further, these source complexity scores enable users to modify source content prior to localization to provide lower source complexity scores, thereby reducing error rates with respect to localized text or language presented in software applications or other media including, but not limited to, spoken or written localizations of the source content. | 06-09-2016 |
20160162474 | METHODS AND SYSTEMS FOR AUTOMATIC ANALYSIS OF CONVERSATIONS BETWEEN CUSTOMER CARE AGENTS AND CUSTOMERS - The technical solution under the present disclosure automatically analyzes conversations between users by receiving a training dataset having a text sequence including sentences of a conversation between the users; extracting feature(s) from the training dataset based on features; providing equation(s) for a plurality of tasks, the equation(s) being a mathematical function for calculating value of a parameter for each of the tasks based on the extracted feature; determining value of the parameter for tasks by processing the equation(s); assigning label(s) to each of the sentences based on the determined value of the parameter, a first label being selected from a plurality of first labels, and a second label being selected from a number of second labels; and storing and maintaining with the database a pre-defined value of the parameter, first labels, conversations, second labels, a test dataset, equation(s), and pre-defined features. | 06-09-2016 |
20160162475 | AUTOMATIC PROCESS GUIDANCE - User interactions with a computing system are sensed and recorded. The recording represents a process for controlling a computer system. Voice inputs are received, and computer system actions are taken based upon the voice inputs and a task recording. | 06-09-2016 |
20160162476 | METHODS AND SYSTEMS FOR MODELING COMPLEX TAXONOMIES WITH NATURAL LANGUAGE UNDERSTANDING - Systems and methods are presented for the automatic placement of rules applied to topics in a logical hierarchy when conducting natural language processing. In some embodiments, a method includes: accessing, at a child node in a logical hierarchy, at least one rule associated with the child node; identifying a percolation criterion associated with a parent node to the child node, said percolation criterion indicating that the at least one rule associated with the child node is to be associated also with the parent node; associating the at least one rule with the parent node such that the at least one rule defines a second factor for determining whether the document is to also be classified into the parent node; accessing the document for natural language processing; and determining whether the document is to be classified into the parent node or the child node based on the at least one rule. | 06-09-2016 |
20160163228 | SYSTEMS AND METHODS FOR EXTRACTING KEYWORDS IN LANGUAGE LEARNING - Disclosed are systems, methods, and products for language learning that may extract text from various resources having text, using various natural-language processing features, which can be combined with custom-designed learning activities to offer a needs-based, adaptive learning methodology. The system may receive a resource and extract keywords pedagogically valuable to non-native language learning and academic exercises. Metadata describing various aspects of resources from which keywords are extracted may be associated with keywords. Metadata describing various aspects of keywords may also be associated with keywords. Extracted keywords may be stored into a keyword store along with any metadata associated with keywords. | 06-09-2016 |
20160163309 | METHOD OF SELECTING TRAINING TEXT FOR LANGUAGE MODEL, AND METHOD OF TRAINING LANGUAGE MODEL USING THE TRAINING TEXT, AND COMPUTER AND COMPUTER PROGRAM FOR EXECUTING THE METHODS - Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods. The present invention provides for selecting training text for a language model that includes: generating a template for selecting training text from a corpus in a first domain according to generation techniques of: (i) replacing one or more words in a word string selected from the corpus in the first domain with a special symbol representing any word or word string, and adopting the word string after replacement as a template for selecting the training text; and/or (ii) adopting the word string selected from the corpus in the first domain as the template for selecting the training text; and selecting text covered by the template as the training text from a corpus in a second domain different from the first domain. | 06-09-2016 |
20160170938 | Performance Modification Based on Aggregate Feedback Model of Audience Via Real-Time Messaging | 06-16-2016 |
20160170952 | Method of Improving NLP Processing of Real-World Forms via Element-Level Template Correlation | 06-16-2016 |
20160170956 | Performance Modification Based on Aggregation of Audience Traits and Natural Language Feedback | 06-16-2016 |
20160170957 | Inter Thread Anaphora Resolution | 06-16-2016 |
20160170961 | PERCEPTUAL ASSOCIATIVE MEMORY FOR A NEURO-LINGUISTIC BEHAVIOR RECOGNITION SYSTEM | 06-16-2016 |
20160170962 | DATA RELATIONSHIPS IN A QUESTION-ANSWERING ENVIRONMENT | 06-16-2016 |
20160170965 | USING NATURAL LANGUAGE PROCESSING (NLP) TO CREATE SUBJECT MATTER SYNONYMS FROM DEFINITIONS | 06-16-2016 |
20160170966 | Methods and systems for automated language identification | 06-16-2016 |
20160170967 | Performing Cognitive Operations Based on an Aggregate User Model of Personality Traits of Users | 06-16-2016 |
20160170969 | Priori Performance Modification Based on Aggregation of Personality Traits of a Future Audience | 06-16-2016 |
20160170971 | OPTIMIZING A LANGUAGE MODEL BASED ON A TOPIC OF CORRESPONDENCE MESSAGES | 06-16-2016 |
20160170972 | GENERATING NATURAL LANGUAGE TEXT SENTENCES AS TEST CASES FOR NLP ANNOTATORS WITH COMBINATORIAL TEST DESIGN | 06-16-2016 |
20160171979 | TILED GRAMMAR FOR PHRASE SPOTTING WITH A PERSISTENT COMPANION DEVICE | 06-16-2016 |
20160179774 | Orthographic Error Correction Using Phonetic Transcription | 06-23-2016 |
20160179782 | DOMAIN-SPECIFIC COMPUTATIONAL LEXICON FORMATION | 06-23-2016 |
20160179783 | DOMAIN-SPECIFIC COMPUTATIONAL LEXICON FORMATION | 06-23-2016 |
20160179784 | VALIDATING TOPICAL DATA | 06-23-2016 |
20160179785 | Responding to Data Requests Related to Constrained Natural Language Vocabulary Terms | 06-23-2016 |
20160179786 | DIAGNOSING AUTISM SPECTRUM DISORDER USING NATURAL LANGUAGE PROCESSING | 06-23-2016 |
20160179787 | EXTENSIBLE CONTEXT-AWARE NATURAL LANGUAGE INTERACTIONS FOR VIRTUAL PERSONAL ASSISTANTS | 06-23-2016 |
20160179788 | VALIDATING TOPICAL DATA | 06-23-2016 |
20160180742 | PREPOSITION ERROR CORRECTING METHOD AND DEVICE PERFORMING SAME | 06-23-2016 |
20160188535 | VERIFICATION OF NATURAL LANGUAGE PROCESSING DERIVED ATTRIBUTES - System, method, and computer program product to identify candidate values to provide to a deep question answering (QA) system as part of a case, by receiving a case, wherein the case includes a plurality of documents for evaluation by the deep QA system, evaluating the plurality of documents using natural language processing (NLP) to identify one or more concepts reflected by text content within the plurality of documents in the case, wherein the plurality of documents includes a plurality of distinct values for at least a first one of the concepts, selecting, from the plurality of distinct values, a candidate value for the first concept to provide to the deep QA system to process the case, and prior to submitting the case to the deep QA system, returning at least the candidate value selected for the first concept to present in a user interface. | 06-30-2016 |
20160188564 | AUTOMATED ONTOLOGY BUILDING - A method and system are provided for automated ontology building. The method includes creating contextual tokens from text, parsing the text into at least one parse tree, and calculating a dependency graph across the contextual tokens using the at least one parse tree. The method further includes generating concept instance candidates and parent-child relationships based on pattern matching and transformation of the at least one parse tree. The method also includes grouping concept instance candidates into concept candidates. The method additionally includes arranging the concept candidates into a tree having tree nodes and creating predicate-based relationships between the tree nodes based on patterns and predicates identified in the text. The method further includes scoring and sorting the tree nodes. The method also includes performing an analysis of the tree nodes and rebalancing the tree based on the analysis to provide an ontology based on the text. | 06-30-2016 |
20160188565 | DISCRIMINATING AMBIGUOUS EXPRESSIONS TO ENHANCE USER EXPERIENCE - Methods and systems are provided for discriminating ambiguous expressions to enhance user experience. For example, a natural language expression may be received by a speech recognition component. The natural language expression may include at least one of words, terms, and phrases of text. A dialog hypothesis set from the natural language expression may be created by using contextual information. In some cases, the dialog hypothesis set has at least two dialog hypotheses. A plurality of dialog responses may be generated for the dialog hypothesis set. The dialog hypothesis set may be ranked based on an analysis of the plurality of the dialog responses. An action may be performed based on ranking the dialog hypothesis set. | 06-30-2016 |
20160188567 | IDENTIFYING EXPANDING HASHTAGS IN A MESSAGE - A social networking system receives messages from users that include hashtags. The social networking system may use a natural language model to identify terms in the hashtag corresponding to words or phrases of the hashtag. The words or phrases may be used to modify a string of the hashtag. The social networking system may also generate computer models to determine likely membership of a message with various hashtags. Prior to generating the computer models, the social networking system may filter certain hashtags from eligibility for computer modeling, particularly hashtags that are not frequently used or that more typically appear as normal text in a message instead of as a hashtag. The social networking system may also calibrate the computer model outputs by comparing a test message output with outputs of a calibration group that includes positive and negative examples with respect to the computer model output. | 06-30-2016 |
20160188568 | SYSTEM AND METHOD FOR DETERMINING THE MEANING OF A DOCUMENT WITH RESPECT TO A CONCEPT - A computerized method for determining an impact of a document on a specific concept of interest. The method can be configured to identify a cluster of clauses or sentences from a plurality of semantically similar clauses of the document and determine one or more representative concepts for the cluster of the document. An impact of each clause of the cluster is determined using one or more semantic parameters and impact analysis rules. The impact of each sentence of the cluster is then determined using the impact of the respective clauses and subsequently, the impact of the cluster is determined using the impact of the respective sentences. Based on the impact of the cluster, an impact of the document on the one or more representative concepts is determined. | 06-30-2016 |
20160188569 | Generating a Table of Contents for Unformatted Text - An approach is provided for an information handling system that includes a processor and a memory to generate a table of contents pertaining to a document. The approach semantically analyzes the document to identify semantic relationships of proximate elements of the document. A number of candidate headings corresponding to a semantically related section of the document are identified and each of the candidate headings is scored. Based on the scores of each of the candidate headings, a section heading for the semantically related section of the document is selected. The selected heading is then included in the table of contents for the section of the document. The process of identifying candidate headings, scoring candidates, and selecting the section heading is repeated for other semantically related sections of the document. | 06-30-2016 |
20160188570 | AUTOMATED ONTOLOGY BUILDING - A method and system are provided for automated ontology building. The method includes creating contextual tokens from text, parsing the text into at least one parse tree, and calculating a dependency graph across the contextual tokens using the at least one parse tree. The method further includes generating concept instance candidates and parent-child relationships based on pattern matching and transformation of the at least one parse tree. The method also includes grouping concept instance candidates into concept candidates. The method additionally includes arranging the concept candidates into a tree having tree nodes and creating predicate-based relationships between the tree nodes based on patterns and predicates identified in the text. The method further includes scoring and sorting the tree nodes. The method also includes performing an analysis of the tree nodes and rebalancing the tree based on the analysis to provide an ontology based on the text. | 06-30-2016 |
20160188571 | TECHNIQUES FOR GRAPH BASED NATURAL LANGUAGE PROCESSING - Techniques for graph based natural language processing are described. In one embodiment an apparatus may comprise a client service component operative on the processor circuit to receive a natural language user request from a device and to execute the natural language user request based on matched one or more objects and a social object relation component operative on the processor circuit to match the natural language user request to the one or more objects in an object graph, the object graph comprising token mappings for objects within the object graph, the token mappings based on data extracted from a plurality of interactions by a plurality of users of the network system, wherein the one or more objects are matched with the natural language user request based on the token mappings. Other embodiments are described and claimed. | 06-30-2016 |
20160188573 | INTEGRATION OF DOMAIN INFORMATION INTO STATE TRANSITIONS OF A FINITE STATE TRANSDUCER FOR NATURAL LANGUAGE PROCESSING - The invention relates to a system and method for integrating domain information into state transitions of a Finite State Transducer (“FST”) for natural language processing. A system may integrate semantic parsing and information retrieval from an information domain to generate an FST parser that represents the information domain. The FST parser may include a plurality of FST paths, at least one of which may be used to generate a meaning representation from a natural language input. As such, the system may perform domain-based semantic parsing of a natural language input, generating more robust meaning representations using domain information. The system may be applied to a wide range of natural language applications that use natural language input from a user such as, for example, natural language interfaces to computing systems, communication with robots in natural language, personalized digital assistants, question-answer query systems, and/or other natural language processing applications. | 06-30-2016 |
20160188574 | INTENTION ESTIMATION EQUIPMENT AND INTENTION ESTIMATION SYSTEM - An intention estimation equipment includes: a first training data group; a second training data group; a model creation unit that creates first and second statistical models that estimate an intention of an input text; an error data extraction unit that extracts, from the second training data group, training data corresponding to a text, of which an intention estimation result based on the first and the second statistical models is correct and erroneous, respectively, as error data; an opposite data extraction unit that extracts, from the second training data group, training data that is a cause for an intention estimation result of the error data based on the second statistical model becoming erroneous as opposite data; and a data correction unit that performs correction of the second training data group so that an influence of the error data or of the opposite data on creation of the statistical model is changed. | 06-30-2016 |
20160189103 | APPARATUS AND METHOD FOR AUTOMATICALLY CREATING AND RECORDING MINUTES OF MEETING - A computing device for automatically acquiring and revising minutes of meeting and a method thereof includes the steps of converting spoken words from a meeting to text and determining one or more written words or expressions to be recalibrated for strict correctness. The determined one or more recalibrations included in the text are automatically revised against equivalent common words and expressions, according to a phrasebook database stored in a non-transitory storage medium, the phrasebook database mapping a relationship between at least one common word or expression and one or more written words and expressions requiring recalibration. Original minutes of the meeting are created according to the revised text and a meeting minutes template stored in the non-transitory storage medium. | 06-30-2016 |
20160189414 | AUTOCAPTIONING OF IMAGES - The description relates to sentence autocaptioning of images. One example can include a set of information modules and a set of sentence generation modules. The set of information modules can include individual information modules configured to operate on an image or metadata associated with the image to produce image information. The set of sentence generation modules can include individual sentence generation modules configured to operate on the image information to produce a sentence caption for the image. | 06-30-2016 |
20160196237 | GENERATING NAVIGABLE CONTENT OVERVIEWS | 07-07-2016 |
20160203117 | NATURAL LANGUAGE METRIC CONDITION ALERTS | 07-14-2016 |
20160203118 | INTENTION DETECTION IN DOMAIN-SPECIFIC INFORMATION | 07-14-2016 |
20160203119 | EXTRACTION OF LEXICAL KERNEL UNITS FROM A DOMAIN-SPECIFIC LEXICON | 07-14-2016 |
20160203120 | EXTRACTION OF LEXICAL KERNEL UNITS FROM A DOMAIN-SPECIFIC LEXICON | 07-14-2016 |
20160203122 | SYSTEMS AND METHODS FOR DETECTING AND COORDINATING CHANGES IN LEXICAL ITEMS | 07-14-2016 |
20160203125 | BUILDING CONVERSATIONAL UNDERSTANDING SYSTEMS USING A TOOLSET | 07-14-2016 |
20160203132 | TRIGGERING ACTIONS IN RESPONSE TO OPTICALLY OR ACOUSTICALLY CAPTURING KEYWORDS FROM A RENDERED DOCUMENT | 07-14-2016 |
20160252972 | SYNCHRONIZATION OF TEXT DATA AMONG A PLURALITY OF DEVICES | 09-01-2016 |
20160253309 | APPARATUS AND METHOD FOR RESOLVING ZERO ANAPHORA IN CHINESE LANGUAGE AND MODEL TRAINING METHOD | 09-01-2016 |
20160253310 | CREATING A CALENDAR EVENT USING CONTEXT | 09-01-2016 |
20160253312 | TOPICALLY AWARE WORD SUGGESTIONS | 09-01-2016 |
20160253315 | Automatic Question Generation and Answering Based on Monitored Messaging Sessions | 09-01-2016 |
20160253316 | METHOD, SYSTEM AND APPARATUS FOR ASSEMBLING A RECORDING PLAN AND DATA DRIVEN DIALOGS FOR AUTOMATED COMMUNICATIONS | 09-01-2016 |
20160253990 | KERNEL-BASED VERBAL PHRASE SPLITTING DEVICES AND METHODS | 09-01-2016 |
20160253991 | METHOD AND SYSTEM FOR CONVEYING AN EXAMPLE IN A NATURAL LANGUAGE UNDERSTANDING APPLICATION | 09-01-2016 |
20160378747 | VIRTUAL ASSISTANT FOR MEDIA PLAYBACK - An exemplary method for identifying media may include receiving user input associated with a request for media, where that user input includes unstructured natural language speech including one or more words; identifying at least one context associated with the user input; causing a search for the media based on the at least one context and the user input; determining, based on the at least one context and the user input, at least one media item that satisfies the request; and in accordance with a determination that the at least one media item satisfies the request, obtaining the at least one media item. | 12-29-2016 |
20170235717 | Method and Unit for Building Semantic Rule for a Semantic Data | 08-17-2017 |
20170235718 | Method and System for Enabling Verifiable Semantic Rule Building for Semantic Data | 08-17-2017 |
20170235719 | EVALUATING PARSE TREES IN LINGUISTIC ANALYSIS | 08-17-2017 |
20170235721 | METHOD AND SYSTEM FOR DETECTING SEMANTIC ERRORS IN A TEXT USING ARTIFICIAL NEURAL NETWORKS | 08-17-2017 |
20170235723 | SYSTEMS FOR DYNAMICALLY GENERATING AND PRESENTING NARRATIVE CONTENT | 08-17-2017 |
20170235724 | SYSTEMS AND METHODS FOR GENERATING PERSONALIZED LANGUAGE MODELS AND TRANSLATION USING THE SAME | 08-17-2017 |
20180024986 | EXTRACTING ACTIONABLE INFORMATION FROM EMAILS | 01-25-2018 |
20180024987 | A Method For Suggesting One Or More Multi-Word Candidates Based On An Input String Received At An Electronic Device | 01-25-2018 |
20180024989 | AUTOMATED BUILDING AND SEQUENCING OF A STORYLINE AND SCENES, OR SECTIONS, INCLUDED THEREIN | 01-25-2018 |
20180024990 | ENCODING APPARATUS, SEARCH APPARATUS, ENCODING METHOD, AND SEARCH METHOD | 01-25-2018 |
20180024991 | NETWORKED DEVICE WITH SUGGESTED RESPONSE TO INCOMING MESSAGE | 01-25-2018 |
20180024992 | Standard Exact Clause Detection | 01-25-2018 |
20180024993 | COMPUTER-READABLE RECORDING MEDIUM, DETERMINATION DEVICE AND DETERMINATION METHOD | 01-25-2018 |
20180024994 | CONTEXTUAL LANGUAGE GENERATION BY LEVERAGING LANGUAGE UNDERSTANDING | 01-25-2018 |
20180025001 | METHOD AND SYSTEM FOR VOICE BASED MEDIA SEARCH | 01-25-2018 |
20180025074 | Analogy Finder | 01-25-2018 |
20190147001 | METHOD FOR PROCESSING RANDOM INTERACTION DATA, NETWORK SERVER AND INTELLIGENT DIALOG SYSTEM | 05-16-2019 |
20190147034 | PREDICTING STYLE BREACHES WITHIN TEXTUAL CONTENT | 05-16-2019 |
20190147036 | PHONETIC PATTERNS FOR FUZZY MATCHING IN NATURAL LANGUAGE PROCESSING | 05-16-2019 |
20190147037 | CALCULATING STRUCTURAL DIFFERENCES FROM BINARY DIFFERENCES IN PUBLISH SUBSCRIBE SYSTEM | 05-16-2019 |
20190147038 | PRESERVING AND PROCESSING AMBIGUITY IN NATURAL LANGUAGE | 05-16-2019 |
20190147039 | INFORMATION PROCESSING APPARATUS, INFORMATION GENERATION METHOD, WORD EXTRACTION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM | 05-16-2019 |
20190147040 | DETECTING EXPRESSIONS LEARNED BASED ON A THEME AND ON WORD CORRELATION AND CO-OCCURRENCE | 05-16-2019 |
20190147041 | REAL-TIME ON-DEMAND AUCTION BASED CONTENT CLARIFICATION | 05-16-2019 |
20190147043 | Mood Map for Assessing a Dynamic Emotional or Mental State (dEMS) of a User | 05-16-2019 |
20190147045 | AUTOMATIC RESPONSE SERVER DEVICE, TERMINAL DEVICE, RESPONSE SYSTEM, RESPONSE METHOD, AND PROGRAM | 05-16-2019 |
20190147098 | LEVERAGING CONTEXTUAL INFORMATION IN TOPIC COHERENT QUESTION SEQUENCES | 05-16-2019 |
20190147111 | CHATBOT-BASED CLOUD MANAGEMENT SYSTEM AND METHOD FOR OPERATING THE SAME | 05-16-2019 |
20190149494 | EMOTIVE TONE ADJUSTMENT BASED COGNITIVE MANAGEMENT | 05-16-2019 |
20220138193 | CONVERSION METHOD AND SYSTEMS FROM NATURAL LANGUAGE TO STRUCTURED QUERY LANGUAGE - The present application discloses a conversion method and system from natural language to structured query language. The method includes obtaining a natural language question text; converting from the natural language question text to the structured query language according to similarities between the natural language question text and natural language questions in a preset dataset; and when there is no target natural language question in the preset dataset, converting the natural language question text to the structured query language by a conversion algorithm model. | 05-05-2022 |
20220138233 | System and Method for Partial Name Matching Against Noisy Entities Using Discovered Relationships - A method, system and computer-usable medium are disclosed to identify a set of entity names based on a partial name of the entity utilizing discovered relationships. A partial name from a user is received as to the entity in order to retrieve a plurality of names of the entity in a corpus which can be a body of works, document, etc. References to the entries containing the partial name are retrieved from the corpus. Natural language processing is applied to content associated with references to identify candidate entities. A similarity is performed as to the identified candidate entities to form a similarity assessment, and from the candidate entities a selection is made based on a merging criterion. | 05-05-2022 |
20220138239 | TEXT GENERATION APPARATUS, TEXT GENERATION METHOD, TEXT GENERATION LEARNING APPARATUS, TEXT GENERATION LEARNING METHOD AND PROGRAM - A sentence generation device has: an estimation unit for receiving input of a first sentence and an output length, and estimating importance of each word constituting the first sentence using a pre-trained model; and a generation unit for generating a second sentence based on the importance, and thus makes it possible to evaluate importance of a constituent element of an input sentence, in correspondence with a designated output length. | 05-05-2022 |
20220138241 | User-Focused, Ontological, Automatic Text Summarization - The present disclosure is directed to systems and methods of autonomously generating summary documents based, at least in part, on a plurality of queries provided by a system user. The systems and methods disclosed herein include processor circuitry to identify a plurality of information sources for a specific topic guided by an ontology with specific concepts and relations. The systems and methods disclosed herein also include processor circuitry to generate user-focused extractive text summarization from each of at least some of the plurality of identified information sources using a plurality of queries supplied by the user/researcher. | 05-05-2022 |
20220138267 | GENERATION APPARATUS, LEARNING APPARATUS, GENERATION METHOD AND PROGRAM - A generation apparatus includes a generation unit configured to use a machine learning model learned in advance, with a document as an input, to generate a question representation for a range of an answer in the document, wherein when generating a word of the question representation by performing a copy from the document, the generation unit adjusts a probability that a word included in the range is copied. | 05-05-2022 |
20220138406 | REVIEWING METHOD, INFORMATION PROCESSING DEVICE, AND REVIEWING PROGRAM - An information processing device ( | 05-05-2022 |
20220138422 | DETERMINING LEXICAL DIFFICULTY IN TEXTUAL CONTENT - Techniques performed by a data processing system for analyzing the lexical difficulty of words of textual content include analyzing a plurality of textual content sources to determine a first frequency at which each of a plurality of first words appears, analyzing search data to determine a second frequency at which each of the plurality of first words appears in searches for a definition, generating a lexical difficulty model based on the first frequency and the second frequency, the model being configured to receive a word as an input and to output a prediction for how difficult the word is likely to be for a user, receiving a request to analyze first textual content from a client device, analyzing the first textual content using the lexical difficulty model to generate lexical difficulty information, and sending a response to the client device that includes requested information. | 05-05-2022 |
20220138426 | ELECTRONIC DEVICE, METHOD, AND COMPUTER PROGRAM WHICH SUPPORT NAMING - A naming support system is provided that includes a processing unit that receives first language name information input from a user, determines name evaluation's basic information about the first language name information, and generates and transmits name's evaluation information to an output unit based on a target language which includes at least one of a plurality of languages for the name evaluation's basic information, wherein the first language name information includes at least one of character notation information of a first language name, pronunciation information of a first language name, or desired information for a first language name, and the name evaluation's basic information includes the first language name information. | 05-05-2022 |
20220138428 | SYSTEMS AND METHODS FOR INSERTING DIALOGUE INTO A QUERY RESPONSE - Systems and methods are described herein for inserting dialogue into query responses by generating and using dialogue metadata in conjunction with response templates. Metadata for each portion of dialogue of a plurality of portions of dialogue from a number of content items is stored, including information regarding the source content item, a transcript of the dialogue, and grammatical information. Upon receiving a query related to a content item, a type of response is first determined. Based on the type of response, and using the dialogue metadata, a portion of dialogue is identified for insertion into the response. The identified portion of dialogue is retrieved and inserted at an appropriate position within the response. The response is then generated for output. | 05-05-2022 |
20220138429 | METHOD AND APPARATUS FOR MATCHING WIRELESS HOTSPOT AND POINT OF INTEREST - This disclosure relates to a method for matching a wireless hotspot and a point of interest. The method may include performing semantic normalization preprocessing on a wireless hotspot name and a point of interest name respectively. The method may further include determining a first similarity between the preprocessed wireless hotspot name and the preprocessed point of interest name according to a matching result of consecutive common strings of the preprocessed wireless hotspot name and the preprocessed point of interest name. The method may further include determining a second similarity between the preprocessed wireless hotspot name and the preprocessed point of interest name according to a length difference between lengths of the preprocessed wireless hotspot name and the preprocessed point of interest name. The method may further include determining a matching result of the wireless hotspot and the point of interest according to the first similarity or the second similarity. | 05-05-2022 |
20220138432 | RELYING ON DISCOURSE ANALYSIS TO ANSWER COMPLEX QUESTIONS BY NEURAL MACHINE READING COMPREHENSION - An autonomous agent receives a user query comprising the complex question. The agent can obtain, from a corpus of unstructured texts, an answer candidate text corresponding to the user query and comprising text from which the answer is subsequently identified. The agent may generate first linguistic data corresponding to the user query and second linguistic data corresponding to the answer candidate text. Each instance of linguistic data may comprise a combination of respective syntactic data, semantic data, and discourse data generated from the user query and/or answer candidate text. Both instances of linguistic data may be provided to a machine-learning model that has been previously trained to output an answer identified from an instance of unstructured text (e.g., the answer candidate text). The model may output the answer identified from the answer candidate text, which in turn may be provided in response to the user query. | 05-05-2022 |
20220138434 | GENERATION APPARATUS, GENERATION METHOD AND PROGRAM - Included are input means for inputting first data that is data relating to a plurality of letters included in a text string that is a generation target, and generating means for generating second data that is data relating to the text string that satisfies predetermined constraint conditions including at least a condition relating to plausibility of the sequence of letters, on the basis of the first data. | 05-05-2022 |
20220138438 | TEXT GENERATION APPARATUS, TEXT GENERATION METHOD, TEXT GENERATION LEARNING APPARATUS, TEXT GENERATION LEARNING METHOD AND PROGRAM - A sentence generation device has: an estimation unit for receiving input of a first sentence and a focus point related to generation of a second sentence to be generated based on the first sentence, and estimating importance of each word constituting the first sentence using a pre-trained model; and a generation unit for generating the second sentence based on the importance, and thus makes it possible to evaluate importance of a constituent element of an input sentence in correspondence with a designated focus point. | 05-05-2022 |
20220138591 | DEVELOPING EVENT-SPECIFIC PROVISIONAL KNOWLEDGE GRAPHS - Techniques and a framework are described herein for constructing and/or updating, e.g., on top of a general-purpose knowledge graph, an “event-specific provisional knowledge graph.” In various implementations, live data stream(s) may be analyzed to identify entity(s) associated with a developing event. The entity(s) may form part of a general-purpose knowledge graph that includes entity nodes and edges between the entity nodes. Based on the identified one or more entities, an event-specific provisional knowledge graph may be constructed or updated in association with the developing event. In some implementations, the event-specific provisional knowledge graph may be queried for new information about the developing event. Computing devices may be caused to render, as output, the new information. | 05-05-2022 |
20220139385 | SCALABLE MULTI-SERVICE VIRTUAL ASSISTANT PLATFORM USING MACHINE LEARNING - The present invention is a masterbot architecture in a scalable multi-service virtual assistant platform that can construct a fluid and dynamic dialogue by assembling responses to end user utterances from two kinds of agents, information agents and action agents. A plurality of information agents obtain at least one information value from a parsed user input and/or contextual data. A plurality of action agents perform one or more actions in response to the parsed user input, the contextual data, and/or the information value. A masterbot arbitrates an activation of the plurality of information agents and the plurality of action agents. The masterbot comprises access to a machine-learning module to select an appropriate action agent, where one or more information agents are activated based on the selected appropriate action agent. | 05-05-2022 |