Dictionary building, modification, or prioritization

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704001000 - LINGUISTICS

Patent class list (only non-empty classes are listed)

Document	Title	Date
Entries
20080201133	SYSTEM AND METHOD FOR SEMANTIC CATEGORIZATION - There is disclosed a system and method for automatically performing semantic categorization. In one embodiment at least one text description pertaining to a category set is accepted along with words that are anticipated to be uttered by a user pertaining to that category set; a lexical chaining confidence score is attached to each pair matched between the anticipated words and the accepted text description. These confidence scores are used subsequently by a categorization circuit that accepts a text phrase utterance from an input source along with a category set pertaining to the accepted utterance. The categorization circuit, in one embodiment, creates word pairs matched between the accepted text phrase utterance and the accepted category set. From these word scores, the category pertaining to the utterance is determined based, at least in part, on the assigned lexical chaining confidence scores as previously determined.	08-21-2008
20080201134	COMPUTER-READABLE RECORD MEDIUM IN WHICH NAMED ENTITY EXTRACTION PROGRAM IS RECORDED, NAMED ENTITY EXTRACTION METHOD AND NAMED ENTITY EXTRACTION APPARATUS - A named entity extraction apparatus includes an extraction result acquisition unit for acquiring a named entity extraction result obtained as a result of a named entity extraction process; and a lexicon information creation unit for creating lexicon information which is utilized as clues in extracting named entities from text data, on the basis of the named entity extraction result acquired by said extraction result acquisition unit.	08-21-2008
20080215313	Speech and Textual Analysis Device and Corresponding Method - A speech and textual analysis device and method for forming a search and/or classification catalog. The device is based on a linguistic database and includes a taxonomy table containing variable taxon nodes. The speech and textual analysis device includes a weighting module, a weighting parameter being additionally assigned to each taxon node to register the recurrence frequency of terms in the linguistic and/or textual data that is to be classified and/or sorted. The speech and/or textual analysis device includes an integration module for determining a predefinable number of agglomerates based on the weighting parameters of the taxon nodes in the taxonomy table and at least one neuronal network module for classifying and/or sorting the speech and/or textual data based on the agglomerates in the taxonomy table.	09-04-2008
20080215314	METHOD FOR ADAPTING A K-MEANS TEXT CLUSTERING TO EMERGING DATA - A method and structure for clustering documents in datasets which include clustering first documents in a first dataset to produce first document classes, creating centroid seeds based on the first document classes, and clustering second documents in a second dataset using the centroid seeds, wherein the first dataset and the second dataset are related. The clustering of the first documents in the first dataset forms a first dictionary of most common words in the first dataset and generates a first vector space model by counting, for each word in the first dictionary, a number of the first documents in which the word occurs, and clusters the first documents in the first dataset based on the first vector space model, and further generates a second vector space model by counting, for each word in the first dictionary, a number of the second documents in which the word occurs. Creation of the centroid seeds includes classifying the second vector space model using the first document classes to produce a classified second vector space model and determining a mean of vectors in each class in the classified second vector space model; the mean includes the centroid seeds.	09-04-2008
20080228469	ROLLUP FUNCTIONS FOR EFFICIENT STORAGE, PRESENTATION, AND ANALYSIS OF DATA - Methods of organizing a series of sibling data entities in a digital computer are provided for preserving sibling ranking information associated with the sibling data entities and for attaching the sibling ranking information to a joint parent of the sibling data entities to facilitate on-demand generation of ranked parent candidates. A rollup function of the present invention builds a rollup matrix (	09-18-2008
20080243485	METHOD, APPARATUS, SYSTEM, USER INTERFACE AND COMPUTER PROGRAM PRODUCT FOR USE WITH MANAGING CONTENT - A method for adding content to a dictionary for use with a communication terminal, including: parsing a media content for one or more expressions; extracting the expressions from the media content; and providing the expressions to the dictionary for subsequent retrieval.	10-02-2008
20080243486	Apparatus and Method for Identifying Unknown Word Based on a Definition - An apparatus and method of identifying an unknown word based on a known meaning, or word identification system. The word identification system allows a user that knows the meaning of a word but cannot recall the word that corresponds to the known meaning to identify the word through a series of simple questions. Each of the questions elicits information known about the unknown word through the knowledge of the meaning of the word and its use in language. With each response, the number of words matching the definitional characteristics is reduced until a set of one or more probable words and definitions for the unknown word is finally presented to the user.	10-02-2008
20080243487	HYBRID TEXT SEGMENTATION USING N-GRAMS AND LEXICAL INFORMATION - A hybrid n-gram/lexical analysis tokenization system including a lexicon and a hybrid tokenizer operative to perform both N-gram tokenization of a text and lexical analysis tokenization of a text using the lexicon, and to construct either of an index and a classifier from the results of both of the N-gram tokenization and the lexical analysis tokenization, where the hybrid tokenizer is implemented in at least one of computer hardware and computer software and is embodied within a computer-readable medium.	10-02-2008
20080243488	AUTOMATED GLOSSARY CREATION - A method and device for creating a glossary includes a processor operable for executing computer instructions for identifying, in at least one information source, at least one glossary item identifying a part or a component, determining at least one glossary item form as a canonical form, defining, by using the canonical form, at least one syntactic structure, that includes one of the at least one identified glossary items, for each of at least one semantic classes, and searching a second information source for the at least one syntactic structure of the semantic class.	10-02-2008
20080255826	Dictionary data generating apparatus, character input apparatus, dictionary data generating method, and character input method - A dictionary data generating apparatus is disclosed. The apparatus includes: an acquiring part configured to acquire a current issue keyword from inputted information including a current issue keyword; and a generating part configured to generate current issue dictionary data for prediction conversion based on the current issue keyword acquired by the acquiring part.	10-16-2008
20080262832	Document Processing Device, and Document Processing Method - A technique is provided for adding an annotation to a document described in a markup language. Upon acquisition of a document described in a markup language, a document processing apparatus	10-23-2008
20080262833	Document Processing Device and Document Processing Method - A technique is provided for adding an annotation to a document described in a markup language. Upon the document processing apparatus	10-23-2008
20080270122	System for handling novel words in a spellchecking module - A system for adding words to an online dictionary used for spellchecking is described. A spellchecker module compares words of an electronic document with words in the online dictionary and identifies a word in the electronic document that is missing from the dictionary. After a user indicates a desire to add the missing word to the dictionary, the spellchecker module determines at least one related-word form of the missing word. The related-word forms depend upon the part of speech of the missing word. The spellchecker can prompt the user to identify the part of speech and then to verify each determined related-word form. The spellchecker concurrently adds the missing word and at least one related-word form of the missing word to the online dictionary in a single ‘add-to-dictionary’ operation.	10-30-2008
20080281582	Input system for mobile search and method therefor - An input system for mobile search and a method therefor are provided. The input system includes an input module receiving a code input for a specific term and a voice input corresponding thereto, a database including a glossary and an acoustic model, wherein the glossary includes a plurality of terms and a sequence list, and each of the terms has a search weight based on an order of the sequence list, a process module selecting a first number of candidate terms from the glossary according to the code input by using an input algorithm and obtaining a second number of candidate terms by using a speech recognition algorithm to compare the voice input with the first number of candidate terms via the acoustic model, wherein the second number of candidate terms are listed in a particular order based on their respective search weights, and an output module showing the second number of candidate terms in the particular order for selecting the specific term therefrom.	11-13-2008
20080281583	CONTEXT-DEPENDENT PREDICTION AND LEARNING WITH A UNIVERSAL RE-ENTRANT PREDICTIVE TEXT INPUT SOFTWARE COMPONENT - A system and method for supporting predictive text entry in software applications by sharing a common, predictive, software text-entry widget within a consumer device across multiple software applications and input contexts. The method comprises: a software application invoking an instance of a text-entry widget in a particular input context, the application optionally providing the widget a description of allowed symbols and a dictionary of expected symbol strings associated with the current context, the widget modifying a virtual keyboard display and predictive algorithm data according to the allowed symbols and dictionary, the user entering text via the widget, the widget returning the entered text to the application, and the application optionally including information derived from entered text in the associated dictionary to enhance the predictive capability of the widget on future invocations.	11-13-2008
20080306731	ELECTRONIC EQUIPMENT EQUIPPED WITH DICTIONARY FUNCTION - Disclosed is an electronic dictionary including a plurality of dictionary databases. In a twin retrieval using two dictionaries, a sub-dictionary according to a main dictionary is set besides the main dictionary. When a user inputs a retrieval character string, the electronic dictionary retrieves corresponding headwords from the main dictionary to produce a headword list and displays it. Furthermore, the electronic dictionary displays explanation information pertaining to one headword in the headword list. In addition to this, the electronic dictionary reads the explanation information corresponding to the headword of the main dictionary specified with a cursor from a sub-dictionary database to display the explanation information besides that of the main dictionary. If the specification of a headword of the main dictionary is changed by moving the cursor, then the electronic dictionary re-retrieves the changed headword from the sub-dictionary, and changes the explanation information of the sub-dictionary to display it.	12-11-2008
20080312910	DICTIONARY WORD AND PHRASE DETERMINATION - Candidate words in search queries are identified, each candidate word including one or more consecutive characters. For each candidate word, a first count is determined, the first count representing a number of times that the candidate word is the only word in the search queries, and a second count is determined, the second count representing a number of times that the candidate word and one or more other words are included in each of the search queries. One or more of the candidate words are added to an input method editor dictionary based on a relationship between the first count and the second count.	12-18-2008
20080312911	DICTIONARY WORD AND PHRASE DETERMINATION - Context signals in documents are identified, characters bounded by the context signals are identified, one or more candidate words defined by the characters bounded by the context signals are identified, and one or more of the candidate words are added to an input method editor dictionary.	12-18-2008
20080319738	WORD PROBABILITY DETERMINATION - A word corpus is identified and a word probability value is associated with each word in the word corpus. A sentence is identified, candidate segmentations of the sentence are determined based on the word corpus, and the associated probability value for each word in the word corpus is iteratively adjusted based on the probability values associated with the words and the candidate segmentations.	12-25-2008
20090018822	METHODS AND APPARATUS FOR BUSINESS RULES AUTHORING AND OPERATION EMPLOYING A CUSTOMIZABLE VOCABULARY - In one embodiment, a method comprises creating at least one individualized language resource, creating at least one individualized language rule referencing at least one of said individualized language resource, and transforming said at least one individualized language rule into computer executable format.	01-15-2009
20090048824	ACOUSTIC SIGNAL PROCESSING METHOD AND APPARATUS - An audible signal process method includes preparing, in at least one dictionary, a plurality of weighting factors each learned to optimize evaluation function established by a weighted learning audible signal and a target speech signal corresponding to the learning audible signal and used for weighting, estimating a noise component included in the input audible signal, calculating a feature quantity depending upon the noise component of the input audible signal, selecting a weighting factor corresponding to the feature quantity from the dictionary, and weighting the input audible signal using the selected weighting factor to generate a processed output audible signal.	02-19-2009
20090055168	Word Detection - Methods, systems, and apparatus, including computer program products, in which data from web documents are partitioned into a training corpus and a development corpus are provided. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison.	02-26-2009
20090063134	Media Content Assessment and Control Systems - Computer implemented methods and computing systems wherein relationships of words or phrases within a textual corpus are assessed via frequencies of occurrence of particular words or phrases and via frequencies of co-occurrence of particular pairs of words or phrases within defined tracts of text from within the textual corpus.	03-05-2009
20090063135	Handheld Electronic Device and Method Employing Logical Proximity of Characters in Spell Checking - An improved handheld electronic device and associated method employing an improved spell checking routine enable proposed spelling corrections having a close logical proximity to an active input to be output at a position of preference for easy selection by the user. By way of example, a base character and the various accented forms thereof can be said to have a logical proximity to one another that is closer than their logical proximity to any character having a different base character, whether additionally having a diacritical element or not.	03-05-2009
20090063136	Information processing apparatus, information processing method, and information processing program - An information processing apparatus includes a foreign language dictionary, an input unit through which a user inputs a letter, a determination unit, a storage unit storing notation information about the letter, a search unit, a converter, and an output unit. The foreign language dictionary stores each foreign word in one of uppercase, lowercase, and a combination of at least one uppercase letter and at least one lowercase letter. The determination unit determines whether a letter input through the input unit is uppercase or lowercase. The search unit searches the foreign language dictionary for a word corresponding to an initially input string of letters in the order in which the letters are input. The converter converts a notation of each letter included in the word retrieved by the search unit according to the notation information. The output unit outputs the word in a notation converted by the converter as a candidate word.	03-05-2009
20090076800	Dual Cross-Media Relevance Model for Image Annotation - A dual cross-media relevance model (DCMRM) is used for automatic image annotation. In contrast to the traditional relevance models which calculate the joint probability of words and images over a training image database, the DCMRM model estimates the joint probability by calculating the expectation over words in a predefined lexicon. The DCMRM model may be advantageous because a predefined lexicon potentially has better behavior than a training image database. The DCMRM model also takes advantage of content-based techniques and image search techniques to define the word-to-image and word-to-word relations involved in image annotation. Both relations can be estimated by using image search techniques on the web data as well as available training data.	03-19-2009
20090089048	Two-Pass Hash Extraction of Text Strings - Data compression and key word recognition may be provided. A first pass may walk a text string, generate terms, and calculate a hash value for each generated term. For each hash value, a hash bucket may be created where an associated occurrence count may be maintained. The hash buckets may be sorted by occurrence count and a few top buckets may be kept. Once those top buckets are known, a second pass may walk the text string, generate terms, and calculate a hash value for each term. If the hash values of terms match hash values of one of the kept buckets, then the term may be considered a frequent term. Consequently, the term may be added to a dictionary along with a corresponding frequency count. Then, the dictionary may be examined to remove terms that may not be frequent, but appeared due to hash collisions.	04-02-2009
20090132237	Orthogonal classification of words in multichannel speech recognizers - A computerized method for distributing a target vocabulary among multiple dictionaries. The vocabulary includes words for use in a speech recognition application installed in a computer system. Each word of the target vocabulary is found in only one of the dictionaries. The words are first categorized based on phonetic length, and distributed into multiple groups each of equal phonetic length. The first groups are secondly categorized based on combinations of vowel sounds. The words of the first groups are placed into second groups accordingly based on having identical vowel sounds. The second groups are thirdly categorized into third groups based on the consonants of the words of the second groups and placement of the consonants relative to the vowel sounds. The words within each of the third groups are compared in pairs for phonetic distance and the words of minimal pairwise phonetic distance between them are placed in fourth groups. The words of each of the fourth groups are distributed into the multiple dictionaries, preferably with no more than one member per fourth group distributed into each of the dictionaries. The multiple dictionaries are preferably mutually orthogonal, that is, each of the dictionaries includes words of maximal phonetic distance from each other.	05-21-2009
20090144051	METHOD OF PROVIDING PERSONAL DICTIONARY - Disclosed is a method of using a computerized dictionary for building an electronic word list. The method can include accessing a computerized dictionary, opening an entry of the dictionary for a word, the entry comprising a listing of a plurality of descriptions of the word, selecting a first one of the plurality of descriptions, and causing to store the first description or a portion thereof in an electronic word list of a user.	06-04-2009
20090144052	METHOD AND SYSTEM FOR PROVIDING CONVERSATION DICTIONARY SERVICES BASED ON USER CREATED DIALOG DATA - A method for providing a conversation dictionary service includes the steps of: (a) receiving a request for editing conversation expressions from a user; (b) providing the user with a format page for editing the expressions; (c) creating expression data by connecting the information inputted in a conversation entry window and a situation entry window with identification information on the user; and (d) recording the expression data in an expression database. The service lets a user edit conversation expressions and expands the coverage of the conversation expressions, thereby providing more varied examples that reflect newly or uniquely used conversation expressions in daily life.	06-04-2009
20090157390	Method and Apparatus for Discovering and Classifying Polysemous Word Instances in Web Documents - A method and apparatus for discovering polysemous words and classifying polysemous words found in web documents. All document corpora in any natural language have words that have multiple usage contexts or words that have multiple meanings. Semantic analysis is not feasible for classifying all word occurrences in all documents on the web, which contain trillions of words in total. In addition, semantic analysis typically cannot distinguish multiple usages of a given meaning of a given word. In one embodiment of this invention, polysemous words in natural languages can be discovered by analyzing the co-occurrence of other words with the polysemous word in web documents. In one embodiment, the multiple meanings and usages of a polysemous word can be determined by analyzing the co-occurrences of other words with the polysemous word. No semantic analysis is used in discovering or classifying polysemous words.	06-18-2009
20090177463	Media Content Assessment and Control Systems - Computer implemented methods, computing devices, and computing systems, wherein relationships of words or phrases within a textual corpus are assessed via frequencies of occurrence of particular words or phrases and via frequencies of co-occurrence of particular pairs of words or phrases within defined tracts of text from within the textual corpus.	07-09-2009
20090187401	HANDHELD ELECTRONIC DEVICE AND ASSOCIATED METHOD FOR OBTAINING NEW LANGUAGE OBJECTS FOR A TEMPORARY DICTIONARY USED BY A DISAMBIGUATION ROUTINE ON THE DEVICE - A method of obtaining data for use on a handheld electronic device having a text disambiguation function which employs a dictionary including a number of first language objects that is usable by the text disambiguation function. The method includes receiving a block of information, displaying a representation of at least a portion of the block of information, and determining whether at least one of one or more predetermined events has occurred, wherein each of the events has been deemed to indicate that the information includes language objects that are accessible for use in disambiguating ambiguous inputs. If it is determined that at least one of the events has occurred, the method further includes obtaining a number of second language objects from the block of information and storing the number of second language objects in a temporary dictionary included in the memory and usable by the text disambiguation function.	07-23-2009
20090204392	Communication terminal having speech recognition function, update support device for speech recognition dictionary thereof, and update method - A simple means for expanding a speech recognition dictionary between communication terminals is provided. A speech recognition dictionary update support device (	08-13-2009
20090248401	System and Methods For Using Short-Hand Interpretation Dictionaries In Collaboration Environments - A method for creating and using a short-hand interpretation dictionary in a collaboration environment includes creating or editing a document in a collaboration environment, said document comprising at least one short-hand notation; and replacing the at least one short-hand notation with an interpretation from at least one short-hand dictionary.	10-01-2009
20090265163	SYSTEMS AND METHODS TO ENABLE INTERACTIVITY AMONG A PLURALITY OF DEVICES - Methods and systems to exchange and display data among a plurality of devices in response to one or more of user input and context-based information. User input may include one or more of motion, speech, text, pointing, and touch-selecting. Context-based information may include one or more of user location, which may be relative to one or more devices, background audio, information related to one or more products and/or services, and user-based context information. User context-based information may correspond to one or more of prior transactions, prior activities, prior content exposure, and demographic information. Also disclosed herein are methods and systems to correlate user speech to one or more of commands and data objects, with respect to context-based information. Methods and systems to recognize speech may be implemented in combination with methods and systems to exchange and/or display data among a plurality of devices, and in other environments.	10-22-2009
20090271180	DICTIONARY FOR TEXTUAL DATA COMPRESSION AND DECOMPRESSION - A dictionary for compressing and decompressing textual data has a number of keys. Each key is associated with an identifier. The keys include static word or phrase keys, where each static word or phrase key lists one or more unchanging words in a particular order. The keys further include dynamic phrase keys, where each dynamic phrase key lists a number of words and one or more placeholders in a particular order, and each placeholder denotes a place where a word or phrase other than the words of the dynamic phrase key is to be inserted. At least one of the dynamic phrase keys may identify one or more of the words by identifiers for corresponding static words or phrase keys. At least one of the static word or phrase keys may identify one or more of the words by identifiers for corresponding other static words or phrase keys.	10-29-2009
20090271181	DICTIONARY FOR TEXTUAL DATA COMPRESSION AND DECOMPRESSION - A dictionary for compressing and decompressing textual data has a number of keys. Each key is associated with an identifier. The keys include static word or phrase keys, where each static word or phrase key lists one or more unchanging words in a particular order. The keys further include dynamic phrase keys, where each dynamic phrase key lists a number of words and one or more placeholders in a particular order, and each placeholder denotes a place where a word or phrase other than the words of the dynamic phrase key is to be inserted. At least one of the dynamic phrase keys may identify one or more of the words by identifiers for corresponding static words or phrase keys. At least one of the static word or phrase keys may identify one or more of the words by identifiers for corresponding other static words or phrase keys.	10-29-2009
20090287476	METHOD AND SYSTEM FOR EXTRACTING INFORMATION FROM UNSTRUCTURED TEXT USING SYMBOLIC MACHINE LEARNING - A method (and structure) of extracting information from text, includes parsing an input sample of text to form a parse tree and using user inputs to define a machine-labeled learning pattern from the parse tree.	11-19-2009
20090292530	Method and system for grammar relaxation - The method and system for modifying grammars presented in this invention apply to automatic speech recognition systems which take a spoken utterance as input and use a grammar to assign word sequence(s) and, possibly, one or more semantic interpretations to that utterance. One type of modification may take the form of reducing the importance of select grammar components based on the analysis of the occurrence of these components in the original grammar. Another type of modification may take the form of adding new grammar components to the grammar of some semantic interpretations based on the analysis of the occurrence of these components in the select set of other semantic interpretations. Both modifications can be carried out either automatically or offered for validation. Some benefits of the presented method and system are: reduced effort for building grammars, improvement of recognition accuracy, and automatic adaptation of dynamic grammars to the context.	11-26-2009
20090299732	CONTEXTUAL DICTIONARY INTERPRETATION FOR TRANSLATION - A method and apparatus provides for interpreting a foreign word or phrase using a contextual likelihood model and a dictionary. An apparatus may translate foreign language text by taking context into account and displaying the translation with alternatives on an adaptive user interface display. The contextual likelihood model may be interlaced with a dictionary. In an embodiment, the interaction between the contextual likelihood model and a dictionary may result in an adaptive adjustment of the meanings or the order of meanings displayed. The order of meanings displayed may be representative of the calculated likelihoods.	12-03-2009
20090306969	Systems and Methods for an Automated Personalized Dictionary Generator for Portable Devices - A system and method for automated dictionary population is provided to facilitate the entry of textual material in dictionaries for enhancing word prediction. The automated dictionary population system is useful in association with a mobile device including at least one dictionary which includes entries. The device receives a communication which is parsed and textual data extracted. The text is compared to the entries of the dictionaries to identify new words. Statistical information for the parsed words, including word usage frequency, recency, or likelihood of use, is generated. Profanities may be processed by identifying profanities, modifying the profanities, and asking the user to provide feedback. Phrases are identified by phrase markers. Lastly, the new words are stored in a supplementary word list as single words or by linking the words of the identified phrases to preserve any phrase relationships. Likewise, the statistical information may be stored.	12-10-2009
20090313008	Information apparatus for use in mobile unit - An information apparatus for use in mobile unit, which is mounted on a mobile unit, includes at least a broadcast receiver	12-17-2009
20090326927	ADAPTIVE GENERATION OF OUT-OF-DICTIONARY PERSONALIZED LONG WORDS - A system is provided, including a display unit, a memory unit, and a processor. The processor is configured to calculate a mutual information value between a first chunk and a second chunk, and to add a new word to a language unit when a condition involving the mutual information value is satisfied. The new word is a combination of the first chunk and the second chunk. The processor is also configured to add the new word into an n-gram store. The n-gram store includes a plurality of n-grams and associated frequency or count information. The processor is also configured to alter the frequency or count information based on the new word.	12-31-2009
20100010806	Storage system for symptom information of Traditional Chinese Medicine (TCM) and method for storing TCM symptom information - A storage system is disclosed for symptom information of Traditional Chinese Medicine (TCM). In at least one embodiment, the system includes a processing module, a TCM standard data module and a storage module, wherein: a TCM specialized glossary and the correlated attributes of the TCM specialized glossary are stored in said TCM standard data module; the processing module is used for dividing the TCM symptom information into at least one phrase, matching the phrase(s) on the basis of the TCM specialized glossary so as to obtain terms belonging to said TCM specialized glossary, for establishing correlated relationships of terms in the phrase(s) according to the correlation attributes in the TCM specialized glossary, and for storing the terms for which a correlated relationship has been established as structured data in said storage module. At least one embodiment of the present invention further discloses a method for storing TCM symptom information. By implementing at least one embodiment of the present invention, symptom information recorded in any language customary to doctors can be accepted, thus reducing the complexity in recording symptom information, thereby facilitating the input thereof by the doctor.	01-14-2010
20100036655	PROBABILITY-BASED APPROACH TO RECOGNITION OF USER-ENTERED DATA - A method for entering keys in a small key pad is provided. The method comprises the steps of: providing at least a part of a keyboard having a plurality of keys; and predetermining a first probability of a user striking a key among the plurality of keys. The method further uses a dictionary of selected words associated with the key pad and/or a user.	02-11-2010
20100042405	RELATED WORD PRESENTATION DEVICE - A related word presentation device (	02-18-2010
20100049504	MEASURING TOPICAL COHERENCE OF KEYWORD SETS - Methods and apparatus are described for measuring the topical coherence of a keyword set while simultaneously partitioning the set into contextually related clusters.	02-25-2010
20100076751	VOICE RECOGNITION SYSTEM - A voice recognition system used for onboard equipment having a genre database (DB) that stores search target vocabularies in accordance with respective genres. It has a mike	03-25-2010
20100076752	Automated Data Cleanup - The described implementations relate to automated data cleanup. One system includes a language model generated from language model seed text and a dictionary of possible data substitutions. This system also includes a transducer configured to cleanse a corpus utilizing the language model and the dictionary.	03-25-2010
20100082333	LEMMATIZING, STEMMING, AND QUERY EXPANSION METHOD AND SYSTEM - A method of stemming text and system therefor are described. The method comprises removing stop words from a document based on at least one stop word entry in an array of stop words and flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun; adding flagged nouns to a noun dictionary; flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb; adding flagged verbs to a verb dictionary; searching the document for nouns and verbs based on the flagged nouns and the flagged verbs; removing remaining stop words subsequent to searching the document; applying light stemming on the flagged nouns; applying a root-based stemming on the flagged verbs; and storing the stemmed document.	04-01-2010
20100114564	DYNAMIC UPDATE OF GRAMMAR FOR INTERACTIVE VOICE RESPONSE - A device provides a question to a user, and receives, from the user, an unrecognized voice response to the question. The device also provides the unrecognized voice response to an utterance agent for determination of the unrecognized voice response without user involvement, and provides an additional question to the user prior to receiving the determination of the unrecognized voice response from the utterance agent.	05-06-2010
20100131267	SPEECH SAMPLES LIBRARY FOR TEXT-TO-SPEECH AND METHODS AND APPARATUS FOR GENERATING AND USING SAME - A method of recording speech for use in a speech samples library. In an exemplary embodiment, the method comprises recording a speaker pronouncing a phoneme with musical parameters characterizing pronunciation of another phoneme by the same or another speaker. For example, in one embodiment the method comprises providing a recording of a first speaker pronouncing a first phoneme in a phonemic context. The pronunciation is characterized by some musical parameters. A second reader, who may be the same as the first reader, is then recorded pronouncing a second phoneme (different from the first phoneme) with the musical parameters that characterize pronunciation of the first phoneme by the first speaker. The recordings made by the second reader are used for compiling a speech samples library.	05-27-2010
20100138217	METHOD FOR CONSTRUCTING CHINESE DICTIONARY AND APPARATUS AND STORAGE MEDIA USING THE SAME - A method for constructing a Chinese dictionary is disclosed, including determining a probability for nominalization of a Chinese term with a given collocation term according to a determination rule and the correlation between the Chinese term and its corresponding collocations, wherein the Chinese term is determined to be a verb part-of-speech. The method further includes modifying the verb part-of-speech of the Chinese term with the given collocation term to an appropriate part-of-speech when the probability for nominalization of the Chinese term with the given collocation term is higher than a predetermined value, and storing the correlation between the Chinese term, the given collocation term and the appropriate part-of-speech in a storage device.	06-03-2010
20100145680	METHOD AND APPARATUS FOR SPEECH RECOGNITION USING DOMAIN ONTOLOGY - A speech recognition method using a domain ontology includes: constructing a domain ontology DB; forming a speech recognition grammar using the formed domain ontology DB; extracting a feature vector from a speech signal; and modeling the speech signal using an acoustic model. The method performs speech recognition by using the acoustic model, the speech recognition dictionary and the speech recognition grammar on the basis of the feature vector.	06-10-2010
20100161318	Systems and Methods of Building and Using Custom Word Lists - Standard word lists that are often used for such operations as predictive text, spell checking, and word completion are based on general linguistic data that might not accurately reflect actual text usage patterns of particular users. Systems and methods of building and using a custom word list for use in text operations on an electronic device are provided. A collection of text items associated with a user of the electronic device is scanned to identify words in the text items. A weighting is then assigned to each identified word, and the words and corresponding weightings are stored.	06-24-2010
20100174528	CREATING A TERMS DICTIONARY WITH NAMED ENTITIES OR TERMINOLOGIES INCLUDED IN TEXT DATA - A computer system of an embodiment of the disclosure can be used to automatically create or populate a terms dictionary using a set of computing units. A morphological analysis unit can acquire token sequence data by performing morphological analysis for the text data. A category distinguishing unit can distinguish tokens of the token sequence data by using a category dictionary to extract uncategorized words. An uncategorized-word comparing unit can compare each of the extracted uncategorized words with an uncategorized-word comparison rule to extract an uncategorized word matching the uncategorized-word comparison rule as a registration candidate word. A token-sequence comparing unit can compare a token sequence of the token sequence data with a token-sequence comparison rule to extract a token sequence matching the token-sequence comparison rule as registration candidate words. A permission unit can permit a user to select whether to register the registration candidate words in the category dictionary.	07-08-2010
20100174529	Explicit Character Filtering of Ambiguous Text Entry - The present invention relates to a method and apparatus for explicit filtering in ambiguous text entry. The invention provides embodiments including various explicit text entry methodologies, such as 2-key and long pressing. The invention also provides means for matching words in a database using build around methodology, stem locking methodology, word completion methodology, and n-gram searches.	07-08-2010
20100185438	METHOD OF CREATING A DICTIONARY - An apparatus, program product and method for creating a dictionary. The method may be performed in an automated, semi-automated or manual manner. The dictionary allows entries to be stored with a plurality of data elements.	07-22-2010
20100211381	System and Method of Creating and Using Compact Linguistic Data - A system and method of creating and using compact linguistic data are provided. Frequencies of words appearing in a corpus are calculated. Each unique character in the words is mapped to a character index, and characters in the words are replaced with the character indexes. Sequences of characters are mapped to substitution indexes, and the sequences of characters in the words are replaced with the substitution indexes. The words are grouped by common prefixes, and each prefix is mapped to location information for the group of words which start with the prefix.	08-19-2010
20100250239	SHARABLE DISTRIBUTED DICTIONARY FOR APPLICATIONS - Architecture for providing and processing a dictionary in a universal format such as XML, for example. The dictionary can be authored while in the universal format, designated for use with multiple compatible applications, and compiled on-the-fly using a dictionary compiler. The dictionary can be shared and/or distributed via a web server, e-mail, and other suitable data transmission techniques. Once downloaded to the client application, the dictionary is registered with the requesting client application for use. With this model, the dictionary created by a user for a specific domain and for a specific application can be easily reused by other applications, and shared among the users belonging to the same domain.	09-30-2010
20100250240	SYSTEM AND METHOD FOR TRAINING AN ACOUSTIC MODEL WITH REDUCED FEATURE SPACE VARIATION - Feature space variation associated with specific text elements is reduced by training an acoustic model with a phoneme set, dictionary and transcription set configured to better distinguish the specific text elements and at least some specific phonemes associated therewith. The specific text elements can include the most frequently occurring text elements from a text data set, which can include text data beyond the transcriptions of a training data set. The specific text elements can be identified using a text element distribution table sorted by occurrence within the text data set. Specific phonemes can be limited to consonant phonemes to improve speed and accuracy.	09-30-2010
20100250241	Non-dialogue-based Learning Apparatus and Dialogue-based Learning Apparatus - The invention provides a dialogue-based learning apparatus through dialogue with users comprising: a speech input unit (	09-30-2010
20100299143	Voice Recognition Dictionary Generation Apparatus and Voice Recognition Dictionary Generation Method - A voice recognition dictionary generation apparatus and method for suppressing reduction of processing speed at the time of updating. The apparatus includes an input unit configured to receive a text subjected to voice recognition, a storage unit configured to store the text with respect to each file of a predetermined item, a reading data generation unit configured to analyze the text and generate a reading data, and a voice recognition dictionary configured to include content dictionaries that store therein the reading data of the text with respect to each file of the predetermined item. When the file of the predetermined item including the text stored in the storage unit is updated, a control unit detects a total number of the content dictionaries, and when the total number is smaller than a predetermined limit, the control unit generates the content dictionaries with respect to each updated predetermined item.	11-25-2010
20110035211	SYSTEMS, METHODS AND APPARATUS FOR RELATIVE FREQUENCY BASED PHRASE MINING - Example systems, methods, processes, and apparatus identify phrases in electronic information. One or more phrase dictionaries are created from content in one or more electronic documents. A relative frequency value is generated for each phrase in each of the one or more phrase dictionaries. The relative frequency value for a phrase is based at least in part on a comparison between a frequency of the phrase in the electronic document and a frequency of each individual word in the phrase. One or more phrases are selected based at least in part on a threshold and the relative frequency value generated for each phrase. The selected one or more phrases and the relative frequency values associated with each of the selected one or more phrases are output for graphical display to a user.	02-10-2011
20110066425	SYSTEMS, METHODS, AND APPARATUS FOR AUTOMATED MAPPING AND INTEGRATED WORKFLOW OF A CONTROLLED MEDICAL VOCABULARY - Systems, methods, and apparatus provide clinical terminology services including a controlled medical vocabulary supplemented by local clinical content. An example method includes accessing an initial controlled medical vocabulary including at least one external terminology via a vocabulary management server; processing local clinical content including unstructured local clinical content provided via an importer framework; analyzing and extracting the unstructured local clinical content using a text analyzer and extraction tool to generate one or more proposed terms; identifying one or more synonyms for the one or more proposed terms and placing the one or more synonyms into a queue to be added to the controlled medical vocabulary; reviewing the one or more synonyms; and adding one or more synonyms to the controlled medical vocabulary with placement and relationship based on analyzing unstructured local clinical content to automatically map between the at least one external terminology and the local clinical content.	03-17-2011
20110077937	ELECTRONIC APPARATUS WITH DICTIONARY FUNCTION AND COMPUTER-READABLE MEDIUM - An electronic apparatus includes a storage which includes dictionary information, a conjugation chart database which stores conjugation charts for a language stored in the dictionary information so as to cause the charts to correspond to conjugation chart numbers, and a verb-verb conjugation chart correspondence table which stores the conjugation chart numbers so as to cause the numbers to correspond to the spellings of verbs, and a processor which causes to display letter strings stored in the dictionary information, accepts the specification of an arbitrary word from the letter strings displayed, when the specified word is a verb, refers to the verb-verb conjugation chart correspondence table and determines a conjugation chart number caused to correspond to the spelling of the specified verb, and reads a conjugation chart corresponding to the determined conjugation chart number from the conjugation charts stored in the conjugation chart database and displays the conjugation chart.	03-31-2011
20110119051	Phonetic Variation Model Building Apparatus and Method and Phonetic Recognition System and Method Thereof - A phonetic variation model building apparatus, having a phoneme database for recording at least a standard phonetic model of a language and a plurality of non-standardized phonemes of the language is provided. A phonetic variation identifier identifies a plurality of phonetic variations between the non-standardized phonemes and the standard phonetic model. A phonetic transformation calculator calculates a plurality of coefficients of a phonetic transformation function based on the phonetic variations and the phonetic transformation function. A phonetic variation model generator generates at least a phonetic variation model based on the standard phonetic model, the phonetic transformation function and the coefficients thereof.	05-19-2011
20110119052	Speech recognition dictionary creating support device, computer readable medium storing processing program, and processing method - A device extracts, from speech data, prosodic information including a power value and an utterance section including a period with a power value equal to or larger than a threshold; divides the utterance section into sections in which the power value is equal to or larger than another threshold; acquires phoneme sequence data for each divided speech data by phoneme recognition; generates clusters, each of which is a set of classified phoneme sequence data, by clustering; calculates an evaluation value for each cluster; selects clusters for which the evaluation value is equal to or larger than a given value as candidate clusters; determines, for each candidate cluster, one of the phoneme sequence data constituting the cluster to be a representative phoneme sequence; and selects the divided speech data corresponding to the representative phoneme sequence as listening target speech data.	05-19-2011
20110131037	Vocabulary Dictionary Recompile for In-Vehicle Audio System - An in-vehicle audio system and methods are provided. A respective word or a respective phrase may be associated with each item of audio content stored in the in-vehicle audio system. The in-vehicle audio system may perform an action with respect to one of the stored items of audio content in response to a spoken command, which may include the respective word or the respective phrase associated with the one of the stored items. When audio content is to be added to the in-vehicle audio system, phonetics related to the audio content may be generated and added to a vocabulary dictionary during a compile process. When stored audio content is to be deleted from the in-vehicle audio system, phonetics related to the stored audio content to be deleted may be eliminated from the vocabulary dictionary during the compile process, which, in some embodiments, may be performed during a shutdown process.	06-02-2011
20110131038	EXCEPTION DICTIONARY CREATING UNIT, EXCEPTION DICTIONARY CREATING METHOD, AND PROGRAM THEREFOR, AS WELL AS SPEECH RECOGNITION UNIT AND SPEECH RECOGNITION METHOD - An exception dictionary creating device, an exception dictionary creating method, and a program therefor allowing creating an exception dictionary are provided for affording high speech recognition performance while reducing the size of the exception dictionary, as well as a speech recognition device and a speech recognition method capable of recognizing a speech with high accuracy of recognition by using the exception dictionary. To achieve this, a text-to-phonetic symbol converting unit (	06-02-2011
20110137642	Word Detection - Methods, systems, and apparatus, including computer program products, in which data from web documents are partitioned into a training corpus and a development corpus are provided. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison.	06-09-2011
20110144978	SYSTEM AND METHOD FOR ADVANCEMENT OF VOCABULARY SKILLS AND FOR IDENTIFYING SUBJECT MATTER OF A DOCUMENT - A system and method for providing vocabulary information includes one or more computer processors that, for each of a plurality of words of a text, determine a relevance of the word to the text, and, for each of at least a subset of the plurality of words, output an indication of the respective determined relevance of the word to the text, where, for each of the plurality of words, the determination includes comparing a frequency of the word in the text to a frequency threshold.	06-16-2011
20110161073	SYSTEM AND METHOD OF DISAMBIGUATING AND SELECTING DICTIONARY DEFINITIONS FOR ONE OR MORE TARGET WORDS - Systems and methods for automatically selecting dictionary definitions for one or more target words include receiving electronic signals from an input device indicating one or more target words for which a dictionary definition is desired. The target word(s) and selected surrounding words defining an observation sequence are subjected to a part of speech tagging algorithm to electronically determine one or more most likely part of speech tags for the target word(s). Potential relations are examined between the target word(s) and selected surrounding keywords. The target word(s), the part of speech tag(s) and the discovered keyword relations are then used to map the target word(s) to one or more specific dictionary definitions. The dictionary definitions are then provided as electronic output, such as by audio and/or visual display, to a user.	06-30-2011
20110191100	LANGUAGE MODEL SCORE LOOK-AHEAD VALUE IMPARTING DEVICE, LANGUAGE MODEL SCORE LOOK-AHEAD VALUE IMPARTING METHOD, AND PROGRAM STORAGE MEDIUM - A speech recognition apparatus that performs frame synchronous beam search by using a language model score look-ahead value prevents the pruning of a correct answer hypothesis while suppressing an increase in the number of hypotheses. A language model score look-ahead value imparting device	08-04-2011
20110196672	VOICE RECOGNITION DEVICE - A voice recognition device is provided with a sentence selecting unit	08-11-2011
20110208513	SPLITTING A CHARACTER STRING INTO KEYWORD STRINGS - Systems and methods of the present invention provide for the word splitting and reliability score for an entered character string. A list of keywords may be extracted from the character string entered into a user interface on a client. These keywords may be compared to potential matches in a dictionary database and a reliability score for word splits and keywords strings may be compiled and displayed to the user. The client may also display the reliability score using a plurality of logical groupings within a reliability score process.	08-25-2011
20110238412	Method for Constructing Pronunciation Dictionaries - Embodiments of the invention disclose a system and a method for constructing a pronunciation dictionary by transforming an unaligned entry to an aligned entry. The unaligned entry and the aligned entry include a set of words and a set of pronunciations corresponding to the set of words. The method aligns each word in the aligned entry with a subset of pronunciations by determining a pronunciation prediction for each word, such that there is one-to-one correspondence between the word and the pronunciation prediction; mapping each pronunciation prediction to the subset of pronunciations to produce a predictions-pronunciation map having each pronunciation prediction aligned with the subset of pronunciations; and determining the aligned entry based on the predictions-pronunciation map using the one-to-one correspondence between the word and the pronunciation prediction.	09-29-2011
20110238413	DOMAIN DICTIONARY CREATION - Methods, systems, and apparatus, including computer program products, to identify topic words in a collection of documents that includes topic documents related to a topic are disclosed. A reference topic word divergence value based on a document collection and the topic document collection is determined. A candidate topic word divergence value for a candidate topic word is determined based on the document collection and the topic document collection. The candidate topic word is determined to be a topic word if the candidate topic word divergence value is greater than the reference topic word divergence value.	09-29-2011
20110307247	METHOD AND SYSTEM FOR LEXICAL NAVIGATION OF ITEMS - A method and a system for lexical navigation of a corpus of items are provided. For example, the method may include generating a data structure in a non-transitory, computer readable medium. The data structure may include a number of items, a number of keywords, and a frequency that each of the keywords is associated with each of the items. The method may further include generating a top-level lexical cloud that includes a subset of the keywords. Each keyword in the subset may be associated with a size that is proportional to its frequency of occurrence. Finally, the method may include generating a plurality of lower-level lexical clouds by eliminating any one of the plurality of items not associated with a particular one of the keywords from the data structure, and generating the lower level lexical cloud as a second subset of the plurality of keywords that remain in the data structure.	12-15-2011
20120041758	SYNCHRONIZATION OF AN INPUT TEXT OF A SPEECH WITH A RECORDING OF THE SPEECH - A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data including a ratio between the respective pronunciation times of words included in the received text in the generated synthetic speech is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted in a recording medium and/or displayed on a display device.	02-16-2012
20120078617	System and Method for Increasing Recognition Rates of In-Vocabulary Words By Improving Pronunciation Modeling - The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.	03-29-2012
20120084077	BUILDING AND CONTRACTING A LINGUISTIC DICTIONARY - A method for building and contracting a linguistic dictionary, the linguistic dictionary comprising a list of surface forms and a list of normalized forms, each normalized form being associated with a surface form, the method comprising the steps of: comparing each character of a surface form with each character of the surface form's normalized form; in response to the comparing step, determining an edit operation for each character compared; and generating a transform code from the set of the edit operations in order to transform the surface form to its normalized form.	04-05-2012
20120101811	PREDICTIVE TEXT DICTIONARY POPULATION - A method and system for populating a predictive text dictionary is provided. A connection between a handheld electronic device and a network is detected. The handheld electronic device is operable to allow a user to enter text. The handheld electronic device has a predictive text dictionary that is operable to receive and employ sets of words. User preferences for the handheld electronic device are retrieved. The predictive text dictionary of the handheld electronic device is populated with a set of words at least partially based on the user preferences.	04-26-2012
20120143598	SERVER, DICTIONARY CREATION METHOD, DICTIONARY CREATION PROGRAM, AND COMPUTER-READABLE RECORDING MEDIUM RECORDING THE PROGRAM - A search server includes a category database that stores category information containing location information indicating a geographical location, a word assigned to the location, and a user ID identifying a user having assigned the word to the location in association with one another, and a dictionary registration unit that reads first input information indicating locations to which a first word is assigned by a first user and second input information indicating locations to which a second word is assigned by a second user, and when determining that the first and second users have assigned the words to a predetermined number or more of common locations based on that information, creates dictionary data containing the first and second words in association with each other and enters the dictionary data into a dictionary database.	06-07-2012
20120173228	Definitional Method to Increase Precision and Clarity of Information - In order to know precisely and clearly what words or terms mean, the DMTIPCI definitional method uses an algorithm, implemented in a computer microprocessor, that carries out the method's unique and novel steps of iteratively deconstructing all usage predicate words of all words in any language to their primary words and storing said words with their deconstructed predicates and primary words in computer repositories and/or in printed form (the DMTIPCI Dictionary). Primary words as herein defined are words or terms that have no non-tautological words in their predicate(s). All words of any language are arranged under their primary words by another DMTIPCI algorithm implemented in a computer microprocessor, creating a DMTIPCI Primary Word Dictionary. Other embodiments are described and shown.	07-05-2012
20120179455	LANGUAGE LEARNING APPARATUS AND METHOD USING GROWING PERSONAL WORD DATABASE SYSTEM - Disclosed herein is a language learning apparatus and method using a growing personal word DB system, which construct an individual person-based word DB in which known words and unknown words are stored separately. The language learning apparatus includes a word extraction unit for extracting words included in learning content and generating a word list. A word analysis unit sets learning levels of words included in the word list based on a level-based word DB. A control unit generates an individual person-based word DB in which classification into known words and unknown words is performed and known words and unknown words are stored separately based on a learning level of a learner and the level-based word DB, and performs control such that the words included in the word list are classified into known words and unknown words and are stored separately based on the set learning level.	07-12-2012
20120185239	SYSTEMS AND METHODS FOR AN AUTOMATED PERSONALIZED DICTIONARY GENERATOR FOR PORTABLE DEVICES - A system and method for automated dictionary population is provided to facilitate the entry of textual material in dictionaries for enhancing word prediction. The automated dictionary population system is useful in association with a mobile device including at least one dictionary which includes entries. The device receives a communication which is parsed and textual data extracted. The text is compared to the entries of the dictionaries to identify new words. Statistical information for the parsed words, including word usage frequency, recency, or likelihood of use, is generated. Profanities may be processed by identifying profanities, modifying the profanities, and asking the user to provide feedback. Phrases are identified by phrase markers. Lastly, the new words are stored in a supplementary word list as single words or by linking the words of the identified phrases to preserve any phrase relationships. Likewise, the statistical information may be stored.	07-19-2012
20120209595	DICTIONARY INFORMATION DISPLAY DEVICE AND DICTIONARY INFORMATION DISPLAY METHOD - A dictionary information display device displays explanatory information on a desired search word retrieved from dictionary data on a main display module, creates list data about examples, descriptions, and phrases incidental to each word meaning included in the explanatory information, and displays, on an auxiliary display module, a supplementary information list screen that arranges the beginning parts of the individual examples, descriptions, and phrases line by line in headline form. When the user selects any one of the examples, descriptions, and phrases on the list screen displayed by the auxiliary display module, the entire contents of the selected example or description are displayed by the auxiliary display module.	08-16-2012
20120303359	DICTIONARY CREATION DEVICE, WORD GATHERING METHOD AND RECORDING MEDIUM - When gathering words through a dictionary growth process, a dictionary growth unit (	11-29-2012
20120323565	METHOD AND APPARATUS FOR ANALYZING TEXT - An apparatus, a method, an applications programming interface and a computer program product for analyzing text. The text is transmitted between users of a text based network mediated system. The text is analyzed by intended word filter rule processing elements to determine a presence of a variation word of an intended word in the text. A method for creating the intended word filter rule processing elements is also disclosed.	12-20-2012
20130080155	APPARATUS AND METHOD FOR CREATING DICTIONARY FOR SPEECH SYNTHESIS - Apparatus for creating a dictionary for speech synthesis includes a sentence storage unit configured to store N sentences, a sentence display unit configured to selectively display a first sentence which is one of the N sentences, a recording unit configured to record each user speech, a necessity determination unit configured to make a determination of whether to create the dictionary, a dictionary creation unit configured to create the dictionary by utilizing the user speech, and a speech synthesis unit configured to convert a second sentence to a synthesized speech with the dictionary. The determination unit makes the determination under a condition that the recording unit records the user speech of M first sentences (M is less than N) and the determination is based on at least one of an instruction from the user, M and an amount of the recorded user speech.	03-28-2013
20130080156	VOICE RECOGNITION APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT - In an embodiment, a voice recognition apparatus includes: a program information storage unit; a dictionary storage unit; a calculating unit; an updating unit; a receiving unit; a recognizing unit; and an operation control unit. The program information storage unit stores metadata of a broadcast program with a user's viewing state. The dictionary storage unit stores a recognition dictionary including a recognition word and a priority of the recognition word. The calculating unit calculates a first score of a degree of the user's preference on a feature word based on the metadata and the viewing state. The updating unit updates the priority of the recognition word including the feature word according to the first score. The recognizing unit recognizes a voice using the recognition dictionary. The operation control unit controls an operation on the broadcast program based on a recognition result.	03-28-2013
20130085747	System, Method and Computer-Readable Storage Device for Providing Cloud-Based Shared Vocabulary/Typing History for Efficient Social Communication - An input method editor (IME) is associated with a local user. Memory stores local data and a processor, coupled to the memory, is configured to receive input from a local, first user, obtain shared data associated with at least a remote, second user from a remote server and generate prediction candidates and conversion candidates based on the input provided by the local, first user and correlation of the input and the obtained shared data.	04-04-2013
20130090921	PRONUNCIATION LEARNING FROM USER CORRECTION - Systems and methods are described for adding entries to a custom lexicon used by a speech recognition engine of a speech interface in response to user interaction with the speech interface. In one embodiment, a speech signal is obtained when the user speaks a name of a particular item to be selected from among a finite set of items. If a phonetic description of the speech signal is not recognized by the speech recognition engine, then the user is presented with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item. After the user has selected the particular item via the means for selecting, the phonetic description of the speech signal is stored in association with a text description of the particular item in the custom lexicon.	04-11-2013
20130132073	SYSTEMS AND METHODS OF BUILDING AND USING CUSTOM WORD LISTS - Standard word lists that are often used for such operations as predictive text, spell checking, and word completion are based on general linguistic data that might not accurately reflect actual text usage patterns of particular users. Systems and methods of building and using a custom word list for use in text operations on an electronic device are provided. A collection of text items associated with a user of the electronic device is scanned to identify words in the text items. A weighting is then assigned to each identified word, and the words and corresponding weightings are stored.	05-23-2013
20130158987	SYSTEM AND METHOD FOR DYNAMICALLY GENERATING GROUP-RELATED PERSONALIZED DICTIONARIES - A user device communicates with a network server that has access to one or more knowledge sources. Based on a current situational context for a user of the device, the network server dynamically generates a group-related personalized dictionary using information retrieved from the knowledge sources and provides the dictionary to the user device. Applications executing on the user device can then use the dictionary to suggest or predict words, terms, or symbols to the user in response to receiving user input.	06-20-2013
20130179154	SPEECH RECOGNITION APPARATUS - A speech recognition apparatus includes a first recognition dictionary, a speech input unit, a speech recognition unit, a speech transmission unit, a recognition result receipt unit, and a control unit. The speech recognition unit recognizes a speech based on a first recognition dictionary, and outputs a first recognition result. A server recognizes the speech based on a second recognition dictionary, and outputs a second recognition result. The control unit determines a likelihood level of a selected candidate obtained based on the first recognition result, and accordingly controls an output unit to output at least one of the first recognition result and the second recognition result. When the likelihood level of the selected candidate is equal to or higher than a threshold level, the control unit controls the output unit to output the first recognition result irrespective of whether the second recognition result is received from the server.	07-11-2013
20130179155	SYSTEM AND METHOD FOR ENHANCED LOOKUP IN AN ONLINE DICTIONARY - A system and method predictively generates words based on a user input, according to a frequency of lookup of each of the generated words. The system and method also allows for a user to add predictively generated words to a word list that assists in the facilitation of word and vocabulary comprehension for a user. Words in the online dictionary are grouped in word families where a user can navigate between different forms of a root word.	07-11-2013
20130197901	SYSTEMS AND METHODS FOR AN AUTOMATED PERSONALIZED DICTIONARY GENERATOR FOR PORTABLE DEVICES - A system and method for automated dictionary population is provided to facilitate the entry of textual material in dictionaries for enhancing word prediction. The automated dictionary population system is useful in association with a mobile device including at least one dictionary which includes entries. The device receives a communication which is parsed and textual data extracted. The text is compared to the entries of the dictionaries to identify new words. Statistical information for the parsed words, including word usage frequency, recency, or likelihood of use, is generated. Profanities may be processed by identifying profanities, modifying the profanities, and asking the user to provide feedback. Phrases are identified by phrase markers. Lastly, the new words are stored in a supplementary word list as single words or by linking the words of the identified phrases to preserve any phrase relationships. Likewise, the statistical information may be stored.	08-01-2013
20130211824	Single Identity Customized User Dictionary - In one embodiment, constructing a set of customized dictionaries for a particular user, each of the customized dictionaries in the set comprising a different blending of one or more frequently used words collected from texts submitted by one or more users; and sending a copy of the set of customized dictionaries to each of a plurality of electronic devices associated with the particular user to be stored on the electronic device and to aid the particular user in inputting text to the electronic device.	08-15-2013
20130211825	Creating Customized User Dictionary - In one embodiment, collecting a plurality of words from texts submitted by one or more users; for each of a plurality of communication categories, determining a usage frequency of each of one or more of the words within the communication category based on the texts; and constructing one or more customized dictionaries that each comprise a different blending of selected words.	08-15-2013
20130238322	ELECTRONIC DEVICE WITH A DICTIONARY FUNCTION AND DICTIONARY INFORMATION DISPLAY METHOD - An electronic device with a dictionary function includes a dictionary data memory, a dictionary search section which makes a dictionary search on the basis of a user operation, a series process storage section which stores data items representing the ones in a series of dictionary search processes performed at the dictionary search section, a detection section which detects an instruction to point in a direction given by a user operation, and a series process reproduction section which reads the series of processed data items stored in the series process storage section in response to the detection of an instruction to point in a specific direction at the detection section and displays a display screen corresponding to each of the processes on a display section.	09-12-2013
20130275125	AUTOMATED GLOSSARY CREATION - A method for creating a glossary of terms, the method including identifying data labels in a computer-based document set, tracing any of the data labels to a computer-readable data source external to the computer-based document set, identifying as synonyms different data labels that are traceable to the same data source, and storing any of the data labels in a computer-readable data store.	10-17-2013
20130297295	Script Detection Service - Script detection service techniques are described. In an implementation, a corpora of text is analyzed to determine which strings in the corpora of text are to be included in a targeted dictionary that is usable for language detection services. The targeted dictionary is populated with strings that are individually associated with a human language. The strings include individual text characters associated with values that correspond to a particular subset of values in a table that associates subsets of values with individual human writing systems.	11-07-2013
20130311172	HANDHELD ELECTRONIC DEVICE AND METHOD EMPLOYING LOGICAL PROXIMITY OF CHARACTERS IN SPELL CHECKING - An improved handheld electronic device and associated method employing an improved spell checking routine enable proposed spelling corrections having a close logical proximity to an active input to be output at a position of preference for easy selection by the user. By way of example, a base character and the various accented forms thereof can be said to have a logical proximity to one another that is closer than their logical proximity to any character having a different base character, whether additionally having a diacritical element or not.	11-21-2013
20130325445	METHOD FOR GENERATING TEXT THAT MEETS SPECIFIED CHARACTERISTICS IN A HANDHELD ELECTRONIC DEVICE AND A HANDHELD ELECTRONIC DEVICE INCORPORATING THE SAME - Incoming e-mails, instant messages, SMS, and MMS, are scanned for new language objects such as words, abbreviations, text shortcuts and, in appropriate languages, ideograms, that are placed in a list for use by a text input process of a handheld electronic device to facilitate the generation of text.	12-05-2013
20130332146	High Speed Large Scale Dictionary Matching - A mechanism is provided for dictionary matching. The mechanism loads a plurality of dictionary memory arrays with a set of dictionary words and updates a plurality of status arrays. Each status array of the plurality of status arrays corresponds to a respective one of the plurality of dictionary memory arrays. Each entry of a given status array stores a status bit, which indicates whether a corresponding entry of the corresponding dictionary memory array stores a valid dictionary word. The mechanism receives an input data word and generates a hash value based on the input data word. The mechanism reads a dictionary word from each of the dictionary memory arrays and a status bit from each of the status arrays using the hash value as a read address. The mechanism determines whether a dictionary memory array within the plurality of dictionary memory arrays stores a valid dictionary word that matches the input data word.	12-12-2013
20140019124	USING INFORMATION BANNERS TO COMMUNICATE WITH USERS OF ELECTRONIC DICTIONARIES - In one embodiment, computer-implemented systems and methods related to electronic dictionary systems are provided including: storing statistical information representing user interactions with the dictionary system over a period of time and electronically analyzing the statistical information so as to determine a customized message to a user. The customized message may be provided for display as part of a user interface comprising at least one field for entering a dictionary query, at least one field for providing dictionary results, and at least one field for customized user messages.	01-16-2014
20140032210	APPARATUS, METHOD AND COMPUTER READABLE MEDIUM FOR A MULTIFUNCTIONAL INTERACTIVE DICTIONARY DATABASE FOR REFERENCING POLYSEMOUS SYMBOL SEQUENCES - An embodiment of the present application is directed to an apparatus, method, and/or computer readable medium for effectively storing an interactive dictionary database in a memory. The interactive dictionary database includes a plurality of symbol sequences, each of the plurality of symbol sequences including at least one polysemous symbol and each of the plurality of symbol sequences being stored in association with at least one word, sentence, phoneme, message, letter, number, morpheme, command and/or phrase. The method includes providing, in the interactive dictionary database, information useable to assign at least a subset of the plurality of symbol sequences at least one of an active and an inactive status.	01-30-2014
20140136190	SYSTEMS AND METHODS OF BUILDING AND USING CUSTOM WORD LISTS - Standard word lists that are often used for such operations as predictive text, spell checking, and word completion are based on general linguistic data that might not accurately reflect actual text usage patterns of particular users. Systems and methods of building and using a custom word list for use in text operations on an electronic device are provided. A collection of text items associated with a user of the electronic device is scanned to identify words in the text items. A weighting is then assigned to each identified word, and the words and corresponding weightings are stored.	05-15-2014
20140142925	SELF-ORGANIZING UNIT RECOGNITION FOR SPEECH AND OTHER DATA SERIES - An approach to automated processing of audio or other data series or signals, which is applicable where little or no transcribed training data is available, makes use of identification of self-organizing units (SOUs) in conjunction with automated creation of a dictionary, or augmentation of an existing dictionary, with “pseudo-words” or tokens represented in terms of the SOUs. In some examples, the dictionary is iteratively updated (e.g., augmented) during training, optionally with updating of models of the SOUs during the iteration.	05-22-2014
20140142926	TEXT PREDICTION USING ENVIRONMENT HINTS - Provided are techniques for text prediction using environment hints. A list of words is received, wherein each word in the list of words has an associated weight. For at least one word in the list of words, an environment weight is obtained from an environment dictionary. The associated weight of the at least one word is updated using the obtained environment weight. The words in the list of words are ordered based on the updated, associated weight of each of the words.	05-22-2014
20140172418	CUSTOM DICTIONARIES FOR E-BOOKS - A custom dictionary is generated for an e-book. A dictionary management system receives a custom dictionary request from a user client operated by a user, the custom dictionary request identifying the e-book and including dictionary management information describing the user. The dictionary management system chooses a group reader profile that has an associated group reading score for the user based on the dictionary management information and candidate words are identified in the identified e-book for inclusion in the custom dictionary. The dictionary management system selects words for inclusion in the custom dictionary from among the candidate words responsive to the associated group reading score for the chosen group reading profile. The dictionary management system generates the custom dictionary using the selected words, and provides the generated custom dictionary to the user client.	06-19-2014
20140180678	ENTERPRISE CONCEPT DEFINITION MANAGEMENT - A system for managing an enterprise concept dictionary may include an electronic master dictionary and electronic local dictionaries. The master dictionary may include concept entries respectively associated with concept identifiers and with one or more concept definitions. The local dictionaries may include one or more of the concept identifiers of the master dictionary. A dictionary management module may be in signal communication with the master dictionary and the local dictionaries. The dictionary management module may be configured to query the master dictionary for a concept entry that corresponds to a concept associated with a modeling component. If a concept entry is found, the concept identifier may be provided. If a concept entry is not found, a new concept entry may be added to the master dictionary. A notification module may be in signal communication with the master dictionary and automatically provide notification when a concept entry is added or updated.	06-26-2014
20140180679	Method and system for text compression and decompression - The present invention provides a method and system for compressing and decompressing text, comprising: creating a redundant universal permanent reference vocabulary, which includes the commonest symbols utilized by all applications and symbols found in thousands of books and in specific, professional vocabularies, and which is created in advance of any information processing; creating an own vocabulary during the process of text compression and decompression, wherein the own vocabulary includes words and symbols, e.g. slang, found in written conversation between persons; splitting the universal vocabulary into a root of tree symbol/index section and main symbol sections; creating a first temporary vocabulary, wherein the first temporary vocabulary includes the commonest symbol/index utilized by all applications, a root of tree word/index section, and a merging of the index table section to the words content in the specific vocabulary section; creating a second temporary vocabulary for repeating symbols found in the source text and not found in the first temporary vocabulary; and creating pseudo-code by merging an indicator with root of tree or main indexes.	06-26-2014
20140180680	DICTIONARY DEVICE, DICTIONARY SEARCH METHOD, DICTIONARY SYSTEM, AND SERVER DEVICE - On a text display screen displayed on a touch panel color display unit, after a plurality of desired words are specified by a touch operation and it is detected that the touched points are moved downward, an example sentence including each of the specified words is searched for in dictionary data corresponding to the character type of each of the words, and displayed. When it is detected that the touched points are moved upward, a phrase including each of the specified words is searched for in the dictionary data corresponding to the character type of each of the words, and displayed.	06-26-2014
20140222418	FIXED STRING DICTIONARY - The subject matter described herein relates to implementation of a dictionary in a column-based, in-memory database where values are not stored directly, rather, for each column, a dictionary is created with all distinct values. For each row, a reference to the corresponding value in the dictionary is stored. In one aspect, data is stored in a memory structure organized in a column store format defined by a plurality of columns and a plurality of rows. A dictionary for each column in the memory structure is generated. The dictionary has distinct values for each column. A reference to the dictionary is generated for each column in the memory structure. The dictionary and the reference to the dictionary are stored in the memory structure.	08-07-2014
20140222419	Automated Ontology Development - Systems and methods of automated ontology development include a corpus of communication data. The corpus of communication data includes communication data from a plurality of interactions and is processed. A plurality of terms are extracted from the corpus. Each term of the plurality is a plurality of words that identify a single concept within the corpus. An ontology is automatedly generated from the extracted terms.	08-07-2014
20140288923	Semantic Application Logging and Analytics - A system for managing dictionaries, such as an application dictionary and a domain dictionary, and for adding entries to a data log is described herein. The system may, in response to a determination that an event occurs at an application, determine that the application uses a first concept name from the application dictionary to describe the event. An entry for the event may be added to a data log for the application. The entry may also include the first concept name from the application dictionary. A mapping of the first concept name from the application dictionary to a second concept name from the domain dictionary may be generated. In some aspects, the data log may be sent to a data log analysis system capable of accessing the domain dictionary.	09-25-2014
20140288924	SYSTEMS AND METHODS FOR AN AUTOMATED PERSONALIZED DICTIONARY GENERATOR FOR PORTABLE DEVICES - A system and method for automated dictionary population is provided to facilitate the entry of textual material in dictionaries for enhancing word prediction. The automated dictionary population system is useful in association with a mobile device including at least one dictionary which includes entries. The device receives a communication which is parsed and textual data extracted. The text is compared to the entries of the dictionaries to identify new words. Statistical information for the parsed words, including word usage frequency, recency, or likelihood of use, is generated. Profanities may be processed by identifying profanities, modifying the profanities, and asking the user to provide feedback. Phrases are identified by phrase markers. Lastly, the new words are stored in a supplementary word list as single words or by linking the words of the identified phrases to preserve any phrase relationships. Likewise, the statistical information may be stored.	09-25-2014
20140303964	System and method for generating ethnic and cultural emoticon language dictionaries - A computer-implemented system and method for developing ethnic and cultural emoticons that are downloadable or uploadable to smart devices or devices, such as laptops, smartphones, and tablet devices, for fast and efficient communications between smart device or other users is disclosed. The computer-implemented system and method also provides for updating cultural or ethnic dictionaries on a periodic basis to reflect the changing nature of language being used by ethnic and cultural groups so that effective communications can be carried out as these changes take place. The computer-implemented system and method include at least a system server connected to the Internet or similar wireless network and one or more databases connected to the system server that will store the ethnic and cultural dictionaries.	10-09-2014
20150012264	DICTIONARY GENERATION DEVICE, DICTIONARY GENERATION METHOD, DICTIONARY GENERATION PROGRAM AND COMPUTER-READABLE RECORDING MEDIUM STORING SAME PROGRAM - A dictionary generation device according to one embodiment includes a determination unit configured to (A) refer to an item database that stores a plurality of records containing an item name/item description including a noun sequence, an item category, and a shop selling the item as fields and determine whether the noun sequence included in the item name/item description of each record is set corresponding to the item category, (B) count the number of selling shops in a record containing the noun sequence for each item category and calculate a shop intensity of each noun sequence based on the counted number of selling shops, (C) determine whether one item category uniquely derived from the noun sequence exists based on the shop intensity for each item category, and (D) determine the noun sequence as a definitive category word when the one item category exists.	01-08-2015
20150019211	INTERACTIVE CONCEPT EDITING IN COMPUTER-HUMAN INTERACTIVE LEARNING - A collection of data that is extremely large can be difficult to search and/or analyze. Relevance may be dramatically improved by automatically classifying queries and web pages in useful categories, and using these classification scores as relevance features. A thorough approach may require building a large number of classifiers, corresponding to the various types of information, activities, and products. Creation of classifiers and schematizers is provided on large data sets. Exercising the classifiers and schematizers on hundreds of millions of items may expose value that is inherent to the data by adding usable meta-data. Some aspects include active labeling exploration, automatic regularization and cold start, scaling with the number of items and the number of classifiers, active featuring, and segmentation and schematization.	01-15-2015
20150046155	COGNITIVE NEURO-LINGUISTIC BEHAVIOR RECOGNITION SYSTEM FOR MULTI-SENSOR DATA FUSION - Embodiments presented herein describe techniques for generating a linguistic model of input data obtained from a data source (e.g., a video camera). According to one embodiment of the present disclosure, a sequence of symbols is generated based on an ordered stream of normalized vectors generated from the input data. A dictionary of words is generated from combinations of the ordered sequence of symbols based on a frequency at which combinations of symbols appear in the ordered sequence of symbols. A plurality of phrases is generated based on an ordered sequence of words from the dictionary observed in the ordered sequence of symbols based on a frequency by which combinations of words in the ordered sequence of words appear relative to one another.	02-12-2015
20150066485	Method and System for Dictionary Noise Removal - A method and system of removing noise from a dictionary using a weighted graph is presented. The method can include mapping, by a noise reducing agent executing on a processor, a plurality of dictionaries to a plurality of vertices of a graphical representation, wherein the plurality of vertices is connected by weighted edges representing noise. The plurality of dictionaries may further comprise a plurality of entries, wherein each entry further comprises a plurality of tokens. The method can include selecting a subset of the weighted edges, constructing an acyclic graphical representation from the selected subset of weighted edges, and determining an ordering based on the acyclic graphical representation. The selected subset of weighted edges may approximate a solution to the Maximum Acyclic Subgraph problem. The method can include removing noise from the plurality of dictionaries according to the determined ordering.	03-05-2015
20150081281	Using Renaming Directives to Bootstrap Industry-Specific Knowledge and Lexical Resources - Mechanisms are provided for generating a lexical resource for linguistic analysis. A document data structure is received that comprises a renaming directive and filter logic is applied to the document data structure to identify the renaming directive within the document data structure. The renaming directive is analyzed to identify a relationship between semantic concepts represented by the renaming directive that are to be used to update a lexical resource based on the renaming directive. The lexical resource is updated based on results of analyzing the renaming directive. The updated lexical resource is output to a linguistic analysis system which performs linguistic analysis of a portion of textual content based on the updated lexical resource.	03-19-2015
20150088493	PROVIDING DESCRIPTIVE INFORMATION ASSOCIATED WITH OBJECTS - Techniques for providing descriptive information associated with objects may be provided. For example, requests to define an object may be received and monitored. When the number of requests indicates that a definition of the object should be updated, the definition of the object may be obtained and a reference for defining the object may be determined. Information for updating the reference with the definition may be transmitted to one or more associated computing devices.	03-26-2015
20150095022	LIST RECOGNIZING METHOD AND LIST RECOGNIZING SYSTEM - A list recognizing method and system, which comprises: parsing and analyzing metadata information within an original fixed-layout document, and extracting basic elements within a page; segmenting the basic elements, extracting segmented text lines within the page to obtain fragments; building an undirected graph with respect to the fragments; detecting indent features of a bullet according to features of the basic elements; training a learning model according to the indent features, local features of the fragments and neighborhood relation features among the fragments, obtaining model parameters, and establishing a list recognizing model; and invoking the list recognizing model to perform list recognizing on the required document, so as to get recognition result. This machine learning method may recognize not only a list, but also the contextual relationship between the first line and its subsequent lines of a list, and realize analyzing and understanding a layout of the list of the fixed-layout document ultimately. The accuracy of list recognizing on a fixed-layout document can be improved even if the bullets of the first line of the list are various.	04-02-2015
20150100308	Automated Formation of Specialized Dictionaries - A document analysis system analyzes a corpus of documents and automatically generates a dictionary of specialized phrases not already in conventional dictionaries. The dictionary generation process involves a series of operations on the phrases to identify the phrases most suitable for inclusion in a dictionary, such as phrase scoring and phrase clustering. The dictionary generation process also comprises the identification of one or more corresponding definitions for the various phrases identified for inclusion in the specialized dictionary.	04-09-2015
20150106082	System and Method for Learning Alternate Pronunciations for Speech Recognition - A system and method for learning alternate pronunciations for speech recognition is disclosed. Alternative name pronunciations may be covered, through pronunciation learning, that have not been previously covered in a general pronunciation dictionary. In an embodiment, the detection of phone-level and syllable-level mispronunciations in words and sentences may be based on acoustic models trained by Hidden Markov Models. Mispronunciations may be detected by comparing the likelihood of the potential state of the targeting pronunciation unit with a pre-determined threshold through a series of tests. It is also within the scope of an embodiment to detect accents.	04-16-2015
20150127326	SYSTEM FOR ADAPTING SPEECH RECOGNITION VOCABULARY - A system and method for adapting a speech recognition and generation system. The system and method include providing a speech recognition and generation engine that processes speech received from a user and providing a dictionary adaptation module that adds out of vocabulary words to a baseline dictionary of the speech recognition and generation system. Words are added by extracting words that are encountered and adding out of vocabulary words to the baseline dictionary of the speech recognition and generation system.	05-07-2015
20160004687	SYSTEMS AND METHODS FOR FACILITATING SPOTTING OF WORDS AND PHRASES - In accordance with an example embodiment, a system and method for facilitating spotting of words and phrases is disclosed. The system includes a scanning module, a storage module, a computation module, a dictionary generation module and a transceiver module. The scanning module periodically scans a plurality of content sources to identify words and phrases being shared as spots in one or more online communities of remote users. The storage module is configured to store the spots along with information related to the spots. The computation module determines at least one popularity-based metric for each spot stored in the storage module. The dictionary generation module is configured to generate and periodically update a spotting dictionary comprising at least a listing of popular spots based on the at least one popularity-based metric associated with each spot. The transceiver module is configured to provision the spotting dictionary to one or more remote users.	01-07-2016
20160012035	SPEECH SYNTHESIS DICTIONARY CREATION DEVICE, SPEECH SYNTHESIZER, SPEECH SYNTHESIS DICTIONARY CREATION METHOD, AND COMPUTER PROGRAM PRODUCT	01-14-2016
20160012036	WORD DETECTION AND DOMAIN DICTIONARY RECOMMENDATION	01-14-2016
20160026619	METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR DIVIDING A TERM WITH APPROPRIATE GRANULARITY - A method, computer system, and computer program product for dividing a term with appropriate granularity includes extracting an element word specifying granularity from content by parsing, and, if the term includes at least one element word in a part thereof, dividing the term at a position where the at least one element word exists.	01-28-2016
20160041951	CORPUS GENERATION DEVICE, CORPUS GENERATION METHOD AND CORPUS GENERATION PROGRAM - A corpus generation device according to an embodiment includes a web page acquisition unit, a reference word acquisition unit, an attachment unit and an output unit. The web page acquisition unit acquires a web page including description sentence data regarding a presentation target. The reference word acquisition unit acquires a reference word that is an attribute value regarding the presentation target from the web page. The attachment unit extracts a broader word belonging to a layer above the reference word acquired by the reference word acquisition unit from a storage unit that stores hierarchical relationship information indicating a hierarchical relationship between attribute values, and attaches an attribute tag corresponding to the reference word to the broader word included in the description sentence data. The output unit outputs, as corpus data, the description sentence data to which the attribute tag is attached by the attachment unit.	02-11-2016
20160092435	HIGH SPEED DICTIONARY EXPANSION - Embodiments of the present invention relate to a pattern-based system for building dictionaries of terms related to a seed set of terms. In one embodiment, a text is read. The text comprises a plurality of tokens. A first plurality of patterns is read. The first plurality of tokens is searched using the first plurality of patterns to generate a plurality of candidate terms. Each of the plurality of candidate terms comprises one or more of the plurality of tokens. A plurality of seed terms is read. Each of the first plurality of patterns is scored based on the plurality of candidate terms and the plurality of seed terms.	03-31-2016
20160104475	SPEECH SYNTHESIS DICTIONARY CREATING DEVICE AND METHOD - According to an embodiment, a speech synthesis dictionary creating device includes a first speech input unit, a second speech input unit, a determining unit, and a creating unit. The first speech input unit receives input of first speech data. The second speech input unit receives input of second speech data which is considered to be appropriate speech data. The determining unit determines whether or not a speaker of the first speech data is the same as a speaker of the second speech data. When the determining unit determines that the speaker of the first speech data is the same as the speaker of the second speech data, the creating unit creates a speech synthesis dictionary using the first speech data and using a text corresponding to the first speech data.	04-14-2016
20160110344	Single identity customized user dictionary - In one embodiment, constructing a set of customized dictionaries for a particular user, each of the customized dictionaries in the set comprising a different blending of one or more frequently used words collected from texts submitted by one or more users; and sending a copy of the set of customized dictionaries to each of a plurality of electronic devices associated with the particular user to be stored on the electronic device and to aid the particular user in inputting text to the electronic device.	04-21-2016
20160116994	PROBABILITY-BASED APPROACH TO RECOGNITION OF USER-ENTERED DATA - A method for entering keys in a small key pad is provided. The method comprises the steps of: providing at least a part of a keyboard having a plurality of keys; and predetermining a first probability of a user striking a key among the plurality of keys. The method further uses a dictionary of selected words associated with the key pad and/or a user.	04-28-2016
20160132485	SYSTEM AND METHOD FOR CONSTRUCTING MORPHEME DICTIONARY BASED ON AUTOMATIC EXTRACTION OF NON-REGISTERED WORD - A system and method for constructing a morpheme dictionary based on an automatic extraction of a non-registered word is provided. A non-registered word is automatically extracted based on a language-independent non-registered word automatic extraction method, and performance of a dictionary and a morpheme analysis is verified based on an automatic estimation by constructing a morpheme dictionary based on the automatically extracted non-registered word. Because the morpheme dictionary is constructed using only a dictionary that has passed a final verification and that helps to improve the performance, the morpheme analysis can be properly performed on a non-registered word from a new field or on a new word that appears as time passes.	05-12-2016
20160132486	Generating a Social Glossary - Particular embodiments determine that a textual term is not associated with a known meaning. The textual term may be related to one or more users of the social-networking system. A determination is made as to whether the textual term should be added to a glossary. If so, then the textual term is added to the glossary. Information related to one or more textual terms in the glossary is provided to enhance auto-correction, provide predictive text input suggestions, or augment social graph data. Particular embodiments discover new textual terms by mining information, wherein the information was received from one or more users of the social-networking system, was generated for one or more users of the social-networking system, is marked as being associated with one or more users of the social-networking system, or includes an identifier for each of one or more users of the social-networking system.	05-12-2016
20160162469	Dynamic Local ASR Vocabulary - Systems and methods for a dynamic local automatic speech recognition (ASR) vocabulary are provided. An example method includes defining a user actionable screen content based on user interactions. At least a portion of the user actionable screen content is labeled. A local vocabulary associated with a local ASR engine is created based partially on the labeling. The local vocabulary includes words associated with functions of a mobile device and is limited by resources of the mobile device. The method includes determining whether speech includes a local key phrase or a cloud-based key phrase. Based on the determination, the method includes performing ASR on the speech using the local ASR engine or forwarding the speech to a cloud-based computing engine and performing ASR therewithin based on the cloud-based computing engine's larger vocabulary.	06-09-2016
20160162470	Semantic Application Logging and Analytics - A system for managing dictionaries, such as an application dictionary and a domain dictionary, and for adding entries to a data log is described herein. The system may, in response to a determination that an event occurs at an application, determine that the application uses a first concept name from the application dictionary to describe the event. An entry for the event may be added to a data log for the application. The entry may also include the first concept name from the application dictionary. A mapping of the first concept name from the application dictionary to a second concept name from the domain dictionary may be generated. In some aspects, the data log may be sent to a data log analysis system capable of accessing the domain dictionary.	06-09-2016
20160170964	LEXICAL ANALYZER FOR A NEURO-LINGUISTIC BEHAVIOR RECOGNITION SYSTEM	06-16-2016
20160180835	USER-AIDED ADAPTATION OF A PHONETIC DICTIONARY	06-23-2016
20160188566	Computer Automated Organization Glossary Generation Systems and Methods - The present disclosure includes techniques pertaining to computer automated learning management systems and methods. In one embodiment, a system is disclosed where information is represented in a learning graph. In one embodiment, a framework may be used to access different algorithms for identifying customized learning content for a user. In another embodiment, the present disclosure includes techniques for analyzing content and incorporating content into an organizational glossary.	06-30-2016
20160253313	UPDATING LANGUAGE DATABASES USING CROWD-SOURCED INPUT	09-01-2016
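
Note on entry 20160092435 (HIGH SPEED DICTIONARY EXPANSION): the abstract describes searching a text with patterns to generate candidate terms, then scoring each pattern against a set of seed terms. The following is a minimal Python sketch of that idea under stated assumptions, not the patented method. The abstract gives no scoring formula, so the score used here (the fraction of a pattern's candidates that are seed terms) is an assumption. Regular expressions over raw text stand in for the token-level patterns, and the function names `find_candidates` and `score_patterns` are hypothetical.

```python
import re
from typing import Dict, List, Set


def find_candidates(text: str, patterns: List[str]) -> Dict[str, Set[str]]:
    """Map each pattern to the set of candidate terms it captures in the text.

    Each pattern is a regular expression with exactly one capture group
    marking the candidate term (an assumption made for this sketch).
    """
    found: Dict[str, Set[str]] = {}
    for pattern in patterns:
        found[pattern] = {m.group(1).lower() for m in re.finditer(pattern, text)}
    return found


def score_patterns(candidates: Dict[str, Set[str]], seeds: Set[str]) -> Dict[str, float]:
    """Score each pattern by the fraction of its candidates that are seed terms.

    This scoring rule is an illustrative assumption; the abstract only says
    that patterns are scored based on the candidate terms and the seed terms.
    """
    seeds_lower = {s.lower() for s in seeds}
    scores: Dict[str, float] = {}
    for pattern, terms in candidates.items():
        scores[pattern] = len(terms & seeds_lower) / len(terms) if terms else 0.0
    return scores


text = (
    "Patients took drugs such as aspirin daily. "
    "Others took drugs such as ibuprofen. "
    "One patient bought a bicycle."
)
patterns = [r"drugs such as (\w+)", r"bought a (\w+)"]
seeds = {"aspirin"}

candidates = find_candidates(text, patterns)
scores = score_patterns(candidates, seeds)
```

In this toy run, the first pattern captures one seed term plus one new candidate ("ibuprofen"), so it scores higher than the second pattern, which captures no seed terms. A dictionary builder could then accept the non-seed candidates of high-scoring patterns as new entries.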
