Dictionary building, modification, or prioritization

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704001000 - LINGUISTICS

Entries
Document | Title | Date
20130211825Creating Customized User Dictionary - In one embodiment, collecting a plurality of words from texts submitted by one or more users; for each of a plurality of communication categories, determining a usage frequency of each of one or more of the words within the communication category based on the texts; and constructing one or more customized dictionaries that each comprise a different blending of selected words.08-15-2013
20130211824Single Identity Customized User Dictionary - In one embodiment, constructing a set of customized dictionaries for a particular user, each of the customized dictionaries in the set comprising a different blending of one or more frequently used words collected from texts submitted by one or more users; and sending a copy of the set of customized dictionaries to each of a plurality of electronic devices associated with the particular user to be stored on the electronic device and to aid the particular user in inputting text to the electronic device.08-15-2013
20110196672VOICE RECOGNITION DEVICE - A voice recognition device is provided with a sentence selecting unit 08-11-2011
20100076751VOICE RECOGNITION SYSTEM - A voice recognition system used for onboard equipment having a genre database (DB) that stores search target vocabularies in accordance with respective genres. It has a mike 03-25-2010
20100076752Automated Data Cleanup - The described implementations relate to automated data cleanup. One system includes a language model generated from language model seed text and a dictionary of possible data substitutions. This system also includes a transducer configured to cleanse a corpus utilizing the language model and the dictionary.03-25-2010
20130080156VOICE RECOGNITION APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT - In an embodiment, a voice recognition apparatus includes: a program information storage unit; a dictionary storage unit; a calculating unit; an updating unit; a receiving unit; a recognizing unit; and an operation control unit. The program information storage unit stores metadata of a broadcast program with a user's viewing state. The dictionary storage unit stores a recognition dictionary including a recognition word and a priority of the recognition word. The calculating unit calculates a first score of a degree of the user's preference on a feature word based on the metadata and the viewing state. The updating unit updates the priority of the recognition word including the feature word according to the first score. The recognizing unit recognizes a voice using the recognition dictionary. The operation control unit controls an operation on the broadcast program based on a recognition result.03-28-2013
20130080155APPARATUS AND METHOD FOR CREATING DICTIONARY FOR SPEECH SYNTHESIS - Apparatus for creating a dictionary for speech synthesis includes a sentence storage unit configured to store N sentences, a sentence display unit configured to selectively display a first sentence which is one of the N sentences, a recording unit configured to record each user speech, a necessity determination unit configured to make a determination of whether to create the dictionary, a dictionary creation unit configured to create the dictionary by utilizing the user speech, and a speech synthesis unit configured to convert a second sentence to a synthesized speech with the dictionary. The determination unit makes the determination under a condition that the recording unit records the user speech of M first sentences (M is less than N) and the determination is based on at least one of an instruction from the user, M and an amount of the recorded user speech.03-28-2013
20130085747System, Method and Computer-Readable Storage Device for Providing Cloud-Based Shared Vocabulary/Typing History for Efficient Social Communication - An input method editor (IME) is associated with a local user. Memory stores local data and a processor, coupled to the memory, is configured to receive input from a local, first user, obtain shared data associated with at least a remote, second user from a remote server and generate prediction candidates and conversion candidates based on the input provided by the local, first user and correlation of the input and the obtained shared data.04-04-2013
20080319738WORD PROBABILITY DETERMINATION - A word corpus is identified and a word probability value is associated with each word in the word corpus. A sentence is identified, candidate segmentations of the sentence are determined based on the word corpus, and the associated probability value for each word in the word corpus is iteratively adjusted based on the probability values associated with the words and the candidate segmentations.12-25-2008
20090187401HANDHELD ELECTRONIC DEVICE AND ASSOCIATED METHOD FOR OBTAINING NEW LANGUAGE OBJECTS FOR A TEMPORARY DICTIONARY USED BY A DISAMBIGUATION ROUTINE ON THE DEVICE - A method of obtaining data for use on a handheld electronic device having a text disambiguation function which employs a dictionary including a number of first language objects that is usable by the text disambiguation function. The method includes receiving a block of information, displaying a representation of at least a portion of the block of information, and determining whether at least one of one or more predetermined events has occurred, wherein each of the events has been deemed to indicate that the information includes language objects that are accessible for use in disambiguating ambiguous inputs. If it is determined that at least one of the events has occurred, the method further includes obtaining a number of second language objects from the block of information and storing the number of second language objects in a temporary dictionary included in the memory and usable by the text disambiguation function.07-23-2009
20100042405RELATED WORD PRESENTATION DEVICE - A related word presentation device (02-18-2010
20090157390Method and Apparatus for Discovering and Classifying Polysemous Word Instances in Web Documents - A method and apparatus for discovering polysemous words and classifying polysemous words found in web documents. All document corpora in any natural language have words that have multiple usage contexts or words that have multiple meanings. Semantic analysis is not feasible for classifying all word occurrences in all documents on the web, which contain trillions of words in total. In addition, semantic analysis typically cannot distinguish multiple usages of a given meaning of a given word. In one embodiment of this invention, polysemous words in natural languages can be discovered by analyzing the co-occurrence of other words with the polysemous word in web documents. In one embodiment, the multiple meanings and usages of a polysemous word can be determined by analyzing the co-occurrences of other words with the polysemous word. No semantic analysis is used in discovering or classifying polysemous words.06-18-2009
20120185239SYSTEMS AND METHODS FOR AN AUTOMATED PERSONALIZED DICTIONARY GENERATOR FOR PORTABLE DEVICES - A system and method for automated dictionary population is provided to facilitate the entry of textual material in dictionaries for enhancing word prediction. The automated dictionary population system is useful in association with a mobile device including at least one dictionary which includes entries. The device receives a communication which is parsed and textual data extracted. The text is compared to the entries of the dictionaries to identify new words. Statistical information for the parsed words, including word usage frequency, recency, or likelihood of use, is generated. Profanities may be processed by identifying profanities, modifying the profanities, and asking the user to provide feedback. Phrases are identified by phrase markers. Lastly, the new words are stored in a supplementary word list as single words or by linking the words of the identified phrases to preserve any phrase relationships. Likewise, the statistical information may be stored.07-19-2012
20100049504MEASURING TOPICAL COHERENCE OF KEYWORD SETS - Methods and apparatus are described for measuring the topical coherence of a keyword set while simultaneously partitioning the set into contextually related clusters.02-25-2010
20090089048Two-Pass Hash Extraction of Text Strings - Data compression and key word recognition may be provided. A first pass may walk a text string, generate terms, and calculate a hash value for each generated term. For each hash value, a hash bucket may be created where an associated occurrence count may be maintained. The hash buckets may be sorted by occurrence count and a few top buckets may be kept. Once those top buckets are known, a second pass may walk the text string, generate terms, and calculate a hash value for each term. If the hash values of terms match hash values of one of the kept buckets, then the term may be considered a frequent term. Consequently, the term may be added to a dictionary along with a corresponding frequency count. Then, the dictionary may be examined to remove terms that may not be frequent, but appeared due to hash collisions.04-02-2009
20090271181DICTIONARY FOR TEXTUAL DATA COMPRESSION AND DECOMPRESSION - A dictionary for compressing and decompressing textual data has a number of keys. Each key is associated with an identifier. The keys include static word or phrase keys, where each static word or phrase key lists one or more unchanging words in a particular order. The keys further include dynamic phrase keys, where each dynamic phrase key lists a number of words and one or more placeholders in a particular order, and each placeholder denotes a place where a word or phrase other than the words of the dynamic phrase key is to be inserted. At least one of the dynamic phrase keys may identify one or more of the words by identifiers for corresponding static words or phrase keys. At least one of the static word or phrase keys may identify one or more of the words by identifiers for corresponding other static words or phrase keys.10-29-2009
20090271180DICTIONARY FOR TEXTUAL DATA COMPRESSION AND DECOMPRESSION - A dictionary for compressing and decompressing textual data has a number of keys. Each key is associated with an identifier. The keys include static word or phrase keys, where each static word or phrase key lists one or more unchanging words in a particular order. The keys further include dynamic phrase keys, where each dynamic phrase key lists a number of words and one or more placeholders in a particular order, and each placeholder denotes a place where a word or phrase other than the words of the dynamic phrase key is to be inserted. At least one of the dynamic phrase keys may identify one or more of the words by identifiers for corresponding static words or phrase keys. At least one of the static word or phrase keys may identify one or more of the words by identifiers for corresponding other static words or phrase keys.10-29-2009
20120101811PREDICTIVE TEXT DICTIONARY POPULATION - A method and system for populating a predictive text dictionary is provided. A connection between a handheld electronic device and a network is detected. The handheld electronic device is operable to allow a user to enter text. The handheld electronic device has a predictive text dictionary that is operable to receive and employ sets of words. User preferences for the handheld electronic device are retrieved. The predictive text dictionary of the handheld electronic device is populated with a set of words at least partially based on the user preferences.04-26-2012
20080215313Speech and Textual Analysis Device and Corresponding Method - A speech and textual analysis device and method for forming a search and/or classification catalog. The device is based on a linguistic database and includes a taxonomy table containing variable taxon nodes. The speech and textual analysis device includes a weighting module, a weighting parameter being additionally assigned to each taxon node to register the recurrence frequency of terms in the linguistic and/or textual data that is to be classified and/or sorted. The speech and/or textual analysis device includes an integration module for determining a predefinable number of agglomerates based on the weighting parameters of the taxon nodes in the taxonomy table and at least one neuronal network module for classifying and/or sorting the speech and/or textual data based on the agglomerates in the taxonomy table.09-04-2008
20100145680METHOD AND APPARATUS FOR SPEECH RECOGNITION USING DOMAIN ONTOLOGY - A speech recognition method using a domain ontology includes: constructing domain ontology DB; forming a speech recognition grammar using the formed domain ontology DB; extracting a feature vector from a speech signal; modeling the speech signal using an acoustic model. The method performs speech recognition by using the acoustic model, the speech recognition dictionary and the speech recognition grammar on the basis of the feature vector.06-10-2010
20100036655PROBABILITY-BASED APPROACH TO RECOGNITION OF USER-ENTERED DATA - A method for entering keys in a small key pad is provided. The method comprising the steps of: providing at least a part of keyboard having a plurality of keys; and predetermining a first probability of a user striking a key among the plurality of keys. The method further uses a dictionary of selected words associated with the key pad and/or a user.02-11-2010
20090313008Information apparatus for use in mobile unit - An information apparatus for use in mobile unit, which is mounted on a mobile unit, includes at least a broadcast receiver 12-17-2009
20100114564DYNAMIC UPDATE OF GRAMMAR FOR INTERACTIVE VOICE RESPONSE - A device provides a question to a user, and receives, from the user, an unrecognized voice response to the question. The device also provides the unrecognized voice response to an utterance agent for determination of the unrecognized voice response without user involvement, and provides an additional question to the user prior to receiving the determination of the unrecognized voice response from the utterance agent.05-06-2010
20090248401System and Methods For Using Short-Hand Interpretation Dictionaries In Collaboration Environments - A method for creating and using a short-hand interpretation dictionary in a collaboration environment includes creating or editing a document in a collaboration environment, said document comprising at least one short-hand notation; and replacing the at least one short-hand notation with an interpretation from at least one short-hand dictionary.10-01-2009
20090287476METHOD AND SYSTEM FOR EXTRACTING INFORMATION FROM UNSTRUCTURED TEXT USING SYMBOLIC MACHINE LEARNING - A method (and structure) of extracting information from text, includes parsing an input sample of text to form a parse tree and using user inputs to define a machine-labeled learning pattern from the parse tree.11-19-2009
20100138217METHOD FOR CONSTRUCTING CHINESE DICTIONARY AND APPARATUS AND STORAGE MEDIA USING THE SAME - A method for constructing a Chinese dictionary is disclosed, including determining a probability for nominalization of a Chinese term with a given collocation term according to a determination rule and the correlation between the Chinese term and its corresponding collocations, wherein the Chinese term is determined to be a verb part-of-speech. The method further includes modifying the verb part-of-speech of the Chinese term with the given collocation term to an appropriate part-of-speech when the probability for nominalization of the Chinese term with the given collocation term is higher than a predetermined value, and storing the correlation between the Chinese term, the given collocation term and the appropriate part-of-speech in a storage device.06-03-2010
20110208513SPLITTING A CHARACTER STRING INTO KEYWORD STRINGS - Systems and methods of the present invention provide word splitting and reliability scoring for an entered character string. A list of keywords may be extracted from the character string entered into a user interface on a client. These keywords may be compared to potential matches in a dictionary database and a reliability score for word splits and keyword strings may be compiled and displayed to the user. The client may also display the reliability score using a plurality of logical groupings within a reliability score process.08-25-2011
20080281583CONTEXT-DEPENDENT PREDICTION AND LEARNING WITH A UNIVERSAL RE-ENTRANT PREDICTIVE TEXT INPUT SOFTWARE COMPONENT - A system and method for supporting predictive text entry in software applications by sharing a common, predictive, software text-entry widget within a consumer device across multiple software applications and input contexts. The method comprises: a software application invoking an instance of a text-entry widget in a particular input context, the application optionally providing the widget a description of allowed symbols and a dictionary of expected symbol strings associated with the current context, the widget modifying a virtual keyboard display and predictive algorithm data according to the allowed symbols and dictionary, the user entering text via the widget, the widget returning the entered text to the application, and the application optionally including information derived from entered text in the associated dictionary to enhance the predictive capability of the widget on future invocations.11-13-2008
20080281582Input system for mobile search and method therefor - An input system for mobile search and a method therefor are provided. The input system includes an input module receiving a code input for a specific term and a voice input corresponding thereto, a database including a glossary and an acoustic model, wherein the glossary includes a plurality of terms and a sequence list, and each of the terms has a search weight based on an order of the sequence list, a process module selecting a first number of candidate terms from the glossary according to the code input by using an input algorithm and obtaining a second number of candidate terms by using a speech recognition algorithm to compare the voice input with the first number of candidate terms via the acoustic model, wherein the second number of candidate terms are listed in a particular order based on their respective search weights, and an output module showing the second number of candidate terms in the particular order for selecting the specific term therefrom.11-13-2008
20090265163SYSTEMS AND METHODS TO ENABLE INTERACTIVITY AMONG A PLURALITY OF DEVICES - Methods and systems to exchange and display data among a plurality of devices in response to one or more of user input and context-based information. User input may include one or more of motion, speech, text, pointing, and touch-selecting. Context-based information may include one or more of user location, which may be relative to one or more devices, background audio, information related to one or more products and/or services, and user-based context information. User context-based information may correspond to one or more of prior transactions, prior activities, prior content exposure, and demographic information. Also disclosed herein are methods and systems to correlate user speech to one or more of commands and data objects, with respect to context-based information. Methods and systems to recognize speech may be implemented in combination with methods and systems to exchange and/or display data among a plurality of devices, and in other environments.10-22-2009
20080312911DICTIONARY WORD AND PHRASE DETERMINATION - Context signals in documents are identified, characters bounded by the context signals are identified, one or more candidate words defined by the characters bounded by the context signals are identified, and one or more of the candidate words are added to an input method editor dictionary.12-18-2008
20080312910DICTIONARY WORD AND PHRASE DETERMINATION - Candidate words in search queries are identified, each candidate word including one or more consecutive characters. For each candidate word, a first count is determined, the first count representing a number of times that the candidate word is the only word in the search queries, and a second count is determined, the second count representing a number of times that the candidate word and one or more other words are included in each of the search queries. One or more of the candidate words are added to an input method editor dictionary based on a relationship between the first count and the second count.12-18-2008
20080270122System for handling novel words in a spellchecking module - A system for adding words to an online dictionary used for spellchecking is described. A spellchecker module compares words of an electronic document with words in the online dictionary and identifies a word in the electronic document that is missing from the dictionary. After a user indicates a desire to add the missing word to the dictionary, the spellchecker module determines at least one related-word form of the missing word. The related-word forms depend upon the part of speech of the missing word. The spellchecker can prompt the user to identify the part of speech and then to verify each determined related-word form. The spellchecker concurrently adds the missing word and at least one related-word form of the missing word to the online dictionary in a single ‘add-to-dictionary’ operation.10-30-2008
20090144052METHOD AND SYSTEM FOR PROVIDING CONVERSATION DICTIONARY SERVICES BASED ON USER CREATED DIALOG DATA - A method for providing a conversation dictionary service includes the steps of: (a) receiving a request for editing conversation expressions from a user; (b) providing the user with a format page for editing the expressions; (c) creating expression data by connecting the information inputted in a conversation entry window and a situation entry window with identification information on the user; and (d) recording the expression data in an expression database. The service lets a user edit conversation expressions and expands the coverage of the conversation expressions, thereby providing more varied examples that reflect new or unique conversation expressions used in daily life.06-04-2009
20080306731ELECTRONIC EQUIPMENT EQUIPPED WITH DICTIONARY FUNCTION - Disclosed is an electronic dictionary including a plurality of dictionary databases. In a twin retrieval using two dictionaries, a sub-dictionary according to a main dictionary is set besides the main dictionary. When a user inputs a retrieval character string, the electronic dictionary retrieves corresponding headwords from the main dictionary to produce a headword list and displays it. Furthermore, the electronic dictionary displays explanation information pertaining to one headword in the headword list. In addition to this, the electronic dictionary reads the explanation information corresponding to the headword of the main dictionary specified with a cursor from a sub-dictionary database to display the explanation information besides that of the main dictionary. If the specification of a headword of the main dictionary is changed by moving the cursor, then the electronic dictionary re-retrieves the changed headword from the sub-dictionary, and changes the explanation information of the sub-dictionary to display it.12-11-2008
20090144051METHOD OF PROVIDING PERSONAL DICTIONARY - Disclosed is a method of using a computerized dictionary for building an electronic word list. The method can include accessing a computerized dictionary, opening an entry of the dictionary for a word, the entry comprising a listing of a plurality of descriptions of the word, selecting a first one of the plurality of descriptions, and causing to store the first description or a portion thereof in an electronic word list of a user.06-04-2009
20090326927ADAPTIVE GENERATION OF OUT-OF-DICTIONARY PERSONALIZED LONG WORDS - A system is provided, including a display unit, a memory unit, and a processor. The processor is configured to calculate a mutual information value between a first chunk and a second chunk, and to add a new word to a language unit when a condition involving the mutual information value is satisfied. The new word is a combination of the first chunk and the second chunk. The processor is also configured to add the new word into an n-gram store. The n-gram store includes a plurality of n-grams and associated frequency or count information. The processor is also configured to alter the frequency or count information based on the new word.12-31-2009
20090063135Handheld Electronic Device and Method Employing Logical Proximity of Characters in Spell Checking - An improved handheld electronic device and associated method employing an improved spell checking routine enable proposed spelling corrections having a close logical proximity to an active input to be output at a position of preference for easy selection by the user. By way of example, a base character and the various accented forms thereof can be said to have a logical proximity to one another that is closer than their logical proximity to any character having a different base character, whether additionally having a diacritical element or not.03-05-2009
20090055168Word Detection - Methods, systems, and apparatus, including computer program products, in which data from web documents are partitioned into a training corpus and a development corpus are provided. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison.02-26-2009
20120078617System and Method for Increasing Recognition Rates of In-Vocabulary Words By Improving Pronunciation Modeling - The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.03-29-2012
20090204392Communication terminal having speech recognition function, update support device for speech recognition dictionary thereof, and update method - A simple means for expanding a speech recognition dictionary between communication terminals is provided. A speech recognition dictionary update support device (08-13-2009
20090177463Media Content Assessment and Control Systems - Computer implemented methods, computing devices, and computing systems, wherein relationships of words or phrases within a textual corpus are assessed via frequencies of occurrence of particular words or phrases and via frequencies of co-occurrence of particular pairs of words or phrases within defined tracts of text from within the textual corpus.07-09-2009
20080262832Document Processing Device, and Document Processing Method - A technique is provided for adding an annotation to a document described in a markup language. Upon acquisition of a document described in a markup language, a document processing apparatus 10-23-2008
20110144978SYSTEM AND METHOD FOR ADVANCEMENT OF VOCABULARY SKILLS AND FOR IDENTIFYING SUBJECT MATTER OF A DOCUMENT - A system and method for providing vocabulary information includes one or more computer processors that, for each of a plurality of words of a text, determine a relevance of the word to the text, and, for each of at least a subset of the plurality of words, output an indication of the respective determined relevance of the word to the text, where, for each of the plurality of words, the determination includes comparing a frequency of the word in the text to a frequency threshold.06-16-2011
20100174529Explicit Character Filtering of Ambiguous Text Entry - The present invention relates to a method and apparatus for explicit filtering in ambiguous text entry. The invention provides embodiments including various explicit text entry methodologies, such as 2-key and long pressing. The invention also provides means for matching words in a database using build around methodology, stem locking methodology, word completion methodology, and n-gram searches.07-08-2010
20100161318Systems and Methods of Building and Using Custom Word Lists - Standard word lists that are often used for such operations as predictive text, spell checking, and word completion are based on general linguistic data that might not accurately reflect actual text usage patterns of particular users. Systems and methods of building and using a custom word list for use in text operations on an electronic device are provided. A collection of text items associated with a user of the electronic device is scanned to identify words in the text items. A weighting is then assigned to each identified word, and the words and corresponding weightings are stored.06-24-2010
20100185438METHOD OF CREATING A DICTIONARY - An apparatus, program product and method for creating a dictionary. The method may be performed in an automated, semi-automated or manual manner. The dictionary allows entries to be stored with a plurality of data elements.07-22-2010
20110238412Method for Constructing Pronunciation Dictionaries - Embodiments of the invention disclose a system and a method for constructing a pronunciation dictionary by transforming an unaligned entry to an aligned entry. The unaligned entry and the aligned entry include a set of words and a set of pronunciations corresponding to the set of words. The method aligns each word in the aligned entry with a subset of pronunciations by determining a pronunciation prediction for each word, such that there is one-to-one correspondence between the word and the pronunciation prediction; mapping each pronunciation prediction to the subset of pronunciations to produce a predictions-pronunciation map having each pronunciation prediction aligned with the subset of pronunciations; and determining the aligned entry based on the predictions-pronunciation map using the one-to-one correspondence between the word and the pronunciation prediction.09-29-2011
20100250239SHARABLE DISTRIBUTED DICTIONARY FOR APPLICATIONS - Architecture for providing and processing a dictionary in a universal format such as XML, for example. The dictionary can be authored while in the universal format, designated for use with multiple compatible applications, and compiled on-the-fly using a dictionary compiler. The dictionary can be shared and/or distributed via a web server, e-mail, and other suitable data transmission techniques. Once downloaded to the client application, the dictionary is registered with the requesting client application for use. With this model, the dictionary created by a user for a specific domain and for a specific application can be easily reused by other applications, and shared among the users belonging to the same domain.09-30-2010
20100211381System and Method of Creating and Using Compact Linguistic Data - A system and method of creating and using compact linguistic data are provided. Frequencies of words appearing in a corpus are calculated. Each unique character in the words is mapped to a character index, and characters in the words are replaced with the character indexes. Sequences of characters are mapped to substitution indexes, and the sequences of characters in the words are replaced with the substitution indexes. The words are grouped by common prefixes, and each prefix is mapped to location information for the group of words which start with the prefix.08-19-2010
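The steps named in the abstract above (word frequencies, a character-to-index mapping, and prefix groups that point at a location) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name build_compact_data, the fixed two-character prefix, and the use of a sorted word list position as the "location information" are hypothetical, and the substitution-index step for character sequences is left out.

```python
from collections import Counter

def build_compact_data(corpus_words, prefix_len=2):
    """Hypothetical sketch: frequencies, character indexes, and prefix groups."""
    # Frequency of each word in the corpus.
    frequencies = Counter(corpus_words)
    # Map every unique character to a small integer index.
    alphabet = sorted({ch for word in frequencies for ch in word})
    char_index = {ch: i for i, ch in enumerate(alphabet)}
    # Replace the characters of each word with their indexes.
    words = sorted(frequencies)
    encoded = [[char_index[ch] for ch in word] for word in words]
    # Because the word list is sorted, words sharing a prefix are contiguous;
    # store the position of the first word of each prefix group.
    prefix_location = {}
    for position, word in enumerate(words):
        prefix_location.setdefault(word[:prefix_len], position)
    return frequencies, char_index, words, encoded, prefix_location

frequencies, char_index, words, encoded, prefix_location = build_compact_data(
    "the then they cat car the".split()
)
```

In this sketch, looking up a word would start at prefix_location for its first two characters and scan forward through the contiguous group, which is one plausible reading of how prefix grouping keeps lookups cheap.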
20100299143Voice Recognition Dictionary Generation Apparatus and Voice Recognition Dictionary Generation Method - A voice recognition dictionary generation apparatus and method for suppressing reduction of processing speed at the time of updating. The apparatus includes an input unit configured to receive a text subjected to voice recognition, a storage unit configured to store the text with respect to each file of a predetermined item, a reading data generation unit configured to analyze the text and generate a reading data, and a voice recognition dictionary configured to include content dictionaries that store therein the reading data of the text with respect to each file of the predetermined item. When the file of the predetermined item including the text stored in the storage unit is updated, a control unit detects a total number of the content dictionaries, and when the total number is smaller than a predetermined limit, the control unit generates the content dictionaries with respect to each updated predetermined item.11-25-2010
20120143598SERVER, DICTIONARY CREATION METHOD, DICTIONARY CREATION PROGRAM, AND COMPUTER-READABLE RECORDING MEDIUM RECORDING THE PROGRAM - A search server includes a category database that stores category information containing location information indicating a geographical location, a word assigned to the location, and a user ID identifying a user having assigned the word to the location in association with one another, and a dictionary registration unit that reads first input information indicating locations to which a first word is assigned by a first user and second input information indicating locations to which a second word is assigned by a second user, and when determining that the first and second users have assigned the words to a predetermined number or more of common locations based on that information, creates dictionary data containing the first and second words in association with each other and enters the dictionary data into a dictionary database.06-07-2012
20110035211SYSTEMS, METHODS AND APPARATUS FOR RELATIVE FREQUENCY BASED PHRASE MINING - Example systems, methods, processes, and apparatus identify phrases in electronic information. One or more phrase dictionaries are created from content in one or more electronic documents. A relative frequency value is generated for each phrase in each of the one or more phrase dictionaries. The relative frequency value for a phrase is based at least in part on a comparison between a frequency of the phrase in the electronic document and a frequency of each individual word in the phrase. One or more phrases are selected based at least in part on a threshold and the relative frequency value generated for each phrase. The selected one or more phrases and the relative frequency values associated with each of the selected one or more phrases are output for graphical display to a user.02-10-2011
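The abstract above does not give the exact formula for the relative frequency value, so the following is only a sketch under stated assumptions: a phrase is a word bigram, its score is the phrase count divided by the count of its most frequent member word, and phrases seen fewer than min_count times are ignored. The names relative_frequencies, select_phrases, and min_count are hypothetical.

```python
from collections import Counter

def relative_frequencies(tokens, n=2, min_count=2):
    """Score each n-word phrase against the frequency of its individual words."""
    word_counts = Counter(tokens)
    phrase_counts = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    scores = {}
    for phrase, count in phrase_counts.items():
        if count < min_count:
            continue  # assumption: skip phrases too rare to judge
        # Assumed formula: phrase frequency relative to its most frequent word.
        scores[phrase] = count / max(word_counts[w] for w in phrase)
    return scores

def select_phrases(tokens, threshold, n=2, min_count=2):
    """Keep phrases whose relative frequency meets the threshold."""
    scores = relative_frequencies(tokens, n, min_count)
    return {phrase: s for phrase, s in scores.items() if s >= threshold}

tokens = "new york is big and new york is old but new ideas are new".split()
scores = relative_frequencies(tokens)
selected = select_phrases(tokens, threshold=0.9)
```

With this choice of denominator, a phrase scores 1.0 only when none of its words occurs outside the phrase, so a common word such as "new" pulls down the score of "new york" in the toy input.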
20090063136Information processing apparatus, information processing method, and information processing program - An information processing apparatus includes a foreign language dictionary, an input unit through which a user inputs a letter, a determination unit, a storage unit storing notation information about the letter, a search unit, a converter, and an output unit. The foreign language dictionary stores each foreign word in one of uppercase, lowercase, and a combination of at least one uppercase letter and at least one lowercase letter. The determination unit determines whether a letter input through the input unit is uppercase or lowercase. The search unit searches the foreign language dictionary for a word corresponding to an initially input string of letters in the order in which the letters are input. The converter converts a notation of each letter included in the word retrieved by the search unit according to the notation information. The output unit outputs the word in a notation converted by the converter as a candidate word.03-05-2009
20090063134Media Content Assessment and Control Systems - Computer implemented methods and computing systems wherein relationships of words or phrases within a textual corpus are assessed via frequencies of occurrence of particular words or phrases and via frequencies of co-occurrence of particular pairs of words or phrases within defined tracts of text from within the textual corpus.03-05-2009
20120173228Definitional Method to Increase Precision and Clarity of Information - In order to know precisely and clearly what words or terms mean, the DMTIPCI definitional method by an algorithm implementing the DMTIPCI method's unique and novel steps in a computer microprocessor of iteratively deconstructing all usage predicate words of all words in any language to their primary words and storing said words with their deconstructed predicates and primary words in computer repositories and/or in printed form—the DMTIPCI Dictionary. Primary words as herein defined are words or terms that have no non-tautological words in their predicate(s). All words of any language are arranged under their primary words by another DMTIPCI algorithm implemented in a computer microprocessor creating a DMTIPCI Primary Word Dictionary. Other embodiments are described and shown.07-05-2012
20090048824ACOUSTIC SIGNAL PROCESSING METHOD AND APPARATUS - An audible signal process method includes preparing, in at least one dictionary, a plurality of weighting factors each learned to optimize evaluation function established by a weighted learning audible signal and a target speech signal corresponding to the learning audible signal and used for weighting, estimating a noise component included in the input audible signal, calculating a feature quantity depending upon the noise component of the input audible signal, selecting a weighting factor corresponding to the feature quantity from the dictionary, and weighting the input audible signal using the selected weighting factor to generate a processed output audible signal.02-19-2009
20110077937ELECTRONIC APPARATUS WITH DICTIONARY FUNCTION AND COMPUTER-READABLE MEDIUM - An electronic apparatus includes a storage which includes dictionary information, a conjugation chart database which stores conjugation charts for a language stored in the dictionary information so as to cause the charts to correspond to conjugation chart numbers, and a verb-verb conjugation chart correspondence table which stores the conjugation chart numbers so as to cause the numbers to correspond to the spellings of verbs, and a processor which causes to display letter strings stored in the dictionary information, accepts the specification of an arbitrary word from the letter strings displayed, when the specified word is a verb, refers to the verb-verb conjugation chart correspondence table and determines a conjugation chart number caused to correspond to the spelling of the specified verb, and reads a conjugation chart corresponding to the determined conjugation chart number from the conjugation charts stored in the conjugation chart database and displays the conjugation chart.03-31-2011
20120303359DICTIONARY CREATION DEVICE, WORD GATHERING METHOD AND RECORDING MEDIUM - When gathering words through a dictionary growth process, a dictionary growth unit (11-29-2012
20080243485METHOD, APPARATUS, SYSTEM, USER INTERFACE AND COMPUTER PROGRAM PRODUCT FOR USE WITH MANAGING CONTENT - A method for adding content to a dictionary for use with a communication terminal, including: parsing a media content for one or more expressions; extracting the expressions from the media content; and providing the expressions to the dictionary for subsequent retrieval.10-02-2008
20080243487HYBRID TEXT SEGMENTATION USING N-GRAMS AND LEXICAL INFORMATION - A hybrid n-gram/lexical analysis tokenization system including a lexicon and a hybrid tokenizer operative to perform both N-gram tokenization of a text and lexical analysis tokenization of a text using the lexicon, and to construct either of an index and a classifier from the results of both of the N-gram tokenization and the lexical analysis tokenization, where the hybrid tokenizer is implemented in at least one of computer hardware and computer software and is embodied within a computer-readable medium.10-02-2008
20080243486Apparatus and Method for Identifying Unknown Word Based on a Definition - An apparatus and method of identifying an unknown word based on a known meaning, or word identification system. The word identification system allows a user that knows the meaning of a word but cannot recall the word that corresponds to the known meaning to identify the word through a series of simple questions. Each of the questions elicits information known about the unknown word through the knowledge of the meaning of the word and its use in language. With each response, the number of words matching the definitional characteristics is reduced until a set of one or more probable words and definitions for the unknown word is finally presented to the user.10-02-2008
20080255826Dictionary data generating apparatus, character input apparatus, dictionary data generating method, and character input method - A dictionary data generating apparatus is disclosed. The apparatus includes: an acquiring part configured to acquire a current issue keyword from inputted information including a current issue keyword; and a generating part configured to generate current issue dictionary data for prediction conversion based on the current issue keyword acquired by the acquiring part.10-16-2008
20110119052Speech recognition dictionary creating support device, computer readable medium storing processing program, and processing method - A device extracts prosodic information including a power value from speech data and an utterance section including a period with a power value equal to or larger than a threshold, from the speech data, divides the utterance section into sections in each of which a power value is equal to or larger than another threshold, acquires phoneme sequence data for each divided speech data by phoneme recognition, generates clusters, each of which is a set of the classified phoneme sequence data, by clustering, calculates an evaluation value for each cluster, selects clusters for which the evaluation value is equal to or larger than a given value as candidate clusters, determines one of the phoneme sequence data from the phoneme sequence data constituting the cluster for each candidate cluster to be a representative phoneme sequence, and selects the divided speech data corresponding to the representative phoneme sequence as listening target speech data.05-19-2011
20090292530Method and system for grammar relaxation - The method and system for modifications of grammars presented in this invention apply to automatic speech recognition systems which take a spoken utterance as input and use a grammar to assign word sequence(s) and, possibly, one or more semantic interpretations to that utterance. One type of modification may take the form of reducing the importance of select grammar components based on the analysis of the occurrence of these components in the original grammar. Another type of modification may take the form of adding new grammar components to the grammar of some semantic interpretations based on the analysis of the occurrence of these components in the select set of other semantic interpretations. Both modifications can be carried out either automatically or offered for validation. Some benefits of the presented method and system are: reduced effort for building grammars, improvement of recognition accuracy, and automatic adaptation of dynamic grammars to the context.11-26-2009
20110191100LANGUAGE MODEL SCORE LOOK-AHEAD VALUE IMPARTING DEVICE, LANGUAGE MODEL SCORE LOOK-AHEAD VALUE IMPARTING METHOD, AND PROGRAM STORAGE MEDIUM - A speech recognition apparatus that performs frame synchronous beam search by using a language model score look-ahead value prevents the pruning of a correct answer hypothesis while suppressing an increase in the number of hypotheses. A language model score look-ahead value imparting device 08-04-2011
20110137642Word Detection - Methods, systems, and apparatus, including computer program products, in which data from web documents are partitioned into a training corpus and a development corpus are provided. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison.06-09-2011
20110119051Phonetic Variation Model Building Apparatus and Method and Phonetic Recognition System and Method Thereof - A phonetic variation model building apparatus, having a phoneme database for recording at least a standard phonetic model of a language and a plurality of non-standardized phonemes of the language is provided. A phonetic variation identifier identifies a plurality of phonetic variations between the non-standardized phonemes and the standard phonetic model. A phonetic transformation calculator calculates a plurality of coefficients of a phonetic transformation function based on the phonetic variations and the phonetic transformation function. A phonetic variation model generator generates at least a phonetic variation model based on the standard phonetic model, the phonetic transformation function and the coefficients thereof.05-19-2011
20090306969Systems and Methods for an Automated Personalized Dictionary Generator for Portable Devices - A system and method for automated dictionary population is provided to facilitate the entry of textual material in dictionaries for enhancing word prediction. The automated dictionary population system is useful in association with a mobile device including at least one dictionary which includes entries. The device receives a communication which is parsed and textual data extracted. The text is compared to the entries of the dictionaries to identify new words. Statistical information for the parsed words, including word usage frequency, recency, or likelihood of use, is generated. Profanities may be processed by identifying profanities, modifying the profanities, and asking the user to provide feedback. Phrases are identified by phrase markers. Lastly, the new words are stored in a supplementary word list as single words or by linking the words of the identified phrases to preserve any phrase relationships. Likewise, the statistical information may be stored.12-10-2009
20110307247METHOD AND SYSTEM FOR LEXICAL NAVIGATION OF ITEMS - A method and a system for lexical navigation of a corpus of items are provided. For example, the method may include generating a data structure in a non-transitory, computer readable medium. The data structure may include a number of items, a number of keywords, and a frequency that each of the keywords is associated with each of the items. The method may further include generating a top-level lexical cloud that includes a subset of the keywords. Each keyword in the subset may be associated with a size that is proportional to its frequency of occurrence. Finally, the method may include generating a plurality of lower-level lexical clouds by eliminating any one of the plurality of items not associated with a particular one of the keywords from the data structure, and generating the lower level lexical cloud as a second subset of the plurality of keywords that remain in the data structure.12-15-2011
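The drill-down described in the abstract above can be sketched in a few lines. This is an illustration only, assuming the data structure is a plain dictionary from item name to per-keyword frequencies; the names lexical_cloud and drill_down, the top_n cutoff, and the sample items are all hypothetical. The summed frequency stands in for the display size of each keyword.

```python
from collections import Counter

def lexical_cloud(items, top_n=3):
    """Return the top_n keywords with their summed frequency (the cloud 'size')."""
    totals = Counter()
    for keyword_counts in items.values():
        totals.update(keyword_counts)
    return dict(totals.most_common(top_n))

def drill_down(items, keyword):
    """Eliminate items that are not associated with the chosen keyword."""
    return {name: kws for name, kws in items.items() if keyword in kws}

# Hypothetical corpus: item name -> {keyword: frequency}.
items = {
    "a": {"jazz": 3, "piano": 1},
    "b": {"jazz": 1, "guitar": 2},
    "c": {"rock": 4, "guitar": 1},
}
top_cloud = lexical_cloud(items)
lower_cloud = lexical_cloud(drill_down(items, "jazz"))
```

Selecting "jazz" in the top-level cloud removes item "c", so the lower-level cloud is built only from the keywords that remain in items "a" and "b".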
20120209595DICTIONARY INFORMATION DISPLAY DEVICE AND DICTIONARY INFORMATION DISPLAY METHOD - A dictionary information display device displays explanatory information on a desired search word retrieved from dictionary data on a main display module, creates list data about examples, descriptions, and phrases incidental to each word meaning included in the explanatory information, and displays, on an auxiliary display module, a supplementary information list screen that arranges the beginning parts of the individual examples, descriptions, and phrases line by line in headline form. When the user selects any one of the examples, descriptions, and phrases on the list screen displayed by the auxiliary display module, the entire contents of the selected example or description are displayed by the auxiliary display module.08-16-2012
20080262833Document Processing Device and Document Processing Method - A technique is provided for adding an annotation to a document described in a markup language. Upon the document processing apparatus 10-23-2008
20110066425SYSTEMS, METHODS, AND APPARATUS FOR AUTOMATED MAPPING AND INTEGRATED WORKFLOW OF A CONTROLLED MEDICAL VOCABULARY - Systems, methods, and apparatus provide clinical terminology services including a controlled medical vocabulary supplemented by local clinical content. An example method includes accessing an initial controlled medical vocabulary including at least one external terminology via a vocabulary management server; processing local clinical content including unstructured local clinical content provided via an importer framework; analyzing and extracting the unstructured local clinical content using a text analyzer and extraction tool to generate one or more proposed terms; identifying one or more synonyms for the one or more proposed terms and placing the one or more synonyms into a queue to be added to the controlled medical vocabulary; reviewing the one or more synonyms; and adding one or more synonyms to the controlled medical vocabulary with placement and relationship based on analyzing unstructured local clinical content to automatically map between the at least one external terminology and the local clinical content.03-17-2011
20090132237Orthogonal classification of words in multichannel speech recognizers - A computerized method for distributing a target vocabulary among multiple dictionaries. The vocabulary includes words for use in a speech recognition application installed in a computer system. Each word of the target vocabulary is found in only one of the dictionaries. The words are first categorized based on phonetic length, and distributed into multiple groups each of equal phonetic length. The first groups are secondly categorized based on combinations of vowel sounds. The words of the first groups are placed into second groups accordingly based on having identical vowel sounds. The second groups are thirdly categorized into third groups based on the consonants of the words of the second groups and placement of the consonants relative to the vowel sounds. The words within each of the third groups are compared in pairs for phonetic distance and the words of minimal pairwise phonetic distance between them are placed in fourth groups. The words of each of the fourth groups are distributed into the multiple dictionaries, preferably with no more than one member per fourth group distributed into each of the dictionaries. The multiple dictionaries are preferably mutually orthogonal, that is, each of the dictionaries includes words of maximal phonetic distance from each other.05-21-2009
20100250240SYSTEM AND METHOD FOR TRAINING AN ACOUSTIC MODEL WITH REDUCED FEATURE SPACE VARIATION - Feature space variation associated with specific text elements is reduced by training an acoustic model with a phoneme set, dictionary and transcription set configured to better distinguish the specific text elements and at least some specific phonemes associated therewith. The specific text elements can include the most frequently occurring text elements from a text data set, which can include text data beyond the transcriptions of a training data set. The specific text elements can be identified using a text element distribution table sorted by occurrence within the text data set. Specific phonemes can be limited to consonant phonemes to improve speed and accuracy.09-30-2010
20100250241Non-dialogue-based Learning Apparatus and Dialogue-based Learning Apparatus - The invention provides a dialogue-based learning apparatus through dialogue with users comprising: a speech input unit (09-30-2010
20120084077BUILDING AND CONTRACTING A LINGUISTIC DICTIONARY - A method for building and contracting a linguistic dictionary, the linguistic dictionary comprising a list of surface forms and a list of normalized forms, each normalized form being associated with a surface form, the method comprising the steps of: comparing each character of a surface form with each character of the surface form's normalized form; in response to the comparing step, determining an edit operation for each character compared; and generating a transform code from the set of the edit operations in order to transform the surface form to its normalized form.04-05-2012
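One way to picture the transform code in the abstract above is as a short list of edits that rewrite a surface form into its normalized form. The sketch below is an assumption, not the claimed method: it uses Python's difflib.SequenceMatcher to find the edits instead of a character-by-character comparison, and it measures offsets from the end of the word so that one code can be reused for other words with the same ending. The function names are hypothetical.

```python
from difflib import SequenceMatcher

def make_transform_code(surface, normalized):
    """Return edits as (start_from_end, end_from_end, replacement) tuples."""
    matcher = SequenceMatcher(a=surface, b=normalized, autojunk=False)
    code = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged characters need no edit
        # Offsets are counted from the end of the surface form.
        code.append((len(surface) - i1, len(surface) - i2, normalized[j1:j2]))
    return tuple(code)

def apply_transform_code(surface, code):
    """Apply the edits right to left so earlier offsets stay valid."""
    result = surface
    for start_from_end, end_from_end, replacement in reversed(code):
        start = len(surface) - start_from_end
        end = len(surface) - end_from_end
        result = result[:start] + replacement + result[end:]
    return result

code = make_transform_code("studies", "study")
restored = apply_transform_code("studies", code)
reused = apply_transform_code("flies", code)
```

Storing one short code per surface form, and sharing identical codes between entries, is a plausible way such a dictionary could be contracted: here the code learned from "studies" also maps "flies" to "fly".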
20100082333LEMMATIZING, STEMMING, AND QUERY EXPANSION METHOD AND SYSTEM - A method of stemming text and system therefor are described. The method comprises removing stop words from a document based on at least one stop word entry in an array of stop words and flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun; adding flagged nouns to a noun dictionary; flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb; adding flagged verbs to a verb dictionary; searching the document for nouns and verbs based on the flagged nouns and the flagged verbs; removing remaining stop words subsequent to searching the document; applying light stemming on the flagged nouns; applying a root-based stemming on the flagged verbs; and storing the stemmed document.04-01-2010
20100174528CREATING A TERMS DICTIONARY WITH NAMED ENTITIES OR TERMINOLOGIES INCLUDED IN TEXT DATA - A computer system of an embodiment of the disclosure can be used to automatically create or populate a terms dictionary using a set of computing units. A morphological analysis unit can acquire token sequence data by performing morphological analysis for the text data. A category distinguishing unit can distinguish tokens of the token sequence data by using a category dictionary to extract uncategorized words. An uncategorized-word comparing unit can compare each of the extracted uncategorized words with an uncategorized-word comparison rule to extract an uncategorized word matching the uncategorized-word comparison rule as a registration candidate word. A token-sequence comparing unit can compare a token sequence of the token sequence data with a token-sequence comparison rule to extract a token sequence matching the token-sequence comparison rule as registration candidate words. A permission unit can permit a user to select whether to register the registration candidate words in the category dictionary.07-08-2010
20120179455LANGUAGE LEARNING APPARATUS AND METHOD USING GROWING PERSONAL WORD DATABASE SYSTEM - Disclosed herein is a language learning apparatus and method using a growing personal word DB system, which construct an individual person-based word DB in which known words and unknown words are stored separately. The language learning apparatus includes a word extraction unit for extracting words included in learning content and generating a word list. A word analysis unit sets learning levels of words included in the word list based on a level-based word DB. A control unit generates an individual person-based word DB in which classification into known words and unknown words is performed and known words and unknown words are stored separately based on a learning level of a learner and the level-based word DB, and performs control such that the words included in the word list are classified into known words and unknown words and are stored separately based on the set learning level.07-12-2012
20100010806Storage system for symptom information of Traditional Chinese Medicine (TCM) and method for storing TCM symptom information - A storage system is disclosed for symptom information of Traditional Chinese Medicine (TCM). In at least one embodiment, the system includes a processing module, a TCM standard data module and a storage module, wherein: a TCM specialized glossary and the correlated attributes of the TCM specialized glossary are stored in said TCM standard data module; the processing module is used for dividing the TCM symptom information into at least one phrase, matching the phrase(s) on the basis of the TCM specialized glossary so as to obtain terms belonging to said TCM specialized glossary, for establishing correlated relationships of terms in the phrase(s) according to the correlation attributes in the TCM specialized glossary, and for storing the terms for which a correlated relationship has been established as structured data in said storage module. At least one embodiment of the present invention further discloses a method for storing TCM symptom information. By implementing at least one embodiment of the present invention, symptom information recorded in any language customary to doctors can be accepted, thus reducing the complexity in recording symptom information, thereby facilitating the input thereof by the doctor.01-14-2010
20090076800Dual Cross-Media Relevance Model for Image Annotation - A dual cross-media relevance model (DCMRM) is used for automatic image annotation. In contrast to the traditional relevance models which calculate the joint probability of words and images over a training image database, the DCMRM model estimates the joint probability by calculating the expectation over words in a predefined lexicon. The DCMRM model may be advantageous because a predefined lexicon potentially has better behavior than a training image database. The DCMRM model also takes advantage of content-based techniques and image search techniques to define the word-to-image and word-to-word relations involved in image annotation. Both relations can be estimated by using image search techniques on the web data as well as available training data.03-19-2009
20090018822METHODS AND APPARATUS FOR BUSINESS RULES AUTHORING AND OPERATION EMPLOYING A CUSTOMIZABLE VOCABULARY - In one embodiment, a method comprises creating at least one individualized language resource, creating at least one individualized language rule referencing at least one of said individualized language resource, and transforming said at least one individualized language rule into computer executable format.01-15-2009
20110131038EXCEPTION DICTIONARY CREATING UNIT, EXCEPTION DICTIONARY CREATING METHOD, AND PROGRAM THEREFOR, AS WELL AS SPEECH RECOGNITION UNIT AND SPEECH RECOGNITION METHOD - An exception dictionary creating device, an exception dictionary creating method, and a program therefor allowing creating an exception dictionary are provided for affording high speech recognition performance while reducing the size of the exception dictionary, as well as a speech recognition device and a speech recognition method capable of recognizing a speech with high accuracy of recognition by using the exception dictionary. To achieve this, a text-to-phonetic symbol converting unit (06-02-2011
20110131037Vocabulary Dictionary Recompile for In-Vehicle Audio System - An in-vehicle audio system and methods are provided. A respective word or a respective phrase may be associated with each item of audio content stored in the in-vehicle audio system. The in-vehicle audio system may perform an action with respect to one of the stored items of audio content in response to a spoken command, which may include the respective word or the respective phrase associated with the one of the stored items. When audio content is to be added to the in-vehicle audio system, phonetics related to the audio content may be generated and added to a vocabulary dictionary during a compile process. When stored audio content is to be deleted from the in-vehicle audio system, phonetics related to the stored audio content to be deleted may be eliminated from the vocabulary dictionary during the compile process, which, in some embodiments, may be performed during a shutdown process.06-02-2011
20120323565METHOD AND APPARATUS FOR ANALYZING TEXT - An apparatus, a method, an applications programming interface and a computer program product for analyzing text. The text is transmitted between users of a text based network mediated system. The text is analyzed by intended word filter rule processing elements to determine a presence of a variation word of an intended word in the text. A method for creating the intended word filter rule processing elements is also disclosed.12-20-2012
20110238413DOMAIN DICTIONARY CREATION - Methods, systems, and apparatus, including computer program products, to identify topic words in a collection of documents that includes topic documents related to a topic are disclosed. A reference topic word divergence value based on a document collection and the topic document collection is determined. A candidate topic word divergence value for a candidate topic word is determined based on the document collection and the topic document collection. The candidate topic word is determined to be a topic word if the candidate topic word divergence value is greater than the reference topic word divergence value.09-29-2011
20100131267SPEECH SAMPLES LIBRARY FOR TEXT-TO-SPEECH AND METHODS AND APPARATUS FOR GENERATING AND USING SAME - A method of recording speech for use in a speech samples library. In an exemplary embodiment, the method comprises recording a speaker pronouncing a phoneme with musical parameters characterizing pronunciation of another phoneme by the same or another speaker. For example, in one embodiment the method comprises: providing a recording of a first speaker pronouncing a first phoneme in a phonemic context. The pronunciation is characterized by some musical parameters. A second reader, who may be the same as the first reader, is then recorded pronouncing a second phoneme (different from the first phoneme) with the musical parameters that characterize pronunciation of the first phoneme by the first speaker. The recordings made by the second reader are used for compiling a speech samples library.05-27-2010
20110161073SYSTEM AND METHOD OF DISAMBIGUATING AND SELECTING DICTIONARY DEFINITIONS FOR ONE OR MORE TARGET WORDS - Systems and methods for automatically selecting dictionary definitions for one or more target words include receiving electronic signals from an input device indicating one or more target words for which a dictionary definition is desired. The target word(s) and selected surrounding words defining an observation sequence are subjected to a part of speech tagging algorithm to electronically determine one or more most likely part of speech tags for the target word(s). Potential relations are examined between the target word(s) and selected surrounding keywords. The target word(s), the part of speech tag(s) and the discovered keyword relations are then used to map the target word(s) to one or more specific dictionary definitions. The dictionary definitions are then provided as electronic output, such as by audio and/or visual display, to a user.06-30-2011
20080243488AUTOMATED GLOSSARY CREATION - A method and device for creating a glossary includes a processor operable for executing computer instructions for identifying, in at least one information source, at least one glossary item identifying a part or a component, determining at least one glossary item form as a canonical form, defining, by using the canonical form, at least one syntactic structure, that includes one of the at least one identified glossary items, for each of at least one semantic classes, and searching a second information source for the at least one syntactic structure of the semantic class.10-02-2008
20080228469ROLLUP FUNCTIONS FOR EFFICIENT STORAGE, PRESENTATION, AND ANALYSIS OF DATA - Methods of organizing a series of sibling data entities in a digital computer are provided for preserving sibling ranking information associated with the sibling data entities and for attaching the sibling ranking information to a joint parent of the sibling data entities to facilitate on-demand generation of ranked parent candidates. A rollup function of the present invention builds a rollup matrix (09-18-2008
20130179154SPEECH RECOGNITION APPARATUS - A speech recognition apparatus includes a first recognition dictionary, a speech input unit, a speech recognition unit, a speech transmission unit, a recognition result receipt unit, and a control unit. The speech recognition unit recognizes a speech based on a first recognition dictionary, and outputs a first recognition result. A server recognizes the speech based on a second recognition dictionary, and outputs a second recognition result. The control unit determines a likelihood level of a selected candidate obtained based on the first recognition result, and accordingly controls an output unit to output at least one of the first recognition result and the second recognition result. When the likelihood level of the selected candidate is equal to or higher than a threshold level, the control unit controls the output unit to output the first recognition result irrespective of whether the second recognition result is received from the server.07-11-2013
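The control rule in the SPEECH RECOGNITION APPARATUS abstract can be sketched as a small decision function. The abstract only states what happens when the local likelihood meets the threshold. The behaviour below the threshold (prefer the server result if it has arrived, otherwise fall back to the local one), the `Recognition` type, and the default threshold of 0.8 are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recognition:
    text: str          # recognized text
    likelihood: float  # likelihood level of the selected candidate


def choose_output(local: Recognition,
                  server: Optional[Recognition],
                  threshold: float = 0.8) -> str:
    """Pick which recognition result to output.

    `server` is None when the server result has not been received.
    """
    # Stated in the abstract: a confident local result is output
    # regardless of whether the server result has been received.
    if local.likelihood >= threshold:
        return local.text
    # Assumed fallback: use the server result when it is available.
    if server is not None:
        return server.text
    # Assumed fallback: nothing better is available, keep the local result.
    return local.text
```

With this rule a confident on-device result never waits on the network, which appears to be the point of comparing the likelihood level against a threshold before consulting the server result.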
20130179155SYSTEM AND METHOD FOR ENHANCED LOOKUP IN AN ONLINE DICTIONARY - A system and method predictively generates words based on a user input, according to a frequency of lookup of each of the generated words. The system and method also allows for a user to add predictively generated words to a word list that assists in the facilitation of word and vocabulary comprehension for a user. Words in the online dictionary are grouped in word families where a user can navigate between different forms of a root word.07-11-2013
20080215314METHOD FOR ADAPTING A K-MEANS TEXT CLUSTERING TO EMERGING DATA - A method and structure for clustering documents in datasets which include clustering first documents in a first dataset to produce first document classes, creating centroid seeds based on the first document classes, and clustering second documents in a second dataset using the centroid seeds, wherein the first dataset and the second dataset are related. The clustering of the first documents in the first dataset forms a first dictionary of most common words in the first dataset and generates a first vector space model by counting, for each word in the first dictionary, a number of the first documents in which the word occurs, and clusters the first documents in the first dataset based on the first vector space model, and further generates a second vector space model by counting, for each word in the first dictionary, a number of the second documents in which the word occurs. Creation of the centroid seeds includes classifying the second vector space model using the first document classes to produce a classified second vector space model and determining a mean of vectors in each class in the classified second vector space model, the mean includes the centroid seeds.09-04-2008
20130090921PRONUNCIATION LEARNING FROM USER CORRECTION - Systems and methods are described for adding entries to a custom lexicon used by a speech recognition engine of a speech interface in response to user interaction with the speech interface. In one embodiment, a speech signal is obtained when the user speaks a name of a particular item to be selected from among a finite set of items. If a phonetic description of the speech signal is not recognized by the speech recognition engine, then the user is presented with a means for selecting the particular item from among the finite set of items by providing input in a manner that does not include speaking the name of the item. After the user has selected the particular item via the means for selecting, the phonetic description of the speech signal is stored in association with a text description of the particular item in the custom lexicon.04-11-2013
20130132073SYSTEMS AND METHODS OF BUILDING AND USING CUSTOM WORD LISTS - Standard word lists that are often used for such operations as predictive text, spell checking, and word completion are based on general linguistic data that might not accurately reflect actual text usage patterns of particular users. Systems and methods of building and using a custom word list for use in text operations on an electronic device are provided. A collection of text items associated with a user of the electronic device is scanned to identify words in the text items. A weighting is then assigned to each identified word, and the words and corresponding weightings are stored.05-23-2013
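The scan, weight, and store steps in the CUSTOM WORD LISTS abstract can be sketched as follows. The abstract does not specify how a weighting is assigned. Relative frequency is used here as an assumption, and the function names `build_custom_word_list` and `complete` are hypothetical.

```python
import re
from collections import Counter


def build_custom_word_list(text_items):
    """Scan a user's text items and map each word to a weighting.

    The weighting is the word's relative frequency across all items
    (an assumed scheme; the abstract leaves the weighting open).
    """
    counts = Counter()
    for item in text_items:
        counts.update(re.findall(r"[a-z']+", item.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def complete(prefix, word_list, limit=3):
    """Suggest completions for a prefix, highest weighting first."""
    prefix = prefix.lower()
    matches = [w for w in word_list if w.startswith(prefix)]
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(matches, key=lambda w: (-word_list[w], w))[:limit]
```

A word completion feature could then rank candidates by the stored weightings instead of by general linguistic frequency, which is the gap the abstract describes.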
20080201134COMPUTER-READABLE RECORD MEDIUM IN WHICH NAMED ENTITY EXTRACTION PROGRAM IS RECORDED, NAMED ENTITY EXTRACTION METHOD AND NAMED ENTITY EXTRACTION APPARATUS - A named entity extraction apparatus includes an extraction result acquisition unit for acquiring a named entity extraction result obtained as a result of a named entity extraction process; and a lexicon information creation unit for creating lexicon information which is utilized as clues in extracting named entities from text data, on the basis of the named entity extraction result acquired by said extraction result acquisition unit.08-21-2008
20080201133SYSTEM AND METHOD FOR SEMANTIC CATEGORIZATION - There is disclosed a system and method for automatically performing semantic categorization. In one embodiment at least one text description pertaining to a category set is accepted along with words that are anticipated to be uttered by a user pertaining to that category set; lexical chaining confidence score is attached to each pair matched between the anticipated words and the accepted text description. These confidence scores are used subsequently by a categorization circuit that accepts a text phrase utterance from an input source along with a category set pertaining to the accepted utterance. The categorization circuit, in one embodiment, creates word pairs matched between the accepted text phrase utterance and the accepted category set. From these word scores, the category pertaining to the utterance is determined based, at least in part, on the assigned lexical chaining confidence scores as previously determined.08-21-2008
20090299732CONTEXTUAL DICTIONARY INTERPRETATION FOR TRANSLATION - A method and apparatus provides for interpreting a foreign word or phrase using a contextual likelihood model and a dictionary. An apparatus may translate foreign language text by taking context into account and displaying the translation with alternatives on an adaptive user interface display. The contextual likelihood model may be interlaced with a dictionary. In an embodiment, the interaction between the contextual likelihood model and a dictionary may result in an adaptive adjustment of the meanings or the order of meanings displayed. The order of meanings displayed may be representative of the calculated likelihoods.12-03-2009
20130158987SYSTEM AND METHOD FOR DYNAMICALLY GENERATING GROUP-RELATED PERSONALIZED DICTIONARIES - A user device communicates with a network server that has access to one or more knowledge sources. Based on a current situational context for a user of the device, the network server dynamically generates a group-related personalized dictionary using information retrieved from the knowledge sources and provides the dictionary to the user device. Applications executing on the user device can then use the dictionary to suggest or predict words, terms, or symbols to the user in response to receiving user input.06-20-2013
20130197901SYSTEMS AND METHODS FOR AN AUTOMATED PERSONALIZED DICTIONARY GENERATOR FOR PORTABLE DEVICES - A system and method for automated dictionary population is provided to facilitate the entry of textual material in dictionaries for enhancing word prediction. The automated dictionary population system is useful in association with a mobile device including at least one dictionary which includes entries. The device receives a communication which is parsed and textual data extracted. The text is compared to the entries of the dictionaries to identify new words. Statistical information for the parsed words, including word usage frequency, recency, or likelihood of use, is generated. Profanities may be processed by identifying profanities, modifying the profanities, and asking the user to provide feedback. Phrases are identified by phrase markers. Lastly, the new words are stored in a supplementary word list as single words or by linking the words of the identified phrases to preserve any phrase relationships. Likewise, the statistical information may be stored.08-01-2013
20120041758SYNCHRONIZATION OF AN INPUT TEXT OF A SPEECH WITH A RECORDING OF THE SPEECH - A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data including a ratio between the respective pronunciation times of words included in the received text in the generated synthetic speech is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time to reproduce each erroneously recognized word. The association is outputted in a recording medium and/or displayed on a display device.02-16-2012
