Jun Wu, Saratoga US

Jun Wu, Saratoga, CA US

Patent application number | Description | Published
20090055168 | Word Detection - Methods, systems, and apparatus, including computer program products, in which data from web documents are partitioned into a training corpus and a development corpus are provided. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison. | 02-26-2009
20090055381 | Domain Dictionary Creation - Methods, systems, and apparatus, including computer program products, to identify topic words in a document corpus that includes topic documents related to a topic are disclosed. A reference topic word divergence value based on the document corpus and the topic document corpus is determined. A candidate topic word divergence value for a candidate topic word is determined based on the document corpus and the topic document corpus. The candidate topic word is determined to be a topic word if the candidate topic word divergence value is greater than the reference topic word divergence value. | 02-26-2009
20090070097 | USER INPUT CLASSIFICATION - Systems and methods of classifying user input are disclosed. The user input can be, for example, in the form of Roman characters. An ambiguous word (e.g., a word that is a non-pinyin word written in Roman characters and a valid pinyin word) can be identified in the user input. Contextual words (e.g., words adjacent to the ambiguous word) are classified as a pinyin context or a non-pinyin context. The ambiguous word is classified based on the context of the contextual words. | 03-12-2009
20090077037 | SUGGESTING ALTERNATIVE QUERIES IN QUERY RESULTS - Methods, systems, and apparatus, including computer program products, for suggesting alternative queries based on original query search results. In one aspect, a method includes receiving search results for a first query, where each search result refers to a respective resource and includes a snippet of content from the respective resource, receiving one or more suggested second queries, for each of the suggested second queries: selecting a set of words in one of the snippets to represent the suggested second query, associating the suggested second query with the set so that a user can interact with a word in the set to invoke the suggested second query, and marking the set so as to indicate that the user can interact with a word in the set to invoke the suggested second query, and transmitting the search results including each marked set to a client device for presentation to the user. | 03-19-2009
20100005086 | RESOURCE LOCATOR SUGGESTIONS FROM INPUT CHARACTER SEQUENCE - Methods, systems, and apparatus, including computer program products, in which an input method editor receives Roman character inputs, identifies keywords for candidate sets of a non-Roman character, and identifies an associated resource location. Upon identifying an associated resource location, associating the resource location with the candidate set of non-Roman characters. | 01-07-2010
20100180199 | DETECTING NAME ENTITIES AND NEW WORDS - Various aspects can be implemented for detecting name entities and/or new words from input entries. In general, one aspect can be a method that includes receiving an input entry comprising a text string. The method also includes identifying segmentation information from the input entry. The method further includes generating a candidate text string from the text string of the input entry based on the segmentation information. Other implementations of this aspect includes corresponding systems, apparatus, and processing engines. | 07-15-2010
20100306139 | CJK NAME DETECTION - Aspects directed to name detection are provided. A method includes generating a raw name detection model using a collection of family names and an annotated corpus including a collection of n-grams, each n-gram having a corresponding probability of occurring. The method includes applying the raw name detection model to a collection of semi-structured data to form annotated semi-structured data identifying n-grams identifying names and n-grams not identifying names and applying the raw name detection model to a large unannotated corpus to form a large annotated corpus data identifying n-grams of the large unannotated corpus identifying names and n-grams not identifying names. The method includes generating a name detection model, including deriving a name model using the annotated semi-structured data identifying names and the large annotated corpus data identifying names, deriving a not-name model using the semi-structured data not identifying names, and deriving a language model using the large annotated corpus. | 12-02-2010
20110022952 | Determining Proximity Measurements Indicating Respective Intended Inputs - Determination of proximity measurements indicative of respective intended inputs are disclosed. User inputs are received, where each user input is one of a predefined plurality of inputs that each map to multiple characters in a language. Rates of user selections of candidates decoded from the user inputs into the language are received, where each of the candidates includes one or more characters in the language. User inputs for the candidates having low rates of selection as non-selected user inputs are identified. User inputs for the candidates having high rates of selection as intended inputs are identified. The intended user inputs to the non-selected user inputs are compared to identify one or more misspelled input and intended input pairs. A proximity measurement for each misspelled input and intended input pair is determined based on a ratio of the number of times corresponding candidates for the misspelled input were not selected to the number of times the misspelled input was entered. | 01-27-2011
20110137642 | Word Detection - Methods, systems, and apparatus, including computer program products, in which data from web documents are partitioned into a training corpus and a development corpus are provided. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison. | 06-09-2011
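
The ratio described in the abstract of application 20110022952 (non-selections of a misspelled input's candidates divided by the number of times that input was entered) can be illustrated with a minimal Python sketch. The function name, parameter names, and sample counts below are assumptions made for illustration; they do not come from the application, and the abstract says only that the measurement is "based on" this ratio.

```python
def proximity_measurement(times_not_selected: int, times_entered: int) -> float:
    """Illustrative ratio for one (misspelled input, intended input) pair.

    times_not_selected: how often the candidates decoded from the misspelled
        input were shown but not chosen (hypothetical count).
    times_entered: how often the misspelled input was typed (hypothetical count).
    """
    if times_entered <= 0:
        raise ValueError("times_entered must be positive")
    if times_not_selected < 0:
        raise ValueError("times_not_selected must be non-negative")
    return times_not_selected / times_entered


# Hypothetical log counts: the input was entered 4 times and its
# candidates were rejected on 3 of those occasions.
ratio = proximity_measurement(times_not_selected=3, times_entered=4)
```

Under this reading, a value near 1 means users almost never accepted the candidates offered for that input, which is consistent with the input being a misspelling of something else.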
