
Jianfeng Gao, Kirkland US

Jianfeng Gao, Kirkland, WA US

Patent application number | Description | Published
20090070095 | MINING BILINGUAL DICTIONARIES FROM MONOLINGUAL WEB PAGES - Systems and methods for identifying translation pairs from web pages are provided. One disclosed method includes receiving monolingual web page data of a source language, and processing the web page data by detecting the occurrence of a predefined pattern in the web page data, and extracting a plurality of translation pair candidates. Each of the translation pair candidates may include a source language string and target language string. The method may further include determining whether each translation pair candidate is a valid transliteration. The method may also include, for each translation pair that is determined not to be a valid transliteration, determining whether each translation pair candidate is a valid translation. The method may further include adding each translation pair that is determined to be a valid translation or transliteration to a dictionary. | 03-12-2009
20090106173 | LIMITED-MEMORY QUASI-NEWTON OPTIMIZATION ALGORITHM FOR L1-REGULARIZED OBJECTIVES - An algorithm that employs modified methods developed for optimizing differential functions but which can also handle the special non-differentiabilities that occur with the L | 04-23-2009
20090125501 | RANKER SELECTION FOR STATISTICAL NATURAL LANGUAGE PROCESSING - Systems and methods for selecting a ranker for statistical natural language processing are provided. One disclosed system includes a computer program configured to be executed on a computing device, the computer program comprising a data store including reference performance data for a plurality of candidate rankers, the reference performance data being calculated based on a processing of test data by each of the plurality of candidate rankers. The system may further include a ranker selector configured to receive a statistical natural language processing task and a performance target, and determine a selected ranker from the plurality of candidate rankers based on the statistical natural language processing task, the performance target, and the reference performance data. | 05-14-2009
20090240486 | HMM ALIGNMENT FOR COMBINING TRANSLATION SYSTEMS - A computing system configured to produce an optimized translation hypothesis of text input into the computing system. The computing system includes a plurality of translation machines. Each of the translation machines is configured to produce their own translation hypothesis from the same text. An optimization machine is connected to the plurality of translation machines. The optimization machine is configured to receive the translation hypotheses from the translation machines. The optimization machine is further configured to align, word-to-word, the hypotheses in the plurality of hypotheses by using a hidden Markov model. | 09-24-2009
20090276414 | RANKING MODEL ADAPTATION FOR SEARCHING - Search results provided by a search engine (e.g., for the Internet) are improved and/or made more accurate by addressing the limited availability of human labeled training data for certain domains (e.g., languages other than English, within certain date ranges, corresponding to queries over a certain length, etc.). More particularly, a ranking model trained on in-domain data, for which a small amount of human labeled training data (e.g., query/URL pairs) is available (e.g., languages other than English) is adjusted based upon out-domain data, for which a large amount of human labeled training data (e.g., query/URL pairs) is available (e.g., English). Thus, even though the resulting adapted in-domain ranking model is used in the context of in-domain data (e.g., non-English) to provide search results, the search results are improved because they are influenced by an abundance of, albeit out-domain, human labeled training data. | 11-05-2009
20090326916 | UNSUPERVISED CHINESE WORD SEGMENTATION FOR STATISTICAL MACHINE TRANSLATION - Described is using a generative model in processing an unsegmented sentence into a segmented sentence. A segmenter includes the generative model, which given an unsegmented sentence (e.g., in Chinese) provides candidate segmented sentences to a probability-based decoder that selects the segmented sentence. For example, the segmented (e.g., Chinese-language) sentence may be provided to a statistical machine translator that outputs a translated (e.g., English-language) sentence. The generative model may include a word sub-model that generates hidden words using a word model, a spelling sub-model that generates characters from the hidden words, and an alignment sub-model that generates translated words and alignment data from the characters. The word sub-model may correspond to a unigram model having words and associated frequency data therein, and the alignment sub-model may correspond to a word aligned corpus having source sentence, translated target sentence pairings therein. Training is also described. | 12-31-2009
20100082510 | TRAINING A SEARCH RESULT RANKER WITH AUTOMATICALLY-GENERATED SAMPLES - A search result ranker may be trained with automatically-generated samples. In an example embodiment, user interests are inferred from user interactions with search results for a particular query so as to determine respective relevance scores associated with respective query-identifier pairs of the search results. Query-identifier-relevance score triplets are formulated from the respective relevance scores associated with the respective query-identifier pairs. The query-identifier-relevance score triplets are submitted as training samples to a search result ranker. The search result ranker is trained as a learning machine with multiple training samples of the query-identifier-relevance score triplets. | 04-01-2010
20100082582 | COMBINING LOG-BASED RANKERS AND DOCUMENT-BASED RANKERS FOR SEARCHING - Log-based rankers and document-based rankers may be combined for searching. In an example embodiment, there is a method for combining rankers to perform a search operation. A count of query instances in log data is ascertained based on a query. A search for the query is performed to produce a set of search results. The set of search results is ranked by relevance score with a document-based ranker and a log-based ranker using a weighting factor that is adapted responsive to the count of the query instances in the log data. | 04-01-2010
20100318531 | SMOOTHING CLICKTHROUGH DATA FOR WEB SEARCH RANKING - Described is a technology for using clickthrough data (e.g., based on data of a query log) in learning a ranking model that may be used in online ranking of search results. Clickthrough data, which is typically sparse (because many documents are often not clicked or rarely clicked), is processed/smoothed into smoothed clickthrough streams. The processing includes determining similar queries for a document with incomplete (insufficient) clickthrough data to provide expanded clickthrough data for that document, and/or by estimating at least one clickthrough feature for a document when that document has missing (e.g., no) clickthrough data. Similar queries may be determined by random walk clustering and/or session-based query analysis. Features extracted from the clickthrough streams may be used to provide a ranking model which may then be used in online ranking of documents that are located with respect to a query. | 12-16-2010

Patent applications by Jianfeng Gao, Kirkland, WA US
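
The abstract for application 20100082582 describes ranking search results with a document-based ranker and a log-based ranker, blended by a weighting factor that adapts to how often the query appears in log data. The listing does not give the weighting formula, so the following is only an illustrative sketch under stated assumptions: the saturating weight count / (count + k), the constant k, the field names doc_score and log_score, and all function names are hypothetical and are not taken from the patent application.

```python
from typing import Dict, List


def blend_weight(query_count: int, k: float = 10.0) -> float:
    """Weight placed on the log-based ranker.

    ASSUMPTION: a simple saturating form, count / (count + k). The abstract
    only says the weight is adapted in response to the query count.
    """
    return query_count / (query_count + k)


def combined_score(doc_score: float, log_score: float,
                   query_count: int, k: float = 10.0) -> float:
    """Linear blend of the two rankers' relevance scores (hypothetical)."""
    w = blend_weight(query_count, k)
    return w * log_score + (1.0 - w) * doc_score


def rank(results: List[Dict], query_count: int, k: float = 10.0) -> List[Dict]:
    """Sort results by the blended score, highest first."""
    return sorted(
        results,
        key=lambda r: combined_score(r["doc_score"], r["log_score"],
                                     query_count, k),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up scores for two results, purely for illustration.
    results = [
        {"id": "a", "doc_score": 0.9, "log_score": 0.1},
        {"id": "b", "doc_score": 0.2, "log_score": 0.8},
    ]
    # A query never seen in the logs relies on the document-based ranker.
    print([r["id"] for r in rank(results, query_count=0)])
    # A frequently logged query leans on the log-based ranker instead.
    print([r["id"] for r in rank(results, query_count=1000)])
```

With this assumed weighting, a query absent from the logs is ranked purely by document-based scores, and the log-based scores dominate as the query count grows.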