Ming Zhou, Beijing CN

Patent application number	Description	Published
20080208567	Web-based proofing and usage guidance - A system is disclosed for checking grammar and usage using a flexible portfolio of different mechanisms, and automatically providing a variety of different examples of standard usage, selected from analogous Web content. The system can be used for checking the grammar and usage in any application that involves natural language text, such as word processing, email, and presentation applications. The grammar and usage can be evaluated using several complementary evaluation modules, which may include one based on a trained classifier, one based on regular expressions, and one based on comparative searches of the Web or a local corpus. The evaluation modules can provide a set of suggested alternative segments with corrected grammar and usage. A followup, screened Web search based on the alternative segments, in context, may provide several different in-context examples of proper grammar and usage that the user can consider and select from.	08-28-2008
20080215543	Graph-based search leveraging sentiment analysis of user comments - A search system and method is provided. The method includes constructing a graph-based query that is indicative of a user's preference-levels for different features of a search item (a product, for example). The constructed graph-based query is executed by comparing the user's preference-levels for the different features of the product, which are graphically represented in the query, with information related to sentiments expressed by other users regarding the product. Information related to the sentiments expressed by other users regarding the product can include system-generated product performance graphs constructed from comments regarding the product obtained from the World Wide Web (or other network). Results returned and output upon execution of the graph-based query include system-generated product performance graphs that are similar to the user-submitted query.	09-04-2008
20080249764	Smart Sentiment Classifier for Product Reviews - A sentiment classifier is described. In one implementation, a system applies both full text and complex feature analyses to sentences of a product review. Each analysis is weighted prior to linear combination into a final sentiment prediction. A full text model and a complex features model can be trained separately offline to support online full text analysis and complex features analysis. Complex features include opinion indicators, negation patterns, sentiment-specific sections of the product review, user ratings, sequence of text chunks, and sentence types and lengths. A Conditional Random Field (CRF) framework provides enhanced sentiment classification for each segment of a complex sentence to enhance sentiment prediction.	10-09-2008
20090024613	CROSS-LINGUAL QUERY SUGGESTION - Cross-lingual query suggestion (CLQS) aims to suggest relevant queries in a target language for a given query in a source language. The cross-lingual query suggestion is improved by exploiting the query logs in the target language. The disclosed techniques include a method for learning and determining a similarity measure between two queries in different languages. The similarity measure is based on both translation information and monolingual similarity information, and in one embodiment uses both the query log itself and click-through information associated therewith. Monolingual and cross-lingual information such as word translation relations and word co-occurrence statistics may be used to estimate the cross-lingual query similarity with a discriminative model.	01-22-2009
20090083096	Handling product reviews - A method for handling product reviews can detect a first quality product review from a second quality product review. The first and second quality product reviews can be associated with a product. The first quality product review can be filtered. An opinion segment in the second quality product review can be identified and the polarity of the opinion segment can be determined. An opinion set can be generated with the opinion segment for a product feature. A score (or weight) of the segments in the opinion set can be aggregated for the product feature.	03-26-2009
20090106015	Statistical machine translation processing - A method of statistical machine translation (SMT) is provided. The method comprises generating reordering knowledge based on the syntax of a source language (SL) and a number of alignment matrices that map sample SL sentences with sample target language (TL) sentences. The method further comprises receiving a SL word string and parsing the SL word string into a parse tree that represents the syntactic properties of the SL word string. The nodes on the parse tree are reordered based on the generated reordering knowledge in order to provide reordered word strings. The method further comprises translating a number of reordered word strings to create a number of TL word strings, and identifying a statistically preferred TL word string as a preferred translation of the SL word string.	04-23-2009
20090119090	Principled Approach to Paraphrasing - A principled approach to paraphrasing analyzes input text and paraphrases at the atomic linguistic level, instead of analyzing the input text and paraphrases as a whole set at one time. The principled approach extracts atomic linguistic elements from the input text and identifies matching atomic paraphrasing elements to form candidate atomic paraphrasing pairs. A variety of atomic transformation types are identified to form atomic paraphrasing pairs. The candidate atomic paraphrasing pairs are evaluated using feature functions and a probability model. The principled approach scores a combination of multiple candidate atomic paraphrasing pairs using a score function which derives its value from the feature functions of the candidate atomic paraphrasing pairs. A combination which has a high score may be used for constructing a paraphrasing text.	05-07-2009
20090132530	WEB CONTENT MINING OF PAIR-BASED DATA - Described herein is technology for, among other things, mining pair-based data on the web. The technology involves an online pair-based data mining system as well as an offline SVM training system. By subjecting pair-based input data to the systems, one may grow a pool of pair-based data which share characteristics of the pair-based input data in a more efficient manner.	05-21-2009
20090157386	DIAGNOSTIC EVALUATION OF MACHINE TRANSLATORS - A system for evaluating translation quality of a machine translator is discussed. The system includes a bilingual data generator configured to intermittently access a wide area network and generate a bilingual corpus from data received from the wide area network. The system also includes an example extraction component configured to receive an ontology input indicative of a plurality of ontological categories of evaluation and to extract evaluation examples from the bilingual corpus based on the ontology input. The system further includes an evaluation component configured to evaluate translation results from translation by a machine translator of the evaluation examples and to score the translation results according to the ontological categories.	06-18-2009
20090222437	CROSS-LINGUAL SEARCH RE-RANKING - Cross-lingual search re-ranking is performed during a cross-lingual search in which a search query of a first language is used to retrieve two sets of documents, a first set in the first language, and a second set in a second language. The two sets of documents are each first ranked by the search engine separately. Cross-lingual search re-ranking then aims to provide a uniform re-ranking of both sets of documents combined. Cross-lingual search re-ranking uses a unified ranking function to compute the ranking order of each document of the first set and the second set of documents. The unified ranking function is constructed using generative probabilities based on multiple features, and can be learned by optimizing weight parameters using a training corpus. Ranking SVM algorithms may be used for the optimization.	09-03-2009
20100082511	JOINT RANKING MODEL FOR MULTILINGUAL WEB SEARCH - Described is a technology in which a classifier is built to rank documents of different languages found in a query based at least in part on similarity to other documents and the relevance of those other documents to the query. A joint ranking model, e.g., based upon a Boltzmann machine, is used to represent the content similarity among documents, and to help determine joint relevance probability for a set of documents. The relevant documents of one language are thus leveraged to improve the relevance estimation for documents of different languages. In one aspect, a hidden layer of units (neurons) represents clusters (corresponding to relevant topics) among the retrieved documents, with an output layer representing the relevant documents and their features, and edges representing a relationship between clusters and documents.	04-01-2010
20100114574	RETRIEVAL USING A GENERALIZED SENTENCE COLLOCATION - A method and system for identifying documents relevant to a query that specifies a part of speech is provided. A retrieval system receives from a user an input query that includes a word and a part of speech. Upon receiving an input query that includes a word and a part of speech, the retrieval system identifies documents with a sentence that includes that word collocated with a word that is used as that part of speech. The retrieval system displays to the user an indication of the identified documents.	05-06-2010
20100138211	ADAPTIVE WEB MINING OF BILINGUAL LEXICON - Embodiments for the adaptive mining of bilingual lexicon are disclosed. In accordance with one embodiment, the adaptive mining of bilingual lexicon includes retrieving one or more bilingual web pages, wherein each of the bilingual web pages includes a search term and one or more additional terms. The adaptive mining also includes forming a plurality of candidate translation pairs for each of the terms and extracting one or more translation layout patterns from the plurality of candidate translation pairs. The adaptive mining further includes deriving a term translation in a second language for the search term. The term translation is derived based on a hidden conditional random field (HCRF) model that includes the one or more candidate translations, the one or more translation layout patterns, and one or more additional features. The term translation is further stored in a lexicon repository.	06-03-2010
20100241416	ADAPTIVE PATTERN LEARNING FOR BILINGUAL DATA MINING - Embodiments for the adaptive learning of translation layout patterns to mine bilingual data are disclosed. In accordance with at least one embodiment, the adaptive learning of patterns to mine bilingual data includes processing a bilingual web page into a Document Object Model (DOM) tree. The embodiment further includes linking the bilingual snippet pairs of each node into a plurality of bilingual snippet pairs. The embodiment also includes determining one or more best fit candidate patterns based on the plurality of translation snippets via a Support Vector Machine classifier. The embodiment additionally includes mining one or more translation pairs from the bilingual web page using the one or more best fit candidate patterns. The translation pairs are further stored in a data storage. The one or more translation pairs include at least one of a term pair, a phrase pair, or a sentence pair.	09-23-2010
20100286978	ALIGNING HIERARCHIAL AND SEQUENTIAL DOCUMENT TREES TO IDENTIFY PARALLEL DATA - A set of candidate parallel pages is identified based on trigger words in one or more pages downloaded from a given network location (such as a website). A set of document trees representing each of the candidate pages are aligned to identify translationally parallel content and hyperlinks. The parallel content is further fed into a conventional sentence aligner for parallel sentences. The parallel hyperlinks usually refer to other parallel documents and lead to a recursive mining of parallel documents.	11-11-2010
20110213763	WEB CONTENT MINING OF PAIR-BASED DATA - Described herein is technology for, among other things, mining pair-based data on the web. The technology involves an online pair-based data mining system as well as an offline SVM training system. By subjecting pair-based input data to the systems, one may grow a pool of pair-based data which share characteristics of the pair-based input data in a more efficient manner.	09-01-2011
20110246173	Interactive Multilingual Word-Alignment Techniques - Techniques for interactively presenting word-alignments of multilingual translations and automatically improving those translations based upon user feedback are described herein. With one or more implementations of the techniques described herein, a word-alignment user-interface (UI) concurrently displays a pair of bilingual sentences, where one is a translation of the other, and interactively highlights linked (i.e., “word-aligned”) words and phrases of the pair. Other implementations of the techniques described herein offer an option for a user to provide feedback about the existing word-alignments or realign the words or phrases. In still other described implementations, word-alignment is automatically improved based upon that user feedback.	10-06-2011
20110257959	GENERATING CHINESE LANGUAGE BANNERS - Embodiments are disclosed for automatically generating a banner given a first scroll sentence and a second scroll sentence of a Chinese couplet. The first and/or second scroll sentence can be generated by an automatic computer system or by a human (e.g., manually generated and then provided as input to an automated banner generation system) or obtained from any source (e.g., a book) and provided as input. In one embodiment, an information retrieval process is utilized to identify banner candidates that best match the first and second scroll sentences. In one embodiment, candidate banners are automatically generated. In one embodiment, a ranking model is applied in order to rank banner candidates derived from the banner search and generation processes. One or more banners are then selected from the ranked banner candidates.	10-20-2011
20120022850	Statistical machine translation processing - A method of statistical machine translation (SMT) is provided. The method comprises generating reordering knowledge based on the syntax of a source language (SL) and a number of alignment matrices that map sample SL sentences with sample target language (TL) sentences. The method further comprises receiving a SL word string and parsing the SL word string into a parse tree that represents the syntactic properties of the SL word string. The nodes on the parse tree are reordered based on the generated reordering knowledge in order to provide reordered word strings. The method further comprises translating a number of reordered word strings to create a number of TL word strings, and identifying a statistically preferred TL word string as a preferred translation of the SL word string.	01-26-2012
20120297294	NETWORK SEARCH FOR WRITING ASSISTANCE - Architecture that utilizes web search implicitly to assist users in improving writing and associated productivity. The architecture extends the authoring experience of office suite applications, which can draw on a web search engine to offer contextual suggestions for revision, word auto-complete, and text prediction. Web-based research and reference to users is enabled as the user writes or revises text. Suggestions are made as to how to complete a phrase or sentence using data from networks such as the Internet or intranet, to how a user revises a word or phrase in an already-written sentence using data from the network, and to problems in writing style/writing rules. Paragraph analysis is performed to find improper language usage or errors. Prediction and revision suggestions are extracted from web search or enterprise search document summaries, and intent of the user to obtain word completion, revision assistance, and prediction suggestions is identified.	11-22-2012
20130056856	SEMICONDUCTOR DEVICE CAPABLE OF REDUCING PLASMA INDUCED DAMAGE AND FABRICATION METHOD THEREOF - A method of fabricating a semiconductor device having reduced plasma-induced damage includes providing a p-type semiconductor substrate. The p-type semiconductor substrate has a front surface including the semiconductor device and a back surface. The method further includes doping the back surface with an n-type dopant to form an n-type semiconductor region before forming metal interconnections on the front surface. The n-type semiconductor region and the p-type semiconductor substrate form a pn junction. The method also includes forming an insulation layer on an exposed surface of the n-type semiconductor region.	03-07-2013
20130062214	METHOD FOR MANUFACTURING SEMICONDUCTOR DEVICE - A method for manufacturing semiconductor devices comprises: applying a dual pulse power to the semiconductor device during metal electroplating a part of the semiconductor device and applying ultrasonic energy to said semiconductor device during the metal electroplating.	03-14-2013
20130144600	ADAPTIVE PATTERN LEARNING FOR BILINGUAL DATA MINING - Embodiments for the adaptive learning of translation layout patterns to mine bilingual data are disclosed. In accordance with at least one embodiment, the adaptive learning of patterns to mine bilingual data includes processing a bilingual web page into a plurality of bilingual snippet pairs. The embodiment also includes determining one or more best fit candidate patterns based on the plurality of translation snippets. The embodiment additionally includes mining one or more translation pairs from the bilingual web page using the one or more best fit candidate patterns. The translation pairs are further stored in a data storage. The one or more translation pairs include at least one of a term pair, a phrase pair, or a sentence pair.	06-06-2013
20130152000	SENTIMENT AWARE USER INTERFACE CUSTOMIZATION - The customization of an application user interface with a skin package based on context data that includes the emotional states of a user may strengthen the emotional attachment for the application by the user. The customization includes determining an emotional state of a user that is inputting content into an application. A skin package for the user interface of the application is selected based on the emotional state of the user. The selected skin package is further applied to the user interface of the application.	06-13-2013
20130159277	TARGET BASED INDEXING OF MICRO-BLOG CONTENT - Target based indexing of micro-blog content may include extracting, labeling, and indexing data contained in micro-blog entries. For example, by adapting natural language processing (NLP) technologies to a micro-blog entry, data is extracted in order to create an index. In one embodiment, a search engine may access the index in order to return results of a search query. In another embodiment, a user interface may display micro-blog entries categorically, allowing the user to access micro-blog entries by event, quote, opinion, or other category.	06-20-2013
20140006012	Learning-Based Processing of Natural Language Questions	01-02-2014
