Chin-Yew Lin, Beijing CN

Patent application number	Description	Published (MM-DD-YYYY)
20080249764	Smart Sentiment Classifier for Product Reviews - A sentiment classifier is described. In one implementation, a system applies both full text and complex feature analyses to sentences of a product review. Each analysis is weighted prior to linear combination into a final sentiment prediction. A full text model and a complex features model can be trained separately offline to support online full text analysis and complex features analysis. Complex features include opinion indicators, negation patterns, sentiment-specific sections of the product review, user ratings, sequence of text chunks, and sentence types and lengths. A Conditional Random Field (CRF) framework provides enhanced sentiment classification for each segment of a complex sentence to enhance sentiment prediction.	10-09-2008
20090253112	RECOMMENDING QUESTIONS TO USERS OF COMMUNITY QUESTION ANSWERING - The present system graphs topic terms in stored cQA questions and also converts a submitted question into a graph of topic terms. Topic terms that correspond to a question topic are delineated from topic terms that correspond to question focus. New questions are recommended to the user based on a comparison between the topics of the new questions and the topic of the submitted question as well as the focus of the new questions and the focus of the submitted question.	10-08-2009
20090259642	QUESTION TYPE-SENSITIVE ANSWER SUMMARIZATION - In a question answering system, the system identifies a type of question input by a user. The system then generates answer summaries that summarize answers to the input question in a format that is determined based on the type of question asked by the user. The answer summaries are output, in the corresponding format, in answer to the input question.	10-15-2009
20100030769	CLUSTERING QUESTION SEARCH RESULTS BASED ON TOPIC AND FOCUS - A method and system for presenting questions that are relevant to a queried question based on clusters of topics and clusters of focuses of the questions is provided. A question search system provides a collection of questions. Each question of the collection has an associated topic and focus. Upon receiving a queried question, the question search system identifies questions of the collection that may be relevant to the queried question and generates a score or ranking indicating relevance of the identified questions. The question search system clusters the identified questions into topic clusters of questions with similar topics. The question search system may also cluster the questions within each topic cluster into focus clusters of questions with similar focuses.	02-04-2010
20100030770	SEARCHING QUESTIONS BASED ON TOPIC AND FOCUS - A method and system for determining the relevance of questions to a queried question based on topics and focuses of the questions is provided. A question search system provides a collection of questions with topics and focuses. Upon receiving a queried question, the question search system identifies a queried topic and queried focus of the queried question. The question search system generates a score indicating the relevance of a question of the collection to the queried question based on a language model of the topic of the question and a language model of the focus of the question.	02-04-2010
20100049498	DETERMINING UTILITY OF A QUESTION - A question search system provides a collection of questions having words for use in evaluating the utility of the questions based on a language model. The question search system calculates n-gram probabilities for words within the questions of the collection. The n-gram probability of a word for a sequence of n−1 words indicates the probability of that word being next after that sequence in the collection of questions. The n-gram probabilities for the words of the collection represent the language model of the collection. The question search system calculates a language model utility score for each question within a collection that indicates the likelihood that a question is repeatedly asked by users. The question search system derives the language model utility score for a question from the n-gram probabilities of the words within that question.	02-25-2010
20100063797	DISCOVERING QUESTION AND ANSWER PAIRS - The present invention provides a new approach to extracting question-answer pairs from online forums. The system develops a classification-based technique to discover questions in forums using sequential patterns automatically extracted from both questions and non-question sentences in forums as features. Once the questions are discovered, the system discovers the answers. The invention includes a graph-based method, which is complementary to supervised methods for knowledge extraction, and techniques for question answering.	03-11-2010
20100076978	SUMMARIZING ONLINE FORUMS INTO QUESTION-CONTEXT-ANSWER TRIPLES - In this paper, we propose a new approach to extracting question-context-answer triples from online discussion forums. More specifically, we propose a general framework based on Conditional Random Fields (CRFs) for context and answer detection, and also extend the basic framework to utilize contexts for answer detection and to better accommodate the features of forums.	03-25-2010
20100235311	QUESTION AND ANSWER SEARCH - Exemplary methods, computer-readable media, and systems are presented for leveraging question-answering knowledge from community sites by complementing product search services with a search of questions, answers, reviews and other Internet accessible content including user-generated content. Product or service information is obtained by crawling Internet-accessible Web sites including community sites. An integrated index of such information is generated. A user is able to browse questions by product or service feature, by topic, by identified comparative questions, and by question ranking (for example, interestingness or popularity).	09-16-2010
20100235343	Predicting Interestingness of Questions in Community Question Answering - Exemplary methods, computer-readable media, and systems are presented for learning to recommend questions and other user-generated submissions to community sites based on user ratings. The size of available training data is enlarged by taking into consideration questions without user ratings, which in turn benefits the learned model. Questions or other user-generated submissions are obtained by crawling Internet-accessible Web sites including community sites. Questions and other submissions, even when not tagged, voted or indicated as “popular” or “interesting” by users, are quantitatively identified as “interesting.”	09-16-2010
20110302157	COMPARATIVE ENTITY MINING - One or more techniques and systems are disclosed for generating comparative patterns for use in identifying comparators. A set of comparator pairs is extracted from a first comparative pattern in a pattern database that comprises one or more comparative patterns. Questions are retrieved from a question collection using respective comparator pairs to generate comparative questions. Potential comparative patterns are generated from a combination of the comparator pairs and comparative questions, and the potential comparative patterns are evaluated by determining their reliability, in order to generate second comparative patterns for the pattern database.	12-08-2011
20120124077	Domain Constraint Based Data Record Extraction - Embodiments of a Mining Data Records based on Anchor Trees (MiBAT) process are disclosed. In accordance with at least one embodiment, the MiBAT process extracts data records containing user-generated content from web documents. The web document is processed into a Document Object Model (DOM) tree in which sub-trees of the DOM tree represent the data records of the web document. Domain constraints are used to locate structured portions of the DOM tree. Anchor trees are then located as being sets of sibling sub-trees which contain the domain constraints. The anchor trees are then used to determine a record boundary (i.e. the start offset and length) of the data records. Finally, the data records are extracted based on the anchor trees and the record boundaries.	05-17-2012
20120124086	Domain Constraint Path Based Data Record Extraction - Described herein are techniques for extracting data records containing user-generated content from documents. The documents may be processed into document trees in which sub-trees represent the data records of the document. Domain constraints may be used to locate structured portions of the document tree. For example, anchor trees may be located as being sets of sibling sub-trees with similar tag paths that contain the domain constraints. The anchor trees may then be used to determine a record boundary (e.g., the start offset and length) of the data records. Finally, the data records may be extracted based on the anchor trees and the record boundaries.	05-17-2012
20130097178	Question and Answer Forum Techniques - Techniques for unsupervised management of a question and answer (QA) forum include labeling of answers for quality purposes, and identification of experts. In a QA thread, a ranking of answers may include an initial labeling of the longest answer in each thread as the best answer. Such a labeling provides an initial point of reference. Then, in an iterative manner answerers are ranked using the labeling. The ranking of answerers allows selection of experts and poor or inexpert answerers. A label update is performed using the experts (and perhaps inexpert answerers) as input. The label update may be used to train a model, which may describe quality of answers in one or more QA threads and an indication of expert and inexpert answerers. The iterative process may be ended upon convergence or upon a maximum number of iterations.	04-18-2013
20130212103	RECORD LINKAGE BASED ON A TRAINED BLOCKING SCHEME - Some implementations disclosed herein provide techniques and arrangements to train a blocking scheme using both labeled data and unlabeled data. For example, training the blocking scheme may include iteratively: learning a conjunction, identifying first matches in the labeled data and the unlabeled data that are uncovered by the conjunction, and identifying second matches in the labeled data and the unlabeled data that are covered by the conjunction. The conjunctions learned in the iterations may be combined using a disjunction. A search engine may use the trained blocking scheme when searching for records that match an entity.	08-15-2013
20130262453	Estimating Thread Participant Expertise Using A Competition-Based Model - The subject disclosure is directed towards ranking participants in an online platform according to expertise. A competition-based metric is applied to question and answer threads in order to model each thread as a set of comparisons between various groups of the participants. After aggregating comparison results for the question and answer threads, one or more relative expertise scores may be estimated for each participant. Each relative expertise score may correspond to a specific category of questions.	10-03-2013
20140143223	Search Query User Interface - This disclosure describes, in part, techniques for operating a search query user interface to allow seamless creating, editing and/or refining of a search query using various interactive functions. The techniques described herein may display a search query divided into segments. A selection of a segment of the search query may then be received. One or more alternatives to the selected segment may then be presented. Next, a selection of one of the presented alternatives may be received. As a result, the search query may be altered using the selected alternative. Furthermore, the techniques described herein allow a user to operate on a search query using query substitution, expansion, association and/or history functions.	05-22-2014
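
The language-model utility score described in application 20100049498 can be illustrated with a small sketch. This is not the filed method: it assumes bigrams (n = 2), lowercase whitespace tokenization, add-one smoothing, and a score equal to the average log-probability per predicted token. The names `train_bigram_model` and `utility_score` are hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(questions):
    """Count bigrams and their contexts over a question collection.

    Assumptions (not from the abstract): n = 2, lowercase whitespace
    tokenization, and <s>/</s> sentence-boundary markers.
    """
    contexts, bigrams = Counter(), Counter()
    for q in questions:
        tokens = ["<s>"] + q.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])  # every token that is followed by another
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Possible "next" tokens: every word seen in the collection plus </s>.
    vocab = {w for q in questions for w in q.lower().split()} | {"</s>"}
    return contexts, bigrams, len(vocab)


def utility_score(question, model):
    """Average log P(word | previous word) over the question's tokens.

    Assumption: add-one (Laplace) smoothing. Higher scores mean the question
    is phrased the way questions in the collection are commonly phrased.
    """
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + question.lower().split() + ["</s>"]
    log_p = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_p += math.log(p)
    return log_p / (len(tokens) - 1)


if __name__ == "__main__":
    collection = [
        "how do i reset my password",
        "how do i change my password",
        "how do i reset my router",
    ]
    model = train_bigram_model(collection)
    typical = utility_score("how do i reset my password", model)
    scrambled = utility_score("password my reset i do how", model)
    print(typical > scrambled)  # True
```

With this toy collection, the naturally phrased question scores higher than the same words in reverse order, because its bigrams recur across the collection.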

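The competition-based idea in application 20130262453 can be illustrated with a pairwise rating scheme. The abstract does not say how comparisons are formed or which rating model aggregates them, so this sketch makes two assumptions: each thread yields the comparisons "best answerer beats asker", "best answerer beats every other answerer" and "every other answerer beats asker", and the results are aggregated with a standard Elo update (K = 32, initial rating 1500). Per-category scores could be obtained by running it separately on each category's threads. The `Thread` record, `elo_update` and `estimate_expertise` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Thread(NamedTuple):
    """Hypothetical thread record: the asker, the best answerer, and other answerers."""
    asker: str
    best_answerer: str
    other_answerers: List[str]


def elo_update(ratings: Dict[str, float], winner: str, loser: str, k: float) -> None:
    """Standard Elo update for one win (the rating model is an assumption)."""
    expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
    delta = k * (1.0 - expected_win)
    ratings[winner] += delta
    ratings[loser] -= delta


def estimate_expertise(
    threads: List[Thread], k: float = 32.0, initial: float = 1500.0
) -> Dict[str, float]:
    """Turn each thread into pairwise comparisons and aggregate them into relative scores."""
    ratings: Dict[str, float] = defaultdict(lambda: initial)
    for t in threads:
        # Assumed comparison scheme; the abstract only says "various groups of the participants".
        comparisons = [(t.best_answerer, t.asker)]
        comparisons += [(t.best_answerer, o) for o in t.other_answerers]
        comparisons += [(o, t.asker) for o in t.other_answerers]
        for winner, loser in comparisons:
            elo_update(ratings, winner, loser, k)
    return dict(ratings)


if __name__ == "__main__":
    threads = [
        Thread(asker="carol", best_answerer="alice", other_answerers=["bob"]),
        Thread(asker="dave", best_answerer="alice", other_answerers=["bob"]),
        Thread(asker="bob", best_answerer="alice", other_answerers=[]),
    ]
    for user, score in sorted(estimate_expertise(threads).items(), key=lambda kv: -kv[1]):
        print(f"{user}: {score:.1f}")
```

Elo is used here only because it is a simple, well-known way to turn win/loss comparisons into relative scores; any pairwise rating model could fill the same role.
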
Patent applications by Chin-Yew Lin, Beijing CN