Yunyao Li, San Jose US

Yunyao Li, San Jose, CA US

Patent application number	Description	Published (MM-DD-YYYY)
20100205201	User-Guided Regular Expression Learning - A method, device, and computer program product are provided for regular expression learning. An initial regular expression may be received from a user. The initial regular expression is executed over a database. Positive matches and negative matches are labeled. The initial regular expression and the labeled positive and negative matches are input in a transformation process. The transformation process may iteratively execute character class restrictions, quantifier restrictions, and negative lookaheads on the initial regular expression to transform the initial regular expression into a pool of candidate regular expressions. The transformation process may execute, one at a time, the character class restrictions, the quantifier restrictions, and the negative lookaheads. A candidate regular expression is selected from the pool of candidate regular expressions, where the selected candidate regular expression has the best F-Measure out of the pool of candidate regular expressions.	08-12-2010
20110295853	EXTENSIBLE SYSTEM AND METHOD FOR INFORMATION EXTRACTION IN A DATA PROCESSING SYSTEM - A data mashup system having information extraction capabilities for receiving multiple streams of textual data, at least one of which contains unstructured textual data. A repository stores annotators that describe how to analyze the streams of textual data for specified unstructured data components. The annotators are applied to the data streams to identify and extract the specified data components according to the annotators. The extracted data components are tagged to generate structured data components and the specified unstructured data components in the input data streams are replaced with the tagged data components. The system then combines the tagged data from the multiple streams to form a mashup output data stream.	12-01-2011
20120209844	EXTENSIBLE SYSTEM AND METHOD FOR INFORMATION EXTRACTION IN A DATA PROCESSING SYSTEM - A data mashup system having information extraction capabilities for receiving multiple streams of textual data, at least one of which contains unstructured textual data. A repository stores annotators that describe how to analyze the streams of textual data for specified unstructured data components. The annotators are applied to the data streams to identify and extract the specified data components according to the annotators. The extracted data components are tagged to generate structured data components and the specified unstructured data components in the input data streams are replaced with the tagged data components. The system then combines the tagged data from the multiple streams to form a mashup output data stream.	08-16-2012
20120303661	SYSTEMS AND METHODS FOR INFORMATION EXTRACTION USING CONTEXTUAL PATTERN DISCOVERY - Described herein are methods, systems, apparatuses and products for automatically discovering patterns in a text corpus. An aspect provides extracting at least one context string related to at least one annotator from the at least one text corpus; analyzing the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence; determining at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and grouping the at least one sequence signature into at least one group.	11-29-2012
20130185304	RULE-DRIVEN RUNTIME CUSTOMIZATION OF KEYWORD SEARCH ENGINES - Described herein are methods, systems, apparatuses and products for rule-driven runtime customization of keyword search engines. An aspect provides a method for rule-driven customization of keyword searches, including: receiving by a computer an input keyword query; determining from the input keyword query and a dataset to be queried at least one rule selected from the group consisting of: a re-write rule, a category ranking rule, and a category grouping rule; and applying the at least one rule to generate search results based on domain knowledge of the dataset. Other embodiments are disclosed.	07-18-2013
20130185330	RULE-DRIVEN RUNTIME CUSTOMIZATION OF KEYWORD SEARCH ENGINES - Described herein are methods, systems, apparatuses and products for rule-driven runtime customization of keyword search engines. An aspect provides a method for rule-driven customization of keyword searches, including: receiving by a computer an input keyword query; determining from the input keyword query and a dataset to be queried at least one rule selected from the group consisting of: a re-write rule, a category ranking rule, and a category grouping rule; and applying the at least one rule to generate search results based on domain knowledge of the dataset. Other embodiments are disclosed.	07-18-2013
20130317806	ENTITY VARIANT GENERATION AND NORMALIZATION - Determining variants of a text entity comprises parsing the text entity into semantic components and generating variants for each of the semantic components. The entity is recomposed in different morphological forms from the different variants of the semantic components.	11-28-2013
20130317807	ENTITY VARIANT GENERATION AND NORMALIZATION - Determining variants of a text entity comprises parsing the text entity into semantic components and generating variants for each of the semantic components. The entity is recomposed in different morphological forms from the different variants of the semantic components.	11-28-2013
20130325831	SEARCH QUALITY VIA QUERY PROVENANCE VISUALIZATION - Methods and arrangements for enhancing search quality. Query search results are displayed, and search query provenance related to the search results is graphically depicted. There is graphically accorded an investigative function to avail investigation of at least one aspect of the search query provenance.	12-05-2013
20140129211	SVO-BASED TAXONOMY-DRIVEN TEXT ANALYTICS - Organizing textual data into statement clusters. Sentences are extracted from textual data and parsed. A verb usage pattern is identified and an SVO triplet is determined. The SVO triplet is compared to a taxonomy associated with the domain of the data and a sentiment is derived. A statement cluster is constructed comprising a higher level SVO triplet sensitive to the taxonomy and verb usage pattern, as well as the derived sentiment. Accordingly, the statement clusters may be organized by grouping.	05-08-2014
20140129213	SVO-BASED TAXONOMY-DRIVEN TEXT ANALYTICS - Organizing textual data into statement clusters. Sentences are extracted from textual data and parsed. A verb usage pattern is identified and an SVO triplet is determined. The SVO triplet is compared to a taxonomy associated with the domain of the data and a sentiment is derived. A statement cluster is constructed comprising a higher level SVO triplet sensitive to the taxonomy and verb usage pattern, as well as the derived sentiment. Accordingly, the statement clusters may be organized by grouping.	05-08-2014
20140143661	BUILDING AND MAINTAINING INFORMATION EXTRACTION RULES - Methods and arrangements for managing development of information extraction rules. One or more documents are opened for extraction. An interface is provided to create a label and thereupon label a portion of the document. The created label is stored, and an extractor is developed based on the labeling. A test interface is provided for the extractor, and results of a test conducted through the test interface are displayed. The extractor is exported. In accordance with at least one embodiment, developers are presented with eased automated guidance to write extractors, which thereby reduces an overall manual effort involved in extractor development. Generally, a focused, tutorial-type environment serves as a guide based on previously developed best practices.	05-22-2014
20140152667	AUTOMATIC PRESENTATIONAL LEVEL COMPOSITIONS OF DATA VISUALIZATIONS - Embodiments of the invention provide for generating a data presentation artifact. In one aspect of the invention a first data presentation object and a second data presentation object are received from a repository. The first data presentation object defines a first data presentation artifact. The second data presentation object defines a second data presentation artifact. At least one mashup operation is identified that may be performed using the first data presentation object and the second data presentation object. One or more mashup operations are selected from the identified mashup operations. A third data presentation artifact is then generated by applying the selected mashup operations to the first and the second data presentation objects.	06-05-2014
20140330804	AUTOMATIC SUGGESTION FOR QUERY-REWRITE RULES - Embodiments of the invention relate to automatically suggesting query-rewrite rules. One embodiment includes providing a missing search result for a query. A collection of semantically coherent rewrite rules is generated based on the missing search result. Generating the missing search result includes: selecting candidates including subsequences of the query and subsequences of particular fields of a document, invoking a search engine using the candidates for providing search results, filtering out particular candidates that fail to achieve a desired search result, and classifying remaining candidates based on a learned classifier. Query rewrite rules for document searching are suggested based on the classified remaining candidates.	11-06-2014
20150039290	KNOWLEDGE-RICH AUTOMATIC TERM DISAMBIGUATION - Embodiments of the invention relate to ambiguity detection. In one embodiment, an object and a topical domain associated with the object are obtained. In this embodiment, the object includes at least one term. At least one of a plurality of information sources is analyzed based on the at least one term and the topical domain. A determination is made that the object is either ambiguous or unambiguous based on analyzing at least one of the plurality of information sources.	02-05-2015
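
The final selection step described in application 20100205201 (choosing the candidate regular expression with the best F-Measure) can be illustrated with a minimal Python sketch. This is not the patented method: the candidate pool is written out by hand instead of being produced by a transformation process, each labeled string is tested with a whole-string match instead of matches found in a database, and the names `f_measure` and `select_best` and all sample data are hypothetical.

```python
import re


def f_measure(pattern, positives, negatives):
    """F1 score of a regex treated as a classifier over labeled strings."""
    regex = re.compile(pattern)
    # Assumption: a string counts as matched only if the whole string matches.
    tp = sum(1 for s in positives if regex.fullmatch(s))
    fp = sum(1 for s in negatives if regex.fullmatch(s))
    fn = len(positives) - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best(candidates, positives, negatives):
    """Return the candidate pattern with the highest F1 score."""
    return max(candidates, key=lambda p: f_measure(p, positives, negatives))


# Hypothetical labeled matches of a loose initial expression.
positives = ["555-1234", "867-5309"]
negatives = ["5555-1234", "55-12345", "abc-defg"]

initial = r"[\w]+-[\w]+"
# Hand-written stand-ins for the output of a transformation process.
candidates = [
    initial,
    r"\d+-\d+",      # character class restriction: \w narrowed to \d
    r"\d{3}-\d{4}",  # quantifier restriction added on top: + narrowed to {3}, {4}
]

best = select_best(candidates, positives, negatives)
print(best)
```

On this sample data the most restricted candidate rejects every negative example while still accepting both positives, so it is the one selected.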
