Patent application number | Description | Published |
20100205201 | User-Guided Regular Expression Learning - A method, device, and computer program product are provided for regular expression learning. An initial regular expression may be received from a user. The initial regular expression is executed over a database. Positive matches and negative matches are labeled. The initial regular expression and the labeled positive and negative matches are input in a transformation process. The transformation process may iteratively execute character class restrictions, quantifier restrictions, and negative lookaheads on the initial regular expression to transform the initial regular expression into a pool of candidate regular expressions. The transformation process may execute, one at a time, the character class restrictions, the quantifier restrictions, and the negative lookaheads. A candidate regular expression is selected from the pool of candidate regular expressions, where the selected candidate regular expression has the best F-Measure out of the pool of candidate regular expressions. | 08-12-2010 |
20110295853 | EXTENSIBLE SYSTEM AND METHOD FOR INFORMATION EXTRACTION IN A DATA PROCESSING SYSTEM - A data mashup system having information extraction capabilities for receiving multiple streams of textual data, at least one of which contains unstructured textual data. A repository stores annotators that describe how to analyze the streams of textual data for specified unstructured data components. The annotators are applied to the data streams to identify and extract the specified data components according to the annotators. The extracted data components are tagged to generate structured data components and the specified unstructured data components in the input data streams are replaced with the tagged data components. The system then combines the tagged data from the multiple streams to form a mashup output data stream. | 12-01-2011 |
20120209844 | EXTENSIBLE SYSTEM AND METHOD FOR INFORMATION EXTRACTION IN A DATA PROCESSING SYSTEM - A data mashup system having information extraction capabilities for receiving multiple streams of textual data, at least one of which contains unstructured textual data. A repository stores annotators that describe how to analyze the streams of textual data for specified unstructured data components. The annotators are applied to the data streams to identify and extract the specified data components according to the annotators. The extracted data components are tagged to generate structured data components and the specified unstructured data components in the input data streams are replaced with the tagged data components. The system then combines the tagged data from the multiple streams to form a mashup output data stream. | 08-16-2012 |
20120303661 | SYSTEMS AND METHODS FOR INFORMATION EXTRACTION USING CONTEXTUAL PATTERN DISCOVERY - Described herein are methods, systems, apparatuses and products for automatically discovering patterns in a text corpus. An aspect provides extracting at least one context string related to at least one annotator from the at least one text corpus; analyzing the at least one context string for at least one sequence, the at least one sequence comprised of at least one subsequence; determining at least one sequence signature for each at least one sequence by applying applicable rules to the at least one sequence; and grouping the at least one sequence signature into at least one group. | 11-29-2012 |
20130185304 | RULE-DRIVEN RUNTIME CUSTOMIZATION OF KEYWORD SEARCH ENGINES - Described herein are methods, systems, apparatuses and products for rule-driven runtime customization of keyword search engines. An aspect provides a method for rule-driven customization of keyword searches, including: receiving by a computer an input keyword query; determining from the input keyword query and a dataset to be queried at least one rule selected from the group consisting of: a re-write rule; a category ranking rule, and a category grouping rule; and applying the at least one rule to generate search results based on domain knowledge of the dataset. Other embodiments are disclosed. | 07-18-2013 |
20130185330 | RULE-DRIVEN RUNTIME CUSTOMIZATION OF KEYWORD SEARCH ENGINES - Described herein are methods, systems, apparatuses and products for rule-driven runtime customization of keyword search engines. An aspect provides a method for rule-driven customization of keyword searches, including: receiving by a computer an input keyword query; determining from the input keyword query and a dataset to be queried at least one rule selected from the group consisting of: a re-write rule; a category ranking rule, and a category grouping rule; and applying the at least one rule to generate search results based on domain knowledge of the dataset. Other embodiments are disclosed. | 07-18-2013 |
20130317806 | ENTITY VARIANT GENERATION AND NORMALIZATION - Determining variants of a text entity comprises parsing the text entity into semantic components and generating variants for each of the semantic components. The entity is recomposed in different morphological forms from the different variants of the semantic components. | 11-28-2013 |
20130317807 | ENTITY VARIANT GENERATION AND NORMALIZATION - Determining variants of a text entity comprises parsing the text entity into semantic components and generating variants for each of the semantic components. The entity is recomposed in different morphological forms from the different variants of the semantic components. | 11-28-2013 |
20130325831 | SEARCH QUALITY VIA QUERY PROVENANCE VISUALIZATION - Methods and arrangements for enhancing search quality. Query search results are displayed, and search query provenance related to the search results is graphically depicted. There is graphically accorded an investigative function to avail investigation of at least one aspect of the search query provenance. | 12-05-2013 |
20140129211 | SVO-BASED TAXONOMY-DRIVEN TEXT ANALYTICS - Organizing textual data into statement clusters. Sentences are extracted from textual data and parsed. A verb usage pattern is identified and an SVO triplet is determined. The SVO triplet is compared to a taxonomy associated with the domain of the data and a sentiment is derived. A statement cluster is constructed comprising a higher level SVO triplet sensitive to the taxonomy and verb usage pattern, as well as the derived sentiment. Accordingly, the statement clusters may be organized by grouping. | 05-08-2014 |
20140129213 | SVO-BASED TAXONOMY-DRIVEN TEXT ANALYTICS - Organizing textual data into statement clusters. Sentences are extracted from textual data and parsed. A verb usage pattern is identified and an SVO triplet is determined. The SVO triplet is compared to a taxonomy associated with the domain of the data and a sentiment is derived. A statement cluster is constructed comprising a higher level SVO triplet sensitive to the taxonomy and verb usage pattern, as well as the derived sentiment. Accordingly, the statement clusters may be organized by grouping. | 05-08-2014 |
20140143661 | BUILDING AND MAINTAINING INFORMATION EXTRACTION RULES - Methods and arrangements for managing development of information extraction rules. One or more documents are opened for extraction. An interface is provided to create a label and thereupon label a portion of the document. The created label is stored, and an extractor is developed based on the labeling. A test interface is provided for the extractor, and results of a test conducted through the test interface are displayed. The extractor is exported. In accordance with at least one embodiment, developers are presented with eased automated guidance to write extractors, which thereby reduces an overall manual effort involved in extractor development. Generally, a focused, tutorial-type environment serves as a guide based on previously developed best practices. | 05-22-2014 |
20140152667 | AUTOMATIC PRESENTATIONAL LEVEL COMPOSITIONS OF DATA VISUALIZATIONS - Embodiments of the invention provide for generating a data presentation artifact. In one aspect of the invention a first data presentation object and a second data presentation object are received from a repository. The first data presentation object defines a first data presentation artifact. The second data presentation object defines a second data presentation artifact. At least one mashup operation is identified that may be performed using the first data presentation object and the second data presentation object. One or more mashup operations are selected from the identified mashup operations. A third data presentation artifact is then generated by applying the selected mashup operations to the first and the second data presentation objects. | 06-05-2014 |
20140330804 | AUTOMATIC SUGGESTION FOR QUERY-REWRITE RULES - Embodiments of the invention relate to automatically suggesting query-rewrite rules. One embodiment includes providing a missing search result for a query. A collection of semantically coherent rewrite rules is generated based on the missing search result. Generating the missing search result includes: selecting candidates including subsequences of the query and subsequences of particular fields of a document, invoking a search engine using the candidates for providing search results, filtering out particular candidates that fail to achieve a desired search result, and classifying remaining candidates based on a learned classifier. Query rewrite rules for document searching are suggested based on the classified remaining candidates. | 11-06-2014 |
20150039290 | KNOWLEDGE-RICH AUTOMATIC TERM DISAMBIGUATION - Embodiments of the invention relate to ambiguity detection. In one embodiment, an object and a topical domain associated with the object are obtained. In this embodiment, the object includes at least one term. At least one of a plurality of information sources is analyzed based on the at least one term and the topical domain. A determination is made that the object is one of ambiguous and unambiguous based on analyzing at least one of the plurality of information sources. | 02-05-2015 |
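
The candidate-selection step summarized in application 20100205201 (pick the candidate regular expression with the best F-Measure over the labeled matches) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: the function names `f_measure` and `select_best`, the use of full-string matching, the balanced F1 form of the F-Measure, and the hand-written candidate pool and sample strings. It does not reproduce the transformation process claimed in the application, which generates the pool automatically.

```python
import re


def f_measure(pattern, positives, negatives):
    """Balanced F-Measure (F1) of a regex over labeled strings.

    Assumption: a string counts as matched only if the whole string matches.
    """
    rx = re.compile(pattern)
    tp = sum(1 for s in positives if rx.fullmatch(s))  # labeled positive, matched
    fp = sum(1 for s in negatives if rx.fullmatch(s))  # labeled negative, matched
    fn = len(positives) - tp                           # labeled positive, missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best(candidates, positives, negatives):
    """Return the candidate pattern with the highest F-Measure."""
    return max(candidates, key=lambda p: f_measure(p, positives, negatives))


# Hypothetical data: an over-general initial expression and two restricted
# variants of it, written by hand for this example.
initial = r"\d+-\d+"
candidates = [
    initial,
    r"\d{3}-\d{4}",         # quantifier restriction
    r"(?!000)\d{3}-\d{4}",  # quantifier restriction plus a negative lookahead
]
positives = ["555-1234", "867-5309"]
negatives = ["12-3", "000-0000", "2010-08"]

best = select_best(candidates, positives, negatives)
print(best)
```

On this toy data each restriction removes some of the labeled negative matches while keeping both positive matches, so the most restricted candidate scores highest.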