| Patent application number | Description | Published |
| --- | --- | --- |
| 20080208856 | Classification-Based Method and Apparatus for String Selectivity Estimation - Histogram construction and selectivity estimation for string and substring match queries in databases of data having strings associated with attributes. The histogram construction counts string-attribute pairs in the documents, and outputs string-attribute-count triples sorted by count. The collection is partitioned into buckets. A synopsis is generated for the partition, having an average selectivity or count of the string-attribute-count triples in the partition and summary information representing the set of string-attribute pairs belonging to the bucket. Subsequent queries, both for exact and substring matches, use the synopsis to estimate the selectivity of buckets. | 08-28-2008 |
| 20080259084 | METHOD AND APPARATUS FOR ORGANIZING DATA SOURCES - A method and apparatus for organizing deep Web services are provided. In one aspect, the method and apparatus obtain a collection of sources and their associated attributes and/or input modes, for instance, using a crawling algorithm. The method and apparatus use this information to organize the sources into communities. A mining algorithm such as the hyperclique mining algorithm is used to obtain cliques of highly correlated attributes. A clustering algorithm such as the hierarchical agglomerative clustering algorithm is used to further cluster the cliques of attributes into larger cliques, which in the present disclosure are referred to as signatures. The sources that are associated with each signature form a community, and a graph representation of the communities is constructed, where the vertices are communities and the edges are the shared attributes. | 10-23-2008 |
| 20100082545 | COMPRESSION OF SORTED VALUE INDEXES USING COMMON PREFIXES - A method, information processing system, and computer program storage product for compressing sorted values are disclosed. At least a first prefix and a second prefix in a plurality of prefixes are compared. Each prefix comprises at least a portion of a plurality of sorted values. A respective prefix comprises a set of consecutive characters including at least a first character of a respective sorted value. The respective sorted value further comprises a respective suffix comprising consecutive characters of the respective sorted value that are after the respective prefix. At least a respective first character of the first prefix and a respective first character of the second prefix are determined to be substantially identical. The first prefix is merged with the second prefix into a single prefix comprising the first character. A set of suffixes associated with the first prefix is updated to reflect an association with the second prefix. | 04-01-2010 |
| 20100145986 | Querying Data and an Associated Ontology in a Database Management System - A method, apparatus, and computer program for querying data and an associated ontology in a database. An ontology is associated with data in a database. Responsive to receiving a query from a requestor, relational data in the database is identified using the query to form identified relational data. Ontological knowledge in the ontology is identified using the identified relational data and the ontology. A result is returned to the requestor. | 06-10-2010 |
| 20100161930 | STATISTICS COLLECTION USING PATH-VALUE PAIRS FOR RELATIONAL DATABASES - A method, system, and computer readable medium for collecting statistics associated with data in a database are disclosed. The method comprises determining an amount of memory needed to collect statistics for data associated with a defined data type in a relational database. The defined data type is based upon a mark-up language using a tree structure with one or more root-to-node paths therein. The amount of memory is allocated as determined for collecting the statistics for the data of the defined data type. A statistics collection is performed for the data of the defined data type in a single pass through the database and within the amount of memory which has been allocated. The performing includes at least determining a total number of instances of at least one path-identifier associated with a given value within a given set of documents. | 06-24-2010 |
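
The last entry above (20100161930) describes counting, in a single pass and within a pre-allocated amount of memory, how often a path-identifier occurs with a given value across a set of documents. The following is a minimal sketch of that general idea, not the method claimed in the application. It assumes the documents are small XML strings, and the function name `count_path_values` and the `max_entries` cap (a stand-in for the memory budget) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_path_values(documents, max_entries=10_000):
    """Count (root-to-node path, text value) pairs in one pass over XML strings.

    max_entries is a hypothetical cap standing in for a pre-allocated memory
    budget: once it is reached, new pairs are ignored, while pairs that are
    already tracked keep being counted.
    """
    counts = Counter()

    def walk(node, prefix):
        # Build the root-to-node path, e.g. "/book/year".
        path = f"{prefix}/{node.tag}"
        value = (node.text or "").strip()
        if value:
            key = (path, value)
            if key in counts or len(counts) < max_entries:
                counts[key] += 1
        for child in node:
            walk(child, path)

    for doc in documents:
        walk(ET.fromstring(doc), "")
    return counts


docs = [
    "<book><title>SQL</title><year>2008</year></book>",
    "<book><title>XML</title><year>2008</year></book>",
]
stats = count_path_values(docs)
print(stats[("/book/year", "2008")])  # 2
```

Dropping new pairs once the cap is reached is only one possible policy for staying within a fixed budget; the abstract does not say how the allocated memory is enforced.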