Patent application number | Description | Published |
20130282765 | OPTIMIZING SPARSE SCHEMA-LESS DATA IN RELATIONAL STORES - Various embodiments of the invention relate to optimizing storage of schema-less data. A schema-less dataset including a plurality of resources is received. Each resource is associated with at least a plurality of properties. At least one set of co-occurring properties from the plurality of properties is identified. A graph including a plurality of nodes is generated. Each of the nodes represents a unique property in the set of co-occurring properties. The graph further includes an edge connecting each node representing a pair of co-occurring properties. A graph coloring operation is performed on the graph. The graph coloring operation includes assigning each of the nodes a color, where nodes connected by an edge are assigned different colors. A schema is generated that assigns a column identifier from a table to each unique property represented by one of the nodes in the graph based on the color assigned to the node. | 10-24-2013 |
20130332478 | QUERYING AND INTEGRATING STRUCTURED AND UNSTRUCTURED DATA - A computer-implemented method, system, and article of manufacture for querying and integrating structured and unstructured data. The method includes: receiving entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between a first entity and a second entity of the first set of unstructured data; recognizing a pattern based on the relationship information and creating a schema for the first set of unstructured data based on the pattern; and associating an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data. | 12-12-2013 |
20140012884 | OPTIMIZING SPARSE SCHEMA-LESS DATA IN DATA STORES - Various embodiments of the invention relate to optimizing storage of schema-less data. At least one of a schema-less dataset including a plurality of resources and one or more query workloads associated with the plurality of resources is received. Each resource is associated with at least a plurality of properties. At least one set of co-occurring properties from the plurality of properties is identified. A graph including a plurality of nodes is generated. Each of the nodes represents a unique property in the set of co-occurring properties. The graph further includes an edge connecting each node representing a pair of co-occurring properties. A schema is generated based on the graph that assigns a column identifier from a table to each unique property represented by one of the nodes in the graph. | 01-09-2014 |
20140304251 | Method and Apparatus for Optimizing the Evaluation of Semantic Web Queries - A semantic query over an RDF database is received with RDF database statistics and access methods for evaluating triple patterns in the query. The semantic query is expressed as a parse tree containing triple patterns and logical relationships among the triple patterns. The parse tree and access methods create a data flow graph containing a plurality of triple pattern and access method pair nodes connected by a plurality of edges, and an optimal flow tree through the data flow graph is determined such that costs are minimized and all triple patterns in the semantic query are contained in the optimal flow tree. A structure independent execution tree defining a sequence of evaluation through the optimal flow tree is created and is transformed into a database structure dependent query plan. This is used to create an SQL query that is used to evaluate the semantic query over the RDF database. | 10-09-2014 |
20150052134 | Method and Apparatus for Storing Sparse Graph Data as Multi-Dimensional Cluster - A system for storing graph data as a multi-dimensional cluster has a database with a graph dataset containing data and relationships between data pairs and a schema list of storage methods that use a table with columns and rows associated with data or relationships. An analyzer module collects statistics of the graph dataset, and a dimension identification module identifies a plurality of dimensions that each represent a column in the table. A schema creation and loading module creates a modified storage method having a plurality of distinct table blocks and a plurality of table block indexes, one index for each table block, and arranges the data and relationships in the given graph dataset in accordance with the modified storage method to create the multi-dimensional cluster. | 02-19-2015 |
20150052175 | Method and Apparatus for Identifying the Optimal Schema to Store Graph Data in a Relational Store - A system for identifying a schema for storing graph data includes a database containing a graph dataset of data and relationships between data pairs and a list of storage methods that each are a distinct structural arrangement of the data and relationships from the graph data set. An analyzer module collects statistics for the graph dataset, and a data classification module uses the collected statistics to calculate metrics describing the data and relationships in the graph dataset, uses the calculated metrics to group the data and relationships into a plurality of graph dataset subsets, and associates each graph dataset subset with one of the plurality of storage methods. The resulting group of storage methods associated with the plurality of graph dataset subsets includes a unique storage method for each graph dataset subset. The data and relationships in each graph dataset subset are arranged in accordance with associated storage methods. | 02-19-2015 |
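
The graph coloring step described in application 20130282765 can be illustrated with a small sketch. It is not the patented method: the abstract does not name a coloring algorithm, so a simple greedy heuristic is assumed here, and the function names, the `col<N>` column identifiers, and the sample resources are all hypothetical. The sketch builds a graph whose nodes are properties, connects properties that co-occur on the same resource, colors the graph so that connected nodes differ, and maps each color to a column.

```python
from itertools import combinations


def build_cooccurrence_graph(resources):
    """Return an adjacency map: property -> set of properties it co-occurs with."""
    graph = {}
    for props in resources.values():
        for prop in props:
            graph.setdefault(prop, set())
        # Every pair of properties on the same resource gets an edge.
        for a, b in combinations(sorted(props), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_coloring(graph):
    """Assign each node the smallest color not used by an already-colored neighbor.

    Greedy coloring is an assumption of this sketch; it yields a valid
    coloring but not necessarily one with the fewest colors.
    """
    colors = {}
    # Visit high-degree nodes first; break ties by name for determinism.
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {colors[m] for m in graph[node] if m in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def assign_columns(resources):
    """Map each property to a hypothetical column identifier based on its color."""
    colors = greedy_coloring(build_cooccurrence_graph(resources))
    return {prop: f"col{color}" for prop, color in colors.items()}


# Hypothetical sample data: three resources with sparse, overlapping properties.
resources = {
    "person1": {"name", "email"},
    "person2": {"name", "phone"},
    "book1": {"title", "isbn"},
}
schema = assign_columns(resources)
```

Because properties that never appear on the same resource may share a color, five distinct properties in this sample fit into two columns, which is the point of using coloring for sparse data: no single resource ever needs two values in the same column.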