Patent application number | Description | Published |
20100312779 | ONTOLOGY-BASED SEARCHING IN DATABASE SYSTEMS - A method, information processing system, and computer program storage product retrieve data from a database. A search request is received from a user for a set of data in at least one database. In response to receiving the search request from the user, an ontology query is performed over at least one ontology associated with the at least one database, resulting in an ontological dataset associated with the search request. The ontological dataset includes at least one of a set of synonyms, a set of hypernyms, and a set of hyponyms associated with the search request. A data query is performed over data in the at least one database using the ontological dataset in response to performing the ontology query. The set of data is returned to the user based on the data query that has been performed. | 12-09-2010 |
20110125730 | Optimizing Queries to Hierarchically Structured Data - Techniques are disclosed for optimizing queries to hierarchically structured data. For example, a method for processing a query directed to data having a hierarchical structure with a plurality of data nodes comprises the following steps. One or more structural attributes describing the hierarchical structure of the data are identified. The query is partitioned into two or more query partitions using at least one of the one or more identified structural attributes. A parallel execution plan is determined for the query by splitting into components one or more of: the query into at least two of the query partitions; and the hierarchical structure of the data. The split components are executed in parallel on different computer processes according to the parallel execution plan. | 05-26-2011 |
20120047114 | ENFORCING QUERY POLICIES OVER RESOURCE DESCRIPTION FRAMEWORK DATA - A method of performing a graph query issued by a user is provided. The method includes performing on a processor: receiving a user graph query; rewriting the user graph query as a new query based on a query policy expressed in a graph query language; and performing the new query on graph data to obtain a result. | 02-23-2012 |
20120047124 | DATABASE QUERY OPTIMIZATIONS - A method of processing a query is provided. The method includes performing on a processor: receiving a database query that includes a plurality of predicates that associate a subject with an object, where one or more of the predicates is a variable predicate; generating at least one new query by selectively replacing the at least one variable predicate in the database query with a non-variable predicate; and performing the at least one new database query on a database to obtain a query result. | 02-23-2012 |
20120197884 | CREATING BENCHMARK GRAPH DATA - According to an aspect of the present principles, a method is provided for generating resource description framework benchmarks. The method includes deriving a resultant benchmark dataset with a user specified size and a user specified coherence from and with respect to an input dataset of a given size and a given coherence by determining which triples of subject-property-object to add to the input dataset or remove from the input dataset to derive the resultant benchmark dataset. | 08-02-2012 |
20120246154 | AGGREGATING SEARCH RESULTS BASED ON ASSOCIATING DATA INSTANCES WITH KNOWLEDGE BASE ENTITIES - Methods and systems for aggregating search query results include receiving search query results and schema information for the query results from multiple heterogeneous sources, determining types for elements of the query results based on the schema information, determining potential aggregations for the query results based on the types, which are based on accumulated information from the plurality of heterogeneous resources, and aggregating the query results according to one or more of the potential aggregations. | 09-27-2012 |
20120246175 | ANNOTATING SCHEMA ELEMENTS BASED ON ASSOCIATING DATA INSTANCES WITH KNOWLEDGE BASE ENTITIES - Methods and systems for determining schema element types are shown that include pooling potential annotations for an element of an unlabeled schema from a plurality of heterogeneous sources, scoring the pool of potential annotations according to relevancy using instance information from the plurality of heterogeneous sources to produce a relevancy score, and annotating the element of the unlabeled schema using the most relevant potential annotations. | 09-27-2012 |
20120327087 | SUPPORTING RECURSIVE DYNAMIC PROVENANCE ANNOTATIONS OVER DATA GRAPHS - Systems and methods are provided for supporting dynamic provenance annotations over data graphs. A method includes receiving a plurality of dynamic graphs representing dynamic provenance data. The method further includes evaluating a provenance query over the plurality of dynamic graphs to obtain an answer to the provenance query. The method additionally includes providing the answer to the provenance query to a user, using at least a display device. | 12-27-2012 |
20130006984 | CREATING BENCHMARK GRAPH DATA - According to an aspect of the present principles, a method is provided for generating resource description framework benchmarks. The method includes deriving a resultant benchmark dataset with a user specified size and a user specified coherence from and with respect to an input dataset of a given size and a given coherence by determining which triples of subject-property-object to add to the input dataset or remove from the input dataset to derive the resultant benchmark dataset. | 01-03-2013 |
20130311507 | Representing Incomplete and Uncertain Information in Graph Data - A method for representing and querying incomplete and uncertain information in graph data receives a plurality of graphs containing subject nodes, object nodes and predicates extending between subject and object nodes. The subject nodes and predicates can be URIs or blank, and the object nodes can be URIs, literals or blank. Incomplete graph data sets are created by inserting a variable into each blank subject node, each blank predicate and each blank object node, and uncertain graph data sets are created by substituting alternative values for all variables in the incomplete data graph. A query is received from a user and a naïve search of the graph data is performed for certain data. The incomplete and uncertain graphs are then used to determine potential answers and certain potential answers based on user-specified requirements. The certain answers and potential certain answers are returned to the user. | 11-21-2013 |
20130311517 | Representing Incomplete and Uncertain Information in Graph Data - A method for representing and querying incomplete and uncertain information in graph data receives a plurality of graphs containing subject nodes, object nodes and predicates extending between subject and object nodes. The subject nodes and predicates can be URIs or blank, and the object nodes can be URIs, literals or blank. Incomplete graph data sets are created by inserting a variable into each blank subject node, each blank predicate and each blank object node, and uncertain graph data sets are created by substituting alternative values for all variables in the incomplete data graph. A query is received from a user and a naïve search of the graph data is performed for certain data. The incomplete and uncertain graphs are then used to determine potential answers and certain potential answers based on user-specified requirements. The certain answers and potential certain answers are returned to the user. | 11-21-2013 |
20130332466 | Linking Data Elements Based on Similarity Data Values and Semantic Annotations - Data elements from data sources, each having a data value set, are linked by using hash functions to determine a dimensionally reduced instance signature for each data element based on all data values associated with that data element. This yields a plurality of dimensionally reduced instance signatures of equivalent fixed size such that similarities among the data values in the data value sets across all data elements are maintained among the plurality of instance signatures. Candidate pairs of data elements to link are identified using the plurality of instance signatures in locality sensitive hash functions, and a similarity index is generated for each candidate pair using a pre-determined measure of similarity. Candidate pairs of data elements having a similarity index above a given threshold are linked. | 12-12-2013 |
20130332467 | Linking Data Elements Based on Similarity Data Values and Semantic Annotations - Data elements from data sources, each having a data value set, are linked by using hash functions to determine a dimensionally reduced instance signature for each data element based on all data values associated with that data element. This yields a plurality of dimensionally reduced instance signatures of equivalent fixed size such that similarities among the data values in the data value sets across all data elements are maintained among the plurality of instance signatures. Candidate pairs of data elements to link are identified using the plurality of instance signatures in locality sensitive hash functions, and a similarity index is generated for each candidate pair using a pre-determined measure of similarity. Candidate pairs of data elements having a similarity index above a given threshold are linked. | 12-12-2013 |
20130332478 | QUERYING AND INTEGRATING STRUCTURED AND UNSTRUCTURED DATA - A computer-implemented method, system, and article of manufacture for querying and integrating structured and unstructured data. The method includes: receiving entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between a first entity and a second entity of the first set of unstructured data; recognizing a pattern based on the relationship information and creating a schema for the first set of unstructured data based on the pattern; and associating an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data. | 12-12-2013 |
20140012884 | OPTIMIZING SPARSE SCHEMA-LESS DATA IN DATA STORES - Various embodiments of the invention relate to optimizing storage of schema-less data. At least one of a schema-less dataset including a plurality of resources and one or more query workloads associated with the plurality of resources is received. Each resource is associated with at least a plurality of properties. At least one set of co-occurring properties from the plurality of properties is identified. A graph including a plurality of nodes is generated. Each of the nodes represents a unique property in the set of co-occurring properties. The graph further includes an edge connecting each pair of nodes representing a pair of co-occurring properties. A schema is generated based on the graph that assigns a column identifier from a table to each unique property represented by one of the nodes in the graph. | 01-09-2014 |
20140074878 | SPREADSHEET SCHEMA EXTRACTION - Aspects of the present invention provide a tool for extracting schema from a spreadsheet. In an embodiment, a set of data that is stored in an uncataloged tabular format, such as a spreadsheet, is retrieved. The structure of the retrieved set of data is surveyed to determine the dataset schema thereof. Then, data elements within the dataset schema are analyzed to obtain information regarding the data elements. Based on dataset schema and the element information, an interface can be constructed that allows remote access to the set of data. | 03-13-2014 |
20140075278 | SPREADSHEET SCHEMA EXTRACTION - Aspects of the present invention provide a tool for extracting schema from a spreadsheet. In an embodiment, a set of data that is stored in an uncataloged tabular format, such as a spreadsheet, is retrieved. The structure of the retrieved set of data is surveyed to determine the dataset schema thereof. Then, data elements within the dataset schema are analyzed to obtain information regarding the data elements. Based on dataset schema and the element information, an interface can be constructed that allows remote access to the set of data. | 03-13-2014 |
20140143280 | Scalable Summarization of Data Graphs - Keyword searching is used to explore and search large Resource Description Framework datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying resource description framework data. Given a keyword query, the summarization lends significant pruning powers to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently. | 05-22-2014 |
20140143281 | Scalable Summarization of Data Graphs - Keyword searching is used to explore and search large Resource Description Framework datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying resource description framework data. Given a keyword query, the summarization lends significant pruning powers to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently. | 05-22-2014 |
20140156633 | Scalable Multi-Query Optimization for SPARQL - Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores. | 06-05-2014 |
20140304251 | Method and Apparatus for Optimizing the Evaluation of Semantic Web Queries - A semantic query over an RDF database is received with RDF database statistics and access methods for evaluating triple patterns in the query. The semantic query is expressed as a parse tree containing triple patterns and logical relationships among the triple patterns. The parse tree and access methods create a data flow graph containing a plurality of triple pattern and access method pair nodes connected by a plurality of edges, and an optimal flow tree through the data flow graph is determined such that costs are minimized and all triple patterns in the semantic query are contained in the optimal flow tree. A structure independent execution tree defining a sequence of evaluation through the optimal flow tree is created and is transformed into a database structure dependent query plan. This is used to create an SQL query that is used to evaluate the semantic query over the RDF database. | 10-09-2014 |
20150052134 | Method and Apparatus for Storing Sparse Graph Data as Multi-Dimensional Cluster - A system for storing graph data as a multi-dimensional cluster has a database with a graph dataset containing data and relationships between data pairs and a schema list of storage methods that use a table with columns and rows associated with data or relationships. An analyzer module collects statistics of a graph dataset, and a dimension identification module identifies a plurality of dimensions that each represent a column in the table. A schema creation and loading module creates a modified storage method having a plurality of distinct table blocks and a plurality of table block indexes, one index for each table block, and arranges the data and relationships in the given graph dataset in accordance with the modified storage method to create the multi-dimensional cluster. | 02-19-2015 |
20150052175 | Method and Apparatus for Identifying the Optimal Schema to Store Graph Data in a Relational Store - A system for identifying a schema for storing graph data includes a database containing a graph dataset of data and relationships between data pairs and a list of storage methods that each are a distinct structural arrangement of the data and relationships from the graph dataset. An analyzer module collects statistics for the graph dataset, and a data classification module uses the collected statistics to calculate metrics describing the data and relationships in the graph dataset, uses the calculated metrics to group the data and relationships into a plurality of graph dataset subsets, and associates each graph dataset subset with one of the plurality of storage methods. The resulting group of storage methods associated with the plurality of graph dataset subsets includes a unique storage method for each graph dataset subset. The data and relationships in each graph dataset subset are arranged in accordance with associated storage methods. | 02-19-2015 |
20150149440 | SYSTEMS AND METHODS FOR FINDING OPTIMAL QUERY PLANS - Systems and methods for optimizing a query, and more particularly, systems and methods for finding optimal plans for graph queries by casting the task of finding the optimal plan as an integer linear programming (ILP) problem. A method for optimizing a query comprises building a data structure for a query, the data structure including a plurality of components, wherein each of the plurality of components corresponds to at least one graph pattern, determining a plurality of flows of query variables between the plurality of components, and determining a combination of the plurality of flows between the plurality of components that results in a minimum cost to execute the query. | 05-28-2015 |
20150193478 | Method and Apparatus for Determining the Schema of a Graph Dataset - A schema for a dataset is identified by identifying a dataset comprising data and relationships between data pairs. An original schema is identified for the dataset. This original schema comprises an organizational structure. An initial fit between the dataset and the original schema is determined. The initial fit quantifies a conformity of the data in the dataset to the organizational structure of the original schema. A plurality of additional schemas are identified. Each additional schema is a distinct organizational schema. The dataset is partitioned into a plurality of subsets. Each subset comprises a modified fit quantifying a modified conformity of subset data in each subset to one of the original schema and the additional schemas. The modified fit is greater than the initial fit. | 07-09-2015 |
20150213089 | Method and Apparatus for Optimizing the Evaluation of Semantic Web Queries - A semantic query over an RDF database is received with RDF database statistics and access methods for evaluating triple patterns in the query. The semantic query is expressed as a parse tree containing triple patterns and logical relationships among the triple patterns. The parse tree and access methods create a data flow graph containing a plurality of triple pattern and access method pair nodes connected by a plurality of edges, and an optimal flow tree through the data flow graph is determined such that costs are minimized and all triple patterns in the semantic query are contained in the optimal flow tree. A structure independent execution tree defining a sequence of evaluation through the optimal flow tree is created and is transformed into a database structure dependent query plan. This is used to create an SQL query that is used to evaluate the semantic query over the RDF database. | 07-30-2015 |
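
The two-step flow summarized in the first entry of the table (application 20100312779: an ontology query followed by a data query) can be sketched roughly as follows. This is a minimal illustration and not the claimed implementation: the in-memory `ONTOLOGY` and `ROWS` structures, the `OntologyEntry` type, and the function names are all hypothetical, and a plain word match stands in for a real database query.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OntologyEntry:
    """Hypothetical ontology record for a single term."""
    synonyms: set[str] = field(default_factory=set)
    hypernyms: set[str] = field(default_factory=set)
    hyponyms: set[str] = field(default_factory=set)


# Toy ontology; the terms and relations are made up for illustration.
ONTOLOGY = {
    "car": OntologyEntry(
        synonyms={"automobile"},
        hypernyms={"vehicle"},
        hyponyms={"sedan", "hatchback"},
    ),
}

# Toy "database": rows of (id, description). Assumed, not from the source.
ROWS = [
    (1, "red sedan for sale"),
    (2, "used automobile parts"),
    (3, "mountain bicycle"),
    (4, "vehicle registration form"),
]


def ontology_query(term: str) -> set[str]:
    """Step 1: build the ontological dataset (term plus related terms)."""
    entry = ONTOLOGY.get(term, OntologyEntry())
    return {term} | entry.synonyms | entry.hypernyms | entry.hyponyms


def data_query(terms: set[str]) -> list[int]:
    """Step 2: return ids of rows whose description contains any term."""
    return [
        row_id
        for row_id, text in ROWS
        if any(word in terms for word in text.lower().split())
    ]


def search(term: str) -> list[int]:
    """Run the ontology query, then the data query over its result."""
    return data_query(ontology_query(term.lower()))


if __name__ == "__main__":
    print(search("car"))
```

Under these assumptions, a search for "car" also matches rows that mention only "sedan", "automobile", or "vehicle", which is the effect the abstract attributes to using synonyms, hypernyms, and hyponyms in the data query.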