Kementsietsidis

Anastasios Kementsietsidis, Hawthorne, NY US

Patent application number	Description	Published
20110106821	Semantic-Aware Record Matching - A method of semantic-aware record matching includes receiving source and target string record specifications associated with a source string record and a target string record, receiving semantic knowledge referring to tokens of the source string record and target string record, creating a first set of tokens for the source string record and a second set of tokens for the target string record based on the semantic knowledge, assigning a similarity score to the source string record and the target string record based on a semantic relationship between the first set of tokens and the second set of tokens, and matching the source string record and the target string record based on the similarity score.	05-05-2011
20110106836	Semantic Link Discovery - A method of semantic link discovery through translation of basic declarative language includes receiving a set of linkage specifications, receiving a set of data sources related to the linkage specifications, the set of data sources and the set of linkage specifications forming a basic declarative language query, translating the basic declarative language query into a standard language query, executing the standard language query, and returning results of the standard language query in response to the executing.	05-05-2011
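The matching steps in 20110106821 (tokenize both records, apply semantic knowledge to the tokens, score, then match) can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the "semantic knowledge" is modeled as a simple synonym table that maps tokens to canonical forms, the similarity score is Jaccard similarity over token sets, and the threshold of 0.8 is arbitrary. The names `semantic_tokens`, `similarity`, and `is_match` are hypothetical.

```python
import re

def tokenize(record: str) -> set:
    """Split a string record into lowercase word tokens."""
    return set(re.findall(r"\w+", record.lower()))

def semantic_tokens(record: str, synonyms: dict) -> set:
    """Replace each token by its canonical form from the (assumed) synonym table."""
    return {synonyms.get(tok, tok) for tok in tokenize(record)}

def similarity(source: str, target: str, synonyms: dict) -> float:
    """Jaccard similarity of the two semantically normalized token sets."""
    a = semantic_tokens(source, synonyms)
    b = semantic_tokens(target, synonyms)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_match(source: str, target: str, synonyms: dict, threshold: float = 0.8) -> bool:
    """Declare a match when the score reaches the (hypothetical) threshold."""
    return similarity(source, target, synonyms) >= threshold

# Hypothetical semantic knowledge: abbreviations mapped to canonical tokens.
SYNONYMS = {"corp": "corporation", "st": "street", "intl": "international"}
print(similarity("Acme Corp, 12 Main St", "Acme Corporation, 12 Main Street", SYNONYMS))  # 1.0
print(round(similarity("Acme Corp, 12 Main St", "Acme Corporation, 12 Main Street", {}), 2))  # 0.43
```

Without the synonym table the same pair scores only 3/7, which is the gap the semantic step is meant to close.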

Anastasios Kementsietsidis, New York, NY US

Patent application number	Description	Published
20100312779	ONTOLOGY-BASED SEARCHING IN DATABASE SYSTEMS - A method, information processing system, and computer program storage product retrieve data from a database. A search request is received from a user for a set of data in at least one database. In response to receiving the search request, an ontology query is performed over at least one ontology associated with the at least one database, resulting in an ontological dataset associated with the search request. The ontological dataset includes at least one of a set of synonyms, a set of hypernyms, and a set of hyponyms associated with the search request. In response to performing the ontology query, a data query is performed over data in the at least one database using the ontological dataset. The set of data is returned to the user based on the data query that has been performed.	12-09-2010
20110125730	Optimizing Queries to Hierarchically Structured Data - Techniques are disclosed for optimizing queries to hierarchically structured data. For example, a method for processing a query directed to data having a hierarchical structure with a plurality of data nodes comprises the following steps. One or more structural attributes describing the hierarchical structure of the data are identified. The query is partitioned into two or more query partitions using at least one of the identified structural attributes. A parallel execution plan is determined for the query by splitting one or more of the following into components: the query, into at least two of the query partitions; and the hierarchical structure of the data. The split components are executed in parallel on different computer processes according to the parallel execution plan.	05-26-2011
20120047114	ENFORCING QUERY POLICIES OVER RESOURCE DESCRIPTION FRAMEWORK DATA - A method of performing a graph query issued by a user is provided. The method includes performing, on a processor: receiving a user graph query; rewriting the user graph query as a new query based on a query policy expressed in a graph query language; and performing the new query on graph data to obtain a result.	02-23-2012
20120047124	DATABASE QUERY OPTIMIZATIONS - A method of processing a query is provided. The method includes performing, on a processor: receiving a database query that includes a plurality of predicates that associate a subject with an object, where one or more of the predicates is a variable predicate; generating at least one new query by selectively replacing the at least one variable predicate in the database query with a non-variable predicate; and performing the at least one new database query on a database to obtain a query result.	02-23-2012
20120197884	CREATING BENCHMARK GRAPH DATA - According to an aspect of the present principles, a method is provided for generating resource description framework benchmarks. The method includes deriving a resultant benchmark dataset with a user specified size and a user specified coherence from and with respect to an input dataset of a given size and a given coherence by determining which triples of subject-property-object to add to the input dataset or remove from the input dataset to derive the resultant benchmark dataset.	08-02-2012
20120246154	AGGREGATING SEARCH RESULTS BASED ON ASSOCIATING DATA INSTANCES WITH KNOWLEDGE BASE ENTITIES - Methods and systems for aggregating search query results include receiving search query results and schema information for the query results from multiple heterogeneous sources, determining types for elements of the query results based on the schema information, determining potential aggregations for the query results based on the types, which are based on accumulated information from the plurality of heterogeneous sources, and aggregating the query results according to one or more of the potential aggregations.	09-27-2012
20120246175	ANNOTATING SCHEMA ELEMENTS BASED ON ASSOCIATING DATA INSTANCES WITH KNOWLEDGE BASE ENTITIES - Methods and systems for determining schema element types are shown that include pooling potential annotations for an element of an unlabeled schema from a plurality of heterogeneous sources, scoring the pool of potential annotations according to relevancy using instance information from the plurality of heterogeneous sources to produce a relevancy score, and annotating the element of the unlabeled schema using the most relevant potential annotations.	09-27-2012
20120327087	SUPPORTING RECURSIVE DYNAMIC PROVENANCE ANNOTATIONS OVER DATA GRAPHS - Systems and methods are provided for supporting dynamic provenance annotations over data graphs. A method includes receiving a plurality of dynamic graphs representing dynamic provenance data. The method further includes evaluating a provenance query over the plurality of dynamic graphs to obtain an answer to the provenance query. The method additionally includes providing the answer to the provenance query to a user, using at least a display device.	12-27-2012
20130006984	CREATING BENCHMARK GRAPH DATA - According to an aspect of the present principles, a method is provided for generating resource description framework benchmarks. The method includes deriving a resultant benchmark dataset with a user specified size and a user specified coherence from and with respect to an input dataset of a given size and a given coherence by determining which triples of subject-property-object to add to the input dataset or remove from the input dataset to derive the resultant benchmark dataset.	01-03-2013
20130311507	Representing Incomplete and Uncertain Information in Graph Data - A method for representing and querying incomplete and uncertain information in graph data receives a plurality of graphs containing subject nodes, object nodes and predicates extending between subject and object nodes. The subject nodes and predicates can be URIs or blank, and the object nodes can be URIs, literals or blank. Incomplete graph data sets are created by substituting a variable into each blank subject node, each blank predicate and each blank object node, and uncertain graph data sets are created by substituting alternative values for all variables in the incomplete data graph. A query is received from a user and a naïve search of the graph data is performed for certain data. The incomplete and uncertain graphs are then used to determine potential answers and certain potential answers based on user-specified requirements. The certain answers and potential certain answers are returned to the user.	11-21-2013
20130311517	Representing Incomplete and Uncertain Information in Graph Data - A method for representing and querying incomplete and uncertain information in graph data receives a plurality of graphs containing subject nodes, object nodes and predicates extending between subject and object nodes. The subject nodes and predicates can be URIs or blank, and the object nodes can be URIs, literals or blank. Incomplete graph data sets are created by substituting a variable into each blank subject node, each blank predicate and each blank object node, and uncertain graph data sets are created by substituting alternative values for all variables in the incomplete data graph. A query is received from a user and a naïve search of the graph data is performed for certain data. The incomplete and uncertain graphs are then used to determine potential answers and certain potential answers based on user-specified requirements. The certain answers and potential certain answers are returned to the user.	11-21-2013
20130332466	Linking Data Elements Based on Similarity Data Values and Semantic Annotations - Data elements from data sources and having a data value set are linked by using hash functions to determine a dimensionally reduced instance signature for each data element based on all data values associated with that data element to yield a plurality of dimensionally reduced instance signatures of equivalent fixed size such that similarities among the data values in the data value sets across all data elements are maintained among the plurality of instance signatures. Candidate pairs of data elements to link are identified using the plurality of instance signatures in locality sensitive hash functions, and a similarity index is generated for each candidate pair using a pre-determined measure of similarity. Candidate pairs of data elements having a similarity index above a given threshold are linked.	12-12-2013
20130332467	Linking Data Elements Based on Similarity Data Values and Semantic Annotations - Data elements from data sources and having a data value set are linked by using hash functions to determine a dimensionally reduced instance signature for each data element based on all data values associated with that data element to yield a plurality of dimensionally reduced instance signatures of equivalent fixed size such that similarities among the data values in the data value sets across all data elements are maintained among the plurality of instance signatures. Candidate pairs of data elements to link are identified using the plurality of instance signatures in locality sensitive hash functions, and a similarity index is generated for each candidate pair using a pre-determined measure of similarity. Candidate pairs of data elements having a similarity index above a given threshold are linked.	12-12-2013
20130332478	QUERYING AND INTEGRATING STRUCTURED AND UNSTRUCTURED DATA - A computer-implemented method, system, and article of manufacture for querying and integrating structured and unstructured data. The method includes: receiving entity information that is extracted from a first set of unstructured data using an open domain information extraction system, wherein the entity information comprises relationship information between a first entity and a second entity of the first set of unstructured data; recognizing a pattern based on the relationship information and creating a schema for the first set of unstructured data based on the pattern; and associating an element of the created schema with (i) an entity of a second set of unstructured data or (ii) a schema element of an existing set of structured data if there is sufficient overall similarity between the created schema element and either the second unstructured data entity or the schema element of the existing structured data.	12-12-2013
20140012884	OPTIMIZING SPARSE SCHEMA-LESS DATA IN DATA STORES - Various embodiments of the invention relate to optimizing storage of schema-less data. At least one of a schema-less dataset including a plurality of resources and one or more query workloads associated with the plurality of resources is received. Each resource is associated with at least a plurality of properties. At least one set of co-occurring properties from the plurality of properties is identified. A graph including a plurality of nodes is generated. Each of the nodes represents a unique property in the set of co-occurring properties. The graph further includes an edge connecting each pair of nodes that represent co-occurring properties. A schema is generated based on the graph that assigns a column identifier from a table to each unique property represented by one of the nodes in the graph.	01-09-2014
20140074878	SPREADSHEET SCHEMA EXTRACTION - Aspects of the present invention provide a tool for extracting schema from a spreadsheet. In an embodiment, a set of data that is stored in an uncataloged tabular format, such as a spreadsheet, is retrieved. The structure of the retrieved set of data is surveyed to determine the dataset schema thereof. Then, data elements within the dataset schema are analyzed to obtain information regarding the data elements. Based on dataset schema and the element information, an interface can be constructed that allows remote access to the set of data.	03-13-2014
20140075278	SPREADSHEET SCHEMA EXTRACTION - Aspects of the present invention provide a tool for extracting schema from a spreadsheet. In an embodiment, a set of data that is stored in an uncataloged tabular format, such as a spreadsheet, is retrieved. The structure of the retrieved set of data is surveyed to determine the dataset schema thereof. Then, data elements within the dataset schema are analyzed to obtain information regarding the data elements. Based on dataset schema and the element information, an interface can be constructed that allows remote access to the set of data.	03-13-2014
20140143280	Scalable Summarization of Data Graphs - Keyword searching is used to explore and search large Resource Description Framework datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying resource description framework data. Given a keyword query, the summarization lends significant pruning powers to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently.	05-22-2014
20140143281	Scalable Summarization of Data Graphs - Keyword searching is used to explore and search large Resource Description Framework datasets having unknown or constantly changing structures. A succinct and effective summarization is built from the underlying resource description framework data. Given a keyword query, the summarization lends significant pruning powers to exploratory keyword searches and leads to much better efficiency compared to previous work. The summarization returns exact results and can be updated incrementally and efficiently.	05-22-2014
20140156633	Scalable Multi-Query Optimization for SPARQL - Multiquery optimization is performed in the context of RDF/SPARQL. Heuristic algorithms partition an input batch of queries into groups such that each group of queries can be optimized together. The optimization incorporates an efficient algorithm to discover the common sub-structures of multiple SPARQL queries and an effective cost model to compare candidate execution plans. No assumptions are made about the underlying SPARQL query engine. This provides portability across different RDF stores.	06-05-2014
20140304251	Method and Apparatus for Optimizing the Evaluation of Semantic Web Queries - A semantic query over an RDF database is received with RDF database statistics and access methods for evaluating triple patterns in the query. The semantic query is expressed as a parse tree containing triple patterns and logical relationships among the triple patterns. The parse tree and access methods are used to create a data flow graph containing a plurality of triple pattern and access method pair nodes connected by a plurality of edges, and an optimal flow tree through the data flow graph is determined such that costs are minimized and all triple patterns in the semantic query are contained in the optimal flow tree. A structure independent execution tree defining a sequence of evaluation through the optimal flow tree is created and is transformed into a database structure dependent query plan. This is used to create an SQL query that is used to evaluate the semantic query over the RDF database.	10-09-2014
20150052134	Method and Apparatus for Storing Sparse Graph Data as Multi-Dimensional Cluster - A system for storing graph data as a multi-dimensional cluster includes a database with a graph dataset containing data and relationships between data pairs, and a schema list of storage methods that use a table with columns and rows associated with data or relationships. An analyzer module collects statistics of the graph dataset, and a dimension identification module identifies a plurality of dimensions that each represent a column in the table. A schema creation and loading module creates a modified storage method having a plurality of distinct table blocks and a plurality of table block indexes, one index for each table block, and arranges the data and relationships in the given graph dataset in accordance with the modified storage method to create the multi-dimensional cluster.	02-19-2015
20150052175	Method and Apparatus for Identifying the Optimal Schema to Store Graph Data in a Relational Store - A system for identifying a schema for storing graph data includes a database containing a graph dataset of data and relationships between data pairs and a list of storage methods that each are a distinct structural arrangement of the data and relationships from the graph data set. An analyzer module collects statistics for the graph dataset, and a data classification module uses the collected statistics to calculate metrics describing the data and relationships in the graph dataset, uses the calculated metrics to group the data and relationships into a plurality of graph dataset subsets, and associates each graph dataset subset with one of the plurality of storage methods. The resulting group of storage methods associated with the plurality of graph dataset subsets includes a unique storage method for each graph dataset subset. The data and relationships in each graph dataset subset are arranged in accordance with associated storage methods.	02-19-2015
20150149440	SYSTEMS AND METHODS FOR FINDING OPTIMAL QUERY PLANS - Systems and methods for optimizing a query, and more particularly, systems and methods for finding optimal plans for graph queries by casting the task of finding the optimal plan as an integer linear programming (ILP) problem. A method for optimizing a query comprises building a data structure for a query, the data structure including a plurality of components, wherein each of the plurality of components corresponds to at least one graph pattern, determining a plurality of flows of query variables between the plurality of components, and determining a combination of the plurality of flows between the plurality of components that results in a minimum cost to execute the query.	05-28-2015
20150193478	Method and Apparatus for Determining the Schema of a Graph Dataset - A schema for a dataset is identified by identifying a dataset comprising data and relationships between data pairs. An original schema is identified for the dataset. This original schema comprises an organizational structure. An initial fit between the dataset and the original schema is determined. The initial fit quantifies the conformity of the data in the dataset to the organizational structure of the original schema. A plurality of additional schemas are identified, each a distinct organizational schema. The dataset is partitioned into a plurality of subsets. Each subset has a modified fit quantifying the conformity of the subset's data to one of the original schema and the additional schemas. The modified fit is greater than the initial fit.	07-09-2015
20150213089	Method and Apparatus for Optimizing the Evaluation of Semantic Web Queries - A semantic query over an RDF database is received with RDF database statistics and access methods for evaluating triple patterns in the query. The semantic query is expressed as a parse tree containing triple patterns and logical relationships among the triple patterns. The parse tree and access methods are used to create a data flow graph containing a plurality of triple pattern and access method pair nodes connected by a plurality of edges, and an optimal flow tree through the data flow graph is determined such that costs are minimized and all triple patterns in the semantic query are contained in the optimal flow tree. A structure independent execution tree defining a sequence of evaluation through the optimal flow tree is created and is transformed into a database structure dependent query plan. This is used to create an SQL query that is used to evaluate the semantic query over the RDF database.	07-30-2015
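The linking procedure in 20130332466 and 20130332467 (fixed-size signatures from hash functions, candidate pairs via locality sensitive hashing, then an exact similarity check against a threshold) closely resembles the standard MinHash-plus-LSH-banding technique. The Python sketch below uses that standard technique under stated assumptions: MinHash stands in for the "dimensionally reduced instance signature", banding implements the locality sensitive hashing step, and Jaccard similarity serves as the "pre-determined measure of similarity". The abstract names none of these choices, and the function names and parameters (`bands`, `rows`, `threshold`) are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def seeded_hash(seed: int, value: str) -> int:
    """One member of a family of hash functions, selected by `seed`."""
    digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(values: set, num_hashes: int) -> list:
    """Fixed-size signature: the minimum hash of the value set under each hash function.
    Assumes `values` is non-empty."""
    return [min(seeded_hash(s, v) for v in values) for s in range(num_hashes)]

def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def link_elements(elements: dict, bands: int = 8, rows: int = 4, threshold: float = 0.5) -> list:
    """Return (name, name, similarity) for candidate pairs whose similarity exceeds `threshold`."""
    sigs = {name: minhash_signature(vals, bands * rows) for name, vals in elements.items()}
    # LSH banding: elements whose signatures agree on a whole band share a bucket.
    buckets = defaultdict(set)
    for name, sig in sigs.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(sorted(members), 2))
    # Only candidate pairs are scored with the exact similarity measure.
    links = []
    for x, y in sorted(candidates):
        score = jaccard(elements[x], elements[y])
        if score > threshold:
            links.append((x, y, score))
    return links

columns = {
    "crm.city": {"Paris", "London", "Rome", "Berlin"},
    "erp.town": {"Paris", "London", "Rome", "Berlin"},
    "erp.color": {"red", "green", "blue"},
}
print(link_elements(columns))  # [('crm.city', 'erp.town', 1.0)]
```

The point of the banding step is that the exact similarity is computed only for pairs that collide in some bucket, rather than for every pair of data elements; pairs with moderate similarity are found only probabilistically, which is the usual trade-off of this technique.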


Anastasios Kementsietsidis, White Plains, NY US

Patent application number	Description	Published
20100299339	INDEXING PROVENANCE DATA AND EVALUATING PROVENANCE DATA QUERIES IN DATA PROCESSING SYSTEMS - Techniques for indexing provenance data and evaluating provenance data queries are disclosed. For example, a method for processing one or more queries directed toward data associated with a data processing system comprises the following steps. One or more data items of a first data set associated with the data processing system are mapped to a first representation type and one or more data items of a second data set associated with the data processing system are mapped to a second representation type. A bi-directional index of a data provenance relation existing between the data items of the first data set and the data items of the second data set is computed. The bi-directional index is computed in terms of the first representation type and the second representation type. A query evaluation is performed using the bi-directional index, in response to receipt of a provenance query. The bi-directional index is used for query evaluation whether the received provenance query is a backward provenance query or a forward provenance query. A response is generated for the received provenance query based on a result of the query evaluation. In one further example, the provenance query evaluation step may be performed by using only the bi-directional index and does not require access to base data or maintaining stored provenance data.	11-25-2010
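The idea in 20100299339 of answering both backward and forward provenance queries from a single bi-directional index, without touching base data, can be sketched in Python as follows. This is an illustration under assumptions, not the patented scheme: the two "representation types" are modeled simply as dense integer IDs for the first and second data sets, the index is a pair of hash maps, and only one-step (non-transitive) provenance is handled. The class and method names are hypothetical.

```python
from collections import defaultdict

class ProvenanceIndex:
    """Bi-directional index over a provenance relation of (input, output) pairs.

    Items of the first data set (inputs) and the second data set (outputs) are
    mapped to dense integer IDs, a stand-in for the two representation types.
    """

    def __init__(self, relation):
        self.in_ids, self.in_names = {}, []
        self.out_ids, self.out_names = {}, []
        self.forward = defaultdict(set)   # input ID  -> output IDs
        self.backward = defaultdict(set)  # output ID -> input IDs
        for src, dst in relation:
            i = self._intern(src, self.in_ids, self.in_names)
            o = self._intern(dst, self.out_ids, self.out_names)
            self.forward[i].add(o)
            self.backward[o].add(i)

    @staticmethod
    def _intern(item, ids, names):
        """Assign the next integer ID to an unseen item."""
        if item not in ids:
            ids[item] = len(names)
            names.append(item)
        return ids[item]

    def forward_query(self, item):
        """Forward provenance: which outputs were derived from `item`?"""
        i = self.in_ids.get(item)
        return set() if i is None else {self.out_names[o] for o in self.forward[i]}

    def backward_query(self, item):
        """Backward provenance: which inputs contributed to `item`?"""
        o = self.out_ids.get(item)
        return set() if o is None else {self.in_names[i] for i in self.backward[o]}

index = ProvenanceIndex([
    ("sensor1.csv", "report.pdf"),
    ("sensor2.csv", "report.pdf"),
    ("sensor1.csv", "alert.log"),
])
print(sorted(index.backward_query("report.pdf")))  # ['sensor1.csv', 'sensor2.csv']
```

Both query directions are answered from the same index object, which mirrors the abstract's claim that one index serves backward and forward provenance queries alike.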

Anastasios Kementsietsidis, Edinburgh GB

Patent application number	Description	Published
20090006302	Methods and Apparatus for Capturing and Detecting Inconsistencies in Relational Data Using Conditional Functional Dependencies - Methods and apparatus are provided for detecting data inconsistencies. Methods are disclosed for determining whether a set of conditional functional dependencies is consistent, determining a minimal cover of a set of conditional functional dependencies, and detecting a violation of one or more conditional functional dependencies in a set of conditional functional dependencies. The conditional functional dependencies comprise one or more constraints that data in a database must satisfy, including at least one pattern with data values.	01-01-2009
20090006316	Methods and Apparatus for Rewriting Regular XPath Queries on XML Views - Methods and apparatus are provided for rewriting view queries into equivalent queries on the source document. According to one aspect of the invention, methods are provided for processing a view query on a database view. The method comprises the steps of translating the view query to a mixed finite state automata representation of a document query on one or more documents underlying the database view; and evaluating the document query on the one or more documents to obtain a result to the view query. The view query may be, for example, a regular XPath query.	01-01-2009
20090006329	Methods and Apparatus for Evaluating XPath Filters on Fragmented and Distributed XML Documents - Methods and apparatus are provided for evaluating XPath filters on fragmented and distributed XML documents. According to one aspect of the invention, a method is disclosed for evaluating a query over a tree having a plurality of fragments distributed over a plurality of sites. The method comprises the steps of identifying the plurality of sites storing at least one of the plurality of fragments of the tree; providing the query to the plurality of identified sites, wherein each of the identified sites partially evaluates the query against one or more fragments of the tree stored by the respective site; obtaining partial results from the plurality of identified sites; and composing the partial results to compute a result to the query. The query may be, for example, a Boolean XPath query. The method can be performed, for example, by a coordinating site that stores a root fragment of the tree.	01-01-2009
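The violation-detection part of 20090006302 can be illustrated with a small Python sketch of conditional functional dependency (CFD) checking. Assumptions: a CFD is given as an embedded dependency `lhs -> rhs` with a single right-hand-side attribute plus a pattern tableau, `"_"` marks an unnamed variable (wildcard), and relations are lists of dicts. The sketch covers only violation detection, not the consistency or minimal-cover methods, and the function names and sample data are hypothetical.

```python
from collections import defaultdict

WILDCARD = "_"  # unnamed variable in a pattern tableau

def matches(value, pattern) -> bool:
    return pattern == WILDCARD or value == pattern

def cfd_violations(tuples, lhs, rhs, tableau) -> list:
    """Indices of tuples violating the CFD (lhs -> rhs, tableau).

    Each tableau row is (lhs_patterns, rhs_pattern); a single RHS attribute is assumed.
    """
    bad = set()
    for lhs_pat, rhs_pat in tableau:
        groups = defaultdict(list)
        for idx, t in enumerate(tuples):
            if not all(matches(t[a], p) for a, p in zip(lhs, lhs_pat)):
                continue
            if rhs_pat != WILDCARD and t[rhs] != rhs_pat:
                bad.add(idx)  # single-tuple violation of a constant RHS pattern
            groups[tuple(t[a] for a in lhs)].append(idx)
        if rhs_pat == WILDCARD:
            # Variable RHS: tuples that agree on the LHS must agree on the RHS.
            for members in groups.values():
                if len({tuples[i][rhs] for i in members}) > 1:
                    bad.update(members)
    return sorted(bad)

customers = [
    {"CC": "44", "AC": "131", "city": "Edinburgh"},
    {"CC": "44", "AC": "131", "city": "London"},
    {"CC": "01", "AC": "908", "city": "Murray Hill"},
    {"CC": "01", "AC": "908", "city": "New York"},
    {"CC": "01", "AC": "212", "city": "New York"},
]
tableau = [(("44", "131"), "Edinburgh"), (("01", WILDCARD), WILDCARD)]
print(cfd_violations(customers, ["CC", "AC"], "city", tableau))  # [1, 2, 3]
```

The first tableau row is a constant pattern ("pattern with data values" in the abstract's words), so tuple 1 violates it on its own; the second row behaves like an ordinary functional dependency restricted to `CC = "01"`, so tuples 2 and 3 violate it as a pair.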

Anastasios Kementsietsidis, Mountain View, CA US

Patent application number	Description	Published
20160044038	PREVENTION OF QUERY OVERLOADING IN A SERVER APPLICATION - A system for processing a transaction request. A server computer receives a transaction request to execute on an application on the server computer; the request includes a user identification and an associated request token. The server computer determines whether the available resources on the server computer to perform the transaction are below respective threshold values. In response to determining that the available resources are below the threshold values, the server computer determines whether the user identification is allowed access to the application. In response to determining that the user identification is allowed access to the application, the transaction is executed on the server computer. In response to determining that the user identification is not allowed access to the application, the transaction is rejected.	02-11-2016
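The admission logic in 20160044038 amounts to a short decision procedure, sketched below in Python under several assumptions. "Below respective threshold values" is read as "any single resource is below its threshold"; the case where resources are sufficient, which the abstract does not describe, is assumed to execute the transaction; and the request token is carried but not checked. `TransactionRequest`, `handle_request`, and the resource names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TransactionRequest:
    user_id: str
    request_token: str  # carried with the request; not used by this simplified check

def handle_request(request, available, thresholds, allowed_users, execute: Callable):
    """Admit or reject a transaction under resource pressure.

    `available` and `thresholds` map resource names (e.g. "cpu") to numbers.
    Assumption: the server is under pressure if any resource is below its threshold.
    """
    under_pressure = any(available[r] < limit for r, limit in thresholds.items())
    if under_pressure and request.user_id not in allowed_users:
        return "rejected"
    return execute(request)

def run(req):
    return f"executed for {req.user_id}"

limits = {"cpu": 0.2, "memory": 0.1}
print(handle_request(TransactionRequest("bob", "t-1"), {"cpu": 0.05, "memory": 0.5}, limits, {"alice"}, run))  # rejected
```

Only under resource pressure does the user check come into play, so a non-privileged user is still served when the server has headroom.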

Anastasios Kementsietsidis, Toronto CA

Patent application number	Description	Published
20160055184	DATA VIRTUALIZATION ACROSS HETEROGENEOUS FORMATS - Various embodiments virtualize data across heterogeneous formats. In one embodiment, a plurality of heterogeneous data sources is received as input. A local schema graph including a set of attribute nodes and a set of type nodes is generated for each of the plurality of heterogeneous data sources. A global schema graph is generated based on each local schema graph that has been generated. The global schema graph comprises each of the local schema graphs and at least one edge between at least one of two or more attribute nodes and two or more type nodes from different local schema graphs. The edge indicates a relationship between the data sources represented by the different local schema graphs comprising the two or more attribute nodes based on a computed similarity between at least one value associated with each of the two or more attribute nodes.	02-25-2016
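The cross-source edges of the global schema graph in 20160055184 can be illustrated with a minimal Python sketch that links attribute nodes from different local schemas when their value sets are similar. Assumptions: each local schema is reduced to a mapping from attribute names to sets of observed values, type nodes are omitted, and the "computed similarity" is Jaccard similarity with a hypothetical threshold of 0.5. The function name and sample data are hypothetical.

```python
from itertools import combinations

def global_schema_edges(local_schemas: dict, threshold: float = 0.5) -> list:
    """local_schemas: {source: {attribute: set_of_values}}.

    Returns edges ((src1, attr1), (src2, attr2), similarity) between attribute
    nodes of different local schema graphs whose value sets overlap enough.
    """
    nodes = [(src, attr, vals) for src, attrs in local_schemas.items() for attr, vals in attrs.items()]
    edges = []
    for (s1, a1, v1), (s2, a2, v2) in combinations(nodes, 2):
        if s1 == s2:
            continue  # edges only connect nodes from different local schema graphs
        union = v1 | v2
        sim = len(v1 & v2) / len(union) if union else 0.0
        if sim >= threshold:
            edges.append(((s1, a1), (s2, a2), sim))
    return edges

schemas = {
    "crm.csv": {"country": {"US", "CA", "GB"}, "name": {"Ann", "Bo"}},
    "orders.json": {"ship_to": {"US", "CA", "DE"}, "sku": {"A1", "B2"}},
}
print(global_schema_edges(schemas))  # [(('crm.csv', 'country'), ('orders.json', 'ship_to'), 0.5)]
```

Here `country` and `ship_to` share two of four distinct values, so the sketch relates the CSV and JSON sources through those attributes even though the attribute names differ.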
