Patent application number | Description | Published |
20120066214 | Handling Data Sets - A method, system and computer program product provides a first characteristic associated with a first data set and a single data value, and a second characteristic associated with a second data set; and calculates at least one of: 1) the similarity of the first data set with the second data set based on the first and second characteristics, 2) the similarity of the first data set with the single data value based on the first characteristic and the single data value, 3) confidence indicating how well the first characteristic reflects properties of the first data set based on the first characteristic, and 4) confidence indicating how well the similarity of the first data set with the single data value reflects properties of the single data value based on the first characteristic and the single data value. | 03-15-2012 |
20120158625 | Creating and Processing a Data Rule - A data rule is created and processed by receiving an expression defining a logic of a rule and at least one logical variable, creating a rule definition including the expression and the at least one logical variable for binding each logical variable of the rule with at least one column, associating a characteristic enabling comparison of columns with a first logical variable of the rule definition, and storing the characteristic as part of the rule definition. | 06-21-2012 |
20130006931 | DATA QUALITY MONITORING - A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes. | 01-03-2013 |
20130091094 | ACCELERATING DATA PROFILING PROCESS - A data profile request is handles by utilizing data in a distributed file system. Tabular data is extracted from a data source and stored in a distributed file system. Each table in the tabular data is split by columns, which are each stored in separate files in a set of physical nodes of the distributed file system. In response to a data profiling request, a master node determines, based on the profiling request, which groups of files are needed to be on a same physical node in order to perform the profiling analysis. The master node creates jobs using physical nodes that contain the requisite files needed for each job. | 04-11-2013 |
20140046927 | DETECTING MULTI-COLUMN COMPOSITE KEY COLUMN SETS - An aspect includes a computer-implemented method for detecting one or more multi-column composite key column sets. The method includes accessing a plurality of first columns, each first column representing a parameter, each first column including a set of distinct parameter values of its respective parameter, each distinct parameter value being stored in association with one or more object identifiers. Two or more of the first columns are selected for use as a current candidate column set, the current candidate column set including at least a first and a second candidate column, the current candidate column set being of a current cardinality. The method also includes determining, by comparing object-identifiers, whether for the current candidate column set at least one tuple of parameter values exists with parameter values respectively stored in association with two or more shared ones of the object identifiers to identify a multi-column composite key column set. | 02-13-2014 |
20140108639 | TRANSPARENTLY ENFORCING POLICIES IN HADOOP-STYLE PROCESSING INFRASTRUCTURES - Method, system, and computer program product to facilitate selection of data nodes configured to satisfy a set of requirements for processing client data in a distributed computing environment by providing, for each data node of a plurality of data nodes in the distributed computing environment, nodal data describing the respective data node of the plurality of data nodes, receiving a request to process the client data, the client data being identified in the request, retrieving the set of requirements for processing the client data, and analyzing the retrieved data policy and the nodal data describing at least one of the data nodes, to select a first data node of the plurality of data nodes as a delegation target, the first data node selected based on having a higher suitability level for satisfying the set of requirements than a second data node of the plurality of data nodes. | 04-17-2014 |
20140108648 | TRANSPARENTLY ENFORCING POLICIES IN HADOOP-STYLE PROCESSING INFRASTRUCTURES - Method, system, and computer program product to facilitate selection of data nodes configured to satisfy a set of requirements for processing client data in a distributed computing environment by providing, for each data node of a plurality of data nodes in the distributed computing environment, nodal data describing the respective data node of the plurality of data nodes, receiving a request to process the client data, the client data being identified in the request, retrieving the set of requirements for processing the client data, and analyzing the retrieved data policy and the nodal data describing at least one of the data nodes, to select a first data node of the plurality of data nodes as a delegation target, the first data node selected based on having a higher suitability level for satisfying the set of requirements than a second data node of the plurality of data nodes. | 04-17-2014 |
20150058280 | DATA QUALITY MONITORING - A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes. | 02-26-2015 |