Patent application number | Description | Published |
20080208855 | METHOD FOR MAPPING A DATA SOURCE TO A DATA TARGET - The invention relates to a method for mapping at least one data column from a database source to at least one data column of a data target, the method comprising: defining at least one reference column of the data target and at least one database source column; performing a comparison of data contained in the data column(s) with the reference column(s); and determining mapping candidates between the data column(s) and the reference column(s). | 08-28-2008 |
20090006282 | USING A DATA MINING ALGORITHM TO GENERATE RULES USED TO VALIDATE A SELECTED REGION OF A PREDICTED COLUMN - Provided are an article of manufacture, system, and method for using a data mining algorithm to generate rules used to validate a selected region of a predicted column. A data set has a plurality of columns and records providing data for each of the columns. Selection is received of at least one predicted column for which rules are to be generated and at least one region of the selected at least one predicted column, wherein each region specifies data positions in the column. The data set is processed to determine association relationships among data in at least one predictor column and subsequences in the selected at least one region of the at least one predicted column. At least one rule is generated from the relationships specifying a condition involving at least one predictor column that predicts at least one value in the selected region of the at least one predicted column. | 01-01-2009 |
20090006283 | USING A DATA MINING ALGORITHM TO GENERATE FORMAT RULES USED TO VALIDATE DATA SETS - Provided are a method, system, and article of manufacture for using a data mining algorithm to generate format rules used to validate data sets. A data set has a plurality of columns and records providing data for each of the columns. Selection is received of at least one format column for which format rules are to be generated and selection is received of at least one predictor column. A format mask column is generated for each selected format column. For records in the data set, a value in the at least one format column is converted to a format mask representing a format of the value in the format column, and the format mask is stored in the format mask column in the record for which the format mask was generated. The at least one predictor column and the at least one format mask column are processed to generate at least one format rule. Each format rule specifies a format mask associated with at least one condition in the at least one predictor column. | 01-01-2009 |
20090024551 | MANAGING VALIDATION MODELS AND RULES TO APPLY TO DATA SETS - Provided are a method, system, and article of manufacture for managing validation models and rules to apply to data sets. A schema definition describing a structure of at least one column in a first data set having a plurality of columns and records providing data for each of the columns is received. At least one model is generated, wherein each model asserts conditions for at least one column in a record of the first data set. The schema definition and the at least one model are stored in a data quality model. Selection is received of a second data set and the data quality model. A determination is made as to whether a structure of the second data set is compatible with the schema definition in the selected data quality model. Each model in the data quality model is applied to the records in the second data set to validate the records in the second data set in response to determining that the structure of the second data set and the schema definition are compatible. | 01-22-2009 |
20090125579 | METHOD AND SYSTEM FOR IMPROVING CLIENT-SERVLET COMMUNICATION - The present invention provides a method and system for improving the client-Servlet communication in the World-Wide-Web (Web) without changing the existing communication protocol, and without changing the client. The existing prior art one-way communication path between client and Servlet remains unchanged if the initial request includes all information for retrieving the desired information. However, if the Servlet identifies missing information not included in the initial client's web-browser request for retrieving the desired information, the Servlet automatically opens another communication path for providing the missing information to the Servlet by making use of the HTTP-response functionality of said initial HTTP-request, wherein the other communication path is supported by a further Servlet functionality component and is characterized by the steps of: generating, by the Servlet, a script that, when executed at the client's web-browser, retrieves the missing information and invokes the further Servlet functionality component, appending the script to said HTTP-response indicating it as a partial response, sending the HTTP-response including the script to the client's web-browser, suspending execution of the initial HTTP-response by the Servlet until the missing information is available, receiving the missing information by the further Servlet functionality component, wherein the missing information is contained in a new HTTP-request created by the script during its execution on the client's web-browser, providing the missing information to the Servlet, and continuing execution of the initial HTTP-response by the Servlet using the missing information for retrieving the rest of said HTTP-response and providing the rest of the HTTP-response to the client's web-browser for displaying. | 05-14-2009 |
20090164613 | Method and system for improving client-servlet communication - A method comprises providing a one-way communication path that initiates a request by a client for retrieving information from a Servlet, and sending a response by the Servlet containing at least one return code specifying success or failure of the request, and including the result of the request if available. If the Servlet identifies missing information not included in the initial request, the method includes providing a complete response to automatically open another communication path from the Servlet via the client's web-browser to the Servlet and providing the missing information to the Servlet by making use of the response functionality of the initial request. The other communication path is supported by a further Servlet functionality component. | 06-25-2009 |
20090327208 | DISCOVERING TRANSFORMATIONS APPLIED TO A SOURCE TABLE TO GENERATE A TARGET TABLE - Provided are a method, system, and article of manufacture for discovering transformations applied to a source table to generate a target table. Selection is made of a source table comprising a plurality of rows and a target table resulting from a transformation applied to the rows of the source table. A first pre-processing method is applied with respect to columns in the source and target tables to produce first category pre-processing output. The first category pre-processing output is used to determine first category transformation rules with respect to at least one source table column and at least one target table column. For each unpredicted target column in the target table not predicted by the determined first category transformation rules, a second pre-processing method is applied to columns in the source table and unpredicted target columns to produce second category pre-processing output. The second category pre-processing output is used to determine second category transformation rules with respect to at least one source table column and at least one target table column. | 12-31-2009 |
20100162210 | Visual Editor for Editing Complex Expressions - Methods and apparatus, including computer program products, implementing and using techniques for providing a visual editor allowing graphical editing of expressions in an expression language. A graphical user interface is displayed. A first user input of an expression is received. The expression is defined in a logical or textual form, and each component of the expression is represented by a graphical element on the graphical user interface. A syntax of the first user input is verified and an alert is provided to the user in response to detecting a syntax error or an inconsistency of the first user input when verifying the syntax. | 06-24-2010 |
20110178971 | PORTABLE DATA MANAGEMENT - Embodiments for methods, systems, and computer program products for creating and managing a portable data rule using an electronic computing device are presented including: causing the electronic computing device to create a rule definition including, defining an expression by a user, where the expression defines a logic of a rule, causing the electronic computing device to parse the expression into a logical variable associated with the expression, causing the electronic computing device to identify the logical variable, and causing the electronic computing device to store the rule definition, where the rule definition includes the expression and the logical variable. In some embodiments, the causing the electronic computing device to identify the logical variable includes: causing the electronic computing device to return a name of the logical variable; and causing the electronic computing device to return an expected type of the logical variable. | 07-21-2011 |
20120066214 | Handling Data Sets - A method, system and computer program product provides a first characteristic associated with a first data set and a single data value, and a second characteristic associated with a second data set; and calculates at least one of: 1) the similarity of the first data set with the second data set based on the first and second characteristics, 2) the similarity of the first data set with the single data value based on the first characteristic and the single data value, 3) confidence indicating how well the first characteristic reflects properties of the first data set based on the first characteristic, and 4) confidence indicating how well the similarity of the first data set with the single data value reflects properties of the single data value based on the first characteristic and the single data value. | 03-15-2012 |
20120089672 | SERVER SIDE PROCESSING OF USER INTERACTIONS WITH A WEB BROWSER - A method includes receiving input at a computer. The input is associated with an application frame of a client-side web browser. The method includes encoding control characteristics of the input as at least a portion of a request to a server-side web application. The method includes sending the request to the server-side web application and receiving an executable response from the server-side web application at a mediator frame of the client-side web browser. The method also includes executing the executable response via the mediator frame to update at least a portion of the application frame of the client-side browser. | 04-12-2012 |
20120151435 | Visual Editor for Editing Complex Expressions - Methods implementing and using techniques for providing a visual editor allowing graphical editing of expressions in an expression language. A graphical user interface is displayed. A first user input of an expression is received. The expression is defined in a logical or textual form, and each component of the expression is represented by a graphical element on the graphical user interface. A syntax of the first user input is verified and an alert is provided to the user in response to detecting a syntax error or an inconsistency of the first user input when verifying the syntax. | 06-14-2012 |
20120158625 | Creating and Processing a Data Rule - A data rule is created and processed by receiving an expression defining a logic of a rule and at least one logical variable, creating a rule definition including the expression and the at least one logical variable for binding each logical variable of the rule with at least one column, associating a characteristic enabling comparison of columns with a first logical variable of the rule definition, and storing the characteristic as part of the rule definition. | 06-21-2012 |
20120173479 | APPLICATION CACHE PROFILER - In an embodiment of the invention, a method for data profiling incorporating an enterprise service bus (ESB) coupling the target and source systems following an extraction, transformation, and loading (ETL) process for a target system and a source system is provided. The method includes receiving baseline data profiling results obtained during ETL from a source application to a target application, caching the updates, determining current data profiling results within the ESB for cached updates, and triggering an action if a threshold disparity is detected upon the current data profiling results and the baseline data profiling results. | 07-05-2012 |
20120173823 | APPLICATION CACHE PROFILER - In an embodiment of the invention, a method for data profiling incorporating an enterprise service bus (ESB) coupling the target and source systems following an extraction, transformation, and loading (ETL) process for a target system and a source system is provided. The method includes receiving baseline data profiling results obtained during ETL from a source application to a target application, caching the updates, determining current data profiling results within the ESB for cached updates, and triggering an action if a threshold disparity is detected upon the current data profiling results and the baseline data profiling results. | 07-05-2012 |
20120197836 | PORTABLE DATA MANAGEMENT - Embodiments for methods, systems, and computer program products for creating and managing a portable data rule using an electronic computing device are presented including: causing the electronic computing device to create a rule definition including, defining an expression by a user, where the expression defines a logic of a rule, causing the electronic computing device to parse the expression into a logical variable associated with the expression, causing the electronic computing device to identify the logical variable, and causing the electronic computing device to store the rule definition, where the rule definition includes the expression and the logical variable. In some embodiments, the causing the electronic computing device to identify the logical variable includes: causing the electronic computing device to return a name of the logical variable; and causing the electronic computing device to return an expected type of the logical variable. | 08-02-2012 |
20120246132 | MANAGING OVERFLOW ACCESS RECORDS IN A DATABASE - Overflow access records (OARs) are managed in a database system. An OAR is created in response to receiving an update command for a data record and to the updated data record generated by the update command not fitting onto the page in the table where the data record was stored. The OAR that is created includes an index counter that indicates a number of indexes associated with the table. When an OAR is accessed in response to a query command, an identifier of the accessed OAR is replaced in the index by an identifier of a data record pointed to by the OAR, and the index counter in the accessed OAR is changed by a predefined amount. When the index counter reaches a predefined value, the accessed OAR is removed from the table. | 09-27-2012 |
20130006931 | DATA QUALITY MONITORING - A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes. | 01-03-2013 |
20130091094 | ACCELERATING DATA PROFILING PROCESS - A data profile request is handled by utilizing data in a distributed file system. Tabular data is extracted from a data source and stored in a distributed file system. Each table in the tabular data is split by columns, which are each stored in separate files in a set of physical nodes of the distributed file system. In response to a data profiling request, a master node determines, based on the profiling request, which groups of files are needed to be on a same physical node in order to perform the profiling analysis. The master node creates jobs using physical nodes that contain the requisite files needed for each job. | 04-11-2013 |
20140046927 | DETECTING MULTI-COLUMN COMPOSITE KEY COLUMN SETS - An aspect includes a computer-implemented method for detecting one or more multi-column composite key column sets. The method includes accessing a plurality of first columns, each first column representing a parameter, each first column including a set of distinct parameter values of its respective parameter, each distinct parameter value being stored in association with one or more object identifiers. Two or more of the first columns are selected for use as a current candidate column set, the current candidate column set including at least a first and a second candidate column, the current candidate column set being of a current cardinality. The method also includes determining, by comparing object-identifiers, whether for the current candidate column set at least one tuple of parameter values exists with parameter values respectively stored in association with two or more shared ones of the object identifiers to identify a multi-column composite key column set. | 02-13-2014 |
20140108639 | TRANSPARENTLY ENFORCING POLICIES IN HADOOP-STYLE PROCESSING INFRASTRUCTURES - Method, system, and computer program product to facilitate selection of data nodes configured to satisfy a set of requirements for processing client data in a distributed computing environment by providing, for each data node of a plurality of data nodes in the distributed computing environment, nodal data describing the respective data node of the plurality of data nodes, receiving a request to process the client data, the client data being identified in the request, retrieving the set of requirements for processing the client data, and analyzing the retrieved data policy and the nodal data describing at least one of the data nodes, to select a first data node of the plurality of data nodes as a delegation target, the first data node selected based on having a higher suitability level for satisfying the set of requirements than a second data node of the plurality of data nodes. | 04-17-2014 |
20140108648 | TRANSPARENTLY ENFORCING POLICIES IN HADOOP-STYLE PROCESSING INFRASTRUCTURES - Method, system, and computer program product to facilitate selection of data nodes configured to satisfy a set of requirements for processing client data in a distributed computing environment by providing, for each data node of a plurality of data nodes in the distributed computing environment, nodal data describing the respective data node of the plurality of data nodes, receiving a request to process the client data, the client data being identified in the request, retrieving the set of requirements for processing the client data, and analyzing the retrieved data policy and the nodal data describing at least one of the data nodes, to select a first data node of the plurality of data nodes as a delegation target, the first data node selected based on having a higher suitability level for satisfying the set of requirements than a second data node of the plurality of data nodes. | 04-17-2014 |
20140379667 | DATA QUALITY ASSESSMENT - According to one embodiment of the present invention, a system assesses the quality of column data. The system assigns a pre-defined domain to one or more columns of the data based on a validity condition for the domain, applies the validity condition for the domain assigned to a column to data values in the column to compute a data quality metric for the column, and computes and displays a metric for a group of columns based on the computed data quality metric of at least one column in the group. Embodiments of the present invention further include a method and computer program product for assessing the quality of column data in substantially the same manners described above. | 12-25-2014 |
20150058280 | DATA QUALITY MONITORING - A computer implemented method, computer program product and system for data quality monitoring includes measuring a data quality of loaded data relative to a predefined data quality metric. The measuring the data quality includes identifying delta changes in at least one of the loaded data and the data quality rules relative to a previous measurement of the data quality of the loaded data. Logical calculus defined in the data quality rules is applied to the identified delta changes. | 02-26-2015 |
20150066987 | METHOD AND SYSTEM FOR ACCESSING A SET OF DATA TABLES IN A SOURCE DATABASE - Embodiments relate to accessing a set of data tables in a source database. A set of table categories is provided for tables in the source database and a set of metrics is provided. For each table of the set of the data tables: the set of metrics is evaluated, the evaluated set of metrics is analyzed, and the table is categorized into one of the set of table categories using the result of the analysis. Information indicative of the table category of each table of the set of tables is output, and in response, a request to select data tables of the set of data tables is received according to a part of the table categories for data processing. A subset of data tables of the set of data tables is selected using the table categories for performing the data processing on the subset of data tables. | 03-05-2015 |
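
The format-mask conversion described in the abstract of application 20090006283 can be illustrated with a minimal sketch. It is not the patented method: the mask symbols (`9` for digits, `A` for letters), the function names `format_mask` and `masks_by_predictor`, and the sample records are all assumptions made for illustration. The abstract only states that a value is converted to a mask representing its format and that masks are then related to conditions in predictor columns.

```python
from collections import Counter


def format_mask(value: str) -> str:
    """Replace each digit with '9' and each letter with 'A'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def masks_by_predictor(records, predictor, column):
    """Count the format masks of `column`, grouped by the value of `predictor`."""
    counts = {}
    for rec in records:
        mask = format_mask(rec[column])
        counts.setdefault(rec[predictor], Counter())[mask] += 1
    return counts


# Hypothetical sample data: the postal code format depends on the country.
records = [
    {"country": "US", "zip": "10001"},
    {"country": "US", "zip": "94105"},
    {"country": "CA", "zip": "K1A 0B1"},
]
grouped = masks_by_predictor(records, "country", "zip")
```

Here `grouped["US"]` holds the mask `99999` with a count of 2, which a rule generator could turn into a candidate rule such as "if country is US, the zip column has the format 99999". How rules are actually mined from these counts is left open in this sketch.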