Patent application number | Description | Published |
20130218908 | COMPUTING AND APPLYING ORDER STATISTICS FOR DATA PREPARATION - Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic. | 08-22-2013 |
20130218909 | COMPUTING AND APPLYING ORDER STATISTICS FOR DATA PREPARATION - Provided are techniques for generating order statistics and error bounds. For each of multiple, distributed data sources, a finite number of data bins are created for each field in that data source. Data values in each of the multiple, distributed data sources are processed to generate basic summaries for each of the data bins in a single pass of the data values. The data bins from each of the multiple, distributed data sources are sorted. One or more approximate order statistics are computed for a data set by accumulating counts from a number of the sorted data bins. Lower and upper error bounds are provided for each of the computed one or more approximate order statistics, wherein the lower and upper error bounds are values delimiting an interval containing a true value of an order statistic. | 08-22-2013 |
20130226838 | MISSING VALUE IMPUTATION FOR PREDICTIVE MODELS - Provided are techniques for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set. | 08-29-2013 |
20130226842 | MISSING VALUE IMPUTATION FOR PREDICTIVE MODELS - Provided are techniques for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set. | 08-29-2013 |
20140032553 | RELATIONSHIP DISCOVERY IN BUSINESS ANALYTICS - A subset of (k−1)-dimensional tables is received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k−1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size are computed for the created set of k-dimensional tables to determine dimension and measure interactions. | 01-30-2014 |
20140032611 | RELATIONSHIP DISCOVERY IN BUSINESS ANALYTICS - A subset of (k−1)-dimensional tables is received, wherein k is greater than 1. A set of k-dimensional tables is created by combining each of the (k−1)-dimensional tables with a non-included dimension corresponding to a 1-dimensional table. Significance of interaction and interaction effect size are computed for the created set of k-dimensional tables to determine dimension and measure interactions. | 01-30-2014 |
20140258355 | INTERACTION DETECTION FOR GENERALIZED LINEAR MODELS - Provided are techniques for interaction detection for generalized linear models. Basic statistics are calculated for a pair of categorical predictor variables and a target variable from a dataset during a single pass over the dataset. It is determined whether there is a significant interaction effect for the pair of categorical predictor variables on the target variable by: calculating a log-likelihood value for a full generalized linear model without estimating model parameters; calculating the model parameters for a reduced generalized linear model with a recursive marginal mean accumulation technique using the basic statistics; calculating a log-likelihood value for the reduced generalized linear model; calculating a likelihood ratio test statistic using the log-likelihood value for the full generalized linear model and the log-likelihood value for the reduced generalized linear model; calculating a p-value of the likelihood ratio test statistic; and comparing the p-value to a significance level. | 09-11-2014 |
20150302318 | UPDATING PREDICTION MODEL - In an approach to updating a prediction model, where the prediction model is used for time series data, a computer selects a first prediction time window in an order from a plurality of prediction time windows associated with the prediction model, and predicts one or more predicted values of the time series data at a plurality of time points within the first prediction time window. The computer calculates a prediction error associated with the first prediction time window based on the one or more predicted values and one or more actual measured values of the time series data at the plurality of time points. The computer determines whether the prediction error is larger than a predefined error threshold associated with the first prediction time window, and in response to determining the prediction error is larger than the predefined error threshold, provides a notification of updating the prediction model. | 10-22-2015 |
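
The order-statistics entries above (20130218908, 20130218909) describe binning each distributed source in a single pass, sorting the bins, and accumulating counts to obtain an approximate order statistic with lower and upper error bounds. The following is a minimal Python sketch of that general idea, not the method claimed in the applications. The fixed-width binning scheme, the function names (`summarize`, `merge`, `approx_order_statistic`), the per-bin summary of (count, min, max), and the midpoint estimate are all assumptions made for illustration.

```python
import math


def summarize(values, width):
    """Single pass over one data source.

    Returns {bin_key: (count, min, max)}, where bin_key = floor(value / width).
    The fixed-width binning is an assumption of this sketch.
    """
    bins = {}
    for v in values:
        key = math.floor(v / width)
        if key in bins:
            count, lo, hi = bins[key]
            bins[key] = (count + 1, min(lo, v), max(hi, v))
        else:
            bins[key] = (1, v, v)
    return bins


def merge(summaries):
    """Combine per-source bin summaries that share the same bin width."""
    merged = {}
    for summary in summaries:
        for key, (count, lo, hi) in summary.items():
            if key in merged:
                m_count, m_lo, m_hi = merged[key]
                merged[key] = (m_count + count, min(m_lo, lo), max(m_hi, hi))
            else:
                merged[key] = (count, lo, hi)
    return merged


def approx_order_statistic(merged, rank):
    """Return (estimate, lower, upper) for the rank-th smallest value (1-based).

    Bins are visited in sorted order and counts are accumulated until the
    target rank is reached. The true value lies in that bin, so the bin's
    observed min and max delimit an interval that contains it.
    """
    total = sum(count for count, _, _ in merged.values())
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and the total count")
    cumulative = 0
    for key in sorted(merged):
        count, lo, hi = merged[key]
        cumulative += count
        if cumulative >= rank:
            # Midpoint of the bin's observed range is a hypothetical estimate.
            return ((lo + hi) / 2, lo, hi)
```

In this sketch, narrower bins tighten the error interval at the cost of more bins to store and merge. Only the bin summaries, not the raw values, need to leave each source.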