Patent application number | Description | Published |
20090077068 | CONTENT AND QUALITY ASSESSMENT METHOD AND APPARATUS FOR QUALITY SEARCHING - A computer-based process retrieves information organized in documents containing text and/or coded representations of text. The process involves obtaining and labeling a selected set of documents based on content quality, and extracting and representing features from each document in the selected set. The extracted and selected features are modified, and models are constructed using parametric learning algorithms. The constructed models are capable of assigning a label to each document. The model parameters are instantiated using a first subset of the selected set of documents. Parameters are chosen by validating the corresponding model against at least a second subset of the full document set. The constructed models also are capable of assigning labels to similar documents outside a selected subset not previously given to the process of model construction. | 03-19-2009 |
20090157585 | Method for predicting citation counts - A computerized process to predict citation counts of articles, comprising the steps of receiving an article through an input; obtaining, through the input, a selected set of articles exclusive of the article; storing in a memory the set of articles and the article; extracting, through a computer processor, an article feature from each article in the stored set of articles; constructing models through the computer processor using a pattern recognition process and the article feature; selecting, through the processor, a best model; predicting, by application of the best model to the article by the processor, a future citation count of the article; outputting the article comprising the future citation count; and controlling, through a publication controller unit, distribution of the article. | 06-18-2009 |
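The train-then-validate scheme described in application 20090077068 (parameters instantiated on a first subset, a model chosen by validating against a second subset) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in and not taken from the application: documents are reduced to a single numeric feature, the "model" is a simple cut-off classifier, and `offset` plays the role of the parameter chosen by validation.

```python
from typing import Callable, List, Sequence, Tuple

# (feature value, label 0/1): a hypothetical stand-in for extracted document features
Example = Tuple[float, int]
Model = Callable[[float], int]


def fit_threshold(train: Sequence[Example], offset: float) -> Model:
    """Instantiate model parameters on the training subset.

    The cut-off is the midpoint between the two class means,
    shifted by the hyperparameter `offset` (an illustrative choice).
    """
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 + offset
    return lambda x: 1 if x >= cut else 0


def accuracy(model: Model, data: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in data) / len(data)


def select_model(
    train: Sequence[Example],
    validation: Sequence[Example],
    offsets: Sequence[float],
) -> Tuple[float, Model]:
    """Fit one model per candidate offset, keep the best on the validation subset."""
    fitted: List[Tuple[float, Model]] = [
        (offset, fit_threshold(train, offset)) for offset in offsets
    ]
    return max(fitted, key=lambda pair: accuracy(pair[1], validation))


# Toy data, invented for illustration only.
train = [(1.0, 0), (2.0, 0), (4.0, 1), (5.0, 1)]
validation = [(2.5, 0), (3.2, 0), (3.8, 1), (4.6, 1)]

best_offset, best_model = select_model(train, validation, offsets=[-1.0, 0.0, 0.5])
print(best_offset, accuracy(best_model, validation))
```

The selected model can then label documents that were in neither subset, which corresponds to the abstract's final sentence.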
Patent application number | Description | Published |
20100217599 | Computer Implemented Method for Determining All Markov Boundaries and its Application for Discovering Multiple Maximally Accurate and Non-Redundant Predictive Models - Methods for discovery of a Markov boundary from data constitute one of the most important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Even though there is always a single Markov boundary of the response variable in faithful distributions, distributions with violations of the intersection property of probability theory may have multiple Markov boundaries. Such distributions are abundant in practical data-analytic applications, and there are several reasons why it is important to discover all Markov boundaries from such data. The present invention is a novel computer implemented generative method (termed TIE*) that can discover all Markov boundaries from a data sample drawn from a distribution. TIE* can be instantiated to discover all and only Markov boundaries independent of data distribution. TIE* has been tested with simulated and re-simulated data and then applied to (a) identify the set of maximally accurate and non-redundant molecular signatures and to (b) discover Markov boundaries in datasets from several application domains including but not limited to: biology, medicine, economics, ecology, digit recognition, text categorization, and computational biology. | 08-26-2010 |
20100217731 | Computer Implemented Method for the Automatic Classification of Instrumental Citations - The learning method taught in this patent document is significantly different from previous methods for automatic classification of citations that are labor intensive and subject to human bias and error. The present invention automatically generates and avoids these limitations. A set of operational definitions and features uniquely suited to the scientific literature is disclosed along with their use with a learning method that is capable of analyzing the textual content of articles along with bibliometric data to accurately classify instrumental citations. | 08-26-2010 |
20110202322 | Computer Implemented Method for Discovery of Markov Boundaries from Datasets with Hidden Variables - Methods for Markov boundary discovery are important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Currently there exist two major local method families for identification of Markov boundaries from data: methods that directly implement the definition of the Markov boundary, and newer compositional Markov boundary methods that are more sample efficient and thus often more accurate in practical applications. However, in datasets with hidden (i.e., unmeasured or unobserved) variables, compositional Markov boundary methods may miss some Markov boundary members. The present invention circumvents this limitation of the compositional Markov boundary methods and proposes a new method that can discover Markov boundaries from datasets with hidden variables and do so in a much more sample efficient manner than methods that directly implement the definition of the Markov boundary. In general, the inventive method transforms a dataset with many variables into a minimal reduced dataset where all variables are needed for optimal prediction of some response variable. The power of the invention was empirically demonstrated with data generated by Bayesian networks and with 13 real datasets from a diversity of application domains. | 08-18-2011 |
20110307437 | Local Causal and Markov Blanket Induction Method for Causal Discovery and Feature Selection from Data - In many areas, recent developments have generated very large datasets from which it is desired to extract meaningful relationships between the dataset elements. However, to date, the finding of such relationships using prior art methods has proved extremely difficult especially in the biomedical arts. Methods for local causal learning and Markov blanket discovery are important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. The present invention provides a generative method for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large datasets and relatively small samples. The method is readily applicable to real-world data, and the selected feature sets can be used for causal discovery and classification. The generative method GLL-PC can be instantiated in many ways, giving rise to novel method variants. In general, the inventive method transforms a dataset with many variables into either a minimal reduced dataset where all variables are needed for optimal prediction of the response variable or a dataset where all variables are direct causes and direct effects of the response variable. The power of the invention and significant advantages over the prior art were empirically demonstrated with datasets from a diversity of application domains (biology, medicine, economics, ecology, digit recognition, text categorization, and computational biology) and data generated by Bayesian networks. | 12-15-2011 |
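As background for the Markov blanket and "direct causes/effects" terminology used in the abstracts above: when the causal graph is known and the distribution is faithful to it, the Markov blanket of a target consists of its parents, its children, and the other parents of its children. The Python sketch below only reads these sets off a given graph. It is not GLL-PC, TIE*, or any other method from the applications, which induce such sets from data. The graph and variable names are invented for illustration.

```python
from typing import Dict, Set

# Hypothetical DAG, stored as node -> set of children.
# Edges: F -> A, A -> T, B -> T, T -> C, D -> C, C -> E
dag: Dict[str, Set[str]] = {
    "F": {"A"},
    "A": {"T"},
    "B": {"T"},
    "T": {"C"},
    "D": {"C"},
    "C": {"E"},
}


def direct_causes_and_effects(children_of: Dict[str, Set[str]], target: str) -> Set[str]:
    """Parents (direct causes) and children (direct effects) of the target."""
    children = set(children_of.get(target, set()))
    parents = {node for node, kids in children_of.items() if target in kids}
    return parents | children


def markov_blanket(children_of: Dict[str, Set[str]], target: str) -> Set[str]:
    """Parents, children, and spouses (other parents of the target's children)."""
    children = set(children_of.get(target, set()))
    spouses = {
        node
        for node, kids in children_of.items()
        if node != target and children & kids
    }
    return direct_causes_and_effects(children_of, target) | spouses


print(sorted(direct_causes_and_effects(dag, "T")))
print(sorted(markov_blanket(dag, "T")))
```

In this toy graph, D enters the blanket of T only as a spouse (it shares the child C), while F and E fall outside it because they are separated from T by A and C respectively.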
Patent application number | Description | Published |
20140278339 | Computer System and Method That Determines Sample Size and Power Required For Complex Predictive and Causal Data Analysis - Established methods for statistical “power-size” analysis for statistical modeling are geared toward statistical hypothesis testing, and have serious shortcomings in modern complex predictive and causal modeling applications where the determination of sample size is affected by parameters not addressed by the standard statistical power-size analysis. The present invention provides a method and computer-implemented system for determining sufficient sample size for training predictive or causal models for a given application field or distribution type and desired performance level taking into account the critical factors that affect the needed sample size. The invention can be applied to practically any field where predictive modeling or causal modeling are desired. | 09-18-2014 |
20140279760 | Data Analysis Computer System and Method For Conversion Of Predictive Models To Equivalent Ones - The present invention addresses two ubiquitous and pressing problems of modern data analytics technology. Many modern pattern recognition technologies produce models with excellent predictivity, but (a) they are “black boxes”, that is, they are opaque to the user; and (b) they are too large and/or expensive to execute on less powerful computing platforms. The invention “opens up” a black box model by converting it to a compact and understandable model that is functionally equivalent. The invention also converts a predictive model into a functionally equivalent model in a form that can be implemented and deployed more easily or efficiently in practice. The benefits include model understandability and defensibility of modeling. A particularly interesting application is understanding the decision making of humans, comparing the behavior of a human or computerized decision process against another, and using the results to enhance education and guideline compliance/adherence detection and improvement. The invention can be applied to practically any field where predictive modeling (classification and regression) is desired because it relies on extremely broad distributional assumptions that are valid in numerous fields. | 09-18-2014 |
20140279761 | Document Coding Computer System and Method With Integrated Quality Assurance - The present invention consists of a computer-implemented system and method for automatically analyzing and coding documents into content categories suitable for high cost, high yield settings where quality and efficiency of classification are essential. A prototypical example application field is legal document predictive coding for purposes of e-discovery and litigation (or litigation readiness) where the automated classification of documents as “responsive” or not must be (a) efficient, (b) accurate, and (c) defensible in court. Many text classification technologies exist but they focus on the relatively simple steps of using a training method on training data, producing a model and testing it on test data. They invariably do not address effectively and simultaneously key quality assurance requirements. The invention applies several data design and validation steps that ensure quality and removal of all possible sources of document classification error or deficiencies. The invention employs multiple classification methods, preprocessing methods, visualization and organization of results, and explanation of models which further enhance predictive quality, but also ease of use of models and user acceptance. The invention can be applied to practically any field where text classification is desired. | 09-18-2014 |
20140279794 | Data Analysis Computer System and Method for Organizing, Presenting, and Optimizing Predictive Modeling - Predictive modeling is an important class of data analytics with applications in numerous fields. Once a predictive model is built, validated, and applied on a set of objects by a data analytics system (or even by manual modeling), consumers of the model information need assistance to navigate through the results. This is because both regression and classification models that output continuous values (e.g., probability of belonging to a class) are often used to rank objects, and then a thresholding of the ranked scores needs to be used to separate objects into a “positive” and a “negative” class. The choice of threshold greatly affects the true positive, false positive, true negative, and false negative results of the model's application. An ideal data analytics system should allow the user to understand the tradeoffs of different threshold values. The user interface should convey this information in an intuitive manner and provide the ability to vary the threshold interactively while simultaneously presenting the effects of thresholding on predictivity. This is precisely the function of the present invention. In addition to manual thresholding, the invention also allows for the thresholding to be performed by fully automated means (via standard statistical optimization methods) once a user has identified the desired balance of false positives and false negatives (or other predictivity metrics of interest). The invention can be applied to any application field of predictive modeling. | 09-18-2014 |
20140280257 | Data Analysis Computer System and Method For Parallelized and Modularized Analysis of Big Data - The focus of the present invention is the modular analysis of Big Data encompassing parallelization, chunking, and distributed analysis applications. Typical application scenarios include: (i) data may not reside in one database but instead exist in several non-identical databases, and analysis has to take place in situ rather than by combining all databases into one big database; (ii) data exceeds the working memory of the largest available computer and has to be broken into smaller pieces that need to be analyzed separately, with the results combined; (iii) data encompasses several distinct data types that have to be analyzed separately by methods specific to each data type, with the results combined; (iv) data encompasses several distinct data types that have to be analyzed separately by analysts with knowledge/skills specific to each data type, with the results combined; and (v) data analysis has to take place over time as new data comes in and results are incrementally improved until analysis objectives are met or no more data is available. The present Big Data Parallelization/Modularization data analysis system and method (“BDP/M”) is implemented in general purpose digital computers and is capable of dealing with the above scenarios of Big Data analysis as well as any scenario where parallel, distributed, federated, chunked and serialized Big Data analysis is desired without compromising efficiency and correctness. | 09-18-2014 |
20140280361 | Data Analysis Computer System and Method Employing Local to Global Causal Discovery - Discovery of causal networks is essential for understanding and manipulating complex systems in numerous data analysis application domains. Several methods have been proposed in the last two decades for solving this problem. The inventive method uses local causal discovery methods for global causal network learning in a divide-and-conquer fashion. The usefulness of the invention is demonstrated in data capturing characteristics of several domains. The inventive method outputs more accurate networks compared to other discovery approaches. | 09-18-2014 |
20140289174 | Data Analysis Computer System and Method For Causal Discovery with Experimentation Optimization - Discovery of causal models via experimentation is essential in numerous application fields. One of the primary objectives of the invention is to minimize the use of costly experimental resources while achieving high discovery accuracy. The invention provides new methods and processes to enable accurate discovery of local causal pathways by integrating high-throughput observational data with efficient experimentation strategies. At the core of these methods are computational causal discovery techniques that account for multiplicity (i.e., indistinguishability) of causal pathways consistent with observational data. The invention, when applied for discovery of local causal pathways from a combination of observational and experimental data, achieves higher discovery accuracy than existing observational approaches and uses fewer experimental resources than existing experimental approaches. Repeated application of the invention for each variable in the modeled system produces the full causal model. | 09-25-2014 |
20140324752 | Data Analysis Computer System and Method For Fast Discovery Of Multiple Markov Boundaries - Methods for discovery of a Markov boundary from data constitute one of the most important recent developments in pattern recognition and applied data analysis and modeling, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Even though there is always a single Markov boundary of the response variable in faithful distributions, distributions with violations of the intersection property of probability theory may have multiple Markov boundaries. Such distributions are abundant in practical data-analytic applications, and there are several reasons why it is important to discover and extract all Markov boundaries from such data as a critical step of data analysis. The present invention is a novel fast generative method (termed Generalized-iTIE*) that can discover all Markov boundaries from a sample drawn from a distribution. The new method has been tested with simulated data and then applied to discover Markov boundaries in datasets from several application domains including but not limited to: biology, medicine, economics, ecology, image recognition, text processing, and computational biology. | 10-30-2014 |
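The score thresholding described in application 20140279794 (listed above) can be sketched in a few lines of Python: count true/false positives and negatives at a given threshold, then pick a threshold automatically once the user has stated how false positives and false negatives should be weighed. The linear cost function, the function names, and the toy scores are assumptions made for this illustration. The application does not specify them and refers only to "standard statistical optimization methods".

```python
from typing import Dict, Sequence


def confusion_counts(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Dict[str, int]:
    """Count outcomes when objects with score >= threshold are called positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            counts["tp"] += 1
        elif predicted_positive and label == 0:
            counts["fp"] += 1
        elif label == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def best_threshold(
    scores: Sequence[float], labels: Sequence[int], fp_cost: float, fn_cost: float
) -> float:
    """Pick the observed score that minimizes an assumed linear misclassification cost."""

    def cost(threshold: float) -> float:
        counts = confusion_counts(scores, labels, threshold)
        return fp_cost * counts["fp"] + fn_cost * counts["fn"]

    return min(sorted(set(scores)), key=cost)


# Toy model outputs and true classes, invented for illustration only.
scores = [0.1, 0.4, 0.35, 0.8, 0.7]
labels = [0, 0, 1, 1, 1]

print(confusion_counts(scores, labels, threshold=0.5))
print(best_threshold(scores, labels, fp_cost=1, fn_cost=5))  # false negatives costly
print(best_threshold(scores, labels, fp_cost=5, fn_cost=1))  # false positives costly
```

Swapping the two costs moves the chosen threshold, which is the tradeoff the abstract says a user interface should make visible.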