Aliferis

Constantin Aliferis, New York, NY US

Patent application number	Description	Published
20090077068	CONTENT AND QUALITY ASSESSMENT METHOD AND APPARATUS FOR QUALITY SEARCHING - A computer-based process retrieves information organized in documents containing text and/or coded representations of text. The process involves obtaining and labeling a selected set of documents based on content quality, and extracting and representing features from each document in the selected set. The extracted and selected features are modified, and models are constructed using parametric learning algorithms. The constructed models are capable of assigning a label to each document. The model parameters are instantiated using a first subset of the selected set of documents. Parameters are chosen by validating the corresponding model against at least a second subset of the full document set. The constructed models also are capable of assigning labels to similar documents outside a selected subset not previously given to the process of model construction.	03-19-2009
20090157585	Method for predicting citation counts - A computerized process to predict citation counts of articles comprising the steps of receiving an article through an input, obtaining, through the input, a selected set of articles exclusive of the article, storing in a memory the set of articles and the article, extracting through a computer processor an article feature from each article in the stored set of articles, constructing models through the computer processor using a pattern recognition process and the article feature, selecting, through the processor, a best model, predicting by application of the best model to the article by the processor a future citation count of the article, outputting the article comprising the future citation count, and controlling, through a publication controller unit, distribution of the article.	06-18-2009

Constantin F. Aliferis, Nashville, TN US

Patent application number	Description	Published
20110246403	Method and System for Automated Supervised Data Analysis - The invention relates to a method for automatically analyzing data and constructing data classification models based on the data. In an embodiment of the method, the method includes selecting a best combination of methods from a plurality of classification, predictor selection, and data preparatory methods; and determining a best model that corresponds to one or more best parameters of the classification, predictor selection, and data preparatory methods for the data to be analyzed. The method also includes estimating the performance of the best model using new data that was not used in selecting the best combination of methods or in determining the best model; and returning a small set of predictors sufficient for the classification task.	10-06-2011
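
The abstract of application 20110246403 separates choosing a best model from estimating its performance on data that played no part in that choice. The following is only a minimal sketch of that separation, not the patented method: the one-parameter threshold classifier, the interleaved split, the function names, and the toy data are all assumptions made for illustration.

```python
def accuracy(model, rows):
    """Fraction of (x, label) rows that the model classifies correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def make_threshold_model(threshold):
    """Hypothetical one-parameter classifier: predict 1 when x >= threshold."""
    return lambda x: 1 if x >= threshold else 0


def select_and_estimate(rows, candidate_thresholds):
    """Choose the best candidate on a selection split, then score it on a
    held-out split that was never consulted during selection."""
    selection_rows = rows[::2]   # even positions: used only to pick a model
    held_out_rows = rows[1::2]   # odd positions: used only for the estimate
    best = max(
        candidate_thresholds,
        key=lambda t: accuracy(make_threshold_model(t), selection_rows),
    )
    return best, accuracy(make_threshold_model(best), held_out_rows)


# Toy data invented for this sketch: the label is 1 exactly when x >= 5.
data = [(x, 1 if x >= 5 else 0) for x in range(10)] * 4
best_threshold, held_out_accuracy = select_and_estimate(data, [2, 5, 8])
print(best_threshold, held_out_accuracy)
```

Because the held-out rows never influence which candidate is chosen, the accuracy measured on them is not inflated by the selection step. This is the property the abstract attributes to estimating performance on new data.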

Konstantinos (constantin) F. Aliferis, New York, NY US

Patent application number	Description	Published
20100217599	Computer Implemented Method for Determining All Markov Boundaries and its Application for Discovering Multiple Maximally Accurate and Non-Redundant Predictive Models - Methods for discovery of a Markov boundary from data constitute one of the most important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Even though there is always a single Markov boundary of the response variable in faithful distributions, distributions with violations of the intersection property of probability theory may have multiple Markov boundaries. Such distributions are abundant in practical data-analytic applications, and there are several reasons why it is important to discover all Markov boundaries from such data. The present invention is a novel computer implemented generative method (termed TIE*) that can discover all Markov boundaries from a data sample drawn from a distribution. TIE* can be instantiated to discover all and only Markov boundaries independent of data distribution. TIE* has been tested with simulated and re-simulated data and then applied to (a) identify the set of maximally accurate and non-redundant molecular signatures and to (b) discover Markov boundaries in datasets from several application domains including but not limited to: biology, medicine, economics, ecology, digit recognition, text categorization, and computational biology.	08-26-2010
20100217731	Computer Implemented Method for the Automatic Classification of Instrumental Citations - The learning method taught in this patent document is significantly different from previous methods for automatic classification of citations that are labor intensive and subject to human bias and error. The present invention automatically generates and avoids these limitations. A set of operational definitions and features uniquely suited to the scientific literature is disclosed along with their use with a learning method that is capable of analyzing the textual content of articles along with bibliometric data to accurately classify instrumental citations.	08-26-2010
20110202322	Computer Implemented Method for Discovery of Markov Boundaries from Datasets with Hidden Variables - Methods for Markov boundary discovery are important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Currently there exist two major local method families for identification of Markov boundaries from data: methods that directly implement the definition of the Markov boundary and newer compositional Markov boundary methods that are more sample efficient and thus often more accurate in practical applications. However, in datasets with hidden (i.e., unmeasured or unobserved) variables, compositional Markov boundary methods may miss some Markov boundary members. The present invention circumvents this limitation of the compositional Markov boundary methods and proposes a new method that can discover Markov boundaries from datasets with hidden variables and do so in a much more sample efficient manner than methods that directly implement the definition of the Markov boundary. In general, the inventive method transforms a dataset with many variables into a minimal reduced dataset where all variables are needed for optimal prediction of some response variable. The power of the invention was empirically demonstrated with data generated by Bayesian networks and with 13 real datasets from a diversity of application domains.	08-18-2011
20110307437	Local Causal and Markov Blanket Induction Method for Causal Discovery and Feature Selection from Data - In many areas, recent developments have generated very large datasets from which it is desired to extract meaningful relationships between the dataset elements. However, to date, the finding of such relationships using prior art methods has proved extremely difficult especially in the biomedical arts. Methods for local causal learning and Markov blanket discovery are important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. The present invention provides a generative method for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large datasets and relatively small samples. The method is readily applicable to real-world data, and the selected feature sets can be used for causal discovery and classification. The generative method GLL-PC can be instantiated in many ways, giving rise to novel method variants. In general, the inventive method transforms a dataset with many variables into either a minimal reduced dataset where all variables are needed for optimal prediction of the response variable or a dataset where all variables are direct causes and direct effects of the response variable. The power of the invention and significant advantages over the prior art were empirically demonstrated with datasets from a diversity of application domains (biology, medicine, economics, ecology, digit recognition, text categorization, and computational biology) and data generated by Bayesian networks.	12-15-2011

Konstantinos (constantin) F. Aliferis, Astoria, NY US

Patent application number	Description	Published
20140278339	Computer System and Method That Determines Sample Size and Power Required For Complex Predictive and Causal Data Analysis - Established methods for statistical “power-size” analysis for statistical modeling are geared toward statistical hypothesis testing, and have serious shortcomings in modern complex predictive and causal modeling applications where the determination of sample size is affected by parameters not addressed by the standard statistical power-size analysis. The present invention provides a method and computer-implemented system for determining sufficient sample size for training predictive or causal models for a given application field or distribution type and desired performance level taking into account the critical factors that affect the needed sample size. The invention can be applied to practically any field where predictive modeling or causal modeling are desired.	09-18-2014
20140279760	Data Analysis Computer System and Method For Conversion Of Predictive Models To Equivalent Ones - The present invention addresses two ubiquitous and pressing problems of modern data analytics technology. Many modern pattern recognition technologies produce models with excellent predictivity but (a) they are “black boxes”, that is, they are opaque to the user; (b) they are too large and/or expensive to execute on less powerful computing platforms. The invention “opens up” a black box model by converting it to a compact and understandable model that is functionally equivalent. The invention also converts a predictive model into a functionally equivalent model in a form that can be implemented and deployed more easily or efficiently in practice. The benefits include: model understandability and defensibility of modeling. A particularly interesting application is that of understanding the decision making of humans, comparison of the behavior of a human or computerized decision process against another and use to enhance education and guideline compliance/adherence detection and improvement. The invention can be applied to practically any field where predictive modeling (classification and regression) is desired because it relies on extremely broad distributional assumptions that are valid in numerous fields.	09-18-2014
20140279761	Document Coding Computer System and Method With Integrated Quality Assurance - The present invention consists of a computer-implemented system and method for automatically analyzing and coding documents into content categories suitable for high cost, high yield settings where quality and efficiency of classification are essential. A prototypical example application field is legal document predictive coding for purposes of e-discovery and litigation (or litigation readiness) where the automated classification of documents as “responsive” or not must be (a) efficient, (b) accurate, and (c) defensible in court. Many text classification technologies exist but they focus on the relatively simple steps of using a training method on training data, producing a model and testing it on test data. They invariably do not address effectively and simultaneously key quality assurance requirements. The invention applies several data design and validation steps that ensure quality and removal of all possible sources of document classification error or deficiencies. The invention employs multiple classification methods, preprocessing methods, visualization and organization of results, and explanation of models which further enhance predictive quality, but also ease of use of models and user acceptance. The invention can be applied to practically any field where text classification is desired.	09-18-2014
20140279794	Data Analysis Computer System and Method for Organizing, Presenting, and Optimizing Predictive Modeling - Predictive modeling is an important class of data analytics with applications in numerous fields. Once a predictive model is built, validated, and applied on a set of objects, by a data analytics system (or even by manual modeling), consumers of the model information need assistance to navigate through the results. This is because both regression and classification models that output continuous values (e.g., probability of belonging to a class) are often used to rank objects and then a thresholding of the ranked scores needs to be used to separate objects into a “positive” and a “negative” class. The choice of threshold greatly affects the true positive, false positive, true negative, and false negative results of the model's application. An ideal data analytics system should allow the user to understand the tradeoffs of different threshold values. The user interface should convey this information in an intuitive manner and provide the ability to vary the threshold interactively while simultaneously presenting the effects of thresholding on predictivity. This is precisely the function of the present invention. In addition to manual thresholding, the invention also allows for the thresholding to be performed by fully automated means (via standard statistical optimization methods) once a user has identified the desired balance of false positives and false negatives (or other predictivity metrics of interest). The invention can be applied to any application field of predictive modeling.	09-18-2014
20140280257	Data Analysis Computer System and Method For Parallelized and Modularized Analysis of Big Data - The focus of the present invention is the modular analysis of Big Data encompassing parallelization, chunking, and distributed analysis applications. Typical application scenarios include: (i) data that may not reside in one database but instead exist in multiple non-identical databases, where analysis has to take place in situ rather than combining all databases in one big database; (ii) data that exceed the working memory of the largest available computer and have to be broken into smaller pieces that need to be analyzed separately and the results combined; (iii) data encompassing several distinct data types that have to be analyzed separately by methods specific to each data type, and the results combined; (iv) data encompassing several distinct data types that have to be analyzed separately by analysts with knowledge/skills specific to each data type, and the results combined; and (v) data analysis that has to take place over time as new data is coming in and results are incrementally improved until analysis objectives are met, or no more data is available. The present Big Data Parallelization/Modularization data analysis system and method (“BDP/M”) is implemented in general purpose digital computers and is capable of dealing with the above scenarios of Big Data analysis as well as any scenario where parallel, distributed, federated, chunked and serialized Big Data analysis is desired without compromising efficiency and correctness.	09-18-2014
20140280361	Data Analysis Computer System and Method Employing Local to Global Causal Discovery - Discovery of causal networks is essential for understanding and manipulating complex systems in numerous data analysis application domains. Several methods have been proposed in the last two decades for solving this problem. The inventive method uses local causal discovery methods for global causal network learning in a divide-and-conquer fashion. The usefulness of the invention is demonstrated in data capturing characteristics of several domains. The inventive method outputs more accurate networks compared to other discovery approaches.	09-18-2014
20140289174	Data Analysis Computer System and Method For Causal Discovery with Experimentation Optimization - Discovery of causal models via experimentation is essential in numerous application fields. One of the primary objectives of the invention is to minimize the use of costly experimental resources while achieving high discovery accuracy. The invention provides new methods and processes to enable accurate discovery of local causal pathways by integrating high-throughput observational data with efficient experimentation strategies. At the core of these methods are computational causal discovery techniques that account for multiplicity (i.e., indistinguishability) of causal pathways consistent with observational data. The invention, when applied for discovery of local causal pathways from a combination of observational and experimental data, achieves higher discovery accuracy than existing observational approaches and uses fewer experimental resources than existing experimental approaches. Repeated application of the invention for each variable in the modeled system produces the full causal model.	09-25-2014
20140324752	Data Analysis Computer System and Method For Fast Discovery Of Multiple Markov Boundaries - Methods for discovery of a Markov boundary from data constitute one of the most important recent developments in pattern recognition and applied data analysis and modeling, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Even though there is always a single Markov boundary of the response variable in faithful distributions, distributions with violations of the intersection property of probability theory may have multiple Markov boundaries. Such distributions are abundant in practical data-analytic applications, and there are several reasons why it is important to discover and extract all Markov boundaries from such data as a critical step of data analysis. The present invention is a novel fast generative method (termed Generalized-iTIE*) that can discover all Markov boundaries from a sample drawn from a distribution. The new method has been tested with simulated data and then applied to discover Markov boundaries in datasets from several application domains including but not limited to: biology, medicine, economics, ecology, image recognition, text processing, and computational biology.	10-30-2014
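
Application 20140279794 in the table above notes that the choice of threshold determines the true positive, false positive, true negative, and false negative counts of a scoring model. The sketch below illustrates that tradeoff on a small invented example. The function name, the scores, and the labels are assumptions for illustration and are not taken from the application.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP/FP/TN/FN when every score >= threshold is called positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            counts["tp"] += 1
        elif predicted_positive and label == 0:
            counts["fp"] += 1
        elif not predicted_positive and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Invented scores and true labels, used only to show the effect of the threshold.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

# One confusion summary per candidate threshold.
table = {t: confusion_at_threshold(scores, labels, t) for t in (0.25, 0.5, 0.75)}
for threshold, counts in table.items():
    print(threshold, counts)
```

Raising the threshold from 0.25 to 0.75 in this toy data removes both false positives but introduces one false negative, which is the kind of balance the abstract says a user should be able to inspect.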

Peter Aliferis, Brampton CA

Patent application number	Description	Published
20090322473	Remote controlled dead bolt door locking system - The remote controlled deadbolt door locking system (the “unit”) is designed to be an add-on safety mechanism for existing entry doors. The unit is designed to be mounted onto the bottom corner of a door, where, in the engaged position, it prevents said door from being opened even if the main lock has been tampered with. The unit makes use of a small DC electric motor to move a steel shaft in the vertical direction into a steel bushing that is mounted into a drilled hole in the floor directly in front of the door and under the unit. The unit is mounted onto the door with four carriage bolts through the door, and a mounting plate from the outside of the door, through four matching holes in the unit itself. The unit is then simply tightened on with normal hex nuts. The unit is equipped with several safety circuits, which warn the user if any of the following occur: low battery or battery failure of either the main or backup batteries; the shaft does not fully engage upon closing; and if both batteries fall to a low condition. There is a built-in triple redundancy to eliminate the possibility of the homeowner locking him/herself out. The operation of the unit is accomplished through a two-button remote control, or any commercially available remote entry system including but not limited to: fingerprint or voice recognition, or keypad entry. In this way, the unit acts just like a simple dead bolt, but one that can be locked while the homeowner is standing outside of the house.	12-31-2009