| Patent application number | Description | Published |
| --- | --- | --- |
| 20080205774 | Document clustering using a locality sensitive hashing function - Documents from a data stream are clustered by first generating a feature vector for each document. A set of cluster centroids (e.g., feature vectors of their corresponding clusters) are retrieved from a memory based on the feature vector of the document using a locality sensitive hashing function. The centroids may be retrieved by retrieving a set of cluster identifiers from a cluster table, the cluster identifiers each indicative of a respective cluster centroid, and retrieving the cluster centroids corresponding to the retrieved cluster identifiers from a memory. Documents may then be clustered into one or more of the candidate clusters using distance measures from the feature vector of the document to the cluster centroids. | 08-28-2008 |
| 20080205775 | Online document clustering - Documents from a data stream are clustered by first generating a feature vector for each document. A set of cluster centroids (e.g., feature vectors of their corresponding clusters) are retrieved from a memory based on the feature vector of the document and a relative age of each of the cluster centroids. The centroids may be retrieved by retrieving a set of cluster identifiers from a cluster table, the cluster identifiers each indicative of a respective cluster centroid, and retrieving the cluster centroids corresponding to the retrieved cluster identifiers from a memory. A list of cluster identifiers in the cluster table may be maintained based on the relative age of cluster centroids corresponding to the cluster identifiers. Cluster identifiers that correspond to cluster centroids with a relative age exceeding a predetermined threshold are periodically removed from the list of cluster identifiers. | 08-28-2008 |
| 20080208847 | Relevance ranking for document retrieval - Documents and/or document clusters are ranked with respect to their geographical locations and/or user specific (e.g., user input) relevance. Highly relevant documents and/or document clusters are assigned higher ranks than less relevant documents and/or clusters. In this way, ranked lists of documents and/or clusters, top clusters (e.g., top stories), top documents (e.g., most important articles), etc. may be served (e.g., presented, delivered, etc.) to users. | 08-28-2008 |
| 20080255773 | Machine condition monitoring using pattern rules - Pattern rules are created by comparing a condition signal pattern to a plurality of known signal patterns and determining a machine condition pattern rule based at least in part on the comparison of the condition signal pattern to one of the plurality of known signal patterns. A matching score based on the comparison of the condition signal pattern to one of the plurality of known signal patterns as well as a signal pattern duration is determined. The machine condition pattern rule is then defined for nonparametric condition signal patterns as a multipartite threshold rule with a first threshold based on the determined matching score and a second threshold based on the determined signal duration. For parametric signal patterns, one or more parameters of the signal pattern are determined and the machine condition pattern rule is further defined with a third threshold based on the determined one or more parameters. | 10-16-2008 |
| 20080288213 | Machine condition monitoring using discontinuity detection - Condition signals of machines are observed and one or more discontinuities are detected in the condition signals. The discontinuities in the condition signals are compensated for (e.g., by applying a shifting factor to models of the signals) and trends of the compensated condition signals are determined. The trends are used to predict future fault conditions in machines. Kalman filters comprising observation models and evolution models are used to determine the trends. Discontinuity in observed signals is detected using hypothesis testing. | 11-20-2008 |
| 20090037155 | Machine condition monitoring using a flexible monitoring framework - A flexible framework and a corresponding user interface allow a user to configure a machine condition monitoring system. A user-configurable computation framework offers flexibility in designing the machine condition monitoring system. In this framework, every computation based on machine attributes is represented as an input-output system. A simple computation can be easily defined by specifying the computation type, number of inputs, structure, and parameters. The user can use the determined output attributes of computations as input attributes in other computations. Ultimately, the computations are aggregated by the framework configured by the user to produce an output computation attribute that indicates a machine condition or predicts a machine condition. | 02-05-2009 |
| 20090043536 | Use of Sequential Clustering for Instance Selection in Machine Condition Monitoring - A method is provided for selecting a representative set of training data for training a statistical model in a machine condition monitoring system. The method reduces the time required to choose representative samples from a large data set by using a nearest-neighbor sequential clustering technique in combination with a kd-tree. A distance threshold is used to limit the geometric size of the clusters. Each node of the kd-tree is assigned a representative sample from the training data, and similar samples are subsequently discarded. | 02-12-2009 |
| 20090091443 | Segment-Based Change Detection Method in Multivariate Data Stream - A method and framework are described for detecting changes in a multivariate data stream. A training set is formed by sampling time windows in a data stream containing data reflecting normal conditions. A histogram is created to summarize each window of data, and data within the histograms are clustered to form test distribution representatives to minimize the bulk of training data. Test data is then summarized using histograms representing time windows of data and data within the test histograms are clustered. The test histograms are compared to the training histograms using nearest neighbor techniques on the clustered data. Distances from the test histograms to the test distribution representatives are compared to a threshold to identify anomalies. | 04-09-2009 |
| 20090119243 | Multivariate Analysis of Wireless Sensor Network Data for Machine Condition Monitoring - Machine condition monitoring on a system utilizes a wireless sensor network to gather data from a large number of sensors. The data is processed using a multivariate statistical model to determine whether the system has deviated from a normal condition. The wireless sensor network permits the acquisition of a large number of distributed data points from plural system modalities, which, in turn, yields enhanced prediction accuracy and a reduction in false alarms. | 05-07-2009 |
| 20110035187 | Scalable and Extensible Framework for Storing and Analyzing Sensor Data - In a framework for acquiring and analyzing data from a network of sensors, plug-in software interfaces are used to provide scalability and flexibility. Data collection set-up data is exchanged through one or more first plug-in software interfaces with data collection devices, to configure the processor to collect measurement data from the data collection devices. Analysis set-up data is exchanged through one or more second plug-in software interfaces with one or more data analysis software packages, to configure the processor to provide a predefined subset of the measurement data to the data analysis software packages and to accept analysis results from the data analysis software packages. Measurement data and analysis results are subsequently exchanged through the plug-in interfaces. | 02-10-2011 |
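
The retrieval flow described for application 20080205774 (hash a document's feature vector, look up candidate cluster identifiers in a cluster table, fetch their centroids, then assign by distance) can be illustrated with a minimal sketch. This is not the patented implementation: the class name `LshClusterer`, its parameters, the choice of random-hyperplane hashing as the locality sensitive hash, and the use of cosine distance are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random
from collections import defaultdict


class LshClusterer:
    """Toy online clusterer (hypothetical): an LSH key selects candidate
    clusters, and a distance measure picks one of them."""

    def __init__(self, dim, num_bits=4, max_distance=0.5, seed=0):
        rng = random.Random(seed)
        # Assumed LSH family: sign of projections onto random hyperplanes.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.max_distance = max_distance
        self.cluster_table = defaultdict(list)  # hash key -> cluster identifiers
        self.centroids = {}                     # cluster identifier -> centroid
        self.sizes = {}                         # cluster identifier -> member count

    def hash_key(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0)
                     for plane in self.planes)

    @staticmethod
    def cosine_distance(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norm if norm else 1.0

    def add(self, vec):
        """Assign vec to the nearest candidate cluster, or start a new one."""
        candidates = self.cluster_table[self.hash_key(vec)]
        best_id, best_dist = None, self.max_distance
        for cid in candidates:
            dist = self.cosine_distance(vec, self.centroids[cid])
            if dist <= best_dist:
                best_id, best_dist = cid, dist
        if best_id is None:
            # No candidate is close enough: create a cluster in this bucket.
            best_id = len(self.centroids)
            self.centroids[best_id] = list(vec)
            self.sizes[best_id] = 1
            candidates.append(best_id)
            return best_id
        # Update the centroid as a running mean of its members.
        n = self.sizes[best_id]
        self.centroids[best_id] = [(c * n + v) / (n + 1)
                                   for c, v in zip(self.centroids[best_id], vec)]
        self.sizes[best_id] = n + 1
        return best_id


clusterer = LshClusterer(dim=3)
first = clusterer.add([1.0, 0.0, 0.0])
second = clusterer.add([2.0, 0.0, 0.0])   # same direction, same bucket
third = clusterer.add([-1.0, 0.0, 0.0])   # opposite direction, new cluster
```

One simplification worth noting: a centroid stays in the bucket where its cluster was created even if later updates would change its hash key. A fuller design would re-hash updated centroids or query several hash tables.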