PureDiscovery Corporation Patent Applications |
Patent application number | Title | Published |
20130159313 | Multi-Concept Latent Semantic Analysis Queries - A method includes accessing text, identifying a plurality of terms from the text, determining a plurality of term vectors associated with the identified plurality of terms, and clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors. The method further includes creating a first pseudo-document according to the first cluster, creating a second pseudo-document according to the second cluster, identifying a first set of terms associated with the first cluster using latent semantic analysis (LSA) of the first pseudo-document, identifying a second set of terms associated with the second cluster using LSA of the second pseudo-document, and combining the first and second sets of terms into a list of output terms. | 06-20-2013 |
20130158979 | System and Method for Identifying Phrases in Text - A method includes accessing text that includes a plurality of words, tagging each of the plurality of words with one of a plurality of part-of-speech (POS) tags, and creating a plurality of tokens, each token comprising one of the plurality of words and its associated POS tag. The method further includes clustering one or more of the created tokens into a chunk of tokens, the one or more tokens clustered into the chunk of tokens based on the POS tags of the one or more tokens, and forming a phrase based on the chunk of tokens, the phrase comprising the words of the one or more tokens clustered into the chunk of tokens. | 06-20-2013 |
20100114890 | System and Method for Discovering Latent Relationships in Data - A computerized method of querying an array of vectors includes receiving a first matrix, partitioning the first matrix into a plurality of subset matrices, and processing each subset matrix with a natural language analysis process to create a plurality of processed subset matrices. The first matrix includes a first plurality of terms and represents one or more data objects to be queried, each subset matrix includes similar vectors from the first matrix, and each processed subset matrix relates terms in each subset matrix to each other. | 05-06-2010 |
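
The tag, chunk, and phrase steps summarized in application 20130158979 above can be illustrated with a minimal sketch. This is not the applicant's implementation: the toy lexicon, the choice of adjective and noun tags as chunkable, and every function name below are assumptions made purely for illustration. A real system would use a trained POS tagger and its own chunking rules.

```python
# Hypothetical toy lexicon standing in for a real POS tagger (assumption).
TOY_LEXICON = {
    "the": "DT", "a": "DT",
    "latent": "JJ", "semantic": "JJ", "large": "JJ",
    "analysis": "NN", "method": "NN", "text": "NN", "corpus": "NN",
    "finds": "VBZ",
    "in": "IN",
}

# Assumed rule: consecutive adjective/noun tokens are grouped into one chunk.
PHRASE_TAGS = {"JJ", "NN"}


def tag_words(words):
    """Create tokens: each token pairs a word with its POS tag."""
    # Unknown words default to "NN" in this toy tagger.
    return [(word, TOY_LEXICON.get(word.lower(), "NN")) for word in words]


def chunk_tokens(tokens):
    """Group consecutive tokens whose POS tags are in PHRASE_TAGS."""
    chunks, current = [], []
    for token in tokens:
        if token[1] in PHRASE_TAGS:
            current.append(token)
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def form_phrases(chunks):
    """Form one phrase per chunk from the words of its tokens."""
    return [" ".join(word for word, _tag in chunk) for chunk in chunks]


def identify_phrases(text):
    """Run the full pipeline: split into words, tag, chunk, form phrases."""
    return form_phrases(chunk_tokens(tag_words(text.split())))


if __name__ == "__main__":
    sentence = "the method finds latent semantic analysis in a large text corpus"
    print(identify_phrases(sentence))
```

Grouping on tag membership alone is the simplest possible chunking rule. It treats any run of adjectives and nouns as one phrase, so a pattern-based rule (for example, optional adjectives followed by at least one noun) would be a natural refinement.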