
MSC INTELLECTUAL PROPERTIES B.V.

MSC INTELLECTUAL PROPERTIES B.V. Patent applications
Patent application number: 20110191354
Title: SYSTEM AND METHOD FOR NEAR AND EXACT DE-DUPLICATION OF DOCUMENTS
Published: 08-04-2011
Abstract: A system, method and computer program product for identifying near and exact-duplicate documents in a document collection, including, for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M, near and exact-duplicate documents are identified in the document collection.
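
The steps named in the abstract (filter, take the N most frequent words, run a quorum search with threshold M, sort by relevancy) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `top_words`, `quorum_search` and `find_duplicates` are hypothetical, the "user settings" filter is assumed to be a simple stopword list, and relevancy is assumed to be the number of query words a document contains; the abstract does not define any of these.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens (an assumed tokenization).
    return re.findall(r"[a-z0-9]+", text.lower())


def top_words(text, n, stopwords=frozenset()):
    # Filter the text (here: drop stopwords), then return the n most
    # frequent words. Ties are broken alphabetically for determinism.
    counts = Counter(w for w in tokenize(text) if w not in stopwords)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def quorum_search(query_words, docs, m):
    # Return (doc_id, matched) for every document that contains at least
    # m of the query words, sorted by assumed relevancy (match count).
    hits = []
    for doc_id, text in docs.items():
        present = set(tokenize(text))
        matched = sum(1 for w in query_words if w in present)
        if matched >= m:
            hits.append((doc_id, matched))
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits


def find_duplicates(docs, n, m, stopwords=frozenset()):
    # For each document, search the collection with its own top-n words
    # and report the other documents that reach the quorum threshold m.
    result = {}
    for doc_id, text in docs.items():
        query = top_words(text, n, stopwords)
        result[doc_id] = [
            (other, score)
            for other, score in quorum_search(query, docs, m)
            if other != doc_id
        ]
    return result
```

In this sketch, setting M equal to N requires every one of a document's top words to appear in a candidate, which is the strictest match the quorum allows, while lowering M relative to N admits progressively looser near-duplicates. Matching all N words only shows that the most frequent words coincide, so a true exact-duplicate check would still need a full-content comparison.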