Patent application number | Description | Published |
20080239365 | Masking of text in document reproduction - An apparatus for masking text in a rendered copy of an original document includes a text modification system configured to receive a print job from an application and to modify the print job in accordance with a print job description, so that a selected text element is masked when the job is rendered on an output device. A user interface is configured to receive instructions from a user for building the print job description, including instructions that select the text elements to be masked. | 10-02-2008 |
20080243842 | Optimizing the performance of duplicate identification by content - In accordance with the disclosure, there is provided a method for identifying duplicate documents comprising drafting a first document and creating a near unique representative string (NRS) based on the document content. The method further comprises searching for other documents with the same NRS and assigning a duplicate group identification (DGI) to the first document: the DGI is unique if no matching NRS is found, and otherwise is the same as the DGI of the duplicate document whose NRS matches. The method further comprises placing the DGI into the metadata of the first document and, on user demand, recalling a list of duplicates of a particular document by searching the metadata and selecting documents that have the same DGI. | 10-02-2008 |
20080246986 | METHODS AND APPARATUS FOR IMPROVED OPERATION OF NETWORKED PRINTING SYSTEM - Methods and systems are presented for performing one or more printer device management functions in a network, in which printer affinities are determined from job tracking data to indicate associations between printer devices and user devices. The affinity data is then used to perform one or more printer management functions, such as determining printer connections for new or roaming user devices, redirecting print jobs, and identifying underutilized printer device assets. | 10-09-2008 |
20080270516 | Method and Apparatus for Controlling Document Service Requests from a Mobile Device - Methods for controlling a document service request involve defining a document service request workflow and redirecting document service requests from a mobile device. In one embodiment, through a short-range connection with a document processing device on a foreign network, the mobile device establishes a secure connection to its native network to identify a document stored on a file server operating there. Once a document is identified, the mobile device initiates a document service request over the secure connection by requesting an output server on the native network to retrieve the identified document and convert it into an output-ready format suitable for the document processing device. Upon receiving the output-ready document, the mobile device resends it over a local connection to the document processing device to carry out the document service request. | 10-30-2008 |
20100092084 | REPRESENTING DOCUMENTS WITH RUNLENGTH HISTOGRAMS - An apparatus and method are provided for generating a representation of an image which may be used in tasks such as classification, clustering, or similarity determination. An image, such as a scanned document, in which the pixel colorant values are quantized into a plurality of colorant quantization levels, is partitioned into regions, optionally at a plurality of different scales. For each region, a runlength histogram is computed, which may be a combination of sub-histograms for each of the colorant quantization levels and optionally each of plural directions. The runlength histograms, optionally normalized, can then be combined to generate a representation of the document image. | 04-15-2010 |
20110078191 | HANDWRITTEN DOCUMENT CATEGORIZER AND METHOD OF TRAINING - A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels. | 03-31-2011 |
20110137898 | UNSTRUCTURED DOCUMENT CLASSIFICATION - A document classification method comprises: (i) classifying pages of an input document to generate page classifications; (ii) aggregating the page classifications to generate an input document representation, the aggregating not being based on ordering of the pages; and (iii) classifying the input document based on the input document representation. A page classifier for use in the page classifying operation (i) is trained based on pages of a set of labeled training documents having document classification labels. In some such embodiments, the pages of the set of labeled training documents are not labeled, and the page classifier training comprises: clustering pages of the set of labeled training documents to generate page clusters; and generating the page classifier based on the page clusters. | 06-09-2011 |
20110150323 | CATEGORIZATION QUALITY THROUGH THE COMBINATION OF MULTIPLE CATEGORIZERS - A system categorizes one or more objects based at least in part upon one or more characteristics associated therewith. A first classifier includes a rule set to determine whether each of the one or more objects meets or exceeds a quality threshold. A second classifier, orthogonal to the first, includes a rule set to determine whether each of the one or more objects meets or exceeds a quality threshold. In one embodiment, the quality thresholds associated with the first and second classifiers are both lower than a predetermined target threshold. For each object, the result of the first classifier is compared with the result of the second classifier. The object is categorized if the two results match and is left uncategorized if they do not. | 06-23-2011 |
20110192894 | METHOD FOR ONE-STEP DOCUMENT CATEGORIZATION AND SEPARATION - A method, apparatus, and hardcopy document are provided. The method provides for separating and categorizing documents and includes receiving a scanned batch of documents. The batch includes a plurality of scanned documents to which document separator stamps have been applied before scanning. Each document separator stamp includes first and second machine-recognizable patterns applied on the same page of a document, the first and second patterns being spaced apart by a designated field for receiving a user-applied category code. The scanned batch is processed to identify pages that contain a document separator, the processing including identifying at least one of the first and second spaced patterns. For each of a plurality of document pages on which a document separator is identified, the method includes locating the corresponding designated field and identifying the category code associated with it. The document containing the identified separator is separated from the other documents in the batch based at least on the identified separator, and a document category from a set of document categories is assigned to the document based on the identified category code. | 08-11-2011 |
20110200256 | OPTICAL MARK CLASSIFICATION SYSTEM AND METHOD - A system, method, and apparatus for mark recognition in an image of an original document are provided. The method/system takes as input an image of an original document in which at least one designated field is provided for accepting a mark applied by a user (the field may or may not have been marked). A region of interest (RoI), roughly corresponding to the designated field, is extracted from the image. A center of gravity (CoG) of the RoI is determined based on the distribution of black pixels in the RoI. Then, for one or more iterations, the RoI is partitioned into sub-RoIs based on the determined CoG; at each subsequent iteration, the sub-RoIs generated at the prior iteration serve as the RoIs to be partitioned. Data extracted from the RoI and sub-RoIs at one or more of the iterations allows a representation of the entire RoI to be generated that is useful for classifying the designated field, e.g., as positive (marked) or negative (not marked). | 08-18-2011 |
20110311145 | SYSTEM AND METHOD FOR CLEAN DOCUMENT RECONSTRUCTION FROM ANNOTATED DOCUMENT IMAGES - A computer-implemented method and system for reconstructing a clean document from annotated document images and/or extracting annotations therefrom are provided. The method includes receiving a set of at least two annotated document images into computer memory, selecting a representative image from the set of annotated document images, performing a global alignment on each of the set of annotated document images with respect to the selected representative image, and forming a consensus document image based at least on the aligned annotated document images. A clean document based at least on the consensus document image is then formed which can be used for extracting the annotations. | 12-22-2011 |
20120033874 | Learning weights of fonts for typed samples in handwritten keyword spotting - A wordspotting system and method are disclosed. The method includes receiving a keyword and, for each of a set of typographical fonts, synthesizing a word image based on the keyword. A keyword model is trained based on the synthesized word images and a respective weight for each font in the set. Using the trained keyword model, handwritten word images in a collection that match the keyword are identified. The weights allow a large set of fonts to be considered, each weight indicating the relative relevance of its font for modeling a set of handwritten word images. | 02-09-2012 |
20120127540 | DOCUMENT SEPARATION BY DOCUMENT SEQUENCE RECONSTRUCTION BASED ON INFORMATION CAPTURE - A system and method for generating separations between jobs in a batch of scanned pages are provided. The method includes, for each of a set of jobs, capturing an image of a representative page of the job. Thereafter, the method includes scanning the set of jobs as a batch to generate a set of scanned pages. The captured images are compared with scanned pages in the set of scanned pages to identify, for at least one captured image, a respective scanned page which matches the captured image. At least one separator is generated for separating the set of scanned pages based on a location of the matching scanned page(s). | 05-24-2012 |
20120314954 | EMBEDDED FORM EXTRACTION DEFINITION TO ENABLE AUTOMATIC WORKFLOW CONFIGURATION - A system and methods are disclosed for automatically extracting data from documents, such as scanned paper forms and/or digital forms, that would otherwise need to be pre-configured with the layout of the forms to be processed. The system extracts the form definition from a two-dimensional barcode and dynamically configures a workflow with services for extracting the desired user-filled information from the data fields present on the form. Support for a re-flowable service is provided. | 12-13-2012 |
20140032558 | CATEGORIZATION OF MULTI-PAGE DOCUMENTS BY ANISOTROPIC DIFFUSION - A computer implemented system and method are provided for refining category scores for pages of a sequence of document pages that potentially includes document boundaries. The method uses initial category scores provided by a categorizer that considers one page at a time or concatenated pairs of pages (called bipages). The category scores represent the probability that a page belongs to a particular category. The method uses anisotropic diffusion to refine the initial page category scores using the scores of neighboring pages as a function of the probability that there is a boundary between the pages. The method may be performed iteratively. | 01-30-2014 |
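The duplicate-identification flow summarized for 20080243842 (compute an NRS, look for a document with the same NRS, assign either a shared or a fresh DGI, store the DGI in metadata, then query by DGI) can be illustrated with the minimal Python sketch below. It is not the claimed method: the abstract does not say how the near unique representative string is built, so the hypothetical `near_unique_string` helper simply normalizes whitespace and case and hashes the result with SHA-1, and the hypothetical `DuplicateIndex` class uses in-memory dictionaries as a stand-in for a document store and its metadata.

```python
import hashlib
import itertools
import re


def near_unique_string(text: str) -> str:
    """Hypothetical NRS: SHA-1 of whitespace- and case-normalized content."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class DuplicateIndex:
    """Hypothetical in-memory store mapping documents to duplicate groups."""

    def __init__(self) -> None:
        self._dgi_by_nrs: dict[str, str] = {}
        self._metadata: dict[str, dict[str, str]] = {}  # doc_id -> metadata
        self._ids = itertools.count(1)

    def add(self, doc_id: str, text: str) -> str:
        nrs = near_unique_string(text)
        # Reuse the DGI of a document with the same NRS; otherwise mint a unique one.
        dgi = self._dgi_by_nrs.get(nrs)
        if dgi is None:
            dgi = f"DGI-{next(self._ids)}"
            self._dgi_by_nrs[nrs] = dgi
        # Place the DGI into the document's metadata.
        self._metadata[doc_id] = {"nrs": nrs, "dgi": dgi}
        return dgi

    def duplicates_of(self, doc_id: str) -> list[str]:
        # Search the metadata for other documents carrying the same DGI.
        dgi = self._metadata[doc_id]["dgi"]
        return sorted(
            other for other, meta in self._metadata.items()
            if meta["dgi"] == dgi and other != doc_id
        )
```

With this NRS, documents that differ only in spacing or capitalization land in the same duplicate group; a more tolerant representative string would group near-duplicates more aggressively.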
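The runlength-histogram representation described for 20100092084 can be sketched in Python with NumPy as follows. Several details are assumptions rather than statements from the abstract: the image is taken to be already quantized to integer levels `0 .. n_levels - 1`, each scale `s` partitions the image into an `s` x `s` grid, only the horizontal and vertical directions are used, run lengths above `max_len` share the last bin, and each region's histogram is L1-normalized before concatenation. The function names are hypothetical.

```python
import numpy as np


def runlength_histogram(region: np.ndarray, n_levels: int, max_len: int = 8) -> np.ndarray:
    """Count horizontal runs per quantization level.

    Entry [q, b] counts runs of level q with length b + 1; runs longer than
    max_len are accumulated in the last bin.
    """
    hist = np.zeros((n_levels, max_len))
    for row in region:
        start = 0
        for i in range(1, len(row) + 1):
            # A run ends at the end of the row or where the level changes.
            if i == len(row) or row[i] != row[start]:
                hist[row[start], min(i - start, max_len) - 1] += 1
                start = i
    return hist


def document_representation(image: np.ndarray, n_levels: int,
                            scales=(1, 2), max_len: int = 8) -> np.ndarray:
    """Concatenate normalized horizontal and vertical runlength histograms
    computed over an s x s grid of regions for each scale s."""
    h, w = image.shape
    features = []
    for s in scales:
        for i in range(s):
            for j in range(s):
                region = image[i * h // s:(i + 1) * h // s, j * w // s:(j + 1) * w // s]
                feat = np.concatenate([
                    runlength_histogram(region, n_levels, max_len).ravel(),    # horizontal
                    runlength_histogram(region.T, n_levels, max_len).ravel(),  # vertical
                ])
                total = feat.sum()
                features.append(feat / total if total > 0 else feat)  # L1-normalize
    return np.concatenate(features)
```

With `scales=(1, 2)` there are 1 + 4 = 5 regions, each contributing `2 * n_levels * max_len` values, so the output length is fixed for a given configuration regardless of image size, which is what makes such vectors usable for classification or similarity comparisons.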
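The agreement rule in 20110150323 (categorize an object only when two orthogonal classifiers, each held to its own quality threshold, produce matching results) reduces to a few lines. In this Python sketch the classifier interface is a hypothetical assumption: each classifier is a callable returning a `(label, confidence)` pair, and a result below the classifier's threshold counts as "no result". Treating two abstentions as uncategorized, rather than as a match, is also an assumption.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

# Hypothetical interface: a classifier returns (label, confidence) for an object.
Classifier = Callable[[Hashable], Tuple[str, float]]


def thresholded(clf: Classifier, threshold: float) -> Callable[[Hashable], Optional[str]]:
    """Return the label only if the classifier's confidence meets the threshold."""
    def run(obj: Hashable) -> Optional[str]:
        label, confidence = clf(obj)
        return label if confidence >= threshold else None
    return run


def combine(objects: Iterable[Hashable], first: Classifier, second: Classifier,
            t_first: float, t_second: float) -> Dict[Hashable, Optional[str]]:
    """Categorize an object only when both thresholded classifiers agree."""
    f, s = thresholded(first, t_first), thresholded(second, t_second)
    result: Dict[Hashable, Optional[str]] = {}
    for obj in objects:
        a, b = f(obj), s(obj)
        # None means uncategorized: a classifier fell below its threshold or they disagree.
        result[obj] = a if a is not None and a == b else None
    return result
```

Because agreement itself acts as a filter, each individual threshold can be set lower than it would need to be if one classifier had to reach the target quality alone, which matches the embodiment in which both thresholds sit below a predetermined target.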
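The iterative center-of-gravity partitioning in 20110200256 can be sketched in Python with NumPy as below. The abstract only says that "data is extracted" from the RoI and its sub-RoIs. This sketch assumes a binary RoI (1 = black pixel), assumes that each region is split into four quadrants around its own CoG, and uses black-pixel density as the per-region feature. `center_of_gravity` and `cog_features` are hypothetical names, and the downstream marked/unmarked classifier is not shown.

```python
import numpy as np


def center_of_gravity(roi: np.ndarray) -> tuple[int, int]:
    """Mean (row, col) of black pixels; geometric centre if the RoI is blank."""
    ys, xs = np.nonzero(roi)
    if ys.size == 0:
        return roi.shape[0] // 2, roi.shape[1] // 2
    return int(round(ys.mean())), int(round(xs.mean()))


def cog_features(roi: np.ndarray, iterations: int = 2) -> np.ndarray:
    """Black-pixel density of the RoI and of its CoG-based sub-RoIs.

    At each iteration every current region is split into four quadrants
    around its own CoG; the quadrants become the regions of the next iteration.
    """
    features = []
    regions = [roi]
    for level in range(iterations + 1):
        next_regions = []
        for r in regions:
            # Empty sub-RoIs (possible when the CoG sits on an edge) get density 0.
            features.append(float(r.mean()) if r.size else 0.0)
            if level < iterations:
                cy, cx = center_of_gravity(r)
                next_regions += [r[:cy, :cx], r[:cy, cx:], r[cy:, :cx], r[cy:, cx:]]
        regions = next_regions
    return np.array(features)
```

With two iterations, the vector holds 1 + 4 + 16 = 21 densities. Because the splits follow the ink rather than a fixed grid, small shifts of the mark inside the field move the partition with it.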
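The score refinement in 20140032558 diffuses category scores between neighboring pages, with less exchange where a document boundary is likely. The abstract does not give the update rule. The Python sketch below assumes a standard explicit diffusion step in which the conductance between pages k and k + 1 is one minus the probability of a boundary between them, with step size `lam`. The function name and parameters are hypothetical, and the initial scores and boundary probabilities are taken as given inputs.

```python
import numpy as np


def diffuse_scores(scores, boundary_prob, lam: float = 0.25, n_iter: int = 10) -> np.ndarray:
    """Refine page category scores by boundary-aware (anisotropic) diffusion.

    scores:        (n_pages, n_categories) initial category scores per page.
    boundary_prob: (n_pages - 1,) probability of a boundary between page k and k + 1.
    lam:           step size; lam <= 0.5 keeps each update a convex combination.
    """
    s = np.asarray(scores, dtype=float).copy()
    # Conductance between neighbouring pages: near 0 where a boundary is likely.
    c = 1.0 - np.asarray(boundary_prob, dtype=float)
    for _ in range(n_iter):
        flux = c[:, None] * (s[1:] - s[:-1])  # exchange between pages k and k + 1
        update = np.zeros_like(s)
        update[:-1] += flux  # page k moves toward page k + 1
        update[1:] -= flux   # page k + 1 moves toward page k
        s += lam * update
    return s
```

Each step moves a page's scores toward those of its neighbors in proportion to the conductance. When `lam` is at most 0.5, every refined row is a convex combination of rows from the previous step, so scores that start as probabilities stay nonnegative and keep summing to one. A page separated from its neighbors by certain boundaries (probability 1) keeps its initial scores.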