Patent application number | Description | Published (MM-DD-YYYY) |
20100067793 | HANDWRITTEN WORD SPOTTER USING SYNTHESIZED TYPED QUERIES - A wordspotting system and method are disclosed for processing candidate word images extracted from handwritten documents. In response to a user inputting a selected query string, such as a word to be searched in one or more of the handwritten documents, the system automatically generates at least one computer-generated image based on the query string in a selected font or fonts. A model is trained on the computer-generated image(s) and is thereafter used in scoring the candidate handwritten word images. The candidate or candidates with the highest scores and/or documents containing them can be presented to the user, tagged, or otherwise processed differently from other candidate word images/documents. | 03-18-2010 |
20100092084 | REPRESENTING DOCUMENTS WITH RUNLENGTH HISTOGRAMS - An apparatus and method are provided for generating a representation of an image which may be used in tasks such as classification, clustering, or similarity determination. An image, such as a scanned document, in which the pixel colorant values are quantized into a plurality of colorant quantization levels, is partitioned into regions, optionally at a plurality of different scales. For each region, a runlength histogram is computed, which may be a combination of sub-histograms for each of the colorant quantization levels and optionally each of plural directions. The runlength histograms, optionally normalized, can then be combined to generate a representation of the document image. | 04-15-2010 |
20100098343 | MODELING IMAGES AS MIXTURES OF IMAGE MODELS - A system and method for generating an image representation are provided. The image is modeled as a set of mixture weights, one for each of a set of reference image models, such as Gaussian mixture models (GMMs). The weights are derived by optimizing an objective function in which each reference image model is associated with its respective weight. | 04-22-2010 |
20100189354 | Modeling images as sets of weighted features - An apparatus, method, and computer program product are provided for generating an image representation. The method includes receiving an input digital image, extracting features from the image which are representative of patches of the image, generating weighting factors for the features based on location relevance data for the image, and weighting the extracted features with the weighting factors to form a representation of the image. | 07-29-2010 |
20100191532 | Model-based comparative measure for vector sequences and word spotting using same - An object comparison method comprises: generating a first ordered vector sequence representation of a first object; generating a second ordered vector sequence representation of a second object; representing the first object by a first ordered sequence of model parameters generated by modeling the first ordered vector sequence representation using a semi-continuous hidden Markov model employing a universal basis; representing the second object by a second ordered sequence of model parameters generated by modeling the second ordered vector sequence representation using a semi-continuous hidden Markov model employing the universal basis; and comparing the first and second ordered sequences of model parameters to generate a quantitative comparison measure. | 07-29-2010 |
20100191743 | Contextual similarity measures for objects and retrieval, classification, and clustering using same - A method of comparing first and second objects in a context comprises: maximizing or minimizing a quantitative comparison of the first object and a mixture of the second object and the context with respect to a weighting parameter that controls the mixture; and outputting a comparison value based on the value of the weighting parameter determined by the maximizing or minimizing. | 07-29-2010 |
20110040711 | TRAINING A CLASSIFIER BY DIMENSION-WISE EMBEDDING OF TRAINING DATA - A classifier training method and apparatus for training, a linear classifier trained by the method, and its use, are disclosed. In training the linear classifier, signatures for a set of training samples, such as images, in the form of multi-dimension vectors in a first multi-dimensional space, are converted to a second multi-dimension space, of the same or higher dimensionality than the first multi-dimension space, by applying a set of embedding functions, one for each dimension of the vector space. A linear classifier is trained in the second multi-dimension space. The linear classifier can approximate the accuracy of a non-linear classifier in the original space when predicting labels for new samples, but with lower computation cost in the learning phase. | 02-17-2011 |
20110078191 | HANDWRITTEN DOCUMENT CATEGORIZER AND METHOD OF TRAINING - A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels. | 03-31-2011 |
20110123967 | DIALOG SYSTEM FOR COMPREHENSION EVALUATION - An automated system, apparatus and method for evaluation of comprehension are disclosed. The method includes receiving an input text and natural language processing the text to identify dependencies between text elements in the input text. Grammar rules are applied to generate questions and associated answers from the processed text, at least some of the questions being based on the identified dependencies. A set of the generated questions is posed to a reader of the input text and the comprehension of the reader evaluated, based on the reader's responses to the questions posed. | 05-26-2011 |
20130053141 | PHOTOGRAPH-BASED GAME - A system and a method for playing a photograph-based game are provided. The method includes establishing a communication link between a game playing system and one or more game playing devices, each of which is operated by a respective player. Game rules are presented to the player(s) on the respective game playing device(s). The game rules include at least one task for the submission of at least one photographic image. Provision is made for receiving a photographic image in the game playing system which has been submitted via the established link from the game playing device in response to the presented task. An image signature is computed for the submitted photographic image based on visual features extracted from the image and a relevance to the task is computed, based on the computed image signature. A score for the game is output for each player, based on the computed relevance of the submitted images for each of the tasks. | 02-28-2013 |
20130064444 | DOCUMENT CLASSIFICATION USING MULTIPLE VIEWS - A training system, training method, and a system and method of use of a trained classification system are provided. A classifier may be trained with a first “cheap” view but not using a second “costly” view of each of the training samples, which is not available at test time. The two views of samples are each defined in a respective original feature space. An embedding function is learned for embedding at least the first view of the training samples into a common feature space in which the second view can also be embedded or is the same as the second view original feature space. Labeled training samples (first view only) for training the classifier are embedded into the common feature space using the learned embedding function. The trained classifier can be used to predict labels for test samples for which the first view has been embedded in the common feature space with the embedding function. | 03-14-2013 |
20130156302 | HANDWRITTEN WORD SPOTTER SYSTEM USING SYNTHESIZED TYPED QUERIES - A wordspotting system and method are disclosed for processing candidate word images extracted from handwritten documents. In response to a user inputting a selected query string, such as a word to be searched in one or more of the handwritten documents, the system automatically generates at least one computer-generated image based on the query string in a selected font or fonts. A model is trained on the computer-generated image(s) and is thereafter used in scoring the candidate handwritten word images. The candidate or candidates with the highest scores and/or documents containing them can be presented to the user, tagged, or otherwise processed differently from other candidate word images/documents. | 06-20-2013 |
20140219563 | LABEL-EMBEDDING FOR TEXT RECOGNITION - A system and method for comparing a text image and a character string are provided. The method includes embedding a character string into a vectorial space by extracting a set of features from the character string and generating a character string representation based on the extracted features, such as a spatial pyramid bag of characters (SPBOC) representation. A text image is embedded into a vectorial space by extracting a set of features from the text image and generating a text image representation based on the text image extracted features. A compatibility between the text image representation and the character string representation is computed, which includes computing a function of the text image representation and character string representation. | 08-07-2014 |
20140350961 | TARGETED SUMMARIZATION OF MEDICAL DATA BASED ON IMPLICIT QUERIES - A system and method for targeted summarization of a patient's electronic medical records are provided. The system includes an aggregation component which provides an aggregation of health records of a patient. A transformation component transforms the health records of the patient into representations in a multidimensional search space. A search component generates an implicit query in the multidimensional search space and retrieves responsive health records based on the implicit query. A summarization component generates a summary based on the retrieved responsive health records for display to a healthcare provider on an associated user interface. A processor implements the aggregation component, transformation component, search component, and summarization component. | 11-27-2014 |
20140355835 | SYSTEM AND METHOD FOR OCR OUTPUT VERIFICATION - A system and method for computing confidence in an output of a text recognition system includes performing character recognition on an input text image with a text recognition system to generate a candidate string of characters. A first representation is generated, based on the candidate string of characters, and a second representation is generated based on the input text image. A confidence in the candidate string of characters is computed based on a computed similarity between the first and second representations in a common embedding space. | 12-04-2014 |
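Application 20100092084 (REPRESENTING DOCUMENTS WITH RUNLENGTH HISTOGRAMS) describes its representation only at abstract level. The following is a minimal Python sketch of that idea, under assumptions that do not come from the application. The image is already quantized into integer levels. Runs are counted in only two directions, horizontal and vertical. Run lengths are grouped into log-scale bins (1, 2-3, 4-7, ...). Regions form an s x s grid at each scale. Each region histogram is L1-normalized. All function names and defaults (`runs_1d`, `length_bin`, `region_histogram`, `runlength_representation`, `num_bins=4`, `scales=(1, 2)`) are hypothetical illustrations, not the applicant's implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical input format: rows of quantized colorant levels in 0..num_levels-1.
Image = List[List[int]]


def runs_1d(values: List[int]) -> List[Tuple[int, int]]:
    """Split a 1-D sequence into (level, run length) pairs of identical values."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            runs.append((values[start], i - start))
            start = i
    return runs


def length_bin(length: int, num_bins: int) -> int:
    """Assumed log-scale binning: 1 -> 0, 2-3 -> 1, 4-7 -> 2, ..., capped at num_bins - 1."""
    return min(length.bit_length() - 1, num_bins - 1)


def region_histogram(region: Image, num_levels: int, num_bins: int) -> List[float]:
    """Concatenate one sub-histogram per (direction, level) pair for a single region."""
    # Layout: direction (0 = horizontal, 1 = vertical), then level, then length bin.
    hist = [0.0] * (2 * num_levels * num_bins)
    rows = region
    cols = [list(col) for col in zip(*region)]
    for direction, lines in enumerate((rows, cols)):
        for line in lines:
            for level, length in runs_1d(line):
                idx = (direction * num_levels + level) * num_bins + length_bin(length, num_bins)
                hist[idx] += 1
    total = sum(hist)
    # Optional L1 normalization so regions of different sizes are comparable.
    return [h / total for h in hist] if total else hist


def runlength_representation(image: Image, num_levels: int = 2, num_bins: int = 4,
                             scales: Sequence[int] = (1, 2)) -> List[float]:
    """Partition the image into an s x s grid per scale and concatenate region histograms."""
    height, width = len(image), len(image[0])
    features: List[float] = []
    for s in scales:
        for gy in range(s):
            for gx in range(s):
                y0, y1 = gy * height // s, (gy + 1) * height // s
                x0, x1 = gx * width // s, (gx + 1) * width // s
                region = [row[x0:x1] for row in image[y0:y1]]
                features.extend(region_histogram(region, num_levels, num_bins))
    return features


if __name__ == "__main__":
    # Toy binary "document": 0 = background, 1 = ink.
    img = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [1, 1, 1, 1],
        [0, 1, 0, 1],
    ]
    features = runlength_representation(img)
    print(len(features))  # 80: (1 + 4 regions) x 2 directions x 2 levels x 4 bins
```

The log-scale bins are one plausible choice for keeping the vector compact while still separating short runs (thin strokes, noise) from long runs (margins, rules); the application's own binning, directions, and normalization may differ.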