Distinguishing text from other regions

Subclass of:

382 - Image analysis

382173000 - IMAGE SEGMENTATION

Document	Title	Date
Entries
20080199076	Image processing method, image processing apparatus, image forming apparatus, program, and storage medium - An image processing apparatus includes: a document type automatic classification section which determines whether input image data is image data for a text document or not; a newspaper document classification section which determines whether the input image data is image data for a newspaper document or not; a segmentation process section which identifies a page-background region in the input image data; and a color correction section for, if the input image data is classified as the text document but not the newspaper document and if a page-background removal process is to be performed on the input image data, performing a first page-background removal process on the image data, but if the input image data is classified as the text document and the newspaper document, not performing the first page-background removal process on the image data. This makes it possible to prevent deterioration of visual sharpness of the text in the document image printed on the newspaper.	08-21-2008
20080205758	Distortion Correction of a Captured Image - Disclosed are embodiments of systems and methods for eliminating or reducing the distortion in a scanned image. Embodiments of the present invention allow for the automatic pruning, de-skewing, and unwarping of an image using document layout information. In embodiments, dominant baselines may be selected by examining the letter regions on boundary baselines rather than examining the entire document layout. The dominant baselines may then be used to reduce distortion in the image. It shall be noted that the present invention is robust enough to handle many types of content, including different languages, as well as documents with different layouts. The present invention may also be applied to images obtained from bound documents and flat documents.	08-28-2008
20080253655	TEXT AND GRAPHIC SEPARATION METHOD AND TEXT ENHANCEMENT METHOD - The present invention provides a text and graphic separation method and a text enhancement method. The text and graphic separation method is used for separating texts and graphics of an image and comprises coarse classification and advanced classification. The method of the present invention also adjusts the luminance of the text to enhance the text image according to the separation result.	10-16-2008
20080267502	Variable skew correction system and method - A variable skew correction system comprises a de-skew application executable to transform scanned image data of a document exhibiting a variable skew condition to an output model representing a de-skewed image of the document by transferring pixel data for each of a plurality of raster lines of the scanned image data to the output model, wherein a variable spacing between at least two adjacent raster lines of the scanned image data is modified in the output model to correct non-linear distortion of the scanned image data.	10-30-2008
20080273796	Image Text Replacement - Image text enhancement techniques are described. In an implementation, graphically represented text included in an original image is converted into process capable text. The process capable text may be used to generate a text image which may replace the original text to enhance the image. In further implementations the process capable text may be translated from a first language to a second language for inclusion in the enhanced image.	11-06-2008
20080279453	OCR enabled hand-held device - A method of processing image data consistent with certain embodiments involves defining a segment of a visual field using a laser pointer; capturing an image of the segment of the visual field; and processing the captured segment to produce associated text associated with the selected segment. This abstract is not to be considered limiting, since other embodiments may deviate from the features described in this abstract.	11-13-2008
20080310718	Information Extraction in a Natural Language Understanding System - A method of extracting information from text within a natural language understanding system can include processing a text input through at least one statistical model for each of a plurality of features to be extracted from the text input. For each feature, at least one value can be determined, at least in part, using the statistical model associated with the feature. One value for each feature can be combined to create a complex information target. The complex information target can be output.	12-18-2008
20080310719	IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM - An image processing apparatus includes an analyzing unit configured to analyze an incomplete portion of input image data; and an obtaining unit configured to identify a storage location of original data corresponding to the input image data from the input image data, and to obtain the original data from the storage location. The original data obtained by the obtaining unit is corrected on the basis of a result of analysis by the analyzing unit to generate a complete image, and the complete image is output.	12-18-2008
20080317343	Methods and Systems for Identifying Text Orientation in a Digital Image - Aspects of the present invention relate to systems and methods for determining text orientation in a digital image.	12-25-2008
20080317344	Printing apparatus and method with respect to medium - A user operates the operation panel portion	12-25-2008
20090003700	Precise Identification of Text Pixels from Scanned Document Images - A system or method for identifying text in a document. A group of connected components is created. A plurality of characteristics of different types is calculated for each connected component. Statistics are computed which describe the group of characteristics. Outlier components are identified as connected components whose computed characteristics are outside a statistical range. The outlier components are removed from the group of connected components. Text pixels are identified by segmenting pixels in the group of connected components into a group of text pixels and a group of background pixels.	01-01-2009
20090003701	METHOD AND APPARATUS FOR APPLYING STEGANOGRAPHY TO DIGITAL IMAGE FILES - A method and apparatus for applying steganography to digital image files are provided. This algorithm uses the basic idea of steganography. In one aspect of the invention a method is provided. The method comprises (a) taking a text message as input, (b) breaking up the input data into a series of bits, and (c) passing it to an encryption mechanism to merge into a bit map image. Thus, less important information from a bit map image is removed and hidden data (series of bits) is injected in its place. It is thus possible to (a) retrieve the entire bit map file, (b) remove its header information, (c) retrieve each byte one by one and (d) put the input data in each byte.	01-01-2009
20090016604	Invisible Junction Features for Patch Recognition - The present invention uses invisible junctions which are a set of local features unique to every page of the electronic document to match the captured image to a part of an electronic document. The present invention includes: an image capture device, a feature extraction and recognition system and database. When an electronic document is printed, the feature extraction and recognition system captures an image of the document page. The features in the captured image are then extracted, indexed and stored in the database. Given a query image, usually a small patch of some document page captured by a low resolution image capture device, the features in the query image are extracted and compared against those stored in the database to identify the query image. The present invention also includes methods for feature extraction, feature indexing, feature retrieval and geometric estimation.	01-15-2009
20090016605	System and method for creating an editable template from a document image - Embodiments of the present invention recite a system and method for creating an editable template from a document image. In one embodiment of the present invention, the spatial characteristics and the color characteristics of at least one region of a document are identified. A set of characteristics of a graphic representation within the region are then determined without the necessity of recognizing a character comprising the graphic representation. An editable template is then created comprising a second region having the same spatial characteristics and the same color characteristics of the at least one region of the document and comprising a second graphic representation which is defined by the set of characteristics of the first graphic representation.	01-15-2009
20090016606	METHOD, SYSTEM, DIGITAL CAMERA AND ASIC FOR GEOMETRIC IMAGE TRANSFORMATION BASED ON TEXT LINE SEARCHING - The present invention provides a method, system and/or a digital camera providing a geometrical transformation of deformed images of documents comprising text, by text line tracking, resulting in an image comprising parallel text lines. The transformed image is provided as an input to an OCR program either running in a computer system or in a processing element comprised in said digital camera.	01-15-2009
20090041352	IMAGE FORMATION DEVICE, IMAGE FORMATION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM RECORDING IMAGE FORMATION PROGRAM - An image formation device includes an obtaining unit obtaining an input image and a first reduction scale, a first extraction unit extracting a first region including a character from the input image, a first setting unit setting a second reduction scale greater than the first reduction scale when a size of the reduced character is smaller than a first size, and a formation unit forming a second processed image by reducing the first region at the second reduction scale when the size of the character is smaller than the first size.	02-12-2009
20090067719	System and method for automatic segmentation of ASR transcripts - Text segmentation based on topic boundary detection has been an industry problem in automating information dissemination to targeted users. A system for automatic segmentation of ASR output text involves boundary identification based on “topic” changes. The proposed approach is based on building a weighted graph to determine dependency in input sentences based on bi-directional analysis of the input sentences. Furthermore, the input sentences are segmented based on the notion of segment cohesiveness and the segmented sentences are merged based on preamble and postamble analyses.	03-12-2009
20090080774	Hybrid Graph Model For Unsupervised Object Segmentation - This disclosure describes an integrated framework for class-unsupervised object segmentation. The class-unsupervised object segmentation occurs by integrating top-down constraints and bottom-up constraints on object shapes using an algorithm in an integrated manner. The algorithm describes a relationship among object parts and superpixels. This process forms object shapes with object parts and oversegments pixel images into the superpixels, with the algorithm in conjunction with the constraints. This disclosure describes computing a mask map from a hybrid graph, segmenting the image into a foreground object and a background, and displaying the foreground object from the background.	03-26-2009
20090092318	ONE-SCREEN RECONCILIATION OF BUSINESS DOCUMENT IMAGE DATA, OPTICAL CHARACTER RECOGNITION EXTRACTED DATA, AND ENTERPRISE RESOURCE PLANNING DATA - Systems and methods of reconciling data from an imaged document. In one embodiment, a business document is scanned to create a business document image. A set of extracted data is extracted from the business document image via optical character recognition (OCR). The set of OCR extracted data is then compared with data in a business information management or enterprise resource planning (ERP) system. A set of ERP data is retrieved from the ERP system that relates to the set of OCR extracted data. The retrieved ERP data is then assigned to the set of OCR extracted data to create a set of assigned data. The business document image is then displayed in a business document image pane, the set of OCR extracted data is displayed in the OCR data pane, and the retrieved ERP data is displayed in the ERP data pane. The set of assigned data is validated, and the ERP system is updated with the set of validated, assigned data. In other embodiments, data is extracted from text files without using OCR.	04-09-2009
20090097750	IMAGE PROCESSING APPARATUS - An information embedding apparatus (	04-16-2009
20090110279	SYSTEM AND METHOD FOR EXTRACTING AND ORGANIZING DATA FROM ELECTRONIC IMAGES - A method of extracting and organizing data from electronic images includes processing a set of data fields representative of data to be extracted, mapping at least a subset of the set of data fields to at least one subclient, and attaching a rule from a set of rules to at least one of the mapped data fields. Each rule in the set of rules represents a transformation from a first data format to a preferred data format. The method also includes extracting data from at least one electronic image for the at least one subclient into the plurality of mapped data fields using the attached rule and storing the extracted data.	04-30-2009
20090110280	IMAGE RECOGNITION APPARATUS, IMAGE RECOGNITION PROGRAM, AND IMAGE RECOGNITION METHOD - An image recognition method is conducted by recognizing logical elements based on a logical structure model set to correspond to the logical structure of an image of individual character strings, collecting information processed with the logical structure model of images of a logical structure, acquiring a recognition result when recognizing an image of a logical structure by processing information collected with a post-update logical structure model,	04-30-2009
20090123070	Segmentation-based image processing system - A digital image can be processed by an image processing method that calculates a gradient map for the digital image, calculates a density function for the gradient map, calculates a modified gradient map using the gradient map, the density function and the selected scale level, and segments the modified gradient map. Prior to segmenting the modified gradient map, a sub-image of the digital image can be segmented at the selected scale level to determine if the selected scale level will give the desired segmentation.	05-14-2009
20090123071	DOCUMENT PROCESSING APPARATUS, DOCUMENT PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT - In a document processing apparatus, a first character information extracting unit extracts, for a first area that is an area determined to be a character extractable area in divided areas of a document information, first character information from the area; a second character information extracting unit extracts, for a second area that is an area not determined to be the character extractable area in the divided areas, a character code by performing a character recognition processing on a document image generated from the document information as second character information; and a storing unit stores therein the first character information, the second character information, and at least one of the document information and the document image in association with each other.	05-14-2009
20090136133	Personalized fetal ultrasound image design - (Process) the consumer submits their fetal ultrasound image along with an order form (includes requirements). The image is scanned and saved as a JPEG file or similar. If required the image is enhanced (cropped, brightness, etc.), saved and moved to Microsoft Publisher or similar software. The caption design and text, the phrase design and text, are created and saved in Microsoft Publisher/similar. The caption and phrase are placed over the fetal image, positioned and saved as a Publisher file. This file is the Personalized Fetal Ultrasound Image Design. Standard format is available to consumers. (Product) the Personalized Fetal Ultrasound Image Design is unique because it captures a personalized image of the fetus; a personalized caption; a personalized phrase. The personalized image can be copied, printed or transferred onto memorabilia and keepsakes such as t-shirts, pictures, mugs, blankets, magnets, bookmarks, etc. Each is unique, original, one of a kind.	05-28-2009
20090148042	TEXT REPRESENTATION METHOD AND APPARATUS - A text-like data representation technique and a text-like data representation apparatus are disclosed that may acquire image data from a scanned image; segment text regions from the image data; further extract each connected component in the text regions; form clusters based on the connected components; group each connected component in the text regions into one of the clusters with similar or identical characters; generate a high-resolution representative for each cluster; generate a vector representation for each high-resolution representative; and code the text as text data by associating each connected component with its vectorized high-resolution representative, and location in the document.	06-11-2009
20090148043	METHOD FOR EXTRACTING TEXT FROM A COMPOUND DIGITAL IMAGE - Text is extracted from a grayscale or color compound digital image. Kernels of text in the compound digital image are found using a stroke operator. The kernels of text are segmented into text blocks based on image space, color space, and intensity space. Each text block is segmented into text and background pixels using active contour analysis. The segmented text blocks are refined by altering parameters in the active contour analysis. Text is extracted from the refined segmented text blocks, and a binary image is created including text extracted from the refined segmented text blocks.	06-11-2009
20090245641	DOCUMENT PROCESSING APPARATUS AND PROGRAM - A document processing apparatus includes a region extracting unit that extracts a plurality of regions in a document image, a recognition unit that recognizes a character string, a conversion unit that converts the recognized character string, a setting unit that sets first boundary lines that surround the document image and at least one second boundary line in a space between adjacent regions of the plurality of regions, an enlargement/reduction unit that moves in parallel at least one line of the first and second boundary lines under a restraint condition that at least one line does not intersect any of the plurality of regions, and enlarges or reduces at least one of the regions in accordance with the parallel movement so long as each region does not get out of a cell; and an insertion unit that inserts the converted character string into each of the regions.	10-01-2009
20090252415	Method for retrieving text blocks in documents - To classify text blocks in printed material which is part of bulk postal items, structure-related characteristics of one of the text blocks of a postal item are extracted, wherein the characteristics are characterized by graphical properties of the overall text block. The extracted structure-related characteristics are assigned to a characteristic data record of the postal item, and a characteristic data record of a reference text block is compared to the characteristic data record of the postal item.	10-08-2009
20090263019	OCR of books by word recognition - Disclosed embodiments of the invention provide automated global optimization methods and systems of OCR, tailored to each document being digitized. A document-specific database is created from an OCR scan of a document of interest, which contains an exhaustive listing of words in the document. Images of each word, taken from all the fonts encountered, are entered into the database and mapped to a corresponding textual representation. After entry of a first instance of an image of a word written in a particular font, each new occurrence of the word in that font can be quickly recognized by image processing techniques. The disclosed methods and systems may be used in conjunction with adaptive character recognition training and word recognition training of the OCR engines.	10-22-2009
20090279781	IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM - An image processing apparatus selects one extraction method among a plurality of extraction methods and then extracts feature information of objected image data, from the objected image data using the selected extraction method. The extracted feature information is registered, and the objected image data is output together with identification information indicating the extraction method that was used in the extraction.	11-12-2009
20090285482	DETECTING TEXT USING STROKE WIDTH BASED TEXT DETECTION - Detecting text using stroke width based text detection. As a part of the text detection, a representation of an image is generated that includes pixels that are associated with the stroke widths of components of the image. Connected components of the image are identified by filtering out portions of the pixels using metrics related to stroke width. Text is detected in the image based on the identified connected components.	11-19-2009
20090297027	ELECTRONIC DOCUMENT PRODUCING DEVICE, ELECTRONIC DOCUMENT PRODUCING METHOD AND STORAGE MEDIUM - An electronic document producing device has a correcting unit for correcting distortion of a first image to obtain a correction image, and a character recognition unit for executing character recognition processing on a plurality of character images contained in the correction image to obtain text data. The device also has a unit for finding a base line of each character row in the first image, and a unit for finding a relative position from the base line in regard to each character image in the first image. The device also includes a producing unit for producing an electronic document including the text data and the first image, wherein a position of the text data is described based on the relative position from the base line.	12-03-2009
20090316991	Method of Gray-Level Optical Segmentation and Isolation using Incremental Connected Components - A novel and useful method of using Incremental Connected Components to segment and isolate individual characters in a gray-scale or color image. For each pixel intensity of pixels in the image, a plurality of pixel groups are created comprising contiguous pixels of intensity equal to or less than the current pixel intensity. The pixel groups are then input to a character classifier which returns an identified character and a confidence value. Non-overlapping pixel groups (i.e. segmentation) of identified characters having the highest confidence values are then selected.	12-24-2009
20090324079	Methods and Systems for Region-Based Up-Scaling - Aspects of the present invention are related to systems and methods for region-based up-scaling, and in particular, for up-scaling still images and video frames that contain graphical elements.	12-31-2009
20090324080	IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD AND STORAGE MEDIUM - Character code data and vector drawing data are both listed and provided in a re-editable manner. Electronic data is generated in which information obtained by vectorizing character areas in an image and information obtained by recognizing characters in the image are stored in respective storage locations. As for the electronic data generated in this manner, because character code data and vector drawing data generated from the input image are both presented by a display and edit program, a user can immediately utilize both sets of data.	12-31-2009
20100002935	SYSTEM AND METHOD FOR DISPLAYING DIGITAL EDITIONS OF PERIODICALS AND PUBLICATIONS - Systems and methods are provided for displaying scanned information containing at least one of text and picture information of a document, wherein the document may include a plurality of scanned images. In one embodiment, the method may involve creating and utilizing at least one association of a given scanned image relative to one or more of the other scanned images of the document. The at least one association may define a relative location of the given scanned image with respect to one or more of the other scanned images.	01-07-2010
20100008579	SYSTEM AND METHOD FOR IDENTIFYING TEXT-BASED SPAM IN RASTERIZED IMAGES - A system, method and computer program product for identifying spam in an image, including (a) identifying a plurality of contours in the image, the contours corresponding to probable symbols; (b) ignoring contours that are too small or too large; (c) identifying text lines in the image, based on the remaining contours; (d) parsing the text lines into words; (e) ignoring words that are too short or too long from the identified text lines; (f) ignoring text lines that are too short; (g) verifying that the image contains text by comparing a number of pixels of a symbol color within remaining contours to a total number of pixels of the symbol color in the image, and that there is at least one text line after filtration; and (h) if the image contains text, rendering a spam/no spam verdict based on a contour representation of the text that appears after step (f).	01-14-2010
20100008580	IMAGE PROCESSING APPARATUS AND IMAGE PROCESSING METHOD - An image scanner using an area sensor having a tilt reads a plurality of low-resolution image data having phase shifts from each other, and the low-resolution image data are converted into those on an orthogonal coordinate system by affine transformation. The number of data to be used is decided based on one of these low-resolution image data. The low-resolution image data as many as the designated number of data are saved, and high-resolution image data is generated by executing super-resolution processing.	01-14-2010
20100046838	SYSTEM AND METHOD FOR OBTAINING TEXT - A system includes a first supporting element, a first supporting element rotation module, a controller and a first optical head. The controller is configured to control a rotation of the first supporting element by the first supporting element rotation module. The first supporting element is coupled to the first optical head. The first supporting element rotation module is configured to rotate the first supporting element until text that is imprinted on a first side of a semiconductor wafer is located within a field of view of the first optical head. Semiconductor wafers of different sizes have text located at different locations. A method for obtaining a text imprinted on a first side of a semiconductor wafer, the method includes: determining a location of the text based on a size of the semiconductor wafer; rotating a first supporting element that is coupled to a first optical head until the text is located within a field of view of the first optical head; obtaining an image of the text by the first optical head; and translating the image of the text into text.	02-25-2010
20100061633	Method and Apparatus for Calculating the Background Color of an Image - A method, device and computer readable storage media for enhancing an image for optical character recognition by detecting the edges of the image to create an edge detected image, binarizing the edge detected image to create a binary edge image for processing, dilating the binary edge image to create a dilated binary edge image, taking the XOR difference between the binary edge image and the dilated binary edge image to obtain a text boundary, superimposing the text boundary on the image and determining the pixels of the image that are covered by the text boundary, calculating the average grayscale value of the pixels of the image that are covered by the text boundary, and setting background pixels of the image to the calculated average grayscale value of the pixels of the image that are covered by the text boundary. The method optionally includes the steps of performing edge filling and hole filling on the binary edge image to create an updated binary edge image and filling holes in the dilated binary edge image to create an updated dilated binary edge image, whereby the XOR difference is taken between the updated binary edge image and the updated dilated binary edge image. The image may be binarized for optimal results after the background images have been set to the calculated average grayscale value of the pixels that are covered by the text boundary.	03-11-2010
20100061634	Method of Retrieving Information from a Digital Image	03-11-2010
20100074525	Manipulating an Image by Applying a De-Identification Process - A method for manipulating an image, the method includes: capturing image information representative of an image that includes images of textual characters; recognizing the textual characters by applying Optical Character Recognition; identifying the layout of the image; and applying at least one de-identification process on textual characters of interest to provide de-identification process results.	03-25-2010
20100074526	Methods and Systems for Locating Text in a Digital Image - Aspects of the present invention are related to systems and methods for locating text in a digital image.	03-25-2010
20100080461	Methods and Systems for Locating Text in a Digital Image - Aspects of the present invention are related to systems and methods for locating text in a digital image.	04-01-2010
20100092087	METHODS AND APPARATUS FOR PERFORMING IMAGE BINARIZATION - Methods and apparatus for binarizing images represented by sets of multivalent pixel values in a computationally efficient manner are described. In a grayscale image to be binarized, one group of pixel values represents “foreground”, e.g., text to be converted to black, while another group represents a shaded “background” region to be converted, e.g., to white. The difference between foreground and background is often a function of the scale of the image components, e.g., text and/or other images. Filters in the form of morphological operators, computationally efficient quick-open and quick-close morphological operators are employed to binarize images, e.g., grayscale images. The methods and apparatus effectively handle both smooth and sharp image background structures in a computationally efficient manner.	04-15-2010
20100098336	IMAGE PROCESSING APPARATUS - An image processing apparatus including an input part configured to input document data of a document, an extracting part configured to automatically extract partial image data from the document data, a storage part configured to store the document data and configuration data of the document data, a registering part configured to associate the document data with the partial image data and register the document data and the associated partial image data in the storage part, a generating part configured to generate push-type data based on the configuration data, and a transmitting part configured to transmit the push-type data.	04-22-2010
20100104187	Personal navigation device and related method of adding tags to photos according to content of the photos and geographical information of where photos were taken - A method of automatically adding tags to photos based on content of the photos and geographical information about where the photo was taken includes taking a photo with a camera of a personal navigation device, generating a geographical tag for the photo with the personal navigation device and attaching the geographical tag to the photo to generate a geotagged photo, transferring the geotagged photo to an optical character recognition (OCR) server, performing OCR on the geotagged photo with the OCR server and generating image description tags from text recognized in the geotagged photo, attaching selected tags to the geotagged photo, the selected tags being selected from the generated image description tags, and uploading the geotagged photo along with the attached selected tags to a photo sharing server, photos on the server being searchable by geographical tags or selected tags associated with the photos.	04-29-2010
20100142820	3 + 1 LAYER MIXED RASTER CONTENT (MRC) IMAGES HAVING A BLACK TEXT LAYER - A method, system and data structure for providing a 3+1 layer MRC image, including a black text layer. The black text layer includes pixel data corresponding to black text in an image and may be assigned a predetermined value for the color of black. According to one or more embodiments, using thresholding processing along with various morphological operations, the black text layer may be generated.	06-10-2010
20100166309	System And Methods For Creation And Use Of A Mixed Media Environment - A Mixed Media Reality (MMR) system and associated techniques are disclosed. The MMR system provides mechanisms for forming a mixed media document that includes media of at least two types (e.g., printed paper as a first medium and digital content and/or web link as a second medium). In one particular embodiment, the MMR system includes an MMR user, a MMR computer, a user printer that produces a printed document, a networked media server, an office portal, a service provider server, an electronic display that is electrically connected to a set-top box, a document scanner, a network, a capture device, a cellular infrastructure, wireless fidelity (Wi-Fi) technology, Bluetooth® technology, infrared (IR) technology, wired technology, and a geo location mechanism. The MMR system of the present invention provides mechanisms for forming a mixed media document that includes media of at least two types, such as printed paper as a first medium and a digital photograph, digital movie, digital audio file, or web link as a second medium. Furthermore, the MMR system of the present invention facilitates business methods that take advantage of the combination of a portable electronic device, such as a cellular camera phone, and a paper document.	07-01-2010
20100195909	SYSTEM AND METHOD FOR EXTRACTING INFORMATION FROM TEXT USING TEXT ANNOTATION AND FACT EXTRACTION - A fact extraction tool set (“FEX”) finds and extracts targeted pieces of information from text using linguistic and pattern matching technologies, and in particular, text annotation and fact extraction. Text annotation tools break a text, such as a document, into its base tokens and annotate those tokens or patterns of tokens with orthographic, syntactic, semantic, pragmatic and other attributes. A user-defined “Annotation Configuration” controls which annotation tools are used in a given application. XML is used as the basis for representing the annotated text. A tag uncrossing tool resolves conflicting (crossed) annotation boundaries in an annotated text to produce well-formed XML from the results of the individual annotators. The fact extraction tool is a pattern matching language which is used to write scripts that find and match patterns of attributes that correspond to targeted pieces of information in the text, and extract that information.	08-05-2010
20100202690	METHOD FOR RECOGNIZING TEXT FROM IMAGE - Disclosed is a method of recognizing a text from an image. The method includes dividing the image into a predefined number of regions through a clustering technique; setting a certain area of the regions as a background region; identifying the outer peripheral pixel and inner peripheral pixel of each region except for the background region of the divided regions; setting a region identified as having one of its outer peripheral pixel and its inner peripheral pixel corresponding to a pixel of the background region, as a boundary region; and setting a region identified as having any of its outer peripheral pixel and its inner peripheral pixel not corresponding to a pixel of the background region, as a center text region, and excluding the boundary region from a binary-coding object of the text.	08-12-2010
20100239165	Model-Based Dewarping Method And Apparatus - An apparatus and method for processing a captured image and, more particularly, for processing a captured image comprising a document. In one embodiment, an apparatus comprising a camera to capture documents is described. In another embodiment, a method for processing a captured image that includes a document comprises the steps of distinguishing an imaged document from its background, adjusting the captured image to reduce distortions created from use of a camera and properly orienting the document is described.	09-23-2010
20100239166	CHARACTER RECOGNITION DEVICE, IMAGE-READING DEVICE, COMPUTER READABLE MEDIUM, AND CHARACTER RECOGNITION METHOD - A character recognition device includes: an acquiring unit that acquires image data describing pixel values representing colors of pixels constituting an image; a binarizing unit that binarizes the pixel values; an extracting unit that extracts boundaries of colors in the image; a delimiting unit that delimits plural image areas in the image; a specifying unit that specifies, with regard to first image areas arranged according to a predetermined rule, pixels binarized by the binarizing unit, as a subject for character recognition, and specifies, with regard to second image areas not arranged according to the predetermined rule, pixels of areas surrounded by boundaries extracted by the extracting unit, as a subject for character recognition; and a character recognition unit that recognizes characters represented by the pixels specified by the specifying unit as a subject for character recognition.	09-23-2010
20100246958	TABLE GRID DETECTION AND SEPARATION - A technique is described for table grid detection and separation during the analysis and recognition of documents containing table contents. The technique includes the steps of table detection, grid separation, and table cell extraction. The technique is characterized by the steps of detecting the grid lines of a table using, for example, inverse cell detection, separating noise and touching text from the grid lines, and extracting the cell contents for OCR recognition.	09-30-2010
20100246959	APPARATUS AND METHOD FOR GENERATING ADDITIONAL INFORMATION ABOUT MOVING PICTURE CONTENT - An apparatus and method for generating additional information about moving picture content, including: comparing image feature information about each image frame in moving picture content with image feature information about each image frame in web information, searching for an image frame in the moving picture content, the image frame matching the image frame in the web information, determining location information about the found image frame in the moving picture content, and generating additional information by use of the determined location information and the web information.	09-30-2010
20100246960	Image Based Spam Blocking - A fingerprint of an image identified within a received message is generated following analysis of the message. A spam detection engine identifies an image within a message and converts the image into a grey scale image. The spam detection engine analyzes the grey scale image and assigns a score. A fingerprint of the grey scale image is generated based on the score. The fingerprint may also be based on other factors such as the message sender's status (e.g. blacklisted or whitelisted) and other scores and reports generated by the spam detection engine. The fingerprint is then used to filter future incoming messages.	09-30-2010
20100254605	MOVING TEXT DETECTION IN VIDEO - Methods and apparatus for detecting moving text in video comprising receiving consecutive frames from a video stream, extracting a sequence of pixels from the consecutive frames, categorizing the pixels, thinning the pixels, correlating corresponding thinned pixels in the frames, identifying the peaks that are equal to or exceed a threshold, and performing further processing on the peaks to determine if the peaks contain moving text.	10-07-2010
20100254606	METHOD OF RECOGNIZING TEXT INFORMATION FROM A VECTOR/RASTER IMAGE - A method is claimed for processing a vector-raster image file which contains a text image. The method comprises the steps of: fragmenting the image to obtain regions containing non-separable, logically connected fragments of text of the maximum possible size; processing text, vector, and raster objects; discarding excessive information; analyzing each object with the help of all available information. The step of processing text objects includes the steps of: dividing into separate characters and character groups according to supposed locations of blank spaces or other non-indicated symbols, and analyzing and assembling character groups into words and verifying and correcting characters encoding based on recognition of assembled words as raster objects. The step of processing vector objects includes the step of identifying separators, background, and substrates of blocks. The step of processing raster objects includes the steps of: analyzing non-text objects in order to detect text images within them, and/or detecting vector objects other than separators.	10-07-2010
20100260420	VARIABLE GLYPH SYSTEM AND METHOD - An automatic word segmentation and glyph variation system and method are provided herein.	10-14-2010
20110007970	SYSTEM AND METHOD FOR SEGMENTING TEXT LINES IN DOCUMENTS - Methods and systems of the present embodiment provide segmenting of connected components of markings found in document images. Segmenting includes detecting aligned text. From this detected material an aligned text mask is generated and used in processing of the images. The processing includes breaking connected components in the document images into smaller pieces or fragments by detecting and segregating the connected components and fragments thereof likely to belong to aligned text.	01-13-2011
20110026827	VISUALIZATION PROGRAM, VISUALIZATION METHOD AND VISUALIZATION APPARATUS FOR VISUALIZING READING ORDER OF CONTENT - A visualization program, method and apparatus for determining reading order of content in a structured document. The method includes generating, for each of a plurality of elements, a directed segment; storing, in the reading order, the generated directed segments of the elements into a storage device; reading from the storage device; linking together the directed segments for the elements in accordance with the reading order; and displaying the linked directed segments overlaid on the structured document which is displayed on the screen. A computer implemented program and an apparatus for carrying out the above method are also provided.	02-03-2011
20110052062	SYSTEM AND METHOD FOR IDENTIFYING PICTURES IN DOCUMENTS - A system and method to identify pictures in documents. An image representing a page of a document is received. The image is analyzed to identify text objects in the page. A masked image is generated by masking out regions of the image including the text objects in the page. Groups of pixels in the masked image are identified, wherein a respective group of pixels corresponds to at least one picture in the page. When there is one or more groups of pixels, regions for pictures are identified based on the one or more groups of pixels. Metadata tags for the pictures are stored, wherein a respective metadata tag for a respective picture includes information about a respective bounding box for the respective picture.	03-03-2011
20110064310	IMAGE FORMING APPARATUS THAT AUTOMATICALLY CREATES AN INDEX AND A METHOD THEREOF - An image forming apparatus capable of automatically creating an index, and a method for the same. The image forming apparatus includes a scan unit to scan a document, a text/image separation unit to separate the scanned document into a text area and an image area and to separate texts in the text area into symbols, an index determination unit to extract one or more properties of the separated symbols and to compare the extracted symbol properties with one or more index thresholds to determine whether text including the symbols is an index object, and an index page creation unit to create an index page including the text determined as the index object and information about a page including the text that corresponds to the index object. Accordingly, since the index page is automatically created, main contents of each page of the document can be easily selected and/or presented. Also, a search for desired contents in the document is facilitated by a link between the index page and original contents of the pages in the document, thereby improving user convenience.	03-17-2011
20110069885	3+N LAYER MIXED RASTER CONTENT (MRC) IMAGES AND PROCESSING THEREOF - A method for processing image data includes using advantages of both a three-layer MRC model and an N-layer MRC model to create a new 3+N layer MRC model and to generate a 3+N layer MRC image. The method includes providing input image data; segmenting the input image data to generate: (i) a background layer representing the background and the pictorial attributes of the image data, (ii) one or more binary foreground layers, (iii) a selector layer, and (iv) a contone foreground layer representing the foreground attributes of the image data on the background layer; and integrating the background layer, the selector layer, the contone foreground layer, and the one or more binary foreground layers into a data structure having machine-readable information for storage in a memory device. Each binary foreground layer includes one or more pixel clusters representing text pixels of a particular color in the input image data.	03-24-2011
20110123114	CHARACTER RECOGNITION DEVICE AND METHOD AND COMPUTER-READABLE MEDIUM CONTROLLING THE SAME - A character recognition device to recognize characters after preprocessing an input image corrects distortion. The character recognition device includes an image input unit to receive an image acquired by an image device, a character position estimator to calculate a probability value of a position of characters of the image to estimate the position of the characters, an image preprocessor to detect a plurality of edges including the characters from the image and to correct distortion of the edges, and a character recognizer to recognize the characters included in a rectangle formed by the plurality of edges.	05-26-2011
20110158532	APPARATUS FOR DETECTING TEXT RECOGNITION REGION AND METHOD OF RECOGNIZING TEXT - A text recognition region detecting apparatus and a text recognition method are provided. A text recognition region is detected by expanding a region based on a user-specified position that is input through a simple manipulation by a user. A text recognition is performed on the detected text recognition region, thereby relieving a user from having to precisely input the text region and ensuring the user's convenience.	06-30-2011
20110158533	ROBUST AND EFFICIENT IMAGE IDENTIFICATION - Apparatus for matching a query image against a catalog of images, comprises: a feature extraction unit operative for extracting principle features from said query image; a relationship unit operative for establishing relationships between a given principle feature and other features in the image, and adding said relationships as relationship information alongside said principle features; and a first comparison unit operative for comparing principle features and associated relationship information of said query image with principle features and associated relationship information of images of said catalog to find candidate matches.	06-30-2011
20110188753	IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD AND COMPUTER READABLE MEDIUM - An image processing device includes: an acquisition section that acquires subject image information to be formed on a medium; an extraction section that selectively extracts a part of the subject image information corresponding to a portion of an image not formed due to a plurality of holes of a medium if an image relating to the subject image information is formed on the medium perforated with the plurality of holes; and a generation section that generates new subject image information by generating a command for forming the extracted part of the subject image information.	08-04-2011
20110194770	DOCUMENT EDITING APPARATUS AND METHOD - A method for storing a document recognition result is proposed. The method includes selecting a picture area from a document image, storing an image of the selected picture area in an image file format, removing the selected picture area, filling the removed picture area with a surrounding background color, and performing character recognition of a text area.	08-11-2011
20110206281	METHOD FOR FAST UP-SCALING OF COLOR IMAGES AND METHOD FOR INTERPRETATION OF DIGITALLY ACQUIRED DOCUMENTS - Method for up-scaling a color image prior to performing subsequent processing on said color image, comprising the steps of converting the color image into multiple image layers distinguishable from each other and up-scaling at least one of said multiple image layers. The up-scaling is tuned towards the subsequent processing, for example luminance is upscaled at higher quality than chrominance. Further, a method for interpreting information present on digitally acquired documents, comprising the steps of: (i) determining a country; (ii) identifying a list of languages and character sets in use in said country; (iii) performing optical character recognition simultaneously using all languages and character sets of the list; (iv) performing field parsing to identify fields in the digitally acquired document on the basis of international as well as country-specific field recognition rules; (v) storing the recognized information according to the identified fields in a database.	08-25-2011
20110222771	PAGE LAYOUT DETERMINATION OF AN IMAGE UNDERGOING OPTICAL CHARACTER RECOGNITION - A method and system is provided for identifying a page layout of an image that includes textual regions. The textual regions are to undergo optical character recognition (OCR). The system includes an input component that receives an input image that includes words around which bounding boxes have been formed and a text identifying component that groups the words into a plurality of text regions. A reading line component groups words within each of the text regions into reading lines. A text region sorting component sorts the text regions in accordance with their reading order.	09-15-2011
20110229035	IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM - Even when captions of a plurality of objects use an identical anchor expression, the present invention can associate an appropriately explanatory text in a body text as metadata with the objects.	09-22-2011
20110243444	SEGMENTATION OF TEXTUAL LINES IN AN IMAGE THAT INCLUDE WESTERN CHARACTERS AND HIEROGLYPHIC CHARACTERS - An image processing apparatus segments Western and hieroglyphic portions of textual lines. The apparatus includes an input component that receives an input image having at least one textual line. The apparatus also includes an inter-character break identifier component that identifies candidate inter-character breaks along a textual line and an inter-character break classifier component. The inter-character break classifier component classifies each of the candidate inter-character breaks as an actual break, a non-break or an indeterminate break based at least in part on the geometrical properties of each respective candidate inter-character break and the bounding boxes adjacent thereto. A character recognition component recognizes the candidate characters based at least in part on a feature set extracted from each respective candidate character that can be histogram features, Gabor features or any other feature set applicable to character recognition. A Western and hieroglyphic text classifier component finds and classifies textual line segments as Western text segments or hieroglyphic text segments and further passes the recognition results to an output component.	10-06-2011
20120008864	IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND COMPUTER READABLE MEDIUM - An apparatus comprises: unit configured to divide input document data into a body region, a caption region, and an object region; unit configured to acquire text information included in each of the body region and the caption region; unit configured to search the text information in the body region for an anchor term, to extract an anchor term from the text information in the caption region, and to generate a bi-directional link between a portion corresponding to the anchor term in the body region and a portion of the object region to which the caption region is appended; and unit configured to convert the input document data into digital document data in which the portion corresponding to the anchor term in the body region and the portion corresponding to the object region to which the caption region is appended are bi-directionally linked based on the link.	01-12-2012
20120033887	IMAGE PROCESSING APPARATUS, COMPUTER READABLE MEDIUM STORING PROGRAM, AND IMAGE PROCESSING METHOD - An image processing apparatus includes a receiving unit, a path calculation unit, and a separation unit. The receiving unit receives an image including at least a character image. The path calculation unit calculates a separation path in the image received by the receiving unit. The separation path is a line segment that separates a character image from the image. The separation unit separates the image received by the receiving unit into plural character images using a separation path calculated by the path calculation unit. The path calculation unit calculates a separation path within a predetermined range including a portion of a character image in the image so that a cumulative value of luminance values of pixels along the separation path satisfies a predetermined condition.	02-09-2012
20120039536	OPTICAL CHARACTER RECOGNITION WITH TWO-PASS ZONING - An image of a paginated document is zoned to identify text zones. First-pass character recognition is performed on the text zones to generate textual content corresponding to the paginated document. The image of the paginated document is re-zoned based on the textual content to identify one or more new text zones. Second-pass character recognition is performed on at least the new text zones to generate updated textual content corresponding to the paginated document.	02-16-2012
20120045129	Document image processing method and apparatus - A method for processing a document image includes: performing horizontal and vertical text line extraction on the document image; providing an overlapping matrix, a value of an element of the overlapping matrix indicating an overlapping relation between horizontal and vertical text lines; merging the overlapping matrix in the vertical and horizontal direction; determining one or more text overlapping regions in the document image, based on the values of the elements of the merged overlapping matrix; counting the total number of strokes or pixel points in the horizontal and vertical text lines, respectively, within one of the one or more text overlapping regions; and determining that an orientation of the text overlapping region is horizontal if the total number of strokes or pixel points in the horizontal text lines is larger than that in the vertical text lines, otherwise, determining that the orientation is vertical.	02-23-2012
20120076413	Methods and Systems for Automatic Extraction and Retrieval of Auxiliary Document Content - Aspects of the present invention are related to systems and methods for automatically extracting, from a document image, references to relevant external content and automatically retrieving the external content associated with the references.	03-29-2012
20120076414	External Image Based Summarization Techniques - Techniques involve visually summarizing documents (e.g., search results, a collection of documents, etc.) using images which are visually representative of the documents that the images represent. The images representing the documents may be external images obtained from sources other than the documents. The external images may be obtained from the sources other than the documents by performing a separate image based search using key phrases from the documents rather than extracting the images directly from within the documents themselves. Alternatively, an algorithm may be used to determine an image type, which may be chosen from a selection of external images, thumbnail images, or internal images taken directly from the collection of documents, that is suited to represent each document in the collection of documents. A snippet of the documents may be displayed along with the images which visually represent each of the documents.	03-29-2012
20120114241	USING EXTRACTED IMAGE TEXT - Methods, systems, and apparatus including computer program products for using extracted image text are provided. In one implementation, a computer-implemented method is provided. The method includes receiving an input of one or more image search terms and identifying keywords from the received one or more image search terms. The method also includes searching a collection of keywords including keywords extracted from image text, retrieving an image associated with extracted image text corresponding to one or more of the image search terms, and presenting the image.	05-10-2012
20120114242	Method and System for Identifying Addressing Data Within a Television Presentation - Characters represented within a frame of a television presentation are identified. A pattern formed by a subset of the characters is identified if the pattern is indicative of an addressing datum. A provision is made for a selection of characters that form the pattern indicative of the addressing datum. In one embodiment, a web page is displayed upon a selection of characters that form a pattern indicative of a uniform resource locator for the web page.	05-10-2012
20120134588	RECTIFICATION OF CHARACTERS AND TEXT AS TRANSFORM INVARIANT LOW-RANK TEXTURES - A “Text Rectifier” provides various techniques for processing selected regions of an image containing text or characters by treating those images as matrices of low-rank textures and using a rank minimization technique that recovers and removes image deformations (e.g., affine and projective transforms as well as general classes of nonlinear transforms) while rectifying the text or characters in the image region. Once distortions have been removed and the text or characters rectified, the resulting text is made available for a variety of uses or further processing such as optical character recognition (OCR). In various embodiments, binarization and/or inversion techniques are applied to the selected image regions during the rank minimization process to both improve text rectification and to present the resulting images of text to an OCR engine in a form that enhances the accuracy of the OCR results.	05-31-2012
20120163718	Removing character from text in non-image form where location of character in image of text falls outside of valid content boundary - Data representing an image of text is received, as is data representing the text in non-image form. A valid content boundary within the image of the text is determined. For each character within the text in the non-image form, a location of the character within the image of the text is determined. Where the location of the character within the image of the text falls outside the valid content boundary, the character is removed from the data representing the text in the non-image form.	06-28-2012
20120177290	AUTOMATIC TABLE LOCATION IN DOCUMENTS - A method for locating tables in documents includes defining a plurality of tiles for a document, for each tile, determining a horizontal profile and a vertical profile, determining the location of lines by means of gradients of the horizontal profiles and the vertical profiles, selecting from the lines, the lines that are persistent, determining a rectangle in at least one corner of the document based on the persistent lines, and applying heuristics in order to accept or reject a determined rectangle as a table of the document. An apparatus for automatically locating a table in a document applies the method for locating tables in documents.	07-12-2012
20120189202	IMAGE PROCESSING APPARATUS, IMAGE PROCESSING SYSTEM AND IMAGE PROCESSING METHOD - A handwritten area is separated from image data of printed material in which handwriting has been inserted, and the separated handwritten area is identified as an enclosing line or a class symbol. An image area enclosed within the handwritten area identified as the enclosing line is extracted and acquired as an extracted image. The class symbol is correlated to an enclosing line drawn nearest the class symbol, and the extracted images are classified into groups according to the image areas within the enclosing line correlated to the type of class symbols. The grouped images are organized as listed data.	07-26-2012
20120195505	TECHNIQUES INCLUDING URL RECOGNITION AND APPLICATIONS - Methods and systems are provided that include obtaining a digital image from a digital photograph, such as may be taken by a digital camera or a camera phone. The digital image includes, for example, a URI or URL, which may be contained within a visible frame. A character recognition technique, such as an optical character recognition technique, may be used to recognize the URI or URL from the digital image. The URI or URL may be used to access a corresponding Web page. The character recognition technique may be applied on the digital camera or cell phone itself, or remotely.	08-02-2012
20120201457	FINDING REPEATED STRUCTURE FOR DATA EXTRACTION FROM DOCUMENT IMAGES - Methods and system employing the same for finding repeated structure for data extraction from document images are provided. A reference record and one or more reference fields thereof are identified from a document image. One or more candidate fields are generated for each of the reference fields. One or more best candidate records from the candidate fields are selected using a probabilistic model and an optimal record set is determined from the best candidate records.	08-09-2012
20120201458	SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR DETERMINING WHETHER TEXT WITHIN AN IMAGE INCLUDES UNWANTED DATA, UTILIZING A MATRIX - A system, method, and computer program product are provided for determining whether text within an image includes unwanted data, utilizing a matrix. In operation, a matrix corresponding to an image is generated. Additionally, text within the image is identified utilizing the matrix. Furthermore, it is determined whether the text includes unwanted data.	08-09-2012
20120207390	SYSTEMS AND METHODS FOR REPLACING NON-IMAGE TEXT - Systems and methods for replacing non-image text are provided. One method for replacing non-image text includes padding a first data representing an image of text to create an image segment. The method includes replacing a second data representing non-image text with the image segment.	08-16-2012
20120207391	INTERACTIVE PAPER SYSTEM - A printer, scanner device and methods for using same are described herein. A printer device may include a dedicated input that, when actuated, generates and sends a request to a computer for known data or a predetermined print job, e.g., schedule information from a personal information management (PIM) application. A scanner device may include another dedicated input that, when actuated, automatically scans a document fed to the device by the user and sends the scanned image to IM (or other) software on a computer, bypassing the need to manipulate the scanned image using scanner software. The device may be used with printed metapaper, which includes a barcode or other indicia identifying the metapaper and corresponds to a stored template image of the metapaper. When the metapaper is rescanned, the scan can be compared to the stored template information to identify changes and synchronize the changes with the IM software.	08-16-2012
20120213441	RECEIPTS SCANNER AND FINANCIAL ORGANIZER - The system contains a scanner, an apparatus for scanning receipts into a computer and a unique software program which automatically processes, organizes and saves expense information that can be viewed in various formats, namely, tabular statements, pie-charts, etc. The scanner, which accommodates paper of differing sizes, is used to input bills, receipts, bank statements, etc. The scanner is usually connected to a computer through a Universal Serial Bus or a parallel port for easy installation. The software program creates a text file of the scanned data by inclusion of sorting, categories, etc., and automatically saves the information in Quicken Interchange Format, allowing it to be imported into any financial management software for further processing. Each receipt is treated as an individual transaction. Multiple items in the receipt are used to create a “split” transaction with proper customizable categories added. Further, the software also allows for record keeping, budgeting and budget balancing.	08-23-2012
20120243787	SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR DOCUMENT IMAGE ANALYSIS USING FEATURE EXTRACTION FUNCTIONS - Methods, systems and computer program products to improve the efficiency and computational speed of an image enhancement process. In an embodiment, information that is generated as interim results during feature extraction may be used in a segmentation and classification process and in a content adaptive enhancement process. In particular, a cleaner image that is generated during a noise removal phase of feature extraction may be used in a content adaptive enhancement process. This saves the content adaptive enhancement process from having to generate a cleaner image on its own. In addition, low-level segmentation information that is generated during a neighborhood analysis and cleanup phase of feature extraction may be used in a segmentation and classification process. This saves the segmentation and classification process from having to generate low-level segmentation information on its own.	09-27-2012
20120269435	Contact Text Detection in Scanned Images - A device and method for identifying text pixels that are erroneously classified as non-text pixels include accessing an image region containing a non-text component. For each non-text component within the image region, it is determined whether there are any long line structures within the bounding box defined by the non-text component. If the long line structures are greater than a predefined percentage of the span of the dimension of the bounding box parallel to the line structure, then the line structure is removed. Any remaining non-text pixels within the bounding box are reclassified as text pixels.	10-25-2012
20120314953	INFORMATION PROCESSING APPARATUS, PROGRAM, AND INFORMATION PROCESSING METHOD - There is provided an information processing apparatus including a selecting unit that selects figures from a candidate figure group based on recognition values obtained through character recognition with respect to an input image, and a display control unit that performs control to display the figures selected by the selecting unit.	12-13-2012
20120321188	IDENTIFYING INFORMATION RELATED TO A PARTICULAR ENTITY FROM ELECTRONIC SOURCES, USING DIMENSIONAL REDUCTION AND QUANTUM CLUSTERING - Presented are systems and methods for identifying information about a particular entity, including acquiring electronic documents having unstructured text that are selected based on one or more search terms from a plurality of terms related to the particular entity. The acquired documents are tokenized to form a data matrix, and a plurality of eigenvectors is then calculated using the data matrix and the transpose of the data matrix. The variance is then acquired for determining the amount of intra-clustering between the documents, and the acquired documents are then clustered using some of the eigenvectors and the variance.	12-20-2012
20130004076	SYSTEM AND METHOD FOR RECOGNIZING TEXT INFORMATION IN OBJECT - A method for recognizing a text block in an object is disclosed. The text block includes a set of characters. A plurality of images of the object are captured and received. The object in the received images is then identified by extracting a pattern in one of the object images and comparing the extracted pattern with predetermined patterns. Further, a boundary of the object in each of the object images is detected and verified based on predetermined size information of the identified object. Text blocks in the object images are identified based on predetermined location information of the identified object. Interim sets of characters in the identified text blocks are generated based on format information of the identified object. Based on the interim sets of characters, a set of characters in the text block in the object is determined.	01-03-2013
20130058575	TEXT DETECTION USING IMAGE REGIONS - A method includes receiving an indication of a set of image regions identified in image data. The method further includes selecting image regions from the set of image regions for text extraction at least partially based on image region stability.	03-07-2013
20130071027	VISUALIZATION PROGRAM, VISUALIZATION METHOD AND VISUALIZATION APPARATUS FOR VISUALIZING READING ORDER OF CONTENT - A visualization program, method and apparatus for determining reading order of content in a structured document. The method includes generating, for each of a plurality of elements, a directed segment; storing, in the reading order, the generated directed segments of the elements into a storage device; reading from the storage device; linking together the directed segments for the elements in accordance with the reading order; and displaying the linked directed segments overlaid on the structured document which is displayed on the screen. A computer implemented program and an apparatus for carrying out the above method are also provided.	03-21-2013
20130077863	SYSTEM AND METHOD FOR CAPTURING RELEVANT INFORMATION FROM A PRINTED DOCUMENT - A city directory, having a listing of names and associated information of residents in a city (or similar location), is digitized. Zones of text having information not useful to users of the digitized directory are removed, and lines of information corresponding to residents are reconstructed, to make the digitized directory more easily accessed and reviewed.	03-28-2013
20130108158	TECHNIQUES INCLUDING URL RECOGNITION AND APPLICATIONS	05-02-2013
20130114899	CONTENT DESCRIPTOR - An apparatus, method, system and computer-readable medium are provided for generating one or more descriptors that may be associated with content. A teaser for the content may be identified based on contextual similarity between words and/or phrases in the segment and one or more other segments, such as a previous segment. An optical character recognition (OCR) technique may be applied to the content, such as banners or graphics associated with the content in order to generate or identify OCR'd text or characters. The text/characters may serve as a candidate descriptor(s). One or more strings of characters or words may be compared with (pre-assigned) tags associated with the content, and if it is determined that the one or more strings or words match the tags within a threshold, the one or more strings or words may serve as a candidate descriptor(s). One or more candidate descriptor identification techniques may be combined.	05-09-2013
20130121579	SOFTWARE FOR TEXT AND IMAGE EDIT RECOGNITION FOR EDITING OF IMAGES THAT CONTAIN TEXT - Software for editing text and images enables a user to select a portion of an image and read the text on the selected image portion via an OCR function. The software enables the user to apply a mask containing the originally read text, that allows the user to type or paste new text to replace the previously read text in the selected image portion. The software also enables a user to edit images by automatically recognizing the borders of fields and/or columns and the background color. As a result, the user can easily modify an image by applying a mask to an image, wherein the mask has new data such as different text, the identical background color or a different background color or different layout, etc. and which may be placed exactly on the recognized borders of the original image.	05-16-2013
20130163871	SYSTEM AND METHOD FOR SEGMENTING IMAGE DATA TO IDENTIFY A CHARACTER-OF-INTEREST - A system and method of processing an acquired image to identify characters-of-interest in the acquired image. The method includes obtaining image data of a surface of an object. The image data includes a plurality of image pixels having corresponding light intensity signals. The light intensity signals are based on whether the corresponding image pixel correlates to a morphological change in the surface of the object. The method also includes determining a line section of the image data. The line section includes one of the character lines and has character image portions. The method also includes analyzing the light intensity signals of the image pixels in the character image portions to determine a common height of the characters-of-interest. The method also includes removing extraneous areas of the character image portions based on the common height of the characters-of-interest to provide trimmed image portions.	06-27-2013
20130163872	Method, Server, Reading Terminal and System for Processing Electronic Document - Systems and methods for processing an electronic document are provided. The method comprises segmenting the electronic document based on content of the electronic document and structuring the segmented electronic document into a format for displaying on a reading terminal based on a request received from the reading terminal.	06-27-2013
20130163873	Detecting Separator Lines in a Web Page - A system and method of detecting separator lines in a web page may include determining coordinates of visible web elements on a web page, generating an edge image of the web page based on the coordinates of the web elements, filtering edges belonging to non-separator line elements within the edge image, detecting horizontal lines within the edge image, detecting vertical lines within the edge image, and filtering short lines within the edge image. A system for detecting separator lines in a web page may include a memory device, and a processor communicatively coupled to the memory, in which the processor determines coordinates of visible web elements on a web page, generates an edge image of the web page based on the coordinates of the web elements, filters edges belonging to non-separator line elements within the edge image, detects horizontal lines within the edge image, detects vertical lines within the edge image, and filters short lines within the edge image.	06-27-2013
20130230246	Expense Report System With Receipt Image Processing - A system and method for capturing image data is disclosed. A receipt image processing service selects from a repository a template that guides data capture of receipt data from a receipt image and presents the template to a user on an image capture device. A user previews the receipt image and the selected template. If the user decides that the template does not correctly indicate locations of data areas for data items in the receipt image, then the user either updates an existing template or creates a new template that correctly indicates the location of selected data areas in the receipt image. The selected template, the updated template, or the new template is then used to extract receipt data from the receipt image. The receipt data and receipt image data are then provided to the expense report system.	09-05-2013
20130236100	MONITORING USAGE OF A COMPUTER BY PERFORMING CHARACTER RECOGNITION ON SCREEN CAPTURE IMAGES - Compositions of matter comprising computer readable media storing a computer program comprising instructions that, when executed, cause a computer to perform operations related to the monitoring of usage of a computer. In various aspects, the operations may include the steps of associating an identified user with a computer, and capturing an image of a monitored region of a computer screen of the computer at a specified time. The operations may include the steps of extracting image text from the image, determining image text content of the image text, and capturing a subsequent image of the monitored region of the computer screen of the computer at a subsequent time subsequent to the specified time, in various aspects. A time difference between the specified time and the subsequent time is dependent upon image text content of the image text, in various aspects. The identified user does not control the associating step, the capturing step, the extracting step, the determining step, and the capturing a subsequent image step, in various aspects.	09-12-2013
20130243324	METHOD AND SYSTEM FOR CHARACTER RECOGNITION - Character recognition is described. In one embodiment, it may use matched sequences rather than character shape to determine a computer-legible result.	09-19-2013
20130251261	Method And Apparatus For Image Data Compression - A method and corresponding apparatus are for compressing image data of an image. The method includes splitting the image data into regions, including a first region and a second region. The method further includes determining a first compression scheme to be used in encoding the image data of the first region and a different second compression scheme to be used in encoding the image data of the second region. The method further includes applying the first compression scheme to the image data of the first region and the second compression scheme to the image data of the second region. For each region, the determining and the applying are iteratively performed to yield first resulting compressed region data for the first region and second resulting compressed region data for the second region.	09-26-2013
20130251262	TEXT IMAGE TRIMMING METHOD - A text image trimming method comprising the following steps: step 1, obtaining text image data; step 2, using a straight line detection method to detect the straight lines of the text image, obtaining edges of a trimmed quadrangle; step 3, detecting text on the image data, obtaining the coordinates of the boundary points of a text region; and step 4, obtaining the final trimming result according to the results of steps 2 and 3. The method can automatically detect the edges of the text region and utilize the detected text region to verify and remove unrelated redundant information, thereby allowing the user to see only the portion containing the text region useful to the user when viewing image data.	09-26-2013
20130259377	CONVERSION OF A DOCUMENT OF CAPTURED IMAGES INTO A FORMAT FOR OPTIMIZED DISPLAY ON A MOBILE DEVICE - Systems may be provided for recording a document with a camera-based mobile radio device and for converting textual information in the document into a format for suitable presentation on the mobile device. A document may be recorded by the mobile device in an image. A layout structure may be recognized with a text block in the image. Character text in the text block may be recognized by OCR. An order of the text blocks may be determined by taking into account the layout structure. A suitable format for presenting the character texts on the mobile device's display may be selected. The format may be adapted to a width of the display so that during reading of the character texts on the display, substantially only vertical scrolling is necessary. A file may be generated and displayed in the format with the character texts in the determined order of the text blocks.	10-03-2013
20130272612	METHOD OF PROVIDING ONLINE INFORMATION USING IMAGE INFORMATION - The present invention provides a method of providing online information using image information, including separating each of a target image, received from a user terminal, and an original image, received from an information provider apparatus, into a text region and a graphic region; selecting an important text region from the text region; extracting features from the text region, the graphic region, and the important text region, respectively; searching for the original image corresponding to the target image using the features of the text region, the graphic region, and the important text region; and searching for supplementary information related to the retrieved original image and providing the retrieved supplementary information.	10-17-2013
20130294693	NOISE REMOVAL FROM IMAGES CONTAINING TEXT - The noise in an image having text is removed by convolving a shaped kernel centered on a pixel for each pixel in the image. The shaped kernel has a shape configured to identify pixels that are not part of the text. For example, the shaped kernel may be shaped with zeros in a center of the kernel to identify pixels that are not part of the text. A value for the pixel is set to erase the pixel when the resulting convolution value for the pixel is less than a threshold. The process may be repeated multiple times for differently shaped kernels, including kernels of different sizes and different configurations, such as having values greater than one in at least one of a row, column, and diagonal.	11-07-2013
20130330003	ADAPTIVE THRESHOLDING FOR IMAGE RECOGNITION - Various approaches for providing textual information to an application, system, or service are disclosed. In particular, various embodiments enable a user to capture an image with a camera of a portable computing device. The computing device is capable of taking the image and processing it to recognize, identify, and/or isolate the text in order to forward the text to an application or function. The application or function can then utilize the text to perform an action in substantially real-time. The text may include an email, phone number, URL, an address, and the like and the application or function may be dialing the phone number, navigating to the URL, opening an address book to save contact information, displaying a map to show the address, and so on. Adaptive thresholding can be used to account for variations across an image, in order to improve the accuracy and efficiency of text recognition processes.	12-12-2013
20130330004	FINDING TEXT IN NATURAL SCENES - As set forth herein, systems and methods facilitate providing an efficient edge-detection and closed-contour based approach for finding text in natural scenes such as photographic images, digital, and/or electronic images, and the like. Edge information (e.g., edges of structures or objects in the images) is obtained via an edge detection technique. Edges from text characters form closed contours even in the presence of reasonable levels of noise. Closed contour linking and candidate text line formation are two additional features of the described approach. A candidate text line classifier is applied to further screen out false-positive text identifications. Candidate text regions for placement of text in the natural scene of the electronic image are highlighted and presented to a user.	12-12-2013
20140023272	IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD AND STORAGE MEDIUM - Character code data and vector drawing data are both listed and provided in a re-editable manner. Electronic data is generated in which information obtained by vectorizing character areas in an image and information obtained by recognizing characters in the image are stored in respective storage locations. Because the character code data and the vector drawing data generated from the input image are both presented by a display and edit program, a user can immediately utilize both sets of data in the electronic data generated in this manner.	01-23-2014
20140037210	SYMBOL COMPRESSION USING CONDITIONAL ENTROPY ESTIMATION - The present disclosure includes a system and method for symbol compression using conditional entropy estimation. One method for symbol compression using conditional entropy estimation includes approximating a quantity of symbol encoding bits for a number of symbols using a conditional entropy estimation. Dictionary entries are generated from the number of symbols so as to minimize a total bit-stream quantity. The total bit-stream quantity includes at least the approximated quantity of symbol encoding bits and a quantity of dictionary entries encoding bits. The symbols are encoded using the dictionary entries as a reference.	02-06-2014
20140093170	DOCUMENT PROCESSING DEVICE, IMAGE PROCESSING APPARATUS, DOCUMENT PROCESSING METHOD AND COMPUTER PROGRAM PRODUCT - A document processing device includes: a character information extracting unit that extracts character information from document image data; a feature character string extracting unit that extracts, as document name candidate character strings, a given number of character strings indicative of features of the document image data from the character information extracted by the character information extracting unit; an output condition acquiring unit that, when the document image data is processed by one of multiple processing methods involving an output of a document name of the document image data, acquires an output condition required for the output of the document name of the document image data; and a document name generating unit that generates the document name complying with a character condition corresponding to the output condition from the document name candidate character strings.	04-03-2014
20140140621	IMAGE RECTIFICATION USING AN ORIENTATION VECTOR FIELD - This invention is a method for rectifying an input digital image including warped textual information. The method includes analyzing the input digital image to determine local orientations for a plurality of local image regions and determining an orientation vector field by interpolating between the determined local orientations for a lattice of positions. A set of streamlines is determined responsive to the orientation vector field. A global deformation function is formed by interpolating between the streamlines and is used to form a rectified image.	05-22-2014
20140161353	TABLE GRID DETECTION AND SEPARATION - A technique is described for table grid detection and separation during the analysis and recognition of documents containing table contents. The technique includes the steps of table detection, grid separation, and table cell extraction. The technique is characterized by the steps of detecting the grid lines of a table using, for example, inverse cell detection, separating noise and touching text from the grid lines, and extracting the cell contents for OCR recognition.	06-12-2014
20140212038	DETECTION OF NUMBERED CAPTIONS - A method of detection of numbered captions in a document includes receiving a document including a sequence of document pages and identifying illustrations on pages of the document. For each identified illustration, associated text is identified. An imitation page is generated for each of the identified illustrations, each imitation page comprising a single illustration and its associated text. For a sequence of the imitation pages, a sequence of terms is identified. Each term is derived from a text fragment of the associated text of a respective imitation page. The terms of a sequence comply with at least one predefined numbering scheme, which defines a form and an incremental state of the terms in a sequence. The terms of the identified sequence of terms are construed as being at least a part of a numbered caption for a respective illustration in the document.	07-31-2014
20140219561	CHARACTER SEGMENTATION DEVICE AND CHARACTER SEGMENTATION METHOD - A character segmentation section for segmenting characters of a character line may include a minimum pixel-value curve creating section configured to extract a smallest pixel value in pixels composing a pixel line arranged in a direction orthogonal to a character line direction in said multi-level image data and create a minimum pixel-value curve, a character partitioning position determining section configured to determine partitioning positions of said characters based on said minimum pixel-value curve, a binarization processing section configured to detect a minimum pixel value indicating said linear drawing from said minimum pixel-value curve, acquire a binarization threshold based on said minimum pixel value, and binarize said multi-level image data using said binarization threshold, and a character segmentation implementing section configured to extract the image data of each character.	08-07-2014
20140241631	SYSTEMS AND METHODS FOR TAX DATA CAPTURE AND USE - A computer-implemented method of acquiring tax data for use in a tax preparation application includes acquiring an image of at least one document containing tax data therein with an imaging device. A computer extracts one or more features from the acquired image of the at least one document and compares the extracted one or more features to a database containing a plurality of different tax forms. The database may include a textual database and/or geometric database. The computer identifies a tax form corresponding to the at least one document from the plurality of different tax forms based at least in part on a confidence level associated with the comparison of the extracted one or more features to the database. At least a portion of the tax data from the acquired image is transferred into corresponding fields of the tax preparation application.	08-28-2014
20140247988	SYSTEM AND METHOD FOR CAPTURING RELEVANT INFORMATION FROM A PRINTED DOCUMENT - A city directory, having a listing of names and associated information of residents in a city (or similar location), is digitized. Zones of text having information not useful to users of the digitized directory are removed, and lines of information corresponding to residents are reconstructed, to make the digitized directory more easily accessed and reviewed.	09-04-2014
20140247989	MONITORING THE EMOTIONAL STATE OF A COMPUTER USER BY ANALYZING SCREEN CAPTURE IMAGES - In various aspects, methods disclosed herein may include the step of associating an identified user with a computer, and the step of capturing an image of a monitored region of a computer screen of the computer at a specified time. The methods may include the step of extracting image text from the image, the step of determining an emotional state of the identified user using image text content of the image text, and the step of capturing a subsequent image of the monitored region of the computer screen of the computer at a subsequent time after the specified time, wherein a time difference between the specified time and the subsequent time is dependent upon the emotional state of the user, in various aspects. The associating step, the capturing step, the extracting step, the determining step, and the capturing a subsequent image step are not controlled by the identified user, in various aspects. This Abstract is presented to meet requirements of 37 C.F.R. §1.72(b) only. This Abstract is not intended to identify key elements of the methods, systems, and compositions of matter disclosed herein or to delineate the scope thereof.	09-04-2014
20140307966	METHOD OF MANAGING IMAGE AND ELECTRONIC DEVICE THEREOF - A system processes an image in an electronic device, by determining whether a text character is included in an image and extracting the determined text character from the image. The extracted text character is stored in association with the image.	10-16-2014
20140307967	STRAIGHTENING OUT DISTORTED PERSPECTIVE ON IMAGES - Methods for correcting distortions in an image including text, or an image of a page that includes text, are disclosed. The methods include identifying reliable and substantially straight lines from elements in the image. Vanishing points are determined from the lines. Parameters associated with a rectangle are determined. A coordinate conversion is performed.	10-16-2014
20140314319	METHOD AND SYSTEM USING TWO PARALLEL OPTICAL CHARACTER RECOGNITION PROCESSES - A method and a system for providing a text-based representation of a portion of a working area to a user are provided. The method includes acquiring an image of the entire working area and performing a fast OCR process on at least a region of interest of the image corresponding to the portion of the working area, thereby rapidly obtaining an initial machine-encoded representation of the portion of the working area and immediately presenting it to the user as the text-based representation. In parallel with the fast OCR process, a high-precision OCR process is performed on at least the region of interest of the image, thereby obtaining a high-precision machine-encoded representation of the portion of the working area. Upon completing the high-precision OCR process, the high-precision machine-encoded representation of the portion of the working area is presented to the user as the text-based representation, in replacement of the initial machine-encoded representation.	10-23-2014
20140355883	METHOD AND SYSTEM FOR RECOGNIZING INFORMATION - Embodiments of the present application relate to a method for recognizing information, a system for recognizing information, and a computer program product for recognizing information. A method for recognizing information is provided. The method includes locating a card zone for each frame within a card image frame sequence comprising a plurality of frames, locating an information zone within each card zone, dividing each information zone into at least one character zone, de-blurring a character zone corresponding to a same region across all the frames in the card image frame sequence, and recognizing character string information based on the de-blurred character zone.	12-04-2014
20150023597	Feature Sensitive Captioning of Media Content - There are provided methods and systems for use in performing feature sensitive captioning of media content. In one implementation, such a method includes detecting an aesthetically determinative feature of a media content unit selected by a user, and determining a captioning aesthetic for a caption of the media content unit based at least in part on the aesthetically determinative feature. The captioning aesthetic may include a background aesthetic and a text aesthetic. The captioning aesthetic may be utilized by a feature sensitive captioning application to produce a feature sensitive caption for the media content unit.	01-22-2015
20150055866	OPTICAL CHARACTER RECOGNITION BY ITERATIVE RE-SEGMENTATION OF TEXT IMAGES USING HIGH-LEVEL CUES - Disclosed techniques include receiving an electronic image containing depictions of characters, segmenting at least some of the depictions of characters using a first segmentation technique to produce a first segmented portion, and performing a first character recognition on the first segmented portion to determine a first sequence of characters. The techniques also include determining, based on the performing the first character recognition, that the first sequence of characters does not match the depictions of characters. The techniques further include segmenting at least some of the depictions of characters using a second segmentation technique, based on the determining, to produce a second segmented portion, and performing a second character recognition on at least a portion of the second segmented portion to produce a second sequence of characters. The techniques also include outputting a third sequence of characters based on at least part of the second sequence of characters.	02-26-2015
20150063698	Assisted OCR - A method including determining a position of each glyph in an image of a text document, identifying word boundaries in the document thereby implying the existence of a first plurality of words, preparing a first array of word lengths based on the first plurality of words, preparing a second array of word lengths based on a second plurality of words of a text file including a certain text, comparing at least part of the first array to at least part of the second array to find a best alignment between the first and second arrays, and deriving a layout of at least part of the certain text as arranged in the image of the text document at least based on the best alignment and the position of at least some of the glyphs in the image. Related apparatus and methods are also described.	03-05-2015
20150063699	LINE SEGMENTATION METHOD APPLICABLE TO DOCUMENT IMAGES CONTAINING HANDWRITING AND PRINTED TEXT CHARACTERS OR SKEWED TEXT LINES - A text line segmentation method for a document image containing printed text and handwriting, or document image containing skewed lines or printed text. Connected components (CCs) are obtained for the document, and their bounding boxes and centroids are calculated. The CCs are categorized into three categories based on bounding box sizes: small objects, regular text objects, and large objects involving handwriting. The centroids of regular text objects are used in a cluster analysis to find the vertical centers of the N text lines. Then, each CC is classified into one of the N lines based on the vertical distance between its centroid and the vertical centers of text lines, and copied into a corresponding object board. Extra spaces are removed from the object boards to obtain the line segments. The large object involving handwriting will be classified into one of the lines but absent from other lines.	03-05-2015
20150078664	DETECTING TEXT USING STROKE WIDTH BASED TEXT DETECTION - Detecting text using stroke width based text detection. As a part of the text detection, a representation of an image is generated that includes pixels that are associated with the stroke widths of components of the image. Connected components of the image are identified by filtering out portions of the pixels using metrics related to stroke width. Text is detected in the image based on the identified connected components.	03-19-2015
20150317531	ELECTRONIC DOCUMENT GENERATION SYSTEM, IMAGE FORMING APPARATUS AND PROGRAM - An electronic document generation system includes: an image forming apparatus configured to generate a scanned image of an original document; and an external terminal configured to receive image data of the scanned image from the image forming apparatus, and generate an electronic document based on the scanned image, wherein the image forming apparatus includes: a divided data generation unit; a determination unit; and a communication unit, of the plurality of divided image data, the communication unit transmits, to the external terminal, divided image data that is determined to be the processing target data at an earlier point in time, and transmits, to the external terminal, divided image data that is determined not to be the processing target data after the divided image data that is determined to be the processing target data are transmitted, and the external terminal includes: an obtaining unit; and a document generation unit.	11-05-2015
20150332120	DETECTING AND PROCESSING SMALL TEXT IN DIGITAL MEDIA - A method for recognizing small-font sized text including receiving digital media of a natural scene, the digital media having at least one frame that includes the small-font sized text; generating input maps having values that reflect local properties of corresponding regions in the at least one frame; and detecting regions of the at least one frame that contain the small-font sized text by integrating information from the input maps. The integrated information may include information located between border lines having active pixels therebetween and gaps having a high ratio of non-ink pixels located below a bottom border line and above a top border line in relation to a dominant direction of the text. The active pixels may be pixels having dense changes in character stroke directions.	11-19-2015
20150356740	SYSTEM FOR AUTOMATED TEXT AND HALFTONE SEGMENTATION - A method and system for segmenting text from non-text portions of a digital image using the size, solidity, and run length characteristics of connected components within the image data. For a connected component comprising a rectangular group of pixels enclosing a set of connected pixels having the same binary state, the size characteristic may be based on a ratio of height to width of the connected component and the total number of pixels within the connected component, the solidity characteristic may be based on a ratio of pixels within a convex hull of the set of connected pixels to a total number of pixels within the connected component, and the run length characteristic may be based on a number of transitions within the connected component.	12-10-2015
20150370889	IMAGE PROCESSING SYSTEM, IMAGE PROCESSING METHOD, AND IMAGE PROCESSING PROGRAM - An image processing system according to one embodiment includes a feature quantity calculation unit, a classification unit, a score calculation unit, and an output unit. The feature quantity calculation unit calculates a feature quantity for each of a plurality of candidate regions extracted as a candidate for a text region from a plurality of original sample images. The plurality of original sample images include one or more text images containing a text region and one or more non-text images not containing a text region. The classification unit classifies the plurality of candidate regions into a plurality of categories based on the feature quantity. The score calculation unit calculates, for each category, a score indicating a frequency of appearance of the candidate region to which an annotation indicating extraction from the text image is added. The output unit outputs the score of each category as category information.	12-24-2015
20150371085	METHOD AND SYSTEM FOR IDENTIFYING BOOKS ON A BOOKSHELF - A method and system for identifying books located on a bookshelf. Photographs of the bookshelf are captured and processed to identify individual books. Processing involves segmenting the photograph into individual book spines and extracting and analyzing features of the book spines. Analysis may include database matching and/or optical character recognition.	12-24-2015
20150371399	Character Detection Apparatus and Method - According to one embodiment, a character detection apparatus includes a feature extractor, a determiner and an integrator. The feature extractor extracts a feature value of an image including character strings. The determiner determines each priority of a plurality of different character detection schemes in accordance with character detection accuracy with respect to an image region having a feature corresponding to the feature value. The integrator integrates text line candidates of the character detection schemes, and selects, as a text line, one of the text line candidates detected by the character detection scheme with the highest priority if a superimposition degree indicating a ratio of a superimposed region among the text line candidates is no less than a first threshold value.	12-24-2015
20150379340	IMAGE PROCESSING DEVICE - A binarizing section generates binary image data representing a binary image from raster image data representing a raster image. A white-pixel ratio determining section determines, based on the binary image data, a white pixel ratio for each line of the binary image and a position of each of lines having a white pixel ratio equal to or greater than a predetermined first threshold among the lines of the binary image, and also determines a white pixel ratio for an entirety of the binary image. An image-type determining section determines that the raster image is a photographic image when the white pixel ratio of the entirety of the binary image is equal to or less than a predetermined second threshold, and that the raster image includes a text image when the lines having a white pixel ratio equal to or greater than the first threshold appear cyclically in the binary image.	12-31-2015
20150379341	ROBUST METHOD TO FIND LAYOUT SIMILARITY BETWEEN TWO DOCUMENTS - Techniques for comparing documents may be provided. For example, a comparison between layouts of the documents may be performed. The comparison may include segmenting the documents into blocks, where an arrangement of blocks of a document represents a layout of the document. Once segmented, similarity metrics, such as distances, between blocks of one document and blocks of the other document may be computed. The similarity metrics may be used to match the blocks between the documents. Further, the similarity metrics between the matched blocks may be added to determine an overall similarity metric between the documents. This overall similarity metric may indicate how similar the documents may be.	12-31-2015
20160012295	IMAGE PROCESSOR, METHOD AND PROGRAM	01-14-2016
20160026899	TEXT LINE DETECTION IN IMAGES - Techniques for detecting and recognizing text may be provided. For example, an image may be analyzed to detect and recognize text therein. The analysis may involve detecting text components in the image. For example, multiple color spaces and multiple-stage filtering may be applied to detect the text components. Further, the analysis may involve extracting text lines based on the text components. For example, global information about the text components can be analyzed to generate best-fitting text lines. The analysis may also involve pruning and splitting the text lines to generate bounding boxes around groups of text components. Text recognition may be applied to the bounding boxes to recognize text therein.	01-28-2016
20160034755	Coarse Document Classification - Systems and methods coarsely classify unknown documents in a group or not with reference document(s). Documents get scanned into digital images. Counts of contours are taken. The closer the counts of the contours of the unknown document reside to the reference document(s), the more likely the documents are all of a same type. Embodiments typify contour analysis, classification acceptance or not, application of algorithms, and imaging devices with scanners, to name a few.	02-04-2016
20160048729	INFORMATION PROCESSING APPARATUS, CONTROL METHOD, AND STORAGE MEDIUM STORING PROGRAM - A plurality of regions corresponding to respective attributes in an image are detected, and a target region serving as a thumbnail image out of the plurality of regions is determined. Thumbnail image data is generated from data corresponding to the determined target region.	02-18-2016
20160055376	METHOD AND SYSTEM FOR IDENTIFICATION AND EXTRACTION OF DATA FROM STRUCTURED DOCUMENTS - The various embodiments herein provide a method and system for identifying and extracting data from electronic documents. The method comprises extracting text from scanned documents with location on page data using OCR technology, identifying one or more tables present in a page using patterns in text placement in rows and columns, identifying the table boundaries using a pattern recognition method, identifying table borders using the location on page data, identifying the rows and columns on the table based on the identified table borders, defining a table structure for data extraction and automatically extracting data from cells of the table formed by identified rows and columns.	02-25-2016
20160063322	METHOD AND SYSTEM OF EXTRACTING LABEL:VALUE DATA FROM A DOCUMENT - This disclosure provides an exemplary method and system for extracting structured label and value pairwise textual data from a textual document. According to an exemplary method, initially a layout analysis is performed resulting in one or more alternatives for grouping and ordering the textual elements of interest. Next, textual elements are tagged as including a label term, a value term or a label and value term. Finally, a sequence-based method is applied to the tagged elements to generate one or more sequence listings representative of the label and value pairwise data structure(s) and label:value pairwise data is extracted.	03-03-2016
20160078632	DOCUMENT IMAGE COMPRESSION METHOD AND ITS APPLICATION IN DOCUMENT AUTHENTICATION - A method for compressing a bi-level document image containing text is disclosed. The document image is segmented into symbol images each representing a letter, numeral, etc. in the document. The symbol images are classified into a plurality of classes, each class being associated with a template image and a class index. Classification is done by comparing each symbol to be classified with template of existing classes, using a number of image features including zoning profiles, side profiles, topology statistics, and low-order image moments. These image features are compared using a tolerance based method to determine whether the symbol matches the template. After classification, certain classes that have few symbols classified into them may be merged with other classes. In addition, the template images of the classes are down-sampled, where the final sizes of the template images are dependent on the likelihood of confusion of the template with other templates.	03-17-2016
20160086026	REMOVAL OF GRAPHICS FROM DOCUMENT IMAGES USING HEURISTIC TEXT ANALYSIS AND TEXT RECOVERY - A graphic removal process for document images involves two stages: First, removal of graphics in the document image based on heuristic text analyses; and second, text recovery to recover some text that is accidentally removed during the first stage. The first stage uses a relatively aggressive strategy to ensure that all graphics components are removed, which also temporarily leads to the removal of some text; the lost text will then be recovered using the text recovery technique. The heuristic text analyses utilize the geometric properties of text characters and consider the properties of text characters in relation to their neighbors. The text recovery technique starts from the text that remains after the first stage, and recovers any connected component that is at least partially located within a pre-defined neighboring area around any of the text components in the intermediate document image.	03-24-2016
20160104052	TEXT-BASED THUMBNAIL GENERATION - A method for displaying an image is disclosed. The method may be performed in an electronic device. Further, the method may detect at least one text region in the image and determine at least one text category associated with the at least one text region. Based on the at least one text region and the at least one text category, the method may generate at least one thumbnail from the image and display the at least one thumbnail.	04-14-2016
20160110385	IDENTIFYING PRODUCT METADATA FROM AN ITEM IMAGE - A metadata extraction machine accesses an image that depicts an item. The item depicted in the image may have an attribute that describes a characteristic of the item and an attribute descriptor that corresponds to the attribute of the item and specifies a value of the attribute. The metadata extraction machine performs an analysis of the image. The analysis may include identifying the attribute descriptor corresponding to the attribute based on image segmentation of the image. The metadata extraction machine transmits a communication to a device of a user based on the identifying of the attribute descriptor corresponding to the attribute of the item depicted in the image.	04-21-2016
20160140398	CONTEXTUAL INFORMATION OF VISUAL MEDIA - An analysis master control can be configured to derive contextual information of visual media that includes extracted information and extrapolated information. The analysis master control can receive the extracted information characterizing visual media from a recognizer. An information finder can be configured to query a plurality of information sources for information based on the extracted information. The analysis master control can also be configured to match information received from the information sources with the extracted information to form the extrapolated information that characterizes the visual media.	05-19-2016
20160140410	IMAGE ACQUISITION USING A LEVEL-INDICATION ICON - During an information-extraction technique, visual suitability indicators may be displayed to a user of the electronic device to assist the user in acquiring an image of a document that is suitable for subsequent extraction of textual information. For example, an imaging application executed by the electronic device may display, in a window associated with the imaging application, a visual suitability indicator of a tilt orientation of the electronic device relative to a plane of the document. When the tilt orientation falls within a predefined range, the electronic device may modify the visual suitability indicators to provide visual feedback to the user. Then, the electronic device may acquire the image of the document using an imaging device, which is integrated into the electronic device. Next, the electronic device may extract the textual information from the image of the document using optical character recognition.	05-19-2016
20160155202	SYSTEMS AND METHODS FOR TAX DATA CAPTURE AND USE	06-02-2016
20160171298	PERSONAL INFORMATION COLLECTION SYSTEM, PERSONAL INFORMATION COLLECTION METHOD AND PROGRAM	06-16-2016
20160180185	ELECTRONIC DEVICE AND IMAGE RECOGNITION METHOD	06-23-2016
20180025222	OPTICAL CHARACTER RECOGNITION (OCR) ACCURACY BY COMBINING RESULTS ACROSS VIDEO FRAMES	01-25-2018
20190147238	INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM	05-16-2019
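
Entry 20150379340 above describes deciding between a photographic image and a text image from white-pixel ratios. As a rough illustration only, the following Python sketch assumes a binary image given as rows of 0/1 values (1 = white) and reads "appear cyclically" as roughly equal spacing between runs of mostly-white rows. That reading, the function names, the threshold values, and the order of the two checks are all assumptions made for this sketch, not details taken from the application.

```python
def white_ratio(pixels):
    """Fraction of white (1) pixels in a sequence of 0/1 values."""
    pixels = list(pixels)
    return sum(pixels) / len(pixels) if pixels else 0.0


def blank_row_starts(binary, row_threshold):
    """Return the first row index of each run of mostly-white rows."""
    starts = []
    in_run = False
    for y, row in enumerate(binary):
        is_blank = white_ratio(row) >= row_threshold
        if is_blank and not in_run:
            starts.append(y)
        in_run = is_blank
    return starts


def classify(binary, row_threshold=0.95, photo_threshold=0.3,
             tolerance=1, min_runs=3):
    """Label a binary image as 'photo', 'text', or 'unknown'.

    All default values are hypothetical and chosen only for the example.
    """
    # Whole-image check: very few white pixels suggests a photograph.
    overall = white_ratio(p for row in binary for p in row)
    if overall <= photo_threshold:
        return "photo"

    # Assumed meaning of "cyclically": gaps between the starts of
    # consecutive mostly-white runs differ by at most `tolerance` rows.
    starts = blank_row_starts(binary, row_threshold)
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if len(starts) >= min_runs and max(gaps) - min(gaps) <= tolerance:
        return "text"
    return "unknown"


# Synthetic page: four repetitions of 2 blank rows followed by 3 "ink" rows.
blank = [1] * 10
ink = [1, 0] * 5
text_page = ([blank] * 2 + [ink] * 3) * 4
print(classify(text_page))
```

A real scanner pipeline would need skew handling and tuned thresholds; the sketch only shows how the two ratios named in the abstract could feed a decision.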
