
Distinguishing text from other regions

Subclass of:

382 - Image analysis

382173000 - IMAGE SEGMENTATION

Entries
Document | Title | Date
20110194770DOCUMENT EDITING APPARATUS AND METHOD - A method for storing a document recognition result is proposed. The method includes selecting a picture area from a document image, storing an image of the selected picture area in an image file format, removing the selected picture area, filling the removed picture area with a surrounding background color, and performing character recognition of a text area.08-11-2011
20090123070Segmentation-based image processing system - A digital image can be processed by an image processing method that calculates a gradient map for the digital image, calculates a density function for the gradient map, calculates a modified gradient map using the gradient map, the density function and the selected scale level, and segments the modified gradient map. Prior to segmenting the modified gradient map, a sub-image of the digital image can be segmented at the selected scale level to determine if the selected scale level will give the desired segmentation.05-14-2009
20100074526Methods and Systems for Locating Text in a Digital Image - Aspects of the present invention are related to systems and methods for locating text in a digital image.03-25-2010
20100074525Manipulating an Image by Applying a De-Identification Process - A method for manipulating an image, the method includes: capturing image information representative of an image that includes images of textual characters; recognizing the textual characters by applying Optical Character Recognition; identifying the layout of the image; and applying at least one de-identification process on textual characters of interest to provide de-identification process results.03-25-2010
20130077863SYSTEM AND METHOD FOR CAPTURING RELEVANT INFORMATION FROM A PRINTED DOCUMENT - A city directory, having a listing of names and associated information of residents in a city (or similar location), is digitized. Zones of text having information not useful to users of the digitized directory are removed, and lines of information corresponding to residents are reconstructed, to make the digitized directory more easily accessed and reviewed.03-28-2013
20100046838SYSTEM AND METHOD FOR OBTAINING TEXT - A system includes a first supporting element, a first supporting element rotation module, a controller and a first optical head. The controller is configured to control a rotation of the first supporting element by the first supporting element rotation module. The first supporting element is coupled to the first optical head. The first supporting element rotation module is configured to rotate the first supporting element until text that is imprinted on a first side of a semiconductor wafer is located within a field of view of the first optical head. Semiconductor wafers of different sizes have text located at different locations. A method for obtaining a text imprinted on a first side of a semiconductor wafer, the method includes: determining a location of the text based on a size of the semiconductor wafer; rotating a first supporting element that is coupled to a first optical head until the text is located within a field of view of the first optical head; obtaining an image of the text by the first optical head; and translating the image of the text into text.02-25-2010
20100104187Personal navigation device and related method of adding tags to photos according to content of the photos and geographical information of where photos were taken - A method of automatically adding tags to photos based on content of the photos and geographical information about where the photo was taken includes taking a photo with a camera of a personal navigation device, generating a geographical tag for the photo with the personal navigation device and attaching the geographical tag to the photo to generate a geotagged photo, transferring the geotagged photo to an optical character recognition (OCR) server, performing OCR on the geotagged photo with the OCR server and generating image description tags from text recognized in the geotagged photo, attaching selected tags to the geotagged photo, the selected tags being selected from the generated image description tags, and uploading the geotagged photo along with the attached selected tags to a photo sharing server, photos on the server being searchable by geographical tags or selected tags associated with the photos.04-29-2010
20130071027VISUALIZATION PROGRAM, VISUALIZATION METHOD AND VISUALIZATION APPARATUS FOR VISUALIZING READING ORDER OF CONTENT - A visualization program, method and apparatus for determining reading order of content in a structured document. The method includes generating, for each of a plurality of elements, a directed segment; storing, in the reading order, the generated directed segments of the elements into a storage device; reading from the storage device; linking together the directed segments for the elements in accordance with the reading order; and displaying the linked directed segments overlaid on the structured document which is displayed on the screen. A computer implemented program and an apparatus for carrying out the above method are also provided.03-21-2013
20090092318ONE-SCREEN RECONCILIATION OF BUSINESS DOCUMENT IMAGE DATA, OPTICAL CHARACTER RECOGNITION EXTRACTED DATA, AND ENTERPRISE RESOURCE PLANNING DATA - Systems and methods of reconciling data from an imaged document. In one embodiment, a business document is scanned to create a business document image. A set of extracted data is extracted from the business document image via optical character recognition (OCR). The set of OCR extracted data is then compared with data in a business information management or enterprise resource planning (ERP) system. A set of ERP data is retrieved from the ERP system that relates to the set of OCR extracted data. The retrieved ERP data is then assigned to the set of OCR extracted data to create a set of assigned data. The business document image is then displayed in a business document image pane, the set of OCR extracted data is displayed in the OCR data pane, and the retrieved ERP data is displayed in the ERP data pane. The set of assigned data is validated, and the ERP system is updated with the set of validated, assigned data. In other embodiments, data is extracted from text files without using OCR.04-09-2009
20130058575TEXT DETECTION USING IMAGE REGIONS - A method includes receiving an indication of a set of image regions identified in image data. The method further includes selecting image regions from the set of image regions for text extraction at least partially based on image region stability.03-07-2013
20130163872Method, Server, Reading Terminal and System for Processing Electronic Document - Systems and methods for processing an electronic document are provided. The method comprises segmenting the electronic document based on content of the electronic document and structuring the segmented electronic document into a format for displaying on a reading terminal based on a request received from the reading terminal.06-27-2013
20130163871SYSTEM AND METHOD FOR SEGMENTING IMAGE DATA TO IDENTIFY A CHARACTER-OF-INTEREST - A system and method of processing an acquired image to identify characters-of-interest in the acquired image. The method includes obtaining image data of a surface of an object. The image data includes a plurality of image pixels having corresponding light intensity signals. The light intensity signals are based on whether the corresponding image pixel correlates to a morphological change in the surface of the object. The method also includes determining a line section of the image data. The line section includes one of the character lines and has character image portions. The method also includes analyzing the light intensity signals of the image pixels in the character image portions to determine a common height of the characters-of-interest. The method also includes removing extraneous areas of the character image portions based on the common height of the characters-of-interest to provide trimmed image portions.06-27-2013
20090016605System and method for creating an editable template from a document image - Embodiments of the present invention recite a system and method for creating an editable template from a document image. In one embodiment of the present invention, the spatial characteristics and the color characteristics of at least one region of a document are identified. A set of characteristics of a graphic representation within the region are then determined without the necessity of recognizing a character comprising the graphic representation. An editable template is then created comprising a second region having the same spatial characteristics and the same color characteristics of the at least one region of the document and comprising a second graphic representation which is defined by the set of characteristics of the first graphic representation.01-15-2009
20080317343Methods and Systems for Identifying Text Orientation in a Digital Image - Aspects of the present invention relate to systems and methods for determining text orientation in a digital image.12-25-2008
20120114242Method and System for Identifying Addressing Data Within a Television Presentation - Characters represented within a frame of a television presentation are identified. A pattern formed by a subset of the characters is identified if the pattern is indicative of an addressing datum. A provision is made for a selection of characters that form the pattern indicative of the addressing datum. In one embodiment, a web page is displayed upon a selection of characters that form a pattern indicative of a uniform resource locator for the web page.05-10-2012
20130163873Detecting Separator Lines in a Web Page - A system and method of detecting separator lines in a web page may include determining coordinates of visible web elements on a web page, generating an edge image of the web page based on the coordinates of the web elements, filtering edges belonging to non-separator line elements within the edge image, detecting horizontal lines within the edge image, detecting vertical lines within the edge image, and filtering short lines within the edge image. A system for detecting separator lines in a web page may include a memory device, and a processor communicatively coupled to the memory, in which the processor determines coordinates of visible web elements on a web page, generates an edge image of the web page based on the coordinates of the web elements, filters edges belonging to non-separator line elements within the edge image, detects horizontal lines within the edge image, detects vertical lines within the edge image, and filters short lines within the edge image.06-27-2013
20090003701METHOD AND APPARATUS FOR APPLYING STEGANOGRAPHY TO DIGITAL IMAGE FILES - A method and apparatus for applying steganography to digital image files are provided. This algorithm uses the basic idea of steganography. In one aspect of the invention a method is provided. The method comprises (a) taking a text message as input, (b) breaking up the input data into a series of bits, and (c) passing it to an encryption mechanism to merge into a bit map image. Thus, less important information from a bit map image is removed and hidden data (series of bits) is injected in its place. It is thus possible to (a) retrieve the entire bit map file, (b) remove its header information, (c) retrieve each byte one by one and (d) put the input data in each byte.01-01-2009
20090285482DETECTING TEXT USING STROKE WIDTH BASED TEXT DETECTION - Detecting text using stroke width based text detection. As a part of the text detection, a representation of an image is generated that includes pixels that are associated with the stroke widths of components of the image. Connected components of the image are identified by filtering out portions of the pixels using metrics related to stroke width. Text is detected in the image based on the identified connected components.11-19-2009
20080310718Information Extraction in a Natural Language Understanding System - A method of extracting information from text within a natural language understanding system can include processing a text input through at least one statistical model for each of a plurality of features to be extracted from the text input. For each feature, at least one value can be determined, at least in part, using the statistical model associated with the feature. One value for each feature can be combined to create a complex information target. The complex information target can be output.12-18-2008
20080310719IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM - An image processing apparatus includes an analyzing unit configured to analyze an incomplete portion of input image data; and an obtaining unit configured to identify a storage location of original data corresponding to the input image data from the input image data, and to obtain the original data from the storage location. The original data obtained by the obtaining unit is corrected on the basis of a result of analysis by the analyzing unit to generate a complete image, and the complete image is output.12-18-2008
20090148042TEXT REPRESENTATION METHOD AND APPARATUS - A text-like data representation technique and a text-like data representation apparatus are disclosed that may acquire image data from a scanned image; segment text regions from the image data; further extract each connected component in the text regions; form clusters based on the connected components; group each connected component in the text regions into one of the clusters with similar or identical characters; generate a high-resolution representative for each cluster; generate a vector representation for each high-resolution representative; and code the text as text data by associating each connected component with its vectorized high-resolution representative, and location in the document.06-11-2009
20120033887IMAGE PROCESSING APPARATUS, COMPUTER READABLE MEDIUM STORING PROGRAM, AND IMAGE PROCESSING METHOD - An image processing apparatus includes a receiving unit, a path calculation unit, and a separation unit. The receiving unit receives an image including at least a character image. The path calculation unit calculates a separation path in the image received by the receiving unit. The separation path is a line segment that separates a character image from the image. The separation unit separates the image received by the receiving unit into plural character images using a separation path calculated by the path calculation unit. The path calculation unit calculates a separation path within a predetermined range including a portion of a character image in the image so that a cumulative value of luminance values of pixels along the separation path satisfies a predetermined condition.02-09-2012
20100080461Methods and Systems for Locating Text in a Digital Image - Aspects of the present invention are related to systems and methods for locating text in a digital image.04-01-2010
20090263019OCR of books by word recognition - Disclosed embodiments of the invention provide automated global optimization methods and systems of OCR, tailored to each document being digitized. A document-specific database is created from an OCR scan of a document of interest, which contains an exhaustive listing of words in the document. Images of each word, taken from all the fonts encountered, are entered into the database and mapped to a corresponding textual representation. After entry of a first instance of an image of a word written in a particular font, each new occurrence of the word in that font can be quickly recognized by image processing techniques. The disclosed methods and systems may be used in conjunction with adaptive character recognition training and word recognition training of the OCR engines.10-22-2009
20090279781IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM - An image processing apparatus selects one extraction method among a plurality of extraction methods and then extracts feature information of objected image data, from the objected image data using the selected extraction method. The extracted feature information is registered, and the objected image data is output together with identification information indicating the extraction method that was used in the extraction.11-12-2009
20110206281METHOD FOR FAST UP-SCALING OF COLOR IMAGES AND METHOD FOR INTERPRETATION OF DIGITALLY ACQUIRED DOCUMENTS - Method for up-scaling a color image prior to performing subsequent processing on said color image, comprising the steps of converting the color image into multiple image layers distinguishable from each other and up-scaling at least one of said multiple image layers. The up-scaling is tuned towards the subsequent processing, for example luminance is upscaled at higher quality than chrominance. Further, a method for interpreting information present on digitally acquired documents, comprising the steps of: (i) determining a country; (ii) identifying a list of languages and character sets in use in said country; (iii) performing optical character recognition simultaneously using all languages and character sets of the list; (iv) performing field parsing to identify fields in the digitally acquired document on the basis of international as well as country-specific field recognition rules; (v) storing the recognized information according to the identified fields in a database.08-25-2011
20080273796Image Text Replacement - Image text enhancement techniques are described. In an implementation, graphically represented text included in an original image is converted into process capable text. The process capable text may be used to generate a text image which may replace the original text to enhance the image. In further implementations the process capable text may be translated from a first language to a second language for inclusion in the enhanced image.11-06-2008
20110007970SYSTEM AND METHOD FOR SEGMENTING TEXT LINES IN DOCUMENTS - Methods and systems of the present embodiment provide segmenting of connected components of markings found in document images. Segmenting includes detecting aligned text. From this detected material an aligned text mask is generated and used in processing of the images. The processing includes breaking connected components in the document images into smaller pieces or fragments by detecting and segregating the connected components and fragments thereof likely to belong to aligned text.01-13-2011
20120195505TECHNIQUES INCLUDING URL RECOGNITION AND APPLICATIONS - Methods and systems are provided that include obtaining a digital image from a digital photograph, such as may be taken by a digital camera or a camera phone. The digital image includes, for example, a URI or URL, which may be contained within a visible frame. A character recognition technique, such as an optical character recognition technique, may be used to recognize the URI or URL from the digital image. The URI or URL may be used to access a corresponding Web page. The character recognition technique may be applied on the digital camera or cell phone itself, or remotely.08-02-2012
20090123071DOCUMENT PROCESSING APPARATUS, DOCUMENT PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT - In a document processing apparatus, a first character information extracting unit extracts, for a first area that is an area determined to be a character extractable area in divided areas of a document information, first character information from the area; a second character information extracting unit extracts, for a second area that is an area not determined to be the character extractable area in the divided areas, a character code by performing a character recognition processing on a document image generated from the document information as second character information; and a storing unit stores therein the first character information, the second character information, and at least one of the document information and the document image in association with each other.05-14-2009
20100142820 3 + 1 LAYER MIXED RASTER CONTENT (MRC) IMAGES HAVING A BLACK TEXT LAYER - A method, system and data structure for providing a 3+1 layer MRC image, including a black text layer. The black text layer includes pixel data corresponding to black text in an image and may be assigned a predetermined value for the color of black. According to one or more embodiments, using thresholding processing along with various morphological operations, the black text layer may be generated.06-10-2010
20080317344Printing apparatus and method with respect to medium - A user operates the operation panel portion 12-25-2008
20090136133Personalized fetal ultrasound image design - (Process) the consumer submits their fetal ultrasound image along with an order form (includes requirements). The image is scanned and saved as a JPEG file or similar. If required the image is enhanced (cropped, brightness, etc.), saved and moved to Microsoft Publisher or similar software. The caption design and text, the phrase design and text, are created and saved in Microsoft Publisher/similar. The caption and phrase are placed over the fetal image, positioned and saved as a Publisher file. This file is the Personalized Fetal Ultrasound Image Design. Standard format is available to consumers. (Product) the Personalized Fetal Ultrasound Image Design is unique because it captures a personalized image of the fetus; a personalized caption; a personalized phrase. The personalized image can be copied, printed or transferred onto memorabilia and keepsakes such as t-shirts, pictures, mugs, blankets, magnets, bookmarks, etc. Each is unique, original, one of a kind.05-28-2009
20090252415Method for retrieving text blocks in documents - To classify text blocks in printed material which is part of bulk postal items structure-related characteristics of one of the text blocks of a postal item are extracted, wherein the characteristics are characterized by graphical properties of the overall text block. The extracted structure-related characteristics are assigned to a characteristic data record of the postal item, and a characteristic data record of a reference text block is compared to the characteristic data record of the postal item.10-08-2009
20110229035IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM - Even when captions of a plurality of objects use an identical anchor expression, the present invention can associate an appropriately explanatory text in a body text as metadata with the objects.09-22-2011
20090080774Hybrid Graph Model For Unsupervised Object Segmentation - This disclosure describes an integrated framework for class-unsupervised object segmentation. The class-unsupervised object segmentation occurs by integrating top-down constraints and bottom-up constraints on object shapes using an algorithm in an integrated manner. The algorithm describes a relationship among object parts and superpixels. This process forms object shapes with object parts and oversegments pixel images into the superpixels, with the algorithm in conjunction with the constraints. This disclosure describes computing a mask map from a hybrid graph, segmenting the image into a foreground object and a background, and displaying the foreground object from the background.03-26-2009
20110222771PAGE LAYOUT DETERMINATION OF AN IMAGE UNDERGOING OPTICAL CHARACTER RECOGNITION - A method and system is provided for identifying a page layout of an image that includes textual regions. The textual regions are to undergo optical character recognition (OCR). The system includes an input component that receives an input image that includes words around which bounding boxes have been formed and a text identifying component that groups the words into a plurality of text regions. A reading line component groups words within each of the text regions into reading lines. A text region sorting component sorts the text regions in accordance with their reading order.09-15-2011
20090245641DOCUMENT PROCESSING APPARATUS AND PROGRAM - A document processing apparatus includes a region extracting unit that extracts a plurality of regions in a document image, a recognition unit that recognizes a character string, a conversion unit that converts the recognized character string, a setting unit that sets first boundary lines that surround the document image and at least one second boundary line in a space between adjacent regions of the plurality of regions, an enlargement/reduction unit that moves in parallel at least one line of the first and second boundary lines under a restraint condition that at least one line does not intersect any of the plurality of regions, and enlarges or reduces at least one of the regions in accordance with the parallel movement so long as each region does not get out of a cell; and an insertion unit that inserts the converted character string into each of the regions.10-01-2009
20090316991Method of Gray-Level Optical Segmentation and Isolation using Incremental Connected Components - A novel and useful method of using Incremental Connected Components to segment and isolate individual characters in a gray-scale or color image. For each pixel intensity of pixels in the image, a plurality of pixel groups are created comprising contiguous pixels of intensity equal to or less than the current pixel intensity. The pixel groups are then input to a character classifier which returns an identified character and a confidence value. Non-overlapping pixel groups (i.e. segmentation) of identified characters having the highest confidence values are then selected.12-24-2009
20100002935SYSTEM AND METHOD FOR DISPLAYING DIGITAL EDITIONS OF PERIODICALS AND PUBLICATIONS - Systems and methods are provided for displaying scanned information containing at least one of text and picture information of a document, wherein the document may include a plurality of scanned images. In one embodiment, the method may involve creating and utilizing at least one association of a given scanned image relative to one or more of the other scanned images of the document. The at least one association may define a relative location of the given scanned image with respect to one or more of the other scanned images.01-07-2010
20090297027ELECTRONIC DOCUMENT PRODUCING DEVICE, ELECTRONIC DOCUMENT PRODUCING METHOD AND STORAGE MEDIUM - An electronic document producing device has a correcting unit for correcting distortion of a first image to obtain a correction image, and a character recognition unit for executing character recognition processing on a plurality of character images contained in the correction image to obtain text data. The device also has a unit for finding a base line of each character row in the first image, and a unit for finding a relative position from the base line in regard to each character image in the first image. The device also includes a producing unit for producing an electronic document including the text data and the first image, wherein a position of the text data is described based on the relative position from the base line.12-03-2009
20100254605MOVING TEXT DETECTION IN VIDEO - Methods and apparatus for detecting moving text in video comprising receiving consecutive frames from a video stream, extracting a sequence of pixels from the consecutive frames, categorizing the pixels, thinning the pixels, correlating corresponding thinned pixels in the frames, identifying the peaks that are equal to or exceed a threshold, and performing further processing on the peaks to determine if the peaks contain moving text.10-07-2010
20090110279SYSTEM AND METHOD FOR EXTRACTING AND ORGANIZING DATA FROM ELECTRONIC IMAGES - A method of extracting and organizing data from electronic images includes processing a set of data fields representative of data to be extracted, mapping at least a subset of the set of data fields to at least one subclient, and attaching a rule from a set of rules to at least one of the mapped data fields. Each rule in the set of rules represents a transformation from a first data format to a preferred data format. The method also includes extracting data from at least one electronic image for the at least one subclient into the plurality of mapped data fields using the attached rule and storing the extracted data.04-30-2009
20100239165Model-Based Dewarping Method And Apparatus - An apparatus and method for processing a captured image and, more particularly, for processing a captured image comprising a document. In one embodiment, an apparatus comprising a camera to capture documents is described. In another embodiment, a method for processing a captured image that includes a document is described, comprising the steps of distinguishing an imaged document from its background, adjusting the captured image to reduce distortions created from use of a camera and properly orienting the document.09-23-2010
20100239166CHARACTER RECOGNITION DEVICE, IMAGE-READING DEVICE, COMPUTER READABLE MEDIUM, AND CHARACTER RECOGNITION METHOD - A character recognition device includes: an acquiring unit that acquires image data describing pixel values representing colors of pixels constituting an image; a binarizing unit that binarizes the pixel values; an extracting unit that extracts boundaries of colors in the image; a delimiting unit that delimits plural image areas in the image; a specifying unit that specifies, with regard to first image areas arranged according to a predetermined rule, pixels binarized by the binarizing unit, as a subject for character recognition, and specifies, with regard to second image areas not arranged according to the predetermined rule, pixels of areas surrounded by boundaries extracted by the extracting unit, as a subject for character recognition; and a character recognition unit that recognizes characters represented by the pixels specified by the specifying unit as a subject for character recognition.09-23-2010
20090041352IMAGE FORMATION DEVICE, IMAGE FORMATION METHOD, AND COMPUTER-READABLE RECORDING MEDIUM RECORDING IMAGE FORMATION PROGRAM - An image formation device includes an obtaining unit obtaining an input image and a first reduction scale, a first extraction unit extracting a first region including a character from the input image, a first setting unit setting a second reduction scale greater than the first reduction scale when a size of the reduced character is smaller than a first size, and a formation unit forming a second processed image by reducing the first region at the second reduction scale when the size of the character is smaller than the first size.02-12-2009
20110026827VISUALIZATION PROGRAM, VISUALIZATION METHOD AND VISUALIZATION APPARATUS FOR VISUALIZING READING ORDER OF CONTENT - A visualization program, method and apparatus for determining reading order of content in a structured document. The method includes generating, for each of a plurality of elements, a directed segment; storing, in the reading order, the generated directed segments of the elements into a storage device; reading from the storage device; linking together the directed segments for the elements in accordance with the reading order; and displaying the linked directed segments overlaid on the structured document which is displayed on the screen. A computer implemented program and an apparatus for carrying out the above method are also provided.02-03-2011
20090324080IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD AND STORAGE MEDIUM - Character code data and vector drawing data are both listed and provided in a re-editable manner. Electronic data is generated in which information obtained by vectorizing character areas in an image and information obtained by recognizing characters in the image are stored in respective storage locations. As for the electronic data generated in this manner, because character code data and vector drawing data generated from the input image are both presented by a display and edit program, a user can immediately utilize the both data.12-31-2009
20090324079Methods and Systems for Region-Based Up-Scaling - Aspects of the present invention are related to systems and methods for region-based up-scaling, and in particular, for up-scaling still images and video frames that contain graphical elements.12-31-2009
20090110280IMAGE RECOGNITION APPARATUS, IMAGE RECOGNITION PROGRAM, AND IMAGE RECOGNITION METHOD - An image recognition method is conducted by recognizing logical elements based on a logical structure model set to correspond to the logical structure of an image of individual character strings, collecting information processed with the logical structure model of images of a logical structure, acquiring a recognition result when recognizing an image of a logical structure by processing information collected with a post-update logical structure model,04-30-2009
20090148043METHOD FOR EXTRACTING TEXT FROM A COMPOUND DIGITAL IMAGE - Text is extracted from a grayscale or color compound digital image. Kernels of text in the compound digital image are found using a stroke operator. The kernels of text are segmented into text blocks based on image space, color space, and intensity space. Each text block is segmented into text and background pixels using active contour analysis. The segmented text blocks are refined by altering parameters in the active contour analysis. Text is extracted from the refined segmented text blocks, and a binary image is created including text extracted from the refined segmented text blocks.06-11-2009
20090067719System and method for automatic segmentation of ASR transcripts - Text segmentation based on topic boundary detection has been an industry problem in automating information dissemination to targeted users. A system for automatic segmentation of ASR output text involves boundary identification based on “topic” changes. The proposed approach is based on building a weighted graph to determine dependency in input sentences based on bi-directional analysis of the input sentences. Furthermore, the input sentences are segmented based on the notion of segment cohesiveness and the segmented sentences are merged based on preamble and postamble analyses.03-12-2009
20090003700Precise Identification of Text Pixels from Scanned Document Images - A system or method for identifying text in a document. A group of connected components is created. A plurality of characteristics of different types is calculated for each connected component. Statistics are computed which describe the group of characteristics. Outlier components are identified as connected components whose computed characteristics are outside a statistical range. The outlier components are removed from the group of connected components. Text pixels are identified by segmenting pixels in the group of connected components into a group of text pixels and a group of background pixels.01-01-2009
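The outlier-removal step in the entry above can be illustrated with a minimal sketch. It assumes each connected component is a dict with hypothetical `height` and `width` characteristics and that the "statistical range" is the mean plus or minus `k` standard deviations; the abstract names neither the characteristics nor the range, and the later text/background pixel segmentation is not shown.

```python
from statistics import mean, pstdev


def remove_outlier_components(components, features=("height", "width"), k=1.5):
    """Drop components whose feature values fall outside mean +/- k * std.

    `components` is a list of dicts; the feature names and `k` are
    illustrative assumptions, not values taken from the abstract.
    """
    # Statistics describing the whole group: one (mean, std) pair per feature.
    stats = {}
    for name in features:
        values = [c[name] for c in components]
        stats[name] = (mean(values), pstdev(values))

    def is_inlier(component):
        # A component stays only if every feature lies inside its range.
        return all(
            abs(component[name] - mu) <= k * sigma
            for name, (mu, sigma) in stats.items()
        )

    return [c for c in components if is_inlier(c)]


components = [
    {"height": 10, "width": 5},
    {"height": 11, "width": 6},
    {"height": 9, "width": 5},
    {"height": 10, "width": 6},
    {"height": 50, "width": 30},  # e.g. a picture fragment rather than a glyph
]
kept = remove_outlier_components(components)
```

With this sample data the oversized component is dropped and the four glyph-sized components remain.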
20110243444SEGMENTATION OF TEXTUAL LINES IN AN IMAGE THAT INCLUDE WESTERN CHARACTERS AND HIEROGLYPHIC CHARACTERS - An image processing apparatus segments Western and hieroglyphic portions of textual lines. The apparatus includes an input component that receives an input image having at least one textual line. The apparatus also includes an inter-character break identifier component that identifies candidate inter-character breaks along a textual line and an inter-character break classifier component. The inter-character break classifier component classifies each of the candidate inter-character breaks as an actual break, a non-break or an indeterminate break based at least in part on the geometrical properties of each respective candidate inter-character break and the bounding boxes adjacent thereto. A character recognition component recognizes the candidate characters based at least in part on a feature set extracted from each respective candidate character that can be histogram features, Gabor features or any other feature set applicable to character recognition. A Western and hieroglyphic text classifier component finds and classifies textual line segments as Western text segments or hieroglyphic text segments and further passes the recognition results to an output component.10-06-2011
20120243787SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR DOCUMENT IMAGE ANALYSIS USING FEATURE EXTRACTION FUNCTIONS - Methods, systems and computer program products to improve the efficiency and computational speed of an image enhancement process. In an embodiment, information that is generated as interim results during feature extraction may be used in a segmentation and classification process and in a content adaptive enhancement process. In particular, a cleaner image that is generated during a noise removal phase of feature extraction may be used in a content adaptive enhancement process. This saves the content adaptive enhancement process from having to generate a cleaner image on its own. In addition, low-level segmentation information that is generated during a neighborhood analysis and cleanup phase of feature extraction may be used in a segmentation and classification process. This saves the segmentation and classification process from having to generate low-level segmentation information on its own.09-27-2012
20090097750IMAGE PROCESSING APPARATUS - An information embedding apparatus (04-16-2009
20100061633Method and Apparatus for Calculating the Background Color of an Image - A method, device and computer readable storage media for enhancing an image for optical character recognition by detecting the edges of the image to create an edge detected image, binarizing the edge detected image to create a binary edge image for processing, dilating the binary edge image to create a dilated binary edge image, taking the XOR difference between the binary edge image and the dilated binary edge image to obtain a text boundary, superimposing the text boundary on the image and determining the pixels of the image that are covered by the text boundary, calculating the average grayscale value of the pixels of the image that are covered by the text boundary, and setting background pixels of the image to the calculated average grayscale value of the pixels of the image that are covered by the text boundary. The method optionally includes the steps of performing edge filling and hole filling on the binary edge image to create an updated binary edge image and filling holes in the dilated binary edge image to create an updated dilated binary edge image, whereby the XOR difference is taken between the updated binary edge image and the updated dilated binary edge image. The image may be binarized for optimal results after the background images have been set to the calculated average grayscale value of the pixels that are covered by the text boundary.03-11-2010
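The dilate, XOR, and average steps in the entry above can be sketched as follows. This is a simplified illustration: it assumes the binary edge image is already available as a 2D list of 0/1 values, uses a 3x3 dilation (the abstract does not state a structuring element), and omits the optional edge and hole filling. All function names are hypothetical.

```python
def dilate(mask):
    """3x3 binary dilation of a 2D list of 0/1 values."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Switch on the pixel and its in-bounds neighbours.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out


def text_boundary(edges):
    """XOR of the edge image and its dilation: the ring just outside the edges."""
    dilated = dilate(edges)
    return [[e ^ d for e, d in zip(erow, drow)] for erow, drow in zip(edges, dilated)]


def boundary_average(gray, edges):
    """Average grayscale value of the pixels covered by the text boundary."""
    boundary = text_boundary(edges)
    values = [
        g
        for grow, brow in zip(gray, boundary)
        for g, b in zip(grow, brow)
        if b
    ]
    return sum(values) / len(values) if values else None


edges = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
gray = [[200, 200, 200], [200, 0, 200], [200, 200, 200]]
background_value = boundary_average(gray, edges)
```

The returned value is what the abstract describes as the new background level; how background pixels are selected for overwriting is not specified there and is left out here.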
20100061634Method of Retrieving Information from a Digital Image03-11-2010
20120201457FINDING REPEATED STRUCTURE FOR DATA EXTRACTION FROM DOCUMENT IMAGES - Methods and system employing the same for finding repeated structure for data extraction from document images are provided. A reference record and one or more reference fields thereof are identified from a document image. One or more candidate fields are generated for each of the reference fields. One or more best candidate records from the candidate fields are selected using a probabilistic model and an optimal record set is determined from the best candidate records.08-09-2012
20100098336IMAGE PROCESSING APPARATUS - An image processing apparatus including an input part configured to input document data of a document, an extracting part configured to automatically extract partial image data from the document data, a storage part configured to store the document data and configuration data of the document data, a registering part configured to associate the document data with the partial image data and register the document data and the associated partial image data in the storage part, a generating part configured to generate push-type data based on the configuration data, and a transmitting part configured to transmit the push-type data.04-22-2010
20120201458SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR DETERMINING WHETHER TEXT WITHIN AN IMAGE INCLUDES UNWANTED DATA, UTILIZING A MATRIX - A system, method, and computer program product are provided for determining whether text within an image includes unwanted data, utilizing a matrix. In operation, a matrix corresponding to an image is generated. Additionally, text within the image is identified utilizing the matrix. Furthermore, it is determined whether the text includes unwanted data.08-09-2012
20080253655TEXT AND GRAPHIC SEPARATION METHOD AND TEXT ENHANCEMENT METHOD - The present invention provides a text and graphic separation method and a text enhancement method. The text and graphic separation method is used for separating texts and graphics of an image and comprises coarse classification and advanced classification. The method of the present invention also adjusts the luminance of the text to enhance the text image according to the separation result.10-16-2008
20110052062SYSTEM AND METHOD FOR IDENTIFYING PICTURES IN DOCUMENTS - A system and method to identify pictures in documents. An image representing a page of a document is received. The image is analyzed to identify text objects in the page. A masked image is generated by masking out regions of the image including the text objects in the page. Groups of pixels in the masked image are identified, wherein a respective group of pixels corresponds to at least one picture in the page. When there are one or more groups of pixels, regions for pictures are identified based on the one or more groups of pixels. Metadata tags for the pictures are stored, wherein a respective metadata tag for a respective picture includes information about a respective bounding box for the respective picture.03-03-2011
20100260420VARIABLE GLYPH SYSTEM AND METHOD - An automatic word segmentation and glyph variation system and method are provided herein.10-14-2010
20100166309System And Methods For Creation And Use Of A Mixed Media Environment - A Mixed Media Reality (MMR) system and associated techniques are disclosed. The MMR system provides mechanisms for forming a mixed media document that includes media of at least two types (e.g., printed paper as a first medium and digital content and/or web link as a second medium). In one particular embodiment, the MMR system includes an MMR user, a MMR computer, a user printer that produces a printed document, a networked media server, an office portal, a service provider server, an electronic display that is electrically connected to a set-top box, a document scanner, a network, a capture device, a cellular infrastructure, wireless fidelity (Wi-Fi) technology, Bluetooth® technology, infrared (IR) technology, wired technology, and a geo location mechanism. The MMR system of the present invention provides mechanisms for forming a mixed media document that includes media of at least two types, such as printed paper as a first medium and a digital photograph, digital movie, digital audio file, or web link as a second medium. Furthermore, the MMR system of the present invention facilitates business methods that take advantage of the combination of a portable electronic device, such as a cellular camera phone, and a paper document.07-01-2010
20100195909SYSTEM AND METHOD FOR EXTRACTING INFORMATION FROM TEXT USING TEXT ANNOTATION AND FACT EXTRACTION - A fact extraction tool set (“FEX”) finds and extracts targeted pieces of information from text using linguistic and pattern matching technologies, and in particular, text annotation and fact extraction. Text annotation tools break a text, such as a document, into its base tokens and annotate those tokens or patterns of tokens with orthographic, syntactic, semantic, pragmatic and other attributes. A user-defined “Annotation Configuration” controls which annotation tools are used in a given application. XML is used as the basis for representing the annotated text. A tag uncrossing tool resolves conflicting (crossed) annotation boundaries in an annotated text to produce well-formed XML from the results of the individual annotators. The fact extraction tool is a pattern matching language which is used to write scripts that find and match patterns of attributes that correspond to targeted pieces of information in the text, and extract that information.08-05-2010
20110188753IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD AND COMPUTER READABLE MEDIUM - An image processing device includes: an acquisition section that acquires subject image information to be formed on a medium; an extraction section that selectively extracts a part of the subject image information corresponding to a portion of an image not formed due to a plurality of holes of a medium if an image relating to the subject image information is formed on the medium perforated with the plurality of holes; and a generation section that generates new subject image information by generating a command for forming the extracted part of the subject image information.08-04-2011
20080267502Variable skew correction system and method - A variable skew correction system comprises a de-skew application executable to transform scanned image data of a document exhibiting a variable skew condition to an output model representing a de-skewed image of the document by transferring pixel data for each of a plurality of raster lines of the scanned image data to the output model, wherein a variable spacing between at least two adjacent raster lines of the scanned image data is modified in the output model to correct non-linear distortion of the scanned image data.10-30-2008
20120039536OPTICAL CHARACTER RECOGNITION WITH TWO-PASS ZONING - An image of a paginated document is zoned to identify text zones. First-pass character recognition is performed on the text zones to generate textual content corresponding to the paginated document. The image of the paginated document is re-zoned based on the textual content to identify one or more new text zones. Second-pass character recognition is performed on at least the new text zones to generate updated textual content corresponding to the paginated document.02-16-2012
20110064310IMAGE FORMING APPARATUS THAT AUTOMATICALLY CREATES AN INDEX AND A METHOD THEREOF - An image forming apparatus capable of automatically creating an index, and a method for the same. The image forming apparatus includes a scan unit to scan a document, a text/image separation unit to separate the scanned document into a text area and an image area and to separate texts in the text area into symbols, an index determination unit to extract one or more properties of the separated symbols and to compare the extracted symbol properties with one or more index thresholds to determine whether text including the symbols is an index object, and an index page creation unit to create an index page including the text determined as the index object and information about a page including the text that corresponds to the index object. Accordingly, since the index page is automatically created, main contents of each page of the document can be easily selected and/or presented. Also, a search for desired contents in the document is facilitated by a link between the index page and original contents of the pages in the document, thereby improving user convenience.03-17-2011
20120207391INTERACTIVE PAPER SYSTEM - A printer, scanner device and methods for using same are described herein. A printer device may include a dedicated input that, when actuated, generates and sends a request to a computer for known data or a predetermined print job, e.g., schedule information from a personal information management (PIM) application. A scanner device may include another dedicated input that, when actuated, automatically scans a document fed to the device by the user and sends the scanned image to IM (or other) software on a computer, bypassing the need to manipulate the scanned image using scanner software. The device may be used with printed metapaper, which includes a barcode or other indicia identifying the metapaper and corresponds to a stored template image of the metapaper. When the metapaper is rescanned, the scan can be compared to the stored template information to identify changes and synchronize the changes with the IM software.08-16-2012
20120207390SYSTEMS AND METHODS FOR REPLACING NON-IMAGE TEXT - Systems and methods for replacing non-image text are provided. One method for replacing non-image text includes padding a first data representing an image of text to create an image segment. The method includes replacing a second data representing non-image text with the image segment.08-16-2012
20120008864IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND COMPUTER READABLE MEDIUM - An apparatus comprises: unit configured to divide input document data into a body region, a caption region, and an object region; unit configured to acquire text information included in each of the body region and the caption region; unit configured to search the text information in the body region for an anchor term, to extract an anchor term from the text information in the caption region, and to generate a bi-directional link between a portion corresponding to the anchor term in the body region and a portion of the object region to which the caption region is appended; and unit configured to convert the input document data into digital document data in which the portion corresponding to the anchor term in the body region and the portion corresponding to the object region to which the caption region is appended are bi-directionally linked based on the link.01-12-2012
20120045129Document image processing method and apparatus - A method for processing a document image includes: performing horizontal and vertical text line extraction on the document image; providing an overlapping matrix, a value of an element of the overlapping matrix indicating an overlapping relation between horizontal and vertical text lines; merging the overlapping matrix in the vertical and horizontal direction; determining one or more text overlapping regions in the document image, based on the values of the elements of the merged overlapping matrix; counting the total number of strokes or pixel points in the horizontal and vertical text lines, respectively, within one of the one or more text overlapping regions; and determining an orientation of the text overlapping region is horizontal if the total number of strokes or pixel points in the horizontal text lines is larger than that in the vertical text lines, otherwise, determining the orientation is vertical.02-23-2012
20120114241USING EXTRACTED IMAGE TEXT - Methods, systems, and apparatus including computer program products for using extracted image text are provided. In one implementation, a computer-implemented method is provided. The method includes receiving an input of one or more image search terms and identifying keywords from the received one or more image search terms. The method also includes searching a collection of keywords including keywords extracted from image text, retrieving an image associated with extracted image text corresponding to one or more of the image search terms, and presenting the image.05-10-2012
201100698853+N LAYER MIXED RASTER CONTENT (MRC) IMAGES AND PROCESSING THEREOF - A method for processing image data includes using advantages of both a three-layer MRC model and an N-layer MRC model to create a new 3+N layer MRC model and to generate a 3+N layer MRC image. The method includes providing input image data; segmenting the input image data to generate: (i) a background layer representing the background and the pictorial attributes of the image data, (ii) one or more binary foreground layers, (iii) a selector layer, and (iv) a contone foreground layer representing the foreground attributes of the image data on the background layer; and integrating the background layer, the selector layer, the contone foreground layer, and the one or more binary foreground layers into a data structure having machine-readable information for storage in a memory device. Each binary foreground layer includes one or more pixel clusters representing text pixels of a particular color in the input image data.03-24-2011
20100092087METHODS AND APPARATUS FOR PERFORMING IMAGE BINARIZATION - Methods and apparatus for binarizing images represented by sets of multivalent pixel values in a computationally efficient manner are described. In a grayscale image to be binarized, one group of pixel values represents “foreground”, e.g., text to be converted to black, while another group represents a shaded “background” region to be converted, e.g., to white. The difference between foreground and background is often a function of the scale of the image components, e.g., text and/or other images. Filters in the form of morphological operators, computationally efficient quick-open and quick-close morphological operators are employed to binarize images, e.g., grayscale images. The methods and apparatus effectively handle both smooth and sharp image background structures in a computationally efficient manner.04-15-2010
20100254606METHOD OF RECOGNIZING TEXT INFORMATION FROM A VECTOR/RASTER IMAGE - A method is claimed for processing a vector-raster image file which contains a text image. The method comprises the steps of: fragmenting the image to obtain regions containing non-separable, logically connected fragments of text of the maximum possible size; processing text, vector, and raster objects; discarding excessive information; analyzing each object with the help of all available information. The step of processing text objects includes the steps of: dividing into separate characters and character groups according to supposed locations of blank spaces or other non-indicated symbols, and analyzing and assembling character groups into words and verifying and correcting character encoding based on recognition of assembled words as raster objects. The step of processing vector objects includes the step of identifying separators, background, and substrates of blocks. The step of processing raster objects includes the steps of: analyzing non-text objects in order to detect text images within them, and/or detecting vector objects other than separators.10-07-2010
20120314953INFORMATION PROCESSING APPARATUS, PROGRAM, AND INFORMATION PROCESSING METHOD - There is provided an information processing apparatus including a selecting unit that selects figures from a candidate figure group based on recognition values obtained through character recognition with respect to an input image, and a display control unit that performs control to display the figures selected by the selecting unit.12-13-2012
20100246959APPARATUS AND METHOD FOR GENERATING ADDITIONAL INFORMATION ABOUT MOVING PICTURE CONTENT - An apparatus and method for generating additional information about moving picture content, including: comparing image feature information about each image frame in moving picture content with image feature information about each image frame in web information, searching for an image frame in the moving picture content, the image frame matching the image frame in the web information, determining location information about the found image frame in the moving picture content, and generating additional information by use of the determined location information and the web information.09-30-2010
20100246960Image Based Spam Blocking - A fingerprint of an image identified within a received message is generated following analysis of the message. A spam detection engine identifies an image within a message and converts the image into a grey scale image. The spam detection engine analyzes the grey scale image and assigns a score. A fingerprint of the grey scale image is generated based on the score. The fingerprint may also be based on other factors such as the message sender's status (e.g. blacklisted or whitelisted) and other scores and reports generated by the spam detection engine. The fingerprint is then used to filter future incoming messages.09-30-2010
20100246958TABLE GRID DETECTION AND SEPARATION - A technique is described for table grid detection and separation during the analysis and recognition of documents containing table contents. The technique includes the steps of table detection, grid separation, and table cell extraction. The technique is characterized by the steps of detecting the grid lines of a table using, for example, inverse cell detection, separating noise and touching text from the grid lines, and extracting the cell contents for OCR recognition.09-30-2010
20120213441RECEIPTS SCANNER AND FINANCIAL ORGANIZER - The system contains a scanner, an apparatus for scanning receipts into a computer and a unique software program which automatically processes, organizes and saves expense information that can be viewed in various formats, namely, tabular statements, pie-charts, etc. The scanner, which accommodates paper of differing sizes, is used to input bills, receipts, bank statements, etc. The scanner is usually connected to a computer through a Universal Serial Bus or a parallel port for easy installation. The software program creates a text file of the scanned data by inclusion of sorting, categories, etc., and automatically saves the information in Quicken Interchange Format, allowing it to be imported into any financial management software for further processing. Each receipt is treated as an individual transaction. Multiple items in the receipt are used to create a “split” transaction with proper customizable categories added. Further, the software also allows for record keeping, budgeting and budget balancing.08-23-2012
20120163718Removing character from text in non-image form where location of character in image of text falls outside of valid content boundary - Data representing an image of text is received, as is data representing the text in non-image form. A valid content boundary within the image of the text is determined. For each character within the text in the non-image form, a location of the character within the image of the text is determined. Where the location of the character within the image of the text falls outside the valid content boundary, the character is removed from the data representing the text in the non-image form.06-28-2012
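The filtering rule in the entry above can be illustrated with a minimal sketch. It assumes each character of the non-image text is already paired with an (x, y) location in the image and that the valid content boundary is an axis-aligned rectangle; the abstract does not say how either is obtained, and the function name is hypothetical.

```python
def strip_out_of_bounds(chars, boundary):
    """Keep only characters whose image location lies inside the boundary.

    `chars` is a list of (character, (x, y)) pairs; `boundary` is a
    (left, top, right, bottom) rectangle. Both layouts are assumptions
    made for illustration.
    """
    left, top, right, bottom = boundary
    kept = [
        ch
        for ch, (x, y) in chars
        if left <= x <= right and top <= y <= bottom
    ]
    return "".join(kept)


# "#" sits far to the right of the page content, e.g. scanner-edge noise.
chars = [("H", (10, 10)), ("i", (20, 10)), ("#", (300, 5))]
clean = strip_out_of_bounds(chars, boundary=(0, 0, 100, 50))
```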
20120134588RECTIFICATION OF CHARACTERS AND TEXT AS TRANSFORM INVARIANT LOW-RANK TEXTURES - A “Text Rectifier” provides various techniques for processing selected regions of an image containing text or characters by treating those images as matrices of low-rank textures and using a rank minimization technique that recovers and removes image deformations (e.g., affine and projective transforms as well as general classes of nonlinear transforms) while rectifying the text or characters in the image region. Once distortions have been removed and the text or characters rectified, the resulting text is made available for a variety of uses or further processing such as optical character recognition (OCR). In various embodiments, binarization and/or inversion techniques are applied to the selected image regions during the rank minimization process to both improve text rectification and to present the resulting images of text to an OCR engine in a form that enhances the accuracy of the OCR results.05-31-2012
20120076414External Image Based Summarization Techniques - Techniques involve visually summarizing documents (e.g., search results, a collection of documents, etc.) using images which are visually representative of the documents that they represent. The images representing the documents may be external images obtained from sources other than the documents. The external images may be obtained from the sources other than the documents by performing a separate image based search using key phrases from the documents rather than extracting the images directly from within the documents themselves. Alternatively, an algorithm may be used to determine an image type, which may be chosen from a selection of external images, thumbnail images, or internal images taken directly from the collection of documents, that is suited to represent each document in the collection of documents. A snippet of the documents may be displayed along with the images which visually represent each of the documents.03-29-2012
20120076413Methods and Systems for Automatic Extraction and Retrieval of Auxiliary Document Content - Aspects of the present invention are related to systems and methods for automatically extracting, from a document image, references to relevant external content and automatically retrieving the external content associated with the references.03-29-2012
20120177290AUTOMATIC TABLE LOCATION IN DOCUMENTS - A method for locating tables in documents includes defining a plurality of tiles for a document, for each tile, determining a horizontal profile and a vertical profile, determining the location of lines by means of gradients of the horizontal profiles and the vertical profiles, selecting from the lines, the lines that are persistent, determining a rectangle in at least one corner of the document based on the persistent lines, and applying heuristics in order to accept or reject a determined rectangle as a table of the document. An apparatus for automatically locating a table in a document applies the method for locating tables in documents.07-12-2012
20100008579SYSTEM AND METHOD FOR IDENTIFYING TEXT-BASED SPAM IN RASTERIZED IMAGES - A system, method and computer program product for identifying spam in an image, including (a) identifying a plurality of contours in the image, the contours corresponding to probable symbols; (b) ignoring contours that are too small or too large; (c) identifying text lines in the image, based on the remaining contours; (d) parsing the text lines into words; (e) ignoring words that are too short or too long from the identified text lines; (f) ignoring text lines that are too short; (g) verifying that the image contains text by comparing a number of pixels of a symbol color within remaining contours to a total number of pixels of the symbol color in the image, and that there is at least one text line after filtration; and (h) if the image contains text, rendering a spam/no spam verdict based on a contour representation of the text that appears after step (f).01-14-2010
20100008580IMAGE PROCESSING APPARATUS AND IMAGE PROCESSING METHOD - An image scanner using an area sensor having a tilt reads a plurality of low-resolution image data having phase shifts from each other, and the low-resolution image data are converted into those on an orthogonal coordinate system by affine transformation. The number of data to be used is decided based on one of these low-resolution image data. The low-resolution image data as many as the designated number of data are saved, and high-resolution image data is generated by executing super-resolution processing.01-14-2010
20120189202IMAGE PROCESSING APPARATUS, IMAGE PROCESSING SYSTEM AND IMAGE PROCESSING METHOD - A handwritten area is separated from image data of printed material in which handwriting has been inserted, and the separated handwritten area is identified as an enclosing line or a class symbol. An image area enclosed within the handwritten area identified as the enclosing line is extracted and acquired as an extracted image. The class symbol is correlated to an enclosing line drawn nearest the class symbol, and the extracted images are classified into groups according to the image areas within the enclosing line correlated to the type of class symbols. The grouped images are organized as listed data.07-26-2012
20090016606METHOD, SYSTEM, DIGITAL CAMERA AND ASIC FOR GEOMETRIC IMAGE TRANSFORMATION BASED ON TEXT LINE SEARCHING - The present invention provides a method, system and/or a digital camera providing a geometrical transformation of deformed images of documents comprising text, by text line tracking, resulting in an image comprising parallel text lines. The transformed image is provided as an input to an OCR program either running in a computer system or in a processing element comprised in said digital camera.01-15-2009
20090016604Invisible Junction Features for Patch Recognition - The present invention uses invisible junctions which are a set of local features unique to every page of the electronic document to match the captured image to a part of an electronic document. The present invention includes: an image capture device, a feature extraction and recognition system and database. When an electronic document is printed, the feature extraction and recognition system captures an image of the document page. The features in the captured image are then extracted, indexed and stored in the database. Given a query image, usually a small patch of some document page captured by a low resolution image capture device, the features in the query image are extracted and compared against those stored in the database to identify the query image. The present invention also includes methods for feature extraction, feature indexing, feature retrieval and geometric estimation.01-15-2009
20110123114CHARACTER RECOGNITION DEVICE AND METHOD AND COMPUTER-READABLE MEDIUM CONTROLLING THE SAME - A character recognition device to recognize characters after preprocessing an input image corrects distortion. The character recognition device includes an image input unit to receive an image acquired by an image device, a character position estimator to calculate a probability value of a position of characters of the image to estimate the position of the characters, an image preprocessor to detect a plurality of edges including the characters from the image and to correct distortion of the edges, and a character recognizer to recognize the characters included in a rectangle formed by the plurality of edges.05-26-2011
20120269435Contact Text Detection in Scanned Images - A device and method for identifying text pixels that are erroneously classified as non-text pixels includes accessing an image region containing a non-text component. For each non-text component within the image region, it is determined whether there are any long line structures within the bounding box defined by the non-text component. If the long line structures are greater than a predefined percentage of the span of the dimension of the bounding box parallel to the line structure, then the line structure is removed. Any remaining non-text pixels within the bounding box are reclassified as text pixels.10-25-2012
20080279453OCR enabled hand-held device - A method of processing image data consistent with certain embodiments involves defining a segment of a visual field using a laser pointer; capturing an image of the segment of the visual field; and processing the captured segment to produce associated text associated with the selected segment. This abstract is not to be considered limiting, since other embodiments may deviate from the features described in this abstract.11-13-2008
20120321188IDENTIFYING INFORMATION RELATED TO A PARTICULAR ENTITY FROM ELECTRONIC SOURCES, USING DIMENSIONAL REDUCTION AND QUANTUM CLUSTERING - Presented are systems and methods for identifying information about a particular entity, including acquiring electronic documents having unstructured text that are selected based on one or more search terms from a plurality of terms related to the particular entity. The acquired documents are tokenized to form a data matrix, and a plurality of eigenvectors is then calculated using the data matrix and the transpose of the data matrix. The variance is then acquired for determining the amount of intra-clustering between the documents, and the acquired documents are then clustered using some of the eigenvectors and the variance.12-20-2012
20100202690METHOD FOR RECOGNIZING TEXT FROM IMAGE - Disclosed is a method of recognizing a text from an image. The method includes dividing the image into a predefined number of regions through a clustering technique; setting a certain area of the regions as a background region; identifying the outer peripheral pixel and inner peripheral pixel of each region except for the background region of the divided regions; setting a region identified as having one of its outer peripheral pixel and its inner peripheral pixel corresponding to a pixel of the background region, as a boundary region; and setting a region identified as having any of its outer peripheral pixel and its inner peripheral pixel not corresponding to a pixel of the background region, as a center text region, and excluding the boundary region from a binary-coding object of the text.08-12-2010
20110158533ROBUST AND EFFICIENT IMAGE IDENTIFICATION - Apparatus for matching a query image against a catalog of images, comprises: a feature extraction unit operative for extracting principle features from said query image; a relationship unit operative for establishing relationships between a given principle feature and other features in the image, and adding said relationships as relationship information alongside said principle features; and a first comparison unit operative for comparing principle features and associated relationship information of said query image with principle features and associated relationship information of images of said catalog to find candidate matches.06-30-2011
20110158532APPARATUS FOR DETECTING TEXT RECOGNITION REGION AND METHOD OF RECOGNIZING TEXT - A text recognition region detecting apparatus and a text recognition method are provided. A text recognition region is detected by expanding a region based on a user-specified position that is input through a simple manipulation by a user. A text recognition is performed on the detected text recognition region, thereby relieving a user from having to precisely input the text region and ensuring the user's convenience.06-30-2011
20130114899CONTENT DESCRIPTOR - An apparatus, method, system and computer-readable medium are provided for generating one or more descriptors that may be associated with content. A teaser for the content may be identified based on contextual similarity between words and/or phrases in the segment and one or more other segments, such as a previous segment. An optical character recognition (OCR) technique may be applied to the content, such as banners or graphics associated with the content in order to generate or identify OCR'd text or characters. The text/characters may serve as a candidate descriptor(s). One or more strings of characters or words may be compared with (pre-assigned) tags associated with the content, and if it is determined that the one or more strings or words match the tags within a threshold, the one or more strings or words may serve as a candidate descriptor(s). One or more candidate descriptor identification techniques may be combined.05-09-2013
20080205758Distortion Correction of a Captured Image - Disclosed are embodiments of systems and methods for eliminating or reducing the distortion in a scanned image. Embodiments of the present invention allow for the automatic pruning, de-skewing, and unwarping of an image using document layout information. In embodiments, dominant baselines may be selected by examining the letter regions on boundary baselines rather than examining the entire document layout. The dominant baselines may then be used to reduce distortion in the image. It shall be noted that the present invention is robust enough to handle many types of content, including different languages, as well as documents with different layouts. The present invention may also be applied to images obtained from bound documents and flat documents.08-28-2008
20080199076Image processing method, image processing apparatus, image forming apparatus, program, and storage medium - An image processing apparatus includes: a document type automatic classification section which determines whether input image data is image data for a text document or not; a newspaper document classification section which determines whether the input image data is image data for a newspaper document or not; a segmentation process section which identifies a page-background region in the input image data; and a color correction section for, if the input image data is classified as the text document but not the newspaper document and if a page-background removal process is to be performed on the input image data, performing a first page-background removal process on the image data, but if the input image data is classified as the text document and the newspaper document, not performing the first page-background removal process on the image data. This makes it possible to prevent deterioration of visual sharpness of the text in the document image printed on the newspaper.08-21-2008
20130121579SOFTWARE FOR TEXT AND IMAGE EDIT RECOGNITION FOR EDITING OF IMAGES THAT CONTAIN TEXT - Software for editing text and images enables a user to select a portion of an image and read the text on the selected image portion via an OCR function. The software enables the user to apply a mask containing the originally read text, that allows the user to type or paste new text to replace the previously read text in the selected image portion. The software also enables a user to edit images by automatically recognizing the borders of fields and/or columns and the background color. As a result, the user can easily modify an image by applying a mask to an image, wherein the mask has new data such as different text, the identical background color or a different background color or different layout, etc. and which may be placed exactly on the recognized borders of the original image.05-16-2013
