Patent application number | Description | Published |
20080240618 | Image-document retrieving apparatus, method of retrieving image document, program, and recording medium - Feature vectors used in discrimination of images include information on feature blocks of images in an image-document retrieving apparatus of the present invention. Text areas of a page image document are combined to form rectangular images. On the basis of information on the rectangular images that are extracted, a geometric structure of the page is analyzed, the page image document is divided into plural blocks, and then a plurality of feature blocks describing features of the page image document are selected from the plural blocks. The feature vectors are constituted of information on the feature blocks thus selected. This makes it possible to provide an image-document retrieving apparatus and a method of retrieving image documents that improve the accuracy of retrieving image documents containing mainly text and graphics. | 10-02-2008 |
20080244378 | Information processing device, information processing system, information processing method, program, and storage medium - An information processing device includes: a feature extracting section for extracting, as format information, a format feature of a process-target document from image data of the process-target document, on which filling-in spaces of plural items are printed; a document recognizing section for comparing the format information of the process-target document with registered format information stored in a storage device, and specifying a registered document that corresponds to the process-target document, the registered format information regarding format features of registered documents; a data acquiring section for converting characters in the image data of the process-target document into text data; and a distributing section for grouping the image data and text data of the characters into plural groups according to a separation rule that is set for the registered document, the characters being written in the fill-in spaces of the items of the process-target document, and for transmitting the different groups to different external devices. With this, information such as personal information to be protected can be processed, preventing an operator dealing with the information from obtaining the whole information. | 10-02-2008 |
20090028435 | CHARACTER IMAGE EXTRACTING APPARATUS AND CHARACTER IMAGE EXTRACTING METHOD - In an extracting step, the extracting portion obtains a linked component composed of a plurality of mutually linking pixels from a character string region composed of a plurality of characters, and extracts section elements from the character string region, the section elements each being surrounded by a circumscribing figure that circumscribes the linked component. In the first altering step, the first altering portion combines section elements at least having a mutually overlapping part among the extracted section elements so as to prepare a new section element. In the first selecting step, the first selecting portion determines a reference size in advance and selects section elements having a size greater than the reference size, from among the section elements altered in the first altering step. | 01-29-2009 |
20090028445 | CHARACTER IMAGE FEATURE DICTIONARY PREPARATION APPARATUS, DOCUMENT IMAGE PROCESSING APPARATUS HAVING THE SAME, CHARACTER IMAGE FEATURE DICTIONARY PREPARATION PROGRAM, RECORDING MEDIUM ON WHICH CHARACTER IMAGE FEATURE DICTIONARY PREPARATION PROGRAM IS RECORDED, DOCUMENT IMAGE PROCESSING PROGRAM, AND RECORDING MEDIUM ON WHICH DOCUMENT IMAGE PROCESSING PROGRAM IS RECORDED - An image of a character string composed of M pieces of characters is clipped from a document image, and the image is divided character by character, and image features of each character image are extracted. On the basis of the image features, N (N>1, integer) pieces of character images in descending order of degree of similarity are selected as candidate characters from a character image feature dictionary which stores the image features of character image in units of character, and the first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting the first column of the first index matrix, is subjected to a lexical analysis according to a predetermined language model, whereby a second index matrix adjusted into a character string which makes sense is prepared to be utilized for searching. | 01-29-2009 |
20090028446 | DOCUMENT IMAGE PROCESSING APPARATUS, DOCUMENT IMAGE PROCESSING METHOD, DOCUMENT IMAGE PROCESSING PROGRAM, AND RECORDING MEDIUM ON WHICH DOCUMENT IMAGE PROCESSING PROGRAM IS RECORDED - An image of a character string composed of M pieces of characters is clipped from a document image, and the image is divided into separate characters. Image features of each character image are extracted. Based on the image features, N (N>1, integer) pieces of character images in descending order of degree of similarity are selected as candidate characters, from a character image feature dictionary which stores the image features of character image in units of character, and a first index matrix of M×N cells is prepared. A candidate character string composed of a plurality of candidate characters constituting a first column of the first index matrix, is subjected to a lexical analysis according to a language model, whereby a second index matrix having a character string which makes sense is prepared. In the language model, statistics are taken and then, the lexical analysis is performed. | 01-29-2009 |
20090030882 | DOCUMENT IMAGE PROCESSING APPARATUS AND DOCUMENT IMAGE PROCESSING METHOD - There is provided a document image processing apparatus which can reduce the trouble of finding a desired heading in a document image. A heading region extracting portion searches an index information DB and extracts a heading region containing a search keyword. An order setting portion automatically sets in line with a predetermined rule an order of the heading regions extracted by the heading region extracting portion. On a displaying portion is displayed a document image on which the heading regions extracted by the heading region extracting portion are highlighted in accordance with the order set by the order setting portion. A display order of search results may be set by determining importance of the extracted heading regions based on the number of the search keyword and features of character images in the heading regions. | 01-29-2009 |
20090245640 | IMAGE DETERMINATION APPARATUS, IMAGE SEARCH APPARATUS AND A RECORDING MEDIUM ON WHICH AN IMAGE SEARCH PROGRAM IS RECORDED - A preprocessing section binarizes input image data and calculates a total black pixel ratio. A feature extracting section detects connected components included in the binary image data and detects circumscribing bounding boxes of the connected components. Predetermined connected components are removed from all of the connected components based on the sizes of the detected circumscribing bounding boxes and bounding box black pixel ratios. By using the connected components that remain after removing the unnecessary connected components, a histogram is generated by specifying the sizes of the circumscribing bounding boxes as classes and numbers of the connected components as the frequencies of occurrence. A determining section determines whether the input image data is document image data or non-document image data based on information related to the generated histogram and the total black pixel ratio. | 10-01-2009 |
20090263025 | IMAGE DETERMINATION APPARATUS, IMAGE SEARCH APPARATUS AND COMPUTER READABLE RECORDING MEDIUM STORING AN IMAGE SEARCH PROGRAM - A preprocessing section binarizes input image data and calculates a total black pixel ratio. A feature extracting section detects connected components contained in the binarized image data and detects circumscribing bounding boxes that circumscribe these connected components, respectively. Based on sizes of the circumscribing bounding boxes detected and numbers of black pixels contained therein, predetermined connected components are removed. A determining section generates an edge map by using the residual connected components, and performs two-dimensional fast Fourier transform thereon to generate spectral data. The determining section performs two-dimensional fast Fourier transform on template images to generate spectral data. The determining section determines, based on these pieces of spectral data, whether or not a circular shape is contained in the input image data. | 10-22-2009 |
20120014601 | HANDWRITING RECOGNITION METHOD AND DEVICE - A handwriting recognition method and a handwriting recognition device are provided to recognize a character sequence continuously inputted by a user for convenience. The present method comprises steps of calculating various features of the inputted character sequence which include single character recognition accuracy features and space geometry features of different stroke combinations in the inputted character sequence, calculating segmentation reliabilities of respective stroke combinations in different segmented patterns by using a probabilistic model in which coefficients of the probabilistic model are estimated by a parameter estimation method through sample trainings, recognizing characters in different writing patterns by using a multiple-template matching method when performing single character recognition of the stroke combinations, searching for the best segmentation path and conducting post-processing to optimize the recognition results. The present method and device have advantages of simple structure, low hardware requirement, fast recognition speed and high recognition accuracy and can be implemented in an embedded system. | 01-19-2012 |
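
The M×N first index matrix described in the entries for applications 20090028445 and 20090028446 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstracts: the function name `build_index_matrix`, the representation of image features as short numeric tuples, and the use of Euclidean distance as the (inverse) similarity measure. The lexical-analysis step that produces the second index matrix is not modeled.

```python
from math import dist  # Euclidean distance, Python 3.8+


def build_index_matrix(char_features, dictionary, n):
    """Return an M x N matrix of candidate characters.

    char_features: list of M feature tuples, one per clipped character image.
    dictionary:    mapping character -> feature tuple (a hypothetical stand-in
                   for the character image feature dictionary).
    n:             number of candidates kept per character (N > 1).
    """
    if n < 2:
        raise ValueError("the abstracts require N > 1")
    matrix = []
    for features in char_features:
        # Assumption: smaller distance means higher similarity, so sorting by
        # ascending distance gives descending order of similarity.
        ranked = sorted(dictionary.items(), key=lambda item: dist(features, item[1]))
        matrix.append([char for char, _ in ranked[:n]])
    return matrix


# Toy dictionary and three "character images" (M = 3), purely illustrative.
dictionary = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0), "d": (3.0, 3.0)}
char_features = [(0.1, 0.0), (0.9, 0.1), (0.0, 1.8)]

matrix = build_index_matrix(char_features, dictionary, n=2)

# The first column holds the most similar candidate for each position; the
# abstracts describe passing this candidate string to lexical analysis.
first_column = "".join(row[0] for row in matrix)
```

Each row of `matrix` corresponds to one character position and each column to a candidate rank, so the first column is the candidate character string that the abstracts subject to lexical analysis.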