OCR & Logical Decomposition
Many OCR users want only basic search from an OCR engine: finding the needle in the haystack. They need to search a collection and retrieve every file that contains a given expression. This use of OCR does not depend on a logical decomposition of the document image. It is sufficient to extract all the text on each page of a document image and feed that text to a full-text search engine. The database then indexes the full text and supports general text-based queries, such as proximity search.
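As a rough illustration of the search-only workflow, the sketch below builds a tiny positional index over per-page OCR text and answers a proximity query. It is not how any particular database engine works; the function names (tokenize, build_index, proximity_search), the page identifiers, and the sample text are all hypothetical, and a production system would rely on a real full-text engine instead.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to {page_id: [token positions]}.

    `pages` is a dict of page id -> raw OCR text (hypothetical input shape).
    """
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for position, term in enumerate(tokenize(text)):
            index[term][page_id].append(position)
    return index


def proximity_search(index, term_a, term_b, max_distance):
    """Return sorted page ids where the two terms occur within max_distance tokens."""
    postings_a = index.get(term_a.lower(), {})
    postings_b = index.get(term_b.lower(), {})
    hits = []
    for page_id in postings_a.keys() & postings_b.keys():
        if any(
            abs(pos_a - pos_b) <= max_distance
            for pos_a in postings_a[page_id]
            for pos_b in postings_b[page_id]
        ):
            hits.append(page_id)
    return sorted(hits)


# Made-up OCR output for two scanned pages.
pages = {
    "scan_001": "The invoice total is due within thirty days.",
    "scan_002": "Invoice number 42. Shipping address follows. "
                "The total appears on the last page.",
}
index = build_index(pages)
near_hits = proximity_search(index, "invoice", "total", 3)
loose_hits = proximity_search(index, "invoice", "total", 10)
```

Note that the index only needs the words and their order within the page text. Nothing about columns, tables, or figures is required for this kind of query.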
There are times, however, when a logical decomposition of the document is required. This happens when part of the document is to be reused in composing another document. In this case, the document must be logically understood, including word reading order, tables, and graphs, so that an excerpt can be used as part of another document. For applications that need this logical decomposition, OCR word accuracy alone is not a sufficient measure for evaluating OCR systems.
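To make the point about word accuracy concrete, here is a minimal sketch, assuming OCR output arrives as words with page coordinates on a two-column page. The word list, the fixed column boundary (column_split_x), and the bag-of-words accuracy measure are all illustrative assumptions; real layout analysis has to detect columns, tables, and graphics rather than being told where they are.

```python
from collections import Counter

# Hypothetical OCR output: (word, x, y), origin at the top-left of the page.
words = [
    ("Left", 10, 10), ("column", 60, 10), ("Right", 310, 10), ("column", 370, 10),
    ("continues", 10, 30), ("here.", 90, 30), ("continues", 310, 30), ("there.", 390, 30),
]

truth = "Left column continues here. Right column continues there.".split()


def naive_order(words):
    """Read straight across the page, line by line, ignoring columns."""
    return [w for w, _, _ in sorted(words, key=lambda t: (t[2], t[1]))]


def column_aware_order(words, column_split_x):
    """Read the left column top to bottom, then the right column.

    column_split_x is an assumed, hand-picked boundary between the columns.
    """
    def by_line(t):
        return (t[2], t[1])

    left = [t for t in words if t[1] < column_split_x]
    right = [t for t in words if t[1] >= column_split_x]
    return [w for w, _, _ in sorted(left, key=by_line) + sorted(right, key=by_line)]


def bag_of_words_accuracy(ocr_words, truth_words):
    """Fraction of ground-truth words recovered, ignoring their order."""
    matched = sum((Counter(ocr_words) & Counter(truth_words)).values())
    return matched / len(truth_words)


naive = naive_order(words)
ordered = column_aware_order(words, column_split_x=200)
naive_accuracy = bag_of_words_accuracy(naive, truth)
ordered_accuracy = bag_of_words_accuracy(ordered, truth)
```

Both orderings recover every word, so an order-insensitive word accuracy score cannot tell them apart, yet only the column-aware ordering yields text that could be excerpted into another document.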





