Document Capture & OCR
Document capture is the first step in the OCR process and is often simply called scanning. Common capture devices include scanners, digital copiers, multifunction printers (MFPs), fax machines, and cell phones. Technically, capture usually converts light reflected from the page into an electrical signal, typically with an image sensor such as a charge-coupled device (CCD).
The way a document is captured affects how useful it is afterwards. Consider a faxed document. Although usually human readable, faxed documents are often not very machine readable, and the cause is usually the fax capture process itself. Because fax machines typically communicate over phone lines, they scan at low resolution to keep the transmitted file as small as possible. Normal fax mode, for example, is 203x98 dpi, which means that the vertical sampling rate is less than 100 dpi. This low resolution yields a smaller CCITT-encoded file that transfers faster and is still human readable at the receiving end. However, because the file was captured at very low resolution, OCR recognition rates are likely to be poor.
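To make the resolution gap concrete, the following minimal sketch counts how many pixel rows span one line of type at the vertical fax resolution and at a 300 dpi scan. The 10-point type size and the helper name `pixels_across` are assumptions chosen for illustration only.

```python
def pixels_across(length_inches: float, dpi: float) -> float:
    """Number of samples taken across a given physical length."""
    return length_inches * dpi


POINTS_PER_INCH = 72
# Body height of 10-point type (an assumed example size), in inches.
glyph_height_in = 10 / POINTS_PER_INCH

fax_rows = pixels_across(glyph_height_in, 98)    # normal fax mode, vertical axis
scan_rows = pixels_across(glyph_height_in, 300)  # 300 dpi scan

print(f"fax:  {fax_rows:.1f} pixel rows per line of type")
print(f"scan: {scan_rows:.1f} pixel rows per line of type")
```

Under these assumptions the fax image has roughly a third as many rows to describe each character, which leaves far less detail for an OCR engine to work with.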
So there is generally a tradeoff between file size and recognition rates: the higher the scanning resolution, up to about 300 dpi, the higher the OCR text recognition rate, but the larger the captured file.
A similar relationship exists between color depth and OCR recognition rates: the more bits per pixel, the better the recognition. Consequently, if the same document is scanned at 150 dpi (dots per inch) in both bitonal (black and white) and greyscale, the greyscale file will have better recognition rates. Reducing the file size with excessive JPEG quantization before OCR will also lower recognition rates.
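The cost of extra resolution and color depth can be estimated with simple arithmetic. The sketch below computes the uncompressed size of a scanned page; the function name `raw_size_bytes`, the US Letter page size, and the assumption that each row is padded to a whole byte are illustrative choices, and real files are usually compressed and therefore smaller.

```python
import math


def raw_size_bytes(width_in, height_in, dpi_x, dpi_y, bits_per_pixel):
    """Uncompressed image size, assuming each row is padded to a whole byte."""
    width_px = round(width_in * dpi_x)
    height_px = round(height_in * dpi_y)
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


# US Letter page (8.5 x 11 in), an assumed example size.
bitonal_150 = raw_size_bytes(8.5, 11, 150, 150, 1)    # 1 bit per pixel
greyscale_150 = raw_size_bytes(8.5, 11, 150, 150, 8)  # 8 bits per pixel
bitonal_300 = raw_size_bytes(8.5, 11, 300, 300, 1)

print(f"150 dpi bitonal:   {bitonal_150:>9,} bytes")
print(f"150 dpi greyscale: {greyscale_150:>9,} bytes")
print(f"300 dpi bitonal:   {bitonal_300:>9,} bytes")
```

Before compression, the greyscale page is about eight times the size of the bitonal page at the same resolution, and doubling the resolution roughly quadruples the size.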
There is usually some degree of skew, or page slant, introduced during capture. This is true for both manually fed and auto-feed devices. Many capture devices have built-in image processing that includes deskew, despeckle, and thresholding.
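As a rough illustration of two of these operations, the sketch below thresholds a tiny greyscale image to bitonal and then removes isolated specks. The fixed cutoff of 128 and the "no inked neighbour" rule are simplifying assumptions for this example, not a description of what any particular device does.

```python
def threshold(gray, cutoff=128):
    """Map 8-bit greyscale rows to bitonal: 1 = ink (dark), 0 = paper."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def despeckle(bitonal):
    """Clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(bitonal), len(bitonal[0])
    out = [row[:] for row in bitonal]
    for y in range(h):
        for x in range(w):
            if bitonal[y][x] != 1:
                continue
            neighbours = sum(
                bitonal[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0  # isolated speck, treat as noise
    return out


gray = [
    [250, 250, 250, 250, 250],
    [250,  20, 250, 250, 250],  # isolated dark speck
    [250, 250, 250, 250, 250],
    [250, 250, 250,  30,  40],  # part of a character stroke
    [250, 250, 250,  35, 250],
]
clean = despeckle(threshold(gray))
for row in clean:
    print("".join("#" if px else "." for px in row))
```

The isolated speck is cleared while the three connected stroke pixels survive. Deskew is omitted here because it requires estimating and applying a rotation.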





