Document Capture & OCR

In All, Archived, OCR, OCR PDF, OCR Software, OCR with Application to the Digital Mailroom by Chris

Document capture, or scanning documents, is the first step in the OCR process. Common capture devices include scanners, digital copiers, MFPs, and fax machines. Technically, the capture process is usually a conversion of light reflected from the page into an electrical signal, which is then digitized into an image.

The way a document is captured affects how useful it is later on. Consider a faxed document. Faxes are usually human readable, but they are often not very machine readable, and the cause is usually the fax capture process itself. Because fax machines typically communicate over phone lines, they scan at low resolution to keep the transmitted file as small as possible. For example, normal fax mode is 203×98 dpi, which means the vertical sampling rate is less than 100 dpi. This low resolution typically produces a smaller CCITT-encoded file, which transfers faster and is still human readable on the receiving end. However, because the file was captured at very low resolution, under less than ideal scanning conditions, OCR recognition rates on it are likely to be low.
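To put rough numbers on this, here is a small Python sketch that compares normal fax mode with a 300 dpi scan. The page size (8.5 × 11 inches), the 300 dpi comparison point, and the 10-point font are assumptions chosen for illustration, and the function names are made up for this example. The byte counts are for an uncompressed 1-bit image, so an actual CCITT file would be smaller.

```python
def pixel_dimensions(width_in, height_in, dpi_x, dpi_y):
    """Return (width, height) in pixels for a page scanned at the given resolution."""
    return round(width_in * dpi_x), round(height_in * dpi_y)


def uncompressed_bytes(width_px, height_px):
    """Size of a 1-bit-per-pixel image, with each row padded to a whole byte."""
    row_bytes = (width_px + 7) // 8
    return row_bytes * height_px


def glyph_height_px(point_size, dpi_y):
    """Approximate height in pixels of text at a given point size (1 point = 1/72 inch)."""
    return point_size / 72 * dpi_y


if __name__ == "__main__":
    # Assumed page: US Letter, 8.5 x 11 inches.
    for label, dpi_x, dpi_y in [("normal fax", 203, 98), ("300 dpi scan", 300, 300)]:
        w, h = pixel_dimensions(8.5, 11, dpi_x, dpi_y)
        size = uncompressed_bytes(w, h)
        glyph = glyph_height_px(10, dpi_y)  # assumed 10-point body text
        print(f"{label}: {w} x {h} px, {size} bytes raw, "
              f"10 pt text is about {glyph:.1f} px tall")
```

Under these assumptions, 10-point text is only about 14 pixels tall in normal fax mode, compared with roughly 42 pixels at 300 dpi, which gives a sense of how much less detail an OCR engine has to work with.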
