Introduction to OCR
Optical Character Recognition (OCR) is the process of converting scanned paper documents into searchable electronic files. In many office applications, such as invoice processing, converting paper documents into electronic documents yields clear time and cost savings. For example, entering document information into a database, also known as field coding, is often very expensive. When electronically converted files are used instead of the original paper, field coding can be sent offshore or even automated.
The field of OCR predates the advent of computers, with the earliest OCR-related patents dating back as far as 1929. Of course, many of the significant advances in OCR are directly tied to the computer age, and usually the more advanced an OCR system is, the more computing resources (e.g., a faster CPU) it requires. OCR and its related discipline, ICR (Intelligent Character Recognition), are changing the way industry handles its documents. ICR is defined as the computer translation of manually entered (handwritten or hand-printed) text characters into machine-readable characters.
In many application areas, including legal, accounting, banking, digital libraries, insurance, remote backups, and records management, OCR is automating the way businesses process files. Accurate OCR lends itself directly to data extraction, which reduces the costs associated with form processing.
In this OCR primer, we review the basic concepts of Optical Character Recognition. We also look at the technical problems that must be solved to obtain accurate OCR results. We consider some of the complex recognition problems that can arise, along with suggestions for obtaining more accurate OCR results.





