Rating an OCR System
Rating an OCR system is a little tricky. Very often, one OCR system excels on one type of document while another excels on a second type. Unless one is testing for Consumer Reports, the best OCR test is one run on the company's own documents. Some corporate documents are clean, first-generation scans; others are third-generation rescans. While one can evaluate a system generally, across a broad spectrum of documents, the most relevant results come from testing on company-specific documents.
At a minimum, a rating should cover OCR accuracy and processing speed. The tester also needs to be familiar with each system's OCR controls. Many OCR systems let the user trade speed for accuracy, and the fast setting can often be 3x-9x faster than the slower, most accurate setting. Any such test therefore needs to be "apples to apples": the systems should be compared at equivalent settings.
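One way to keep a speed test "apples to apples" is to run every engine over the same documents at the same setting and time each run. The sketch below is only an illustration: `run_ocr` stands for a hypothetical wrapper around each OCR system, and the `mode` argument assumes the tester has already mapped each system's own speed/accuracy control to a comparable setting.

```python
import time

def time_engine(run_ocr, documents, mode):
    """Run one engine over all documents at one setting and time the whole pass.

    run_ocr is a hypothetical callable (document, mode) -> text that wraps
    whichever OCR system is under test.
    """
    start = time.perf_counter()
    outputs = [run_ocr(doc, mode) for doc in documents]
    elapsed = time.perf_counter() - start
    return outputs, elapsed

def compare_engines(engines, documents, mode):
    """Time every engine on the same documents at the same setting."""
    results = {}
    for name, run_ocr in engines.items():
        outputs, elapsed = time_engine(run_ocr, documents, mode)
        results[name] = {"seconds": elapsed, "outputs": outputs}
    return results
```

Running `compare_engines` once per setting (for example, once for the fast setting and once for the most accurate one) gives timings that can be compared across systems, and the saved outputs can then be scored for accuracy.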
OCR accuracy can be tested programmatically, by running the OCR output through a dictionary lookup. It can also be tested manually, by having a person compare the OCR output against the original documents. The programmatic approach tends to be much faster, but it is a less exact measure: valid OCR output such as an invoice number, a part number, or a non-English phrase will not appear in the dictionary and may be counted as an error.
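A dictionary-lookup check can be sketched in a few lines. The function name, the whitespace tokenization, and the tiny word set below are assumptions made for illustration; a real test would load a full word list and would probably handle hyphenation and numbers more carefully.

```python
import string

def dictionary_hit_rate(ocr_text, dictionary):
    """Return (fraction of words found in the dictionary, list of misses).

    Words are split on whitespace, stripped of surrounding punctuation,
    and lowercased before the lookup.
    """
    words = [w.strip(string.punctuation).lower() for w in ocr_text.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    if not words:
        return 0.0, []
    misses = [w for w in words if w not in dictionary]
    return (len(words) - len(misses)) / len(words), misses

# Illustrative word list only; a real run would use a complete dictionary.
dictionary = {"the", "invoice", "total", "is", "due", "part"}
rate, misses = dictionary_hit_rate("The invoice tota1 is due. Part AB-1234", dictionary)
```

In this example the misread "tota1" is flagged, but so is the part number "AB-1234", which may well be correct OCR output. Reviewing the list of misses by hand is one way to separate true errors from valid non-dictionary items.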





