OCR Software

In All, Archived, OCR, OCR Accuracy, OCR Software by Chris

Question: Is there any way to automatically compare different OCR products to determine which is more reliable?

Answer: Most OCR products can return a confidence factor for each recognized word, but these confidence measures are often unreliable themselves. When comparing two OCR products, an automated method for measuring their relative reliability is as follows:
i. Assemble a dataset of varied documents: bitonal, color, low dpi, textured backgrounds, reverse video text, skewed pages, etc.
ii. Run each OCR engine being tested on every document in the dataset.
iii. Using a language dictionary, count for each document and each engine how many of the output words appear in the dictionary, treating a dictionary match as a correctly recognized word.
iv. Sum these counts across the dataset to get the cumulative recognition rate of each OCR engine, then compare the rates.
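
Steps iii and iv can be sketched roughly as below. This is only an illustration, not any product's API: it assumes the OCR output of each engine has already been saved as plain text, and the names dictionary_hit_rate and compare_engines are hypothetical. Note that a dictionary match is only a proxy for correctness, since a misread word can still be a valid word.

```python
import string
from collections import defaultdict


def dictionary_hit_rate(text, dictionary):
    """Return (hits, total) for one document's OCR text.

    A word counts as a hit if, after stripping surrounding punctuation
    and lowercasing, it appears in the dictionary (a set of words).
    """
    words = [t.strip(string.punctuation).lower() for t in text.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    hits = sum(1 for w in words if w in dictionary)
    return hits, len(words)


def compare_engines(outputs, dictionary):
    """Compute the cumulative dictionary hit rate per engine.

    `outputs` maps a (hypothetical) engine name to a list of OCR texts,
    one per document in the dataset.
    """
    totals = defaultdict(lambda: [0, 0])  # engine -> [hits, words]
    for engine, docs in outputs.items():
        for text in docs:
            hits, count = dictionary_hit_rate(text, dictionary)
            totals[engine][0] += hits
            totals[engine][1] += count
    return {e: (h / c if c else 0.0) for e, (h, c) in totals.items()}


if __name__ == "__main__":
    # Tiny made-up example: the same page as read by two engines.
    dictionary = {"the", "quick", "brown", "fox"}
    outputs = {
        "engine_a": ["The quick brown fox."],
        "engine_b": ["Tbe qu1ck brown fox."],
    }
    for engine, rate in compare_engines(outputs, dictionary).items():
        print(f"{engine}: {rate:.2f}")
```

Rates are accumulated over all words in the dataset rather than averaged per document, so long documents weigh more than short ones; averaging per document is an equally reasonable choice if every document should count the same.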
