Color OCR

In All, Archived, Color OCR, OCR by Chris

Question: Can I OCR scanned color documents reliably, particularly newspaper and magazine scans?

Answer: Although OCR accuracy improves with each new product release, recognition rates on color scans still lag well behind those on bitonal (black & white) documents. Even with the latest OmniPage or ABBYY release, it is quite common for entire blocks of light text, text on textured regions, and reverse-video text to go unrecognized. Many OCR methods still rely on basic thresholding of color files and are mostly calibrated to perform well on bitonal image scans.
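To make the thresholding problem concrete, here is a minimal sketch in Python. It is an illustration only, not the method used by any particular OCR product: the pixel values, the cutoff of 128, and the function names are all assumptions chosen for the example. It converts each RGB pixel to a brightness value and applies one fixed cutoff to the whole row.

```python
def luminance(rgb):
    """Approximate brightness (0-255) using the ITU-R BT.601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def global_threshold(row, cutoff=128):
    """Mark a pixel as ink (1) if it is darker than the cutoff, else paper (0).

    The cutoff of 128 is an assumed, illustrative value.
    """
    return [1 if luminance(p) < cutoff else 0 for p in row]


# Hypothetical sample colors for the three cases discussed above.
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
LIGHT_GRAY = (190, 190, 190)  # faint text on white paper
NAVY = (0, 0, 90)             # dark panel behind reverse-video text

dark_text = [WHITE, BLACK, WHITE, BLACK, WHITE]
light_text = [WHITE, LIGHT_GRAY, WHITE, LIGHT_GRAY, WHITE]
reverse_text = [NAVY, WHITE, NAVY, WHITE, NAVY]

print(global_threshold(dark_text))     # strokes survive as ink
print(global_threshold(light_text))    # strokes disappear entirely
print(global_threshold(reverse_text))  # background is marked as ink instead
```

With a single cutoff, the ordinary dark-on-white row keeps its strokes, the light text falls on the paper side of the cutoff and vanishes, and the reverse-video row comes out with its polarity flipped.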

At the heart of good color OCR is an OCR engine or preprocessing step that accurately finds and lifts (or segments) all the text regions. Applying the right image preprocessing to color files before OCR (e.g., CVISION turbo OCR) is a step in the right direction.
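As a rough idea of what region-by-region text lifting can look like, the sketch below thresholds each region against its own brightness range rather than a page-wide cutoff. This is a simplified stand-in, not the algorithm of CVISION or any other vendor. The function name, the `min_contrast` value, and the rule that text is the minority class in a region are all assumptions made for illustration.

```python
def segment_region(gray, min_contrast=30):
    """Return a 0/1 text mask for one region of grayscale values (0-255).

    min_contrast is an assumed, illustrative setting.
    """
    lo, hi = min(gray), max(gray)
    if hi - lo < min_contrast:
        # Nearly flat region: assume it contains no text.
        return [0] * len(gray)

    # Threshold halfway between the darkest and lightest pixel of THIS region.
    cutoff = (lo + hi) / 2
    dark = [1 if v < cutoff else 0 for v in gray]

    # Assumption: text covers less area than its background, so if the dark
    # class is the majority, the text must be the light pixels (reverse video).
    if sum(dark) * 2 > len(gray):
        return [1 - d for d in dark]
    return dark


light_text = [255, 190, 255, 190, 255]  # faint gray strokes on white
reverse_text = [10, 255, 10, 255, 10]   # white strokes on a dark panel
blank_paper = [250, 252, 251]           # no text at all

print(segment_region(light_text))
print(segment_region(reverse_text))
print(segment_region(blank_paper))
```

Because the cutoff adapts to each region, both the faint strokes and the reverse-video strokes are marked as text, while the blank region is left empty. Real pages need proper region detection and far more robust statistics than this toy rule.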

Another important factor in achieving high recognition rates on color scanned documents is to avoid over-compressing the files prior to OCR. Companies often use a low-quality JPEG setting when capturing color documents directly off the scanner. By the time these files reach the OCR engine some time later, they are already considerably degraded, and recognition rates are appreciably lower. It is better to OCR first, using the highest-quality scanned files available, and then compress the files.
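The toy example below hints at why heavy lossy compression hurts faint text. It is deliberately crude: it snaps brightness values to coarse steps, which is not how JPEG works internally (JPEG quantizes frequency coefficients, not individual pixel values). The sample values and step sizes are assumptions picked to show the effect.

```python
def quantize(gray, step):
    """Crude stand-in for lossy compression: snap values down to a multiple of step."""
    return [(v // step) * step for v in gray]


def contrast(gray):
    """Difference between the lightest and darkest value in the row."""
    return max(gray) - min(gray)


# Hypothetical row: faint strokes (200) on a light background (230).
faint_text = [230, 200, 230, 200, 230]

mild = quantize(faint_text, step=8)    # stands in for a high-quality setting
harsh = quantize(faint_text, step=64)  # stands in for a low-quality setting

print(contrast(faint_text), contrast(mild), contrast(harsh))
```

Under the coarse step, the strokes and the background collapse to the same value, so nothing is left for the OCR engine to find. Running OCR on the original scan and compressing afterwards avoids that loss.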
