Optical Character Recognition - Font and Size Issues
Optical Character Recognition (OCR) is a technology for automatically recognizing characters in scanned documents. The font and size used in the source document play a very important role in OCR. The most common problem with OCR is misinterpretation, where one character is confused for another. A classic example is the letter `O' and the digit `0'. A human reader can tell them apart, but the software may fail to do so if it is not sophisticated enough. One should therefore always use the best font and size for OCR in the documents being scanned.
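One way software can reduce the `O' versus `0' confusion is to look at the surrounding characters after recognition. The following is only a minimal sketch of that idea, not how any particular OCR product works. The names LETTER_TO_DIGIT, fix_token and fix_text are hypothetical, and the rule that a token containing at least one digit and otherwise only look-alike letters must be a number is an assumption.

```python
import re

# Hypothetical look-alike table: letters that OCR may output in place of a digit.
LETTER_TO_DIGIT = {"O": "0"}


def fix_token(token: str) -> str:
    """Replace look-alike letters with digits in tokens that look numeric."""
    digits = sum(ch.isdigit() for ch in token)
    lookalikes = sum(ch in LETTER_TO_DIGIT for ch in token)
    # Assumption: only rewrite tokens that contain at least one real digit
    # and consist of nothing but digits and look-alike letters.
    if digits == 0 or digits + lookalikes != len(token):
        return token
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in token)


def fix_text(text: str) -> str:
    """Apply fix_token to every word-like token in the recognized text."""
    return re.sub(r"\w+", lambda m: fix_token(m.group(0)), text)


if __name__ == "__main__":
    print(fix_text("Invoice 2O19 scanned with OCR"))
```

Words such as "OCR" are left alone because they contain no digits. A token like "2O19" is rewritten because everything else in it is a digit.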
Best Font and Size for OCR
Using the best font and size for OCR is crucial to obtaining good results. Early OCR devices read the output of printers that served as data processing devices, and could decipher only the particular font that the printer provided. Later, as the technology advanced, there was a need for a standardized, device-independent set of alphanumeric characters, known as OCR fonts. Two fonts, OCR A and OCR B, were designed for this purpose and are considered the best fonts for OCR scanning. OCR A is supported by almost all scanners available today, mainly for its size and reliability. Text in this font is placed on a line of its own so that, when scanned, it stays separate from the text intended for human readers. OCR B is a later design that is easier for people to read. In both fonts, the same characters serve the scanner and the human reader. OCR A and OCR B, at their standard sizes, are hence considered the best fonts for OCR scanning.
Optimal OCR Results
The best OCR results depend on several factors, the most important being the font and size of the text. Other notable factors are color, contrast, brightness, and the density of the content. The best font size for OCR is 10 to 12 points. Smaller sizes tend to produce poor recognition. Text larger than 72 points is treated as an image and should be avoided. Dark, bold letters on a light background, or light letters on a dark background, usually yield good results. The text should be well laid out, with good spacing between words and sentences. If the source file contains Asian languages, scanning at 300 dpi is recommended for accurate results.
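The size guidelines above can be turned into a quick check. The sketch below is illustrative only. The function names are hypothetical, the thresholds are simply the 10 to 12 point and 72 point figures from the text, and the pixel estimate assumes the usual convention that one point is 1/72 inch. Actual glyphs are somewhat smaller than the nominal font size.

```python
def nominal_height_px(point_size: float, dpi: int) -> float:
    """Nominal font height in pixels at a given scan resolution.

    Assumes 1 point = 1/72 inch; real glyphs occupy less than this height.
    """
    return point_size * dpi / 72.0


def classify_font_size(point_size: float) -> str:
    """Classify a font size using the guideline figures from the text."""
    if point_size < 10:
        return "too small: recognition may be poor"
    if point_size <= 12:
        return "recommended"
    if point_size <= 72:
        return "larger than recommended"
    return "too large: may be treated as an image"


if __name__ == "__main__":
    for size in (8, 11, 24, 96):
        px = nominal_height_px(size, dpi=300)
        print(f"{size} pt at 300 dpi is about {px:.0f} px: {classify_font_size(size)}")
```

At 300 dpi, 12 point text has a nominal height of 50 pixels. The same text scanned at a lower resolution gives the software proportionally fewer pixels per character to work with.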