Tweaking the System to Optimize OCR Performance
Controlling the Document Formation
Part of OCR testing and evaluation is controlling the document formation process. Sometimes OCR recognition rates are poor because of document imaging conditions that can be changed. For example, an invoice may have a background that is not white but a textured blue, with black foreground text. If the invoice is captured in black and white, some of the texture elements are lifted into the foreground along with the text, and they interfere with recognition. One solution is to modify the document capture process to capture in color and threshold in the color space, so that the texture and the text are properly separated.
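The difference between thresholding in gray space and in color space can be illustrated with a minimal sketch. The pixel values, the thresholds (128 and 80), and the function names below are assumptions chosen for illustration only, and the rule "a pixel is text only if it is dark in every channel" is one simple possibility, not a prescribed method.

```python
# Illustrative sketch: hypothetical colors and thresholds, not measured values.

def luma(pixel):
    """Approximate gray level of an (r, g, b) pixel (ITU-R BT.601 weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def binarize_gray(image, threshold=128):
    """Black-and-white capture: mark every pixel darker than the threshold as ink."""
    return [[1 if luma(p) < threshold else 0 for p in row] for row in image]

def binarize_color(image, threshold=80):
    """Color-space rule: ink must be dark in all three channels at once."""
    return [[1 if max(p) < threshold else 0 for p in row] for row in image]

# Assumed colors for a textured blue invoice with black text.
BACKGROUND = (150, 180, 230)  # light blue paper
TEXTURE = (20, 30, 160)       # dark blue texture element
TEXT = (15, 15, 15)           # black foreground text

image = [
    [BACKGROUND, TEXTURE, BACKGROUND, TEXT],
    [TEXTURE, BACKGROUND, TEXT, TEXT],
]

gray_mask = binarize_gray(image)    # texture pixels are lifted as ink
color_mask = binarize_color(image)  # only the black text is kept
```

With these assumed values the dark blue texture falls below the gray threshold and is lifted together with the text, whereas its high blue channel excludes it under the color rule. Real documents would need thresholds tuned to the actual paper and ink.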
Another example is a form printed in black with foreground text that is also black, so that the form and the text interfere with each other. For instance, text may touch the grid lines of the form. If the form is predictable, or comes from a set of forms known a priori, then the system can identify the form and “remove” it prior to the OCR process.
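A minimal sketch of this idea follows, assuming that the page and the candidate forms are already binarized (1 = ink) and registered to the same pixel grid. The template names, the overlap score, and the helper functions are hypothetical and serve only to illustrate identifying a known form and masking it out.

```python
# Illustrative sketch: assumes binarized, pre-registered images of equal size.

def template_score(page, template):
    """Fraction of the template's form pixels that are also ink on the page."""
    form_pixels = 0
    matched = 0
    for page_row, template_row in zip(page, template):
        for ink, form in zip(page_row, template_row):
            if form:
                form_pixels += 1
                matched += ink
    return matched / form_pixels if form_pixels else 0.0

def identify_form(page, templates):
    """Pick the known form that best explains the ink on the page."""
    return max(templates, key=lambda name: template_score(page, templates[name]))

def remove_form(page, template):
    """Clear every page pixel that belongs to the form, keeping the rest."""
    return [
        [0 if form else ink for ink, form in zip(page_row, template_row)]
        for page_row, template_row in zip(page, template)
    ]

# Two hypothetical forms known a priori.
templates = {
    "underline": [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [1, 1, 1, 1, 1]],
    "left_rule": [[1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]],
}

# Scanned page: a block of text sitting directly on a ruled line.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
]

best = identify_form(page, templates)
text_only = remove_form(page, templates[best])
```

Plain masking also deletes any text strokes that overlap the form lines, so a practical system would likely need to repair those strokes after removal. That step is not shown here.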