One ability that develops naturally after you have seen millions of sample images and their recognition results is that you begin to notice patterns and can instantly identify whether a document will read well, both for full-page document conversion and for field-level recognition. For me it has become more or less second nature, but I can break it into its components.
First is initial image quality. Without identifying any objects on the page yourself, look objectively at the document as a collection of unidentified objects and judge whether the image quality is good. This is determined by the coherence of each object. Are object borders tight and well defined? Do any objects interfere with other objects? Is the background of the image significantly different from all of the objects?
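As a rough illustration of the last question, here is a minimal sketch that measures how far the background sits from the objects in a grayscale image. It is only one possible proxy, not how any particular OCR product works. The function name `background_contrast`, the fixed cutoff of 128, and the tiny sample images are all assumptions made for this example.

```python
from statistics import mean

def background_contrast(pixels, threshold=128):
    """Return the gap between mean light and mean dark pixel values (0-255).

    pixels: rows of grayscale values, where 0 is black and 255 is white.
    threshold: assumed cutoff separating dark objects from a light background.
    """
    flat = [value for row in pixels for value in row]
    dark = [v for v in flat if v < threshold]
    light = [v for v in flat if v >= threshold]
    if not dark or not light:
        return 0  # only one class of pixel present, so nothing to separate
    return mean(light) - mean(dark)

# Hypothetical crisp scan: near-black strokes on a near-white page.
crisp = [[250, 250, 10, 250],
         [250, 10, 10, 250]]

# Hypothetical washed-out scan: gray strokes on a gray page.
faded = [[150, 150, 110, 150],
         [150, 110, 110, 150]]

print(background_contrast(crisp))
print(background_contrast(faded))
```

A large gap suggests the background is clearly distinct from the objects. A small gap suggests the scan may need more contrast before conversion.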
Second is identification of objects. Find the text, graphics, lines, paragraphs, etc. Are their borders far enough apart? Is their type clear? This is most important for text. Is their printing consistent? For example, text that runs from one background color onto another is inconsistent, and so are lines whose straightness changes throughout the document. And can one object be confused for another?
And third, now that you know the objects, how easy is it to determine their value? Is the value obvious, or do you have to look at it for a while to figure it out?
Essentially, the three steps above are what a conversion (OCR, ICR, OMR) product does in order to read a document. With field-level recognition it is a bit more elaborate, but the core is the same. By estimating a document's anticipated accuracy early on, you can adjust your scan or input settings accordingly, even before looking at any technology. Doing this gives you the best chance of success.
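To make the idea of an anticipated accuracy concrete, here is a minimal sketch that combines the three judgments into a single estimate. It assumes each judgment is written down as a score between 0.0 and 1.0, and that the weakest of the three limits the result. The function name, the scores, and the weakest-link rule are all assumptions for illustration, not a formula taken from any conversion product.

```python
def anticipated_readability(image_quality, object_separation, value_clarity):
    """Return (limiting score, name of the weakest check).

    Each argument is a subjective 0.0-1.0 score for one of the three steps:
    image quality, identification of objects, and clarity of their values.
    """
    scores = {
        "image_quality": image_quality,
        "object_separation": object_separation,
        "value_clarity": value_clarity,
    }
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    # Assumption: a document reads only as well as its weakest step allows.
    weakest = min(scores, key=scores.get)
    return scores[weakest], weakest

# Hypothetical document: clean scan and legible values, but crowded objects.
estimate, weakest = anticipated_readability(0.9, 0.4, 0.8)
print(f"anticipated readability {estimate}, limited by {weakest}")
```

The weakest check points to where an adjustment is most likely to help. A low image quality score, for instance, suggests changing scan settings before anything else.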
Chris Riley
Find much more about document technologies at www.cvisiontech.com.