Form Design and Scan Settings - Best Practices
There are many factors that come into play when integrating data capture technology. Because of the interpretive nature of this technology, there are also many nuances to contend with. Even so, there are some clear ways to make the integration of data capture technology more accurate. Below are some of the primary influences on data capture accuracy that all organizations should consider.
The way forms are designed can dramatically affect data capture accuracy when they are scanned and processed. Organizations that control the creation of their forms are best positioned to influence this factor. The best practices for printing forms depend on whether the form is fixed or semi-structured. Fixed forms offer the most control, and thus the greatest impact, but semi-structured typographic forms also have potential for improvement.
Fixed Form Design
Does the vendor allow organizations to try the software for a period of time in a production environment? Data capture software is often complex enough that a trial without guidance or initial setup does more harm than good. If this is the case with a particular vendor, they should offer to do a setup for the organization and provide a trial that is operator mode only, or clearly explain the skill level required and what one may expect to encounter when performing setup without training.
1. Corner Stones
Make sure your form has a corner stone in each corner of the page. Each corner stone should sit at a 90-degree angle to its neighbors. The ideal corner stone is a black 5 mm square.
2. Form Title
Use a clear title printed at 24 points or larger in a font that is not stylized.
3. Completion Guide
It is optional, but sometimes useful, to print a guide at the top of the form showing how best to fill in the type of fields the form uses.
4. Mono-Spaced Fields by Data Type
For the fields to be completed, it is best to use field types that separate the input character by character. Each character block should be 4 mm x 5 mm, and blocks should be separated by 2 mm or more. In order of preference, the best field types are letters separated by dotted frames, letters separated by drop-out color frames, and letters separated by complete square frames.
5. Segmented Fields by Data Type
For certain fields, it is important to segment the field into portions to enhance ICR accuracy. The best example is a date: instead of having one field for the complete date, split it into three separate parts, the first being a month field, the next a day field, and the last a year field. The same can be done with numbers, codes, and phone numbers.
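As an illustration of why segmenting helps, the sketch below recombines three separately recognized date segments and validates them. The function name and the sample values are hypothetical and not taken from any particular data capture product; the point is only that each small segment can be checked on its own before the full date is assembled.

```python
from datetime import date

def combine_date_segments(month: str, day: str, year: str) -> date:
    """Join three separately recognized segments into one validated date."""
    # Each segment is recognized on its own, so each can be restricted to digits.
    for name, text in (("month", month), ("day", day), ("year", year)):
        if not text.strip().isdigit():
            raise ValueError(f"{name} segment is not numeric: {text!r}")
    # date() rejects impossible combinations such as month 13 or 31 February.
    return date(int(year), int(month), int(day))

# Hypothetical recognition results for the month, day, and year fields.
print(combine_date_segments("02", "28", "2023"))
```

A misread such as the letter O in place of a zero is caught at the segment level, which narrows the correction down to one short field instead of the whole date.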
6. Separate Fields
Separate each field by 3 mm or more.
7. Consistent Fields
Make sure the form uses consistent field types.
8. Form Breaks
It is okay to break the form up into sections and separate those sections with solid lines. This often helps with template matching.
9. Placement of Field Names
This applies to the text that indicates what a field is, such as “first name” or “last name”. It is best to left-justify this text and place it to the left of the field at a distance of 5 mm or more. DO NOT put the field descriptor in dropout in the field itself.
10. Barcode Form Identifiers
Barcode form identifiers are useful in form identification. Use a unique ID per form page and place the barcode at the bottom of the page, at least 10 mm from any field.
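The dimensions in the list above are given in millimeters, while scanned images are measured in pixels. The following is a minimal sketch of the conversion, assuming a 300 DPI scan (the resolution recommended in the scan settings section of this document). The helper names and the 10-character field are illustrative assumptions.

```python
MM_PER_INCH = 25.4

def mm_to_px(mm: float, dpi: int = 300) -> int:
    """Convert millimeters to the nearest whole pixel at the given resolution."""
    return round(mm / MM_PER_INCH * dpi)

def field_width_mm(chars: int, box_w_mm: float = 4.0, gap_mm: float = 2.0) -> float:
    """Width of a character-by-character field: n boxes plus n - 1 gaps."""
    if chars < 1:
        raise ValueError("a field needs at least one character box")
    return chars * box_w_mm + (chars - 1) * gap_mm

# Dimensions quoted in the fixed form guidelines.
guidelines_mm = {
    "corner stone side": 5,
    "character box width": 4,
    "character box height": 5,
    "gap between character boxes": 2,
    "gap between fields": 3,
    "field name to field": 5,
    "barcode to nearest field": 10,
}
for label, mm in guidelines_mm.items():
    print(f"{label}: {mm} mm is about {mm_to_px(mm)} px at 300 DPI")

# Hypothetical 10-character field using the minimum 2 mm gap.
width = field_width_mm(10)
print(f"10-character field: {width} mm is about {mm_to_px(width)} px at 300 DPI")
```

Because the gaps are minimums, a real layout may be wider than this calculation suggests; the sketch only gives a lower bound for planning page space.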
Semi-Structured Form Design
1. Sufficient Field Space
Provide sufficient space in each field for data to be entered.
2. Limit Use of Lines
Entered text often ends up printed on top of lines, which is problematic no matter which technology or imaging tool is used.
3. Field Names
Print field labels to the left of input text. It’s best not to allow input text to be below field labels, as the field label then often interferes with OCR.
4. Effective Dropout
When using dropout, make sure the form has some black-only elements. If all reference elements on the form drop out, the data capture software has no reference points to find even the first field. It’s best to have field names as black text that will show up in a scan.
Scan Settings
Proper scan settings are absolutely critical to obtaining the highest level of accuracy in data capture. While there are many scan settings that are based on document type, there are a few ways to ensure that all documents are scanned properly.
1. Resolution
The optimal resolution at which to scan documents for data capture is 300 DPI. This setting offers the best balance of accuracy and scan speed. Companies working with documents that contain small fonts or hand print may consider scanning at a higher resolution, but this is rarely necessary.
2. Color Scanning
To ensure that the data capture software has the greatest possible amount of information to work with, organizations should scan in color. Organizations often pick a lower bit depth because they consider only file size. Scanning in color helps obtain the highest accuracy and produces images that can later be compressed and repurposed.
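To put the file size concern in perspective, the sketch below estimates the uncompressed size of one scanned page at different bit depths. The US Letter page size (8.5 x 11 inches) is an assumption chosen for illustration, and the figures ignore compression and row padding, so actual files are typically much smaller.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw image size in bytes, before any compression is applied."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed page: US Letter scanned at 300 DPI.
for name, bits in (("24-bit color", 24), ("8-bit grayscale", 8), ("1-bit black and white", 1)):
    size = uncompressed_bytes(8.5, 11, 300, bits)
    print(f"{name}: {size / 1_000_000:.1f} MB uncompressed")
```

The raw color image is 24 times larger than the black and white one, which explains the temptation to scan at a lower bit depth. A color scan can still be compressed or converted down afterward, whereas information discarded at scan time cannot be recovered.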
3. Image Pre-Processing
Occasionally, image pre-processing after a document scan provides additional benefit to the accuracy of data capture. Organizations should apply image processing only when it is necessary and proven to help accuracy. To verify this, test an image both with and without image processing. The types of image processing that are most beneficial to data capture are thresholding, despeckling, rotation, deskew, background removal, and correction of linear distortion.
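One way to run the with-and-without comparison is to score the recognized text from each run against manually keyed reference values. The sketch below is a minimal example under stated assumptions: the field values are made up, the function names are hypothetical, and the score is a simple string similarity from Python's standard difflib module rather than a metric from any specific data capture product.

```python
from difflib import SequenceMatcher

def similarity(recognized: str, truth: str) -> float:
    """Similarity between recognized text and the reference, from 0.0 to 1.0."""
    return SequenceMatcher(None, recognized, truth).ratio()

def average_similarity(results: list[str], truth: list[str]) -> float:
    """Mean similarity across parallel lists of field values."""
    if len(results) != len(truth) or not truth:
        raise ValueError("results and truth must be non-empty and the same length")
    return sum(similarity(r, t) for r, t in zip(results, truth)) / len(truth)

def preprocessing_helps(truth, raw_results, processed_results, min_gain=0.01):
    """Return (helps, raw_score, processed_score) for one test image."""
    raw = average_similarity(raw_results, truth)
    processed = average_similarity(processed_results, truth)
    # Require a measurable gain before adopting the extra processing step.
    return processed - raw >= min_gain, raw, processed

# Made-up field values for a single test form.
truth = ["SMITH", "1984"]
without_processing = ["SM1TH", "1984"]   # one misread character
with_processing = ["SMITH", "1984"]

helps, raw, processed = preprocessing_helps(truth, without_processing, with_processing)
print(f"without: {raw:.2f}, with: {processed:.2f}, keep pre-processing: {helps}")
```

In practice the comparison should cover a representative batch of documents rather than a single image, and the minimum gain threshold is a judgment call for each organization.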