Data Type, Dictionary, Database Lookup = First Verification

Oct 28

After seeing the power of data capture technology, I have yet to meet an organization that is unimpressed, until the conversation turns to quality assurance steps. Though the technology is extremely powerful, some level of quality checking will always be needed to get 100% accurate results. Think of it this way: if you were to spill coffee on a perfectly printed document and scan it soon after (the rollers making a nice smudge), you likely would be unable to read the text yourself, so how could the software? In this scenario, QA would be required for the smudged fields. This seems obvious, but it illustrates the point. I have good news, however: if you provide the right tools, you can use a computer to do the first pass of verification.

It’s just like a human verifying a document, but much faster and less expensive. Organizations that deploy these methods can eliminate a large percentage of manual verification, but the caveat is that they must first know their documents. After data capture has happened, if you combine data types with a dictionary or database lookup, you have created an electronic verifier.

A data type tells the software what structure a field should have. A data type can be used to confirm a field result, or it can correct uncertain results based on the knowledge it contains. Take a date field, for example. After data capture, the field is recognized as 1O/13/8I. We see two errors: an “O” instead of a “0” and an “I” instead of a “1”. Suppose you deploy a date data type that simply says the field will always contain a number from 1 to 12, followed by a “/”, followed by a number from 1 to 31, followed by a “/”, followed by two digits. The date would then automatically be converted to 10/13/81, which is correct. Some data types, such as date and time, are universal. Others are specific to a document type, and an organization that knows all of them stands to benefit greatly.
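The date example can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular data capture product implements data types. The confusion table `OCR_DIGIT_FIXES` and the function name `apply_date_type` are assumptions made up for this sketch.

```python
import re

# Assumed table of common OCR confusions: letters that resemble digits.
# A real deployment would tune this to its own fonts and scanners.
OCR_DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

# The date data type: 1-2 digits, "/", 1-2 digits, "/", exactly two digits.
DATE_PATTERN = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{2})$")


def apply_date_type(raw):
    """Return a corrected MM/DD/YY string, or None if a human should review it."""
    # A date field may only contain digits and "/", so look-alike letters
    # can safely be swapped for the digits they resemble.
    candidate = raw.strip().translate(OCR_DIGIT_FIXES)
    match = DATE_PATTERN.match(candidate)
    if match is None:
        return None
    month, day, year = (int(group) for group in match.groups())
    # Range checks from the data type: month 1-12, day 1-31.
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    return f"{month:02d}/{day:02d}/{year:02d}"


print(apply_date_type("1O/13/8I"))  # 10/13/81
print(apply_date_type("13/45/81"))  # None, so route to manual QA
```

Returning `None` rather than guessing keeps the human in the loop for anything the data type cannot confirm.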

Dictionaries and database lookup functions are essentially the same, with a slight variation. The purpose of both approaches is to validate what was extracted via data capture against pre-existing acceptable results. The simplest example to consider is existing customer names. If you are processing a form that was distributed to existing customers and contains a first name and last name, you already know those customers exist, so you should be able to look in a database for the customer and confirm the results. If no match is found, then there is likely a problem with the form. Dictionaries can provide the same value but are more static, and are often used for fields such as product type or rate type that have one set of possibilities that rarely changes. The point is that organizations should look at the database and dictionary assets they already have to augment the data capture process and make it more accurate.
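Both kinds of lookup can be sketched with nothing but Python's standard library. Everything named here is hypothetical: the `customers` table, its columns, the sample customer, and the `RATE_TYPES` values stand in for whatever assets an organization already has.

```python
import sqlite3

# Hypothetical static dictionary for a field with a small, stable value set.
RATE_TYPES = {"FIXED", "VARIABLE", "PROMOTIONAL"}


def in_dictionary(value, allowed=RATE_TYPES):
    """Confirm a captured value against the static dictionary."""
    return value.strip().upper() in allowed


def customer_exists(conn, first_name, last_name):
    """Confirm a captured name against an existing customer database."""
    row = conn.execute(
        "SELECT 1 FROM customers "
        "WHERE lower(first_name) = lower(?) AND lower(last_name) = lower(?)",
        (first_name.strip(), last_name.strip()),
    ).fetchone()
    return row is not None


# Stand-in for the organization's real customer database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO customers VALUES ('Jane', 'Doe')")

print(customer_exists(conn, "Jane", "Doe"))  # True, field confirmed
print(customer_exists(conn, "Jane", "Dce"))  # False, flag for QA
print(in_dictionary("fixed"))                # True
print(in_dictionary("F1XED"))                # False, flag for QA
```

A failed lookup does not say which character is wrong, only that the field deserves a human look, which is exactly the first-pass filtering described above.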

There will always be quality assurance steps with any technology that involves interpretation of data. Organizations wanting to deny these steps either do not understand the technology, do not understand their own processes, or were misled by a vendor. Quality assurance is where much of the streamlining effort should be spent, and one of the ways to do that is by leveraging data types, dictionaries, and databases that already exist.

Chris Riley – About

Find much more about document technologies at

Where do the images go?

Feb 16

Document imaging and scanning are facilitated in large part by various software applications. For those not too familiar with document imaging, much of the appeal often lies in the functionality of the software that is bundled with a document scanner. Many vendors, while they are selling document scanners, put all the focus on the applications that are married to the scanner and on how those applications handle the images.

Recently at MacWorld 2010, this was borne out by the various scanner vendors, who had more to say about their personal content management applications than about their actual scanners. What surprised me was how little end-users were concerned about where and how the images are stored.

Knowing how your personal content management application stores images is critical for your future retention and use of those images. To give you an example, if you are now scanning to an application that converts images to a proprietary format and saves them in an SQL Express database you don’t have access to, migrating from this application will be as difficult as re-scanning each and every piece of paper. What if you no longer have the originals?

Many of the sexy software applications out there make it very difficult to get to your data files directly, whether for use in other applications or for the purpose of migration. I would expect this to be a common question asked of vendors, but it was not. Only once did I see a vendor explain how you can still get to the files that are contained in their application. Indeed you could, by following some non-obvious steps. And once you found all the image files, they were bizarrely named rather than carrying the names assigned within the software. It is good to know they are there and accessible, but what a tremendous amount of work to get there.

You own the information, so make sure you know where the images go, how they are stored, and how you can get to them, if at all. If a particular solution is locked down or requires some hacking, it’s not the personal content management system for you.

Chris Riley – About