Preparing Sample Sets
The sample documents an organization uses to evaluate technology are the most important tool for gauging potential value, measuring exceptions, and ultimately picking a solution. The biggest mistake is skipping the sample selection process. Without a well-prepared, static sample set, results are not consistent from one test to the next, which diminishes the value of both the sample set and the testing period. Because this white paper covers such a wide range of document types, it does not give exact calculations for sample set quantities and variance. What follows is a general guide to the elements organizations should consider.
The sample set is the collection of already imaged documents on which the prospective software packages are configured. Sample sets should consist of a fixed number of each document type the organization plans to automate. For example, if the project includes accounts payable (AP) processing of invoices, purchase orders, and checks, then there are three types. An organization should have no fewer than ten sample production documents per type. The documents should be exactly as the data capture software will receive them in its final integration. The number of samples should be scaled to production volume but should not exceed fifty per type, because quality analysis then becomes unmanageable.
Each type should contain as much or as little variation as is experienced in the production environment. If, for example, an organization processes a thousand commercial invoices a month and has a thousand separate vendors in its system, each sample invoice should be from a different vendor. But if an organization processes the same volume from only three separate vendors, then there should be several samples from each, with more from the two largest contributors by volume.
Because organizations are sharing private data, they should take proper measures to protect themselves. If an organization must sanitize documents before providing them to any vendor, it should not black out (redact) information it expects the data capture system to collect. The best option is to substitute fake information for real information, since redaction could distort the technology evaluation.
The sample set described above is ideal for demos and for estimating value; for a proof of concept, the sample set needs to be revamped.
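The sizing guidance above can be sketched in a few lines of Python. This is only an illustration: the 10 and 50 limits come from the text, but the function names and the proportional, largest-remainder allocation rule are assumptions made for the example, since the paper does not prescribe a formula.

```python
MIN_PER_TYPE = 10  # floor stated in the text
MAX_PER_TYPE = 50  # ceiling stated in the text


def clamp_sample_size(requested):
    """Keep a requested per-type sample count within the 10 to 50 range."""
    return max(MIN_PER_TYPE, min(MAX_PER_TYPE, requested))


def allocate_by_source(volume_by_source, sample_size):
    """Split sample_size across sources in proportion to production volume.

    Largest-remainder rounding is an assumption of this sketch; it guarantees
    that the counts add up to sample_size.
    """
    total = sum(volume_by_source.values())
    if total <= 0:
        raise ValueError("total production volume must be positive")
    # Integer floor of each source's proportional share.
    allocation = {s: sample_size * v // total for s, v in volume_by_source.items()}
    leftover = sample_size - sum(allocation.values())
    # Hand the remaining samples to the sources with the largest fractional share.
    by_remainder = sorted(
        volume_by_source,
        key=lambda s: sample_size * volume_by_source[s] % total,
        reverse=True,
    )
    for source in by_remainder[:leftover]:
        allocation[source] += 1
    return allocation
```

With three hypothetical vendors contributing 600, 300, and 100 invoices a month, a sample of 20 is split 12, 6, and 2. With a thousand vendors contributing one invoice each, no vendor receives more than one sample. When one source dominates the volume, a pure proportional split can leave a small source with no samples at all, so an organization may want to enforce a minimum per source by hand.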
Production Sample Set
Production sample sets are the samples run through the prospective software packages after setup has been completed on the sample set described above. The production set should be twice the size of the sample set and have exactly the same mix of variation. The software is tested on an independent sample set to best approximate the production environment and to isolate any effect of tuning the setup to a fixed set of documents.
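One way to keep the two sets independent while preserving the same makeup is to draw both from the same pool in a single pass. The sketch below is a hypothetical illustration: the function name, the document ID pools, and the use of a seeded random draw are assumptions, not part of the method described here.

```python
import random


def split_setup_and_production(doc_ids_by_source, setup_counts, seed=0):
    """Draw a setup set and a disjoint production set twice its size.

    doc_ids_by_source maps each source (for example, a vendor) to the imaged
    documents available from it; setup_counts maps each source to the number
    of setup samples wanted. Both structures are assumptions of this sketch.
    """
    rng = random.Random(seed)  # fixed seed keeps the sets static between runs
    setup, production = {}, {}
    for source, n_setup in setup_counts.items():
        pool = list(doc_ids_by_source[source])
        needed = 3 * n_setup  # n for setup plus 2n for production
        if len(pool) < needed:
            raise ValueError(f"{source}: need {needed} documents, have {len(pool)}")
        picked = rng.sample(pool, needed)  # sampling without replacement
        setup[source] = picked[:n_setup]
        production[source] = picked[n_setup:]
    return setup, production
```

Because each source contributes exactly twice as many production documents as setup documents, the production set keeps the same proportions by source, and because sampling is without replacement, no document appears in both sets.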
Truth data is the 100% accurate, manually entered data for a given set of documents. While truth data should ideally be prepared for both the sample set and the production sample set, many organizations evaluate accuracy only at the proof-of-concept stage, so truth data for the production sample set may be sufficient. Truth data exists so that each prospective product’s recognition results can be compared, field by field, against values already known to be correct.
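A comparison against truth data can be as simple as counting matching fields. The following is a minimal sketch under stated assumptions: the nested dictionary layout, the field names in the usage note, and the whitespace and case normalization are all hypothetical choices, and a real evaluation may need stricter or looser matching per field.

```python
def normalize(value):
    """Treat missing values as empty and ignore case and surrounding spaces."""
    return "" if value is None else str(value).strip().casefold()


def field_accuracy(truth, extracted):
    """Return the fraction of truth fields that the product captured correctly.

    Both arguments map a document ID to a dict of field name -> value.
    Fields or documents missing from `extracted` count as errors.
    """
    correct = 0
    total = 0
    for doc_id, truth_fields in truth.items():
        got = extracted.get(doc_id, {})
        for field, expected in truth_fields.items():
            total += 1
            if normalize(got.get(field)) == normalize(expected):
                correct += 1
    return correct / total if total else 0.0
```

For example, if the truth data holds an invoice number and a total for each of two invoices, and a product gets three of those four fields right, `field_accuracy` returns 0.75.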
Evaluation Method and Criteria
Organizations need to agree internally, before any actual testing is done, on the method that will be used to test products and on how those products will be measured. The recommended method for most organizations’ needs includes the following steps:
- Vendor Discovery
- View a "canned" demo of each prospective product
- Modify the prospective vendor list based on the demos
- Have vendor perform setup on the sample set
- See demo of each prospective product on the sample set
- Modify the prospective vendor list
- Begin price negotiation
- Obtain a trial from remaining vendors with tailored configuration for the sample set
- Run the production sample set through setup of the final prospective vendors’ products
Vendor Discovery
At the first interaction with the vendor, organizations should raise and resolve any potential deal killers, such as the pricing model or support concerns. The organization can then focus on the benefit of the technology and learn from the vendor how much preparation and what skill level are required, based on work that has been done on the sample set described above.
View a “canned” demo of each prospective product
Canned demos are pre-configured demonstrations of the software on sample documents chosen by the vendor. These demos require no work from the vendor other than presenting the demo.
At each step, organizations should evaluate how quickly the setup was created, how quickly documents are processed, and how accurate the results are. Using this method together with the guidance above, the organization should end up with a vendor list and an associated performance score for each vendor, based on the organization’s needs and expectations.
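One possible way to turn the three criteria into a single performance score is a weighted average. This is an assumption made for illustration: the paper does not define a scoring formula, and the metric names, the 0 to 1 scale, and the weights below are hypothetical and should be replaced with whatever the organization agreed on internally.

```python
def vendor_score(metrics, weights):
    """Weighted average of metrics that are already scaled to 0..1.

    Higher must mean better for every metric, so raw timings need to be
    converted first (for example, fastest vendor = 1.0).
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * metrics[name] for name in weights) / total_weight


def rank_vendors(metrics_by_vendor, weights):
    """Return (vendor, score) pairs sorted from best to worst."""
    scored = [(v, vendor_score(m, weights)) for v, m in metrics_by_vendor.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Weighting accuracy twice as heavily as either speed measure, for instance, would use `{"setup_speed": 1, "processing_speed": 1, "accuracy": 2}`. Agreeing on the weights before testing begins keeps the scores from being adjusted to favor a preferred vendor.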