Path to simple yet robust document routing

Dec 30, 2015

When it comes to the input path that documents follow, for many it’s as simple as scan, convert, save, but others require more complex workflows. The good news is that there are tools that can perform even the most advanced workflows you could imagine. The bad news is that they are expensive. I’m here to tell you about a way of combining your scanner with your data capture, OCR, and document conversion software to build more complex workflows without the premium.

By using settings that come with most document scanners, and the ability of most data capture, OCR, and document conversion products to use hot folders (watch folders), you can create robust multi-step workflows out of the box. First, you need a scanner that supports multiple destinations, usually 9 or more. This is indicated by an LED on the document scanner which, at the start of a batch scan, lets you pick a destination number. Second, you need all the software required to perform the conversions that produce the final result. In our example we want to be able to OCR, data capture, compress, and archive.

Basically, the task is to create a funnel for your documents, with the end result saved in your chosen final destination. If your scanner supports what is called dual-stream, you can work with two funnels simultaneously, making your workflow all the more robust. The first part of the funnel is identifying the document type. Each of the 9 destinations on your scanner should be configured for one document type (or, if you prefer, one destination per business process). The configuration includes the scan settings, 300 DPI of course, and the folder the document will go in. This is just the staging folder for the next step. Let’s assume that we set up destination 1 for invoices and that our scanner supports dual-stream. When all is said and done, we want one copy of each invoice saved in a searchable directory, with the file both compressed and in PDF/A format. We then want another copy of the same invoice to be data captured and put in a working directory for someone to review. Let’s put it all together.
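To make the destination setup concrete, here is a minimal sketch of how the mapping could be written down before it is programmed into the scanner. Everything in it is an assumption for illustration: the `Destination` class, the folder names, and the second document type are hypothetical and do not come from any scanner vendor's software.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Tuple


@dataclass(frozen=True)
class Destination:
    """One numbered scanner destination and where its scans are staged."""
    number: int
    document_type: str
    dpi: int
    # One staging folder per stream; two entries when dual-stream is used.
    staging_folders: Tuple[Path, ...]


# Hypothetical folder layout; replace with your own staging folders.
ROOT = Path("scans")

DESTINATIONS: Dict[int, Destination] = {
    1: Destination(
        number=1,
        document_type="invoice",
        dpi=300,
        staging_folders=(
            ROOT / "invoices" / "to_pdf",      # copy 1: PDF conversion
            ROOT / "invoices" / "to_capture",  # copy 2: data capture
        ),
    ),
    2: Destination(
        number=2,
        document_type="contract",
        dpi=300,
        staging_folders=(ROOT / "contracts" / "to_pdf",),
    ),
}


def staging_folders(destination_number: int) -> Tuple[Path, ...]:
    """Return the staging folders configured for a destination number."""
    return DESTINATIONS[destination_number].staging_folders
```

Keeping the mapping in one place like this makes it easier to check that each staging folder is watched by exactly one downstream product.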

Destination 1 on the scanner is configured for invoices. The first copy of any invoice is saved to a hot folder that the PDF conversion utility is watching; the second copy is scanned into a hot folder that the data capture product is watching. Because these are hot folders, both copies are picked up as soon as they arrive and processed by each application. Our requirement for the second copy was only that it be data captured and exported to a working directory, so its task is now complete. The first copy has more conversions to go through. The PDF conversion utility saves the OCRed, searchable PDF to a hot folder for the compression utility, the compression utility compresses the PDF and saves it to a hot folder for the archive utility, and finally the archive utility saves the result in our final destination for all invoices. Below is a basic diagram of the workflow we created for invoices (destination 1):

Scan > PDF Creation > Compression > Archive > Final Result
Scan > Data Capture > Final Result
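In practice each commercial product watches its own hot folder, so no code is needed to chain them. Still, the chaining idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the folder names are hypothetical, and `copy_step` is a placeholder standing in for the real OCR, compression, archive, and data capture products.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_stage(watch: Path, out: Path,
              process: Callable[[Path, Path], None]) -> int:
    """Process every file waiting in `watch`, writing results into `out`.

    Returns the number of files handled in this pass.
    """
    watch.mkdir(parents=True, exist_ok=True)
    out.mkdir(parents=True, exist_ok=True)
    handled = 0
    for src in sorted(watch.iterdir()):
        if not src.is_file():
            continue
        process(src, out / src.name)
        src.unlink()  # remove the input so it is not picked up again
        handled += 1
    return handled


def copy_step(src: Path, dst: Path) -> None:
    """Placeholder for a real conversion step; it only copies the file."""
    shutil.copyfile(src, dst)


def run_invoice_funnel(root: Path) -> None:
    """One pass over the invoice funnel (destination 1, both streams)."""
    stages = [
        (root / "to_pdf", root / "to_compress"),      # PDF creation
        (root / "to_compress", root / "to_archive"),  # compression
        (root / "to_archive", root / "final"),        # archive
        (root / "to_capture", root / "review"),       # data capture
    ]
    for watch, out in stages:
        run_stage(watch, out, copy_step)
```

Each stage's output folder is the next stage's watch folder, which is all the funnel really is. A real script would call `run_invoice_funnel` in a loop with a short sleep between passes.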

Although it may have been slightly difficult to read, hopefully it’s clear that the above is just one workflow that gets the most out of the tools offered by both the document scanner and the conversion software packages. Now you can proceed to program each of the other destinations with different document types and their associated workflows. Programmers and tech-savvy individuals will easily envision ways to add scripts that make the process even more robust, with email notifications and so on. This approach is not a replacement for advanced workflow products but a middle ground between no workflow and very pricey ones.
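As one example of such a script, the sketch below builds an email telling a reviewer that a captured invoice is waiting in the working directory. It uses only the Python standard library. The addresses, the SMTP host, and the function names are assumptions made for illustration, and `send_notification` needs a reachable mail server to actually deliver anything.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_notification(document: Path, reviewer: str,
                       sender: str = "scanner@example.com") -> EmailMessage:
    """Build a short 'ready for review' message for one document."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = reviewer
    msg["Subject"] = f"Document ready for review: {document.name}"
    msg.set_content(
        f"A captured document is waiting in the folder '{document.parent}'."
    )
    return msg


def send_notification(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Send the message through an SMTP server (host is an assumption)."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

A script like this could run whenever a new file appears in the review folder, for example `send_notification(build_notification(Path("review/inv-001.pdf"), "ap@example.com"))`.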

Chris Riley – About

Find much more about document technologies at www.cvisiontech.com.

Workflow, super-charge with OCR

May 26, 2015

Document workflow can be as simple as saving a file to a single location or as complex as decision-tree document routing rules. Throw some paper into the mix and the problem intensifies slightly. Getting your paper documents to fit your already accepted digital document workflow can be challenging. Some organizations choose to keep the paper and digital workflows separate. Others unite them but create separate rules for each. For most, however, it would be ideal to have a single workflow engine or product supporting digital, image, and paper documents alike.

To do so with the greatest value, you need not only document conversion using Optical Character Recognition (OCR), but also some other advanced imaging and recognition tools. In the digital document world, you don’t have only the data contained in the document; you also have various other metadata items such as file name, file location (taxonomy), tags, etc. In order to marry paper with digital, the same metadata has to be created for the paper document, and this has to occur at the time of document processing. This could be a manual or an automated process, and depending on your paper volume, doing it manually may be OK. To compete with the efficiency of digital documents, however, automatic is the way to go.

Using OCR together with image-based and contextual classification, paper or image documents that enter the workflow can obtain the same value as digital documents. The OCR is responsible for getting all the content from the document. This content is used for search, indexing, and auto-filing, as well as for generating keywords (tags) associated with a taxonomy. In order to determine where the document fits into a taxonomy, you must first classify it.

For classification to be most effective, it happens on two levels. Image-based classification looks at what the document looks like: it classifies documents by their physical structure, which is a good indicator of type and very fast to evaluate. Contextual classification looks at what words the document contains: it goes one level deeper and looks for the keywords that make a document one type rather than another. For some environments, image-based classification can do the job entirely. Once the classification is known, a classification engine can place the document in the correct spot in an existing taxonomy. Once an ID or classification is determined, it is no challenge to apply tags, a file name, and a file location to the document.
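The two levels can be sketched as a small Python example. This is a toy illustration, not how any particular classification engine works: the keyword lists, the taxonomy paths, and the rule "trust the image-based guess when there is one, otherwise fall back to keywords" are all assumptions. The image-based step is represented only by its result (`layout_guess`), since real layout analysis is beyond a short example.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Optional, Set

# Hypothetical keyword lists and taxonomy locations per document type.
KEYWORDS: Dict[str, Set[str]] = {
    "invoice": {"invoice", "amount", "due", "remit"},
    "contract": {"agreement", "party", "term", "hereby"},
}
TAXONOMY: Dict[str, PurePosixPath] = {
    "invoice": PurePosixPath("finance/invoices"),
    "contract": PurePosixPath("legal/contracts"),
}


def classify_by_keywords(ocr_text: str) -> Optional[str]:
    """Contextual level: pick the type whose keywords appear most often."""
    words = set(re.findall(r"[a-z]+", ocr_text.lower()))
    scores = {doc_type: len(words & kws) for doc_type, kws in KEYWORDS.items()}
    best = max(scores, key=lambda doc_type: scores[doc_type])
    return best if scores[best] > 0 else None


def classify(layout_guess: Optional[str], ocr_text: str) -> Optional[str]:
    """Use the image-based guess if present, else fall back to keywords."""
    return layout_guess or classify_by_keywords(ocr_text)


def file_document(doc_id: str, doc_type: str) -> dict:
    """Derive file location, file name, and tags from the classification."""
    return {
        "path": TAXONOMY[doc_type] / f"{doc_type}-{doc_id}.pdf",
        "tags": [doc_type],
    }
```

For example, `classify(None, "INVOICE No. 42. Amount due: $10")` falls through to the keyword level and yields `"invoice"`, after which `file_document("0042", "invoice")` places the file under `finance/invoices`.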

Workflow can stand alone, but injected with the power of OCR and document classification, it becomes a powerhouse that does not know the difference between paper and digital.

Chris Riley – About
