Hand-print or Handwriting, makes a big difference

Jan 14
2014

When it comes to forms processing and data capture, whether a document contains hand-print or handwriting makes a huge difference in accuracy and validity. Sometimes the difference between the two is not so clear. So how do you tell whether your form contains hand-print, handwriting, or both?

ICR (Intelligent Character Recognition) is the algorithm used in place of OCR for characters written by a human hand. The algorithm has to be more dynamic because a person’s hand-print changes slightly from minute to minute. It’s possible to be very accurate when processing hand-print forms, provided the form is designed correctly. This type of forms processing will always need quality assurance steps, but you can get close to the accuracy of an OCR process. Forms that were not created with data capture or automatic extraction in mind will very often contain handwriting, because hand-print is usually guided by the form itself. Forms without hand-print cannot be expected to process at high accuracy. So what makes hand-print, hand-print?

Mono-spaced text: This means that each character, as it’s filled out, is the same distance from its neighbors as every other character. In handwriting, characters very often connect; in the extreme form this is cursive. When characters touch or are not spaced equally, segmentation goes wrong, and characters get clumped together as one or split in half during recognition. Mono-spaced text is usually achieved by printing boxes on the form that guide the user to fill within the boxes.

Uniform Height and Width: As with mono-spaced text, the characters as they are filled in should have a more or less uniform height and width. This keeps the person completing the form from introducing as many variable elements as they would in free handwriting, which increases accuracy. It is also accomplished with boxes on the form that keep users within boundaries.

Stable Base-Line: This aspect of hand-print is the least thought about, but it is very important. Text must always sit on the same horizontal baseline. In handwriting, a user typically drifts up and down around an invisible baseline. You may have noticed when you write that the end of a line is sometimes lower than the beginning. Baselines are important for OCR and ICR to get proper character segmentation and to recognize a few key characters such as “q” and “p”, the “tail” characters.

Sans-serif: The last element is keeping characters sans-serif. The extra tails on characters can cause confusion between certain characters, such as “o” vs. “q” and “c” vs. “e”. The way to achieve this is less obvious: put a guide at the top of the form that shows a good character and a bad character.
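Three of the four elements above (spacing, size, and baseline) can be measured once the characters on a line have been segmented. The following is a minimal sketch of that idea, not how any particular ICR engine works. It assumes each character arrives as a hypothetical bounding box `(left, top, width, height)` in pixels; the function names and the 15% tolerance are illustrative assumptions. It also assumes upper-case block letters, since descenders such as “q” and “p” would distort the simple baseline measure.

```python
from statistics import mean, pstdev

# Hypothetical input: one bounding box per character, (left, top, width, height),
# in pixels, with y growing downward as in most image coordinate systems.
Box = tuple[int, int, int, int]


def relative_spread(values: list[float]) -> float:
    """Population standard deviation divided by the mean (0.0 means uniform)."""
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


def handprint_metrics(boxes: list[Box]) -> dict[str, float]:
    """Measure how closely one line of characters follows the hand-print traits."""
    if len(boxes) < 2:
        raise ValueError("need at least two characters to compare")
    boxes = sorted(boxes)  # left-to-right order
    # Pitch: distance between the left edges of neighbouring characters.
    pitches = [b[0] - a[0] for a, b in zip(boxes, boxes[1:])]
    # Gap between neighbours; zero or negative means the characters touch.
    gaps = [b[0] - (a[0] + a[2]) for a, b in zip(boxes, boxes[1:])]
    heights = [h for _, _, _, h in boxes]
    widths = [w for _, _, w, _ in boxes]
    bottoms = [top + h for _, top, _, h in boxes]
    return {
        "pitch_spread": relative_spread(pitches),
        "height_spread": relative_spread(heights),
        "width_spread": relative_spread(widths),
        "touching_pairs": float(sum(g <= 0 for g in gaps)),
        # Baseline wander as a fraction of the average character height.
        "baseline_drift": (max(bottoms) - min(bottoms)) / mean(heights),
    }


def looks_like_handprint(boxes: list[Box], tolerance: float = 0.15) -> bool:
    """True when spacing, size, and baseline all stay within the tolerance."""
    m = handprint_metrics(boxes)
    return (
        m["touching_pairs"] == 0
        and m["pitch_spread"] <= tolerance
        and m["height_spread"] <= tolerance
        and m["width_spread"] <= tolerance
        and m["baseline_drift"] <= tolerance
    )


# Made-up example data: characters written inside boxes vs. connected writing.
boxed = [(10, 20, 18, 30), (40, 20, 18, 30), (70, 21, 17, 29), (100, 20, 18, 30)]
cursive = [(10, 20, 25, 30), (33, 24, 12, 18), (45, 28, 30, 40), (90, 35, 10, 15)]
print(looks_like_handprint(boxed), looks_like_handprint(cursive))
```

A check like this could be used to route forms: lines that pass go to ICR, and lines that fail go to manual entry or to a handwriting-specific process. The sans-serif element is not covered, since it depends on character shape rather than on bounding boxes.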

ICR is a technology for hand-print recognition and can be very accurate when the form provides the proper guides. Today, automation of handwriting and cursive is not complete, and it is usually successful only when augmented with other technologies such as database look-up and CAR and LAR (courtesy amount recognition and legal amount recognition). Sometimes the difference between the two is unclear, but the four elements above provide a clear definition of hand-print. The best hand-print to be found comes from the highly trained creators of engineering drawings, whose print is so perfect that it very closely resembles typographic text.

Chris Riley – About

Find much more about document technologies at www.cvisiontech.com.

Barcodes, time savers, and wasters

Oct 27
2009

Barcodes are a great technology. You can fit a lot of information in a barcode, they can be read at any angle, and they are also very accurate. Depending on the symbology and its error correction level, as much as 30% of a barcode can be degraded before it becomes unreadable. In data capture, barcodes are commonly used for batch cover sheets and document separation, or are printed on the documents themselves. They have proven to be a time saver, both because of the quality of the read and because they can be read very quickly with software-based and hardware-based solutions. What organizations often don’t think about is the additional time and cost that barcodes add to the capture process.

Organizations usually don’t connect document creation and prep time with data capture time. The total time and cost associated with the capture of documents does not run only from the point of scan to export. It includes all the additional steps leading up to the scan that get the document into the state it needs to be in for scanning. If an organization uses barcode pages to separate documents, it’s the time it takes for an operator to generate the pages and place them manually between documents. If an organization uses barcode pages for batch separation, it’s the time it takes to create the unique barcode for each batch and place it on top of the batch prior to scanning. These are just the two most common examples, but there are many more. This time is commonly overlooked because the person creating the barcodes and separating the documents is not the person scanning, or because the barcodes are created in advance and the time it took is forgotten.

Because organizations are not counting this as part of the total capture process, they are missing the real data capture time and cost. It’s no surprise, then, when they maintain high paper costs and do not reach the ROI they expected. Barcodes are a great tool, but they should be used only when their benefit is greater than their time cost. Benefits can include accuracy and process molding. Very seldom are barcodes alone responsible for substantial cost savings. Very often organizations don’t realize that they could in fact do away with barcodes by using advanced data capture. Accuracy may suffer slightly, but the time savings are substantially greater.
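To make the comparison concrete, here is a small sketch of a per-document cost model that counts prep time alongside scan and review time. Every number in it is made up for illustration, and the class and field names are assumptions, not taken from any capture product. Measure your own process before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class CaptureProcess:
    """Average seconds of operator time spent on one document."""
    prep_seconds: float     # e.g. generating and inserting barcode separator sheets
    scan_seconds: float
    review_seconds: float   # quality assurance and corrections after recognition

    def seconds_per_document(self) -> float:
        # Total time runs from document prep through review, not just scan to export.
        return self.prep_seconds + self.scan_seconds + self.review_seconds

    def cost_per_document(self, hourly_rate: float) -> float:
        return self.seconds_per_document() / 3600 * hourly_rate


# Hypothetical figures: barcodes cost prep time but reduce review time;
# automatic separation removes prep time but needs a little more review.
with_barcodes = CaptureProcess(prep_seconds=20, scan_seconds=5, review_seconds=3)
without_barcodes = CaptureProcess(prep_seconds=0, scan_seconds=5, review_seconds=10)

hourly_rate = 18.0  # hypothetical loaded labor rate
for name, process in [("with barcodes", with_barcodes),
                      ("without barcodes", without_barcodes)]:
    print(f"{name}: {process.seconds_per_document():.0f} s, "
          f"{process.cost_per_document(hourly_rate):.3f} per document")
```

With these invented numbers the barcode process is slower overall even though its review step is faster, which is exactly the situation that goes unnoticed when prep time is left out of the total.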


Black belt in data capture processes an EOB

Sep 29
2009

Explanation of Benefits (EOB) documents are, next to student transcripts, without a doubt the most difficult documents to automate. The value of automating these documents, however, is tremendously high, as they are very expensive to enter manually. Three years ago, the fad in automating these documents was to use semi-structured data capture to locate information no matter the variation. Companies that bought into this fad quickly found themselves deep in an expensive data capture implementation. This is where I get to tout the power of simplicity and beat down the over-complicators.

Just as a sensei would practice meditation before a bout to calm the nerves, so should an implementer of data capture when facing the bloody battle with EOB documents. Simplicity is key when processing EOBs. Organizations should:

1.) Consider processing first those EOBs that are clear. Clarity is a vague term that covers both document structure and scanning quality. But because of the variation across EOB types, it’s best for an organization to focus on automating the best-quality documents, the ones it knows will provide the highest accuracy, and then move on to the rest once it has succeeded.

2.) Consider classification as a primary step. If you can very accurately classify EOBs by type, then you don’t need to run semi-structured technology across every EOB regardless of variation. You simply need to isolate each type and use a combination of coordinate-based and semi-structured field location. Because you are working with a single type at a time, you will be far more accurate in locating the fields and reading them.

3.) Ignore document structure. Very often EOBs don’t follow their own document structure, especially when it comes to tables. EOBs often have tables within tables, or data in tables that does not align with the table headings. Additionally, EOBs have patients that span pages, and totals for items on previous pages. Think of an EOB as a collection of lines that starts with a header (easy to collect data from) and ends with a footer (also easy to collect data from). Your job then is to classify lines and extract data per line.

4.) Extract the data, then convert it. Many items contained within an EOB have to be converted to another format prior to reconciliation. If you focus on the conversions while trying to extract data, they often muddy the extraction process. First get the data from the paper very accurately, then convert it to the desired format.
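Steps 2 through 4 can be sketched as a small pipeline working on the recognized text of one EOB. This is an illustration under stated assumptions, not a real EOB layout: the payer names, keywords, date formats, and the service-line pattern are all hypothetical, and a production system would classify on far more than a keyword.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical EOB types: an identifying keyword and the date format each prints.
EOB_TYPES = {
    "acme_health": {"keyword": "ACME HEALTH", "date_format": "%m/%d/%Y"},
    "northwind": {"keyword": "NORTHWIND MUTUAL", "date_format": "%Y-%m-%d"},
}

# Hypothetical service line: date, 5-digit procedure code, billed, paid.
SERVICE_LINE = re.compile(
    r"^(?P<date>\S+)\s+(?P<code>\d{5})\s+"
    r"\$?(?P<billed>[\d,]+\.\d{2})\s+\$?(?P<paid>[\d,]+\.\d{2})$"
)


def classify_eob(lines):
    """Step 2: decide the EOB type before locating any field."""
    text = "\n".join(lines).upper()
    for name, layout in EOB_TYPES.items():
        if layout["keyword"] in text:
            return name
    return None


def classify_line(line):
    """Step 3: label each line instead of trusting the table structure."""
    stripped = line.strip()
    if SERVICE_LINE.match(stripped):
        return "service"
    if stripped.upper().startswith("PATIENT"):
        return "header"
    if stripped.upper().startswith("TOTAL"):
        return "footer"
    return "other"


def extract_service_lines(lines):
    """Step 4, first half: capture the raw strings exactly as printed."""
    return [SERVICE_LINE.match(line.strip()).groupdict()
            for line in lines if classify_line(line) == "service"]


def convert(raw, eob_type):
    """Step 4, second half: convert only after extraction has succeeded."""
    date_format = EOB_TYPES[eob_type]["date_format"]
    return {
        "date": datetime.strptime(raw["date"], date_format).date(),
        "code": raw["code"],
        "billed": Decimal(raw["billed"].replace(",", "")),
        "paid": Decimal(raw["paid"].replace(",", "")),
    }


# Made-up recognized text for one EOB.
sample = [
    "Acme Health Explanation of Benefits",
    "Patient: J. Doe",
    "01/15/2009  99213  $1,250.00  $980.50",
    "02/03/2009  80053  $75.00  $60.00",
    "Total paid  $1,040.50",
]

eob_type = classify_eob(sample)
raw_lines = extract_service_lines(sample)
records = [convert(raw, eob_type) for raw in raw_lines]
print(eob_type, records)
```

Keeping `extract_service_lines` and `convert` separate means a bad conversion (an unexpected date format, say) never hides whether the value was read correctly from the page, and the raw strings stay available for quality assurance.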

For those who are currently processing EOBs and receiving the great value that automation can provide: you truly are black belts of data capture and have mastered the nuances of document automation. For those of you wanting to process EOBs, it’s very possible; just keep it simple.


Not that you want to pay that invoice any faster

Sep 21
2009

But you can, and you can do it at a lower cost, and perhaps take advantage of net discounts. With data capture and OCR technology you can automate the entry and routing of commercial invoices. The reality for organizations that receive many invoices a day is that the accounting department is paying high salaries and taking time away from other activities to key in paper invoices. Using recognition technology to replace this process has been a tremendous benefit to many organizations. There are a few keys to success.

Start out simple: Don’t try to tackle the entire paper world with your solution. First identify the process and where the opportunities for savings are. Usually the biggest opportunity is going to be in the entry of data into some accounting system. To automate this you will need data capture and scanning capabilities. Starting out simple does not mean overlooking all the possibilities; it means finding the technology that can fulfill your wildest dreams of automation, but starting out slow with it. With invoices specifically, start by scanning, then move on to capturing vendor, invoice number, and total due using recognition technology, and so on.

Wait for an ROI before you make a major change: These technologies, if implemented correctly, can provide a great return on investment. Sometimes organizations make the mistake of not waiting until they see an ROI before making another major change. The change will likely have positive results, but it requires another round of effort and could be problematic. It also keeps you from seeing when the value of the technology starts kicking in, and it could have you repeating effort. Wait until you succeed at a basic implementation before you seek even more cost savings. Saving money is addictive, but let each phase realize its return first.

Never forget your business process is boss: Organizations have processes that are set in stone. Staff understand how to execute them, technology is set up to facilitate them, and other processes feed them or are fed by them. Sometimes new technology is so exciting that it pushes you to change what you are doing the moment you acquire it. Often organizations don’t realize the upstream and downstream impact of dramatically changing business processes. A technology should give you the option to keep doing what you are doing, only faster, or to change things if you choose. At first, try to keep the implementation as consistent as possible with the AP business processes already in place, then look for process improvement later.
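As an illustration of the “start out simple” phase, the sketch below pulls only the three starter fields (vendor, invoice number, total due) out of already-recognized invoice text, and reports what still has to be keyed by hand so the existing AP process keeps working. The regular expressions, the vendor list, and the sample text are all hypothetical; commercial data capture products locate fields with far more than pattern matching.

```python
import re

# Hypothetical patterns for two of the starter fields.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "total_due": re.compile(
        r"(?:total|amount)\s+due\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}

# Hypothetical vendor master list, as it might come from the accounting system.
KNOWN_VENDORS = ["Northwind Supply", "Contoso Freight"]


def extract_invoice_fields(ocr_text):
    """Return the starter fields; a field that cannot be found is None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    lowered = ocr_text.lower()
    fields["vendor"] = next(
        (vendor for vendor in KNOWN_VENDORS if vendor.lower() in lowered), None)
    return fields


def needs_manual_entry(fields):
    """Names of the fields a clerk still has to key in."""
    return [name for name, value in fields.items() if value is None]


# Made-up recognized text for two invoices.
complete_invoice = """Northwind Supply
Invoice No: INV-20391
Date: 09/21/2009
Total Due: $1,482.75
"""
partial_invoice = "Contoso Freight\nInvoice # 7781\n"

for text in (complete_invoice, partial_invoice):
    fields = extract_invoice_fields(text)
    print(fields, "manual:", needs_manual_entry(fields))
```

Routing incomplete invoices back to manual entry keeps the first phase small: the automated path handles what it reads confidently, and everything else follows the process staff already know.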

No, maybe you don’t want to pay that invoice faster, but you do want to reduce the cost of working with it. With data capture and OCR you can save a ton, as long as you prepare yourself and do your homework.
