Black belt in data capture processes an EOB

Sep 14

Explanation of Benefits (EOB) documents, next to student transcripts, are without a doubt the most difficult documents to automate. The value of automating them, however, is tremendously high because they are very expensive to key in by hand. Three years ago, the fad in automating these documents was to use semi-structured data capture to locate information no matter the variation. Companies buying into this fad quickly found themselves in a deep and expensive data capture implementation. This is where I get to tout the power of simplicity and beat down the over-complicators.

Just as a sensei would meditate before a bout to calm the nerves, so should an implementer of data capture before facing the bloody battle with EOB documents. Simplicity is key when processing EOBs. Organizations should:

1.) Consider processing first those EOBs that are clear. Clarity is a vague term; here it covers both document structure and scan quality. Because of the variation across EOB types, it’s best for an organization to focus first on automating the best-quality documents, the ones it knows will yield the highest accuracy, and then move on to the rest once those have succeeded.

2.) Consider classification as a primary step. If you can very accurately classify EOBs by type, then you don’t need one semi-structured setup that copes with every variation. You simply isolate each type and use a combination of coordinate-based and semi-structured field location tuned to it. Because you are working with a single type, you will be far more accurate in locating the fields and reading them.

3.) Ignore document structure. Very often EOBs don’t follow their own document structure, especially when it comes to tables. EOBs often have tables within tables, or data in tables that does not align with the table headings. Additionally, EOBs have patients whose entries span pages, and totals for items on previous pages. Think of an EOB as a collection of lines that starts with a header (easy to collect the data) and ends with a footer (also easy to collect the data). Your job then is to classify lines and extract data per line.

4.) Extract the data, then convert it. Many items in an EOB have to be converted to another format prior to reconciliation. If you focus on the conversions while trying to extract data, they often muddy up the extraction process. First get the data off the paper very accurately, then convert it to the desired format.
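
Steps 2 through 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular capture product: the EOB type name "payer_a", the line layouts, the field names, and the sample lines are all invented for the example, and a real EOB type would need its own patterns.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical line patterns, keyed by EOB type (step 2: classify the
# document first, then use rules tuned to that single type).
LINE_PATTERNS_BY_TYPE = {
    "payer_a": {
        "header": re.compile(
            r"^PATIENT:\s*(?P<patient>.+?)\s+CLAIM:\s*(?P<claim>\S+)$"),
        "service": re.compile(
            r"^(?P<service_date>\d{2}/\d{2}/\d{4})\s+(?P<code>\d{5})\s+"
            r"(?P<billed>[\d,]+\.\d{2})\s+(?P<paid>[\d,]+\.\d{2})$"),
        "footer": re.compile(
            r"^TOTAL\s+(?P<billed>[\d,]+\.\d{2})\s+(?P<paid>[\d,]+\.\d{2})$"),
    },
}


def classify_line(text, eob_type):
    """Step 3: label one line and return its fields as raw strings."""
    stripped = text.strip()
    for line_type, pattern in LINE_PATTERNS_BY_TYPE[eob_type].items():
        match = pattern.match(stripped)
        if match:
            return line_type, match.groupdict()
    return "other", {}


def extract(ocr_lines, eob_type):
    """Step 4a: capture text exactly as read, with no conversion."""
    records = []
    for text in ocr_lines:
        line_type, fields = classify_line(text, eob_type)
        if line_type != "other":
            records.append({"type": line_type, **fields})
    return records


def to_money(value):
    return Decimal(value.replace(",", ""))


def to_date(value):
    return datetime.strptime(value, "%m/%d/%Y").date()


# Fields without an entry here are left as strings.
CONVERTERS = {"billed": to_money, "paid": to_money, "service_date": to_date}


def convert(records):
    """Step 4b: convert the captured strings in a separate pass."""
    return [
        {key: CONVERTERS.get(key, str)(value) for key, value in record.items()}
        for record in records
    ]


# Invented sample lines standing in for OCR output of one EOB.
SAMPLE_LINES = [
    "PATIENT: JANE DOE   CLAIM: A12345",
    "01/05/2010  99213  1,250.00  980.50",
    "-- continued on next page --",
    "01/06/2010  80053  75.00  60.00",
    "TOTAL  1,325.00  1,040.50",
]

raw_records = extract(SAMPLE_LINES, "payer_a")
converted_records = convert(raw_records)
```

Keeping extract() and convert() apart means a badly read amount shows up as a conversion failure on an otherwise intact captured string, rather than as a line that silently failed to match.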

For those who are currently processing EOBs and receiving the great value that automation can provide, you truly are black belts of data capture and have mastered the nuances of document automation. For those of you wanting to process EOBs, it’s very possible; just keep it simple.

Chris Riley – About

Find much more about document technologies at

Turning off the latest technology

Mar 03

Our culture is built on the belief that newer and more means better. In advanced technology this is true for the most part, but people are always surprised when I tell them that disabling some of the newer technology will actually produce a better result. I am going to give you three examples where the technology demands time travel to older approaches for higher accuracy.

In data capture and OCR, there is a component of the technology called document analysis. Before any data is collected, document analysis determines the structure of a page, including columns, rows, tables, pictures, paragraphs, lines, etc. It’s the biggest contributor to modern-day OCR accuracy. Document analysis is really designed for more traditional documents such as an article, a book page, or a letter. It does not excel at form-type documents (although there have been specialized versions). One of the most difficult documents in the world is an Explanation of Benefits (EOB). Typically, each variant of this document has its own structure. Surprisingly, the best way to process such a document is to turn off document analysis and default to a basic full-page read of the text. The reason is that document analysis brings an overwhelming bias toward table structures that no EOB will match.

It is the same case when reading text from photographs. Reading text from license plates, and from product plates (serial number plates welded or stuck to many products) during assembly, is best done with engines that do not use document analysis. In this case, the document analysis is trying too hard to find structure. Because of the nature of these images, characters in the photo end up split across multiple lines and fragments. Without document analysis, the engine sees the whole image as one text block and just reads it, producing better results. The license-plate readers that snap pictures of your plate at toll booths all use older, antiquated OCR technology. By turning off document analysis, they could use the newer engines instead.
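
As a rough illustration of what turning off document analysis can look like, here is a sketch using the open-source Tesseract command line. Tesseract is not the engine discussed in this post; it is used only because its page segmentation mode option (--psm) is a convenient stand-in: mode 3 is its default automatic layout analysis, mode 6 assumes a single uniform block of text, and mode 7 treats the image as a single text line. The function names and the file name are invented for the example.

```python
import subprocess

# Tesseract page segmentation modes (values of its --psm option).
PSM_AUTO = 3          # fully automatic page segmentation (the default)
PSM_SINGLE_BLOCK = 6  # assume a single uniform block of text
PSM_SINGLE_LINE = 7   # treat the image as a single text line


def build_ocr_command(image_path, layout_analysis=True, single_line=False):
    """Build a tesseract command line; hypothetical helper for this sketch."""
    if layout_analysis:
        psm = PSM_AUTO
    elif single_line:
        psm = PSM_SINGLE_LINE   # e.g. a cropped license plate
    else:
        psm = PSM_SINGLE_BLOCK  # e.g. a full-page read of an EOB
    # "stdout" as the output base makes tesseract print the recognized text.
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def read_text(image_path, **options):
    """Run the command; needs the tesseract binary installed and on PATH."""
    result = subprocess.run(
        build_ocr_command(image_path, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Command for a full-page read with automatic layout analysis bypassed.
full_page_command = build_ocr_command("eob_page.png", layout_analysis=False)
```

Whether the simpler mode actually reads better depends on the images, so it is worth running both settings over the same sample set and comparing the output before committing to one.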

Finally, there is mobility. This one makes a lot of people uncomfortable. Our society wants to believe a cell phone can do anything. Just today I was wondering why my cell phone did not brush my teeth for me. You can have your cell phone do OCR, sure, but it requires older, smaller, and more limited OCR engines to do so. I prefer to send an image to a server and use more advanced OCR, but many demand OCR on the phone, even though in practice it’s usually slower. The reason is that OCR requires substantial processing power and specific types of processing. Chips in phones today, and likely for a very long time to come, will not compete with the power of a computer, nor, most importantly, will they include the math operations that efficient, math-heavy modern OCR needs. Cell phones cannot adopt such chips because we demand long-lasting batteries, small size, and low cost. Intense math is simply not important for 99.9% of mobile applications.

There you have it: modern OCR taken down a few notches to solve current-day problems. The best engines that exist today let you turn each piece of functionality on and off, making it possible to purchase the latest OCR technology and limit it however you need. Most organizations don’t understand why anyone would want to turn off the new, but today I’ve shown that new is not always better!

Chris Riley – About
