OCR with Application to the Digital Mailroom

By Chris

Question: We want to start handling all corporate mail electronically. That is, we would scan each mail piece, then OCR and index it as it comes in. From then on, we would treat the mail item like any other electronic file in our database. Will this work? Have we missed anything fundamental? Can we rely on the OCR engine to accurately capture the text of each mail item?

Answer: Many companies today are moving toward a digital mailroom. Like the paperless office, the digital mailroom is not always fully realizable, but its goal is understood to be desirable: to reduce the paper flow as much as possible. In this paradigm, companies typically scan paper mail as soon as it reaches the office. Many commercial products, such as OPEX, support the operation and workflow of the digital mailroom by automatically opening the mail and scanning it.

One of the issues with using a digital mailroom to handle all corporate mail is whether to scan to bitonal (black and white) or to color. Unlike the typical bitonal business document, invoices and other mail-based correspondence often carry considerable color information. Color scanning tends to make electronic files very large unless color compression is applied (e.g., CVISION PdfCompressor http://www.cvisiontech.com/pdf_compressor_31.html ). On the other hand, color scanning retains the relevant color information. Alternatively, one can scan corporate mail to black and white and hope that nothing important, e.g., a handwritten note from a client, is thresholded out in the process.
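To make the trade-off above concrete, here is a minimal sketch of both sides of it: the uncompressed size of a color versus a bitonal page, and how a simple global threshold can drop a light-colored handwritten note. The page size (US Letter), the 300 dpi resolution, the threshold of 128, and the two ink colors are all assumptions chosen for illustration; real scanners use their own, often adaptive, thresholding.

```python
# Illustrative only: page size, dpi, threshold, and ink colors are assumed values.

def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

def luminance(rgb):
    """Approximate brightness (0-255) using the ITU-R BT.601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def to_bitonal(rgb, threshold=128):
    """Global threshold: pixels darker than the threshold become black."""
    return "black" if luminance(rgb) < threshold else "white"

# US Letter page at 300 dpi: 24-bit color versus 1-bit bitonal.
color_bytes = raw_scan_bytes(8.5, 11, 300, 24)
bitonal_bytes = raw_scan_bytes(8.5, 11, 300, 1)
print(color_bytes, bitonal_bytes)  # 25245000 1051875

# Dark printed text survives thresholding; a light blue pen note does not.
printed_text = (20, 20, 20)
light_blue_ink = (150, 170, 255)
print(to_bitonal(printed_text), to_bitonal(light_blue_ink))  # black white
```

Before compression, the color page is 24 times larger than the bitonal one, which is why color compression matters. The second result shows how a pale annotation can fall on the white side of the threshold and vanish from a bitonal scan.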

Most digitized mail is printed matter and first order (i.e., directly printed, not a scan of some original). As such, OCR recognition rates are generally high for both bitonal and color mail scans. Things to avoid include low-resolution scanning (e.g., 150 dpi) and strong JPEG-based (i.e., DCT) compression prior to the OCR process. Reverse video text and regions with textured backgrounds still present serious challenges to current OCR systems. It is well worth setting up the production workflow of a digital mailroom and running the production system in test mode for a while before putting it online.
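As a rough illustration of the "things to avoid" above, a test-mode workflow could flag risky scans before they reach the OCR engine. The sketch below is hypothetical: the `ScanInfo` record, the `preflight_warnings` function, and the cutoffs (300 dpi, JPEG quality 75) are assumptions for the example, not values from any particular product, and should be tuned against your own test runs.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed cutoffs for illustration; tune them against your own OCR test runs.
MIN_DPI = 300
MIN_JPEG_QUALITY = 75

@dataclass
class ScanInfo:
    """Hypothetical metadata record for one scanned mail piece."""
    dpi: int
    file_format: str                    # e.g. "tiff", "jpeg", "png"
    jpeg_quality: Optional[int] = None  # 1-100, only meaningful for JPEG

def preflight_warnings(scan: ScanInfo) -> List[str]:
    """Return human-readable warnings about conditions likely to hurt OCR."""
    warnings = []
    if scan.dpi < MIN_DPI:
        warnings.append(f"resolution {scan.dpi} dpi is below {MIN_DPI} dpi")
    is_jpeg = scan.file_format.lower() in ("jpeg", "jpg")
    if is_jpeg and scan.jpeg_quality is not None:
        if scan.jpeg_quality < MIN_JPEG_QUALITY:
            warnings.append(
                f"JPEG quality {scan.jpeg_quality} is below {MIN_JPEG_QUALITY}"
            )
    return warnings

# A low-resolution, heavily compressed scan triggers both warnings.
for message in preflight_warnings(ScanInfo(dpi=150, file_format="jpeg", jpeg_quality=40)):
    print(message)

# A 300 dpi TIFF passes cleanly.
print(preflight_warnings(ScanInfo(dpi=300, file_format="tiff")))
```

Logging these warnings during the test period gives a simple way to see how often incoming scans fall short before the system goes online.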
