Digital processing of forms and documents has come of age, and OCR software now offers a variety of features beyond accuracy rates of up to 99%. Although most OCR packages are general-purpose, some include features designed for processing specific document types. Invoices are one such type. Most companies, businesses, and organizations handle a large number of financial transactions, and keeping a record of every invoice becomes burdensome, especially when the cost of processing invoices is substantial. Various software packages can process invoices efficiently and store the results in a database or an equivalent format. These packages rely on the ubiquitous technology of OCR, or Optical Character Recognition.
OCR as a Technology
OCR is a technology that recognizes machine-printed characters in a paper document and converts them into searchable text. This makes a paperless office more practical and simplifies the handling of invoices in bulk. OCR can recognize most machine-printed fonts, sizes, and colors with an accuracy of up to 99%. This is achieved in three stages: pre-processing the scanned invoice image, applying character recognition algorithms, and post-processing the output to improve accuracy. Pre-processing involves rotating the image to correct the document angle, de-skewing the text so that lines are properly aligned, and de-speckling the image to improve readability. Post-processing usually consists of a spell-check of the extracted text.
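Two of the stages described above, de-speckling and the spell-check, can be sketched in a few lines. This is only an illustration under simplifying assumptions, not how any particular OCR product works: the image is assumed to be a small binary grid (1 for ink, 0 for background), the vocabulary is an assumed list of expected words, and the function names `despeckle` and `spell_correct` are hypothetical. The recognition stage itself is omitted.

```python
from difflib import get_close_matches


def despeckle(image):
    """Clear isolated ink pixels: a 1 with no ink among its 8 neighbours becomes 0."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]  # work on a copy, leave the input intact
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            # Count ink pixels in the surrounding 3x3 window, excluding the pixel itself.
            neighbours = sum(
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if neighbours == 0:
                cleaned[r][c] = 0
    return cleaned


def spell_correct(words, vocabulary, cutoff=0.8):
    """Replace each word with its closest vocabulary entry, if one is similar enough."""
    corrected = []
    for word in words:
        if word in vocabulary:
            corrected.append(word)
            continue
        match = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        # Words with no close match (numbers, names) are left unchanged.
        corrected.append(match[0] if match else word)
    return corrected


# Hypothetical inputs: a 4x4 scan with one stray speck, and a noisy OCR result.
scan = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],  # isolated speck
    [0, 0, 0, 1],  # part of a two-pixel stroke, kept
    [0, 0, 0, 1],
]
clean_scan = despeckle(scan)
clean_words = spell_correct(
    ["Invoce", "Total", "Amuont", "4521"],
    ["Invoice", "Total", "Amount", "Date"],
)
```

The similarity cutoff is a judgment call: a lower value corrects more errors but risks replacing valid words, which matters for invoice data such as company names and amounts.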
The Use of OCR for Invoice Processing
Invoice fields usually need to be matched with database fields. This requires knowing where each field appears on the invoices of many different companies. For example, to update the database with the correct invoice, the software uses the name of the company that sent it, so it needs to know the format of that invoice and where the company name is printed. Two techniques make this possible. The first is zoning, which extracts text from individual 'zones' within the document and matches it with the database fields. The second is the use of templates: once the format of an invoice has been established, it is stored as a template within the software, so that later invoices in the same format can be processed without repeating the setup. Invoices can hence be processed at lower cost.
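The combination of zoning and stored templates can be sketched as follows. This is a minimal illustration under stated assumptions rather than the method of any specific product: the OCR stage is assumed to return each word with the coordinates of its centre, and the names `Word`, `Zone`, `TEMPLATES`, and `extract_fields`, as well as the sender "acme" and all coordinates, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A recognized word and the centre of its bounding box (assumed OCR output)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Zone:
    """A rectangular region of the page, in the same units as the word coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word):
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


# One stored template per known invoice format: database field name -> zone.
TEMPLATES = {
    "acme": {
        "vendor_name": Zone(0, 0, 300, 50),
        "invoice_number": Zone(400, 0, 600, 50),
        "total": Zone(400, 700, 600, 750),
    },
}


def extract_fields(words, template):
    """Collect the text inside each zone, in reading order (top to bottom, left to right)."""
    fields = {}
    for name, zone in template.items():
        inside = sorted((w for w in words if zone.contains(w)), key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in inside)
    return fields


# Hypothetical OCR output for one invoice in the "acme" format.
words = [
    Word("Acme", 40, 20),
    Word("Corp", 110, 20),
    Word("INV-1042", 480, 20),
    Word("Thank", 50, 400),  # outside every zone, so it is ignored
    Word("1,250.00", 500, 720),
]
record = extract_fields(words, TEMPLATES["acme"])
```

Because the returned dictionary is keyed by database field name, the result can be written to the matching record directly. Supporting a new invoice format then only requires adding one more entry to the template store.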