Document Capture & OCR

Document capture is the first step in the OCR process, and is also known as scanning. Common capture devices include scanners, digital copiers, multifunction printers (MFPs), fax machines, and cell phones. Technically, capture is usually a conversion of light reflected from the page into an electrical signal, typically performed by an image sensor such as a charge-coupled device (CCD).

The way a document is captured affects how useful it is afterwards. Consider a faxed document. Although usually human readable, faxed documents are often not very machine readable, and this is usually a direct result of the fax capture process. Because fax machines typically communicate over phone lines, fax scanning resolutions are kept low to make the transmitted file as small as possible. For example, normal fax mode is 203x98 dpi, which means that the vertical sampling rate is less than 100 dpi. This low sampling rate yields a smaller CCITT-encoded file, which transfers faster and is usually still human readable at the receiving end. However, because the file was captured at very low resolution, machine text recognition (OCR) rates are likely to be poor.
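To make the fax example concrete, the following minimal sketch estimates how many pixel samples cover a single character at a given resolution. The 10-point type size and the function name `samples_per_glyph` are assumptions chosen for illustration; they do not come from any fax or OCR specification.

```python
def samples_per_glyph(point_size, dpi_x, dpi_y):
    """Return (width_px, height_px) of a square em box at the given resolution.

    Assumes 72 points per inch and treats the character as filling
    a square box whose side equals the point size.
    """
    inches = point_size / 72.0
    return (round(inches * dpi_x), round(inches * dpi_y))


# Hypothetical 10-point body text, captured two ways.
fax_normal = samples_per_glyph(10, 203, 98)   # normal fax mode
scan_300 = samples_per_glyph(10, 300, 300)    # typical 300 dpi scan

print("fax normal mode:", fax_normal)
print("300 dpi scan:   ", scan_300)
```

Under these assumptions the fax capture leaves roughly 14 rows of pixels for the full height of a character, against about 42 rows at 300 dpi, which suggests why fine strokes are easily lost in fax mode.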

So capture resolution generally trades file size against recognition rates. The higher the scanning resolution, up to about 300 dpi, the higher the OCR text recognition rate, and the larger the file.

A similar relationship exists between color depth and OCR recognition rates: the more bits per pixel, the better the OCR recognition. Consequently, if the same document is scanned at 150 dpi (dots per inch) in both bitonal (black and white) and greyscale, the greyscale file will have the better recognition rate. Reducing the file size through excessive JPEG quantization before OCR will also lower recognition rates.
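The cost side of higher resolution and greater color depth can be sketched with simple arithmetic. The example below computes the uncompressed size of a scanned page; the US Letter page size (8.5 x 11 inches) and the function name `raw_size_bytes` are assumptions for illustration, and real files will be smaller once compressed.

```python
def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in bytes, ignoring headers and row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page: US Letter, 8.5 x 11 inches.
for dpi in (150, 300):
    bitonal = raw_size_bytes(8.5, 11, dpi, 1)    # 1 bit per pixel
    grey = raw_size_bytes(8.5, 11, dpi, 8)       # 8 bits per pixel
    print(f"{dpi} dpi: bitonal {bitonal:,} bytes, greyscale {grey:,} bytes")
```

Doubling the resolution quadruples the raw pixel count, and moving from bitonal to 8-bit greyscale multiplies it by eight, so the recognition gains described above are paid for in storage and transmission time.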

There is usually some degree of skew, or page slant, introduced during the capture process. This is true for both manually fed and auto-feed devices. Many capture devices have some image processing capability that includes deskew, despeckle, and thresholding.
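As an illustration of the thresholding step mentioned above, here is a minimal sketch that converts greyscale pixel values to bitonal using Otsu's method. Otsu's method is one common way to pick a global threshold; it is an assumption here and not necessarily what any particular capture device implements. The function names and the sample pixel values are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximizes between-class variance.

    Pixels with value <= t are treated as dark (ink), the rest as background.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their grey levels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Map each grey value to 1 (ink) or 0 (background)."""
    return [1 if p <= threshold else 0 for p in pixels]


# Hypothetical scan line: three dark text pixels, three light paper pixels.
row = [20, 25, 30, 200, 210, 220]
t = otsu_threshold(row)
print(t, binarize(row, t))
```

A real page would supply the whole image's pixels rather than a single row, and unevenly lit scans often call for a locally adaptive threshold instead of one global value.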


 
 
   
 


Copyright (c) 1998-2007 CVISION Technologies, Inc.
CVISION, CVista, CBatch, and the CVISION logo are registered
trademarks of CVISION Technologies, Inc.

 