CVISION home
 
 
 
Litigation Support Web Repositories Scanning Bureaus Wireless Telecom
 

 
   CVista Suite Overview
   CVista PdfCompressor
   CVista Viewer
   CVista API Toolkit
   CBatch
   OCR
 
  Professional Services Overview
  LeapReader Overview
  Submit Inquiry
 
   Case Studies
   Litigation Support
   Web Repositories
   Scanning Bureaus
   Wireless Telecom
 
   Resellers
   Service Bureaus
 
   Case Studies
   Clients
   Testimonials
   Information/Support Blog
   Submit a File to our Staff

 

OCR to create Searchable PDF's

In the new "paperless" office, there is no more paper. Any file, contract, meeting notes, etc. you need are found instantly with Google-style searching. To make all this happen, paper needs to be made searchable. Pure OCR is usually not the answer for a variety of reasons. Among them pure paper to electronic conversion is not a reliable process. For most corporate documents, before searchability is a factor there must be a guarantee of accuracy in the conversion process. Any process that can modify the look and feel of the document is not acceptable.

Paper to electronic format conversion is not reliable for purposes of record management. The OCR engine attempts to understand the document logically, and then reconstruct it electronically. Any error on the part of the OCR engine in interpreting the document will result in perceptual errors in the electronic reconstruction. Instead, what has become heavily used in the document management industry is that of searchable image documents. Within searchable image documents, the most popular format is that of PDF image + hidden text. In this format, what you "see" is just the imaged document. What you can search on, however, is the fully OCRed version of the file. Since the OCR layer is hidden, mistakes and substitution errors at the OCR level are not visible on the document.

Searchable PDF is an ideal format for several reasons. One of the main reasons for searchable PDF is database portability. When porting between databases, it is very good practice for documents to be self-contained. This essentially means that OCRed text and any document-related metadata should be available within the document. With PDF this is easily realizable. With other formats, such as TIFF, it is virtually impossible. Of course, other information including headers, footers, Bates Stamping, and security can easily be added to a PDF file.

Click here to read next topic: OCR & Logical Decomposition

Return to Table of Content

 
 
   
 


Copyright (c) 1998-2007 CVISION Technologies, Inc.
CVISION, CVista, CBatch, and the CVISION logo are registered
trademarks of CVISION Technologies, Inc.

 
Litigation Support Web Repositories Scanning Bureaus Wireless Telecom