PDF Document Conversion

Posted in All, Archived, Convert PDF, PDF Conversion, PDF Document Conversion by Chris

Question: Is it safe to convert all our database documents to one format for long-term archiving? Preferably, we’d like to convert to PDF.

Answer: The safest way to convert documents is manually, with a human in the loop verifying each file conversion. With any file conversion, there is always a risk of losing data along the way. There are typically two ways to do file conversion, each with its own advantages and drawbacks. We’ll review each method.

1. Archiving to Captured Documents: In the good old days (i.e., in the last 50 years or so), companies would either retain the original documents or image them onto microfiche / microfilm. This imaging was considered a reliable method for long-term document preservation since the image would “look” like the original document. These days, there is an analogous way to “image” documents into electronic form. Essentially, a program can simulate an application-based printer driver and, instead of printing each document page, turn it into an image. The resulting document, in image format, should look exactly like the original.

Converting documents into image formats such as TIFF, JPEG, and image PDF has the advantage of reliability: the result looks exactly like the original document. The main drawback is file size. An image PDF converted from an original Excel spreadsheet, for example, could be much larger than the original file. Compression can help here, in some cases bringing the image document back toward the size of the original electronic file. See, for example, http://www.cvisiontech.com/pdf_compressor_31.html.
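To get a feel for why an imaged page can dwarf its source file, here is a rough back-of-the-envelope estimate of the uncompressed raster size of a single page. The US Letter page size and the 300 dpi scan resolution are assumptions picked for illustration, and the function name is made up for this sketch.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Estimate the uncompressed size, in bytes, of one rasterized page.

    Ignores file headers and per-row padding, so treat the result as a
    rough lower bound rather than an exact file size.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed example: one US Letter page (8.5 x 11 inches) at 300 dpi.
bitonal = raw_page_bytes(8.5, 11, 300, 1)   # black-and-white, 1 bit per pixel
color = raw_page_bytes(8.5, 11, 300, 24)    # full color, 24 bits per pixel

print(f"bitonal: {bitonal:,} bytes")  # bitonal: 1,051,875 bytes
print(f"color:   {color:,} bytes")    # color:   25,245,000 bytes
```

Roughly 1 MB per black-and-white page, and about 25 MB per color page, before any compression. A one-page spreadsheet is often far smaller than that, which is why compression matters so much for imaged archives.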

If one is converting corporate documents into image format for records management or archival purposes, then PDF has the advantages of stronger compression, web optimization, and hidden-text searchability, a combination that the TIFF format does not natively support. There is also an ROI (return on investment) in using PDF, since 5x-10x compression reduces bandwidth and storage requirements.
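The 5x-10x figure translates into storage savings with simple arithmetic. The sketch below works it through; the 1,000 GB archive size is a made-up example, and the helper names are hypothetical.

```python
def compressed_size(original_size: float, ratio: float) -> float:
    """Size after compression at the given ratio (e.g. 5 means 5x smaller)."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return original_size / ratio


def savings_fraction(ratio: float) -> float:
    """Fraction of storage (or bandwidth) saved at the given ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


archive_gb = 1000.0  # assumed archive size, for illustration only
for ratio in (5, 10):
    size = compressed_size(archive_gb, ratio)
    saved = savings_fraction(ratio)
    print(f"{ratio}x: {size:.0f} GB stored, {saved:.0%} saved")
```

At 5x the archive shrinks to a fifth of its size (80% saved), and at 10x to a tenth (90% saved). Note that going from 5x to 10x only adds another ten percentage points, so most of the benefit arrives with the first few multiples of compression.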

A disadvantage of imaging documents for long-term storage is that certain functionality the document might have had gets lost during conversion. For example, if a document was a fillable “form”, its fields and form actions will be lost. Hyperlinks will not be preserved either if image conversion is used.

2. Archiving to Electronic Files: Most files these days start out in electronic form. So another way to convert documents for long-term storage is to convert directly into another electronic format without actually imaging the document. This direct conversion from one electronic format to another has both advantages and disadvantages.

The advantages of direct electronic conversion are that file size remains small, documents remain searchable, and the functionality of the document (e.g., hyperlinks) is preserved. The main disadvantage is that the converted document may look different from the original. This is a serious problem, since the most important aspect of converting a document is that it appear exactly like the original. In an electronic file conversion, important aspects of the original document may get “lost in translation”.

Wherever possible, a manual or automatic (programmatic) validation of the conversion process is highly recommended (e.g., CVISION ICert).
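As a very rough idea of what a programmatic check might look like, the sketch below compares a page count and the extracted text of an original against its converted copy. It is only an illustration under stated assumptions: it is not how CVISION ICert or any particular product works, the `DocSnapshot` type and the 0.98 threshold are invented for this example, and it assumes the page count and text have already been extracted by some other tool. It also says nothing about visual fidelity, which would need an image-level comparison.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class DocSnapshot:
    """Hypothetical summary of a document, produced by some external extractor."""
    page_count: int
    text: str


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so layout-only changes are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def validate_conversion(original: DocSnapshot, converted: DocSnapshot,
                        min_similarity: float = 0.98) -> list[str]:
    """Return a list of detected problems; an empty list means no issue found."""
    problems = []
    if original.page_count != converted.page_count:
        problems.append(
            f"page count changed: {original.page_count} -> {converted.page_count}"
        )
    matcher = difflib.SequenceMatcher(
        None, normalize(original.text), normalize(converted.text), autojunk=False
    )
    ratio = matcher.ratio()
    if ratio < min_similarity:
        problems.append(
            f"text similarity {ratio:.3f} is below threshold {min_similarity}"
        )
    return problems


before = DocSnapshot(page_count=1, text="Invoice 1001 total due 250.00")
after = DocSnapshot(page_count=1, text="Invoice 1001   total due\n250.00")
print(validate_conversion(before, after))  # [] (only whitespace differs)
```

Files that come back with a non-empty problem list are the ones worth routing to a human reviewer. For very long documents, `difflib.SequenceMatcher` can be slow, so a per-page comparison may be more practical.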
