Converting Documents to a Standard Electronic Format
Most companies and organizations have a compelling interest in converting their incoming documents to a single electronic format. There are two main reasons for this. First, it is much easier to reliably query, view, and print documents if they are all kept in the same file format. This file format needs to be as functional as possible while preserving the integrity of the original document. Second, document workflows are much easier to optimize and automate when every document is converted into one standard format. If all files are converted to one canonical format, then viewing, classifying, routing, extracting data from, analyzing, and commenting on these documents can all be done with the same software.
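To make the cost of mixed formats concrete, the following is a minimal sketch that counts how many distinct file formats an incoming-documents folder contains. It assumes documents arrive as files in a directory tree and that the file extension is a reasonable proxy for the format; the function name `count_formats` is hypothetical and chosen only for illustration.

```python
from collections import Counter
from pathlib import Path


def count_formats(root):
    """Count the files under `root`, grouped by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Each distinct key in the result is, roughly, one more format that viewing, routing, and extraction software would have to handle if no conversion took place.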
There are challenges in archiving “digitally born” or native electronic documents, just as there are challenges in archiving paper and captured paper documents. With respect to records management and archiving of electronic files, a naïve approach would simply retain all such documents in their native electronic format and import them into the company’s records management system. This method, although simple to implement, is risky and is not consistent with best practices in document archiving and records management (as suggested by NARA and ARMA).
Electronic Document Archives Preserved in their Original Format
Keeping corporate documents in their native electronic format causes serious problems, including:
- Without a policy of converting to a standardized file format, viewing corporate documents requires supporting multiple viewer tools.
- A given document produced in one office, such as an MS Word file generated in a company’s Mexico City office, may not display correctly in the same company’s New York office. This can easily occur because the Windows system fonts available in one office are not necessarily available in another.
- There is no guarantee that the look and feel of an electronic document is consistent when generated on one machine and subsequently displayed on another.
- With changes in margins and pagination, a page reference may no longer be correct.
- For legal document retention, it is hard to show that a given original electronic file has not been modified. With the PDF format, by contrast, this could be established with a digital signature.
These are some of the reasons that archiving a document in its native format is usually inadequate from both a legal and a regulatory standpoint.
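The integrity concern in the last item above can be illustrated with a cryptographic hash. The following is a minimal sketch, not a complete retention solution: it assumes a SHA-256 digest of each file is recorded at archive time and stored somewhere it cannot be altered. The function names `sha256_of_file` and `is_unmodified` are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path, recorded_digest):
    """Compare the file's current digest with one recorded at archive time."""
    return sha256_of_file(path) == recorded_digest
```

A matching digest only shows that the bytes are the same as when the digest was recorded. A digital signature goes further by binding that digest to the signer's key, which is why a signed format is attractive for legal retention.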
Electronic Documents Preserved in Image Format
There are also problems in archiving to image formats, such as TIFF or JPEG. Conversion to TIFF format has been used extensively over the past two decades for document processing and archiving. In this electronic imaging (also known as capture) of corporate and government documents, the images were used in lieu of the originals, which were put into “deep storage” or discarded. Imaging also generally replaced the need to microfiche documents.
The advantages of converting documents to TIFF format include that it retains the original page layout, cannot be easily modified, and can be widely viewed and printed. For these reasons, TIFF is still heavily used as a document archiving format within many organizations despite some serious limitations.