Question: We are considering standards for an image document archive. Are there compelling reasons to consider PDF/A instead of TIFF?
Answer: There are really two parts to your question. Should we consider PDF as a document archive format? If so, should we take advantage of PDF/A, the new version of PDF that is specifically designed for long-term archiving?
Most companies are adapting, some rapidly and some less so, to the age of digital media. Historically, corporate archives were kept on paper or on microfilm and microfiche; today much of the archiving is done using electronic files. The traditional methods, although somewhat out of date in the computer age, have the advantage of guaranteed reproducibility. Companies first moved toward the “paperless office” by converting some of their paper to electronic TIFF files. These files could be stored on computer and accessed from remote data sites, and because TIFF as a format is not changing, reproducibility is essentially guaranteed. TIFF is easily accepted within a document imaging workflow, but it is not natively searchable (except by field coding) and has no support for metadata, hyperlinks, annotations, or security.
In the last several years, there has been a shift in the document imaging community towards adopting PDF as a standard. The advantages include: (i) efficient full-text search, (ii) much better compression than TIFF and JPEG (bitonal up to 10x, color up to 100x), (iii) metadata support (author, keywords, etc.), (iv) web optimization, (v) security, and (vi) portability across platforms and databases.
The problem with increased migration towards PDF as the electronic document archive format of choice is that PDF is an evolving and very complex standard: a PDF file can include MPEG video, hyperlinks, and JavaScript. At some point it becomes very difficult to ensure what industry needs most, which is guaranteed reproducibility of the document. Efficient document indexing and transmission are important features of a digital archive, but most important is the certainty that the document can be reproduced on demand, as required, over the long term.
Thus, as PDF evolved, a need emerged for a version of PDF in which reproducibility of the document is assured. ISO 19005-1 defines “a file format based on PDF, known as PDF/A, which provides a mechanism for representing electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files.” The specification defines a profile for electronic documents that ensures the documents can be reproduced in years to come.
An important aspect of this reproducibility is the requirement that PDF/A documents be 100% self-contained. All the information necessary to display a document identically to the original, every time, is embedded in the file. This includes, but is not limited to, all content (text, raster images, and vector graphics), fonts, and color information. A PDF/A document cannot rely on information from external sources (e.g., fonts that are not embedded, or content reached through hyperlinks).
So if a company has decided to use PDF as its records management and/or archival format, it faces a limitation: PDF in its native form cannot guarantee long-term reproducibility. PDF/A is derived by placing certain restrictions on the PDF standard so that long-term reproducibility can be guaranteed. PDF/A is based on an existing version of the PDF Reference, namely Adobe PDF Reference 1.4, implemented in Adobe Acrobat and Reader 5. Certain functions allowed in PDF 1.4 have been specifically excluded from PDF/A, such as sound and movie actions.
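To make these restrictions concrete, here is a minimal Python sketch of a first-pass screen for incoming files. It is a heuristic only, not a PDF/A validator. It assumes the file is unencrypted and that the relevant names appear uncompressed in the raw bytes, which is not guaranteed for PDF in general. The function name `scan_pdf_bytes` and the list of suspect names are illustrative choices. The `pdfaid:part` property is the XMP field in which a file declares the PDF/A part it claims to follow.

```python
import re

# Name tokens for features the text above says PDF/A excludes. This short
# list is an illustrative assumption, not the full set of ISO 19005-1 rules.
SUSPECT_NAMES = (b"/JavaScript", b"/Sound", b"/Movie")

# XMP can serialize the PDF/A part either as an element
# (<pdfaid:part>1</pdfaid:part>) or as an attribute (pdfaid:part="1").
PDFA_PART = re.compile(rb"pdfaid:part(?:>|\s*=\s*[\"'])\s*(\d)")


def scan_pdf_bytes(data: bytes) -> dict:
    """Return a rough report on raw PDF bytes (heuristic, not a validator)."""
    match = PDFA_PART.search(data)
    # Plain substring search: misses names hidden inside compressed streams
    # or written with #-escapes, and may flag names that are never used.
    found = [name.decode("ascii") for name in SUSPECT_NAMES if name in data]
    return {
        "is_pdf": data.startswith(b"%PDF-"),
        "claimed_pdfa_part": int(match.group(1)) if match else None,
        "suspect_names": found,
    }


if __name__ == "__main__":
    # Synthetic bytes standing in for a real file read with open(path, "rb").
    sample = b"%PDF-1.4\n<< /S /JavaScript /JS (app.alert(1)) >>\n"
    print(scan_pdf_bytes(sample))
```

A file that declares a PDF/A part and shows no suspect names has only passed a coarse screen. Confirming that fonts and color information are embedded, and that the file is truly self-contained, requires a dedicated PDF/A validation tool.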
If a company has decided to convert to PDF, there are certainly some compelling reasons to consider the new PDF/A format.