PDF/A Compression

In Archived, Batch PDF OCR, CVISION PdfCompressor, JBIG2 Compression, PDF Compression, PDF/A by Kelvin

PDF/A is a restricted subset of the standard PDF file format. Some of its rules are mandatory: JavaScript and executable content are prohibited, all fonts must be embedded rather than referenced externally, and encryption is not allowed. Other PDF/A guidelines are not enforced but are recommended as best practices. Image downsampling is addressed in an appendix of the PDF/A specification: although the specification does not prohibit downsampling, it recommends, as a best practice, that images not be downsampled when documents are converted to PDF/A.

These guidelines do not address the broader distinction between document creation and document archiving. If documents are created in a process that precedes, and is unrelated to, archiving (for example, when they arrive from multiple sources of varying resolution and quality), then any downsampling or re-quantization (as in JPEG compression) during archiving is not advisable, because the documents have already been created. In this case, from a strictly archival perspective, we do not want to introduce any further degradation.

However, when documents are captured, or when documents containing images are first created, the process almost universally involves downsampling, re-quantization, or both. This applies to hospitals producing MRI and CAT scans, and to government offices that scan paper with a multifunction printer (MFP) to JPEG, TIFF, or JPEG-based PDF. If conversion to PDF/A is part of a larger workflow that also includes document creation, then using JPEG or another lossy image compression method does NOT violate PDF/A best practices, because the document archiver in this case is also the document creator. The document creator, or the document creation workflow, always has complete latitude to decide what constitutes informational loss within its application or organization.

Combining document creation and archiving into a single step is exactly equivalent to a two-step process: first capturing the documents in a lossy image format, then archiving those same documents into PDF/A, where the PDF/A step embeds the JPEG image stream from the capture step directly, without re-encoding it. The two-step method is consistent with PDF/A best practices because the archiving step is completely lossless, yet its output PDF is identical to that of the single-step method, which created a "lossy" PDF/A document by combining document creation and archiving.
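The lossless archiving step described above can be illustrated with a minimal sketch: the JPEG bytes produced at capture time are copied verbatim into a PDF image XObject with the `/DCTDecode` filter, so no pixel is decoded or re-quantized. The function names `jpeg_info` and `jpeg_to_pdf` are hypothetical, and this is not a conforming PDF/A writer; a real PDF/A file would also need XMP metadata, an OutputIntent for device color spaces, and a file ID, all omitted here. It also assumes an 8-bit JPEG and maps one image pixel to one PDF point.

```python
import struct


def jpeg_info(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, components) from the first SOF marker of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    i = 2
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError("malformed JPEG marker")
        marker = data[i + 1]
        i += 2
        if marker == 0xD9:  # EOI reached without finding a frame header
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            continue  # standalone markers carry no length field
        seg_len = struct.unpack(">H", data[i:i + 2])[0]
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            _precision, height, width, comps = struct.unpack(">BHHB", data[i + 2:i + 8])
            return width, height, comps
        i += seg_len  # segment length includes its own two length bytes
    raise ValueError("no SOF marker found")


def jpeg_to_pdf(jpeg: bytes) -> bytes:
    """Wrap an existing JPEG stream in a one-page PDF without re-encoding it."""
    width, height, comps = jpeg_info(jpeg)
    colorspace = {1: "/DeviceGray", 3: "/DeviceRGB", 4: "/DeviceCMYK"}[comps]
    content = f"q {width} 0 0 {height} 0 0 cm /Im0 Do Q".encode()
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
        f"/Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>".encode(),
        # Image XObject: the JPEG bytes are embedded as-is (lossless pass-through).
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
        f"/ColorSpace {colorspace} /BitsPerComponent 8 /Filter /DCTDecode "
        f"/Length {len(jpeg)} >>\nstream\n".encode() + jpeg + b"\nendstream",
        f"<< /Length {len(content)} >>\nstream\n".encode() + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode()
    return bytes(out)
```

For example, `jpeg_to_pdf(open("scan.jpg", "rb").read())` (with a hypothetical capture file `scan.jpg`) yields a PDF whose image stream is byte-for-byte the captured JPEG, which is why the archiving step itself introduces no additional loss.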
