Document Archiving: A Quick Guide to PDF/A Subtypes
PDF/A is an ISO-standardized version of PDF (ISO being the International Organization for Standardization) designed for long-term archiving. Other document formats can degrade over time, losing content that depends on external links, or can become unreadable altogether when a required font is missing or incompatible. PDF/A guards against this kind of data loss by embedding fonts, preserving structured information, maintaining descriptive metadata, and supporting indexing. Since its initial release in 2005, PDF/A has grown into a family of subtypes, each suited to particular accessibility, archiving, compliance, and indexing needs. This guide covers eight of them: PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-2u, PDF/A-3a, PDF/A-3b, and PDF/A-3u. Do you know which PDF/A part and level best fits your business and your document workflows? Below you’ll find a breakdown of the categories, functions, and use cases of each subtype, to help you navigate among them and decide what your business needs.
What Do the Letters ‘a,’ ‘b,’ and ‘u’ Mean in PDF/A?
Each PDF/A subtype ends in a letter, ‘a,’ ‘b,’ or ‘u,’ placed directly after its part number (as in PDF/A-2u). The letter designates the subtype’s conformance level, which determines the specific functions the format fulfills.
PDF/A Level b: For Basic Compliance
Level b subtypes correspond to the minimal compliance level of the original PDF/A standard for long-term preservation. Where Level a is primarily concerned with accessibility, Level b is primarily concerned with preserving a document’s visual integrity, so that it displays identically every time it is opened. According to the ISO, Level b does not guarantee accessibility: “Level b conforming files might not have sufficiently rich internal information to allow for the preservation of the document’s logical structure and content text stream in natural reading order, which is provided by Level a conformance.” The trade-off, then, is giving up the accessibility assured by Level a. Use PDF/A Level b if a consistent, identical display matters more to you than meeting accessibility standards.
PDF/A Level u: For Standardized Document Indexing with Searchable PDFs
Level u is a relatively recent conformance level, introduced with PDF/A-2, that prioritizes indexability. Where Level a ensures accessibility for readers with visual disabilities and Level b preserves a document’s visual appearance, Level u ensures that text is searchable. The ISO defines Level u conformance as one that “represents Level b conformance with the additional requirement that all text in the document have Unicode equivalents, ensuring that all text can be indexed and displayed.” In other words, Level u is Level b plus guaranteed text indexability. Use a Level u subtype if your IT environment prioritizes text search and the ability to copy text, including text recognized in scanned pages by PDF Compressor’s v8 OCR.
PDF/A Level a: For Accessible Conformance
Level a subtypes are designed to ensure accessibility and to meet specific conformance requirements. They provide textual content and structured information that screen reader applications can interpret and process. In addition, Level a meets requirements defined by the ISO: “language specification, hierarchical document structure, tagged text spans (for text extraction and viewing by multiple devices, including hand-helds), and descriptive text for images and symbols, and character mappings to Unicode (for character searchability).” For industries and IT environments that must comply with accessibility standards, Level a is the format to choose.
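To make the relationship between the three levels concrete, here is a minimal Python sketch that models each level as a set of requirements, paraphrasing the ISO descriptions quoted above. The requirement labels and the `levels_meeting` helper are hypothetical illustrations, not identifiers from the standard or from any library, and the sets are a deliberate simplification of the full conformance rules.

```python
# Simplified model (assumption): each conformance level is a set of
# requirement labels paraphrased from the ISO text; labels are illustrative.
BASE = frozenset({"visual_appearance"})          # Level b: identical display
UNICODE = frozenset({"unicode_mapping"})         # all text has Unicode equivalents
STRUCTURE = frozenset({
    "language_specification",
    "hierarchical_structure",
    "tagged_text",
    "alt_text",                                  # descriptive text for images/symbols
})

LEVEL_REQUIREMENTS = {
    "b": BASE,
    "u": BASE | UNICODE,              # Level b plus Unicode equivalents
    "a": BASE | UNICODE | STRUCTURE,  # adds logical structure and descriptions
}


def levels_meeting(needs: set[str]) -> list[str]:
    """Return the levels whose modeled requirements cover every item in `needs`."""
    return sorted(
        level for level, reqs in LEVEL_REQUIREMENTS.items() if needs <= reqs
    )
```

For example, `levels_meeting({"unicode_mapping"})` returns `['a', 'u']`, reflecting that both Level u and Level a require Unicode equivalents while Level b does not.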
What Do the Numbers Mean in PDF/A-1, PDF/A-2, and PDF/A-3?
In addition to the letter marking its level, every PDF/A subtype carries a number before that letter. The number identifies the part of the standard (ISO 19005-1, -2, or -3) in which the subtype was defined. The parts were published in chronological order, and each one added new capabilities.
Originally published in September 2005, PDF/A-1 is the first part of the standard approved by the ISO. PDF/A-1 has two levels: Level a for accessible conformance and Level b for basic compliance.
An evolution of PDF/A-1, PDF/A-2 was published in June 2011. New features include JPEG 2000 image compression, support for transparency effects and layers, embedding of OpenType fonts, provisions for digital signatures in accordance with the PDF Advanced Electronic Signatures (PAdES) standard, and the option of embedding PDF/A files, which makes it possible to archive a set of documents within a single file. If image display matters in your documents, use PDF/A-2 rather than PDF/A-1 for its improved image handling and its support for embedded files.
PDF/A-3, the third part of the standard, was published in October 2012. PDF/A-3 allows arbitrary file formats (XML, CSV, CAD, word-processing documents, spreadsheets, and so on) to be embedded in PDF/A conforming documents. The PDF Association has noted PDF/A-3’s use as a container for multiple file types: “Effectively, the PDF becomes a zip archive that may also include an integrated cover-document.” This makes PDF/A-3 especially useful for collating content in various formats for communication and reference, for example when preserving legacy email content.
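The naming scheme described in the last two sections can be checked mechanically. The following Python sketch, using only the standard library, splits an identifier such as PDF/A-2u into its part number and conformance level, and rejects combinations this guide does not list (for example PDF/A-1u, since Level u arrived with PDF/A-2). The `parse_subtype` function and the `VALID_LEVELS` table are hypothetical names, and parts other than 1, 2, and 3 are rejected simply because this guide does not cover them.

```python
import re

# Levels available in each part, as described in this guide
# (Level u was introduced with PDF/A-2).
VALID_LEVELS = {1: {"a", "b"}, 2: {"a", "b", "u"}, 3: {"a", "b", "u"}}

# Matches identifiers like "PDF/A-2u"; case-insensitive for convenience.
_PATTERN = re.compile(r"PDF/A-(\d)([abu])", re.IGNORECASE)


def parse_subtype(name: str) -> tuple[int, str]:
    """Split an identifier such as 'PDF/A-2u' into (part, level)."""
    match = _PATTERN.fullmatch(name.strip())
    if match is None:
        raise ValueError(f"not a PDF/A subtype identifier: {name!r}")
    part, level = int(match.group(1)), match.group(2).lower()
    if level not in VALID_LEVELS.get(part, set()):
        raise ValueError(f"PDF/A-{part}{level} is not a subtype covered in this guide")
    return part, level


# Every part/level combination listed above.
ALL_SUBTYPES = [
    f"PDF/A-{part}{level}"
    for part in sorted(VALID_LEVELS)
    for level in sorted(VALID_LEVELS[part])
]
```

Enumerating `ALL_SUBTYPES` yields the eight subtypes discussed in this guide, from PDF/A-1a through PDF/A-3u.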
General Recommended Default Output: PDF/A-2u
For general use cases, we recommend PDF/A-2u: it combines PDF/A-2’s image-display features and support for embedded PDF/A files with the text searchability and indexability of Level u.
Recommended Output for Embedded File Formats: PDF/A-3
In financial data entry, where PDF files carry embedded spreadsheets or CSV files, PDF/A-3 works well because it permits embedding these formats. Likewise, in engineering and design, where documents need embedded 2-D or 3-D drawings, PDF/A-3 works well because it permits embedding CAD files. Its support for multiple embedded file types makes PDF/A-3 a good fit for both.
Recommended Output for General Accessibility Conformance: PDF/A Level a
For the government sector, or any environment where making documents accessible to assistive reading software is a priority, consider all of the Level a subtypes: PDF/A-1a, PDF/A-2a, and PDF/A-3a. Level a’s conformance to stringent standards is especially relevant to Section 508 of the Rehabilitation Act, which requires documents to account for the needs of people with disabilities, including visually impaired readers who rely on assistive technology. We strongly recommend running OCR when producing Level a files, so that the output contains textual content and structured information that screen reader applications can interpret. To meet accessibility standards, convert both born-digital and unstructured scanned files to PDF/A Level a with auto-tagging, which ensures that assistive screen reading software reads your document structure in its natural order. To choose among parts 1, 2, and 3, consider how the file is constructed and whether it contains additional embedded files.
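The recommendations in this guide can be summarized as a small decision rule. This Python sketch is an illustration under stated assumptions, not a compliance tool: the `Workflow` fields and the `recommend_subtype` function are hypothetical, and two choices the guide leaves open are filled in by assumption (Level u when accessibility is not required, and PDF/A-2 rather than PDF/A-1 when no other file formats are embedded).

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical summary of a document workflow's archiving needs."""
    accessibility_required: bool = False  # e.g. Section 508 obligations
    embeds_other_formats: bool = False    # CSV, spreadsheets, CAD, XML, ...


def recommend_subtype(workflow: Workflow) -> str:
    """Encode this guide's recommendations as a simple rule (a sketch)."""
    # Part: PDF/A-3 is the part that allows arbitrary embedded file formats;
    # otherwise use PDF/A-2, the guide's general default (assumption: not PDF/A-1).
    part = 3 if workflow.embeds_other_formats else 2
    # Level: 'a' when accessibility is mandatory; otherwise 'u', keeping the
    # default's searchability (assumption: Level b is not chosen here).
    level = "a" if workflow.accessibility_required else "u"
    return f"PDF/A-{part}{level}"
```

Under these assumptions, a workflow with no special needs yields PDF/A-2u, one that embeds spreadsheets yields PDF/A-3u, and an accessible workflow that embeds CAD files yields PDF/A-3a.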