Hybrid Records: Optimizing Hybrid Document Capture

In PDF Conversion, PDF Document Conversion, PDF Search by Admin

What is a Hybrid Record?

A hybrid document, or hybrid record, is a file that stores data in more than one format within the same document. Examples include a contract generated electronically as a PDF that contains a scanned image of the signature page, or an image-only PDF with an embedded CSV file. Hybrid records function essentially as container files: they carry data in several formats inside one document. A useful way to frame them is as a specific kind of metafile, a file format capable of storing multiple types of data.

Hybrid PDFs and ISO Specifications

A commonly used metafile format is PDF/A, which is standardized by ISO (ISO 19005) for long-term archiving and supports document accessibility. PDF/A has specific rules about file embedding that you should understand before using it as a hybrid document. PDF/A-2 allows embedding only of files that are themselves valid PDF/A, whereas PDF/A-3 permits embedding of any file type, such as CAD, CSV, XML, and image files. Of these two parts of the standard, it is therefore PDF/A-3 that can serve as a general-purpose hybrid document.
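The embedding rules above can be written down as a small decision function. The following is a minimal sketch in Python, not part of any PDF library: the `Attachment` class and the `may_embed` function are hypothetical names introduced for illustration, and the `is_pdfa` flag is assumed to come from a separate validation step. The rule for PDF/A-1 (no embedded files at all) is not discussed in this article and is included only for completeness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    """Hypothetical description of a file to be embedded in a PDF/A document."""
    filename: str
    # Assumed to be set by a separate PDF/A validator; True only if the
    # attachment is itself a valid PDF/A file.
    is_pdfa: bool = False


def may_embed(pdfa_part: int, attachment: Attachment) -> bool:
    """Return True if the given PDF/A part permits embedding the attachment."""
    if pdfa_part == 1:
        # PDF/A-1 does not allow embedded files.
        return False
    if pdfa_part == 2:
        # PDF/A-2 allows embedded files only if they are valid PDF/A.
        return attachment.is_pdfa
    if pdfa_part == 3:
        # PDF/A-3 allows embedded files of any type (CAD, CSV, XML, images, ...).
        return True
    raise ValueError(f"unsupported PDF/A part in this sketch: {pdfa_part}")


invoice_xml = Attachment("invoice.xml")
archived_pdf = Attachment("appendix.pdf", is_pdfa=True)

print(may_embed(2, invoice_xml))   # False
print(may_embed(2, archived_pdf))  # True
print(may_embed(3, invoice_xml))   # True
```

In practice, a check like this would sit in front of the step that builds the archive file, so that an XML invoice or CSV export is only attached when the target is PDF/A-3.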

Industries Adopting Hybrid Records

Hybrid records are in common use across many industries. For instance, XML files are often attached to PDF files for electronic invoicing and billing; CSV files are embedded within financial documents; and CAD drawings are placed within PDFs in manufacturing and engineering. A salient example comes from healthcare, where many institutions are coming to rely on “hybrid health records,” or HHRs. An HHR combines electronic health records (EHRs) with scanned images of paper charts. HHRs help reduce the cost of physical paper storage, gather image-based data conveniently in one document, and ease the transition to paperless, digitally dominant document management.

Optimizing Document Capture Processes for Hybrid Records

The main benefit of creating, using, and distributing hybrid documents is expediency: as a compound file format, a hybrid record holds multiple streams of data in one document file. However, merging file types comes with a trade-off. It can be difficult to process image and electronic components together without the right capture software, and some capture products are designed to handle only image documents such as scanned paper. Being unable to process the electronic components alongside the image portions of a hybrid record slows document processes, adds manual exception-handling costs, and disrupts workflows. Ideally, document capture software has built-in support for an array of file types, so that hybrid documents can be processed within a single workflow. PDF Compressor, for instance, comes equipped with automatic processing support for many input file types, such as XFA PDF forms and hybrid PDFs. Broader input file type compatibility reduces the time spent manually processing documents that many capture solutions cannot ingest, which enables faster business workflows and reduces missed revenue from document-based transactions.

Data extraction and analytics, archiving, and forms processing of hybrid documents are only as accurate as the data fed into them. Conversion software will often rasterize all content regardless of its source. In the process, these solutions flatten text to an image and re-OCR already indexable text from born-digital documents, which wastes time, degrades performance, and exposes the hybrid record to re-recognition errors. PDF Compressor is designed to detect pre-existing text layers in born-digital documents and automatically bypass the OCR phase for those portions of text. Unattended automatic detection of electronically produced text spares your company the manual effort, special coding, or expensive professional services otherwise needed to separate born-digital files from image documents.
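The idea of bypassing OCR where a text layer already exists can be illustrated with a short routing sketch. This is a simplified illustration under stated assumptions, not a description of how PDF Compressor works internally: the `Page` class, the `needs_ocr` and `plan_capture` functions, and the `MIN_TEXT_CHARS` threshold are all hypothetical, and the page text is assumed to have been extracted already by whatever PDF library you use.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed threshold: fewer extractable characters than this is treated as
# "no usable text layer" (for example, a stray page number on a scan).
MIN_TEXT_CHARS = 20


@dataclass
class Page:
    """Hypothetical summary of one page of a hybrid document."""
    number: int
    text_layer: str   # text already extractable from the page ("" for a pure scan)
    has_image: bool   # True if the page contains raster image content


def needs_ocr(page: Page) -> bool:
    """Send a page to OCR only if it has image content and no usable text layer."""
    has_text = len(page.text_layer.strip()) >= MIN_TEXT_CHARS
    return page.has_image and not has_text


def plan_capture(pages: List[Page]) -> Dict[int, str]:
    """Map each page number to 'ocr' or 'skip_ocr'."""
    return {p.number: ("ocr" if needs_ocr(p) else "skip_ocr") for p in pages}


contract = [
    Page(1, "This agreement is made between the parties named below.", False),
    Page(2, "", True),  # scanned signature page
]
print(plan_capture(contract))  # {1: 'skip_ocr', 2: 'ocr'}
```

A page-level decision like this is the simplest case. A page that mixes born-digital text with a scanned region would need the same test applied per region rather than per page.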

PDF Compressor is especially useful for hybrid documents with born-digital text content: it reduces the risk of OCR errors and allows these documents to be used properly as data-rich assets. It offers intuitive document capture for hybrid documents and helps organizations like yours achieve efficient and precise document processes.

How can your organization achieve greater efficiency in your hybrid document processes? Start your free trial of our PDF Compressor today!

