Resource Library

The basic concept of file compression is to take a large file and make it smaller. There are many time- and money-saving reasons to compress files. In particular, the advantages of PDF file compression include saving storage space, reducing web hosting fees, transmitting and receiving files faster, and incorporating efficient OCR for fast, easy search and retrieval of files.

What is Document Compression?

The key challenge in both document and image compression is to compress files to their minimum size without sacrificing image quality. If an image is compressed so that the original image bitmap can be recovered with absolutely no change, the method is called lossless compression.
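The lossless property can be checked directly: decompressing the compressed bytes must return exactly the original bytes. The following is a minimal sketch, assuming Python's standard zlib module as a stand-in for a lossless codec and a synthetic byte pattern as a hypothetical bitmap; it is not the method of any particular PDF compression product.

```python
import zlib


def is_lossless_roundtrip(data: bytes) -> bool:
    """Return True if compressing and then decompressing reproduces data exactly."""
    compressed = zlib.compress(data, 9)  # 9 = highest zlib compression level
    restored = zlib.decompress(compressed)
    return restored == data


# Hypothetical stand-in for a scanned bitmap: long runs of white (0x00)
# with short runs of black (0xFF), repeated across ten "rows".
bitmap = bytes([0x00] * 4000 + [0xFF] * 96) * 10
compressed = zlib.compress(bitmap, 9)

print(len(bitmap), "bytes before,", len(compressed), "bytes after")
print("bit-exact round trip:", is_lossless_roundtrip(bitmap))
```

Highly repetitive data like this shrinks dramatically, but real scanned pages contain noise that lossless codecs must preserve bit for bit, which limits how small the file can get.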

To achieve compression of an order of magnitude or more, producing files 10x or even 100x smaller than the original, we need to consider other forms of compression. Most of the file size for compressed digital images and video is used to code noise and digitization artifacts, not the important symbolic information. Consequently, it is essential to identify which bytes correspond to "noise" and which correspond to "signal". When this signal/noise classification is done accurately, the resulting compressed file should appear identical to the original file. This compression method is referred to as perceptually lossless.

When relying on a compression method, such as perceptually lossless compression, in which data bits can change, it is important to validate that the method does not degrade the original data in any way. In fact, a good compression method should enhance, not degrade, the input data. Examples of such methods include MP3, for digital audio, and JBIG2, for digital scanned files. Both of these methods, if implemented correctly, can enhance the original digital data.

Of course, improper implementation of any perceptually lossless compression standard can degrade the signal, so it's important to understand the distinctions between the different commercially available implementations of the same compression standard. One reasonable test of whether a compression system is enhancing or degrading the data is to run both the original data and the compressed data through a recognition system, such as OCR (optical character recognition), and validate that recognition rates for the compressed data are as high as those for the original input data.
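The recognition-rate test described above can be sketched in a few lines. This is an illustrative outline only: the OCR outputs are hypothetical strings typed in by hand rather than produced by a real OCR engine, the function names are invented for this example, and accuracy is approximated with the similarity ratio from Python's standard difflib module rather than a formal statistical test.

```python
from difflib import SequenceMatcher


def character_accuracy(ground_truth: str, ocr_text: str) -> float:
    """Approximate character-level accuracy as a similarity ratio in [0, 1]."""
    return SequenceMatcher(None, ground_truth, ocr_text).ratio()


def compression_preserves_recognition(
    ground_truth: str,
    ocr_original: str,
    ocr_compressed: str,
    tolerance: float = 0.0,
) -> bool:
    """True if OCR on the compressed file is at least as accurate as on the original."""
    original_score = character_accuracy(ground_truth, ocr_original)
    compressed_score = character_accuracy(ground_truth, ocr_compressed)
    return compressed_score >= original_score - tolerance


# Hypothetical data: the known page text and three OCR transcriptions of it.
truth = "Invoice 1024 total due"
ocr_original = "Invoice 1O24 total due"   # one misread character in the original scan
ocr_enhanced = "Invoice 1024 total due"   # compressed file that reads cleanly
ocr_degraded = "lnvoice 1O24 tota1 due"   # compressed file with extra misreads

print(compression_preserves_recognition(truth, ocr_original, ocr_enhanced))
print(compression_preserves_recognition(truth, ocr_original, ocr_degraded))
```

In practice the comparison would be run over a large, representative sample of pages, so that a difference in recognition rates can be judged for statistical significance rather than from a single line of text.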

Lossy compression is any compression system that degrades the input data such that either humans notice a perceptual difference or machine recognition systems exhibit a statistically significant difference. Such lossy compression methods are generally not recommended for corporate document storage and retrieval, or for retention of digital image records.

Use Compression to Save Time and Money

With more companies hosting and sharing documents online and working in distributed database environments, recent advances in compression technology have become very relevant. It's hard to ignore the value of cutting a company's storage use by 90%, which is what a 10x compression ratio delivers. Web hosting fees, as well as the costs of archiving and storing data, are usually significantly lowered by compressing files. Reducing the transmission time of a file by a factor of 10 also makes sharing documents more feasible and efficient.
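The relationship between compression ratio, storage savings, and transfer time is simple arithmetic: a ratio of r leaves 1/r of the original size, so the fraction saved is 1 - 1/r. The sketch below works through it, assuming a hypothetical 50 MB scanned file and a 10 Mbit/s link; both figures are illustrative, not measurements.

```python
def storage_saved_fraction(compression_ratio: float) -> float:
    """Fraction of storage saved when files shrink by the given ratio (10 means 10x)."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


def transfer_seconds(size_bytes: float, bandwidth_bits_per_second: float) -> float:
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / bandwidth_bits_per_second


original_size = 50_000_000          # hypothetical 50 MB scanned document
link_speed = 10_000_000             # hypothetical 10 Mbit/s connection
ratio = 10                          # 10x compression

compressed_size = original_size / ratio

print(f"storage saved: {storage_saved_fraction(ratio):.0%}")
print(f"transfer before: {transfer_seconds(original_size, link_speed):.0f} s")
print(f"transfer after:  {transfer_seconds(compressed_size, link_speed):.0f} s")
```

Under these assumptions, the 10x ratio saves 90% of the storage and cuts the idealized transfer time from 40 seconds to 4 seconds; a 100x ratio would save 99%.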

PDF - The Best Format to Compress, Web Optimize, and Search your Files

There are many reasons why conversion to compressed, Web-optimized, searchable PDF format will yield a greater return on investment than alternative file formats. PDF is more universally viewable than TIFF, JPEG or any other image format; it can be compressed automatically; and it can be made seamlessly text-searchable for immediate file retrieval.

Of course, many companies already have their bitonal files in TIFF format and their color images in JPEG. One solution is to use a software package that performs three functions: conversion, compression and OCR. These programs allow users to input TIFF, JPEG and many other image formats and easily convert them to compressed, Web-optimized, searchable PDF files. The end result will be the smallest, most universally viewable, Web-ready files anywhere.