Zip, Compress, Tar, Rar ... confuse me?

May 05

Hard drives keep getting cheaper, but people collect information fast enough that it still fills drive space faster than we can add storage. One way to save space is compression. When most people think about compression today, they think of either a Zip file or that little check-box in the Windows settings that enables compression on a drive. What is often overlooked is the ability to compress a single file with a compression tool built for that specific format.

The right way to compress files and save space depends on several things: how often you will access the file again, what compression ratio you are getting, and what the long-term impacts of the compression are. When you use Zip, Tar, or RAR, you are usually combining multiple files together and don’t plan to access them soon. These tools take multiple files and combine them into a single archive (Tar on its own only bundles the files, and is normally paired with a compressor such as gzip). This means that getting to any one file in that archive takes additional time and effort. With this approach you can combine many different formats. Some formats will shrink by close to 0% and others by 60%. Rarely, but occasionally, creating a zip fails and leaves you with a corrupted file, so I always suggest checking that you can un-zip an archive after it’s created. People who need to access their files regularly, or need to search their content at any time, can still benefit from compression tools that are specific to a format and that work one file at a time, in batch.
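To make the "check your zip after you create it" advice concrete, here is a minimal sketch using Python's standard zipfile module. The function name zip_and_verify and the sample file names are made up for illustration; the idea is simply to write the archive, ask the library to re-read every member and check its checksum, and then report how much space each file actually saved.

```python
import os
import tempfile
import zipfile


def zip_and_verify(paths, archive_path):
    """Zip the given files, then confirm the archive can be read back.

    Returns a dict mapping each stored name to the fraction of space saved.
    (zip_and_verify is a hypothetical helper name, not part of any library.)
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))

    with zipfile.ZipFile(archive_path) as zf:
        # testzip() reads every member and checks its CRC; it returns the
        # name of the first bad member, or None if everything is readable.
        bad = zf.testzip()
        if bad is not None:
            raise RuntimeError(f"corrupt member in archive: {bad}")

        savings = {}
        for info in zf.infolist():
            if info.file_size == 0:
                savings[info.filename] = 0.0
            else:
                savings[info.filename] = 1 - info.compress_size / info.file_size
        return savings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two illustrative files: repetitive text and random bytes.
        text_path = os.path.join(tmp, "notes.txt")
        random_path = os.path.join(tmp, "random.bin")
        with open(text_path, "w") as f:
            f.write("the quick brown fox\n" * 5000)
        with open(random_path, "wb") as f:
            f.write(os.urandom(100_000))

        result = zip_and_verify([text_path, random_path], os.path.join(tmp, "demo.zip"))
        for name, saving in result.items():
            print(f"{name}: {saving:.0%} saved")
```

Run as-is, the repetitive text file should shrink dramatically while the random file saves essentially nothing, which is the same spread you see between plain documents and files that are already compressed.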

The most common file format that people use for search and retrieval, and that Data Capture and OCR generate, is PDF. PDFs usually get good compression in a Zip, Tar, or Rar tool, but there are specific things that can be done just for a PDF to compress it even further. PDFs often have a searchable text layer and an image layer for viewing. The bulk of the file size is usually the image layer, so a specific image compression can be applied to just this layer, and a separate text compression to the text layer. The result is a PDF that opens just like any other, but takes up much less space. The benefit is that you can access your PDF at any time, it’s still indexed by your search utility, and you are saving space!

Compression is almost always a good choice when you want to save space. Compression technologies have come a long way in the last 4 years. It’s good to know what your purpose is in compressing and how often you want access to your files. Don’t be afraid to scout out compression tools for specific file formats and give them a try.

