CVista PdfCompressor™ Primer: Frequently-Asked Questions (FAQ) about PdfCompressor Usage
CVista PdfCompressor™ is designed to minimize the size of your PDF files. PdfCompressor facilitates fast Internet transmission, efficient web-hosting and reduction in storage space requirements. It also enhances the use of scanned documents as email attachments, digital copying of documents to the Web, and remote scanning. As document content can vary greatly, often the product default settings may not be optimized for your documents or application. To make PdfCompressor as effective and easy-to-use as possible, we are providing some basic easy-to-follow recommendations.
1. What documents can be compressed using PdfCompressor?
PdfCompressor is most effective on scanned input documents that are in TIFF, JPEG, or PDF format. Compression ratios will depend on the image resolution and compression methods, if any, that are already being used on the input data. PdfCompressor can also be used effectively on PDF files that contain embedded image streams such as corporate earnings reports, company brochures, product user manuals, etc.
2. What product settings are recommended for compression of black and white TIFF files?
The default settings for PdfCompressor, in both Quick Run and Full Options modes, use the JBIG2 compression filter. JBIG2 is a recently-approved ITU black and white (aka bitonal) compression standard that allows for much more efficient coding of black and white scanned documents. More information on JBIG2 is available at http://www.jbig2.org and http://www.jbig.org. JBIG2-compressed files can be up to 10x smaller than TIFF (G4) files. We recommend the product default settings of JBIG2 in perceptually lossless mode. The perceptually lossless mode, which differentiates the PdfCompressor product from variations that are either lossless or lossy, obtains very compact JBIG2 image streams with no loss in image quality and no OCR degradation.
3. What product settings are recommended for compression of color files?
The setting usually required for effective color compression, up to 100x smaller than JPEG, is auto-segmentation, which is located on the Compression Options panel. This setting should already be ON by default, but this should be verified at runtime. The auto-segmentation feature separates each scanned color file into text, picture, graphics, halftone, and other document regions. Each of these image regions is then encoded using the appropriate compression filter. The auto-segmentation filter is set to Image Quality 5 by default, but this setting can be adjusted up or down to fit your documents. To adjust the PdfCompressor default settings, make sure to set the Job Type (upper left box of the GUI panel) to Batch Compression mode. If there is any image quality loss with auto-segmentation ON and Image Quality 5, then an Image Quality setting of 8 or higher should be used. If auto-segmentation with an Image Quality setting of 8 or higher still results in some image degradation, then auto-segmentation should be set to OFF.
4. What product settings are recommended for compression of generated documents, such as company brochures that include embedded images?
If the generated PDFs contain embedded bitonal image streams, then the default system settings should be satisfactory. If the generated PDF documents contain embedded color images, then some adjustments may be required, as these documents can be difficult for PdfCompressor to segment. The user can first try auto-segmentation in the medium quality range (e.g., Image Quality 5) and then in the higher quality range (e.g., Image Quality 8). If auto-segmentation still produces some visible image degradation, then PdfCompressor should be run with the auto-segmentation feature set to OFF.
5. What PdfCompressor settings are recommended for scanned bitonal files that may include pictures or halftone regions, e.g., newspapers, photos, etc., captured using an image scanner or a digital copier?
For the most effective compression results on scanned bitonal documents that include picture regions, the Use halftone algorithm option should be selected. This option is located in the Compression Options panel. This option can also be useful if the image background is not white but some shade of grey, or if the document contains some shaded regions (e.g., a tax form).
6. Is PdfCompressor useful on files that are completely generated, i.e., contain no image streams?
Even though PdfCompressor is most useful on files that contain image streams, it can sometimes be very useful in compressing files that do not contain image streams. In addition, PDF files are not always as they appear. For example, a generated PDF document such as a company invoice may appear to contain no images when it actually does contain embedded image streams, e.g., a company logo. When in doubt, try running the file through PdfCompressor (in default mode) to see if there is any file size reduction. If the file appears to be generated but the file size seems very high, in excess of 100 KB per page, we recommend first trying PdfCompressor using the default settings (e.g., stream compression mode). If this does not reduce the file size, try compressing the document in raster mode, which is shown on the "Image Processing >> PDF Input Processing" page as the Rasterize each page before compressing option. Leave the other rasterization options at their default settings.
7. What should I do if scanned documents in TIFF format, which appear to be black and white, are still very large after being processed with PdfCompressor?
Sometimes scanned files are saved as color or greyscale files even though the actual document being scanned is black and white. We have seen this occur quite frequently with various capture devices, including digital copiers and digital senders. While PdfCompressor in auto-segmentation mode can sometimes correct for this misleading image type, at least partially, the mislabeled image type often still results in a significantly larger file size. If the input document appears to be bitonal but the file size in TIFF format seems excessively large, e.g., exceeds 100 KB per page, there is a reasonable chance that these files are mapped as color or greyscale. To correct for this, go to the Image Processing page. In the Greyscale Images column, select the Remap images to bitonal option. Similarly, in the Color Images column, select the Remap images to bitonal option.
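The 100 KB per page rule of thumb mentioned above is simple arithmetic, and it can be checked before deciding whether to enable the Remap images to bitonal options. The following is a minimal Python sketch, not part of PdfCompressor; the function names are hypothetical, and it assumes you already know the file size in bytes and the page count.

```python
def kb_per_page(file_size_bytes: int, page_count: int) -> float:
    """Average size per page in kilobytes (1 KB taken as 1024 bytes)."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return file_size_bytes / 1024 / page_count


def may_be_mislabeled_bitonal(file_size_bytes: int, page_count: int,
                              threshold_kb: float = 100.0) -> bool:
    """True when the average page exceeds the FAQ's rule-of-thumb threshold.

    This only flags a file for closer inspection; it does not inspect the
    actual image type stored in the file.
    """
    return kb_per_page(file_size_bytes, page_count) > threshold_kb


if __name__ == "__main__":
    # A 12-page scan of 3,000,000 bytes averages roughly 244 KB per page,
    # which is well above the 100 KB guideline.
    print(round(kb_per_page(3_000_000, 12), 1))
    print(may_be_mislabeled_bitonal(3_000_000, 12))
```

A file that is flagged this way is only a candidate; whether it is really stored as color or greyscale still has to be confirmed by looking at the document itself.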
8. What other image processing options in PdfCompressor can reduce file size while improving image quality?
The despeckling option, on the Image Processing page, can sometimes reduce the size of scanned black and white documents while improving image quality. This option, however, can occasionally cause image artifacts, so it should be used with caution. The deskew option, also available on the Image Processing page as the option to Automatically rotate scanned pages to proper viewing angle, should improve the overall image quality of scanned bitonal files and slightly reduce file size (assuming the files have not already been skew-corrected).
9. When should optical character recognition (aka OCR) be turned ON and how does it affect overall file size?
OCR is the process of converting an image into corresponding text. It is a necessary step if (i) the input files are scanned, (ii) they have not yet been OCRed, and (iii) the intended application requires text searchability. Running PdfCompressor with OCR set to ON increases file size only slightly (about 5%).
10. Is the web optimization feature always set ON?
Web optimization in PdfCompressor is ON by default. It can be turned ON or OFF from the Document Features page. This feature should normally be set ON since it allows PDF files to be viewed more efficiently over the web. It is a relatively fast operation (unlike OCR), so it does not slow down PdfCompressor's rate of processing.
11. Why is web optimization an important document feature?
Viewing documents on the Web is different from viewing them locally, e.g., on a LAN. Files viewed over the Web must generally be downloaded in order to display. Without web optimization, which typical image formats (e.g., TIFF) do not support, a file would have to download completely just to view the first page. This can make viewing a large multipage scanned file very inefficient. PDFs that are web optimized can be viewed on a per-page basis, without waiting for the rest of the file to transfer. So a 300-page PDF file could have its first page displayed as soon as that page has been downloaded. Web optimized PDFs also support opening to a random page in a multipage file. This is useful for efficient display of search query results.
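One way to confirm that an output file was web optimized is to look for the linearization marker near the start of the file. The sketch below is a heuristic written in Python, not a PdfCompressor feature. It assumes that "web optimization" corresponds to what the PDF specification calls linearization, in which a dictionary containing the /Linearized key appears within the first 1024 bytes of the file. The function names are hypothetical.

```python
def looks_web_optimized(pdf_head: bytes) -> bool:
    """Heuristic: a linearized PDF carries a /Linearized key near its start.

    Only the first 1024 bytes are examined. This confirms the marker is
    present; it does not validate the rest of the linearization data.
    """
    head = pdf_head[:1024]
    return b"%PDF-" in head and b"/Linearized" in head


def file_looks_web_optimized(path: str) -> bool:
    """Read just the start of the file and apply the heuristic above."""
    with open(path, "rb") as handle:
        return looks_web_optimized(handle.read(1024))


if __name__ == "__main__":
    # Synthetic headers for illustration only, not complete PDF files.
    optimized = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
    plain = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    print(looks_web_optimized(optimized))
    print(looks_web_optimized(plain))
```

Because only the first kilobyte is read, the check stays fast even on large multipage scans.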
12. How can I run PdfCompressor in the background without setting up a batch run each time files need to be compressed?
PdfCompressor Professional supports watched folder mode. This allows any number of folders to be watched, and includes a scheduler for setting the exact times at which each watched folder is monitored (e.g., evenings, weekends, etc.).
PDF/A Frequently Asked Questions
1. Why archive with PDF/A?
PDF/A is the most responsible method to ensure your electronic documents can be viewed tomorrow, or decades later. PDF/A combines the reliability and security of microfiche with the convenience of electronic documents. Without conversion to PDF/A, companies risk losing valuable information within their electronic documents or, worse, ending up with documents that will not open at all.
2. Why not archive with microfiche?
Microfiche has been a reliable option for archiving for quite some time. However, microfiche does not provide the same convenience, accessibility, and ease of use as PDF/A. PDF/A documents can be opened instantly on a computer and emailed all around the world. Recent advances, including PDF/A, make archiving with microfiche an outdated method of document preservation.
3. Is there a special viewer for PDF/A?
Like other PDF files, PDF/A files can be viewed with Adobe’s free reader. A PDF/A file is actually a valid PDF file, with some built-in limitations.
4. Who should be archiving with PDF/A?
Any responsible records manager, or any company concerned with the integrity of its documents, should rely on PDF/A for archival needs. Microfiche and paper are reliable, time-tested methods for archiving documents, but both lack the convenience and accessibility of PDF/A. PDF/A provides complete security that your documents will open and display correctly in the future.
5. What is the difference between PDF and PDF/A?
The PDF/A format is actually a restricted version of PDF. In particular, the PDF/A Part 1 specification as currently adopted is a restricted version of the PDF 1.4 specification. These specifications will guarantee that your documents will view and print correctly now, and forever.
6. Why can I rely on PDF/A for long-term archiving?
PDF/A is recognized by leading records management organizations for electronic document archiving. Both ARMA (Association of Records Managers and Administrators) and NARA (National Archives and Records Administration) have recommended archiving documents using the PDF/A format; see http://www.archives.gov/records-mgmt/pdf/strategic-directions-status-sept2004.pdf.
7. What is the difference between PDF/A-1a and PDF/A-1b?
PDF/A-1a ensures the preservation of a document’s logical structure and content text stream in natural reading order. The text extraction is especially important when the document must be displayed on a mobile device (for example a PDA) or other devices in accordance with Section 508 of the US Rehabilitation Act. In such cases the text must be reorganized on the limited screen size (re-flow). This feature is also known as “Tagged PDFs”.
PDF/A-1b ensures that the text (and additional content) can be correctly displayed (e.g. on a computer monitor), but does not guarantee that extracted text will be legible or comprehensible. It therefore does not guarantee compliance with Section 508.
The difference between PDF/A-1a and -1b has no impact for scanned documents, provided the files have not been enhanced by means of OCR for searching.
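To see which conformance level a given file claims, one can look at its XMP metadata, where PDF/A files declare a part number and conformance letter using the pdfaid:part and pdfaid:conformance properties. The Python sketch below is a heuristic with hypothetical function names. It assumes the XMP packet is stored uncompressed in the file, and it only reads the claim; it does not validate that the file actually conforms.

```python
import re

# XMP properties may be written as attributes (pdfaid:part="1") or as
# elements (<pdfaid:part>1</pdfaid:part>); both forms are accepted here.
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(pdf_bytes: bytes):
    """Return a label such as 'PDF/A-1b' if the file declares one, else None."""
    part = _PART.search(pdf_bytes)
    conf = _CONF.search(pdf_bytes)
    if part is None or conf is None:
        return None
    level = conf.group(1).decode("ascii").lower()
    return "PDF/A-" + part.group(1).decode("ascii") + level


if __name__ == "__main__":
    # Synthetic XMP fragment for illustration only, not a complete PDF.
    sample = (b"<rdf:Description>"
              b"<pdfaid:part>1</pdfaid:part>"
              b"<pdfaid:conformance>B</pdfaid:conformance>"
              b"</rdf:Description>")
    print(claimed_pdfa_level(sample))
```

A dedicated PDF/A validator is still needed to confirm conformance, since a file can carry the declaration without meeting the requirements.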