CVista
PdfCompressor 3.1 Primer: Frequently-Asked Questions (FAQ) about PdfCompressor Usage
1. What documents can be compressed using PdfCompressor 3.1? PdfCompressor is most effective on scanned input documents in TIFF, JPEG, or PDF format. Compression ratios depend on the image resolution and on the compression methods, if any, already applied to the input data. PdfCompressor can also be used effectively on PDF files that contain embedded image streams, such as corporate earnings reports, company brochures, and product user manuals.
The default settings for PdfCompressor 3.1, in both Quick Run and Full Options modes, use the JBIG2 compression filter. JBIG2 is a recently-approved ITU black and white (aka bitonal) compression standard that allows for much more efficient coding of black and white scanned documents. More information on JBIG2 is available at http://www.jbig2.org and http://www.jbig.org. JBIG2-compressed files can be up to 10x smaller than the same files in TIFF (G4). We recommend the product default setting of JBIG2 in perceptually lossless mode. The perceptually lossless mode, which differentiates the PdfCompressor product from variations that are either lossless or lossy, obtains very compact JBIG2 image streams with no loss in image quality and no OCR degradation.
The setting usually required for effective color compression, up to 100x smaller than JPEG, is auto-segmentation, which in v3.1 is located on the Compression Options panel. This setting should already be ON by default, but this should be verified at runtime. The auto-segmentation feature separates each scanned color file into text, picture, graphics, halftone, and other document regions. Each of these image regions is then encoded using the appropriate compression filter. The auto-segmentation filter is set to quality 5 by default, but this image quality setting can be adjusted up or down to fit your documents. To adjust the PdfCompressor v3.1 default settings, make sure to set the Job Type (upper left box of GUI panel) to Batch Compression mode. If there is any image quality loss with auto-segmentation ON and Image Quality 5, then an Image Quality setting of 8 or higher should be used. If auto-segmentation at an Image Quality setting of 8 or above still results in some image degradation, then the auto-segmentation mode should be set to OFF.
If the generated PDFs contain embedded bitonal image streams, then the default system settings should be satisfactory. If the generated PDF documents contain embedded color images, then some adjustments may be required, as these documents can be difficult for PdfCompressor to segment. The user can first try auto-segmentation in the medium quality range (e.g., Image Quality 5) and then in the higher quality range (e.g., Image Quality 8). If auto-segmentation mode still produces some visible image degradation, then PdfCompressor should be run with the auto-segmentation feature set to OFF.
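The escalation order described above (Image Quality 5, then Image Quality 8, then auto-segmentation OFF) can be sketched as a small decision helper. This is only an illustration of the procedure, not part of PdfCompressor: the dictionary keys, the `choose_settings` function, and the `looks_degraded` callback (standing in for a person visually inspecting the output) are all hypothetical names.

```python
from typing import Callable, Dict, List, Optional

Settings = Dict[str, Optional[object]]

# Hypothetical representation of the three configurations named in the text,
# in the order the text suggests trying them.
SETTINGS_LADDER: List[Settings] = [
    {"auto_segmentation": True, "image_quality": 5},      # default
    {"auto_segmentation": True, "image_quality": 8},      # higher quality
    {"auto_segmentation": False, "image_quality": None},  # last resort
]


def choose_settings(looks_degraded: Callable[[Settings], bool]) -> Settings:
    """Return the first configuration whose output is judged acceptable.

    `looks_degraded` is a stand-in for a manual visual check of the
    compressed output produced with the given settings.
    """
    # Try each auto-segmentation configuration in turn.
    for settings in SETTINGS_LADDER[:-1]:
        if not looks_degraded(settings):
            return settings
    # Every auto-segmentation attempt showed degradation: turn it OFF.
    return SETTINGS_LADDER[-1]
```

For example, if the output at Image Quality 5 shows degradation but the output at Image Quality 8 does not, the helper returns the Image Quality 8 configuration.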
For the most effective compression results on scanned bitonal documents that include picture regions, the Use halftone algorithm option should be selected. In v3.1 this option is located on the Compression Options panel. This option can also be useful if the image background is not white but some shade of grey, or if the document contains some shaded regions (e.g., a tax form).
Even though PdfCompressor is most useful on files that contain image streams, it can sometimes be very useful in compressing files that do not contain image streams. In addition, PDF files are not always as they appear. For example, it may appear that a generated PDF document such as a company invoice does not contain any images when it actually does contain embedded image streams, e.g., a company logo. When in doubt, try running the file through PdfCompressor (in default mode) to see if there is any file size reduction. If the file appears to be generated but the file size seems very high, in excess of 100 KB per page, we recommend first trying PdfCompressor using the default settings (e.g., stream compression mode). If this does not reduce the file size, try compressing the document in raster mode, which is shown on the "Image Processing >> PDF Input Processing" page as the Rasterize each page before compressing option. Leave the other rasterization options at their default settings.
Sometimes scanned files are saved as color or greyscale files even though the actual document being scanned is black and white. We have seen this occur quite frequently with various capture devices, including digital copiers and digital senders. While PdfCompressor in auto-segmentation mode can sometimes correct for this misleading image type, at least partially, the mismatch can often result in a significantly larger file size. If the input document appears to be bitonal but the file size in TIFF format seems excessively large, e.g., exceeds 100 KB per page, there is a reasonable chance that these files are mapped as color or greyscale. To correct for this, go to the Image Processing page. In the Greyscale Images column, select the Remap images to bitonal option. Similarly, in the Color Images column, select the Remap images to bitonal option.
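The 100 KB per page rule of thumb is simple arithmetic on the file size and page count. The sketch below is an illustration only: the function names are hypothetical, 1 KB is assumed to mean 1024 bytes, and the page count is assumed to be known from a viewer or other tool.

```python
def kb_per_page(file_size_bytes: int, page_count: int) -> float:
    """Average size per page in KB (assuming 1 KB = 1024 bytes)."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return file_size_bytes / 1024 / page_count


def looks_oversized(file_size_bytes: int, page_count: int,
                    threshold_kb: float = 100.0) -> bool:
    """True if the average page size exceeds the rule-of-thumb threshold."""
    return kb_per_page(file_size_bytes, page_count) > threshold_kb
```

A 10-page file of 3,000,000 bytes averages roughly 293 KB per page, so it would be flagged as a candidate for remapping to bitonal or for raster mode.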
The despeckling option, on the Image Processing page, can sometimes reduce the size of scanned black and white documents while improving image quality. This option, however, can occasionally cause image artifacts so it should be used with caution. The deskew option, also available on the Image Processing page as the option to Automatically rotate scanned pages to proper viewing angle, should improve the overall image quality of scanned bitonal files and slightly reduce file size (assuming the files have not already been skew-corrected).
OCR is the process of converting an image into corresponding text. It is a necessary step if the input files (i) are scanned, (ii) have not yet been OCRed, and (iii) are intended for an application that requires text searchability. Running PdfCompressor with OCR set to ON increases file size very little (about 5%).
Web optimization in PdfCompressor 3.1 is ON by default. It can be turned ON or OFF from the Document Features page. This feature should normally be set ON, since it allows PDF files to be viewed more efficiently over the web. It is a relatively fast operation (unlike OCR), so it does not slow down PdfCompressor's rate of processing.
Viewing documents on the Web is different from viewing them locally, e.g., on a LAN. Files viewed over the Web must generally be downloaded in order to display. Without web optimization, which typical image formats (e.g., TIFF) do not support, a file would have to download completely just to view the first page. This can make viewing a large multipage scanned file very inefficient. PDFs that are web optimized can be viewed on a per-page basis, without waiting for the rest of the file to transfer. So a 300-page PDF file could have its first page displayed as soon as that page has been downloaded. Web optimized PDFs also support opening to a random page in a multipage file. This is useful for efficient display of search query results.
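One way to spot-check whether an output file has been web optimized is to look for the linearization marker near the start of the file. This sketch assumes that PdfCompressor's web optimization corresponds to standard PDF linearization ("Fast Web View"), which this FAQ does not state explicitly. In a linearized PDF, the linearization parameter dictionary (containing the `/Linearized` key) sits within the first 1024 bytes. The function name is hypothetical, and the check is a heuristic only: it does not validate the hint tables or the rest of the file.

```python
def looks_linearized(pdf_bytes: bytes) -> bool:
    """Heuristic: PDF header present and /Linearized in the first 1024 bytes."""
    head = pdf_bytes[:1024]
    return head.startswith(b"%PDF-") and b"/Linearized" in head


def file_looks_linearized(path: str) -> bool:
    """Apply the same heuristic to a file on disk, reading only its start."""
    with open(path, "rb") as f:
        return looks_linearized(f.read(1024))
```

Only the first 1024 bytes are read, so the check stays cheap even for large multipage files.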
PdfCompressor Professional 3.1 supports watched folder mode. This allows any number of folders to be watched, and includes a scheduler to determine exact schedules for monitoring each watched folder (e.g., evening, weekends, etc.).
Although it is our
intent to make PdfCompressor easy to use, sometimes understanding the
input data and selecting the appropriate runtime settings is not simple.
If you are not able to obtain the results you expected using PdfCompressor,
send some sample input documents along with a brief explanation of the
problem to support@cvisiontech.com.
Please do not send email file attachments in excess of 10 MB per email.
If your files are larger than 10 MB, we can set you up with a password-protected
FTP account to upload your documents. Someone from CVISION support should
generally get back to you in 24-48 hours.