May 21st, 2008 by Chris
Question: Does bitonal compression of scanned documents to JBIG2 format make sense on its own, or should such conversion be done as part of a general conversion to PDF format?
Answer: JBIG2 is a new ITU-approved, international standard for compression of scanned black and white files. The effectiveness of JBIG2 compression versus the previous ITU TIFF G4 standard is very much dependent on the JBIG2 compression software used. The quality of the scanned document is also a function of the JBIG2 software used, since the decompression specs for JBIG2 are open but the individual JBIG2 compression algorithms are proprietary.
Using the right JBIG2 compression software can result in JBIG2 files that are 5x-10x smaller than TIFF G4 and G4 PDF, with no loss of image quality.
Although JBIG2 is an ITU-approved format, it is still very new to the industry. The assumption that a typical client or system user has a pre-installed JBIG2 viewer is probably false. The advantages of using JBIG2-compressed PDF as the document format are severalfold. First, PDF fully supports JBIG2, so the compression advantages of JBIG2 can be fully utilized within the PDF specs. Second, PDF Reader 5.0 and up can handle JBIG2-compressed files, so your user base most likely has a JBIG2-capable PDF reader pre-installed on their computers. Third, adding OCR searchability to your JBIG2-compressed file is very easy within the PDF specs using a hidden text layer. And finally, for multipage files that need to be web-hosted and viewed remotely, JBIG2 files that are made to fit the PDF specs (i.e., given a PDF wrapper) can take full advantage of the web-optimization feature supported by PDF and Adobe Reader, which means that large multipage files will open and display quickly on the Web.
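If you want a quick check on whether an existing PDF already uses JBIG2 for its scanned images, one rough approach is to look for the filter names the PDF specs use for image streams: `/JBIG2Decode` for JBIG2 and `/CCITTFaxDecode` for G3/G4 fax compression. The sketch below is only a heuristic, not a PDF parser. The function name `count_bitonal_filters` is ours, and a plain byte scan will miss filter names stored inside compressed object streams, so treat a zero count as "not found" rather than "not present".

```python
def count_bitonal_filters(pdf_bytes: bytes) -> dict:
    """Count raw occurrences of the bitonal image filter names in a PDF.

    Heuristic only: names inside compressed object streams are not visible
    to a plain byte scan, so the counts are a lower bound.
    """
    return {
        "JBIG2Decode": pdf_bytes.count(b"/JBIG2Decode"),
        "CCITTFaxDecode": pdf_bytes.count(b"/CCITTFaxDecode"),
    }


# A made-up fragment of a PDF image dictionary, used here only as sample input.
sample = b"<< /Type /XObject /Subtype /Image /Filter /JBIG2Decode /Width 2550 >>"
counts = count_bitonal_filters(sample)
print(counts)
```

To run it on a real file, read the file in binary mode, e.g. `count_bitonal_filters(open(path, "rb").read())`, where `path` is whatever PDF you want to inspect.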
So, in short, there are serious advantages in converting scanned documents into JBIG2 format. But having decided to convert a database to JBIG2, there are additional features available and more file control when the files are converted to JBIG2-compressed PDF format.
Category: All, JBIG2 Compression, JBIG2 and PDF |
August 6th, 2007 by Chris
Question: What are the advantages of JBIG2-encoded PDF documents?
Answer: JBIG2 represents a revolutionary breakthrough in captured document technology. Using JBIG2 encoding, a scanned image can be compressed up to 10x smaller than with TIFF G4. This makes it possible, for the very first time, to create scanned image documents whose file size is the same as OCR-converted text files. It allows scanned manuals, books, check images and other document types to be viewed and manipulated efficiently over the Internet, and gives digital copiers/printers efficient network transmission of digitally copied documents.
The power behind JBIG2 technology is its ability to support both lossless and perceptually lossless black and white image compression. Only CVISION Technologies’ CVista smart compression software has the ability to encode perceptually lossless images with absolutely no perceptual degradation. CVISION’s JBIG2 encoding supports compact multipage TIFF and PDF file conversion.
The advantage of JBIG2 encoding of your company’s documents is that this ITU-approved standard, unlike TIFF G4 and TIFF-based PDF, is font-based and allows for efficient encoding of a fully-searchable text layer. So not only are JBIG2 documents much more compact than TIFF and TIFF-based PDF, they’re also much more functional - able to support image-based full-text search with virtually no file size increase. The JBIG2 standard is the result of several years of collaboration among image compression experts from IBM, Xerox, AT&T, Siemens, and others.
To download a free trial version of the software, go to “PDF JBIG2”.
Category: All, JBIG2 and PDF |
May 27th, 2007 by Chris
Question: How can I shrink my TIFF files? Is it possible to convert the compressed TIFF files into a Word document?
Answer: You can reduce the size of your TIFFs; however, they will be converted to compressed PDFs. PdfCompressor reduces scanned color PDFs to as little as 1/100th of their original size, and scanned black & white PDFs to as little as 1/10th. The reduction is achieved while maintaining file quality. Once the compressed PDFs are created, we cannot convert the files into a Word document. However, we can convert them into Rich Text Format (RTF), which gives you the same capabilities as a Word document, such as editing, deleting, and adding text.
For a free 30-day trial download, click below.
http://www.cvisiontech.com/download_main.html
Category: All, File Compression, JBIG2 and PDF, OCR, PDF OCR, PDF Optimize, Shrink PDF, Tiff, Tiff Compression |
February 26th, 2007 by Chris
Question: Are JBIG2 file compression and document OCR completely separate problems? Should these processes be done together or separately? Does JBIG2 conversion prior to OCR lower the recognition rates?
Answer: JBIG2 and OCR are related problems. A basic element of JBIG2 compression is bottom-up font learning. This font learning is used for compression but can easily be used to cross-check font mappings returned by the OCR engine. So an effective JBIG2 compression algorithm can be used to improve OCR recognition rates; see http://www.cvisiontech.com/pdf_compressor_31.html.
For example, global models in JBIG2, which are an effective compression tool, can also be used to propagate correct OCR mappings throughout the document.
In general, these processes should not be tightly linked: each process, JBIG2 and OCR, needs to be able to run without the other. One important reason for keeping them separable is speed: JBIG2 tends to run at 3-5 pages/sec, while OCR can take 5 secs a page. Another reason is that to achieve good OCR rates the right language dictionary needs to be used. JBIG2 compression is language independent and should not rely on any language dependencies.
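To put those speed figures in perspective, here is a small back-of-the-envelope calculation. It simply plugs in the numbers quoted above (3-5 pages/sec for JBIG2, 5 secs a page for OCR); the function name `batch_seconds` and the 9,000-page batch are made up for illustration, and real throughput will depend on your hardware and documents.

```python
def batch_seconds(pages, jbig2_pages_per_sec=(3.0, 5.0), ocr_secs_per_page=5.0):
    """Estimate processing time in seconds for a batch of scanned pages.

    The default rates are the rough figures quoted in the post,
    not measurements.
    """
    slow, fast = jbig2_pages_per_sec
    return {
        "jbig2_min": pages / fast,       # best case: 5 pages/sec
        "jbig2_max": pages / slow,       # worst case: 3 pages/sec
        "ocr": pages * ocr_secs_per_page,
    }


estimate = batch_seconds(9000)
print(estimate)
# OCR is slower by a factor of 15 to 25 at these rates.
print(estimate["ocr"] / estimate["jbig2_max"], estimate["ocr"] / estimate["jbig2_min"])
```

For 9,000 pages this works out to 30-50 minutes for JBIG2 compression versus 12.5 hours for OCR, which is why you would not want compression to wait on OCR.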
So the JBIG2 and OCR problems are certainly very much interrelated. Having said that, there are many reasons (including speed) to solve them separately and then combine the results. Certainly, there should be an integration phase where a higher-level module is aware of both the JBIG2 and OCR results and is able to combine them to achieve improved OCR (and maybe also JBIG2) results.
There are problems inherent in propagating OCR results across a document. If there are any OCR errors these can also propagate across the document with negative consequences. Obviously, it is important in any such fusion of OCR and JBIG2 results to make sure this kind of error propagation does not occur.
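One simple way to guard against this kind of error propagation is to propagate an OCR label across a glyph class only when the OCR engine agrees with itself often enough on that class. The sketch below illustrates the idea and is not how any particular product does it. It assumes the JBIG2 font learning has already assigned every glyph on the page a class id, and the names `propagate_labels`, `min_agreement` and `min_votes` are hypothetical.

```python
from collections import Counter


def propagate_labels(class_ids, ocr_labels, min_agreement=0.8, min_votes=3):
    """Replace per-glyph OCR labels with the consensus label of their glyph class.

    class_ids[i]  -- glyph class assigned by the (assumed) JBIG2 font learning
    ocr_labels[i] -- character returned by the OCR engine for the same glyph
    A class gets a consensus label only if it has at least min_votes glyphs
    and the most common label covers at least min_agreement of them.
    Otherwise the original OCR labels are left untouched.
    """
    votes = {}
    for cid, label in zip(class_ids, ocr_labels):
        votes.setdefault(cid, Counter())[label] += 1

    consensus = {}
    for cid, counter in votes.items():
        label, count = counter.most_common(1)[0]
        total = sum(counter.values())
        if total >= min_votes and count / total >= min_agreement:
            consensus[cid] = label

    return [consensus.get(cid, label) for cid, label in zip(class_ids, ocr_labels)]


# Class 0 has six glyphs, five read as "e" and one misread as "c".
# Class 1 has only two glyphs that disagree, so nothing is propagated.
class_ids = [0, 0, 0, 0, 0, 0, 1, 1]
ocr_labels = ["e", "e", "c", "e", "e", "e", "l", "1"]
fixed = propagate_labels(class_ids, ocr_labels)
print(fixed)
```

The two thresholds are the safety valve: a class with too few glyphs, or with split votes, keeps whatever the OCR engine said, so a single bad mapping cannot spread across the document.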
Reliable JBIG2 compression, done with precision, should not result in any degradation of the document. As such, JBIG2 conversion prior to OCR should not lower recognition rates. There are, however, JBIG2 compression implementations that are clearly lossy and degrading in nature. If one of these degrading JBIG2 methods is run prior to OCR, a drop in recognition rates can usually be expected.
Category: All, Document Compression, JBIG2 Compression, JBIG2 and PDF, OCR, OCR Download, OCR Software, Optical Character Recognition |