OCR & JBIG2

In All, Archived, JBIG2 Compression, OCR, by Chris

There is a close relationship between OCR and JBIG2, the ITU bitonal image compression standard, and font learning is at the heart of it. The earlier CCITT Group 4 (G4) compression used in TIFF images has no notion of fonts or font learning. In JBIG2, by contrast, font learning is central: each learned character shape is stored once in a symbol dictionary and referenced from every place it occurs. This is one of the main reasons JBIG2 can reach compression ratios as high as 10:1 relative to TIFF G4.
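To make font learning concrete, here is a minimal Python sketch of grouping connected components into a symbol dictionary. It assumes each component has already been extracted as a small bitmap (rows of '0'/'1' characters) and uses a simple pixel-difference (Hamming distance) matcher with a hypothetical `max_diff` threshold. This is an illustration only, not the matcher of any particular JBIG2 encoder; `hamming` and `learn_symbols` are hypothetical names.

```python
from __future__ import annotations

# Hypothetical representation: a bitmap is a tuple of equal-length '0'/'1' rows.
Bitmap = tuple[str, ...]


def hamming(a: Bitmap, b: Bitmap) -> int | None:
    """Count differing pixels; return None if the bitmaps differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def learn_symbols(components: list[Bitmap], max_diff: int = 1) -> tuple[list[Bitmap], list[int]]:
    """Build a symbol dictionary from connected components.

    Each component is mapped to the first dictionary entry within `max_diff`
    differing pixels (an assumed, simplistic matcher); otherwise it becomes a
    new entry. Returns (dictionary, index of the symbol used for each component).
    """
    dictionary: list[Bitmap] = []
    indices: list[int] = []
    for comp in components:
        for i, sym in enumerate(dictionary):
            d = hamming(comp, sym)
            if d is not None and d <= max_diff:
                indices.append(i)  # reuse an already learned shape
                break
        else:
            dictionary.append(comp)  # learn a new shape
            indices.append(len(dictionary) - 1)
    return dictionary, indices
```

Each distinct shape is stored once, and every occurrence shrinks to an index into the dictionary (plus, in a real encoder, its position on the page). Substituting a near match, such as a one-pixel variant, for an existing symbol is what makes this kind of coding lossy.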

Of course, font learning matters for OCR performance as well. When a font character is “learned”, it imposes constraints on all the connected components that map to it. JBIG2 involves three kinds of models: font models, global models, and composite models. Each is useful not only for compression but also for OCR. Assuming a perfect font matcher, font models impose constraints between nodes (connected components) on the same page, but not between nodes on different pages. Global models impose inter-page constraints on nodes linked to the same global font model. Composite models impose n-gram constraints between groups of n consecutive nodes.
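The difference between page-level and global models can be pictured as a grouping problem: nodes constrained to share a label end up in the same group. The Python sketch below assumes each node carries a hypothetical page number, a page-level model id, and an optional global model id, and merges groups with a union-find structure. Composite (n-gram) constraints are not modeled here; all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A connected component (hypothetical fields, for illustration only)."""
    page: int
    page_model: int            # model id, meaningful only within its own page
    global_model: int | None   # model id shared across pages, if any


class UnionFind:
    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(a)] = self.find(b)


def constraint_groups(nodes: list[Node], use_global: bool = True) -> list[set[int]]:
    """Group node indices that must share a label.

    Page-level models link nodes with the same model id on the same page;
    global models (if enabled) link nodes with the same global id on any page.
    """
    uf = UnionFind(len(nodes))
    first_seen: dict[tuple, int] = {}
    for i, node in enumerate(nodes):
        keys = [("page", node.page, node.page_model)]
        if use_global and node.global_model is not None:
            keys.append(("global", node.global_model))
        for key in keys:
            if key in first_seen:
                uf.union(i, first_seen[key])
            else:
                first_seen[key] = i
    groups: dict[int, set[int]] = {}
    for i in range(len(nodes)):
        groups.setdefault(uf.find(i), set()).add(i)
    return sorted(groups.values(), key=min)
```

With `use_global=False`, no group ever crosses a page boundary; enabling global models merges same-shaped characters from different pages into a single group.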

Most OCR engines recognize a document one page at a time, so they cannot satisfy constraints across different pages of the same document. JBIG2 compression lets a system see many inter-page constraints at once. With model-based propagation, a character recognized once can be labeled everywhere its model occurs, on every page, rather than being classified again at each occurrence. This can speed up the OCR process considerably.
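One simple reading of model-based propagation is to run the recognizer once per constraint group and copy its answer to every node in the group. The Python sketch below assumes the groups have already been computed and that `recognize` is a hypothetical callable standing in for a real OCR engine's character classifier; `propagate_labels` is likewise a hypothetical name.

```python
from __future__ import annotations

from typing import Callable


def propagate_labels(
    groups: list[list[int]],
    recognize: Callable[[int], str],
) -> tuple[dict[int, str], int]:
    """Recognize one representative per group and propagate its label.

    Returns (label for every node, number of recognizer calls made).
    """
    labels: dict[int, str] = {}
    calls = 0
    for group in groups:
        if not group:
            continue
        representative = group[0]  # assumption: any member is a fair sample
        label = recognize(representative)
        calls += 1
        for node in group:
            labels[node] = label  # propagate to every linked node
    return labels, calls
```

The number of recognizer calls equals the number of groups rather than the number of nodes, so larger groups (for example, global models spanning many pages) save more work. A practical system would likely check more than one member per group, since a misrecognized representative would spread its error to the whole group.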

If you are interested in learning more about PdfCompressor with OCR, or in testing our free 30-day trial, visit:
http://www.cvisiontech.com/pdf_compressor_31.html
