The Dangers of PM&S: Proceed with Caution
Like any powerful tool, PM&S must be used correctly. Among the worst mistakes a JBIG2 encoder can make is a font substitution error, commonly known as a mismatch. If an encoder mistakenly assigns a character to the wrong font, it replaces that character with the mistaken font in the compressed file. The result is a typo that is visible in the compressed document. The misspelled word will confuse readers and will cause any OCR engine that processes the compressed file to generate incorrect text. The lost information can be recovered only from the original document.
The ability to use PM&S presents many JBIG2 vendors with a dilemma. To stay competitive and achieve the best compression rates, they need to map as many characters as possible to the same font. A single mismatch, though, can make the document worthless. Because the JBIG2 specification says nothing about which characters can be safely matched together and which cannot, each JBIG2 vendor must develop its own proprietary algorithms, which involve sophisticated computer vision techniques. It is therefore not uncommon to find mismatches produced by many JBIG2 implementations, especially those from the more recent entrants into the field.
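The trade-off can be illustrated with a minimal sketch. It is not any vendor's algorithm: real encoders use far more sophisticated matching, and the pixel-difference (Hamming) measure, the function names, the tiny 5x5 bitmaps, and the max_diff threshold below are all assumptions chosen only for illustration. The sketch shows how loosening the matching threshold shrinks the symbol dictionary but can also fold an "e" into a "c".

```python
from typing import List

Glyph = List[str]  # a bitmap as rows of '#' (ink) and '.' (background)


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_dictionary(glyphs: List[Glyph], max_diff: int) -> List[int]:
    """Map each glyph to the first stored symbol within max_diff pixels.

    Returns one symbol index per input glyph. A glyph that matches no
    stored symbol is added to the dictionary as a new symbol.
    """
    symbols: List[Glyph] = []
    assignment: List[int] = []
    for glyph in glyphs:
        for index, symbol in enumerate(symbols):
            if hamming(glyph, symbol) <= max_diff:
                assignment.append(index)  # substitute the stored symbol
                break
        else:
            symbols.append(glyph)  # no acceptable match: store a new symbol
            assignment.append(len(symbols) - 1)
    return assignment


# Hypothetical 5x5 glyphs: a clean "c", a scanned "c" with one noisy pixel,
# and an "e" that differs from the clean "c" in five pixels.
C = [".###.", "#....", "#....", "#....", ".###."]
C_NOISY = [".###.", "#....", "#....", "##...", ".###."]
E = [".###.", "#...#", "#####", "#....", ".###."]

page = [C, C_NOISY, E]

strict = build_dictionary(page, max_diff=2)  # "e" keeps its own symbol
loose = build_dictionary(page, max_diff=5)   # "e" is replaced by "c"

print("strict:", strict)
print("loose: ", loose)
```

With the strict threshold the two "c" glyphs share one symbol and the "e" gets its own, so the page needs two symbols. With the loose threshold the page needs only one symbol, but the "e" is now drawn as a "c", which is exactly the kind of mismatch described above.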
These mismatches can severely degrade image quality. Here is a sample from a typical image file. The top half of the figure below shows how the original above looked after lossy compression by a typical JBIG2 vendor. By contrast, the same document compressed by a second vendor (CVISION PdfCompressor), shown in the bottom half of the figure below, is accurate.