Rich Media OCR

Oct 15

I often speak of unique uses of OCR, and here is yet another: OCRing video files. But why? Part of managing rich media assets is indexing them. Technologies such as speech recognition and optical character recognition make rich media easier to index and search.

By using OCR technology to find and extract text from video frames, the data can be stored as metadata. In the simplest scenario, this is a text file that accompanies the video file. More complex environments will even tell you the minute and second at which the text occurs. Because this is not a traditional use of the technology, some special considerations apply.
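As a minimal sketch of the timestamped case, assume the frame rate is known and that an OCR step has already produced text keyed by frame index. The function names `frame_to_timestamp` and `build_index` and the entry layout are hypothetical, chosen only for illustration:

```python
from __future__ import annotations


def frame_to_timestamp(frame_index: int, fps: float) -> str:
    """Convert a zero-based frame index to an MM:SS string."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    total_seconds = int(frame_index / fps)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}"


def build_index(frame_text: dict[int, str], fps: float) -> list[dict]:
    """Build metadata entries from OCR text keyed by frame index (assumed input)."""
    entries = []
    for frame_index in sorted(frame_text):
        text = frame_text[frame_index].strip()
        if text:  # skip frames where no text was found
            entries.append(
                {"time": frame_to_timestamp(frame_index, fps), "text": text}
            )
    return entries
```

The resulting list could be written out as the text file that accompanies the video, one line per entry.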

The first is converting and separating frames into individual image files. For OCR to be effective, it needs to work on a series of images. Although a video is only a sequence of images shown at a high rate of speed, it’s still somewhat of a challenge to convert video files such as MPEG into a series of images. On top of that, motion blur in some frames will also be a problem.
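One common way to do the conversion is the ffmpeg command-line tool, which is not something this post prescribes. The sketch below only builds the argument list, assuming ffmpeg is installed; the helper name `frame_extraction_command` and the `frame_%06d.png` naming pattern are hypothetical:

```python
from pathlib import Path


def frame_extraction_command(video_path, output_dir, frames_per_second=1.0):
    """Return an ffmpeg argument list that samples frames to numbered PNG files."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    # ffmpeg expands %06d into a zero-padded frame counter.
    pattern = Path(output_dir) / "frame_%06d.png"
    return [
        "ffmpeg",
        "-i", str(video_path),              # input video, e.g. an MPEG file
        "-vf", f"fps={frames_per_second}",  # sample this many frames per second
        str(pattern),
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Sampling fewer frames reduces the volume of images but does not remove blurred frames; that would need a separate sharpness check.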

The second challenge is dealing with repeated frames. Because so many consecutive frames differ only slightly from each other, the text on a series of frames might not change. Better OCR results will account for this and not repeat the text once per frame.
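One way to avoid repeating text is to compare each frame's OCR output with the last text that was kept and drop near-identical results. This is a minimal sketch, not a description of how any particular OCR product does it; the function name `collapse_repeats` and the 0.9 similarity threshold are assumptions:

```python
from difflib import SequenceMatcher


def collapse_repeats(frame_texts, threshold=0.9):
    """Keep a frame's text only when it differs enough from the last kept text.

    frame_texts: iterable of (frame_index, text) pairs in playback order.
    Returns a list of (first_frame_index, text) pairs.
    """
    kept = []
    last_text = None
    for frame_index, text in frame_texts:
        text = " ".join(text.split())  # normalize whitespace
        if not text:
            continue  # frames without text are skipped and do not reset the comparison
        if last_text is not None:
            similarity = SequenceMatcher(None, last_text, text).ratio()
            if similarity >= threshold:
                continue  # treat as a repeat of the previously kept text
        kept.append((frame_index, text))
        last_text = text
    return kept
```

A fuzzy comparison is used rather than strict equality because OCR output for the same on-screen text can vary slightly from frame to frame.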

The final challenge is dealing with variations in fonts and often small text sizes. This requires an OCR engine with settings for specialized OCR, and one that is very accurate on complex, low-quality documents.

I expect that in the future, this technique in conjunction with speech recognition will be used in eDiscovery, content management, and robust search of rich media files.

Chris Riley – About

Find much more about document technologies at

Whatever happened to OCR-A and OCR-B

Sep 28

In the early days of OCR, soon after Kurzweil invented it, the preferred approach to increasing accuracy was to institute a printing standard. That standard included two fonts, OCR-A and OCR-B, that the first OCR engines were specially trained for. Today, use of these fonts sometimes actually reduces OCR accuracy with modern engines. If you just run a modern engine on a document with OCR-A text, it will initially be less accurate unless you tell the software that the text is OCR-A, at which point it will be extremely accurate.

Some of the education around OCR processing still discusses these fonts as a living standard. For OCR of numbers only, the fonts are beneficial because they make a clear distinction between digits that look like letters, such as “1” and “0”. The numbers-only portion of OCR-A is called “Index”. But for the most part the fonts provide no additional benefit in everyday OCR processing. So what happened?

Three major things happened that prevented this standard from taking off:

  1. The adoption of OCR technology was very low at the time, and it was used only in special cases, so there was not a large enough user base to embrace the standard.

  2. It’s really hard to tell users how to create their documents, especially because the people doing the OCR are often not the creators of the original document and do not have the power to determine the printing font. Documents printed in these fonts also look very boring and have a generator-like style.

  3. The OCR engines, in spite of the standard, improved to work very well on the vast majority of fonts, apart from cursive and stylized special fonts. Because of this, it quickly became clear that any typographic text could be converted.

As a little bit of OCR history, these fonts are an interesting way to explore the rapid growth in the technology’s accuracy. There are a few specialized engines out there that work only with the OCR-A and OCR-B fonts, especially for very fast camera OCR of part numbers on product assembly lines, but for the most part the standard is not required and not widely used.
