OCR Speed vs. Accuracy

by Chris

There is a general tradeoff between OCR speed and accuracy. Accuracy generally depends on how much processing time the OCR engine is given. It is usually possible to obtain greater OCR accuracy by running the engine longer or by deploying more OCR engines. The problem, however, is one of diminishing returns.

Let’s say the OCR engine has to run twice as long to cut the remaining error rate in half. Say, for example, the error rate is 2% when the OCR engine runs in normal mode, at a processing speed of 3 seconds per page. There might also be an accurate mode, with an error rate of 1% and a processing speed of 6 seconds per page. Let’s also assume there is a super-accurate mode with an error rate of 0.5% and a processing speed of 12 seconds per page. In some systems, these modes might correspond to the number of OCR engines that are run.
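To make the arithmetic concrete, here is a minimal Python sketch of that hypothetical model. The function name `mode_profile` and the constants are illustrative assumptions taken from the example numbers above, not measurements from any real OCR engine.

```python
# Hypothetical model from the example: each step up in accuracy
# doubles the time per page and halves the error rate.
BASE_ERROR_RATE = 0.02       # 2% in normal mode (assumed example value)
BASE_SECONDS_PER_PAGE = 3.0  # 3 seconds per page in normal mode (assumed example value)


def mode_profile(level):
    """Return (error_rate, seconds_per_page) for a mode `level` steps above normal."""
    factor = 2 ** level
    return BASE_ERROR_RATE / factor, BASE_SECONDS_PER_PAGE * factor


# Print the three modes described in the example.
for level, name in enumerate(["normal", "accurate", "super-accurate"]):
    error_rate, seconds = mode_profile(level)
    print(f"{name:>15}: {error_rate:.2%} errors, {seconds:.0f} s/page")
```

Under this model, every halving of the error rate costs a doubling of the time per page, so the cost of each further improvement keeps growing.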

In any event, even assuming that as many as 10 OCR engines, or super-accurate modes, are available, most companies reach a point of diminishing returns, that is, a point at which it is not worth slowing down the OCR processing rate any further, even in exchange for greater accuracy.
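One way to see the diminishing returns is to compute how much error each extra second of processing removes. The sketch below reuses the assumed numbers from the example (2% error at 3 seconds per page, halving and doubling at each step); the function `marginal_return` is a hypothetical helper, not part of any OCR product.

```python
def marginal_return(level, base_error=0.02, base_seconds=3.0):
    """Error-rate reduction gained per extra second when moving
    from mode `level` to mode `level + 1` in the halving/doubling model."""
    error_drop = base_error / 2 ** level - base_error / 2 ** (level + 1)
    extra_seconds = base_seconds * 2 ** (level + 1) - base_seconds * 2 ** level
    return error_drop / extra_seconds


# Show the payoff of each successive step up in accuracy.
for level in range(4):
    gain = marginal_return(level)
    print(f"step {level} -> {level + 1}: "
          f"{gain * 100:.5f} percentage points removed per extra second")
```

In this model each step removes half as much error as the previous step while costing twice as many extra seconds, so the payoff per extra second falls to a quarter at every step.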

So what often needs to be established, sometimes empirically through trial and error, is the degree of accuracy a given company requires. How accurate is accurate enough? What processing rate is acceptable within a given workflow?
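Once a company has answers to those questions, choosing a mode becomes a simple lookup. The following sketch assumes the three hypothetical modes from the example and two limits supplied by the user; `MODES` and `pick_mode` are illustrative names, and real figures would have to come from testing on representative documents.

```python
# Assumed example modes; real values would be measured empirically.
MODES = [
    {"name": "normal", "error_rate": 0.02, "seconds_per_page": 3.0},
    {"name": "accurate", "error_rate": 0.01, "seconds_per_page": 6.0},
    {"name": "super-accurate", "error_rate": 0.005, "seconds_per_page": 12.0},
]


def pick_mode(max_error_rate, max_seconds_per_page, modes=MODES):
    """Return the fastest mode that meets both limits, or None if none does."""
    candidates = [
        m for m in modes
        if m["error_rate"] <= max_error_rate
        and m["seconds_per_page"] <= max_seconds_per_page
    ]
    return min(candidates, key=lambda m: m["seconds_per_page"], default=None)


# Example: at most 1% errors, at most 10 seconds per page.
choice = pick_mode(max_error_rate=0.01, max_seconds_per_page=10.0)
print(choice["name"] if choice else "no mode satisfies both limits")
```

If no mode meets both limits, one of the two requirements has to be relaxed, which is exactly the tradeoff discussed above.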

Sometimes greater OCR accuracy can be achieved without increasing processing time. This often involves either some form of learning or optimizing for a given application domain.
