Thresholding within OCR

In All, Archived, OCR, OCR Accuracy, OCR Download by Chris

Thresholding is the simplest form of image segmentation, that is, of grouping the pixels of an image into regions. It distinguishes only two types of pixels: foreground and background. In OCR, the foreground pixels generally correspond to the text and the background pixels to everything else, e.g., background texture or embedded images. Individual pixels in a grayscale image are typically marked as “object” pixels if their value is greater than some threshold value (which assumes the object is brighter than the background) and as “background” pixels otherwise; for dark text on a light page the comparison is reversed. Typically, an object pixel is given a value of “1” while a background pixel is given a value of “0.” The method described here employs a static threshold: one value is used to threshold the entire page.
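As a rough illustration of the rule above, here is a minimal sketch in plain Python. The function name `threshold_image`, the sample pixel values, and the threshold of 128 are all assumptions made for the example, not part of any particular OCR engine.

```python
def threshold_image(gray, threshold):
    """Return a binary image: 1 ("object") where pixel > threshold, else 0 ("background")."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in gray]


# Hypothetical 3x4 grayscale patch (0 = black, 255 = white).
page = [
    [12, 200, 210, 15],
    [30, 190, 220, 25],
    [10, 180, 205, 20],
]

# One static threshold (128, an arbitrary choice) is applied to every pixel.
binary = threshold_image(page, 128)
for row in binary:
    print(row)
```

For dark text on a light page, the same idea applies with the comparison flipped (`pixel < threshold`), so that the text ends up as the “1” pixels.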

The key parameter in thresholding is the choice of the threshold. Several methods exist for choosing a “static” threshold. The simplest is to use the mean or median pixel value of the image, the rationale being that if the object pixels are brighter than the background, they should also be brighter than the average value. In a noiseless image with uniform background and object values, the mean or median works quite well as the threshold. In many situations, however, the image is noisy or the background is not uniform, and these simple choices break down.
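The mean and median choices can be sketched in a few lines of Python using only the standard library. The helper names `static_threshold` and `binarize` and the toy image are hypothetical; the image is deliberately the ideal case described above (no noise, uniform background and object values).

```python
from statistics import mean, median


def static_threshold(gray, method="mean"):
    """Pick one threshold for the whole image from its pixel values."""
    pixels = [p for row in gray for p in row]
    if method == "mean":
        return mean(pixels)
    if method == "median":
        return median(pixels)
    raise ValueError("method must be 'mean' or 'median'")


def binarize(gray, threshold):
    """Mark pixels brighter than the threshold as object (1), the rest as background (0)."""
    return [[1 if p > threshold else 0 for p in row] for row in gray]


# Idealized image: uniform background (50) with a small, brighter object (200).
image = [
    [50, 50, 50, 50],
    [50, 200, 200, 50],
    [50, 50, 50, 50],
]

t_mean = static_threshold(image, "mean")
t_median = static_threshold(image, "median")
mask_mean = binarize(image, t_mean)
mask_median = binarize(image, t_median)
```

On this clean image both thresholds separate the object from the background. Note that the median here equals the background value itself, so the result depends on the strict `>` comparison; a small amount of noise in the background would already produce false object pixels.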
