CVISION home
 
 
 
Litigation Support Web Repositories Scanning Bureaus Wireless Telecom
 

 

 

Thresholding within OCR

Thresholding is the simplest method of grouping an image into regions, also known as image segmentation. In thresholding there are only two types of pixels: foreground and background. In general, the foreground pixels correspond to the text and the background pixels correspond to everything else, e.g., background texture or embedded images. Individual pixels in a grayscale image are typically marked as “object” pixels if their value is greater than some threshold value and as “background” pixels otherwise; this convention assumes the object is brighter than the background, and the comparison is reversed for dark text on a light page. Typically, an object pixel is given a value of “1” while a background pixel is given a value of “0.” This method employs a static threshold, that is, one value is used to threshold the entire page.
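As a minimal sketch of the static threshold described above, the following Python maps a grayscale image to a 0/1 image using one value for the whole page. The function name `binarize`, the sample patch, and the threshold of 128 are illustrative assumptions, not part of any particular OCR product.

```python
def binarize(image, threshold):
    """Return a 0/1 image: 1 where a pixel is brighter than the threshold."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Hypothetical 3x4 grayscale patch (0 = black, 255 = white).
page = [
    [12, 200, 210, 15],
    [10, 220, 205, 20],
    [8, 14, 11, 18],
]

# One threshold value (an arbitrary choice here) is applied to every pixel.
binary = binarize(page, 128)
for row in binary:
    print(row)
```

For dark text on a light page, the comparison would be flipped to `pixel < threshold` so that the text pixels receive the value 1.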

The key parameter in thresholding is the choice of the threshold. Several methods exist for choosing a static threshold. The simplest is to use the mean or median value of the image, the rationale being that if the object pixels are brighter than the background, they should also be brighter than the average value. In a noiseless image with uniform background and object values, the mean or median works quite well as the threshold. In many situations, however, these conditions do not hold.
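The mean and median choices can be sketched in a few lines of Python. The helper names and the tiny noiseless test image below are assumptions made for illustration; the image has uniform object and background values, which is the favorable case the paragraph describes.

```python
from statistics import mean, median


def mean_threshold(image):
    """Static threshold chosen as the mean of all pixel values."""
    return mean(p for row in image for p in row)


def median_threshold(image):
    """Static threshold chosen as the median of all pixel values."""
    return median(p for row in image for p in row)


# Hypothetical noiseless image: bright object (200) on a dark background (10).
clean = [
    [10, 10, 200],
    [10, 200, 200],
]

t_mean = mean_threshold(clean)
t_median = median_threshold(clean)

# Both thresholds fall between the two pixel populations here.
print(t_mean, t_median)
```

When the object covers only a small fraction of the page, or the image is noisy, both values drift toward the background level and the separation is no longer guaranteed.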

A more sophisticated approach is to create a histogram of the image pixel intensities and use the valley point between the two peaks as the threshold. The histogram approach assumes that the background and object pixels each have some average value, and that the actual pixel values vary around these averages. However, this is computationally not as simple as we’d like, and many image histograms do not have clearly defined valley points. Ideally we’re looking for a method of choosing the threshold that is simple, does not require too much prior knowledge of the image, and works well for noisy images.
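One way the valley-point idea might be sketched is shown below. The smoothing radius, the distance-weighted heuristic for finding the second peak, and all function names are assumptions chosen for this illustration; they are one of several reasonable ways to locate a valley and will not help when the histogram is not clearly bimodal.

```python
def histogram(image, levels=256):
    """Count how many pixels take each integer intensity in 0..levels-1."""
    counts = [0] * levels
    for row in image:
        for p in row:
            counts[p] += 1
    return counts


def smooth(counts, radius=2):
    """Moving average, so single-bin noise does not create false valleys."""
    n = len(counts)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(counts[lo:hi]) / (hi - lo))
    return out


def valley_threshold(image, levels=256, radius=2):
    """Return the lowest point of the smoothed histogram between its two peaks."""
    h = smooth(histogram(image, levels), radius)
    peak1 = max(range(levels), key=lambda k: h[k])
    # Heuristic (an assumption): weight by squared distance from the first
    # peak so that bins belonging to the same mode are not picked again.
    peak2 = max(range(levels), key=lambda k: h[k] * (k - peak1) ** 2)
    lo, hi = sorted((peak1, peak2))
    return min(range(lo, hi + 1), key=lambda k: h[k])


# Hypothetical image: dark pixels scattered around 40, bright ones around 200.
dark = [38, 39, 40, 40, 40, 41, 42]
bright = [198, 199, 200, 200, 200, 201, 202]
sample = [dark, bright, bright]

t_valley = valley_threshold(sample)
print(t_valley)
```

When the valley is a flat run of empty bins, as in this sample, the sketch returns the first bin of that run; taking the midpoint of the run would be an equally defensible choice.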

Clearly, if the page image contains both normal video, i.e., dark text on a light background, and reverse video, i.e., light text on a dark background, then a single static threshold for the page will not suffice. A more complex thresholding algorithm may first segment the image into regions with different backgrounds rather than assuming a uniform image background. Then, for each background region, a static threshold value is selected. Methods such as this one, which are static for some local region but not for the entire image, are sometimes referred to as semi-static.
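A semi-static scheme could look roughly like the sketch below. It assumes the background regions have already been found and are supplied as rectangles (the segmentation step itself is not shown), uses the region mean as the local threshold, and guesses polarity by assuming text covers less area than background. All names and these choices are illustrative assumptions.

```python
def threshold_region(image, top, left, bottom, right):
    """Pick a static threshold and a text polarity for one rectangular region."""
    pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    t = sum(pixels) / len(pixels)
    bright = sum(1 for p in pixels if p > t)
    # Assumption: text occupies the minority of the region's pixels.
    text_is_bright = bright < len(pixels) - bright
    return t, text_is_bright


def semi_static_binarize(image, regions):
    """Binarize each region with its own threshold; 1 marks text pixels."""
    out = [[0] * len(row) for row in image]
    for top, left, bottom, right in regions:
        t, text_is_bright = threshold_region(image, top, left, bottom, right)
        for r in range(top, bottom):
            for c in range(left, right):
                is_bright = image[r][c] > t
                out[r][c] = 1 if is_bright == text_is_bright else 0
    return out


# Hypothetical page: left half is dark text on light, right half is reverse video.
mixed = [
    [230, 20, 230, 230, 15, 240, 15, 15],
    [230, 20, 230, 230, 15, 240, 15, 15],
]
# Regions as (top, left, bottom, right), bottom/right exclusive.
regions = [(0, 0, 2, 4), (0, 4, 2, 8)]

text_mask = semi_static_binarize(mixed, regions)
for row in text_mask:
    print(row)
```

The text stroke is marked in both halves even though it is dark in one and bright in the other, which a single page-wide threshold cannot achieve.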

Of course, even the above method has its limitations. For a book page where the background intensity varies smoothly, for example, this method may not be appropriate. Undersampled text, or documents scanned with a cell phone, may need special treatment, including upsampling prior to thresholding. Gradient methods, akin to the edge detection used in computer vision, may sometimes be appropriate for hard-to-threshold images.
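To illustrate the gradient idea, the sketch below computes a central-difference gradient magnitude and marks pixels where the intensity changes sharply. The function names, the border clamping, and the cutoff of 30 are assumptions for this example; practical edge detectors usually add smoothing and more careful edge linking.

```python
def gradient_magnitude(image):
    """Central-difference gradient magnitude, with coordinates clamped at borders."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = image[r][max(c - 1, 0)]
            right = image[r][min(c + 1, cols - 1)]
            up = image[max(r - 1, 0)][c]
            down = image[min(r + 1, rows - 1)][c]
            gx = (right - left) / 2
            gy = (down - up) / 2
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


def edge_mask(image, min_gradient):
    """Mark pixels whose gradient magnitude reaches the given cutoff."""
    return [[1 if g >= min_gradient else 0 for g in row]
            for row in gradient_magnitude(image)]


# Hypothetical strip: background brightens smoothly from left to right,
# with one dark stroke (value 30) in the middle column.
ramp_row = [100, 110, 120, 30, 140, 150, 160]
shaded = [ramp_row[:], ramp_row[:], ramp_row[:]]

edges = edge_mask(shaded, 30)
for row in edges:
    print(row)
```

The marked pixels flank the stroke on both sides, while the slow background ramp stays below the cutoff, so the result depends on local contrast rather than on absolute brightness.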


 
 
   
 


Copyright (c) 1998-2007 CVISION Technologies, Inc.
CVISION, CVista, CBatch, and the CVISION logo are registered
trademarks of CVISION Technologies, Inc.

 