How Texture Patterns Relate to OCR
Many OCR and image-processing methods make assumptions about the image background, and often they assume it is constant. A texture can be defined as a tessellated, approximately repeating pattern in an image. The texture may be real, e.g., wood paneling, or synthetic, e.g., the screening (halftone) pattern a color printer uses to represent a constant color background. It is sometimes beneficial to descreen the image before thresholding or further image processing. Analyzing textured regions can be very complex, but it is sometimes necessary for proper separation of foreground and background.
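As a rough illustration of the "approximately repeating pattern" idea and of descreening, the sketch below estimates the repeat period of a synthetic screen from one scan line using autocorrelation, then averages over a window of exactly one period. The function names estimate_period and descreen are hypothetical, the image is a made-up test pattern, and a plain box filter is only one simple way to descreen; it is not presented as the method any particular OCR engine uses.

```python
def estimate_period(row, max_lag=None):
    """Return the lag (in pixels) at which a 1-D intensity profile best repeats."""
    n = len(row)
    max_lag = max_lag or n // 2
    mean = sum(row) / n
    centered = [v - mean for v in row]
    best_lag, best_score = 0, float("-inf")
    for lag in range(1, max_lag + 1):
        # Autocorrelation at this lag, normalised by the number of overlapping samples.
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        if score > best_score:  # strict '>' keeps the smallest lag among ties
            best_lag, best_score = lag, score
    return best_lag


def descreen(image, period):
    """Box-filter a grayscale image (list of rows) with a period x period window.

    Assumes the image is at least `period` pixels in each dimension. Windows
    are shifted inward at the borders so each one covers a full period.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0 = min(max(y - period // 2, 0), h - period)
        for x in range(w):
            x0 = min(max(x - period // 2, 0), w - period)
            total = sum(image[yy][xx]
                        for yy in range(y0, y0 + period)
                        for xx in range(x0, x0 + period))
            out[y][x] = total / (period * period)
    return out


# Synthetic diagonal screen: alternating light and dark bands, repeating every 4 pixels.
screen = [[200 if (x + y) % 4 < 2 else 100 for x in range(16)] for y in range(16)]

period = estimate_period(screen[0])
smooth = descreen(screen, period)
```

Because the averaging window spans one full period, the screen pattern cancels and the result is a flat background that a global threshold can handle. On real scans the period is rarely an exact integer and the pattern is only approximately periodic, so some residual texture should be expected.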
Solving for texture patterns is very helpful in segmentation and in MRC-based (Mixed Raster Content) coding. Both effective compression of scanned documents and reliable OCR output require accurate discrimination between background and foreground. When the foreground is lifted during segmentation or MRC coding, the original background pattern, or some facsimile of it, must be reconstituted in the area the foreground occupied. There are several ways to do this. One is to build a Markov model of the original texture and use this statistical model to regenerate the background regions that need to be covered. Alternatively, one can find a tessellation element and replace the lifted text region with a "pure" background region.
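The tessellation-element approach can be sketched as follows, assuming the repeat period is already known and a clean tile can be cut from a text-free part of the page. The function fill_with_tile, the mask layout, and the synthetic image are all hypothetical choices made for illustration; the Markov-model alternative is not shown here.

```python
def fill_with_tile(image, mask, tile, origin=(0, 0)):
    """Replace masked (lifted) pixels with values from a periodically repeated tile.

    image  : grayscale image as a list of rows
    mask   : same shape; True where foreground was lifted
    tile   : one tessellation element cut from a clean background area
    origin : (row, col) of the tile's top-left corner in the image, used to
             keep the repeated tile in phase with the surrounding texture
    """
    th, tw = len(tile), len(tile[0])
    oy, ox = origin
    out = [row[:] for row in image]  # copy so the input is left untouched
    for y, mask_row in enumerate(mask):
        for x, lifted in enumerate(mask_row):
            if lifted:
                out[y][x] = tile[(y - oy) % th][(x - ox) % tw]
    return out


# Synthetic background texture with a 4-pixel repeat in both directions.
background = [[200 if (x + y) % 4 < 2 else 100 for x in range(12)] for y in range(8)]

# Simulate dark text covering rows 4-6, columns 4-9.
mask = [[4 <= y <= 6 and 4 <= x <= 9 for x in range(12)] for y in range(8)]
scanned = [[0 if mask[y][x] else background[y][x] for x in range(12)]
           for y in range(8)]

# Cut one tessellation element from the text-free top-left corner.
tile = [row[0:4] for row in scanned[0:4]]

restored = fill_with_tile(scanned, mask, tile, origin=(0, 0))
```

In this idealized case the restored image matches the original background exactly. With a real texture the tile only approximates the covered area, so seams may be visible at the boundary of the filled region.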





