OCR for Windows
Windows OCR is Optical Character Recognition software that runs on Windows operating systems and converts non-editable text, such as a scanned page, into an editable, machine-readable format. The resulting file can be opened and used in any desktop publishing software, and it is searchable as well. Windows OCR can convert documents set in standard fonts and sizes to electronic formats with high precision.
How Windows OCR Works
Windows OCR works by one of two basic methods: Matrix Matching and Feature Extraction. Matrix matching is generally the preferred method and is employed in many Windows OCR programs. It relies on a stored set of dot matrices, also called templates, one for each known character. The OCR software takes one character at a time from the file and compares it with the stored templates. When a character image correlates with one of the templates to within a set level of approximation, the computer assigns the corresponding ASCII character value to it. Feature Extraction is quite different: there is no exact matching against prescribed templates. It is based on Intelligent Character Recognition (ICR) and relies on "computer intelligence". The computer looks for features such as the shapes of lines, closed areas and intersections, analyses them, and then identifies the character from those features.
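The matrix matching idea can be illustrated with a minimal sketch. The 5x5 templates, the `match_score` and `recognize` helpers, and the 0.8 threshold below are all hypothetical choices made for illustration; they are not taken from any particular Windows OCR product, which would use far larger template sets and more robust correlation measures.

```python
# Hypothetical 5x5 templates: '#' is ink, '.' is background.
TEMPLATES = {
    "I": ["..#..",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def match_score(glyph, template):
    """Return the fraction of cells where glyph and template agree (0.0 to 1.0)."""
    total = 0
    agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            if g == t:
                agree += 1
    return agree / total


def recognize(glyph, threshold=0.8):
    """Return the best-matching character, or None if no template is close enough.

    The threshold stands in for the "level of approximation" (assumed value).
    """
    best_char, best_score = None, 0.0
    for char, template in TEMPLATES.items():
        score = match_score(glyph, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= threshold else None


# A scanned "T" with one stray pixel of noise in the third row.
noisy_t = ["#####",
           "..#..",
           "..#.#",
           "..#..",
           "..#.."]

char = recognize(noisy_t)
print(char, ord(char))  # the recognized character and its ASCII value
```

The noisy glyph still agrees with the "T" template in 24 of 25 cells, so it clears the threshold despite the stray pixel. A glyph that resembles no template closely enough is returned as `None` rather than being forced onto the nearest character.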
Steps for Better Windows OCR
On clean, high-quality scans the accuracy of Windows OCR is generally high and can approach 100%. The main task therefore lies in producing high-quality scanned documents, and there are several things you can do to get the best possible results from your scanner. Use good-quality, smudge-free paper. Keep the scanner glass clean and place the document squarely on it so that the text runs horizontally. Adjust the color, contrast and brightness so that the background is light or white and free of "artifacts" (such as a pattern in the paper) and the text appears in dark, bold letters. Scanning at 300 dpi is generally the preferred resolution. These steps produce a better scan of the document and so make it easier for the Windows OCR software to work on the file.
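Two of the steps above, cleaning up the background and choosing the resolution, can be sketched in a few lines. This is only an illustration: the `binarize` and `pixels_at_dpi` helpers and the fixed threshold of 128 are assumptions, and real scanning software typically offers its own contrast and resolution settings.

```python
def binarize(gray, threshold=128):
    """Map each 0-255 grayscale pixel to 0 (dark text) or 255 (white background).

    The fixed threshold of 128 is an assumed value; a faint paper pattern
    that is lighter than the threshold is pushed to pure white.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray]


def pixels_at_dpi(width_in, height_in, dpi=300):
    """Return the pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


# A tiny 3x3 "scan": light, slightly uneven background with a dark diagonal stroke.
scan = [[250, 240, 30],
        [245, 90, 235],
        [20, 238, 251]]

print(binarize(scan))
print(pixels_at_dpi(8.5, 11))  # a letter-size page at 300 dpi
```

After thresholding, the uneven background values become uniform white and the stroke becomes uniform black, which is the kind of input the recognition step handles best. At 300 dpi a letter-size page (8.5 by 11 inches) comes out at 2550 by 3300 pixels.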