What Is a Scanned PDF Image
We often scan a paper document to PDF to create an exact digitized copy of the original. The benefit is that once the digitized copy exists, you can easily copy and share it, and save it in multiple locations for safekeeping. When you scan a document this way, you actually create an image. The image will be sharp and clear if you begin with a clean, uncreased and legible original. If the scanned image contains words, the computer does not recognize them as words; to the computer they are just pixels, and it displays them as pixels.
Why OCR a Scanned PDF
A scanned PDF image of a document that contained text is not a very usable object. This is because, unlike a text document, a scanned image is not searchable. A word-processed document contains characters that a computer can recognize as characters, so searching such a document is possible. A scanned PDF image is just digitized pixels that cannot be searched. One way to make such documents useful and searchable is to run OCR (Optical Character Recognition) on them. OCR is a sophisticated process in which the computer analyzes each group of pixels to determine the character it represents. Clean pixels make this task easier and more accurate, so a clean, unsmudged and sharp scan will produce a better OCR PDF file than a scan with a dark, smudged background and unclear, bleeding characters. Once OCR is complete, the file is ready to be searched. It is important to check the OCR PDF file for accuracy and to fix any recognition errors that are found.
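To make the idea of "analyzing a group of pixels" concrete, here is a toy sketch in Python. It is not how production OCR engines work; it assumes a made-up 3x5 pixel font (the TEMPLATES table is invented for this illustration) and picks the template that differs from the scanned glyph in the fewest pixels.

```python
# Toy illustration only: a hypothetical 3x5 pixel font, not a real OCR engine.
# '#' is a dark pixel, '.' is a light pixel.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
}


def distance(glyph, template):
    """Count the pixels that differ between a scanned glyph and a template."""
    return sum(
        pixel != expected
        for glyph_row, template_row in zip(glyph, template)
        for pixel, expected in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return (character, mismatched_pixel_count) for the closest template."""
    scores = {char: distance(glyph, tpl) for char, tpl in TEMPLATES.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


def recognize_line(glyphs):
    """Turn a sequence of pixel glyphs into a searchable string."""
    return "".join(recognize(glyph)[0] for glyph in glyphs)


# A clean scan: every glyph matches its template exactly.
clean_scan = [TEMPLATES[char] for char in "TOIL"]

# A smudged "T": two stray dark pixels on the bottom row.
smudged_t = ("###", ".#.", ".#.", ".#.", "###")

if __name__ == "__main__":
    print(recognize_line(clean_scan))
    print(recognize(smudged_t))
```

With the clean glyphs the sketch recovers the string TOIL, which can then be searched like any other text. The smudged "T" is pixel-for-pixel identical to the "I" template, so it is read as "I". This is the kind of error that a dirty scan introduces and that the accuracy check is meant to catch.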
How to Identify an OCR PDF File
On screen, there is no visible difference between a scanned PDF file and an OCR PDF file. They both look like images. So, the best way to tell them apart is to perform a keyword search on them. The OCR PDF file will let you search by keywords, while a non-OCR PDF file, which is just an image, will not return any matches.
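When there are many files to check, a rough automated pre-check can stand in for the manual keyword search. The sketch below uses a different and cruder technique than searching: it assumes that a PDF with a text layer declares at least one /Font resource, while an image-only scan usually declares none. The function names are invented for this example. The check can be fooled, for instance by PDFs that store their objects in compressed streams or by scans that carry a typed stamp, so treat its answer as a hint and confirm with a real keyword search.

```python
import re

# Matches the PDF name /Font, but not longer names such as /FontDescriptor.
FONT_MARKER = re.compile(rb"/Font(?![A-Za-z0-9])")


def looks_searchable(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes mention a /Font resource."""
    return FONT_MARKER.search(pdf_bytes) is not None


def file_looks_searchable(path: str) -> bool:
    """Read a PDF from disk and apply the same heuristic."""
    with open(path, "rb") as pdf_file:
        return looks_searchable(pdf_file.read())


# Hand-written fragments of page dictionaries, for illustration only.
image_only_page = b"<< /Type /Page /Resources << /XObject << /Im0 5 0 R >> >> >>"
ocr_page = b"<< /Type /Page /Resources << /Font << /F1 6 0 R >> /XObject << /Im0 5 0 R >> >> >>"

if __name__ == "__main__":
    print(looks_searchable(image_only_page))
    print(looks_searchable(ocr_page))
```

The first fragment contains only an image object, so the check reports it as not searchable. The second also declares a font, which is what an OCR text layer needs, so the check reports it as searchable.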