Batch PDF OCR

In All, Archived, Batch PDF OCR by Chris

Question: Can I batch convert my files to PDF with OCR (make them text searchable)?

Answer: Sure, you can definitely batch convert your PDFs with OCR. But it’s helpful to understand what is meant by batch and the limitations of various products with respect to batch processing. When someone wants to batch process something, it’s obvious that they do not want to click on each individual file. However, what may be less obvious is that many products that claim to support batch functionality still require a lot of manual interaction.

For example, certain products claim batch functionality because the user can select many files at once to process. This kind of “batch” function ignores the fact that each file must still be selected by hand. Other batch products let the user select directories but do not automatically traverse their subdirectories. Many products that claim batch support will not actually run to completion on, say, 100,000 files: the system consumes more and more resources as the run progresses and eventually crashes or stops running altogether.
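To make the subdirectory and resource points concrete, here is a minimal Python sketch of a recursive file walk. It is not taken from any particular product; the function name `iter_pdfs` is hypothetical. It uses a generator so that files are discovered one at a time instead of being collected into one large list up front.

```python
import os
from pathlib import Path
from typing import Iterator


def iter_pdfs(root: str) -> Iterator[Path]:
    """Lazily yield every PDF under root, including all subdirectories.

    Hypothetical helper for illustration only. os.walk descends into
    each subdirectory, and yielding paths one by one avoids holding the
    full file list in memory.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Match the extension case-insensitively (.pdf, .PDF, ...).
            if name.lower().endswith(".pdf"):
                yield Path(dirpath) / name
```

A batch run would loop over `iter_pdfs("some/folder")` and hand each path to the OCR step in turn. Whether a long run stays stable also depends on the OCR engine releasing its own resources after each file, which this sketch does not address.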

How much control does the batch program actually give you? Can you skip previously processed files? Can you run a batch sequence from the command line or via an SDK? Is there support for watched folders? These are all different methods of setting up a batch run and should be supported in a batch PDF product.
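As an illustration of two of these questions (skipping previously processed files and running from the command line), here is a rough Python sketch. Everything in it is an assumption made for the example: the names `run_batch`, `load_done`, and `main`, the `--log` option, and the idea of recording finished files in a plain text log. The OCR step itself is a placeholder callable, since the real call depends on the product or SDK you use.

```python
import argparse
from pathlib import Path
from typing import Callable, Iterable, Optional, Sequence, Set


def load_done(log_path: Path) -> Set[str]:
    """Return the set of paths recorded as finished in an earlier run."""
    if not log_path.exists():
        return set()
    return set(log_path.read_text(encoding="utf-8").splitlines())


def run_batch(
    files: Iterable[Path],
    log_path: Path,
    ocr_file: Callable[[Path], object],
) -> int:
    """Process each file once, skipping anything already in the log.

    ocr_file is a stand-in for the real OCR call (hypothetical).
    Returns the number of files processed in this run.
    """
    done = load_done(log_path)
    count = 0
    with log_path.open("a", encoding="utf-8") as log:
        for path in files:
            key = str(path)
            if key in done:
                continue  # handled in a previous run
            ocr_file(path)
            # Record success right away so an interrupted run can resume.
            log.write(key + "\n")
            log.flush()
            done.add(key)
            count += 1
    return count


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Command-line entry point: main(["some/folder", "--log", "done.log"])."""
    parser = argparse.ArgumentParser(description="Resumable batch run (sketch)")
    parser.add_argument("root", type=Path, help="folder to scan recursively")
    parser.add_argument("--log", type=Path, default=Path("processed.log"))
    args = parser.parse_args(argv)

    # Scan the folder and all of its subdirectories for PDF files.
    files = (
        p for p in args.root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )

    def placeholder_ocr(path: Path) -> None:
        # Replace with a call into your OCR tool or SDK.
        print(f"would OCR {path}")

    count = run_batch(files, args.log, placeholder_ocr)
    print(f"processed {count} new file(s)")
    return 0
```

Because finished files are remembered in the log, running the same command again on a schedule only picks up new arrivals, which roughly approximates a watched folder. A product with true watched-folder support would typically react to new files as they appear rather than by rescanning.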
