CVISION Technologies

Document Imaging, Information, and Tech Support


Software For Batch OCR

June 17th, 2010 by Jerry

Question: Hi, I work for a huge finance corporation and deal with accounting. I have a bunch of files from the last decade and it is taking too long going through dates to find the right one. Do you know of any software for batch OCR I could use?

Answer: Yes. In your case, batch OCR software is the right tool. Once your files have been OCRed, their text, including dates, becomes searchable, so you no longer have to open each file to find the right one. In a profession like yours, that would save a lot of time. I recommend the CVISION OCR program, which can process up to 20 pages per second with over 99% accuracy. If you are still unsure about purchasing it, you can download it and try it first:

www.cvisiontech.com/download_main.html
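Once the recognized text is available, finding a document by date becomes a simple search. The sketch below assumes a hypothetical setup in which the OCR text of each document has been exported to a plain-text `.txt` file in one folder, and that dates are written as month/day/year. Both the folder layout and the date pattern are illustrative assumptions, not features of the CVISION product.

```python
import re
import tempfile
from pathlib import Path

# Matches dates written like 06/17/2010 or 6/17/2010 (US month/day/year).
# This pattern is an assumption; adjust it to the date formats in your files.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def find_files_with_date(text_dir, month, day, year):
    """Return names of OCR text files (*.txt) that contain the given date."""
    matches = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for m in DATE_PATTERN.finditer(text):
            if (int(m.group(1)), int(m.group(2)), int(m.group(3))) == (month, day, year):
                matches.append(path.name)
                break  # one hit is enough to report this file
    return matches


# Demo: two OCR text files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Invoice dated 6/17/2010", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Invoice dated 6/18/2010", encoding="utf-8")
    hits = find_files_with_date(tmp, 6, 17, 2010)

print(hits)
```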

Category: All, Free OCR, OCR, OCR Accuracy, OCR Download, OCR Software, batch file conversion

Decompress PDF files

September 24th, 2008 by Chris

Question: I am trying to use the decompressor on existing PDF files.  I receive a message indicating: “Cannot open file …” on any PDF file not created by PdfCompressor.  Does the decompressor only work on PdfCompressor created files, or is it looking for other characteristics in the PDF File? I am using PdfCompressor Version 4.0.135 – Evaluation Copy.

Answer: By default, the PdfCompressor decompressor only decompresses files that were previously compressed by PdfCompressor. However, it has a hidden command-line flag, “-convertall”, that lets it decompress all PDF files, including those you didn’t create with PdfCompressor.

Category: All

OCR accuracy and DPI

September 23rd, 2008 by Chris

Question: I have been testing PdfCompressor with OCR. I OCR’d raw TIFF files, and the results were good with the super accurate setting (but not with balanced). My colleague claims that you obtain better results when you OCR documents scanned at lower resolutions. I have never heard of this. What can you tell me?

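Answer: The claim about lower DPI is not correct. For good OCR accuracy, the ideal resolution is 300 DPI. At lower resolutions, each character is represented by fewer pixels, which makes it harder to recognize. Also, if the input file is a PDF, try running it in raster mode to see whether you get better results.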
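A little arithmetic shows what lowering the resolution does to the material the OCR engine works with. The sketch below uses a US Letter page (8.5 x 11 in) and 10-point type as example choices. It computes the nominal (em) height of the type, not the height of any particular letter.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def page_size_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a scanned page at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def type_height_pixels(point_size, dpi):
    """Nominal (em) height of type of the given point size, in pixels."""
    return point_size / POINTS_PER_INCH * dpi


for dpi in (150, 200, 300):
    w, h = page_size_pixels(8.5, 11, dpi)
    print(f"{dpi} DPI: letter page {w}x{h} px, "
          f"10-pt type about {type_height_pixels(10, dpi):.1f} px tall")
```

Halving the resolution from 300 to 150 DPI halves the pixels available for each character in both directions. Fine details that distinguish similar glyphs are then more likely to be lost.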

If you would like to retest PdfCompressor, I have attached the link for the free trial below:

www.cvisiontech.com/download_main.html

Category: All, OCR, OCR Accuracy, OCR Download, OCR PDF, OCR Software, OCR Verification and Confidence, OCR with Application to the Digital Mailroom, Optical Character Recognition

PDF Compression, OCR Command Line

September 22nd, 2008 by Chris

Question: My company has been testing PdfCompressor. If we have it on a server and we invoke it through a command line, will it be able to operate on multiple files simultaneously?
 
Will it give error messages if there are problems with the PDF file (via command line interface)?
 
Answer: As long as you are running PdfCompressor Professional, you will be able to use the command line, which is supported in all the versions we have released. The command line supports all the features of PdfCompressor, just as the GUI does.
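On a server, a script can launch several command-line runs in parallel and check each run's exit code and error output. The sketch below makes several assumptions:

- A failing run exits with a non-zero code and writes a message to stderr. The answer above does not say how PdfCompressor reports errors.
- Running several instances at once is allowed on your installation. This is not stated here either.
- The real command line is a hypothetical placeholder. For the demo, the Python interpreter stands in for the compressor.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run one command and return (exit code, stripped stderr text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr.strip()


def run_batch(files, build_cmd, max_workers=4):
    """Run build_cmd(file) for every file concurrently.

    Returns {file: (exit_code, stderr)} so failures can be reported.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(run_one, (build_cmd(f) for f in files))
        return dict(zip(files, results))


def demo_cmd(name):
    """Stand-in command: fails with a message for any name containing 'bad'."""
    code = (
        "import sys; n = sys.argv[1]; "
        "sys.exit('cannot open ' + n) if 'bad' in n else None"
    )
    return [sys.executable, "-c", code, name]


report = run_batch(["a.pdf", "bad.pdf", "c.pdf"], demo_cmd)
failed = {f: err for f, (rc, err) in report.items() if rc != 0}
print(failed)
```

Threads are enough here because each worker only waits on a separate process. The actual compression work happens in those child processes.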

If you would like to extend the trial of the software, I have included the link below to download the product:

www.cvisiontech.com/download_main.html

Category: API, Adobe PDF Conversion, All, CVISION PdfCompressor, License

Batch PDF Compression in Multi-threading mode

September 11th, 2008 by Chris

Question: Does the API support batch PDF compression? I have a folder of PDFs, and I would like to process them and produce one PDF for each file processed. I know I can loop through the folder and process one file at a time, but then I’m not taking advantage of multi-threading. Can this be done while still taking advantage of the compressor’s multi-threading capability?

Answer: Multithreading is controlled from the command line: when the appropriate flag is passed, multithreading kicks in. The flag for multi-threading is “-mt”. Add that flag to the command line and that should fix the problem.
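A folder loop that produces one output PDF per input PDF can pass the flag on every invocation. In the sketch below, only “-mt” comes from the answer above. The executable name `PdfCompressor.exe` and the input/output argument order are hypothetical placeholders, so check the product's command-line documentation for the real syntax.

```python
import tempfile
from pathlib import Path

# Hypothetical executable name and argument order; only "-mt" is from the answer.
COMPRESSOR = "PdfCompressor.exe"


def build_commands(input_dir, output_dir):
    """Build one command per PDF in input_dir, each writing its own output."""
    commands = []
    for src in sorted(Path(input_dir).glob("*.pdf")):
        dst = Path(output_dir) / src.name  # one output PDF per input PDF
        commands.append([COMPRESSOR, "-mt", str(src), str(dst)])
    return commands


# Demo on a temporary folder holding two PDFs and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.pdf", "a.pdf", "notes.txt"):
        Path(tmp, name).touch()
    cmds = build_commands(tmp, Path(tmp) / "out")
    inputs = [Path(c[2]).name for c in cmds]

print(inputs)
```

Each command list could then be passed to `subprocess.run`. With “-mt” set, each run can use the compressor's own multi-threading even when files are processed one after another.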

Category: All, CVISION PdfCompressor, Compress File, Convert PDF, Create PDF, OCR, PDF Compression, compress TIFF

PdfCompressor Professional vs. Desktop

September 10th, 2008 by Chris

Question: What are the main differences between PdfCompressor Professional, and PdfCompressor Desktop?

Answer: The following features are only available with PdfCompressor Professional:

• The Professional Edition supports batch compression
• The Professional Edition includes “watched folder” capabilities
• The Professional Edition can compress files longer than 100 pages
• The Professional Edition processes directories of files with a single click
• The Professional Edition includes a Command Line Interface
• The Professional Edition supports use through an API
• Technical support is available with the Professional Edition

If you are interested in testing either version, click here:

http://www.cvisiontech.com/index.php?option=com_docman&task=cat_view&gid=45&&Itemid=206

Category: All

Battery License & Watch Folder

August 26th, 2008 by Chris

Question: We had to install a new hard drive in our machine that hosts PdfCompressor. When we reinstalled the software I don’t think it gave us the correct number of pages. Our license indicates we should be able to compress 75,000 pages and I’m sure we’re not even close to that, but the battery is down to approximately 15%. Do I need a new key?

Also, I’m having trouble saving settings in the watched folder. I changed the settings (unchecked auto-segmentation and changed both color and grayscale compression to JPEG 2000, High quality), saved them as the default, and then restarted the service. Even though the settings appeared to save, the output image looked worse. When I used the same settings in batch compression to check the quality, the output came out fine. Can you help me?

Answer: To verify the battery power, run the “CVista Monitor” from the Tools drop-down menu. The CVista Monitor tells you how many pages you received for the month and how many you have used this month.

To apply new settings to an existing watched folder, change the settings to your preferred values and then click the “Run Job” button to apply the changes to your folders.
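With the two figures from the CVista Monitor, you can check whether the battery level matches your actual usage. The sketch below assumes the battery shows the remaining share of the pages received. That is an interpretation of the battery, not documented behavior. The 75,000 / 63,750 figures are hypothetical, chosen to match the 15% in the question.

```python
def battery_percent(pages_received, pages_used):
    """Remaining share of the page allotment, in percent.

    Assumes (not documented here) that the battery reflects
    pages remaining out of pages received.
    """
    if pages_received <= 0:
        raise ValueError("pages_received must be positive")
    remaining = max(pages_received - pages_used, 0)
    return 100 * remaining / pages_received


# Hypothetical month: 75,000 pages received, 63,750 used.
print(battery_percent(75_000, 63_750))
```

If the monitor reports far fewer pages used than this kind of calculation implies, the license may not have been applied correctly after reinstalling.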

Category: All, License

Convert PDF

June 6th, 2008 by Chris

Question: From an IT perspective, what are the pros and cons of converting all our documents of record into PDF format?

Answer: The process of maintaining files over time, otherwise known as archiving, is complex. Many factors argue for having a uniform format for long-term document storage, and yet some factors militate against it. Nevertheless, converting to PDF is likely a worthwhile initiative.

One reason to convert all documents of record to one format is that it’s easier for the IT group to maintain. In the long term, there is only one format to support with respect to both the viewer and the operating system. Both electronic and image formats are easy to convert into PDF, since the PDF specification directly supports both text-based content and raster images. PDF also supports metadata, web optimization (linearization), headers/footers, and security settings (view/print permissions).

These are among the advantages of standardizing on PDF and converting all documents of record to it. The main reason not to standardize/convert is that any document conversion carries some risk of modifying the original source document. Although the risk is very small per document, taken over millions of documents it is non-negligible. Of course, it is also very hard to modify documents once they are no longer in their native format. However, this is probably a plus for long-term archiving of documents of record, which should no longer be changing.
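The “small per document, non-negligible over millions” point can be made concrete. If each conversion independently alters a document with probability p, then the chance that at least one of n documents is altered is 1 - (1 - p)^n. The rate of one in a million used below is a made-up illustration, not a measured figure for any converter, and real errors may not be independent.

```python
import math


def prob_any_altered(p_per_doc, n_docs):
    """Probability that at least one of n_docs documents is altered,
    assuming each is altered independently with probability p_per_doc."""
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_docs * math.log1p(-p_per_doc))


p = 1e-6  # hypothetical per-document risk: one in a million
for n in (1_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} documents: P(at least one altered) = "
          f"{prob_any_altered(p, n):.4f}, expected altered = {p * n:g}")
```

At a million documents, the chance of at least one altered document is already about 63% (1 - 1/e). At ten million, it is nearly certain.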

PDF/A, standardized as ISO 19005, is a recently introduced subset of Adobe PDF specifically designed for long-term archiving. JavaScript, non-embedded fonts, and encryption are all disallowed by the PDF/A specification. Widespread adoption of PDF/A within industry appears likely.

If you have any further questions concerning converting to PDF, please email support@cvisiontech.com

If you would like to download a free trial of our software, click the link below:

http://www.cvisiontech.com/download_main.html

Category: All, Convert PDF, PDF Conversion

OCR & JBIG2

June 5th, 2008 by Chris

There is a close relationship between OCR and JBIG2, the new ITU standard for bitonal image compression. In particular, an important aspect of JBIG2 is font learning. The earlier CCITT Group 4 (G4) TIFF image specification had no notion of fonts or font learning. In JBIG2, by contrast, font learning is a very important part of the compression specification. It is one of the main reasons JBIG2 achieves compression ratios as high as 10:1 relative to TIFF G4.
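The compression benefit of font learning comes from storing each distinct symbol once and describing the page as references to that symbol dictionary. The toy sketch below illustrates the idea with exact bitmap matching. Real JBIG2 encoders match similar, not identical, bitmaps and also encode glyph positions, both of which are omitted here. Bitmaps are written as tuples of row strings purely for readability.

```python
def build_symbol_dictionary(components):
    """Toy JBIG2-style symbol coding with exact matching.

    components: glyph bitmaps in page order, each a tuple of row strings.
    Returns (dictionary, indices): each distinct bitmap is stored once,
    and the page becomes a list of indices into the dictionary.
    """
    dictionary = []
    index_of = {}
    indices = []
    for bitmap in components:
        if bitmap not in index_of:
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        indices.append(index_of[bitmap])
    return dictionary, indices


E = ("###", "#..", "##.", "#..", "###")
T = ("###", ".#.", ".#.", ".#.", ".#.")
page = [T, E, E, T]  # glyphs spelling "TEET"
dictionary, indices = build_symbol_dictionary(page)
print(len(dictionary), indices)
```

Four glyphs collapse to two stored bitmaps. On a real text page, a few dozen symbols may cover thousands of glyphs.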

Of course, font learning is important for OCR performance as well. When a font is “learned”, it imposes constraints on all the connected components that map to that font character. JBIG2 has three relevant kinds of model: font models, global models, and composite models. Each is useful not only for compression but also for effective OCR. Font models, assuming a perfect font matcher, impose intra-page constraints between nodes (connected components), but they impose no constraints between nodes on different pages. Global models impose inter-page constraints on nodes linked to the same global font model. Composites impose n-gram constraints between groups of n consecutive nodes.

Most OCR engines recognize one page at a time, so there is no constraint satisfaction across different pages of the same document. JBIG2 compression lets a system see multiple inter-page constraints at the same time. Through model-based propagation, this can speed up the OCR process considerably.
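One simple reading of model-based propagation is to recognize each global model once and copy the result to every component linked to it, on any page. The sketch below follows that reading:

- The `recognize` argument stands in for a real OCR engine.
- Using the first component as each model's representative is an assumption. A real system might pick the cleanest instance or combine evidence from several.
- Components are given as hypothetical `(page, model_id, bitmap)` triples.

```python
def ocr_with_model_propagation(components, recognize):
    """components: (page, model_id, bitmap) triples for a whole document.
    recognize: OCR function mapping a bitmap to a character.

    Each global model is recognized once, from its first component, and
    the result is propagated to every component with the same model_id.
    """
    label_of_model = {}
    labels = []
    for _page, model_id, bitmap in components:
        if model_id not in label_of_model:
            label_of_model[model_id] = recognize(bitmap)
        labels.append(label_of_model[model_id])
    return labels


calls = []


def fake_recognize(bitmap):
    """Stand-in OCR engine that records how often it is called."""
    calls.append(bitmap)
    return {"E": "e", "T": "t"}[bitmap]


doc = [(1, 0, "T"), (1, 1, "E"), (2, 1, "E"), (2, 0, "T"), (3, 1, "E")]
labels = ocr_with_model_propagation(doc, fake_recognize)
print(labels, len(calls))
```

Five components across three pages need only two recognizer calls. That saving grows with document length.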

If you are interested in learning more about PdfCompressor with OCR and testing our free 30-day trial, click:
http://www.cvisiontech.com/pdf_compressor_31.html

Category: All, JBIG2 Compression, OCR

OCR Verification and Confidence

June 4th, 2008 by Chris

Question: Can I OCR my files and guarantee that each document has been OCRed correctly with a given confidence, e.g., 99.5%?

Answer: Yes, sort of. OCR verification is really a semi-automated process. What can be expected from the OCR system is to (i) correctly determine the reliability of each OCR ASCII assignment, and (ii) flag for human intervention all words in the document below the pre-assigned confidence level.

Getting an accurate OCR confidence measure is non-trivial. Most OCR packages assign a confidence to each word, but that measure is often unreliable. It is therefore important to run your files on a system with a reasonably reliable confidence measure. Such measures often consider questions such as “is this word returned by the OCR engine in the language dictionary?” and “does this word have a reasonable intra-document frequency?”. Many other indicators can also help in obtaining an accurate confidence measure for each word.
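The two indicators above can be combined with the engine's own score and used to flag words for review. The sketch below is illustrative only. The multiplicative dictionary penalty, the frequency bonus, their weights, and the 0.9 review threshold are all assumptions, not tuned values from any product.

```python
from collections import Counter


def word_confidences(words, engine_conf, dictionary):
    """Adjust per-word engine confidence with two document-level cues.

    words: OCR output words in reading order.
    engine_conf: the engine's confidence for each word (0..1).
    dictionary: set of valid lowercase words for the document language.
    The weights below are illustrative assumptions.
    """
    freq = Counter(w.lower() for w in words)
    scores = []
    for word, conf in zip(words, engine_conf):
        score = conf
        if word.lower() not in dictionary:
            score *= 0.7  # not a dictionary word: trust it less
        if freq[word.lower()] >= 2:
            score = min(1.0, score + 0.05)  # recurs in this document
        scores.append(score)
    return scores


def flag_for_review(words, scores, threshold):
    """Words whose adjusted confidence falls below the threshold."""
    return [w for w, s in zip(words, scores) if s < threshold]


words = ["total", "amount", "arnount", "total"]
engine = [0.97, 0.95, 0.96, 0.90]  # engine is confident in a misread word
scores = word_confidences(words, engine, {"total", "amount"})
flagged = flag_for_review(words, scores, threshold=0.9)
print(flagged)
```

Here the engine alone rates the misread “arnount” at 0.96. The dictionary check is what pushes it below the review threshold.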

At some point, any such OCR verification system needs to be semi-automated, with a human in the loop. Say that a document requires a recognition rate of 99.5%. This rate is measured with respect to human recognition, not machine recognition. For example, suppose a paragraph in the document is completely unreadable to any human, e.g., a third-generation scan of very small text. The words in this paragraph should not be counted as unrecognized. The text is beyond readability, and in an information-theoretic sense the information is already lost, through no fault of the OCR system. On the other hand, suppose a paragraph uses a small font and is very difficult to read, but is still clearly human readable. If the OCR engine does not pick it up, then those words must be counted as unrecognized.
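The counting rule above changes the arithmetic noticeably. The sketch below uses hypothetical page counts: 1,000 words, 200 of them in an illegible paragraph, and 4 readable words misrecognized. It compares the rate measured against human-readable text with a naive rate that counts illegible words as OCR failures.

```python
def recognition_rate(total_words, unreadable_words, misread_words):
    """Share of human-readable words that the OCR recognized correctly.

    Words no human can read are removed from the denominator instead of
    being counted as OCR failures; readable words the engine missed or
    misread count against it.
    """
    readable = total_words - unreadable_words
    if readable <= 0:
        return 1.0  # nothing readable, so nothing the OCR could have missed
    return (readable - misread_words) / readable


# Hypothetical page: 1,000 words, 200 illegible, 4 readable words misread.
rate = recognition_rate(1000, 200, 4)
naive = (1000 - 200 - 4) / 1000  # wrongly counts illegible words as failures
print(f"rate against readable text: {rate:.3f}; naive rate: {naive:.3f}")
```

Measured against what a human can read, this page meets a 99.5% target. The naive count would report under 80%.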

To guarantee a certain minimum OCR accuracy, then, all document pages below the minimum OCR recognition threshold must be shown to a human, who determines whether their words are human readable. If so, the human can manually correct any words with incorrect text assignments. The task of any OCR verification system is to semi-automate the recognition process so that, with minimal human intervention, a certain minimum OCR confidence level can be established for a collection of documents.
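Before any human review, the system can only estimate each page's recognition rate from its word confidences. The sketch below routes pages to a review queue on that basis. It treats “share of words at or above a word-level confidence threshold” as the page estimate, which is an assumption about how a page-level threshold might be computed. The threshold values are also illustrative.

```python
def pages_for_review(pages, word_threshold, page_target):
    """pages: {page_number: [adjusted confidence per word]}.

    A page is queued for a human if the share of its words scoring at or
    above word_threshold falls below page_target.
    """
    queue = []
    for number, scores in sorted(pages.items()):
        if not scores:
            continue  # blank page: nothing to verify
        confident = sum(1 for s in scores if s >= word_threshold)
        if confident / len(scores) < page_target:
            queue.append(number)
    return queue


pages = {
    1: [0.99] * 200,               # every word confident
    2: [0.99] * 195 + [0.5] * 5,   # 97.5% confident words
    3: [],                         # blank page
}
queue = pages_for_review(pages, word_threshold=0.9, page_target=0.995)
print(queue)
```

Only page 2 reaches a reviewer. The reviewer then decides which of its flagged words are readable and corrects them.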

Category: All, OCR, OCR Accuracy, OCR Verification and Confidence