CVISION Technologies

Document Imaging, Information, and Tech Support

Archive for February, 2007

JBIG2 Compression and Document OCR

February 26th, 2007 by Chris

Question: Are JBIG2 file compression and document OCR completely separate problems? Should these processes be done together or separately? Does JBIG2 conversion prior to OCR lower the recognition rates?

Answer: JBIG2 and OCR are related problems. A basic element of JBIG2 compression is bottom-up font learning. This font learning is used for compression, but it can just as easily be used to cross-check the font mappings returned by the OCR engine. An effective JBIG2 compression algorithm can therefore be used to improve OCR recognition rates; see http://www.cvisiontech.com/pdf_compressor_31.html.

For example, the global models used in JBIG2, which are an effective compression tool, can also be used to propagate correct OCR mappings throughout the document.

In general, these processes should not be tightly coupled: each process, JBIG2 and OCR, needs to be able to run without the other. One important reason for process separability is speed: JBIG2 tends to run at 3-5 pages per second, while OCR can take 5 seconds per page. Another reason is that achieving good OCR rates requires the right language dictionary. JBIG2 compression is language independent and should not rely on any language dependencies.

So the JBIG2 and OCR problems are certainly closely interrelated. Having said that, there are many reasons (including speed) to solve them separately and then combine the results. Certainly, there should be an integration phase in which a higher-level module is aware of both the JBIG2 and OCR results and is able to combine them to achieve improved OCR (and maybe also JBIG2) results.

There are problems inherent in propagating OCR results across a document. If there are any OCR errors, these can also propagate across the document, with negative consequences. It is therefore important in any such fusion of OCR and JBIG2 results to make sure this kind of error propagation does not occur.
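One way to guard against this kind of error propagation is to require consensus before propagating anything. The following is a minimal Python sketch and not CVISION's implementation. It assumes each glyph has already been assigned a cluster ID by JBIG2-style symbol matching and a label by an OCR engine; the function name, the (cluster_id, label) input format, and both thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def propagate_labels(glyphs, min_votes=3, min_agreement=0.8):
    """Propagate OCR labels within symbol clusters, but only on strong consensus.

    glyphs: list of (cluster_id, ocr_label) pairs, one per glyph instance.
    Returns a list of labels in the same order as the input.
    """
    # Count how often each OCR label was assigned within each cluster.
    votes = defaultdict(Counter)
    for cluster_id, label in glyphs:
        votes[cluster_id][label] += 1

    # Accept a cluster-wide label only if the cluster has enough samples
    # and the most frequent label clearly dominates.
    consensus = {}
    for cluster_id, counter in votes.items():
        label, top_count = counter.most_common(1)[0]
        total = sum(counter.values())
        if total >= min_votes and top_count / total >= min_agreement:
            consensus[cluster_id] = label

    # Glyphs in clusters without consensus keep their original OCR label,
    # so a doubtful mapping is never spread across the document.
    return [consensus.get(cluster_id, label) for cluster_id, label in glyphs]
```

With these settings, a cluster whose glyphs were read as "e" nine times and "c" once would be corrected to "e" throughout, while a two-glyph cluster with conflicting labels would be left untouched.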

Reliable JBIG2 compression, done with precision, should not result in any degradation of the document. As such, JBIG2 conversion prior to OCR should not lower recognition rates. There are, however, JBIG2 compression implementations that are clearly lossy and degrading in nature. If one of these degrading JBIG2 methods is run prior to OCR, a drop in recognition rates can usually be expected.

Category: All, Document Compression, JBIG2 Compression, JBIG2 and PDF, OCR, OCR Download, OCR Software, Optical Character Recognition

PDF Document Conversion

February 22nd, 2007 by Chris

Question: Is it safe to convert all our database documents to one format for long-term archiving? Preferably, we’d like to convert to PDF.

Answer: The safest way to convert documents is manually, with a human in the loop verifying each file conversion. With any file conversion, there is always a risk of losing data during the conversion process. There are typically two ways to do file conversion, and each has its advantages and drawbacks. We’ll review each method.

1. Archiving to Captured Documents: In the good old days (i.e., in the last 50 years or so) companies would either retain the original documents or image these documents onto microfiche / microfilm. This imaging was considered a reliable method for long-term document preservation since the image would “look” like the original document. These days, in converting physical documents into electronic form, there is an analogous way to “image” these documents. Essentially, a program can act like (simulate) an application-based printer driver and, instead of printing out each document page, turn each document page into an image. The resulting document, in image format, should look exactly like the original.

Converting documents into image formats, which include TIFF, JPEG, and image PDF, has the advantage of reliability: the result looks exactly like the original document. The drawbacks, however, include the fact that an image PDF file converted from an original Excel spreadsheet could be much larger than the original. Of course, compression can help in this regard by bringing the image document file size back toward its original pre-image electronic size. See, for example, http://www.cvisiontech.com/pdf_compressor_31.html.

If one is converting corporate documents into image format for records management or archival purposes, then PDF has the advantages of compression, web-optimization, and hidden-text searchability, none of which is natively supported within the TIFF format. There is also a return on investment (ROI) in using PDF: the 5x-10x compression results in reduced bandwidth and storage requirements.

A disadvantage to imaging documents for long-term storage is that certain functionality the document might have had gets lost during conversion. For example, if a document was of “form” type so that certain fields could be filled out, these form actions will get lost. Also, hyperlinks will not be preserved if image conversion is used.

2. Archiving to Electronic Files: Most files these days start out in electronic form. So another way to convert documents for long term storage is to convert directly into electronic format without actually imaging the document. This direct conversion from one electronic format to another has both advantages and disadvantages.

The advantages of direct electronic conversion are that file size remains small, documents remain searchable, and functionality of the document (e.g., hyperlinks) is preserved. The main disadvantage is that the converted document may look different from the original. This is a serious problem, since the most important aspect of converting a document is that it appear exactly like the original. In an electronic file conversion, important aspects of the original document may get “lost in translation”.

Wherever possible, a manual or automatic (programmatic) validation of the conversion process is highly recommended (e.g., CVISION ICert).
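As an illustration of what a programmatic check might look like, here is a minimal Python sketch. It is not how CVISION ICert works; it is a simple text-level comparison under stated assumptions: the text of each page has already been extracted from both the original and the converted file by some other tool, and the function names and the 0.98 threshold are hypothetical. It checks content only, not visual appearance.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse whitespace and case so layout-only differences are ignored."""
    return " ".join(text.split()).lower()


def conversion_looks_safe(original_pages, converted_pages, min_similarity=0.98):
    """Return True if the converted document's text closely matches the original.

    original_pages, converted_pages: lists of strings, one string per page.
    """
    # A changed page count is an immediate red flag.
    if len(original_pages) != len(converted_pages):
        return False

    # Compare the text page by page; any page below the threshold fails.
    for before, after in zip(original_pages, converted_pages):
        ratio = SequenceMatcher(None, normalize(before), normalize(after)).ratio()
        if ratio < min_similarity:
            return False
    return True
```

Files that fail a check like this can then be routed to a person for manual review, which keeps the human in the loop only where it is needed.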

Category: All, Convert PDF, PDF Conversion, PDF Document Conversion

Reduce PDF

February 20th, 2007 by Chris

Question: My business often deals with large PDF files. These PDF files decrease the performance of our computers and our document workflow. Can reducing these PDF files help?

Answer: Our PDF conversion software converts PDF, TIFF, and JPEG files into compressed, web-optimized, and text-searchable PDF files. In converting with CVISION’s PdfCompressor, file sizes decrease by an order of magnitude. The benefits of PDF conversion and compression include greatly reduced storage requirements, which can improve the performance of your computers and your document workflow.

CVISION’s software provides leading edge PDF compression, empowering companies to save time and money. CVISION is the first company to enable color image compression at a 100:1 ratio with no loss in quality. The outstanding PDF writing capabilities in PdfCompressor guarantee that all files produced are compatible with Adobe Reader 5.0 and higher while retaining important document metadata, such as bookmarks and hyperlinks.

For a free 30-day trial, click the link below.

http://www.cvisiontech.com/download_main.html

Category: All, CVISION PdfCompressor, Compress File, Document Compression, File Compression, JBIG2 Compression, PDF Compression, Tiff Compression

Using MFPs, MFDs, Digital Copiers in your Document Workflow

February 13th, 2007 by Chris

In the current office workflow, as companies move toward the paperless office, effective use of MFP, MFD, and digital copier devices is essential. For documents starting out in paper form, there needs to be a seamless way to introduce them into the company’s electronic workflow. Any MFP, MFD, or digital copier device is capable of converting a paper document to a scanned, electronic one. What is missing, though, are seamless methods for making the file text-searchable, compressing it to support emailing, web-optimizing it to support web hosting, and dropping files into a client folder or attaching and indexing them into a database.

In general, these devices allow users to copy a paper document to paper, email, or a folder. Folders are generally the easiest way to seamlessly intercept and enhance the document workflow. For example, assume a given MFP device is capable of generating PDF files from paper documents and dropping them into an output directory, based on client code. However, these captured PDF files are typically large and are not OCRed, web-optimized, or compressed. To use these files effectively, some processing off the MFP device is required.

This is where effective PDF document management software can make a real difference. Operating on a bank of hidden client directories, seamlessly to the user, this PDF software (e.g., CVISION PdfCompressor, run in watched folder mode) can OCR, compress, and web-optimize the MFP or digital copier output file before the client is even back at their desk. The file is still in the same MFP folder, but it is now much more amenable to email, web hosting, or attaching to the company database.
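The watched-folder idea itself is simple to sketch. The Python below is a generic polling loop, not PdfCompressor's watched folder mode: the function names, the polling interval, and the `process` callback (standing in for whatever OCR and compression step is used) are all hypothetical. A production version would also need to wait until the MFP has finished writing a file before touching it.

```python
import time
from pathlib import Path


def find_new_pdfs(watch_dir, seen):
    """Return PDF files under watch_dir that have not been seen before.

    seen: a set of Path objects, updated in place.
    """
    new_files = []
    # rglob walks client subfolders too, e.g. one folder per client code.
    for path in sorted(Path(watch_dir).rglob("*.pdf")):
        if path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(watch_dir, process, poll_seconds=5.0, max_polls=None):
    """Poll watch_dir and hand each newly arrived PDF to process(path).

    process is a placeholder for the real OCR/compression step.
    max_polls=None means run until interrupted.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_pdfs(watch_dir, seen):
            process(path)
        polls += 1
        time.sleep(poll_seconds)
```

If the processing step overwrites the file in place, the path stays in `seen`, so the same scan is not processed twice.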

Every company has an ever-increasing number of MFPs, MFDs, and/or digital copiers to manage. It is up to the company’s IT staff to make sure these devices are utilized as effectively as possible.

Category: All, Digital Copiers in your Document Workflow, MFDs, MFPs, MFPs MFDs Digital Copiers in your Document Workflow

PDF Compression

February 12th, 2007 by Chris

Question: I need information on PDF compression.

Answer: See link attached:

http://www.cvisiontech.com/document-automation/compression/batch-pdf-compression.html

Category: Document Compression

OCR Languages for PdfCompressor Evaluation

February 8th, 2007 by Chris

Question: Could you please let us know whether the 30-day evaluation pack supports OCR for other languages? We were not able to access any language except English in the OCR menu.

Answer: We have an OCR Language Pack which provides better OCR recognition for 60+ languages. You can download this by visiting the following page on our website:

http://www.cvisiontech.com/download_main.html

Click on the last link labeled “OCR Language Pack for CVista PdfCompressor 3.1.”

Once this is installed, you can choose your OCR language on the OCR Options panel, which is available when you start PdfCompressor in Full Options mode.

Category: All, OCR, OCR Download, OCR Languages, OCR Software, Optical Character Recognition

PDF OCR Software

February 7th, 2007 by Chris

Question: I am looking for PDF OCR software. Do you have software that will take my PDF files, OCR them, and convert them to searchable PDFs?

Answer: Yes, PdfCompressor with OCR processes image files, including PDFs, and converts them to OCR’d (searchable) PDF files. If you are interested in a free download, I have pasted the link below.

http://www.cvisiontech.com/download_main.html

Category: Batch PDF OCR, PDF OCR, PDF OCR Tutorial, PDF Optimize

Download Free OCR

February 7th, 2007 by Chris

Question: Please provide the link, so I may download your free OCR trial.

Answer: We offer a free 30-day trial download of PdfCompressor with OCR. I have attached the link below.

http://www.cvisiontech.com/download_main.html

Category: All, Batch PDF OCR, Color OCR, OCR, OCR Download

Create Adobe PDF

February 6th, 2007 by Chris

Question: What is the advantage of converting files from other formats, such as TIFF or JPEG, to compressed Adobe PDFs?

Answer: With PdfCompressor, you can produce PDF files as much as 100x smaller than the equivalent compressed TIFF or JPEG images. With a PDF compressor as effective as CVISION’s, a compressed document yields significant savings in digital storage costs, bandwidth costs, and transmission times.

To try our software for free, go to http://www.cvisiontech.com/download_main.html

Category: Adobe PDF Conversion, All

TIFF Compression

February 5th, 2007 by Chris

Question: Is it possible to compress TIFF files?

Answer: It is certainly possible to compress TIFF files. The question is: into what format? Black-and-white (bitonal) TIFFs are normally stored in G4 format. Some TIFFs, however, are originally saved as raw (uncompressed) TIFF, with run-length encoding, or in G3 format. These variations of TIFF are usually not optimal for bitonal scans, which would benefit from conversion to TIFF G4.
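To find out which variation a given file uses, one can read the Compression tag (tag 259) from the file's first image directory. The Python sketch below does this with the standard library only. It is a minimal illustration that assumes a classic (non-BigTIFF) file and inspects only the first page; the function name is hypothetical, and the code table lists only the common values defined by the TIFF specification and its technical notes.

```python
import struct

# Common values of the TIFF Compression tag (tag 259).
COMPRESSION_NAMES = {
    1: "uncompressed (raw)",
    2: "CCITT modified Huffman run-length encoding",
    3: "CCITT Group 3 (G3)",
    4: "CCITT Group 4 (G4)",
    5: "LZW",
    6: "old-style JPEG (OJPEG)",
    7: "JPEG",
    32773: "PackBits run-length encoding",
}


def tiff_compression(data):
    """Return the Compression tag value of the first page of a TIFF.

    data: the raw bytes of the file (the header and first IFD are enough).
    """
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    # The first two bytes give the byte order used by the rest of the file.
    if data[:2] == b"II":
        endian = "<"   # little-endian
    elif data[:2] == b"MM":
        endian = ">"   # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    # The first image file directory: an entry count, then 12-byte entries.
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i
        tag, _field_type, _count = struct.unpack(endian + "HHI", data[start:start + 8])
        if tag == 259:
            # A single SHORT value sits in the first two bytes of the value field.
            (value,) = struct.unpack(endian + "H", data[start + 8:start + 10])
            return value
    return 1  # the tag is optional; the specification's default is uncompressed
```

For example, `COMPRESSION_NAMES.get(tiff_compression(open("scan.tif", "rb").read()), "other")` would describe a file named scan.tif (a placeholder name). Any bitonal scan that does not report G4 is a candidate for conversion.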

Assuming a bitonal scan is already in TIFF G4 format, there is still considerable advantage in further reducing file size by converting to JBIG2-encoded PDF. JBIG2-based PDF can often shrink the file to as little as one tenth the size of the TIFF G4 file. Of course, this requires conversion of the database from TIFF to PDF format. In general, there are many advantages to using PDF format as opposed to TIFF. These PDF advantages include security features, web-optimization, metadata, and hidden-text OCR. The new PDF/A variation on PDF makes it highly desirable for long-term document archiving.

There are variations on TIFF that are challenging to read (or parse). There are also variations on spelling TIFF (aka TIF). Particularly challenging is correctly reading OJPEG TIFFs. We have found no viewers that are capable of reading all variations of TIFF. Thus, reading and/or displaying a TIFF file is never a sure thing. This makes PDF a more reliable, safer long-term bet than TIFF for document databases.

Category: All, Compress File, Document Compression, PDF Compression, Tiff, Tiff Compression