September 23rd, 2008 by Chris
Question: I have been testing PdfCompressor with OCR. I OCR’d raw TIFF files and the results were good on the super accurate setting (but not on the balanced setting). My colleague claims that you obtain better results when you OCR scanned documents at lower resolutions; I have never heard of this. What can you tell me?
Answer: The claim about lower DPI is not correct. The ideal resolution for good OCR accuracy is 300 DPI. Also, if the input file is a PDF, try running the file in raster mode to see if you get better results.
If you would like to retest PdfCompressor, the link for the free trial is below:
www.cvisiontech.com/download_main.html
Category: All, OCR, OCR Accuracy, OCR Download, OCR PDF, OCR Software, OCR Verification and Confidence, OCR with Application to the Digital Mailroom, Optical Character Recognition |
June 2nd, 2008 by Chris
Document capture, or scanning documents, is the first step in the OCR process. Common capture devices include scanners, digital copiers, MFPs, and fax machines. Technically, the capture process is usually a conversion of photonic flux to electronic flux.
The method by which a document is captured affects the subsequent usefulness of the document. Consider a faxed document. Although usually human readable, these documents are often not very machine readable. This is usually directly related to the fax capture process. Because fax machines typically communicate over phone lines, fax scanning resolutions are set low to keep the transmitted file as small as possible. For example, normal fax mode is 203×98 dpi, which means that the vertical sampling rate is less than 100 dpi. This low scan resolution produces a smaller CCITT file to encode and transmit. The fax-scanned file also transfers faster and is still human readable on the receiving fax end. However, since the file was captured under less than ideal scanning conditions, at very low resolution, machine readability of the text suffers: OCR recognition rates are likely to be low.
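To make the resolution gap concrete, here is a rough back-of-the-envelope sketch in Python. It compares normal fax mode with a 300 dpi scan by counting samples per square inch and the approximate pixel height of a line of text. The 10-point font size and the function names are illustrative assumptions, not anything taken from a product.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def samples_per_square_inch(horizontal_dpi, vertical_dpi):
    """Number of pixels captured for each square inch of paper."""
    return horizontal_dpi * vertical_dpi


def text_height_px(font_size_pt, vertical_dpi):
    """Approximate pixel height of text set at font_size_pt (assumed size)."""
    return font_size_pt / POINTS_PER_INCH * vertical_dpi


for label, h_dpi, v_dpi in [("normal fax", 203, 98), ("300 dpi scan", 300, 300)]:
    density = samples_per_square_inch(h_dpi, v_dpi)
    height = text_height_px(10, v_dpi)  # 10 pt body text is an assumption
    print(f"{label}: {density} samples/sq in, 10 pt text is about {height:.1f} px tall")
```

Under these assumptions a 300 dpi scan captures roughly four and a half times as many samples per square inch as normal fax mode, and gives the OCR engine about three times as many pixel rows per character.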
Category: All, OCR, OCR PDF, OCR Software, OCR with Application to the Digital Mailroom |
March 11th, 2008 by Chris
Question: Do you offer a free trial for your Optical Character Recognition Software? If so, can you forward me the link?
Answer: Yes, we offer a free 30-day trial of our optical character recognition software. Our OCR is available in 60+ languages. The download link is below:
http://www.cvisiontech.com/pdfpro40_download.html
Category: All, Batch PDF OCR, Color OCR, OCR, OCR Accuracy, OCR Download, OCR PDF, OCR Software, OCR Verification and Confidence, OCR with Application to the Digital Mailroom, Optical Character Recognition |
September 6th, 2007 by Chris
Question: Our office is taking steps towards the paperless office. We are utilizing the office scanner for most of our company documents. A colleague of mine referred me to CVISION for scanner software. What sort of solutions do you provide?
Answer: CVISION Technologies provides solutions to optimize your scanner. We create more manageable output for scanners and MFPs. PdfCompressor, our most common application, is designed for document imaging solutions within the corporate setting; it is used by many Fortune 500 companies and is applicable to industries across all vertical markets.
PdfCompressor compresses scanned documents. As you may already realize, the output files of your scanner are rather large. These oversized files are difficult to open, email, and manage. In addition to compression, PdfCompressor also applies OCR (optical character recognition) to files. OCR converts image documents into text-searchable files. Searchable files created by OCR are far more manageable, and users who work with them are more efficient. PdfCompressor also converts the scanner output into PDF documents. PDFs are readily viewable with Adobe’s free reader.
To download PdfCompressor for a free trial, click the link below:
http://www.cvisiontech.com/download_main.html
Category: All, MFDs, MFPs, MFPs MFDs Digital Copiers in your Document Workflow, OCR with Application to the Digital Mailroom |
July 16th, 2007 by Chris
Question: Does PdfCompressor have the ability to make files text searchable, even if the files are JPEG or TIFF? Also, what are the advantages of text-searchable documents?
Answer: PdfCompressor compresses files and applies OCR to make them text searchable. If you have TIFF or JPEG files, we can convert them into compressed, searchable PDFs. The OCR engine in PdfCompressor is designed with high-volume corporate needs in mind.
Through robust functionality, PdfCompressor provides configurations for speed, volume, and automation. CVISION automates the OCR process with Watch Folder capabilities; through Watch Folders, users can leave the process unattended as documents are processed. In Watch Folder mode, files are OCR’d by simply being dropped into a folder. To accommodate large-volume scanning, the Batch OCR feature within PdfCompressor enables scanned documents to be processed quickly; PdfCompressor OCR processing rates are about 1 page per second.
To try our OCR software for free, click the link below:
http://www.cvisiontech.com/pdfpro31_download.html
Category: All, Color OCR, OCR, OCR Accuracy, OCR Download, OCR Software, OCR Verification and Confidence, OCR with Application to the Digital Mailroom, Optical Character Recognition |
April 9th, 2007 by Chris
Question: Every office these days has an MFP (multifunction printer) device, or two. Maybe more. The relevant IT question is: how can the office get the most use from this MFP device? Once office paper has been converted into electronic documents, text searchability seems like an important function. The problem is twofold: (i) many office MFP devices offer no, or very limited, OCR (optical character recognition) capability, and (ii) running OCR directly on the MFP device, even if possible, will slow down the machine’s processing rate tremendously. Of course, slowing down the MFP increases the waiting time for anyone using the device. How then do you OCR from an MFP device without slowing down machine throughput?
Answer: The best solution to this problem is based on a separation of processes. Do not run the OCR directly on the MFP device, even if it has OCR support. The performance of an OCR system embedded in a typical MFP device tends to be mediocre, at best. In addition, trying to run the OCR process in real time, in sync with your MFP, will take up much of your MFP’s resources and hurt your processing speed.
Any heavy-duty CPU process, such as OCR, should be taken off the MFP device and performed elsewhere. A good OCR solution for MFPs consists of assigning to each user who needs OCR a passcode that, when in “scan to folder” mode, actually scans to a watched folder. That is, the MFP scans the file, drops it in a watched folder, and proceeds to the next document. Meanwhile, the watched folder for this user is being “watched” by another process on a separate machine.
This other process, such as our PdfCompressor, can perform all post-scan processing on the document, such as OCR, web optimization, compression, security, and metadata, and then deposit the document in the user’s actual output scanning folder. This solution keeps the MFP available and running at full capacity, while providing extremely functional PDF documents to the end user.
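The watched-folder pattern described above can be sketched in a few lines of Python. This is only an illustration of the separation of processes, not how PdfCompressor is implemented: `process_document` is a hypothetical placeholder for the OCR and compression step, and the folder layout and polling interval are assumptions.

```python
import shutil
import time
from pathlib import Path
from typing import List


def process_document(path: Path) -> None:
    """Hypothetical placeholder for post-scan work (OCR, compression, etc.)."""
    # A real deployment would call the OCR/compression tool here.


def process_pending(watched: Path, output: Path) -> List[Path]:
    """Process every file currently in the watched folder.

    Returns the paths of the files delivered to the output folder.
    """
    output.mkdir(parents=True, exist_ok=True)
    delivered = []
    for scan in sorted(watched.iterdir()):
        if not scan.is_file():
            continue
        # Caveat: a production watcher should first confirm that the MFP
        # has finished writing the file before picking it up.
        process_document(scan)
        destination = output / scan.name
        shutil.move(str(scan), str(destination))
        delivered.append(destination)
    return delivered


def watch(watched: Path, output: Path, poll_seconds: float = 5.0) -> None:
    """Poll the watched folder forever, on a machine separate from the MFP."""
    while True:
        process_pending(watched, output)
        time.sleep(poll_seconds)
```

Because the MFP only writes files and never waits for the result, its scanning throughput is unaffected by however long the post-scan step takes on the other machine.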
Category: All, Batch PDF OCR, MFDs, MFP Devices, MFPs, MFPs MFDs Digital Copiers in your Document Workflow, OCR, OCR Software, OCR with Application to the Digital Mailroom, Optical Character Recognition |
March 23rd, 2007 by Chris
Question: I work in a hospital. We are planning to scan very old files into our computer. What we want to do is to get specific data from certain parts of the files, so that we can put this in our database. Is this possible?
Answer: Yes, this is possible. PdfCompressor has a function called zone OCR. Once you define the zone where you want the OCR to occur, the recognized data is written to a Rich Text Format (RTF) file. Then, you can put the contents of the RTF file into your database. To optimize zone OCR results with very old files, you can follow these steps:
1. Verify that the existing resolution (dpi) is correct. OCR engines are calibrated based on the dpi that is typically given in the image file header. If this value is incorrect, the OCR results will degrade.
2. Assuming the dpi has now been set correctly, upsample to a reasonably high dpi. Typically, 300 dpi is a good number. The upsampling method does matter: use bicubic spline interpolation.
3. OCR engines usually perform better on bitonal documents that are thresholded correctly than on the original color files. Of course, if the threshold is poorly chosen, the OCR engine is better off with the original color or grayscale image file. So if possible, threshold each upsampled image file manually so that the text is most readable.
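As a rough illustration of steps 2 and 3, the Python sketch below computes the pixel dimensions needed to reach 300 dpi and picks a starting threshold automatically. Note the substitution: the text recommends thresholding manually, whereas this sketch uses Otsu's method (a standard automatic technique) to suggest a starting value that you would then review by eye. The function names are assumptions, the bicubic resampling itself is left to an imaging tool, and the pixel data is a flat list of 8-bit grayscale values.

```python
def upsample_size(width_px, height_px, current_dpi, target_dpi=300):
    """Return the pixel dimensions after resampling to target_dpi."""
    scale = target_dpi / current_dpi
    return round(width_px * scale), round(height_px * scale)


def otsu_threshold(pixels):
    """Suggest a gray level (0-255) separating dark text from light paper."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the dark class
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; the best threshold maximizes it.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(pixels, threshold):
    """Map values at or below the threshold to 0 (ink), the rest to 255 (paper)."""
    return [0 if p <= threshold else 255 for p in pixels]


# An 850 x 1100 pixel page scanned at 100 dpi needs 2550 x 3300 pixels at 300 dpi.
print(upsample_size(850, 1100, current_dpi=100))
```

If the suggested threshold washes out faint text on an aged page, adjust it by hand or fall back to the grayscale image, as step 3 advises.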
You can try out the PdfCompressor’s zone OCR below:
http://www.cvisiontech.com/pdf_compressor_31.html
Category: All, OCR, OCR Accuracy, OCR Download, OCR Languages, OCR PDF, OCR Software, OCR Verification and Confidence, OCR with Application to the Digital Mailroom, Optical Character Recognition |
January 12th, 2007 by Chris
Question: We want to start handling all corporate mail electronically. That is, scan in each mail piece and then OCR and index each item as it comes in. From then on, deal with the mail item as you would any other electronic file in our database. Will this work? Have we missed anything fundamental? Can we rely on the OCR engine to accurately capture the text of each mail item?
Answer: There is a trend today towards a digital mailroom in many companies. This trend is similar to the paperless office in that it is not always fully realizable, but is understood to be a desirable end game, namely, to reduce the paper flow as much as possible. In this paradigm, companies will typically scan paper mail as soon as it gets to the office. Many commercial products, such as OPEX, support the operation and workflow of the digital mailroom, in both automatically opening the mail and scanning it.
One of the issues with using a digital mailroom environment to handle all corporate mail is whether to scan to bitonal (black and white) or color. Invoices and other mail-based correspondence often carry considerable color information, unlike the typical bitonal business document. This leaves the option of scanning to color or black and white. Color scanning tends to make electronic file sizes very large unless color compression is applied (e.g., CVISION PdfCompressor http://www.cvisiontech.com/pdf_compressor_31.html ). On the other hand, with color scanning the relevant color information is retained. One can alternatively scan corporate mail to black and white and hope that nothing important, e.g., a handwritten note from a client, is thresholded out in the process.
Most digitized mail is printed matter and first order (i.e., directly printed, not a scan of some original). As such, OCR recognition rates on both bitonal and color mail scanning are pretty high. Things to avoid include low-resolution mail scanning (e.g., 150 dpi) and strong JPEG-based (i.e., DCT) compression prior to the OCR process. Reverse video text and background textured regions still present serious challenges to current OCR systems. It is well worth setting up the production workflow of a digital mailroom environment and trying the production system in test mode for a while before putting the system online.
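To give a feel for the size difference, the short Python sketch below estimates the uncompressed size of a single scanned page at different bit depths. The letter-size page and 300 dpi resolution are illustrative assumptions, and the figures ignore file headers and row padding.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed example: a letter-size page (8.5 x 11 inches) scanned at 300 dpi.
for label, depth in [("bitonal", 1), ("grayscale", 8), ("24-bit color", 24)]:
    size_mb = raw_scan_bytes(8.5, 11, 300, depth) / 1_000_000
    print(f"{label}: about {size_mb:.1f} MB uncompressed")
```

Before any compression, the color page is 24 times the size of the bitonal one, which is why color compression matters in a high-volume mailroom.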
Category: All, OCR, OCR with Application to the Digital Mailroom |