CVISION Technologies

Document Imaging, Information, and Tech Support

Archive for the 'PDF/A' Category

Microfiche to PDF

July 20th, 2009 by Marsha

Question:
I have a lot of documents stored in archives on microfiche, but as everything gets digitized, I really can’t work in that format anymore. Is there an easy way to convert everything to PDF? What’s the best way to store everything electronically?

Answer:
This question comes up a lot lately, as many organizations switch from hard-copy documents to electronic copies. Digital archives have several advantages: they are easier to access and they save storage space. To convert microfiche to PDF, you need to scan it into a digital format. However, I would recommend that you also take an additional step. If you are thinking about long-term storage, you might want to convert the PDF file to PDF/A, which is a special type of PDF used for archives. PDF/A protects you from information loss associated with changes in software, and ensures that you will always be able to open the original documents, even if new software versions no longer support the original fonts and formatting. CVISION’s DocArchiver provides an easy solution for converting from regular PDF to PDF/A.

Download a free trial version of DocArchiver below:

http://www.cvisiontech.com/download_main.html

Category: Convert PDF, PDF Conversion, PDF/A | No Comments »

PDF Archive Format

July 20th, 2009 by Marsha

Question:
Is there a special PDF archive format that is better than regular PDF format? How is it different?

Answer:
There is a type of PDF called PDF/A, which is used specifically for archiving purposes. You might wonder what makes a file format optimal for long-term storage. Since your goal is to preserve information over a long period of time, the most important thing is to ensure that your files are not lost as a result of changes in software that prevent the recognition of the original fonts and formatting. PDF/A ensures that your files are preserved forever by keeping all of the necessary formatting within the document itself, so you don’t have to rely on future software being compatible with the original. All you need to be able to open your files is a PDF/A reader. To convert your PDF documents into PDF/A, you can use DocArchiver. DocArchiver exists specifically for the purpose of creating an accessible, easy-to-use archive with all of your important PDF files.

Check out the free trial version at the link below:

http://www.cvisiontech.com/download_main.html

Category: Convert PDF, PDF/A | No Comments »

Archive PDF Files

July 9th, 2009 by Michael

Question:
Is it possible to archive PDF files? Our company is looking for a way to digitize our archives. Is archiving PDF files a secure way to store information in the long run?

Answer:
You’re on the right track with your idea to archive PDF files. Actually, there is a subset of the PDF format, called PDF/A, that is specifically designed for archival purposes. You can safely archive PDF files by converting them to PDF/A, or you can scan your paper documents and store them as PDF/A files. The main difference between regular PDF files and PDF/A files is that PDF/A files contain all information needed to open them, including font information, making them truly platform-independent. All that will be needed in the future to open PDF/A files is a PDF/A reader, regardless of how standard system fonts and other such data have changed. CVISION’s DocArchiver converts files to PDF/A, and ensures that they remain compliant with the PDF/A standard.
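As a rough illustration of how you might check which files in an archive already declare themselves as PDF/A, here is a minimal Python sketch. It relies on the PDF/A identification properties (`pdfaid:part` and `pdfaid:conformance`) that PDF/A files carry in their XMP metadata, and it assumes that metadata is stored uncompressed so a plain byte search can find it. The function name is made up for this example, and the check only reads the file's own claim; it is not a conformance validator and says nothing about how DocArchiver works.

```python
import re

# The PDF/A identification properties can appear in XMP in two forms:
#   attribute form: pdfaid:part="1"
#   element form:   <pdfaid:part>1</pdfaid:part>
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def claimed_pdfa_level(data: bytes):
    """Return a string such as 'PDF/A-1b' if the bytes declare one, else None.

    Hypothetical helper: it reports only what the file claims about itself.
    """
    part = _PART.search(data)
    if part is None:
        return None
    conf = _CONF.search(data)
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return "PDF/A-" + part.group(1).decode("ascii") + level
```

To use it on a real file, read the file in binary mode and pass the bytes in, for example `claimed_pdfa_level(open(path, "rb").read())`. A file that returns None may still be a perfectly good PDF; it just does not claim to be PDF/A.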

You can download a free trial version of DocArchiver by clicking the link below.

http://www.cvisiontech.com/download_main.html

Category: PDF/A | No Comments »

PDF File Archive

July 9th, 2009 by Marsha

Question:
I have a lot of PDF files that I want to archive for long-term use. What is the best way to do this?

Answer:
The best way to create a PDF file archive is to convert your documents into PDF/A format. PDF/A is a restricted version of PDF that ensures your documents will be viewable in the future, even if new software no longer supports the original fonts and formatting. PDF/A can always be opened as long as you have a PDF/A reader, making it the most secure, reliable archiving method. This storage method lets you avoid the risk of losing valuable information, and gives you convenient access to all of your documents. Additionally, it is more accessible and easier to use than microfiche. The conversion itself is very simple with the help of software such as CVISION’s DocArchiver, a server-based system that verifies documents against the PDF/A format and corrects them so that they comply.

To try DocArchiver, download the free trial version:

http://www.cvisiontech.com/download_main.html

Category: Adobe PDF Conversion, Convert PDF, PDF/A | No Comments »

What is PDF/A?

June 3rd, 2009 by Chris

Question:
I have a lot of digital documents that absolutely must be able to be opened in the future. I was at a business lunch the other day and someone mentioned PDF/A files for archiving, but I had no idea what he was talking about. What is PDF/A?

Answer:
PDF/A is a file format used for archiving digital documents. Unlike regular PDF (portable document format) files, PDF/A files have all the information needed to open them embedded in the file itself, meaning that the only thing needed to open a PDF/A file in the future is a PDF/A reader. This ensures that your documents will be able to be opened in the future. CVISION Technologies has software called PdfCompressor, which will, among other things, convert your digital documents to secure PDF/A files for archiving. I have included a link to our downloads page where you can download a free trial and experiment with PDF/A files. The best way to learn what PDF/A is is to try it, and if you like the software you can purchase a full version.

http://www.cvisiontech.com/download_main.html

Category: PDF Conversion, PDF Document Conversion, PDF/A | No Comments »

PDF OCR for NARA compliance

June 1st, 2008 by Chris

Retaining documents is, in a sense, easier than it has ever been. Many documents are already in electronic form, and paper documents can easily be processed through a scanner or MFP and converted to electronic form. For serious-minded IT managers, however, this is only where the problem begins.

There are issues of retention that must be resolved. Will these documents open and be readable 5, 10, 30 years from now? While microfiche is fast becoming an outdated technology, it is also tried and true: little can go wrong when archiving a company’s database to microfilm. The same cannot easily be said for electronic archiving.

Will the operating system five years from now, say Vista 2012, be able to read the PDFs being archived right now? Are these archived records readable across all machines in the company, even overseas? Do the electronic files satisfy NARA archival requirements? Are we converting to PDF/A?

So although document archiving and records management issues are in some sense simplifying and streamlining, as the corporate fileroom becomes a thing of the past, many aspects of archiving and RM are becoming more complex. What constitutes a legal record? Is a facsimile acceptable?

NARA has put out a set of guidelines that can be helpful to IT departments trying to sort all this out:

http://www.archives.gov/records-mgmt/initiatives/pdf-records.html

http://www.archives.gov/records-mgmt/initiatives/scanned-textual.html

OCR is generally an important component when working with scanned documents. Once paper has been converted to electronic documents for archiving, it is often important to have full-text search capability for these files. It is important to understand, particularly if archiving to PDF, what the various formats are for scanned documents and which are acceptable from a NARA archival perspective.

In particular, many OCR products by default convert OCR’ed documents to an electronic form such as text or Word. Any such conversion to a non-image format is not acceptable for archiving (at least from a NARA perspective), since guesses are made in the OCR process and there is a certain error rate as scanned characters are misinterpreted by the OCR system.

Also not acceptable for record retention purposes is PDF normal, which is a hybrid of both image and electronic PDF blended together. PDF normal also has problems with OCR mismatches and font substitutions. In addition, the combination of electronic and image characters can detract from document readability.

What is acceptable for compliance with NARA, PDF/A, and other archival specifications? The basic requirement for compliance is that captured paper documents remain in some image format, e.g., PDF image. They must look perceptually identical to the documents at the time of capture. To achieve this, a purely image PDF format needs to be selected (e.g., JPEG or JBIG2-based). This rules out conversion to electronic or normal PDF formats.

It is also OK, from a NARA perspective, to add a hidden text layer to an image PDF document. The hidden text layer does not change the appearance of the PDF in any way, but it does allow for full-text search and indexing of the source document.
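For readers curious about how a hidden text layer is usually expressed inside the PDF itself, here is a minimal Python sketch. It builds the page content-stream operators for one piece of OCR text using text rendering mode 3 (the `3 Tr` operator), which draws text with neither fill nor stroke, so it is searchable but invisible. The function names and the font resource name `F1` are assumptions for illustration; a real page would need that font defined in its resources, and this sketch handles plain ASCII text only.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(text: str, x: float, y: float,
                       size: float = 10, font: str = "F1") -> str:
    """Build content-stream operators that place `text` invisibly at (x, y).

    Hypothetical helper: `font` must name a font in the page's resources.
    """
    return "\n".join([
        "BT",                                # begin text object
        f"/{font} {size:g} Tf",              # select font resource and size
        "3 Tr",                              # rendering mode 3: no fill, no stroke
        f"{x:g} {y:g} Td",                   # move to the text position
        f"({escape_pdf_string(text)}) Tj",   # show the OCR'ed text
        "ET",                                # end text object
    ])
```

Calling `invisible_text_ops("Invoice", 72, 700)` returns six operator lines that would sit in the page content alongside the scanned image. The image is what the reader sees; the text is only there for search and indexing.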

Category: All, OCR, PDF OCR, PDF/A | No Comments »

Convert PDF to PDF/A

September 11th, 2007 by Chris

Question: Can your software convert PDF documents to PDF/A for archiving purposes? We would like to create an online archive of PDF/A documents.

Answer: In PdfCompressor 4.0, which will be released in October of 2007, we will support PDF/A for archiving.

If you are interested in testing our current version, click the link below.

http://www.cvisiontech.com/download_main.html

Category: All, PDF Conversion, PDF/A | No Comments »

Converting to PDF/A opposed to TIFF

June 5th, 2007 by Chris

Question: We are considering standards for an image document archive. Are there compelling reasons to consider PDF/A instead of TIFF?

Answer: There are really two parts to your question. Should we consider PDF as a document archive format? If so, should we take advantage of PDF/A, the new version of PDF that is specifically designed for long-term archiving?

Most companies are adapting, some rapidly and some less so, to the age of digital media. Whereas historically corporate archiving methods were either paper or microfilm/microfiche, today much of the archiving is done using electronic files. The traditional methods of paper and microfilm, although somewhat out of date in the computer age, have the advantage of guaranteed reproducibility. Initially, companies started to move in the direction of the “paperless office” by converting some of their paper to electronic TIFF format files. While TIFFs were not readily searchable, except by field coding, they were electronic media that could be stored on a computer and accessed on remote datasites. TIFF as a format has the advantage that it is not changing and, as a result, reproducibility is essentially guaranteed. TIFF is easily accepted within an imaging document workflow, but is not natively searchable and has no support for metadata, hyperlinks, annotations, or security.

In the last several years, there has been a shift in the document imaging community towards adopting PDF as a standard. The advantages include: (i) efficient full-text search, (ii) much better compression than TIFF and JPEG (bitonal up to 10x, color up to 100x), (iii) metadata support (author, keywords, etc.), (iv) web-optimization, (v) security, and (vi) portability across platforms and databases.

The problem with increased migration towards PDF as the electronic document archive format of choice is that PDF is an evolving standard which is very complex and can include MPEG videos, hyperlinks, and JavaScript. It becomes very difficult at some point to ensure what industry needs most: guaranteed reproducibility of the document. Efficient document indexing and transmission are important features of a digital archive, but most important is the certainty that the document can be reproduced on demand, as required, over the long term.

Thus, as PDF evolved there seemed to be a need for a version of PDF where reproducibility of the document is assured. ISO 19005-1 defines “a file format based on PDF, known as PDF/A, which provides a mechanism for representing electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files.” These specifications define a profile for electronic documents that ensures the documents can be reproduced in years to come.

An important aspect of this reproducibility is the requirement that PDF/A documents be 100% self-contained. All the information necessary for displaying documents as the original files, identically every time, is embedded in the file. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts, and color information. A PDF/A document cannot rely on information from external sources (e.g., non-embedded fonts and hyperlinks).

So if a company has decided to use PDF as its records management and/or archival format, a limitation of PDF in its native form is that it cannot guarantee long-term reproducibility. Certain restrictions have been incorporated into the PDF Standard to derive PDF/A, where long-term reproducibility can be guaranteed. PDF/A is based on an existing version of the PDF Reference, namely Adobe PDF Reference 1.4, implemented in Adobe Acrobat and Reader 5. Certain functions allowed in PDF 1.4 have been specifically excluded from PDF/A, e.g., sound, movie actions.
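To give a feel for what checking for these excluded functions could look like, here is a minimal Python sketch that scans a PDF's raw bytes for a few name tokens tied to the features mentioned in this post: JavaScript, movie, and sound. The function name and the list of tokens are assumptions made for this example. It is a heuristic only: objects stored inside compressed streams will not be seen, and a hit does not prove the feature is actually used, so a real PDF/A validator is still needed for a definite answer.

```python
import re

# Name tokens that usually indicate features excluded from PDF/A
# (illustrative list drawn from the features named in this post).
SUSPECT_NAMES = {
    b"/JavaScript": "JavaScript action",
    b"/Movie": "movie action or annotation",
    b"/Sound": "sound action or annotation",
}


def find_excluded_features(data: bytes) -> list:
    """Return labels for suspect name tokens found in the raw PDF bytes.

    Hypothetical helper: a byte-level heuristic, not a conformance check.
    """
    found = []
    for name, label in SUSPECT_NAMES.items():
        # The lookahead avoids matching longer names such as /SoundTrack.
        if re.search(re.escape(name) + rb"(?![A-Za-z0-9])", data):
            found.append(label)
    return found
```

An empty list means none of these tokens were visible in the bytes scanned, which is a reasonable first filter before sending files through a proper conversion or validation step.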

If a company has decided to convert to PDF, there are certainly some compelling reasons to consider the new PDF/A format.

Category: All, Convert PDF, PDF/A, Tiff | No Comments »

PDF/A

June 2nd, 2007 by Chris

Question: Is there a problem with view-protecting my PDFs, while at the same time converting them to be PDF/A - archivable PDF format?

Answer: There is a problem with security and PDF/A that needs to be addressed. In particular, there is a lot of buzz about PDF/A but it does not necessarily meet all the requirements of industry: financial, legal, banking, etc… The PDF specs, in general, were made to be versatile. So protecting your files either through view protection or print protection is clearly a useful feature in many areas, though not compliant with PDF/A.

The fact is that the concerns involved in the PDF/A spec design mostly reflected issues of libraries, more than industry. While the goals involved in PDF/A are generally worthwhile, they can also be shortsighted. Almost all compression of image types involves some change to the underlying images. Hopefully, these modifications, such as resolution reduction of the original images for tele-radiology, do not affect the use of the image document within its application. But these decisions are best made within an industry, not by the PDF/A committee.

For long-term archiving of text-based documents where there are no security issues, such as library archives, the PDF/A specs seem well-advised. No non-embedded fonts and no JavaScript are reasonable restrictions on the PDF specs. However, for active PDF files used in various applications such as investment banking, the lack of security within PDF/A and the inability to modify a document in any way, e.g., deskew, is apt to be a problem.

So while PDF/A is generally useful and most of the recommendations in it are well-advised, it is not ideal for every industry and application. Rather, it must be considered, with its limitations, on a per application basis.

Category: All, PDF/A | No Comments »

PDF Conversion: PDF/A vs. Document Security

January 19th, 2007 by Chris

Question: How can I convert company documents to PDF/A when I’m also concerned about file security and encryption?

Answer: There is an inherent conflict between a document being open & accessible and also being secure. The focus of the PDF/A specs is accessibility, not security. That works great at the library level, but not necessarily for an investment bank.

Sensitive company documents can always be kept unencrypted, in an open PDF format, with security enforced at the company database level. In other words, only users with the proper database security in the company could view, print, or edit a given document.

Of course, enforcing security for PDF files at the database level has its drawbacks. Sending a file across the Internet makes it vulnerable to being “sniffed” or read by a 3rd party. What if it’s necessary at certain times to web-host the document and make it viewable to people outside the company? What if you need to email the document reliably to a 3rd party?

One of the advantages to using PDF for conversion & archiving in the first place is the format’s view, print, and edit protection features. But these security features all require encryption and must be disabled for a document to satisfy the PDF/A requirements. So it seems that satisfying the PDF/A specs requires disabling some of PDF’s finest features, at least with respect to security. For many companies, this is not always a winning proposition and should be considered carefully before implementation.
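If you want to find out which of your existing PDFs would have to give up encryption before they could become PDF/A, a quick first pass can be sketched in Python. Encrypted PDFs carry an `/Encrypt` entry in the file trailer (or in the cross-reference stream dictionary in newer files), so this sketch simply looks for that token in the raw bytes. The function names are made up for this example, and the byte search is a heuristic: it can be fooled by the same text appearing elsewhere in a file.

```python
import re


def looks_encrypted(data: bytes) -> bool:
    """Heuristic: True if an /Encrypt entry appears in the raw PDF bytes."""
    # The lookahead skips longer names such as /EncryptMetadata.
    return re.search(rb"/Encrypt(?![A-Za-z0-9])", data) is not None


def split_by_encryption(files: dict) -> tuple:
    """Partition {name: bytes} into (encrypted_names, open_names).

    Hypothetical helper for triaging an archive before PDF/A conversion.
    """
    encrypted, unencrypted = [], []
    for name, data in files.items():
        (encrypted if looks_encrypted(data) else unencrypted).append(name)
    return encrypted, unencrypted
```

Files in the first list are the ones where the trade-off described above actually applies: their view, print, or edit protection would need to be removed, and security handled some other way, before they could satisfy PDF/A.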

Category: All, Convert PDF, PDF Conversion, PDF/A | No Comments »