<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Doc Doc : The Document Doctor</title>
	<atom:link href="http://www.cvisiontech.com/docdoc/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.cvisiontech.com/docdoc</link>
	<description>The Document Doctor is in to talk about Document Technologies</description>
	<lastBuildDate>Mon, 29 Apr 2013 15:03:37 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Workflow, super-charge with OCR</title>
		<link>http://www.cvisiontech.com/docdoc/?p=447</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=447#comments</comments>
		<pubDate>Sun, 21 Apr 2013 16:18:35 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[workflow]]></category>
		<category><![CDATA[classification]]></category>
		<category><![CDATA[contextual]]></category>
		<category><![CDATA[data capture]]></category>
		<category><![CDATA[image classification]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[routing]]></category>
		<category><![CDATA[taxonomy]]></category>
		<category><![CDATA[work-flow]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=447</guid>
		<description><![CDATA[Workflow on digital documents, been there done that, now it's time to advance your workflow to paper and image documents.  Supercharge workflow with optical character recognition and classification.]]></description>
			<content:encoded><![CDATA[<p>Document workflow can be as simple as saving a file to a single location or as complex as decision-tree document routing rules.  Throw some paper into the mix and the problem intensifies slightly.  Getting your paper documents to fit your already accepted digital document workflow can be challenging.  Some organizations choose to keep the paper and digital workflows separate.  Others unite them but create separate rules for each.  For most, however, it would be ideal to have a single workflow engine or product supporting digital, image, and paper documents alike.</p>
<p>To do so with the greatest value, you need not only document conversion using Optical Character Recognition ( OCR ), but also some other advanced imaging and recognition tools.  In the digital document world, you don&#8217;t have only the data contained in the document; you also have various other metadata items such as file name, file location ( taxonomy ), tags, etc.  In order to marry paper with digital, the same metadata has to be created for the paper document, and this has to occur at the time of document processing.  This could be a manual or an automated process, and depending on your paper volume, doing it manually may be OK.  To compete with the efficiency of digital documents, however, automatic is the way to go.</p>
<p>Using OCR, image-based and contextual-based classification, paper or image documents that enter the workflow can obtain the same value as digital documents.  The OCR is responsible for getting all the content from the document.  The purpose of this content is for search, indexing, auto-filing, as well as generation of keywords ( tags ) associated with a taxonomy.  In order to determine where the document fits into a taxonomy, you must first classify it.</p>
<p>For classification to be most effective, it happens on two levels.  Image-based classification, which looks at what the document looks like, classifies documents based on their physical structure, which is a good indicator of type and very fast to evaluate.  Contextual classification, which looks at what words are contained in the document, goes one level deeper and looks for the keywords that would make a document one type over another.  For some environments, image-based classification can do the job entirely.  Once the classification is known, a classification engine can place the document in the correct spot in an existing taxonomy.  Once an ID or classification is determined, it is no challenge to apply tags, file-naming, and file location to a document.</p>
<p>Workflow can stand alone, but injected with the power of OCR and document classification, it becomes a powerhouse that does not know the difference between paper and digital.</p>
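<p>To make the contextual level concrete, here is a minimal sketch in Python of keyword-based classification followed by routing into a taxonomy.  The keyword lists, the folder paths, and the function names <code>classify</code> and <code>route</code> are all hypothetical, made up for illustration; a real classification engine would be far more elaborate.</p>

```python
# Hypothetical taxonomy: each document type has keywords and a folder.
# None of these names come from a real product; they are illustration only.
TAXONOMY = {
    "invoice": {
        "keywords": {"invoice", "amount", "due", "remit"},
        "folder": "Finance/Invoices",
    },
    "resume": {
        "keywords": {"experience", "education", "skills"},
        "folder": "HR/Resumes",
    },
}


def classify(ocr_text):
    """Return the document type whose keywords occur most often, or None."""
    words = [w.strip(".,:;()").lower() for w in ocr_text.split()]
    scores = {}
    for doc_type, entry in TAXONOMY.items():
        scores[doc_type] = sum(1 for w in words if w in entry["keywords"])
    best = max(scores, key=scores.get)
    # A score of zero means no keyword matched at all.
    return best if scores[best] > 0 else None


def route(ocr_text, default_folder="Unsorted"):
    """Map OCR text to a (folder, tags) pair using the taxonomy above."""
    doc_type = classify(ocr_text)
    if doc_type is None:
        return default_folder, []
    entry = TAXONOMY[doc_type]
    return entry["folder"], [doc_type]
```

<p>Calling <code>route</code> on the OCR text of a scanned page gives back a target folder and a list of tags, which is the same metadata a digital document would already carry.  Documents that match no keywords fall into a catch-all folder for manual review.</p>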
<p>Chris Riley &#8211; <a href="http://www.cvisiontech.com/docdoc/?page_id=2">About</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=447</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What is OCR to SharePoint?</title>
		<link>http://www.cvisiontech.com/docdoc/?p=475</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=475#comments</comments>
		<pubDate>Tue, 12 Mar 2013 03:45:59 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[OCR]]></category>
		<category><![CDATA[2010]]></category>
		<category><![CDATA[data capture]]></category>
		<category><![CDATA[features]]></category>
		<category><![CDATA[sharepoint]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=475</guid>
		<description><![CDATA[Are you ready for SharePoint 2010?  What happens when you need to convert images and get them into SharePoint?  OCR and Imaging are not currently on Microsoft's radar, so find out your options.]]></description>
			<content:encoded><![CDATA[<p>To SharePoint product managers, everything is content.  An image, a file, an email.  It&#8217;s all content, and content that needs to be stored.  It&#8217;s hard even outside of the enterprise content management community  to find a technology conversation that does not mention SharePoint.  SharePoint is huge, and with the looming SharePoint 2010 release it promises to be even greater.</p>
<p>The latest updates to SharePoint make it look more and more like a fully functional content management system, with one exception: they focus on general use only.  Microsoft&#8217;s methodology on the product is to create the functionality that everyone requires and leave the rest to the users or partners.  One such feature that is not general is imaging and OCR.  This would not be the first time that I saw Microsoft indicate they would not delve into an arena, only to find them a threat in it a few years later.  Still, I believe that Microsoft at this point will not invest much development in adding imaging, data capture, and OCR functionality to SharePoint.</p>
<p>While there are several imaging packages out there that do export directly to SharePoint, they all share the primary problem of supporting newer versions.  When SharePoint 2010 comes out, it means substantial development for these vendors to improve their products.  With the current economic situation I expect many to refuse.</p>
<p>The good news is that many hot-folder driven conversion applications exist, and because hot folders integrate nicely with all applications, they make an ideal solution.  Utilizing an existing imaging or OCR platform as a server based watch folder process allows companies to integrate OCR and data capture functionality into SharePoint in a day, simply by having the documents converted prior to upload.</p>
<p>SharePoint is not going away, and the need to get searchable images into the system is clear.  Until Microsoft decides to invest money here, it&#8217;s time to find a stable and scalable way to OCR documents prior to SharePoint import.</p>
<p>Chris Riley &#8211; About</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=475</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Know your accuracy before you even test</title>
		<link>http://www.cvisiontech.com/docdoc/?p=484</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=484#comments</comments>
		<pubDate>Mon, 25 Feb 2013 15:59:04 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[OCR Accuracy]]></category>
		<category><![CDATA[Scan Settings]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[Accuracy]]></category>
		<category><![CDATA[image quality]]></category>
		<category><![CDATA[object detection]]></category>
		<category><![CDATA[scan]]></category>
		<category><![CDATA[success]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=484</guid>
		<description><![CDATA[Can you look at a scanned image and know its OCR accuracy even before testing it?  It's not as difficult as you would think.]]></description>
			<content:encoded><![CDATA[<p>One of the natural abilities that develops as you see millions of sample images and their associated recognition results is that you begin to notice patterns and can instantly identify whether a document will read well, both for full-page document conversion and for field-level recognition.  It has more or less become a natural ability of mine, but I can identify its components.</p>
<p>First is initial image quality.  Without identifying any specific objects on the page, look objectively at the document as a collection of as yet unidentified objects and see if you think the image quality is good.  This is determined by the coherence of each object.  Are object borders tight and determinable?  Are there objects interfering with other objects?  Is the background of the image significantly different from all objects?</p>
<p>Second is identification of objects.  Find text, graphics, lines, paragraphs, etc.  Are their borders far enough apart?  Is their type clear?  This is most important for text.  Is their printing consistent?  For example, does text go from one background color to another?  This would make it inconsistent.  Or, as another example, does the straightness of lines change throughout the document?  And can one object be confused for another?</p>
<p>And third, now that you know the objects, how easy is it to determine their value?  Is the value obvious?  Do you have to look at it for a while to figure it out?</p>
<p>Essentially the three above steps are exactly what the conversion ( OCR, ICR, OMR ) product does in order to read a document.  With field level recognition it’s a bit more elaborate, but the core is the same.  By identifying early on what the anticipated accuracy is of a document, you can then adjust your scan, or input settings accordingly even before looking at any technology.  Doing this will give the best chance for success.</p>
<p>Chris Riley &#8211; <a href="http://www.cvisiontech.com/docdoc/?page_id=2">About</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=484</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Machine Based vs. Software Based OCR</title>
		<link>http://www.cvisiontech.com/docdoc/?p=471</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=471#comments</comments>
		<pubDate>Sun, 04 Mar 2012 17:16:30 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[OCR]]></category>
		<category><![CDATA[Accuracy]]></category>
		<category><![CDATA[in-line OCR]]></category>
		<category><![CDATA[OCV]]></category>
		<category><![CDATA[pc]]></category>
		<category><![CDATA[software based]]></category>
		<category><![CDATA[speed]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=471</guid>
		<description><![CDATA[Ever heard of in-line or machine based OCR?  Read about how the technology is used and how it compares to PC based OCR]]></description>
			<content:encoded><![CDATA[<p>Well, they are all software, so perhaps the better comparison is in-line OCR vs. PC based OCR.  In any case, there are important differences between the two.  What I&#8217;m talking about are two very different modes of OCR ( Optical Character Recognition ).  One is done at a very fast rate on documents or images as they are scanned; the other is done after the scanning and is performed by PCs or servers.  When someone refers to OCR, they are most likely discussing the software that is installed on a PC to convert image to text.  Let&#8217;s look at the differences between the two.</p>
<p>In-line OCR is used primarily for mail-room processing on high speed, high volume scanners, or on manufacturing assembly lines.  Both scenarios need data from the input asset quickly.  The benefit of in-line OCR is that it&#8217;s the fastest OCR around.  Usually the OCR is a part of the firmware and optimized for speed.  If you imagine an assembly line of bottles, the bottles pass the camera within milliseconds.  To wait for OCR would be a huge bottleneck in the quality control and inventory process.  The downside to in-line OCR is accuracy.  In the case of the assembly line, the engine has usually been so tuned that it is extremely accurate for a single image type.  Where accuracy proves to be lower is in document scanning, the digital mail-room.  There the in-line OCR, in order to be as fast as it is, must be an engine that is reduced in complexity, namely by removing document analysis and the reading of complex fonts.  Because of this, when documents are scanned the accuracy cannot compare to that of PC based OCR.</p>
<p>PC based OCR has the benefit of scalability.  It can work on the widest range of document types.  Also, because it&#8217;s using the PC, it has the latest and greatest technologies that work on degraded documents and complex documents.  The downside of PC based OCR is that it&#8217;s not as fast as in-line.  99.9% of the time, though, it is fast enough.  Many times PC based OCR is used at a document scanner&#8217;s rated speed of 60 pages a minute.  This is plenty fast for those whose primary concern is quality.  It is not fast enough for machine to machine hand-offs, but this is not its primary use.</p>
<p>You may never encounter in-line OCR, but knowing about the technology helps you understand the world of recognition and the applications of such technology.</p>
<p>Chris Riley &#8211; <a href="http://www.cvisiontech.com/docdoc/?page_id=2">About</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=471</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Compression &#8211; Not for saving, for optimizing</title>
		<link>http://www.cvisiontech.com/docdoc/?p=488</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=488#comments</comments>
		<pubDate>Fri, 20 May 2011 14:32:02 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[compression]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=488</guid>
		<description><![CDATA[Compression is not just for saving space, it's also for increasing your efficiency of working with digital files.]]></description>
			<content:encoded><![CDATA[<p>The first thing people think of when investigating compression technologies is, &#8220;How can I save space?&#8221;  For advanced users, and some companies, compression is not necessarily for saving space, but for optimizing it.  If you calculate the amount of time spent waiting for emails to download, opening large files, and searching, you will start to realize that compression plays a big role in worker efficiency.</p>
<p>The type of compression that I&#8217;m discussing here is file type specific compression.  These are compression technologies that operate on a single file type and have special algorithms to reduce the size of files of that type.  The two most common examples are JPEG image files and PDF files.  Type specific compression has the benefit that you can still work with the files as you normally would.  The opposite of type specific compression is archive technologies such as Zip or compressed Tar.  Here you have to uncompress the files before utilizing them.</p>
<p>Because the file types are left intact with type specific compression, it means that you can email the files after compression, search engines can index them, and they can be opened in your typical viewer.  The reality is that hard drive space is cheap and adding more is relatively easy.  So for some, compression is more about efficiency.  With proper compression, emails are sent and received faster, search engines crawl faster and indexes are smaller, and opening large files takes less time.</p>
<p>This is not to diminish the use of compression to save space in a world of ever increasing data collection.  The purpose of this article is to highlight the other substantial benefits of type specific file compression.  The trick now becomes finding the right compression tools, ones that create high quality compressed files that are compatible with typical file browsers.</p>
<p>Chris Riley &#8211; <a href="http://www.cvisiontech.com/docdoc/?page_id=2">About</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=488</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Set it and forget it OCR</title>
		<link>http://www.cvisiontech.com/docdoc/?p=54</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=54#comments</comments>
		<pubDate>Tue, 08 Mar 2011 21:05:26 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[Full-Page]]></category>
		<category><![CDATA[Index]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[automated]]></category>
		<category><![CDATA[full-page ocr]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=54</guid>
		<description><![CDATA[Set it and forget it OCR is where you scan to a folder and it's automatically converted.  I might be an extremist but there is serious value in treating your office documents with this technique.]]></description>
			<content:encoded><![CDATA[<p>My office is a paper monster.  Paper comes in and never leaves intact.  The scary part is how fast this happens.  Paper in hand, review its contents and assess its value, scan it, shred it.  Usually within minutes of its existence.  The value of set it and forget it OCR is tremendous, but you have to be comfortable.</p>
<p>Set it and forget it OCR is where you take your OCR product and configure it to automatically process any images that appear in a certain folder.  For my office, I scan to an “input” folder and all the resulting compressed and OCR&#8217;ed PDF files end up in the “File Cabinet” folder.  My strategy will not work for the timid because basically I&#8217;m relying solely on the power of OCR text and search to retrieve documents when I need them.  Most would rather configure their ADF scanner to have a setting or folder for each particular class of documents.  Most document scanners these days have as few as 9 and as many as 99 destinations you can program.  You can set each destination as its own input folder with its own OCR settings and its own output folder.</p>
<p>I know I can do this because I know what settings it takes to get the quality of OCR I need to have at least one usable keyword on the document for search.  And after all, I&#8217;m an expert in OCR, so to not use it every day would be crazy in its own right.  I&#8217;ve yet to be proven wrong: my “File Cabinet” abyss has always given me the information I need at the time I asked for it, and sometimes even new information I did not realize I had.</p>
<p>Now for you records management folks shaking your heads, I understand your complaint.  It should not be about my approach but about what I do with the final paper product.  Those items that, for legal or business reasons, are deemed a record by your taxonomy should be filed as such, perhaps scanned again as a record, and for heaven&#8217;s sake, if you are not supposed to, don&#8217;t destroy them!</p>
<p>The purpose of my madness is to touch paper as little as possible, and get information only when I need it.  I am an extremist, but I assure you there is serious value, and a little fun in the set it and forget it OCR technique.</p>
<p>Chris Riley – <a href="http://www.cvisiontech.com/docdoc/?page_id=2">About</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=54</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Squeeze those files</title>
		<link>http://www.cvisiontech.com/docdoc/?p=237</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=237#comments</comments>
		<pubDate>Fri, 04 Mar 2011 16:01:45 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[compression]]></category>
		<category><![CDATA[corruption]]></category>
		<category><![CDATA[real-time]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=237</guid>
		<description><![CDATA[Don't wait to compress your files, start compressing today in real-time.  It's not just a suggestion, it's a life saver.]]></description>
			<content:encoded><![CDATA[<p>Compression is a great tool for saving hard drive space.  You may not currently be thinking about file compression, but you should.  It&#8217;s very likely that data is being created on your machines at an increasing rate, and your hard drive space is decreasing at the same fast pace.  Organizations and individuals often only consider file compression when there is far too little space left on their hard drives or the warning messages about too little space start appearing.  This is a big risk.</p>
<p>As we create files on our computers, access them, move them, and modify them, we are fragmenting the drive.  Overly fragmented drives slow down machines and increase the risk of damage and corruption.  The more files you have, the more this multiplies.  Real-time file compression helps with this because as soon as a file is generated, it&#8217;s compressed.  There is less space being used, and the need to compress in the future is gone.  Back-log compression ( compressing all your files in bulk ) requires a lot of activity on the hard drive and increases the fragmentation.  The other risk of bulk compression is the fact that you only have one chance to get it right.</p>
<p>Bad compression is not just an irritation, it&#8217;s a risk.  Usually when you compress a file, you are removing the original.  The whole purpose is to save space, not to use up more by keeping both copies.  Yet keeping both files until you are sure the compression is correct wastes a lot of space.  When doing day-forward or real-time compression, it&#8217;s easy to check the files as they come across to make sure everything is good at initial setup, but if you do bulk compression and make a mistake, you could ruin a large library of files.</p>
<p>I firmly believe in file compression, but I know firsthand the risk of doing it incorrectly.  I now compress files as they are created and no longer have to think about data piling up faster than I can find ways to save space.</p>
<p>Chris Riley – <a href="http://www.cvisiontech.com/docdoc/?page_id=2">About</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=237</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OCR makes old systems new</title>
		<link>http://www.cvisiontech.com/docdoc/?p=229</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=229#comments</comments>
		<pubDate>Thu, 10 Feb 2011 18:30:10 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[OCR]]></category>
		<category><![CDATA[legacy systems]]></category>
		<category><![CDATA[screen scraping]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=229</guid>
		<description><![CDATA[Migrating data from proprietary and legacy systems is one of the scariest activities of any IT department; using OCR is the secret when there is no other option.]]></description>
			<content:encoded><![CDATA[<p>One of the biggest challenges in the IT space is migration from legacy systems, often mainframes, to modern day operating systems and applications.  Legacy systems still exist today in the form of classic green screen UNIX systems.  Their life has been extended due to the critical nature of the data they contain.  Modern day standards have been put into place hoping to avoid this problem in the future.  However, the applications for which conforming to standards seems most critical, such as hospital medical records systems, airline systems, and government systems, still do not conform to any.  The vendors who make these systems have every intention of making them very hard to migrate from.  But there is a way, and it works very well.  OCR.</p>
<p>You may have seen a previous post where I alluded to the possibilities of using OCR to scrape screen-shots.  This is one of the best real examples of why the technology is so useful.  When you don&#8217;t have XML and ODBC or any of the other great standards that allow the exchange of data from one system to another, you always have what you can see, and if you can see it you can OCR it.  If you can view the data on the screen, you can move it to a new system.</p>
<p>Using OCR to either programmatically or manually read portions of a screen where the legacy system window is displaying data, copy it to memory, and paste it into the new system is one of the most ingenious ways to ensure the neutrality of your data.  Vendor lock down attempts or old technology should not prevent you from getting to what you own: the information.</p>
<p>Whether it&#8217;s a manual process or a programmatic one, the ability to OCR screen-shots and to migrate data is the hidden secret to crack any proprietary software safe.</p>
<p>Chris Riley – <a href="http://www.cvisiontech.com/docdoc/?page_id=2">About</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=229</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Space Age Optical Character Recognition</title>
		<link>http://www.cvisiontech.com/docdoc/?p=232</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=232#comments</comments>
		<pubDate>Thu, 19 Aug 2010 16:10:27 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[Full-Page]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[data security]]></category>
		<category><![CDATA[future]]></category>
		<category><![CDATA[robots]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=232</guid>
		<description><![CDATA[Let's talk about the cool future uses of OCR technology when paper goes away.]]></description>
			<content:encoded><![CDATA[<p>There are a lot of technologists out there who believe that optical character recognition has its days numbered and is an aged technology.  The belief is that soon paper will go away.  This post is for those who believe OCR technology is going away.</p>
<p>The reality is that paper consumption has not really decreased.  In some areas paper has been replaced with electronic data interchange ( EDI ), but in other areas it has actually increased.  Studies have also shown that because documents are being scanned more often, there is also an increase in printing when the documents need to be shared or re-purposed.  But I&#8217;m not here to argue that paper is not going away and that document conversion technologies are required to convert it.  I&#8217;m here to point out a few futuristic uses that technologists already like to talk about and that involve OCR.</p>
<p><strong>Data Security</strong></p>
<p>The first futuristic use of the technology that I would like to discuss is the use of OCR in data security.  Text strings sent over the Internet are far easier to sniff and unlock than a compressed JPEG image.  What if you were to convert the text into a JPEG during transmission and the person on the receiving end would OCR it to get the data. By doing so the data has been masked in a more efficient and secretive way.  For added security, proprietary image formats could be devised.</p>
<p><strong>File Compression</strong></p>
<p>Storing ASCII text takes up far less space than an image or video file.  As a part of the future of compression technologies, expect that OCR will be used to extract the text from an image and save it as an ASCII file.  Viewers will convert the text back to an image during viewing.  This then removes the image portion of the text and significantly reduces file size.</p>
<p><strong>Robots</strong></p>
<p>How else do you expect future robots to read text?  OCR, of course.  The eyes of the robot are essentially a camera that rapidly takes pictures.  When the robot is faced with the comprehension of text, the image will be converted using OCR and fed through an engine to gain meaning from the text and act on it.</p>
<p>So there you have it, three really cool and cutting edge ways OCR is and will be used in the future.  Paper is not going away, but even if it were,  just look at the other cool uses of OCR technology.</p>
<p>Chris Riley – <a href="http://www.cvisiontech.com/docdoc/?page_id=2">About</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=232</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Down and dirty paperless office</title>
		<link>http://www.cvisiontech.com/docdoc/?p=81</link>
		<comments>http://www.cvisiontech.com/docdoc/?p=81#comments</comments>
		<pubDate>Sun, 11 Jul 2010 18:25:15 +0000</pubDate>
		<dc:creator>Chris Riley</dc:creator>
				<category><![CDATA[Full-Page]]></category>
		<category><![CDATA[OCR]]></category>
		<category><![CDATA[Scanning]]></category>
		<category><![CDATA[compression]]></category>
		<category><![CDATA[paperless office]]></category>
		<category><![CDATA[pdf]]></category>
		<category><![CDATA[server ocr]]></category>

		<guid isPermaLink="false">http://www.cvisiontech.com/docdoc/?p=81</guid>
		<description><![CDATA[Find out how I practice what I preach and have a paperless office!]]></description>
			<content:encoded><![CDATA[
<p style="margin-left: 0.11in; margin-bottom: 0in;">In my office, paper comes in, is reviewed for value, gets scanned, and is shredded or filed.  I have set up a system that allows me to very efficiently scan documents to my “digital file cabinet”.  Here is a quick guide on how I do it!</p>
<p style="margin-left: 0.11in; margin-bottom: 0in;">What you will need:</p>
<ol>
<li>
<p style="margin-bottom: 0in;">An unused computer attached to your network</p>
</li>
<li>
<p style="margin-bottom: 0in;">Google Desktop Search with network browsing enabled</p>
</li>
<li>
<p style="margin-bottom: 0in;">A document scanner</p>
</li>
<li>
<p style="margin-bottom: 0in;">A server based automatic OCR product</p>
</li>
<li>
<p style="margin-bottom: 0in;">A file compression product ( optional but recommended )</p>
</li>
</ol>
<p style="margin-left: 0.11in; margin-bottom: 0in;">Now to put it all together.  My system is set up on an inexpensive desktop computer with Windows XP installed.  Once all the applications are installed, you don&#8217;t even need a monitor attached to this computer.  The computer is visible on the network and has one folder shared, the “File Cabinet” folder in my case.  This computer is my stand-alone digital file cabinet.  Attached to it is a document scanner with a 30 page feeder.  I have the scanner configured to scan to an “input” directory on the machine.</p>
<p style="margin-left: 0.11in; margin-bottom: 0in;">The automatic OCR processing product is configured to pick up images as soon as they arrive in the input folder ( the “hot folder” ), OCR them using specific index level OCR settings, and create a PDF with a hidden searchable layer.  The resulting PDF is put into another hot folder that the PDF compression tool is watching.  As soon as a PDF arrives in this folder it is compressed, and the compressed PDF is moved to the “File Cabinet” folder.</p>
<p style="margin-left: 0.11in; margin-bottom: 0in;">Because Google Desktop Search is enabled to index all files in the “File Cabinet” folder, the PDFs very quickly become a part of the index.  Configure your Google Desktop Search to enable network searches so that any machine on the network can open a browser, go to a URL located on the digital file cabinet machine, and locate documents with a search.</p>
<p style="margin-left: 0.11in; margin-bottom: 0in;">Once it&#8217;s set up, it&#8217;s simply a matter of putting paper in the scanner and pressing the scan button, and you&#8217;re done.  It&#8217;s that easy, and extremely useful!</p>
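<p style="margin-left: 0.11in; margin-bottom: 0in;">For the curious, the hot folder idea itself is simple enough to sketch in a few lines of Python.  This is only an illustration under stated assumptions, not how any particular OCR product works: the function <code>process_hot_folder</code> and the placeholder <code>fake_ocr_to_pdf</code> are hypothetical names, and the placeholder just copies bytes where a real OCR engine would be called.</p>

```python
import shutil
from pathlib import Path


def fake_ocr_to_pdf(image_path, pdf_path):
    """Hypothetical stand-in for a real OCR engine; it only copies bytes."""
    shutil.copyfile(image_path, pdf_path)


def process_hot_folder(input_dir, output_dir, convert=fake_ocr_to_pdf,
                       extensions=(".tif", ".tiff", ".png", ".jpg")):
    """Convert every image waiting in input_dir once; return the outputs."""
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    produced = []
    for image in sorted(input_dir.iterdir()):
        # Skip sub-folders and anything that is not a known image type.
        if not image.is_file() or image.suffix.lower() not in extensions:
            continue
        target = output_dir / (image.stem + ".pdf")
        convert(image, target)
        image.unlink()  # remove the source so it is not processed twice
        produced.append(target)
    return produced
```

<p style="margin-left: 0.11in; margin-bottom: 0in;">Running this function in a loop with a short pause between passes gives you a basic watch folder.  Pointing a second pass at the first pass's output folder, with a compression step as the <code>convert</code> argument, chains the stages together in the same way the two hot folders are chained above.</p>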
<p style="margin-left: 0.11in; margin-bottom: 0in;">Chris Riley – <a href="http://www.cvisiontech.com/docdoc/?page_id=2">About</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cvisiontech.com/docdoc/?feed=rss2&amp;p=81</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
