CAPTCHA: Human and Machine Readability & OCR

There is a gap between human and machine readability. Websites that rely on a “CAPTCHA” to distinguish between humans and bots depend on this gap: they count on the existence of images whose text is readable by humans but not by machines.

What is CAPTCHA? A CAPTCHA is a type of challenge-response test used in computing to determine whether the user is human or not. “CAPTCHA” is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”, a term that Carnegie Mellon University applied to trademark. A CAPTCHA involves one computer asking a user to complete a test. While the computer is able to generate and grade the test, it is not able to solve the test on its own.

Because computers are unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. The term CAPTCHA was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University), and John Langford (of IBM). A common type of CAPTCHA asks the user to type in the letters of a distorted image.
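The generate-and-grade flow described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular CAPTCHA product: the function names `generate_challenge` and `grade_response` are hypothetical, and the step that renders the answer as a distorted image is left out.

```python
import hmac
import secrets
import string

# Hypothetical helper names; a real CAPTCHA would also render the answer
# as a distorted image, which is omitted in this sketch.
ALPHABET = string.ascii_uppercase + string.digits


def generate_challenge(length=6):
    """Return a random answer string that the server keeps secret."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def grade_response(answer, response):
    """Return True when the typed response matches the stored answer."""
    normalized = response.strip().upper()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(answer.encode(), normalized.encode())


answer = generate_challenge()
print(grade_response(answer, answer.lower()))  # a correct reply typed in lowercase
print(grade_response(answer, "not it"))        # an incorrect reply
```

The server can generate and grade the test without being able to read the distorted image itself; only the stored answer is needed for grading.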

Human vs. Machine Character Recognition

Typical document scanning takes place in the 200-300 dpi range. In that range, basic topological and geometric properties of the characters are preserved, even after thresholding (i.e., converting the scanned document to black and white). At low scan resolutions, however, machine OCR systems run into trouble. Part of the reason for this disparity is that humans are adept at reconstructing the shapes of characters even when multiple characters share a pixel. For computers, if distinct characters are not separated in the image after thresholding, there is often a sharp decrease in recognition rates. Usually, an image is adequately sampled if each letter stroke is at least two pixels thick; the same applies to white space. When sampling falls below the Nyquist sampling rate, so that this constraint is clearly not satisfied, machine recognition fails entirely, while human recognition remains intact down to perhaps 25 dpi.
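The effect of undersampling followed by thresholding can be illustrated with a toy one-dimensional scanline. The sketch below is an assumption-laden simplification, not an OCR pipeline: the scanline, the block-averaging model of a coarser scan, and the 0.5 cutoff are all chosen for illustration.

```python
def downsample(scanline, factor):
    """Average each block of `factor` samples, mimicking a coarser scan."""
    blocks = [scanline[i:i + factor] for i in range(0, len(scanline), factor)]
    return [sum(block) / len(block) for block in blocks]


def threshold(scanline, cutoff=0.5):
    """Convert gray levels (0 = white, 1 = ink) to black and white."""
    return [1 if value >= cutoff else 0 for value in scanline]


def count_strokes(binary):
    """Count runs of consecutive black pixels."""
    runs, previous = 0, 0
    for pixel in binary:
        if pixel == 1 and previous == 0:
            runs += 1
        previous = pixel
    return runs


# Toy scanline: four strokes, each 4 samples wide, separated by 4-sample gaps.
fine = ([1] * 4 + [0] * 4) * 4

for factor in (1, 2, 4, 8):
    coarse = threshold(downsample(fine, factor))
    print(factor, count_strokes(coarse))
```

With factors 1, 2, and 4 the four strokes remain separate after thresholding. At factor 4 each stroke is only one pixel wide and survives only because the toy sampling grid happens to align with the strokes. At factor 8, one sample per stroke-and-gap period, every sample averages to 0.5 and the four strokes merge into a single black run.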

Where does the difference between human and machine readability matter in practice? Consider the explosion in cell phone use worldwide, with the number of units expected to exceed one billion by the end of 2008. Many of these users will have the ability to capture images, including images of documents. For OCR to work effectively at cell phone capture resolutions, which for documents are well below 50 dpi, there need to be fundamental improvements in OCR technology.
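To put rough numbers on the resolutions mentioned above, the following sketch estimates how many pixels a character stroke spans at a given dpi. The 10-point font size and the assumption that stroke thickness is one tenth of the font size are illustrative values only, not figures from this primer; real fonts vary widely.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def stroke_pixels(dpi, font_points=10.0, stroke_fraction=0.1):
    """Estimate stroke thickness in pixels at a given scan resolution.

    font_points and stroke_fraction (stroke thickness as a share of the
    font size) are assumed, illustrative values.
    """
    stroke_inches = font_points * stroke_fraction / POINTS_PER_INCH
    return stroke_inches * dpi


def min_dpi(font_points=10.0, stroke_fraction=0.1, pixels_needed=2):
    """Lowest dpi at which a stroke spans `pixels_needed` pixels."""
    stroke_inches = font_points * stroke_fraction / POINTS_PER_INCH
    return pixels_needed / stroke_inches


for dpi in (300, 200, 50, 25):
    print(dpi, round(stroke_pixels(dpi), 2))
print(round(min_dpi()))
```

Under these assumptions a stroke spans more than two pixels at 200-300 dpi but less than one pixel at 50 dpi, and the two-pixel guideline is met at roughly 144 dpi and above.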

Copyright © 1998-2018 CVISION Technologies, Inc.