
OCR, Cryptorithms, Cryptograms and Substitution Ciphers

Cryptorithms

Cryptograms and OCR

Cryptorithms

Cryptorithms are puzzles in which the digits of an arithmetic computation are replaced with letters. Each letter stands for the same digit wherever it appears, and different letters stand for different digits. The puzzle is presented with the letters, and the object is to find the corresponding digits. A famous example is:

  S E N D
+ M O R E
_________
M O N E Y

Another, somewhat simpler example is given by

  PYX
+ PYX
_____
  YYP

Here, the trick is, as in a crossword puzzle, to start where the puzzle is "easiest" to break. In this example, look at the middle (tens) column: Y + Y + c must end in the digit Y, where c is the carry (0 or 1) from the units column, and the sum may itself carry 1 into the leftmost column. Y cannot be 0, because Y is the leading digit of the three-digit sum YYP. Then X + X must produce a carry of 1, since without a carry Y + Y ends in Y only when Y = 0. With the carry, Y + Y + 1 is always odd, and its last digit is Y, so Y must be odd. In fact Y must equal 9, since 9 is the only digit for which Y + Y + 1 ends in Y: 9 + 9 + 1 = 19. Since Y = 9, the leftmost column gives P + P + 1 = 9 (the 1 being the carry from the middle column, with no further carry because the sum has only three digits), so P = 4. We also know that X > 4, because X + X produces a carry into the middle column. The units digit of X + X is P = 4, and X + X is at least 10, so X + X = 14 and X = 7. Check: 497 + 497 = 994. This is how a simple cryptorithm is solved.
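The column-by-column reasoning above can be mirrored in a few lines of code. The following is a minimal sketch (the function name `solve_pyx` is hypothetical): it tries every digit for X, lets the units column fix P and the carry into the tens column, and then keeps only the Y values that satisfy the tens and hundreds columns.

```python
def solve_pyx():
    """Column-by-column search for PYX + PYX = YYP, mirroring the hand analysis."""
    solutions = []
    for x in range(10):
        # Units column: X + X gives the digit P and a carry c1 into the tens column.
        p, c1 = (2 * x) % 10, (2 * x) // 10
        for y in range(1, 10):  # Y leads the sum YYP, so Y != 0
            # Tens column: Y + Y + c1 must end in the digit Y.
            if (2 * y + c1) % 10 != y:
                continue
            c2 = (2 * y + c1) // 10  # carry into the hundreds column
            # Hundreds column: P + P + c2 must equal Y exactly (no fourth digit),
            # P must be non-zero, and the three letters must be distinct digits.
            if p != 0 and 2 * p + c2 == y and len({p, y, x}) == 3:
                solutions.append((p, y, x))
    return solutions


print(solve_pyx())  # list of (P, Y, X) tuples
```

The search returns the single solution P = 4, Y = 9, X = 7, agreeing with the hand analysis.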

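Of course, cryptorithms can get more complicated. Try the first one given above, SEND + MORE = MONEY. The analysis there would again start from the easiest letter to break. M, the leading digit of MONEY, is the carry out of the sum of two four-digit numbers, which is at most 1 (since 9999 + 9999 < 20000), and it cannot be 0, so M = 1.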
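For puzzles too tedious to break by hand, a computer can simply try every assignment. The sketch below is one possible brute-force approach, not a method taken from this article; the name `solve_cryptorithm` is hypothetical. It assumes the usual rules (distinct letters are distinct digits, and no word starts with 0) and gives each letter a net place value so that a candidate is a solution exactly when the weighted sum is zero.

```python
from itertools import permutations


def solve_cryptorithm(addends, total):
    """Brute-force an addition cryptorithm such as SEND + MORE = MONEY.

    Returns every assignment of distinct digits to letters that makes the
    sum correct, with no multi-letter word starting with 0.
    """
    words = list(addends) + [total]
    letters = sorted(set("".join(words)))
    if len(letters) > 10:
        raise ValueError("more than 10 distinct letters")

    # Net place value of each letter: added for the addends, subtracted for
    # the total. A candidate solves the puzzle exactly when the sum is 0.
    weight = dict.fromkeys(letters, 0)
    for word in addends:
        for place, ch in enumerate(reversed(word)):
            weight[ch] += 10 ** place
    for place, ch in enumerate(reversed(total)):
        weight[ch] -= 10 ** place
    coeffs = [weight[ch] for ch in letters]

    # Positions (in `letters`) of letters that begin a word; these cannot be 0.
    leading = {letters.index(w[0]) for w in words if len(w) > 1}

    solutions = []
    for digits in permutations(range(10), len(letters)):
        if any(digits[i] == 0 for i in leading):
            continue
        if sum(c * d for c, d in zip(coeffs, digits)) == 0:
            solutions.append(dict(zip(letters, digits)))
    return solutions


print(solve_cryptorithm(["SEND", "MORE"], "MONEY"))
```

For SEND + MORE = MONEY this finds a single solution, 9567 + 1085 = 10652, which indeed has M = 1. Exhaustive search checks up to 1,814,400 permutations here, so it takes a few seconds in Python; column-by-column pruning like the hand analysis is much faster for larger puzzles.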

Cryptograms and OCR

In cryptography, a substitution cipher is a method of encryption by which units of plaintext are replaced with ciphertext according to a regular system; the "units" may be single letters (the most common), pairs of letters, triplets of letters, mixtures of the above, and so forth. The receiver deciphers the text by performing the inverse substitution. A cryptogram is a short piece of text encrypted with a simple substitution cipher in which each letter is replaced by a different letter. To solve the puzzle, one must recover the original letters.
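A simple (single-letter) substitution cipher and its inverse can be sketched with Python's built-in `str.maketrans` and `str.translate`. The key below is a hypothetical example, the letters of a QWERTY keyboard read row by row; any permutation of A-Z would do. This particular key happens to replace every letter with a different letter, as a cryptogram requires.

```python
import string

ALPHABET = string.ascii_uppercase
# Hypothetical example key: KEY[i] is the ciphertext letter for ALPHABET[i].
KEY = "QWERTYUIOPASDFGHJKLZXCVBNM"


def make_tables(key):
    """Build encryption and decryption tables for a simple substitution cipher."""
    if sorted(key) != list(ALPHABET):
        raise ValueError("key must be a permutation of A-Z")
    encrypt_table = str.maketrans(ALPHABET, key)  # plaintext letter -> ciphertext letter
    decrypt_table = str.maketrans(key, ALPHABET)  # the inverse substitution
    return encrypt_table, decrypt_table


def encrypt(text, key=KEY):
    # Characters other than A-Z (spaces, punctuation) pass through unchanged.
    return text.upper().translate(make_tables(key)[0])


def decrypt(text, key=KEY):
    return text.upper().translate(make_tables(key)[1])


print(encrypt("Mississippi"))  # DOLLOLLOHHO
print(decrypt("DOLLOLLOHHO"))  # MISSISSIPPI
```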

Here is a simple example:

CAEEAEEAOOA

Answer:

Mississippi
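One common way to attack a short cryptogram like this is to compare letter patterns: CAEEAEEAOOA and MISSISSIPPI repeat their letters in exactly the same positions, so each can be reduced to the same pattern of first-appearance indices. The sketch below illustrates that idea; the word list and the function names are hypothetical, and a real solver would use a full dictionary and combine constraints across several words.

```python
def pattern(word):
    """Letter pattern of a word: each letter becomes the index of its first appearance.

    pattern("CAEEAEEAOOA") == pattern("Mississippi") == (0, 1, 2, 2, 1, 2, 2, 1, 3, 3, 1)
    """
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word.upper())


def candidates(cipher_word, word_list):
    """Words from word_list whose letter pattern matches cipher_word."""
    target = pattern(cipher_word)
    return [w for w in word_list if pattern(w) == target]


def key_from(cipher_word, plain_word):
    """Partial decryption key (ciphertext letter -> plaintext letter) implied by a match."""
    return dict(zip(cipher_word.upper(), plain_word.upper()))


# Hypothetical word list for illustration only.
WORDS = ["Missouri", "Mississippi", "Tennessee", "Illinois"]

matches = candidates("CAEEAEEAOOA", WORDS)
print(matches)                                # ['Mississippi']
print(key_from("CAEEAEEAOOA", matches[0]))    # {'C': 'M', 'A': 'I', 'E': 'S', 'O': 'P'}
```

Equal patterns guarantee a consistent one-to-one letter mapping between the two words, which is exactly the constraint a simple substitution cipher imposes.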

We can, quite naturally, view certain OCR problems in a similar vein: the connected components of a scanned page play the role of cipher symbols, and recognition amounts to recovering the letter each one stands for. Of course, in analyzing scanned documents we cannot always assume that each connected component in the image corresponds to a single symbol of the alphabet. We have to deal with oversegmented and undersegmented images. In the oversegmented case, more than one component is needed to make up certain letters of the alphabet. This can happen if the document is not thresholded correctly, or if composite topological structures, such as the dotted letters "i" and "j", are not combined into single models. In the undersegmented case, one component comprises more than one symbol of the alphabet. This happens often with letter pairs such as "fi" and "th", which are frequently printed or scanned as a single connected shape. For best OCR results, these undersegmented components need to be broken apart.
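The segmentation issues above can be illustrated on a tiny binary bitmap (1 = ink). The following is a minimal sketch, not CVISION's method: it finds 8-connected components by flood fill, then applies two common heuristics that are assumed here for illustration. `merge_stacked` joins components that overlap horizontally and are separated by at most `max_gap` empty rows (the dot and stem of an "i"), and `best_split_column` proposes where to cut a wide component by picking the interior column with the least ink (a vertical-projection cut). All names and the `max_gap` threshold are hypothetical.

```python
from collections import deque

# Offsets of the 8 neighbours of a pixel (8-connectivity).
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def connected_components(img):
    """Bounding boxes (top, left, bottom, right) of 8-connected ink regions, in scan order."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_stacked(boxes, max_gap=2):
    """Oversegmentation heuristic (single pass): merge boxes that overlap
    horizontally and are at most max_gap empty rows apart vertically."""
    merged = []
    for b in sorted(boxes, key=lambda box: (box[1], box[0])):
        for i, m in enumerate(merged):
            overlaps = b[1] <= m[3] and m[1] <= b[3]
            gap = max(b[0], m[0]) - min(b[2], m[2]) - 1  # empty rows between the boxes
            if overlaps and gap <= max_gap:
                merged[i] = (min(m[0], b[0]), min(m[1], b[1]), max(m[2], b[2]), max(m[3], b[3]))
                break
        else:
            merged.append(b)
    return merged


def best_split_column(img, box):
    """Undersegmentation heuristic: the interior column of box with the fewest
    ink pixels. The box must be at least 3 pixels wide."""
    top, left, bottom, right = box
    return min(range(left + 1, right),
               key=lambda x: sum(img[y][x] for y in range(top, bottom + 1)))
```

On a 5 x 5 bitmap holding a dotted "i" beside an "l", `connected_components` finds three components (dot, stem, and "l"), and `merge_stacked` reduces them to the two actual letters. Real OCR engines use far more robust rules, for example based on font metrics and recognition confidence.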

Copyright (c) 1998-2007 CVISION Technologies, Inc.
CVISION, CVista, CBatch, and the CVISION logo are registered
trademarks of CVISION Technologies, Inc.

 