Creating an interactive cryptogram solver (Part III)


Execute Interactive Solver

In Part I and Part II of this series we created the framework for our interactive solver and finally had a working AristocratSolver class. In this part, we will enhance our existing framework by adding some commonly used functions, add frequency counting of characters and character sequences to the AristocratSolver class, add the ability to display the current plaintext and ciphertext keys to the AristocratSolver class and then finally create a PatristocratSolver class that reuses all our work in the AristocratSolver class.

Extending the Cipher Class

In the previous parts of this series, the Cipher class had two methods, encrypt and decrypt, that basically just returned the Cipher.text property. A problem that we will run into is that the Cipher.text property could contain characters that are not compatible with the cipher class that we are using. We’ll want to strip out any invalid characters before we had the text off to the encryption and decryption routines of the actual cipher class. First, lets import the shared class at the top of cipher.py so we can use it:

Contents of cipher.py (Place at the top of the file on line 3)

import shared

Now lets modify the Cipher.__init__ method as follows:

Contents of cipher.py (Replace the existing Cipher.__init__ method)

  def __init__(self):
    self.text = ""
    self.decrypt_filter = lambda char: True
    self.encrypt_filter = lambda char: True

Here we’ve added self.decrypt_filter and self.encrypt_filter variables. These two variables will actually hold the functions that will get called for each character in Cipher.text and will return True or False depending on if the character is valid or not. We are using a really cool feature of python here called “lambda”. Lambda creates a single expression function. They are in the format:

lambda arguments: expression

We’ll see a better use of lambdas in the Aristocrat class in a moment. For now, the Cipher.decrypt_filter and Cipher.encrypt_filter just pass in each character and always return True. So the text will go unchanged because all characters are considered valid. Now lets add a method that will handle all the filtering of the text.

Contents of cipher.py (Insert after the Cipher.__init__ method)

  def get_text(self, text="", filter_func = None):
    if text == "":
      text = self.text
    if filter_func == None:
      return text
    else:
      filtered_text = ""
      for char in text:
        if filter_func(char):
          filtered_text += char
      return filtered_text

The Cipher.get_text method will use the Cipher.text property unless the text argument is passed. It then walks through the text and calls the filter_func for each character in the text. If filter_func returns True, the character is added to the return value. This is the perfect place to strip off line feeds or any other undesired character. The filter_func can be any function that has a character argument and returns True or False. Lets modify our Cipher.encrypt and Cipher.decrypt methods to take advantage of this new Cipher.get_text method.

Contents of cipher.py (Replace the existing Cipher methods)

  def decrypt(self, text=""):
    return self.get_text(text, self.decrypt_filter)

  def encrypt(self, text=""):
    return self.get_text(text, self.encrypt_filter)

The Cipher.encrypt and Cipher.decrypt methods are much simpler now because they just have to call the Cipher.get_text method with the appropriate filter. Alright, the Cipher class is now ready for other classes to utilize its new functionality! Now lets modify the CipherSolver to contain a few commonly used functions.

The CipherSolver Class

Contents of cipher.py (Replace the existing CipherSolver.__init__ method)

  def __init__(self):
    self.cipher = Cipher()
    self.prompt = ">"
    self.maxlinelen = 70
    self.shortcuts = {"d":"display"}

Here we’ve changed the CipherSolver.__init__ method by adding “self.maxlinelen = 70”. We want to add this here because we will be using the shared.breaklines function and will need to know what the maximum line length will be. This means that we can remove this line from the Aristocrat class if we want to. Now lets add a new method below the CipherSolver.display method.

Contents of cipher.py (Replace the existing CipherSolver.__init__ method)

  def print_counts(self,counts):
    retvalue = ""
    if len(counts) == 0:
      retvalue = "None"
    else:
      for item,count in counts:
        retvalue += "%s:%s, " % (item,count)
    print shared.breaklines(retvalue.strip(", "), self.maxlinelen)

A lot of our cipher classes are going to be calculating and counting things like frequency counts, digraph and trigraph counts, etc. So the CipherSolver.print_counts method neatly display those counts on the screen. All our cipher classes will be able to use this method and uniformly display their counts as needed

Shared functions

Before we jump into the Aristocrat class, lets add a few shared functions to our “shared.py” file. Add the imports to the top of the file and then add the two functions. I like to keep things in alphabetical order but that is totally up to you.

Contents of shared.py (Place at the top of the file)

import string
from operator import itemgetter

Contents of shared.py (ADD at the top of the file)

#Breaks the text into blocks of length blocklen
def breakblocks(text,blocklen):
  output = ""
  currlen = 0
  for index in range(len(text)):
    output += text[index]
    currlen += 1
    if currlen == blocklen:
      output += " "
      currlen = 0
  return output

The breakblocks will break a long line into chunks based on the blocklen parameter. This will be used in the PatristocratSolver class when we want to display things in 5 character blocks. Lets use the python interpreter and see how to use it:

python
>>> import shared
>>> line = 'GSRHRHZGVHGLUGSVVNVITVMXBYILZWXZHGHBHGVN'
>>> shared.breakblocks(line,5)
'GSRHR HZGVH GLUGS VVNVI TVMXB YILZW XZHGH BHGVN '
>>>
#Calculates how many times sequences of characters appears in the text
def calc_graphs(items, length, sorted = True, keepsingles = False, valid_chars = string.ascii_letters):
  if type(items) != list:
    items = [items]
  counts = {}
  for text in items:
    temp = ""
    for index in range(len(text)):
      if text[index] in valid_chars:
        temp = temp + text[index]
    text = temp
    for index in range(len(text)-(length-1)):
      graph = text[index:index+length]
      if not graph in counts:
        counts[graph] = 0
      counts[graph] += 1
  counts = counts.items()
  if not keepsingles:
    counts = [(k,v) for k,v in counts if v > 1]
  if sorted:
    counts.sort(key=itemgetter(1), reverse=True)
  return counts

The calc_graphs function takes a string or list of strings and calculates all the repeated sequences of characters that are of the length specified. If we want to throw away any that only appear once, we can pass in False for the keepsingles parameter. The valid_chars parameter is a string containing all the valid characters that will be counted. Lets look at an example of its usage:

python
>>> import shared
>>> line = 'GSRH RH Z GVHG LU GSV VNVITVMXB YILZWXZHG HBHGVN.'
>>> shared.calc_graphs(line,1)
[('G', 6), ('H', 6), ('V', 6), ('Z', 3), ('B', 2), ('I', 2), ('L', 2), ('N', 2), ('S', 2), ('R', 2), ('X', 2)]
>>>

If we want to see the digraphs we can pass in 2 as the length or 3 for trigraphs. By default the result is sorted descending from largest to smallest count.

The Aristocrat Class

Alright, now lets put all this new functionality into the Aristocrat class. Lets override the Cipher.encrypt_filter and Cipher.decrypt_filter properties in the Aristocrat.__init__ method.

Contents of aristocrat.py (Replace the existing Aristocrat.__init__ method)

  def __init__(self):
    Cipher.__init__(self)
    self.ctkey = string.ascii_uppercase
    self.ptkey = "-" * 26
    self.decrypt_filter = lambda char: char in (string.ascii_letters + string.punctuation + " ")
    self.encrypt_filter = self.decrypt_filter

Here we are setting the decrypt_filter to a new lambda function. This function checks to see if the character passed is a letter, punctuation or a space. We want the encrypt_filter to be the same so we are just assigning it to the value of the decrypt_filter. Now, whenever we need to get the text in the Aristocrat class we will strip out any line feeds, digits or other invalid characters automatically.

The AristocratSolver Class

Now lets add our new frequency counting functionality to the AristocratSolver class.

Contents of aristocrat.py (Replace the existing AristocratSolver.__init__ method)

  def __init__(self):
    CipherSolver.__init__(self)
    self.cipher = Aristocrat()
    self.shortcuts['f'] = 'frequency_list'
    self.shortcuts['k'] = 'keys'
    self.shortcuts['s'] = "set"

Here we’ll add two new shortcuts to the Cipher.shortcuts list in the AristocratSolver.__init__ method for the frequency_list method and the keys method. So whenever we want to get frequency counts, we just have to type “f”. If you want digraphs just type “f 2” into the solver. If we want to display the keys we can just type “k”. Notice we’ve removed the “self.maxlinelen = 70” line from the AristocratSolver.__init__ method. Since the CipherSolver.__init__ method already declares this, we don’t need to do this. If we want to change the maxlinelen globally, we can just change it in CipherSolver.

The last thing we need to do is add the actually frequency_list method to the AristocratSolver class below the display method.

Contents of aristocrat.py (Insert after the AristocratSolver.display method)

  def frequency_list(self, length = 1, text = ""):
    text = Cipher.encrypt(self.cipher, text)
    self.print_counts(shared.calc_graphs(text.split(" "), int(length)))

This method just gets the Cipher.text property and uses the shared.calc_graphs and print_counts functions to calculate and display the frequency counts. Notice that the text that we are passing into the shared.calc_graphs function is actually being split into chunks at the spaces. This makes sure that we only see repeated patterns in the actual words themselves and not just mashed together by ignoring the spaces

Lets see this new method in action! Lets fire up our “sandbox.py” in python:

python sandbox.py
> d

GSRH RH Z GVHG LU GSV VNVITVMXB YILZWXZHG HBHGVN.
---- -- - ---- -- --- --------- --------- ------.

> f 1

G:6, H:6, V:6, Z:3, B:2, I:2, L:2, N:2, S:2, R:2, X:2

> f 2

HG:3, GV:2, GS:2, VN:2, RH:2

>

Now lets add the displaying of plaintext and ciphertext keys to the AristocratSolver class.

Contents of aristocrat.py (Insert after the AristocratSolver.frequency_list method)

  def keys(self):
    print "ct:", self.cipher.ctkey, "npt:", self.cipher.ptkey, "n"
    values = zip(self.cipher.ptkey, self.cipher.ctkey)
    values.sort()
    ptval=""
    ctval=""
    for pt,ct in values:
      ptval += pt
      ctval += ct
    print "pt:",ptval, "nct:", ctval

The key function displays the ciphertext and plaintext keys, first in alphabetical ciphertext order and then alphabetical plaintext order. Lets see what it looks like when executed:

python sandbox.py
> s abcdefghijklmnopqrstuvwxyz zyxwvutsrqponmlkjihgfedcba

GSRH RH Z GVHG LU GSV VNVITVMXB YILZWXZHG HBHGVN.
this is a test of the emergency broadcast system.

> k

ct: ABCDEFGHIJKLMNOPQRSTUVWXYZ
pt: zyxwvutsrqponmlkjihgfedcba

pt: abcdefghijklmnopqrstuvwxyz
ct: ZYXWVUTSRQPONMLKJIHGFEDCBA

>

There we go! We’ve extended our AristocratSolver class so that it can display frequency counts and keys. We’ve modified our Cipher and CipherSolver classes to make them more extendable for other cipher classes. The last thing we need to do is whip up the Patristocrat and PatristocratSolver classes.

The Patristocrat Class

Since Patristocrats are basically just Aristocrats without the word spacings, we can re-use a ton of the Aristocrat class functionality and only override the different parts. Lets create a new file called “patristocrat.py” and put in the following code:

Contents of patristocrat.py

import string
import shared
from cipher import Cipher
from aristocrat import Aristocrat, AristocratSolver

class Patristocrat(Aristocrat):
  def __init__(self):
    Aristocrat.__init__(self)
    self.decrypt_filter = lambda char: char in string.ascii_letters
    self.encrypt_filter = self.decrypt_filter

The Patristocrat class is quite small because the Aristocrat class already does everything. Notice we are inheriting from the Aristocrat class and calling the Aristocrat.__init__ method. The Aristocrat class allowed letters, punctuation, and spaces in both its encrypted and decrypted text. For Patristocrats, we really only want letters. So we change the decrypt_filter and encrypt_filter to only return True if the character is a letter (A-Z, a-z).

The PatristocratSolver class

The PatristocratSolver class has a little more code than its cipher.

Contents of patristocrat.py (Place after the Patristocrat class)

class PatristocratSolver(AristocratSolver):
  def __init__(self):
    AristocratSolver.__init__(self)
    self.cipher = Patristocrat()

  def display(self):
    data = ""
    ct = shared.breakblocks(Cipher.encrypt(self.cipher), 5)
    ct = shared.breaklines(ct, self.maxlinelen).split("n")
    pt = shared.breakblocks(self.cipher.decrypt(), 5)
    pt = shared.breaklines(pt, self.maxlinelen).split("n")

    for index in range(len(ct)):
      data += ct[index] + "n"
      data += pt[index] + "nn"
    print data.strip("n")

  def frequency_list(self, length = 1, text = ""):
    text = Cipher.encrypt(self.cipher, text)
    self.print_counts(shared.calc_graphs(text, int(length)))

Lets go through each of the three methods in the class

  def __init__(self):
    AristocratSolver.__init__(self)
    self.cipher = Patristocrat()

First we have the PatristocratSolver.__init__ method. This calls the AristocratSolver.__init__ method so it can inherit all its functionality. We then change the self.cipher to be a new instance of the Patristocrat class.

  def display(self):
    data = ""
    ct = shared.breakblocks(Cipher.encrypt(self.cipher), 5)
    ct = shared.breaklines(ct, self.maxlinelen).split("n")
    pt = shared.breakblocks(self.cipher.decrypt(), 5)
    pt = shared.breaklines(pt, self.maxlinelen).split("n")

    for index in range(len(ct)):
      data += ct[index] + "n"
      data += pt[index] + "nn"
    print data.strip("n")

The display method gets the text from the Cipher.encrypt method and then breaks it into five character blocks. We use the shared.breaklines method again to break the lines if they are too long. We also break the decrypted text into blocks and then display the ciphertext and plaintext together.

  def frequency_list(self, length = 1, text = ""):
    text = Cipher.encrypt(self.cipher, text)
    self.print_counts(shared.calc_graphs(text, int(length), False))

Finally, we have the frequency_list method. The only thing that is different from the AristocratSolver.frequency_list is that we are not splitting the text at the spaces or line breaks.

That is it for the Patristocrat and PatristocratSolver classes. We saved a lot of work by inheriting from the Aristocrat classes. Now lets modify “sandbox.py” and take her for a spin.

Contents of sandbox.py (Replace existing contents)

from patristocrat import PatristocratSolver
solver = PatristocratSolver()
solver.cipher.text = 'GSRHRHZGVH GLUGSVVNVITVMXBYIL ZWXZHGHBHGVN'
solver.solve()

So we just set everything to use the Patristocrat classes instead. I’ve thrown a little ciphertext in there with spaces so you can see that they will be stripped out by the filters. Lets see how this looks in python:

> d

GSRHR HZGVH GLUGS VVNVI TVMXB YILZW XZHGH BHGVN
----- ----- ----- ----- ----- ----- ----- -----

> f

G:6, H:6, V:6, Z:3, B:2, I:2, L:2, N:2, S:2, R:2, X:2

> s VNVITVMXB emergency

GSRHR HZGVH GLUGS VVNVI TVMXB YILZW XZHGH BHGVN
----- ---e- ----- eemer gency -r--- c---- y--em

> k

ct: ABCDEFGHIJKLMNOPQRSTUVWXYZ
pt: -y------r---nm-----g-e-c--

pt: -------------------cegmnry
ct: ACDEFGHJKLOPQRSUWYZXVTNMIB

>

It works! You can still use all the AristocratSolver methods like keys. Our framework has been completed and we’ve seen how to create two different cipher classes and interactive solvers. They can be extended to fit any solving style or cipher type. In Part IV of this series we will bring all the code together into a complete interactive solver that allows you to select which solver class you would like to use. We will also add documentation functionality to the classes so that the solver can tell you all about the functions and parameters without us having to constantly look at the source code or memorize everything.

Complete Source Code

You can download the complete source code to Part III at: Creating an interactive cryptogram solver (Part III – Source).

Related Posts:

Creating an interactive cryptogram solver (Introduction)
Creating an interactive cryptogram solver (Part I)
Creating an interactive cryptogram solver (Part II)
Creating an interactive cryptogram solver (Part IV)

One thought on “Creating an interactive cryptogram solver (Part III)”

  1. CODE PENGUIN –

    Nice job!

    Now, to turn this into an automatic solver, you need a way of scoring deciphered text – the method used by ANCHISE (Mike Cowan) is quite good. The auto solve process sets the CT key to ABC…Z, Initially set the PT key to the same, then randomly swap any two letters in one of the keys. If the swap results in an improved score, keep it. Otherwise, try a different swap..

    For the Aristocrat, it makes a lot of sense to use a 27 letter alphabet, with the word divisor (space) as the 27th letter. I have also been exploring the benefits of having a 28th letter for the apostrophe, but that is not as significant as the word divisor.

    Best wishes with your project.

    Happy Thanks giving!

    GGMA

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s