Julius Caesar encrypted some of his correspondence. He would shift each letter by 3: A became D, B became E, C became F, and so on. Letters at the end of the alphabet would wrap around to the beginning: W became Z, but then X became A, Y became B, and Z became C.
To decode an encrypted message, you have to shift the other direction.
Here is a short message (penned by J. K. Rowling):
I SOLEMNLY SWEAR THAT I AM UP TO NO GOOD.
And here is the encrypted version produced by applying the Caesar cipher:
L VROHPQOB VZHDU WKDW L DP XS WR QR JRRG.
We provide a module containing functions that encrypt and decrypt using the Caesar cipher: caesar_cipher.py. This program uses ord to convert a letter to a number and chr to convert a number to a letter. You're going to need to do very similar things in A2, so please study that code.
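To make the ord/chr idea concrete, here is a minimal sketch of a Caesar shift. It is not the contents of caesar_cipher.py: the function names caesar_shift, caesar_encrypt and caesar_decrypt are made up for this illustration, and the sketch assumes that only uppercase letters are shifted while every other character is left alone.

```python
def caesar_shift(text, shift):
    """Return text with each uppercase letter moved shift places along the
    alphabet, wrapping around at Z. All other characters are left unchanged
    (an assumption made for this sketch).
    """
    result = ''
    for ch in text:
        if 'A' <= ch <= 'Z':
            # ord turns the letter into a number; subtracting ord('A')
            # gives 0 for A, 1 for B, ..., 25 for Z.
            offset = ord(ch) - ord('A')
            # % 26 handles the wrap-around; chr turns the number back
            # into a letter.
            result += chr((offset + shift) % 26 + ord('A'))
        else:
            result += ch
    return result


def caesar_encrypt(text):
    """Shift each letter forward by 3."""
    return caesar_shift(text, 3)


def caesar_decrypt(text):
    """Shift each letter back by 3."""
    return caesar_shift(text, -3)


print(caesar_encrypt('I SOLEMNLY SWEAR THAT I AM UP TO NO GOOD.'))
# L VROHPQOB VZHDU WKDW L DP XS WR QR JRRG.
```

Decrypting is the same shift in the other direction, which is why both helpers share one function.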
The Caesar cipher is a terrible encryption algorithm because it is so easy to break. In A2, you're going to write one that is much harder to break.
The goal of this assignment is to gain practice with some important Python and computer science concepts:
You'll be implementing a program to encrypt and decrypt text using an encryption algorithm. Such an algorithm describes a process that turns plaintext into what looks like gibberish (we'll call this ciphertext), and decrypts the ciphertext back into plaintext. For example, your program will be able to take a secret message written in ciphertext, and figure out what it says. ... Like:
OXXIKQCPSZXWW
Hmmm. How can that ciphertext be decrypted, and what does it say? You'll have to do the assignment to find out!
The encryption algorithm carries out encryption and decryption using a deck of cards[1]. However, we're going to make two simplifications to a regular card deck. First, rather than use a standard deck of 52 cards, we're going to use 28 cards. Second, rather than use actual card ranks and suits, we're going to use the numbers from 1 to 28. We will call 27 and 28 the jokers. We're doing this because it's not necessary to know anything about playing cards to understand this algorithm, so we might as well use integers rather than card names.
The algorithm is an example of a stream cipher. What that means is that every time you complete a round of the algorithm, you get one keystream value. This stream of values is then used, in combination with plaintext or ciphertext, to encrypt or decrypt, respectively. You will complete one round of the algorithm for each letter in the text to be encrypted or decrypted.
Each round of the algorithm consists of (one or more repetitions of) five steps; once all steps in a round are complete, one keystream value is available and the next round of keystream generation can begin.
Here is the algorithm; an example follows.
Begin with a deck of cards, which is really just a permutation of the integers from 1 to 28. The steps for one round are as follows:
To generate another keystream value, we take the deck as it is after step 5 and run another round of the algorithm. We need to generate one keystream value for each character in the text to be encrypted or decrypted.
Let's go through an example of how the algorithm generates keystream values. Consult the descriptions of the steps given above as you follow along with this example. We'll illustrate the first round of the algorithm, but we encourage you to do another round by hand so you really understand what's happening.
Consider the following deck. The top card has value 1. The bottom card has value 26.
Top Bottom
1 4 7 10 13 16 19 22 25 28 3 6 9 12 15 18 21 24 27 2 5 8 11 14 17 20 23 26
Step 1: Swap 27 with the value following it. So, we swap 27 and 2:
1 4 7 10 13 16 19 22 25 28 3 6 9 12 15 18 21 24 2 27 5 8 11 14 17 20 23 26
^^^^
Step 2: Move 28 two places down the list. It ends up between 6 and 9:
1 4 7 10 13 16 19 22 25 3 6 28 9 12 15 18 21 24 2 27 5 8 11 14 17 20 23 26
^^^^^^
Step 3: Do the triple cut. Everything above the first joker (28 in this case) goes to the bottom of the deck, and everything below the second (27) goes to the top:
5 8 11 14 17 20 23 26 28 9 12 15 18 21 24 2 27 1 4 7 10 13 16 19 22 25 3 6
^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Step 4: The bottom card is 6. The first 6 cards of the deck are 5, 8, 11, 14, 17, and 20. They go just ahead of 6 at the bottom end of the deck:
23 26 28 9 12 15 18 21 24 2 27 1 4 7 10 13 16 19 22 25 3 5 8 11 14 17 20 6
^^^^^^^^^^^^^^^
Step 5: The top card is 23. Thus, our generated keystream value is the card at position 23, where the top card is at position 0. This card has value 11. Since 11 isn't 27 or 28, we are done with the round.
As a self-test, you should carry out the next round of the algorithm to find the second keystream value. Its value is 9; be sure you get the same value to convince yourself that you understand the five steps.
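If you want to check your by-hand work, the round traced in the example can be sketched in Python as follows. This is only one possible reading of the five steps, reconstructed from the worked example. The function names are made up for this sketch, and three behaviours are assumptions that the example does not exercise: a joker on the bottom of the deck swaps with the top card, a 28 used as a count or position in steps 4 and 5 is treated as 27, and the round is repeated when step 5 lands on a joker.

```python
JOKER1 = 27
JOKER2 = 28


def move_down(deck, card):
    """Swap card with the card below it. Assumption: a card on the
    bottom swaps with the top card."""
    i = deck.index(card)
    j = (i + 1) % len(deck)
    deck[i], deck[j] = deck[j], deck[i]


def triple_cut(deck):
    """Step 3: swap the cards above the first joker with the cards below
    the second joker."""
    first = min(deck.index(JOKER1), deck.index(JOKER2))
    second = max(deck.index(JOKER1), deck.index(JOKER2))
    deck[:] = deck[second + 1:] + deck[first:second + 1] + deck[:first]


def insert_top_to_bottom(deck):
    """Step 4: move as many cards as the bottom card's value from the top
    to just above the bottom card. Assumption: 28 counts as 27."""
    count = min(deck[-1], JOKER1)
    deck[:] = deck[count:-1] + deck[:count] + deck[-1:]


def next_keystream_value(deck):
    """Run rounds until step 5 produces a value that is not a joker."""
    while True:
        move_down(deck, JOKER1)        # step 1
        move_down(deck, JOKER2)        # step 2: two places down
        move_down(deck, JOKER2)
        triple_cut(deck)               # step 3
        insert_top_to_bottom(deck)     # step 4
        index = min(deck[0], JOKER1)   # step 5 (assumption: 28 counts as 27)
        if deck[index] not in (JOKER1, JOKER2):
            return deck[index]


deck = [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 3, 6, 9, 12,
        15, 18, 21, 24, 27, 2, 5, 8, 11, 14, 17, 20, 23, 26]
print(next_keystream_value(deck))  # 11
print(next_keystream_value(deck))  # 9
```

The functions change the deck in place, so calling next_keystream_value again continues from the deck as it was left after step 5 of the previous round.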
To encrypt a message, remove all non-letters from the message and convert any lowercase letters to uppercase. Next, convert the letters to numbers (A becomes 0, B becomes 1, ..., Y becomes 24, and Z becomes 25). Then, use the algorithm to generate the same number of values as there are letters in the message. Add the corresponding pairs of numbers, modulo 26. Finally, convert the resulting numbers back to letters.
Decryption is just the reverse of encryption. Start by converting the message to be decoded to numbers. Using the same card ordering as was used to encrypt the message, generate one keystream value for each character in the message. (Because the same starting deck of cards was used, the same keystream will be generated.) Subtract the keystream values from the message numbers, again modulo 26. Finally, convert the numbers to letters to recover the message.
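The letter arithmetic described above can be sketched like this. The names encrypt_letter and decrypt_letter and their parameters are chosen for this illustration, and this version of clean_message is just one way to remove non-letters, so treat it as a sketch rather than the required design.

```python
import string


def clean_message(message):
    """Return message with non-letters removed and letters uppercased."""
    return ''.join(ch.upper() for ch in message if ch in string.ascii_letters)


def encrypt_letter(letter, keystream_value):
    """Add keystream_value to the letter's number (A is 0), modulo 26."""
    number = ord(letter) - ord('A')
    return chr((number + keystream_value) % 26 + ord('A'))


def decrypt_letter(letter, keystream_value):
    """Subtract keystream_value from the letter's number, modulo 26."""
    number = ord(letter) - ord('A')
    return chr((number - keystream_value) % 26 + ord('A'))


print(clean_message('Up to no good!'))  # UPTONOGOOD
print(encrypt_letter('K', 17))          # B
print(decrypt_letter('B', 17))          # K
```

In Python, the % operator with a positive right operand never returns a negative result, so (number - keystream_value) % 26 already does the "add 26 if negative" adjustment.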
Let's say we want to encrypt the message Lake Hylia. Removing non-letters and capitalizing the letters gives us LAKEHYLIA. Next, convert these letters to numbers:
L A K E H Y L I A
11 0 10 4 7 24 11 8 0
Since we have nine letters, nine keystream values are required. Rather than go through nine rounds of the algorithm here, let's just assume that the nine generated keystream values are as follows:
12 8 17 25 1 14 15 13 20
Now add the two groups together pairwise, modulo 26. This means that if the sum of a pair is greater than 25, just subtract 26 from it. (For example, 0 + 8 = 8, but 10 + 17 = 27, so we subtract 26 to get 27 - 26 = 1.) Then, convert these numbers to letters to arrive at the encrypted message:
11 0 10 4 7 24 11 8 0
+ 12 8 17 25 1 14 15 13 20
--------------------------
23 8 1 3 8 12 0 21 20
X I B D I M A V U
To decrypt this message, the recipient would start with the same deck with which the encryption was started, and generate the same nine-number keystream. They would then convert the encrypted message to numbers. Then, instead of adding corresponding numbers from the keystream and message, they would pairwise-subtract the keystream from the message, modulo 26. That is, if the subtraction gives you a negative number, add 26 to the result. (For example, 8 - 8 = 0, but 1 - 17 = -16, which is negative, so you add 26 to get -16 + 26 = 10.) The decryption back to LAKEHYLIA looks like this:
23 8 1 3 8 12 0 21 20
- 12 8 17 25 1 14 15 13 20
--------------------------
11 0 10 4 7 24 11 8 0
L A K E H Y L I A
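The two tables above can be reproduced with a few lines of Python. This is a sketch only: apply_keystream and its direction parameter are invented for this illustration, and the keystream is the list of nine values assumed in the example rather than one generated from a deck.

```python
# The nine keystream values assumed in the Lake Hylia example.
KEYSTREAM = [12, 8, 17, 25, 1, 14, 15, 13, 20]


def apply_keystream(letters, keystream, direction):
    """Combine each uppercase letter with its keystream value, modulo 26.
    direction is 1 to encrypt (add) or -1 to decrypt (subtract).
    """
    result = ''
    for letter, value in zip(letters, keystream):
        number = ord(letter) - ord('A')
        result += chr((number + direction * value) % 26 + ord('A'))
    return result


ciphertext = apply_keystream('LAKEHYLIA', KEYSTREAM, 1)
print(ciphertext)                                  # XIBDIMAVU
print(apply_keystream(ciphertext, KEYSTREAM, -1))  # LAKEHYLIA
```

Encrypting and then decrypting with the same keystream gets you back where you started, which makes a handy check while you are testing.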
Of course we can never get all the way back to Lake Hylia because we don't know how to restore any lowercase letters or where to insert any spaces.
Please download the Assignment 2 Data Files and extract the zip archive. The following paragraphs explain the files you have been given.
The main program: cipher_program.py
This file contains the definition of three constants, an empty function called main, and a call on function main. More on this later.
A nearly-empty file: cipher_functions.py
You'll write almost all of the functions here. It's imported by cipher_program.py. It includes two constants: JOKER1 and JOKER2. These constants will make our code more readable. Use these constants instead of using 27 and 28 lest you incur the wrath of the people marking your program!
Deck files: deck1.txt and deck2.txt
Design note: We decided to store a deck in a text file, so we'll have a function that reads a deck file and builds a list of integers. A deck of cards will be represented in Python as a list of numbers. We provide a sample deck in deck1.txt. Notice that this file has multiple lines: a deck file can have any number of lines. Another sample file deck2.txt is a different file, but if you look closely you will notice that it represents the same deck. Your function for reading in a deck must work for both these deck files and any others that we might use for testing. In particular, the numbers might be in a different order and there could be one or more numbers on each line.
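Here is one way a deck-reading function might handle "one or more numbers on each line". The name read_deck and its parameter are hypothetical, and the sketch assumes the numbers on a line are separated by whitespace. It uses io.StringIO to stand in for open files so that it runs without deck1.txt or deck2.txt.

```python
import io


def read_deck(deck_file):
    """Return the integers in the open file deck_file as a list, in the
    order they appear. Assumption: numbers are separated by whitespace.
    """
    deck = []
    for line in deck_file:
        # split() with no argument handles any run of spaces or tabs
        # and ignores the newline at the end of the line.
        for token in line.split():
            deck.append(int(token))
    return deck


# Two differently laid out "files" that represent the same cards.
one_per_line = io.StringIO('1\n4\n7\n10\n')
several_per_line = io.StringIO('1 4\n7 10\n')
print(read_deck(one_per_line) == read_deck(several_per_line))  # True
```

Because the function loops over the tokens on every line, it does not care how many lines the file has or how many numbers are on each one.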
Some message files: message_file1.txt and secret?.txt
Messages to encrypt or decrypt will be stored in text files. Message text files contain multiple messages - one per line. Imagine that we're encrypting or decrypting a message file that contains multiple lines. The first line is encrypted (or decrypted) using the deck in the configuration as specified in a deck file. Subsequent lines of that same file are encrypted (or decrypted) starting with the configuration of the deck following the previous encryption (or decryption). That is, the deck is not reset between lines of a message file.
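The sketch below illustrates "the deck is not reset between lines". To keep it short, an ordinary Python iterator stands in for the deck: each call to next hands out the next keystream value, just as the deck would after another round. The function name encrypt_lines is made up for this illustration, and the keystream values are the nine assumed in the Lake Hylia example.

```python
import string


def encrypt_lines(lines, keystream):
    """Encrypt each line in turn, drawing values from the one shared
    keystream iterator so that each line continues where the previous
    line stopped.
    """
    encrypted = []
    for line in lines:
        result = ''
        for ch in line:
            if ch in string.ascii_letters:
                number = ord(ch.upper()) - ord('A')
                result += chr((number + next(keystream)) % 26 + ord('A'))
        encrypted.append(result)
    return encrypted


# Stand-in for the deck: the nine values from the Lake Hylia example.
keystream = iter([12, 8, 17, 25, 1, 14, 15, 13, 20])
print(encrypt_lines(['Lake', 'Hylia'], keystream))  # ['XIBD', 'IMAVU']
```

The second line starts with the fifth keystream value, not the first. Joined together, the two results spell XIBDIMAVU, the same ciphertext as the one-line example.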
The starter code archive contains one text file named message_file1.txt that contains two plaintext messages. The first is It does not do to dwell on dreams and forget to live. and the second is Albus Dumbledore. (From J. K. Rowling's Harry Potter series, of course.) There are also seven text files named secret?.txt (secret1.txt, secret2.txt, etc.) containing ciphertext that you can decrypt. Some of them contain multiple messages, one message per line. They were all encrypted using the deck in deck1.txt. The encrypted version of message_file1.txt is in secret1.txt.
File secret7.txt contains the ciphertext from the handout. When you successfully decrypt the message in secret7.txt using the deck in file deck1.txt, you should celebrate: do a barrel roll!
A type-checking file: typechecker.py
To help you test, we provide a type checker. Here is a revised version of the type checker (as per the Oct 15 announcement). This program calls all of the functions that appear in the table below (except for function main) and makes sure that your functions accept the proper number of parameters and return the proper type. Passing the type checker says nothing about the correctness of your functions. It only says that your functions have the correct number of parameters and that the functions return the right type of value. In order for the type checking of the file-reading functions to work correctly, you must have the files deck1.txt and secret1.txt in the same folder as the type checker.
For example, if your clean_message function has one parameter and always returns the empty string, it will pass the type checker. Passing the type checker means that our own tests will be able to call your functions properly when we go mark your assignment. That's all it means. Test carefully on your own as explained more toward the end of this handout!
We have performed a top-down design on the handout and have come up with quite a few functions. Write all but the last one in file cipher_functions.py. (The last one, main, will go in cipher_program.py.)
We present them in what we think is a sensible order, but you're welcome to jump around as you work. Notice that the functions near the top of the table don't require you to use lists or to read from a file. In fact, you already know all the Python instructions you need to complete these functions.
Please follow the function design recipe we have been using in class. We will mark your docstrings in addition to your code, so we expect the docstrings to contain a description, examples, and type contract. Wrath of the markers and all that.
We will be testing and marking each of these functions individually. So even if you can't complete the entire program and correctly encrypt and decrypt messages, you can earn marks for correctly implementing many of the functions.
It is strongly recommended that you test each function as you write it. (You might alternatively decide to just plow ahead and write all of the functions and hope everything works, but then it will be really difficult to determine whether your program is encrypting/decrypting correctly. If you get garbage when decrypting our ciphertext, then what? The bug(s) could be anywhere.)
Here is what we recommend for testing your A2 code:
When you're ready to test the algorithm overall (cipher_program.py), start with a one- or two-letter message and make sure it matches what you get by hand. Then work up to longer messages from there. Remember that when you encrypt a message file and then want to decrypt it, you need to start the decryption with the same deck that started the encryption.