I have several encrypted files. Your task is to recover the original text files by finding the keys that were used to encrypt them. Approach the project the way a hacker would.
You can find the encrypted files t0 e and t1 e in that directory. The files were encrypted with different keys, but all keys have the same length: 4 ASCII characters. From some inside information, you already know the following:
These files (and also longtest e) were encrypted using a simple but specially designed encryption program written by us. The source code of the decryption program, decrypt.c, can be obtained from that directory. You may revise it and include it as a function in your program so that you can decrypt with a guessed key. The source code and executable for encryption, called encrypt.c and encrypt, can be found in the same directory. Please take a look at the "example" file in the directory and see how you can encrypt and decrypt it.
The length of each key used for encryption of these three files is exactly 4 ASCII characters. These 4 characters are drawn from the set {a,...,z, A,...,Z} (English letters) and {0,...,9}. Because the key length is short, your program can try all possible combinations. The number of possible combinations is 62^4 = 14,776,336.
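One way to walk the whole key space is to treat each key as a 4-digit number in base 62. The sketch below is only an illustration: the names `ALPHABET`, `key_space_size` and `index_to_key` are made up here and do not come from decrypt.c, and the ordering of the alphabet (lowercase, uppercase, digits) is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* The 62 allowed key characters. The ordering is an arbitrary choice. */
static const char ALPHABET[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789";

enum { ALPHABET_SIZE = 62, KEY_LEN = 4 };

/* Number of candidate keys: ALPHABET_SIZE raised to KEY_LEN (62^4). */
unsigned long key_space_size(void)
{
    unsigned long total = 1;
    for (int i = 0; i < KEY_LEN; ++i)
        total *= ALPHABET_SIZE;
    return total;
}

/*
 * Convert an index in [0, 62^4) to the corresponding key by writing the
 * index in base 62, most significant digit first. The key is stored as a
 * NUL-terminated string of KEY_LEN characters.
 */
void index_to_key(unsigned long index, char key[KEY_LEN + 1])
{
    assert(index < key_space_size());
    for (int i = KEY_LEN - 1; i >= 0; --i) {
        key[i] = ALPHABET[index % ALPHABET_SIZE];
        index /= ALPHABET_SIZE;
    }
    key[KEY_LEN] = '\0';
}
```

A driver would loop an index from 0 up to `key_space_size()`, call `index_to_key`, decrypt with that key, and test the result. Nested loops over the four positions would work equally well; the index form is simply easier to split across several runs.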
The original plaintext is written in English. It may contain numbers, mathematical notation, lowercase or capital letters, and characters such as SPACE, LF, ", ', -, :, etc., as an ordinary English article or report would. It is hard for anyone to know in advance which characters appear in the texts and which do not, so you will probably have to come up with your own analysis scheme in your program to guess. You can find the ASCII table on the course web page for your reference. The text is ordinary English, so it has most of the properties of English. As an example of what the original plaintext may look like, see the "example" file in the directory. Please take a look and try to decrypt example e.
The original text and the encrypted one have the same length.
Now you may try all the different combinations of keys to recover the original files t0 and t1. It is very important that your programs be as fast as possible. Try to minimize the number of disk I/O operations. Your programs must run in the school UNIX environment.
You can include decrypt.c in your program as a function. However, the original decrypt.c is not efficient enough for this purpose, because it reads from the hard disk character by character while decrypting. Instead, you can read the whole file into a buffer before calling the decrypt function. Change decrypt.c so that your program runs faster, with the minimum number of disk I/O operations.
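Reading the ciphertext once might look like the sketch below. It uses only standard C library calls. The function name `read_whole_file` is hypothetical, and how the buffer is then passed to the revised decrypt function depends on decrypt.c, which is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read an entire file into a heap buffer with a single fread call.
 * On success, returns the buffer (the caller must free it) and stores
 * the number of bytes in *len. Returns NULL on any error.
 */
unsigned char *read_whole_file(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    /* Allocate one extra byte so an empty file still gets a valid buffer. */
    unsigned char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buf); return NULL; }

    *len = got;
    return buf;
}
```

With the ciphertext in memory, each guessed key can be applied to the same buffer, writing into a second scratch buffer of equal size. The disk is then touched once for input and once for the recovered plaintext.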
You may try the encryption program (encrypt) yourself; it is in the same directory as decrypt.c. See the Readme in that directory for how to use it.
You guess a key, use it to decrypt the message, and then analyze whether the decrypted message could be English text. One suggestion (which may not be the best) for the analysis part is to decide whether the decrypted characters follow the distribution of characters in an English article. Think for yourself about whether you can find an efficient way to make this determination. For example, the frequency of "spaces" can be useful information, and the dictionary file /usr/dict/words might be useful as well. Write down your technique in the report. Links to the ASCII table and the English letter distribution can be found on the course web page.
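A minimal sketch of one such test follows, using the space-frequency idea. It rests on two assumptions that are not stated in the assignment: that the plaintext is 7-bit ASCII, and that spaces make up roughly 10% to 25% of English text. The function name and both thresholds are illustrative and should be tuned against the "example" file.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED thresholds for the share of spaces in English text. */
#define MIN_SPACE_PERCENT 10
#define MAX_SPACE_PERCENT 25

/*
 * Return 1 if the buffer could plausibly be English text, 0 otherwise.
 * The scan stops at the first byte that is neither printable ASCII nor
 * common whitespace, so a bad candidate is rejected at that byte.
 */
int looks_like_english(const unsigned char *text, size_t n)
{
    assert(text != NULL || n == 0);

    size_t spaces = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned char c = text[i];
        if (c == ' ')
            ++spaces;
        else if (c == '\n' || c == '\r' || c == '\t')
            continue;               /* ordinary whitespace is fine */
        else if (c < 0x20 || c > 0x7e)
            return 0;               /* not plain ASCII text: reject now */
    }
    if (n == 0)
        return 0;

    /* Integer comparison of spaces/n against the percentage bounds. */
    return spaces * 100 >= (size_t)MIN_SPACE_PERCENT * n &&
           spaces * 100 <= (size_t)MAX_SPACE_PERCENT * n;
}
```

The early return matters more than the exact thresholds, since a candidate that produces a non-text byte is dropped without scanning the rest of the buffer. Survivors of this cheap filter can then go through a slower check, such as letter frequencies or a lookup in /usr/dict/words.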
A longer file, longtest e, is also in the directory. It was encrypted using the same encryption program, encrypt, with a key of 10 characters.
Hint: look into the decryption source program and see how you can avoid trying all combinations of keys. Can you reduce the 62^10 complexity to something like 10 × 62?
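The hint suggests that the key characters can be attacked one at a time. The sketch below shows the general shape of such an attack under a strong assumption that must be checked against decrypt.c: that the plaintext byte at position i depends only on the ciphertext byte at i and on key character i mod 10. The function `decrypt_byte` is a hypothetical stand-in (a plain XOR) and is not the actual routine. The scoring weights and the names `score_column` and `recover_key` are likewise illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

static const char ALPHABET[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789";

/* ASSUMPTION: stand-in for the per-byte step in decrypt.c (XOR here). */
unsigned char decrypt_byte(unsigned char c, char k)
{
    return (unsigned char)(c ^ (unsigned char)k);
}

/*
 * Score the bytes at positions pos, pos + key_len, pos + 2*key_len, ...
 * when decrypted with candidate character k. Higher means more text-like.
 */
long score_column(const unsigned char *cipher, size_t n,
                  size_t pos, size_t key_len, char k)
{
    long score = 0;
    for (size_t i = pos; i < n; i += key_len) {
        unsigned char p = decrypt_byte(cipher[i], k);
        if (p == ' ')
            score += 3;
        else if (isalpha(p))
            score += 2;
        else if (isprint(p) || p == '\n')
            score += 1;
        else
            score -= 20;            /* byte not expected in plain text */
    }
    return score;
}

/*
 * Choose the best-scoring character for each key position independently:
 * key_len * 62 column scans instead of 62^key_len full decryptions.
 * key must have room for key_len + 1 characters.
 */
void recover_key(const unsigned char *cipher, size_t n,
                 size_t key_len, char *key)
{
    assert(cipher != NULL && key != NULL && key_len > 0);
    for (size_t pos = 0; pos < key_len; ++pos) {
        long best = LONG_MIN;
        for (size_t a = 0; a < sizeof ALPHABET - 1; ++a) {
            long s = score_column(cipher, n, pos, key_len, ALPHABET[a]);
            if (s > best) {
                best = s;
                key[pos] = ALPHABET[a];
            }
        }
    }
    key[key_len] = '\0';
}
```

For a 10-character key this is 10 × 62 = 620 column scans, compared with 62^10 (about 8.4 × 10^17) full decryptions. If decrypt.c mixes key characters across positions, the columns are no longer independent and the decomposition has to follow whatever structure the source actually reveals.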