# Hands-on Project: Breaking Ciphers

September 21, 2018

In this project, you will write C++ programs to decipher secret messages! The secret messages were created by encrypting the original message with a cipher. A cipher takes the original message, and a secret key, and produces an encrypted message. Normally, the secret key is required for decryption, i.e. to get the original message back from the encrypted version. Hence the encrypted message is supposed to be safe, in the sense that only people with knowledge of the secret key can reconstruct the original message. However, your task will be to break the cipher. That is, figure out the message, without the secret key!

You will encounter the following three ciphers in this project — Caesar Cipher, Shift Cipher and Vigenere Cipher. Breaking the first two ciphers should be quite easy, but they will make sure you’re understanding the concept of ciphers, encryption and decryption. Breaking the Vigenere Cipher is what most of the project is about.

The project will give you a chance to practice almost all the C++ skills you have acquired over the course, including file input-output, custom sorting, strings and string methods, maps, functions and loops.

Note: You will need to have C++ installed on your computer to complete this project. If you haven’t installed C++ yet, see installation instructions here.

Let’s get started!

# Caesar Cipher

Unlike almost all other ciphers, the Caesar cipher does not have a secret key that’s needed to decode the message. Instead, the “secret” in the Caesar cipher is how the cipher works.

## Encryption

In the Caesar cipher, for encryption, each letter of a message is replaced by the letter three steps to the right of it in the alphabet. For example, A is replaced by D, B by E, etc. The shift “wraps” around the alphabet so that X is replaced by A, Y is replaced by B, and Z is replaced by C.

As an example, here’s the encryption of the message CAESARMESSAGE using the Caesar cipher:

plaintext: CAESARMESSAGE
C → F
A → D
E → H
S → V
A → D
R → U
M → P
E → H
S → V
S → V
A → D
G → J
E → H
ciphertext: FDHVDUPHVVDJH

## Decryption

Now, think about how you are going to decrypt a message given the encrypted message (also known as ciphertext).

If you feel confident, try writing the program for decryption on your own and not looking at the rest of the hints (or use as little as needed). You can test you code by trying to decrypt the above message: FDHVDUPHVVDJH and see if you get back CAESARMESSAGE.

## Hints

Hint 1:

Try to think how will you shift the letters left by 3 steps. Using their ASCII values can really help you.

Hint 2:

Use modulus operator to help you wrap around. For example if you shift A to left by 3 steps, it result should be X.

## Quiz

Once you’re done with coding, take the quiz (at the bottom of the write-up)!

# Shift Cipher

Shift ciphers are a more general form of the Caesar cipher. The Caesar cipher is traditionally meant to be a right shift by three for encoding and a left shift by three for decoding.

## Encryption

In a Shift cipher, any shift amount can be used. The shift is the secret key.

As an example, let’s encrypt the message SHIFTMESSAGE using a shift of 13.

plaintext: SHIFTMESSAGE
S → F
H → U
I → V
F → S
T → G
M → Z
E → R
S → F
S → F
A → N
G → T
E → R
ciphertext: FUVSGZRFFNTR

## Breaking the Cipher!

In the Shift cipher, any shift amount can be used. This means that given a ciphertext, there are 26 possible plaintexts if the secret key (shift) is not known!

One easy way to breaking the Shift cipher is, try every possible shift. Inspect all the 26 possible plaintexts, and see which one you can spot English words in!

Write the program for decryption. You can test you code by trying to decrypt the above message: FUVSGZRFFNTR and see if you are easily able to spot SHIFTMESSAGE as one of the 26 plaintexts.

## Quiz

Once you’re done with coding, take the quiz (at the bottom of the write-up)!

# Vigenère Cipher

Vigenère’s cipher was invented in the 16th century and was considered secure until well into the twentieth century despite attacks being developed in the 19th century by the British mathematician Charles Babbage and the German cryptographer Friedrich Kasiski.

## Encryption

The steps of Vigenère’s cipher are as follows:

1. Choose a keyword (the keyword is the secret key).
2. Convert both the plaintext and key to integers based upon the alphabetic position of the letters (e.g. A = 0, B = 1, …, Z = 25).
3. Repeat the key if necessary so that it is at least as long as the plaintext to be sent.
4. Add the values of the (repeated) key to the values of the plain text and take the modulus of them mod 26.*
5. Convert the numbers back to letters.

## Encryption Example

Let’s try an example using VECTOR as our key:

Plaintext:  THISISATEST → 19 07 08 18 08 18 00 19 04 18 19
Key:        VECTORVECTO → 21 04 02 19 14 17 21 04 02 19 14
————————————————————————————————
+             40 11 10 37 22 35 21 23 06 37 33
(mod 26)      14 11 10 11 22 09 21 23 06 11 07
Ciphertext:               O  L  K  L  W  J  V  X  G  L  H 

This encryption has rendered the original plaintext THISISATEST to the ciphertext OLKLWJVXGLH simply by shifting each letter by its corresponding letter in the key VECTOR.

If we have spaces or other characters in the text, you can assume that Vigenère’s cipher ignores them. Hence, the encryption of THIS IS A TEST. is done in the following way:

Plaintext:  THIS IS A TEST. → 19 07 08 18 _ 08 18 _ 00 _ 19 04 18 19 .
Key:        VECT OR V ECTO  → 21 04 02 19   14 17   21   04 02 19 14
————————————————————————————————
+                 40 11 10 37   22 35   21   23 06 37 33
(mod 26)          14 11 10 11   22 09   21   23 06 11 07
Ciphertext:                   O  L  K  L  _ W  J  _ V  _ X  G  L  H  .

That is, the ciphertext for THIS IS A TEST. is OLKL WJ V XGLH. Note that alphabets in the keyword are not skipped when a space ( ) or dot (.) is encountered in the text. Hence, letter O from the keyword VECTOR is used (not R) with letter I of the plaintext above.

## Breaking the Cipher

Now, you need to break the Vigenere cipher.

In particular, you can download the encrypted text from this file. Here are the first few lines from the file:

KPG ZTFUNIQ VZ TUZA DVOB ZJVC PALYVKVLD ZQYIPZUTGVVVSY RQI MLSARZRTNF DLUNRE
FQVRE GGHRJ RK ZCEK EAAWAHBCH IVSZYIAX EKAH ILHLYEP FRPVOHN. Z KFZC SAUAU
MCJH JWTVW OK IGJMNM TF EJ EKHEIAX, IPK TYH HSJYQCGZWP AABHS XMTQTUVZ VV BV
DREXVZG:
GYIV VNV SJVQBZ CBLTF OAMH XS KNZY JFVFLRWXQPW PDAMP BJPNXV MENCQN GF PKT IE
RSI JVRE VJ AQTEKLRIQ UMRQ KW DLLZHAI. RUMT BEM RLRJRS GMHXD VEDGUT JR RYAU
UNAFKGUT DLXGFVQF VE WPL LZIJ MQ FGRRCG CU IEVUMPNFIBE! - ZCSPY OJMEUFOA

You also know that the key used to encrypt the text is one out of the ~1000 keys listed in this file.

Given the above information, your task is to figure out the original text and the key used for encryption.

## Strategy for breaking Vigenère Cipher

The strategy for breaking Vigenère Cipher is quite simple. In the Shift cipher case, there were 26 possible keys. So, you could look at them manually and see which one looks like English.

In this case, there are 1000 possible keys. So, instead of manually identifying which one looks like English, we need to write a function which tells us if a given text looks like English.

One strategy for doing this is by comparing the number of times each alphabet appears in the text. If the text is in English, we would expect the letters that appear more commonly in English to appear more commonly in the text.

Here’s the frequency percentage of all the letters in English language:

E  12.02
T  9.10
A  8.12
O  7.68
I  7.31
N  6.95
S  6.28
R  6.02
H  5.92
D  4.32
L  3.98
U  2.88
C  2.71
M  2.61
F  2.30
Y  2.11
W  2.09
G  2.03
P  1.82
B  1.49
V  1.11
K  0.69
X  0.17
Q  0.11
J  0.10
Z  0.07

Based on the above information, we can define mismatch, which tells you how much the text letter frequency differs from English letter frequency.

Here’s one possible way to define mismatch:

mismatch = | frequency of A in text - expected frequency of A |
+ | frequency of B in text - expected frequency of B |
+ ...
+ | frequency of Z in text - expected frequency of Z |

where | $x$ | is the absolute value of $x$.

Although it is not guaranteed that the mismatch will be minimum when the text is decrypted with the correct key, this is indeed the case for the given encrypted text and the list of possible keys. In fact, the mismatch for the correct key was by far the lowest.

In general, when breaking ciphers, it is common to narrow down the possibilities using a technique such as the above, and then still inspect things manually.

Now you have all the information to breaking the Vigenère Cipher. Go ahead, write the code. This one is more involved that the previous two ciphers.

## Hints

Here’s are the helper functions used in the official solution:

string decrypt(string text, string key) — decrypts text as per given keyword

map <char, double> read_frequency_table() — reads letter frequency table from file

map <char, double> calculate_frequency(string text) — calculate letter frequency given text

double mismatch(string text) — use the above two frequency tables to calculate mismatch

string read_file(string filename) — for reading the encrypted text from a file. uses getline() to read each line, and then concatenates all the lines

## Quiz

Once you’re done with coding, take the quiz (at the bottom of the write-up)!

# Congratulations!

Congratulations on completing the final project of the C++ course! Hope you had fun breaking the ciphers, and uncovering the secret messages!

# Official solutions

Official solutions for all three ciphers are available at the end of the quiz.

# Bonus: Challenge Question

You broke the Vigenere Cipher given a list of ~1000 possible keys.

What if you were not given a set of keys? The only information you were given, is that the key is of a reasonable length (say less than 100 characters). Can you write a program that figures out the key using frequency analysis even if that is the only constraint provided to you?

Hint (don’t see this unless absolutely necessary. all the fun is in figuring out the solution):

First figure out the length of the key. Then figure out each alphabet of the key.

If you solve this problem (without hints or Googling), write to me at keshav@commonlounge.com. I would be happy to hear from you!