# The One-Time Pad

January 17, 2018

The one-time pad is famous as being the only completely unbreakable cipher. Assuming that the secret pad is randomly generated, not-reused (hence “one-time pad”), and not leaked, it is impossible to learn a single bit of the plaintext of a message from a ciphertext.

The one-time-pad is one of the best cryptography protocols when the work must be done by hand, without the aid of a computer. This made it important in the pre-computer era, and is still useful in situations where possession of a computer is illegal or incriminating or where trustworthy computers are not available.

# Encryption and decryption with the One-Time Pad

Encryption with the one-time pad is based on the exclusive-or (XOR) operation. The exclusive-or of two identical bits (two zeros or two ones) produces a zero and the exclusive-or of two different bits (a zero and a one) produces a one.

The secret in one-time pad encryption is a secret of the same length as the message to be encrypted. So, if you want to encrypt a one megabyte message and send it to someone, you need to securely share a one megabyte message with the intended recipient.

As an example, let’s encrypt the message ONETIME using a one-time pad. A common method of encoding letters in binary is the America Standard Code for Information Interchange (ASCII) protocol. The table below shows the binary encodings of each capital and lowercase letter and space.

ASCII binary encodings of each capital and lowercase letter and space.

In ASCII, ONETIME is encoded as

Message: ONETIME
O: 01001111
N: 01001110
E: 01001010
T: 01010100
I: 01001001
M: 01001101
E: 01001010
Encoded: 01001111 01001110 01001010 01010100 01001001 01001101 01001010

For the example, we’ll use the following as the one-time pad:

11010111 11100101 10001111 00110000 10100010 00001010 01000000

To encrypt the message using this pad, we exclusive-or each bit of the message with each bit of the one-time pad.

Plaintext:    01001111 01001110 01000101 01010100 01001001 01001101 01000101
OneTimePad: ^ 11010111 11100101 10001111 00110000 10100010 00001010 01000000
Ciphertext: = 10011000 10101011 11001010 01100100 11101011 01000111 00000101

Decryption process is same as encryption, i.e. we perform the exclusive-or of the ciphertext and the one-time pad, which gives us the original message. This works because X^X = 0 (i.e. the exclusive-or of something with itself is 0).

Ciphertext:   10011000 10101011 11001010 01100100 11101011 01000111 00000101
OneTimePad: ^ 11010111 11100101 10001111 00110000 10100010 00001010 01000000
Plaintext:  = 01001111 01001110 01000101 01010100 01001001 01001101 01000101

# Why a “One-Time” Pad? Issues with re-use

Since the one-time pad requires a secret key the same length as the message to be encrypted, it seems like it would be more logical to generate a large, random key and use it for multiple encryptions. However, it’s called a “one-time” pad for a reason. Use of the same pad for multiple encryptions breaks the security of the one-time pad protocol and can reveal the secret key and plaintexts.

Obviously, knowledge of a plaintext and its ciphertext reveals the secret key in a one-time pad. If an attacker can get someone to encrypt a message using a pad that is reused, the attacker can trivially learn the key and all other messages. For the remainder of this section, we will assume that the attacker does not know the messages sent but has learned that they are ASCII-encoded letters and the space character.

As an example, let’s try encrypting multiple messages with the same one-time pad. The messages that we’ll use are “The first message”, encoded in ASCII as

01010100 01101000 01100101 00100000 01100110 01101001
01110010 01110011 01110100 00100000 01101101 01100101
01110011 01110011 01100001 01100111 01100101

and “The second message”, encoded in ASCII as

01010100 01101000 01100101 00100000 01110011 01100101
01100011 01101111 01101110 01100100 00100000 01101101
01100101 01110011 01110011 01100001 01100111 01100101

For the example, we’ll use the one-time pad

11011001 11111010 00110111 10111101 01000101 10010000
10011011 10000100 11001101 00111001 01000111 00111111
01110010 01011001 11010110 10000000 00010010 10111001 

In this example, the encryption of the first message becomes

10001101 10010010 01010010 10011101 00100011 11111001
11101001 11110111 10111001 00011001 00101010 01011010
00000001 00101010 10110111 11100111 01110111

The encryption of the second message becomes

10001101 10010010 01010010 10011101 00110110 11110101
11111000 11101011 10100011 01011101 01100111 01010010
00010111 00101010 10100101 11100001 01110101 11011100

Now, let’s consider what happens to a one-time pad when two ciphertexts are exclusive-ored together. Here, the messages will be m1 and m2, the ciphertexts c1 and c2 and the one-time pad will be p. Encryption of the first message is c1=m1^p, where ^ represents the XOR operation. Encryption of the second message is c2=m2^p. Exclusive-oring the two ciphertexts is c1^c2=m1^p^m2^p. Since exclusive-or is commutative and exclusive-oring something with itself results in a zero, this simplifies to c1^c2=m1^m2. The relationship between the two ciphertexts is the same as the relationship between the two plaintexts, which can be useful in attacking messages with a known format.

## Flaw 1: Zeroes in the combined ciphertext

The exclusive-or of the two example ciphertexts is

00000000 00000000 00000000 00000000 00010101 00001100
00010001 00011100 00011010 01000100 01001101 00001000
00010110 00000000 00010010 00000110 00000010

The first four characters of the messages, “The ”, are the same, causing the first four bytes of the combined ciphertext to be zero. An all-zero byte of combined ciphertext indicates a shared letter between the plaintext and ciphertext, which leaks information about the messages.

## Flaw 2: Narrowing down options using ciphertext

Looking at the next byte of the combined ciphertext, ’00010101’, we can learn more about the possible values of the messages. From this, we learn that the exclusive-or of the fifth bytes of the messages is 21. Since we know that the message is all ASCII-encoded letters or spaces, we can determine that the letter cannot be H, I, J, K, N, U, or Z. The reason for this is that the exclusive-or of these values is outside the range of possible values in our alphabet (letters and spaces). For example, H in ASCII is 01001000 or 72. The exclusive-or of 72 and 21 is 93, which is ] in ASCII. As this is not in our current alphabet, neither message can have an H as its fifth letter.

Being able to determine this solely from ciphertext is less than ideal since we’ve narrowed the options for the letter from 26 to 19. By doing the same analysis on each letter and testing combinations, it may be possible to extract words of the message and determine pieces of the secret key.

## Flaw 3: Space reveals a lot

Finally, let’s look at what happens when a message is encoded in ASCII and contains a space. For example, let’s look at the tenth byte of the combined ciphertext, ’01000100’, which is the ASCII encoding of ‘D’. The tenth letter of the second message (the one without a space at this location) is ‘d’. This is not a coincidence. The ASCII value of the space character is 32, which is the XOR of a lowercase letter and its capital counterpart. Reusing a one-time pad when encrypting an ASCII-encoded message with spaces reveals a letter of the other message and a byte of the pad (since the pad byte is the XOR of a ciphertext byte and the corresponding byte of plaintext).

The main disadvantage of encryption with the one-time pad is that it requires a pad of the same length as the message to be encrypted. Since each pad can only be used once, this means that it is necessary to share a pad of the same length as the message to be shared. This pad must be shared in a completely secure method in order to protect the secrecy of the message. Doing so in real-time is illogical since the existence of a secure method to share the pad means that the message could just be sent using this method.

Therefore, the only logical use for a one-time pad is when a large quantity of key material has been shared in advance and then is used up in chunks as messages are sent.