# Cryptanalysis Background

**Cryptanalysis** is the study of cryptosystems with the objective of attacking them and decrypting codes and ciphers. The field includes rigorous mathematical investigation of encryption and decryption algorithms as well as side-channel attacks whereby flaws in implementation are exploited rather than a mathematical flaw in the algorithm itself.

Cryptanalysis generally falls into one of several categories which can be broadly considered to be **ciphertext only** (where only the encrypted output is known), **known plaintext** (where the plaintext corresponding to some given ciphertext is known), **chosen plaintext** (where the cryptanalyst may choose plaintext and receive the related ciphertext) and **chosen ciphertext attacks** (where the cryptanalyst may choose some ciphertext and receive the corresponding plaintext).

In other tutorials we have seen simple cryptanalysis such as **letter frequency analysis** in the field of affine ciphers. This considers the letter frequencies in natural language and in some given ciphertext and uses this information to reverse engineer the encryption key mapping letters to one another based on the frequency with which they occur.

# Linear & Differential Cryptanalysis

In this tutorial we will consider linear and differential cryptanalysis. These are both instances of *known plaintext attacks* where to be effective a certain amount of plaintext and its corresponding ciphertext must be known. The approaches were initially designed to aid in breaking the Data Encryption Standard (DES). In this case the fact that the algorithm was known (although the key in each case was not) enabled plaintext to be encrypted by the cryptanalyst to see the related ciphertext.

**Linear cryptanalysis** is an approach where we aim to find affine approximations to the action of a cipher. Letter frequency analysis is one of the simplest forms of linear cryptanalysis. **Differential cryptanalysis** is an approach to cryptanalysis whereby differences in inputs are mapped to differences in outputs and patterns in the mappings of plaintext edits to ciphertext variation are used to reverse engineer a key.

Linear and differential cryptanalysis are most often applied to *block ciphers* (encryption functions operating on messages that are split into blocks). They are *symmetric key algorithms*.

# Linear Cryptanalysis

The paradigm of linear cryptanalysis was originally designed in 1993 as a theoretical attack on DES. It is now used widely on block ciphers across the field of cryptanalysis and is an effective starting point for developing more complex attacks.

Linear cryptanalysis posits a linear relationship between the elements (characters or individual bits) of plaintext, the ciphertext, and the key. It therefore tries to find a linear approximation to the action of a cipher, i.e. if "ciphertext = *f*(plaintext, key)", then we are trying to find a linear approximation of *f*.

Any linear relation between the plaintext bits and ciphertext bits can be written as a chain of exclusive-or operations of the following form:

\begin{aligned}
&\ X_{i_1}\oplus X_{i_2}\oplus\cdots\oplus X_{i_u}\oplus Y_{j_1}\oplus Y_{j_2}\oplus\cdots\oplus Y_{j_v} \\
&= K_{k_1}\oplus K_{k_2}\oplus\cdots\oplus K_{k_w}
\end{aligned}

where ⊕ denotes the binary operation XOR (exclusive-OR), *X*_{i} denotes the *i*^{th} bit of the input *X* = [*X*_{1}, *X*_{2}, …], *Y*_{j} denotes the *j*^{th} bit of the output *Y* = [*Y*_{1}, *Y*_{2}, …] and *K*_{k} denotes the *k*^{th} bit of the key *K* = [*K*_{1}, *K*_{2}, …]. The sum therefore denotes the XOR ‘sum’ of *u* input and *v* output bits vs *w *private key bits.

Howard Heys (2002) summarizes the process of linear cryptanalysis well in his introductory paper “*A Tutorial on Linear and Differential Cryptanalysis*”. He says,

The approach in linear cryptanalysis is to determine expressions of the form above which have a high or low probability of occurrence. (No obvious linearity such as above should hold for all input and output values or the cipher would be trivially weak.) If a cipher displays a tendency for [the] equation [above] to hold with high probability or not hold with high probability, this is evidence of the cipher’s poor randomization abilities. Consider that if we randomly selected values for [...] bits and placed them into the equation above, the probability that the expression would hold would be exactly ½. It is the deviation or bias from the probability of ½ for an expression to hold that is exploited in linear cryptanalysis: the further away that a linear expression is from holding with a probability of ½, the better the cryptanalyst is able to apply linear cryptanalysis.

This quote tells us the fundamental paradigm of linear (and indeed differential) cryptanalysis. The cryptanalyst aims to exploit the fact that encryption is non-random, attaining information through the measurement of deviations from random behavior.

# Steps to perform Linear Cryptanalysis

In the most common use case, we assume that everything about the encryption algorithm is known apart from the private key. Performing linear cryptanalysis on a block cipher usually consists of three steps.

- Find linear approximations of the non-linear parts of the encryption algorithm (usually only the substitution boxes, known as S-boxes).
- Combine linear approximations of S-boxes with the rest of the (linear) operations done in the encryption algorithm, to obtain a linear approximation of the entire encryption algorithm. This linear approximation is a function which relates the plaintext bits, the ciphertext bits, and the bits of the private key.
- Use the linear approximation as a guide for which keys to try first. This leads to substantial computational savings over trying all possible values of the key. Multiple linear approximations may be used to further cut down the number of keys that need to be tried.

We will discuss each of these steps in further detail in the following sections.

## S-boxes (or substitution box)

The non-linearity in block ciphers is often introduced through S-Boxes (operations such as exclusive-or, bit-shifts are linear in nature). An S-Box is used to map incoming binary sequences to some output. It provides the non-linearity that builds strength and renders the affine approximation gained through linear cryptanalysis only an approximation and unable to be a true representation of the encryption.

See the example of the S-Box described below. It uses the 1st and 4th bits to determine the column and the middle two bits (bits 3 and 4) to determine the row (so the input 1110 yields 0101).

| 11 | 10 | 01 | 00

----+------+------+------+------

00 | 0000 | 0001 | 0010 | 0011

01 | 1000 | 1001 | 1111 | 1011

10 | 1100 | 1101 | 1110 | 1010

11 | 0100 | 0101 | 0010 | 0111

All the inputs and the outputs of the S-box are listed below in tabular form.

Input (I) | Output (O)

-----------+------------

0000 | 0011

0001 | 0010

0010 | 1011

0011 | 1111

0100 | 1010

0101 | 1110

0110 | 0111

0111 | 0010

1000 | 0001

1001 | 0000

1010 | 1001

1011 | 1000

1100 | 1101

1101 | 1100

1110 | 0101

1111 | 0100

## Linear Approximation of S-box

Suppose that we would like to linearly approximate the above S-box. Given this information we may consider the accuracy of various linear approximations. There are 256 linear approximations of the form

a_1I_1\oplus a_2I_2\oplus a_3I_3\oplus a_4I_4=
b_1O_1\oplus b_2O_2\oplus b_3O_3\oplus b_4O_4

where *I*_{i} and *O*_{i} denote the *i*^{th} bit of the input *I* and the output *O* respectively and *a*_{i} and *b*_{i} are all either 0 or 1. For example, the linear approximation *I*_{2} = *O*_{1} ⊕ *O*_{4} is given by *a* = 0100_{2} and *b* = 1001_{2}.

We then tabulate the number of times each is correct in a 16 x 16 table (with entries indexed by the values of *a* and *b, *so each row corresponds to the 4-bit value of *a*, and column corresponds to 4-bit value of *b*). The *informativeness* of each linear approximation is given by the number of times it is true minus 8 (to work on a scale from -8 to +8). Then we can attain the probability biases simply by dividing through by 8. For example, for *I*_{2} = *O*_{1} ⊕ *O*_{4}, this is correct 10 times leading to a bias of +0.250. Doing this for all possibilities of our proposed linear form can enable those with the largest biases to be chosen so that a full mapping from *I* to *O* can be approximated.

For situations different from S-boxes where listing all possible values of I and O is not feasible, we may calculate the bias by taking a large number of values of pairs (I, O), say 1 million.

This then ultimately yields the most accurate linear approximation of the S-box.

## Piling Up (concatenating linear approximations)

Once we have linear approximations for each part of the system, a linear approximation of the entire system can be optioned by combining all the linear approximations into one. To combine, we simply 'sum' (exclusive-or) all the equations in various combinations in-order to obtain a single equation, while eliminating intermediate variables. This is discussed in further detail below.

**Piling Up for 2 variables:**

Let us consider two random binary variables *X*_{1} and *X*_{2} and the linear relationship *X*_{1} ⊕ *X*_{2} = 0 (equivalent to *X*_{1} = *X*_{2}). Let the probability that *X*_{1} = 0 be denoted by *p* and the probability that *X*_{2} = 0 be denoted by *q*. Then if we assume that the two random variables are independent:

Pr(X_1=i,X_2=j)=\begin{cases}
p\cdot q\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{for}\ i =0, j=0\\
p\cdot(1-q)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{for}\ i =0, j=1\\
(1-p)\cdot q\ \ \ \ \ \ \ \ \ \ \ \ \ \ \text{for}\ i=1, j=0\\
(1-p)\cdot(1-q)\ \ \ \ \text{for}\ i=1, j=1
\end{cases}

Furthermore it can be shown that:

\begin{aligned}
&\ Pr(X_1\oplus X_2 = 0) \\
&= Pr(X_1=X_2) \\
&= Pr(X_1=0,X_2=0) + Pr(X_1=1,X_2=1) \\
&= pq+(1-p)(1-q)
\end{aligned}

Since the aim is to gain information upon the encryption through our analysis it is helpful to instead state the probabilities *p* and *q* in terms relative to the case of zero information gain where *p *= *q* = ½ in every case. Let us instead state *p* = ½ + ε_{1} and *q* = ½ + ε_{2}. Then we may restate our results in terms of ε_{1} and ε_{2} (known as biases).

Pr(X_1\oplus X_2=0)=\frac{1}{2}+2\epsilon_1\epsilon_2

We may therefore see that the bias for *X*_{1} ⊕ *X*_{2 }= 0 is given by ε_{1}ε_{2}.

**Piling Up for ***n* variables:

This notion can be expanded to more complex linear functions and any arbitrary number of random binary values *X*_{1} to *X*_{n} with probabilities of being 0 of *p*_{i }= ½ + ε_{i} for all *i* from 1 to *n*. Under the assumption that each binary variable is independent we may apply the piling up lemma as it is described below.

__Statement of the Piling Up Lemma__**: **For *n* independent binary variables *X*_{1}, *X*_{2}, …, *X*_{n},

Pr(X_1\oplus\cdots\oplus X_n=0)=\frac{1}{2}+2^{n-1}\prod_{i=1}^n\epsilon_i

or equivalently

\epsilon_{1,2,..,n}=2^{n-1}\prod_{i=1}^n\epsilon_i

where ε_{1,2,…,n} denotes the bias of *X*_{1} ⊕ *X*_{2} ⊕ … ⊕ *X*_{n} = 0.

In developing the linear approximation of a cipher, the X_{i} values will actually represent linear approximations of the S-boxes.

**Example:**

*Taken from (Heys, 2002)*

Consider four random binary variables *X*_{1}, *X*_{2}, *X*_{3}, *X*_{4}. Let *Pr*(*X*_{1} ⊕ *X*_{2}) = ½ + ε_{1,2} and *Pr*(*X*_{2} ⊕ *X*_{3}) = ½ + ε_{2,3} and consider the sum *X*_{1} ⊕ *X*_{3} to be derived by adding *X*_{1} ⊕ *X*_{2 }and *X*_{2} ⊕ *X*_{3} together. Hence:

Pr(X_1\oplus X_3=0)=Pr([X_1\oplus X_2]\oplus[X_2\oplus X_3])=0

In this case linear expressions are combined to yield a new linear expression. This shows the power of linear cryptanalysis in that it can be applied not only to a single S-Box but also to a series of many applied in sequence which is more often seen in cryptography.

Since we may consider the random variables *X*_{1} ⊕ *X*_{2 }and *X*_{2} ⊕ *X*_{3 }to be independent we may invoke the Piling-Up lemma to attain:

Pr(X_1\oplus X_3=0)=\frac{1}{2}+2\epsilon_{1,2}\epsilon_{2,3}

and consequently

\epsilon_{1,3}=2\epsilon_{1,2}\epsilon_{2,3}

The expressions *X*_{1} ⊕ *X*_{2} = 0 and *X*_{2} ⊕ *X*_{3} = 0 are analogous to linear approximations of S-boxes and *X*_{1} ⊕ *X*_{3} = 0 is analogous to a cipher approximation where the intermediate bit *X*_{2} is eliminated.

Hence, we first find the linear approximation for each S-Box, and then a full solution can be built by concatenating the results for each S-Box (and other operations in the encryption algorithm, such as exclusive-or operations).

## Using Linear Approximations to find Private Key

Let ε represent the bias from ½ of the probability that the linear expression for the complete cipher holds. Matsui (1994) shows that the number of known plaintexts required in the attack (N_{L}) is reasonably approximated by *N*_{L} = ε^{-2}.

**Example:**

Suppose that our encryption algorithm consists of two keys (K_{1} and K_{2}) and a single S-Box. The algorithm encrypts 4-bit values as follows:

- Take the input plaintext value (
*P*) and calculate *I* = *P* ⊕ *K*_{1}. - Pass the result I through the S-Box, to produce output O.
- Take the output from the S-Box (
*O*) and calculate ciphertext *C* = *O* ⊕ K_{2}.

We will take the keys* K*_{1} and *K*_{2} to be 0010 and 1010 respectively.

Step 1 and step 3 are already linear. We perform linear approximation of the S-box. We might keep multiple linear approximations which have high biases. We combine the linear approximations in various ways to get relationships between bits of P, C, K_{1} and K_{2}.

Given that we know values of some plaintexts and the corresponding ciphertexts (for the same key), we can plug in those values into the above linear approximations. This gives us various constraints that bits of the private key (K_{1}, K_{2}) are highly likely to satisfy. Once the number of unknown bits is small enough, we can perform a brute-force search.

# Differential Cryptanalysis

Differential cryptanalysis preceded linear cryptanalysis having initially been designed in 1990 as an attack on DES. Differential cryptanalysis is similar to linear cryptanalysis; differential cryptanalysis aims to map bitwise differences in inputs to differences in the output in order to reverse engineer the action of the encryption algorithm. It is again aiming to approximate the encryption algorithm looking to find a maximum likelihood estimator of the true encryption action by altering plaintexts or (looking at different plaintexts) and analysing the impact of changes to the plaintext to the resulting ciphertext. Differential cryptanalysis is therefore a chosen plaintext attack.

The description of differential cryptanalysis is analogous to that of linear cryptanalysis and is essentially the same as would be the case of applying linear cryptanalysis to input differences rather than to input and output bits directly.

Drawing again from the work of Howard Heys we may formulate the initial set up.

Differential cryptanalysis exploits the high probability of certain occurrences of plaintext differences and differences into the last round of the cipher. For example, consider a system with input *X* = [*X*_{1} *X*_{2} ... *X*_{n}] and output **Y** = [*Y*_{1} *Y*_{2} ... *Y*_{n}]. Let two inputs to the system be *X*’ and *X*’’ with the corresponding outputs *Y*’ and *Y*’’, respectively. The input difference is given by **Δ***X* = *X*’ ⊕ *X*’’ where ⊕ represents a bit-wise exclusive-OR of the n-bit vectors and, hence, **Δ***X* = [Δ*X*_{1} Δ*X*_{2} ... Δ*X*_{n}] where Δ*X*_{i} = *X*_{i}’ ⊕ *X*_{i}’’ with *X*_{i}’ and *X*_{i}’’ denoting the *i*^{th} bit of *X*’ and *X*’’ respectively. Similarly **Δ***Y* = *Y*’ ⊕ *Y*’’ is the output difference and , **Δ***Y* = [Δ*Y*_{1} Δ*Y*_{2} ... Δ*Y*_{n}] where Δ*Y*_{i} = *Y*_{i}’ ⊕ *Y*_{i}’’.

Again analogous to the linear cryptanalysis case an ideally randomising cipher would yield the output difference **Δ***Y* in response to the input difference **Δ***X* with probability ½^{n} where *n* is the number of bits of *X*. This therefore provides a benchmark from which bias can be denoted and hence the information gain of a given approximation can be gleaned.

Referring back to Heys’ work:

Differential cryptanalysis seeks to exploit a scenario where a particular **Δ***Y* occurs given a particular input difference **Δ***X* with a very high probability p_{D} (i.e., much greater than ½^{n}). The pair (**Δ***X*, **Δ***Y*) is referred to as a differential.

The input and output differences of the S-boxes are considered in order to determine a high probability difference pair. Combining S-box difference pairs from round to round so that the nonzero output difference bits from one round correspond to the non-zero input difference bits of the next round, enables us to find a high probability differential consisting of the plaintext difference and the difference of the input to the last round.

The subkey bits of the cipher end up disappearing from the difference expression because they are involved in both data points being differenced.

Given the differences highlighted here the analysis is run in a similar way to linear cryptanalysis with the impact of chained S-Boxes being concatenated. The piling-up lemma can also be applied in an analogous way.

For a more rigorous consideration of linear and differential cryptanalysis see the paper (H. Heys, 2002) which contains several detailed numerical examples.

# References

H. Heys (2002) , A Tutorial on Linear and Differential Cryptanalysis*,* Cryptologia, vol. XXVI, no. 3, pp. 189-221*, *2002.

M. Matsui (1994), *Linear Cryptanalysis Method for DES Cipher*, Advances in Cryptology - EUROCRYPT ’93, Springer-Verlag, pp. 386-397, 1994.