Cryptanalysis is the study of cryptosystems with the objective of attacking them and decrypting codes and ciphers. The field includes rigorous mathematical investigation of encryption and decryption algorithms as well as side-channel attacks whereby flaws in implementation are exploited rather than a mathematical flaw in the algorithm itself.
Cryptanalysis generally falls into one of several categories which can be broadly considered to be ciphertext only (where only the encrypted output is known), known plaintext (where the plaintext corresponding to some given ciphertext is known), chosen plaintext (where the cryptanalyst may choose plaintext and receive the related ciphertext) and chosen ciphertext attacks (where the cryptanalyst may choose some ciphertext and receive the corresponding plaintext).
In other tutorials we have seen simple cryptanalysis such as letter frequency analysis in the field of affine ciphers. This considers the letter frequencies in natural language and in some given ciphertext and uses this information to reverse engineer the encryption key mapping letters to one another based on the frequency with which they occur.
In this tutorial we will consider linear and differential cryptanalysis. These are both instances of known plaintext attacks where to be effective a certain amount of plaintext and its corresponding ciphertext must be known. The approaches were initially designed to aid in breaking the Data Encryption Standard (DES). In this case the fact that the algorithm was known (although the key in each case was not) enabled plaintext to be encrypted by the cryptanalyst to see the related ciphertext.
Linear cryptanalysis is an approach where we aim to find affine approximations to the action of a cipher. Letter frequency analysis is one of the simplest forms of linear cryptanalysis. Differential cryptanalysis is an approach to cryptanalysis whereby differences in inputs are mapped to differences in outputs and patterns in the mappings of plaintext edits to ciphertext variation are used to reverse engineer a key.
Linear and differential cryptanalysis are most often applied to block ciphers (encryption functions operating on messages that are split into blocks). They are symmetric key algorithms.
The paradigm of linear cryptanalysis was originally designed in 1993 as a theoretical attack on DES. It is now used widely on block ciphers across the field of cryptanalysis and is an effective starting point for developing more complex attacks.
Linear cryptanalysis posits a linear relationship between the elements (characters or individual bits) of plaintext, the ciphertext, and the key. It therefore tries to find a linear approximation to the action of a cipher, i.e. if "ciphertext = f(plaintext, key)", then we are trying to find a linear approximation of f.
Any linear relation between the plaintext bits and ciphertext bits can be written as a chain of exclusive-or operations of the following form:
where ⊕ denotes the binary operation XOR (exclusive-OR), Xi denotes the ith bit of the input X = [X1, X2, …], Yj denotes the jth bit of the output Y = [Y1, Y2, …] and Kk denotes the kth bit of the key K = [K1, K2, …]. The sum therefore denotes the XOR ‘sum’ of u input and v output bits vs w private key bits.
Howard Heys (2002) summarizes the process of linear cryptanalysis well in his introductory paper “A Tutorial on Linear and Differential Cryptanalysis”. He says,
The approach in linear cryptanalysis is to determine expressions of the form above which have a high or low probability of occurrence. (No obvious linearity such as above should hold for all input and output values or the cipher would be trivially weak.) If a cipher displays a tendency for [the] equation [above] to hold with high probability or not hold with high probability, this is evidence of the cipher’s poor randomization abilities. Consider that if we randomly selected values for [...] bits and placed them into the equation above, the probability that the expression would hold would be exactly ½. It is the deviation or bias from the probability of ½ for an expression to hold that is exploited in linear cryptanalysis: the further away that a linear expression is from holding with a probability of ½, the better the cryptanalyst is able to apply linear cryptanalysis.
This quote tells us the fundamental paradigm of linear (and indeed differential) cryptanalysis. The cryptanalyst aims to exploit the fact that encryption is non-random, attaining information through the measurement of deviations from random behavior.
In the most common use case, we assume that everything about the encryption algorithm is known apart from the private key. Performing linear cryptanalysis on a block cipher usually consists of three steps.
- Find linear approximations of the non-linear parts of the encryption algorithm (usually only the substitution boxes, known as S-boxes).
- Combine linear approximations of S-boxes with the rest of the (linear) operations done in the encryption algorithm, to obtain a linear approximation of the entire encryption algorithm. This linear approximation is a function which relates the plaintext bits, the ciphertext bits, and the bits of the private key.
- Use the linear approximation as a guide for which keys to try first. This leads to substantial computational savings over trying all possible values of the key. Multiple linear approximations may be used to further cut down the number of keys that need to be tried.
We will discuss each of these steps in further detail in the following sections.
The non-linearity in block ciphers is often introduced through S-Boxes (operations such as exclusive-or, bit-shifts are linear in nature). An S-Box is used to map incoming binary sequences to some output. It provides the non-linearity that builds strength and renders the affine approximation gained through linear cryptanalysis only an approximation and unable to be a true representation of the encryption.
See the example of the S-Box described below. It uses the 1st and 4th bits to determine the column and the middle two bits (bits 3 and 4) to determine the row (so the input 1110 yields 0101).
| 11 | 10 | 01 | 00----+------+------+------+------00 | 0000 | 0001 | 0010 | 001101 | 1000 | 1001 | 1111 | 101110 | 1100 | 1101 | 1110 | 101011 | 0100 | 0101 | 0010 | 0111
All the inputs and the outputs of the S-box are listed below in tabular form.
Input (I) | Output (O)-----------+------------0000 | 00110001 | 00100010 | 10110011 | 11110100 | 10100101 | 11100110 | 01110111 | 00101000 | 00011001 | 00001010 | 10011011 | 10001100 | 11011101 | 11001110 | 01011111 | 0100
Suppose that we would like to linearly approximate the above S-box. Given this information we may consider the accuracy of various linear approximations. There are 256 linear approximations of the form
where Ii and Oi denote the ith bit of the input I and the output O respectively and ai and bi are all either 0 or 1. For example, the linear approximation I2 = O1 ⊕ O4 is given by a = 01002 and b = 10012.
We then tabulate the number of times each is correct in a 16 x 16 table (with entries indexed by the values of a and b, so each row corresponds to the 4-bit value of a, and column corresponds to 4-bit value of b). The informativeness of each linear approximation is given by the number of times it is true minus 8 (to work on a scale from -8 to +8). Then we can attain the probability biases simply by dividing through by 8. For example, for I2 = O1 ⊕ O4, this is correct 10 times leading to a bias of +0.250. Doing this for all possibilities of our proposed linear form can enable those with the largest biases to be chosen so that a full mapping from I to O can be approximated.
For situations different from S-boxes where listing all possible values of I and O is not feasible, we may calculate the bias by taking a large number of values of pairs (I, O), say 1 million.
This then ultimately yields the most accurate linear approximation of the S-box.
Once we have linear approximations for each part of the system, a linear approximation of the entire system can be optioned by combining all the linear approximations into one. To combine, we simply 'sum' (exclusive-or) all the equations in various combinations in-order to obtain a single equation, while eliminating intermediate variables. This is discussed in further detail below.
Piling Up for 2 variables:
Let us consider two random binary variables X1 and X2 and the linear relationship X1 ⊕ X2 = 0 (equivalent to X1 = X2). Let the probability that X1 = 0 be denoted by p and the probability that X2 = 0 be denoted by q. Then if we assume that the two random variables are independent:
Furthermore it can be shown that:
Since the aim is to gain information upon the encryption through our analysis it is helpful to instead state the probabilities p and q in terms relative to the case of zero information gain where p = q = ½ in every case. Let us instead state p = ½ + ε1 and q = ½ + ε2. Then we may restate our results in terms of ε1 and ε2 (known as biases).
We may therefore see that the bias for X1 ⊕ X2 = 0 is given by ε1ε2.
Piling Up for n variables:
This notion can be expanded to more complex linear functions and any arbitrary number of random binary values X1 to Xn with probabilities of being 0 of pi = ½ + εi for all i from 1 to n. Under the assumption that each binary variable is independent we may apply the piling up lemma as it is described below.
Statement of the Piling Up Lemma: For n independent binary variables X1, X2, …, Xn,
where ε1,2,…,n denotes the bias of X1 ⊕ X2 ⊕ … ⊕ Xn = 0.
In developing the linear approximation of a cipher, the Xi values will actually represent linear approximations of the S-boxes.
Taken from (Heys, 2002)
Consider four random binary variables X1, X2, X3, X4. Let Pr(X1 ⊕ X2) = ½ + ε1,2 and Pr(X2 ⊕ X3) = ½ + ε2,3 and consider the sum X1 ⊕ X3 to be derived by adding X1 ⊕ X2 and X2 ⊕ X3 together. Hence:
In this case linear expressions are combined to yield a new linear expression. This shows the power of linear cryptanalysis in that it can be applied not only to a single S-Box but also to a series of many applied in sequence which is more often seen in cryptography.
Since we may consider the random variables X1 ⊕ X2 and X2 ⊕ X3 to be independent we may invoke the Piling-Up lemma to attain:
The expressions X1 ⊕ X2 = 0 and X2 ⊕ X3 = 0 are analogous to linear approximations of S-boxes and X1 ⊕ X3 = 0 is analogous to a cipher approximation where the intermediate bit X2 is eliminated.
Hence, we first find the linear approximation for each S-Box, and then a full solution can be built by concatenating the results for each S-Box (and other operations in the encryption algorithm, such as exclusive-or operations).
Let ε represent the bias from ½ of the probability that the linear expression for the complete cipher holds. Matsui (1994) shows that the number of known plaintexts required in the attack (NL) is reasonably approximated by NL = ε-2.
Suppose that our encryption algorithm consists of two keys (K1 and K2) and a single S-Box. The algorithm encrypts 4-bit values as follows:
- Take the input plaintext value (P) and calculate I = P ⊕ K1.
- Pass the result I through the S-Box, to produce output O.
- Take the output from the S-Box (O) and calculate ciphertext C = O ⊕ K2.
We will take the keys K1 and K2 to be 0010 and 1010 respectively.
Step 1 and step 3 are already linear. We perform linear approximation of the S-box. We might keep multiple linear approximations which have high biases. We combine the linear approximations in various ways to get relationships between bits of P, C, K1 and K2.
Given that we know values of some plaintexts and the corresponding ciphertexts (for the same key), we can plug in those values into the above linear approximations. This gives us various constraints that bits of the private key (K1, K2) are highly likely to satisfy. Once the number of unknown bits is small enough, we can perform a brute-force search.
Differential cryptanalysis preceded linear cryptanalysis having initially been designed in 1990 as an attack on DES. Differential cryptanalysis is similar to linear cryptanalysis; differential cryptanalysis aims to map bitwise differences in inputs to differences in the output in order to reverse engineer the action of the encryption algorithm. It is again aiming to approximate the encryption algorithm looking to find a maximum likelihood estimator of the true encryption action by altering plaintexts or (looking at different plaintexts) and analysing the impact of changes to the plaintext to the resulting ciphertext. Differential cryptanalysis is therefore a chosen plaintext attack.
The description of differential cryptanalysis is analogous to that of linear cryptanalysis and is essentially the same as would be the case of applying linear cryptanalysis to input differences rather than to input and output bits directly.
Drawing again from the work of Howard Heys we may formulate the initial set up.
Differential cryptanalysis exploits the high probability of certain occurrences of plaintext differences and differences into the last round of the cipher. For example, consider a system with input X = [X1 X2 ... Xn] and output Y = [Y1 Y2 ... Yn]. Let two inputs to the system be X’ and X’’ with the corresponding outputs Y’ and Y’’, respectively. The input difference is given by ΔX = X’ ⊕ X’’ where ⊕ represents a bit-wise exclusive-OR of the n-bit vectors and, hence, ΔX = [ΔX1 ΔX2 ... ΔXn] where ΔXi = Xi’ ⊕ Xi’’ with Xi’ and Xi’’ denoting the ith bit of X’ and X’’ respectively. Similarly ΔY = Y’ ⊕ Y’’ is the output difference and , ΔY = [ΔY1 ΔY2 ... ΔYn] where ΔYi = Yi’ ⊕ Yi’’.
Again analogous to the linear cryptanalysis case an ideally randomising cipher would yield the output difference ΔY in response to the input difference ΔX with probability ½n where n is the number of bits of X. This therefore provides a benchmark from which bias can be denoted and hence the information gain of a given approximation can be gleaned.
Referring back to Heys’ work:
Differential cryptanalysis seeks to exploit a scenario where a particular ΔY occurs given a particular input difference ΔX with a very high probability pD (i.e., much greater than ½n). The pair (ΔX, ΔY) is referred to as a differential.
The input and output differences of the S-boxes are considered in order to determine a high probability difference pair. Combining S-box difference pairs from round to round so that the nonzero output difference bits from one round correspond to the non-zero input difference bits of the next round, enables us to find a high probability differential consisting of the plaintext difference and the difference of the input to the last round.
The subkey bits of the cipher end up disappearing from the difference expression because they are involved in both data points being differenced.
Given the differences highlighted here the analysis is run in a similar way to linear cryptanalysis with the impact of chained S-Boxes being concatenated. The piling-up lemma can also be applied in an analogous way.
For a more rigorous consideration of linear and differential cryptanalysis see the paper (H. Heys, 2002) which contains several detailed numerical examples.
H. Heys (2002) , A Tutorial on Linear and Differential Cryptanalysis, Cryptologia, vol. XXVI, no. 3, pp. 189-221, 2002.
M. Matsui (1994), Linear Cryptanalysis Method for DES Cipher, Advances in Cryptology - EUROCRYPT ’93, Springer-Verlag, pp. 386-397, 1994.