Because the decoding function is not necessarily linear. Check kernel PCA.

Here's a slightly more detailed proof (moderators, feel free to check it for errors):

Consider this:

x = d_1d_1^Tx + d_2d_2^Tx + \dots + d_nd_n^Tx\\
\text{where } d_i \text{ is the } i^{th} \text{ principal component, ordered by:}\\
\max_{d_i}\ d_i^TX^TXd_i \quad \text{subject to } d_i^Td_i = 1

If we want to use the first l principal components, collected as the columns of D_l, the reconstruction error is:

error = x - D_lD_l^Tx = d_{l+1}d_{l+1}^Tx + \dots + d_nd_n^Tx
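As a quick numerical sanity check of this identity, here is a minimal NumPy sketch. The random data, the choice l = 2, and the variable names are my own assumptions for illustration; the rows of X are samples, so the principal components are the eigenvectors of X^T X.

```python
import numpy as np

# Synthetic, centered data (an assumption for illustration): rows are samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)

# eigh returns eigenvalues in ascending order, so reverse the columns
# to get the principal components d_1, ..., d_n in decreasing order.
_, eigvecs = np.linalg.eigh(X.T @ X)
D = eigvecs[:, ::-1]

l = 2
x = X[0]          # one data point
D_l = D[:, :l]    # first l principal components

# Left-hand side: x minus its reconstruction from the first l components.
error = x - D_l @ D_l.T @ x

# Right-hand side: sum of the projections onto the remaining components.
tail = sum(D[:, i] * (D[:, i] @ x) for i in range(l, D.shape[1]))

identity_holds = np.allclose(error, tail)
print(identity_holds)
```

The check relies on the d_i forming an orthonormal basis, which is what lets x be written as the full sum in the first place.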

Now, if we want to use an additional principal component, a natural way to do it is to extract the first PC of the above error, which is found in the same way as the first PC of X.
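To illustrate numerically, the sketch below removes the first l components from every row of X and checks that the first PC of the residual lines up with d_{l+1}, up to sign. The synthetic data, the column scaling (used only to keep the eigenvalues well separated), and the helper name `first_pc` are assumptions of mine, not part of the argument.

```python
import numpy as np

# Synthetic, centered data with distinct variances per direction
# (an assumption, so that the principal components are well defined).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0])
X -= X.mean(axis=0)

def first_pc(A):
    """Unit vector d maximizing d^T A^T A d (hypothetical helper)."""
    _, vecs = np.linalg.eigh(A.T @ A)
    return vecs[:, -1]  # eigh sorts ascending, so the last column is the top one

# Principal components of X in decreasing order of eigenvalue.
_, vecs = np.linalg.eigh(X.T @ X)
D = vecs[:, ::-1]

l = 2
D_l = D[:, :l]

# Residual after reconstructing every row from the first l components.
E = X - X @ D_l @ D_l.T

# The first PC of the residual should match d_{l+1} up to sign.
d_next = first_pc(E)
alignment = abs(d_next @ D[:, l])
print(alignment)
```

An alignment close to 1 means the two unit vectors are parallel; the absolute value is there because eigenvectors are only determined up to sign.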

let:

