Because the decoding function is not necessarily linear. Check kernel PCA.
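As a quick illustration of the difference, here is a minimal sketch using scikit-learn's `PCA` and `KernelPCA` (the circle dataset and the `gamma` value are just illustrative choices, not from the original discussion):

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

rng = np.random.default_rng(0)
# Points on a circle: a 1-D nonlinear structure embedded in 2-D space.
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)]

# Linear PCA: both encoding and decoding are linear maps.
lin = PCA(n_components=1).fit(X)
X_lin = lin.inverse_transform(lin.transform(X))

# Kernel PCA with an RBF kernel: the decoding (pre-image) is nonlinear.
kpca = KernelPCA(n_components=1, kernel="rbf", gamma=2.0,
                 fit_inverse_transform=True).fit(X)
X_kpca = kpca.inverse_transform(kpca.transform(X))

# With one component, every linear-PCA reconstruction lies on a single
# line through the mean, while kernel-PCA reconstructions can curve
# with the data.
```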


Here's a more detailed proof (moderators are welcome to check it for errors). Consider the following decomposition:

\begin{aligned}
x &= d_1d_1^Tx + d_2d_2^Tx + \dots + d_nd_n^Tx\\
&\text{where } d_i \text{ is the } i^{th} \text{ principal component, ordered by } \max_{d_i}\ d_i^TX^TXd_i \text{ subject to } d_i^Td_i = 1
\end{aligned}
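The decomposition above can be checked numerically: since the $d_i$ form an orthonormal basis, the projections onto them sum back to $x$ exactly. A small NumPy sketch (the data and vector are random, purely for verification):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # data matrix, rows are examples

# Principal directions d_i: eigenvectors of X^T X (columns of D),
# ordered by decreasing eigenvalue d_i^T X^T X d_i.
eigvals, D = np.linalg.eigh(X.T @ X)
D = D[:, ::-1]                  # largest eigenvalue first

x = rng.normal(size=5)          # an arbitrary vector
# The projections d_i d_i^T x sum back to x:
recon = sum(np.outer(D[:, i], D[:, i]) @ x for i in range(5))
assert np.allclose(recon, x)
```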

If we want to keep only the first $l$ principal components, collected as the columns of $D_l$, the reconstruction error is:

error = x - D_lD_l^Tx = d_{l+1}d_{l+1}^Tx + \dots + d_nd_n^Tx

Now, if we want to use an additional principal component, a natural way to obtain it is to extract the first PC of the above error, following the same procedure used to extract the first PC of X.
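This deflation idea can be demonstrated directly: the first PC of the residual data coincides (up to sign) with the $(l+1)$-th PC of the original data, assuming distinct eigenvalues. A sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
_, D = np.linalg.eigh(X.T @ X)
D = D[:, ::-1]                  # PCs of X, largest eigenvalue first

l = 2
D_l = D[:, :l]
# Deflate: replace every example x by its reconstruction error x - D_l D_l^T x.
E = X - X @ D_l @ D_l.T

# First PC of the deflated data: top eigenvector of E^T E ...
_, V = np.linalg.eigh(E.T @ E)
d_next = V[:, -1]

# ... coincides, up to sign, with the (l+1)-th PC of the original X,
# so |cos| of the angle between them is 1.
assert np.allclose(np.abs(d_next @ D[:, l]), 1.0)
```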
