After Equation 2.67, the new paragraph says that, because the same matrix D is used to decode all the points, we can no longer consider the points in isolation. The author then switches from the L2 norm used so far to the Frobenius norm of the matrix of errors. Why is that?
Also, what are the limits of summation in the norm, and how is the transformation defined on a scalar, i.e. r(x^(i))_j?
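One way to look at this question is to check numerically how the Frobenius norm of the error matrix relates to the per-point L2 errors. The sketch below is only an illustration: the data `X` and the reconstructions `R` are made up, and it assumes that r(x^(i))_j means the j-th component of the reconstructed vector r(x^(i)), so that the sum runs over all points i and all components j.

```python
import math

def l2_norm(v):
    """L2 norm of a single vector."""
    return math.sqrt(sum(a * a for a in v))

def frobenius_norm(M):
    """Frobenius norm: square root of the sum of all squared entries."""
    return math.sqrt(sum(a * a for row in M for a in row))

# Hypothetical data: m = 3 points in R^2, one point x^(i) per row,
# and made-up reconstructions r(x^(i)) in the matching rows of R.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
R = [[0.5, 2.5], [3.0, 3.0], [4.0, 6.5]]

# Error matrix: entry (i, j) is x^(i)_j - r(x^(i))_j.
E = [[x - r for x, r in zip(xi, ri)] for xi, ri in zip(X, R)]

per_point = [l2_norm(e) for e in E]              # one L2 error per point
total = frobenius_norm(E)                        # sum over i = 1..m, j = 1..n
combined = math.sqrt(sum(p * p for p in per_point))

print(total, combined)  # the two values agree up to rounding
```

Under these assumptions the Frobenius norm is just the per-point squared L2 errors added up over all points, which is why it can score a single D shared by every point.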
On page 126: "it is common to say that algorithm A is better than algorithm B if the upper bound of the 95 percent confidence interval for the error of algorithm A is less than the lower bound of the 95 percent confidence interval for the error of algorithm B."
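The comparison rule in that quote can be sketched in a few lines. This is a rough illustration, assuming a normal approximation in which the 95 percent interval is the mean error plus or minus 1.96 standard errors; the function names and the 0/1 error lists are hypothetical.

```python
import math

def confidence_interval_95(errors):
    """Approximate 95% CI for the mean error (normal approximation, assumed)."""
    m = len(errors)
    mean = sum(errors) / m
    var = sum((e - mean) ** 2 for e in errors) / (m - 1)  # unbiased sample variance
    se = math.sqrt(var / m)                                # standard error of the mean
    return mean - 1.96 * se, mean + 1.96 * se

def a_is_better(errors_a, errors_b):
    """True if A's upper bound lies below B's lower bound."""
    _, upper_a = confidence_interval_95(errors_a)
    lower_b, _ = confidence_interval_95(errors_b)
    return upper_a < lower_b

# Hypothetical per-example 0/1 test errors for two algorithms.
errors_a = [0] * 90 + [1] * 10   # 10% error rate
errors_b = [0] * 60 + [1] * 40   # 40% error rate

print(confidence_interval_95(errors_a))
print(confidence_interval_95(errors_b))
print(a_is_better(errors_a, errors_b))
```

Note that the rule is deliberately conservative: when the two intervals overlap, neither algorithm is declared better.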
In the front matter, the boldface 1 is defined as the indicator function, which takes the value one or zero depending on whether the condition in its subscript is satisfied. However, in equation 10.18, the boldface 1 does not have a condition in its subscript (i, y(t) is not a condition). This is a typo. The subscript of t...
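As a sanity check on how the indicator enters a gradient of this kind, here is a small sketch. It assumes (this is not stated in the text above) that the equation is the gradient of a softmax negative log-likelihood with respect to the logits, and that the indicator is meant to carry the condition i = y. All function names are hypothetical.

```python
import math

def softmax(o):
    """Numerically stable softmax of a list of logits."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    s = sum(exps)
    return [e / s for e in exps]

def nll(o, y):
    """Negative log-likelihood of class y under softmax(o)."""
    return -math.log(softmax(o)[y])

def grad_analytic(o, y):
    """y_hat_i minus the indicator of the condition i == y (assumed reading)."""
    y_hat = softmax(o)
    return [y_hat[i] - (1.0 if i == y else 0.0) for i in range(len(o))]

def grad_numeric(o, y, eps=1e-6):
    """Central finite differences of nll with respect to each logit."""
    g = []
    for i in range(len(o)):
        hi = list(o)
        hi[i] += eps
        lo = list(o)
        lo[i] -= eps
        g.append((nll(hi, y) - nll(lo, y)) / (2 * eps))
    return g

logits = [0.2, -1.0, 1.5]   # made-up logits
target = 2                  # made-up target class

print(grad_analytic(logits, target))
print(grad_numeric(logits, target))
```

If the two printed gradients agree, the indicator with the condition i = y is at least consistent with the loss under the stated assumptions.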