On page 126: "it is common to say that algorithm A is better than algorithm B if the upper bound of the 95 percent confidence interval for the error of algorithm A is less than the lower bound of the 95 percent confidence interval for the error of algorithm B."
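The comparison rule quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: the interval is taken to be the normal-approximation one (mean plus or minus 1.96 standard errors of the mean), and the function names `error_confidence_interval` and `is_better` are hypothetical, not taken from the book.

```python
import math


def error_confidence_interval(errors, z=1.96):
    """Return (lower, upper) of an approximate 95% interval for the mean error.

    Assumption: normal approximation, mean +/- z * standard error of the mean.
    `errors` is a list of per-example losses (e.g. 0/1 misclassification flags).
    """
    n = len(errors)
    mean = sum(errors) / n
    # Unbiased sample variance of the per-example errors.
    variance = sum((e - mean) ** 2 for e in errors) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean - z * standard_error, mean + z * standard_error


def is_better(errors_a, errors_b):
    """True if A's upper bound lies strictly below B's lower bound."""
    _, upper_a = error_confidence_interval(errors_a)
    lower_b, _ = error_confidence_interval(errors_b)
    return upper_a < lower_b


# Made-up 0/1 test errors: A misclassifies 10 of 100, B misclassifies 40 of 100.
errors_a = [0] * 90 + [1] * 10
errors_b = [0] * 60 + [1] * 40
a_beats_b = is_better(errors_a, errors_b)
```

Note that the rule is asymmetric and conservative: when the two intervals overlap, neither algorithm is declared better.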
In the front matter, the boldface 1 is defined as the indicator function, which takes the value one or zero depending on whether the condition in its subscript is satisfied. However, in equation 10.18, the boldface 1 does not have a condition in its subscript (i, y(t) is not a condition). This is a typo. The subscript of t...
More precisely, they showed that piecewise linear networks (which can be obtained from rectifier nonlinearities or maxout units) can represent functions with a number of regions that is exponential in the depth of the network.
What does "regions" refer to here? How does this line help us conclude that exponentially many units are required if we want to represent deep networks using a single hidden layer?
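One way to explore this numerically is sketched below. It assumes that "regions" means linear regions, that is, maximal intervals of the input on which the network computes a single linear function. The toy network, the folding function |2x - 1|, and the names `fold`, `deep_net`, and `count_linear_regions` are illustrative choices, not taken from the book.

```python
def relu(z):
    """Rectifier nonlinearity."""
    return max(z, 0.0)


def fold(x):
    """One hidden layer with two rectifier units computing |2x - 1|.

    Uses the identity |z| = relu(z) + relu(-z). On [0, 1] this maps the
    input back into [0, 1], folding the interval onto itself.
    """
    z = 2.0 * x - 1.0
    return relu(z) + relu(-z)


def deep_net(x, depth):
    """Compose `depth` folding layers (2 * depth rectifier units in total)."""
    for _ in range(depth):
        x = fold(x)
    return x


def count_linear_regions(f, grid_size=4096):
    """Count linear pieces of f on [0, 1] by counting slope changes.

    The grid points are multiples of 1 / grid_size, so every breakpoint of
    deep_net (a multiple of 1 / 2**depth, for depth up to 12) falls exactly
    on a grid point and the arithmetic is exact in floating point.
    """
    xs = [i / grid_size for i in range(grid_size + 1)]
    ys = [f(x) for x in xs]
    slopes = [ys[i + 1] - ys[i] for i in range(grid_size)]
    changes = sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)
    return changes + 1


regions_by_depth = {d: count_linear_regions(lambda x, d=d: deep_net(x, d))
                    for d in (1, 2, 3, 5)}
```

In this toy case the number of linear pieces doubles with every added layer while the number of units grows only linearly. A single hidden layer of n rectifier units on a scalar input can produce at most n + 1 linear pieces, since each unit contributes at most one breakpoint, so matching the depth-d function here with one hidden layer would need on the order of 2^d units.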