How can we go from equation 6.22 to equation 6.23 in the general case, not just the trivial one where y=1/2 and z=0?
Greetings John, I'm sorry for my late answer. Here is how I interpret the text, which you've rightly found non-obvious.
In fact, in a regular Gaussian the variance σ² is not a variable but an unknown constant. It's easy to compute that variance using the parametric formulas. In the general case, and for more complicated models, the variance can itself vary; statisticians call this heteroscedasticity. For example, the variance in the length of a cat's tail probably changes with age (from the Data Skeptic podcast: Heteroskedasticity). In that situation, a general estimation approach, like the one used by machine learning / neural networks, can be used to predict the variance.
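To make the distinction concrete, here is a minimal numpy sketch (my own toy numbers, not from the book): in the homoscedastic case the variance is one unknown constant with a simple parametric ML estimate, while in the heteroscedastic case the spread of the noise changes with the input, like the cat's-tail example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Homoscedastic case: one unknown constant variance.
# The maximum-likelihood estimate is the usual parametric formula.
x = rng.normal(loc=5.0, scale=2.0, size=10_000)
sigma2_hat = np.mean((x - x.mean()) ** 2)  # ML estimate of sigma^2, ~4.0

# Heteroscedastic case: the noise scale grows with the input
# (like tail-length variability growing with a cat's age).
age = rng.uniform(1, 10, size=10_000)
tail = 3.0 * age + rng.normal(scale=0.5 * age)  # noise std proportional to age

# A model that outputs a variance as a function of the input can capture
# this; here we just check the effect empirically on two age slices.
young_spread = (tail[age < 3] - 3.0 * age[age < 3]).std()
old_spread = (tail[age > 8] - 3.0 * age[age > 8]).std()
print(sigma2_hat, young_spread, old_spread)  # old_spread >> young_spread
```

A neural network handles the heteroscedastic case by emitting a variance (or log-variance) as one of its outputs and training it by maximum likelihood, instead of assuming one global constant.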
As I've said, the main weakness of the GBC book (Goodfellow, Bengio, Courville) is that it leaves the practical programming exercises to a companion website.
So, I dream of the same content in the form of a series of IPython notebooks, with all exercises and code samples using Keras, TensorFlow and Theano.
Recently, I made a modest contribution to my dream by coding section 6.1, Example: Learning XOR (pp. 166 to 171), using TensorFlow (The revenge of Perceptron! — Learning XOR with TensorFlow). OK, I know I should have used Theano, which is mainly developed and maintained by the MILA lab at UdeM… Next time!
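For readers who want to check the XOR example without installing TensorFlow, the closed-form solution the book derives in section 6.1 (a one-hidden-layer ReLU network) can be verified in a few lines of numpy; the TensorFlow version in my post learns these weights by gradient descent instead of hard-coding them:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Weights from the closed-form solution in GBC section 6.1:
# a one-hidden-layer ReLU network that computes XOR exactly.
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])   # input -> hidden weights
c = np.array([0.0, -1.0])    # hidden biases
w = np.array([1.0, -2.0])    # hidden -> output weights
b = 0.0                      # output bias

def xor_net(x):
    h = relu(x @ W + c)      # hidden layer
    return h @ w + b         # linear output layer

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(xor_net(X))  # -> [0. 1. 1. 0.]
```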
Among the arguments supporting my observation that the GBC book fails to deal appropriately with «reinforcement learning», just look at the list of the most common machine learning tasks presented on pages 97 to 100.
I know it's impossible to compile a complete and satisfactory list of tasks, but there is not even a brief mention of any task related to reinforcement learning. We should add something like «agent interaction / game interaction», «learning sequences of actions» and maybe «robot navigation / interaction». A software agent takes actions / makes decisions in an environment in order to maximize a cumulative reward.
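The «cumulative reward» an agent maximizes is usually the discounted return G = r₀ + γ·r₁ + γ²·r₂ + …, which can be sketched in a few lines (my own toy example, not taken from the book):

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted cumulative reward G = sum_t gamma**t * r_t,
    computed backwards for numerical simplicity."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A reward of 1 received two steps in the future is worth gamma**2 today.
print(discounted_return([0, 0, 1]))  # -> 0.81
```

The discount factor γ < 1 makes the agent prefer rewards received sooner, and keeps the return finite over infinite horizons.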
[Note] To be completely honest, the authors wrote a short disclaimer about reinforcement learning in the «Machine Learning Basics» chapter (chapter 5, page 103): «Such algorithms are beyond the scope of this book.»
My best introductory-level tutorial on reinforcement learning is a series of five blog posts by Massimiliano Patacchiola, with source code in Python on GitHub: mpatacchiola's blog.
«Introduction to Reinforcement Learning» by Joelle Pineau (OK, I'm from Montréal... but it's very good!): Introduction to Reinforcement Learning.
If you aren't allergic to maths! For more advanced content, in continuous or partially observable contexts like robotics, the tutorial on «Policy Optimization» by Pieter Abbeel and John Schulman, comparing many methods, presented at NIPS 2016, is for you: https://goo.gl/FMoGzy
Another presentation on reinforcement learning (RL) by John Schulman from NIPS 2016, on the practical engineering side, called «Nuts and Bolts of RL», with almost no maths: https://goo.gl/RIhywx
Finally, Denny Britz's code repo on GitHub: https://goo.gl/x5j8pB
The third part of the book covers advanced and research subjects. Very well done, except for reinforcement learning, which is only superficially covered and would require an additional chapter. Maybe even a chapter in the second part of the book.
About the basic applied maths chapters (2 to 4): those chapters should be considered just an overview, or a refresher, of the mathematical prerequisites needed to really grasp the domain. By no means could those few, but pretty well written, pages replace a course or a good textbook on linear algebra, probability & statistics, or calculus (with exercises). That said, I was surprised to find advanced math topics like measure theory, the generalized Lagrangian, or nonlinear optimization just a few pages from matrix multiplication. ;)