From Section 6.2.2.4:
For example, we may wish to learn the variance of a conditional Gaussian for y, given x. In the simple case, where the variance σ^2 is a constant, there is a closed form expression because the maximum likelihood estimator of variance is simply the empirical mean of the squared difference between observations y and their expected value. A computationally more expensive approach that does not require writing special-case code is to simply include the variance as one of the properties of the distribution p(y | x) that is controlled by ω = f(x;θ).
This seems to be one of the rare moments in the first half of the book where the prose lacks clarity for me. What does the mechanism that "includes the variance as one of the properties of ... p(y | x)" look like in practice?
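For concreteness, here is a minimal sketch of one way I can picture that mechanism. It is my own reading, not code from the book. It assumes f(x;θ) is just a linear map with two outputs, so that ω = (mean, log-variance), and it fits θ by plain gradient descent on the Gaussian negative log-likelihood. The names `predict` and `fit`, the log-variance parameterization (chosen so the variance stays positive), and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def predict(theta, x):
    """Map x to omega = (mean, log_var), the parameters of p(y | x)."""
    w_m, b_m, w_s, b_s = theta
    return w_m * x + b_m, w_s * x + b_s


def fit(xs, ys, steps=1500, lr=0.05):
    """Gradient descent on the average Gaussian negative log-likelihood.

    Per-sample loss (constant 0.5*log(2*pi) dropped):
        0.5 * (log_var + (y - mean)**2 / exp(log_var))
    """
    theta = [0.0, 0.0, 0.0, 0.0]  # [w_mean, b_mean, w_logvar, b_logvar]
    history = []
    n = len(xs)
    for _ in range(steps):
        loss = 0.0
        grad = [0.0, 0.0, 0.0, 0.0]
        for x, y in zip(xs, ys):
            mean, log_var = predict(theta, x)
            var = math.exp(log_var)
            resid = y - mean
            loss += 0.5 * (log_var + resid ** 2 / var)
            # Derivatives of the per-sample loss w.r.t. the two outputs.
            g_mean = -resid / var
            g_log_var = 0.5 * (1.0 - resid ** 2 / var)
            # Chain rule through the linear map f(x; theta).
            grad[0] += g_mean * x
            grad[1] += g_mean
            grad[2] += g_log_var * x
            grad[3] += g_log_var
        history.append(loss / n)
        theta = [t - lr * g / n for t, g in zip(theta, grad)]
    return theta, history


# Synthetic data (an assumption for the demo): y = 2x + noise,
# where the noise variance exp(-1 + x) grows with x.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(300)]
ys = [2.0 * x + random.gauss(0.0, math.exp(0.5 * (-1.0 + x))) for x in xs]

theta, history = fit(xs, ys)
print("theta =", [round(t, 3) for t in theta])
print("loss: first =", round(history[0], 3), "last =", round(history[-1], 3))
```

In this sketch nothing about the variance is special-cased: it is one more output of f, trained by the same loss and the same gradient step as the mean. Replacing the linear map with a network whose last layer has two units would leave the loss unchanged.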
To my mind, the learned variance would be controlled...