The definition of the expectation in (3.9) assumes summation over **all** possible values of **x** (or integration over all possible values of x in the case of a continuous variable). This is only possible if we know the exact form of p(**x**), which is not our case. What we have is a sample of *m* points, so we can only obtain an estimate of the expected value by defining an estimator (see section 5.4 of the *Deep Learning book* for more details on estimators). An estimator is not guaranteed to return the true value of the expectation (that's why it is called an *estimator*), but we can usually achieve arbitrary precision by increasing the sample size, as stated by the *law of large numbers* (Wikipedia provides a pretty good explanation: *Law of large numbers*).
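To make the idea of an estimator concrete, here is a minimal sketch of the simplest one, the sample mean of f over the *m* points. The distribution (a standard normal), the function f(x) = x², and the helper name `sample_mean_estimate` are assumptions chosen for illustration only; they are not taken from the text above. For this choice the true expectation is 1, so we can watch how close the estimate gets as *m* grows.

```python
import random


def sample_mean_estimate(f, samples):
    """Estimate E[f(x)] as the average of f over the given sample."""
    return sum(f(x) for x in samples) / len(samples)


# Illustrative assumption: x ~ N(0, 1) and f(x) = x**2, so the true E[f(x)] is 1.
rng = random.Random(0)  # fixed seed so the run is reproducible


def f(x):
    return x * x


for m in (10, 1_000, 100_000):
    samples = [rng.gauss(0.0, 1.0) for _ in range(m)]
    estimate = sample_mean_estimate(f, samples)
    print(f"m={m:>7}  estimate={estimate:.4f}  abs error={abs(estimate - 1.0):.4f}")
```

The error tends to shrink as *m* increases, but for any single run it is not guaranteed to decrease at every step; the law of large numbers only describes the behaviour in the limit.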
