The definition of the expectation in (3.9) assumes a summation over all possible values of x (or an integral over all possible values of x in the case of a continuous variable). That is only possible if we know the exact form of p(x), which is not our case. What we have is a sample of m points, so we can only estimate the expected value by defining an estimator (see section 5.4 of the Deep Learning book for more details on estimators). An estimator is not guaranteed to return the true value of the expectation (that's why it is called an estimator), but we can usually make the estimate as precise as we like by increasing the sample size, as stated by the law of large numbers (Wikipedia provides a pretty good explanation: Law of large numbers).
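To make this concrete, here is a minimal sketch of one common estimator, the sample mean, applied to samples of growing size m. The distribution (uniform on [0, 1]), the function f(x) = x^2 and the helper name `sample_mean` are assumptions picked for illustration only; they are chosen so that the true expectation (1/3) is known and the error can be checked.

```python
import random


def sample_mean(f, samples):
    """Estimate E[f(x)] by averaging f over a finite sample."""
    return sum(f(x) for x in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Illustrative assumption: x ~ Uniform(0, 1) and f(x) = x**2,
# so the true expectation is the integral of x**2 over [0, 1] = 1/3.
true_value = 1.0 / 3.0

errors = {}
for m in (10, 1_000, 100_000):
    samples = [rng.random() for _ in range(m)]
    estimate = sample_mean(lambda x: x * x, samples)
    errors[m] = abs(estimate - true_value)
    print(f"m={m:>6}  estimate={estimate:.4f}  abs error={errors[m]:.4f}")
```

The error for a particular run is not guaranteed to shrink at every step, but it tends to get smaller as m grows, which is what the law of large numbers describes.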