In the front matter, the boldface 1 is defined as the indicator function: it equals one if the condition in its subscript holds and zero otherwise. In equation 10.18, however, the subscript of the boldface 1 is "i, y(t)", which is a pair of symbols rather than a condition. This is a typo. The subscript of the boldface 1 should be the condition i = y(t).
This section is also confusing because the authors state that the loss of the model is the negative log-likelihood of the "true target y(t) given the input so far." That wording can be misread as referring to the true probability of the target. The loss is the negative log of the probability that the model assigns to the true label y(t), not the negative log of the true probability.
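To make the distinction concrete, here is a minimal numerical sketch. It assumes (this is not stated above) that the model probabilities are a softmax over output scores, called o here, and all function names are hypothetical. The loss is computed as the negative log of the model's probability for the true label, and its gradient with respect to the scores is compared against the form "model probability minus indicator of i = y(t)".

```python
import math

def softmax(o):
    """Model probabilities from output scores (assumed softmax output layer)."""
    m = max(o)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def nll(o, y):
    """Negative log of the MODEL probability assigned to the true label y."""
    return -math.log(softmax(o)[y])

def analytic_grad(o, y):
    """Candidate gradient: y_hat_i minus the indicator of the condition i == y."""
    y_hat = softmax(o)
    return [y_hat[i] - (1.0 if i == y else 0.0) for i in range(len(o))]

def numeric_grad(o, y, eps=1e-6):
    """Central finite-difference gradient of nll with respect to each score."""
    grad = []
    for i in range(len(o)):
        up = list(o)
        up[i] += eps
        down = list(o)
        down[i] -= eps
        grad.append((nll(up, y) - nll(down, y)) / (2 * eps))
    return grad

# Illustrative values only: three classes, true label is class 0.
o = [2.0, -1.0, 0.5]
y = 0
max_gap = max(abs(a - n) for a, n in zip(analytic_grad(o, y), numeric_grad(o, y)))
print("largest analytic vs. numeric difference:", max_gap)
```

Under these assumptions the two gradients should agree up to finite-difference error, which is consistent with reading the subscript as the condition i = y(t).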
With both corrections in place, one can derive the corrected version of equation 10.18.
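A sketch of that derivation follows. It assumes, beyond what is stated above, that equation 10.18 is the gradient of the total loss with respect to the output scores o^{(t)}, that the model probabilities are \hat{y}^{(t)} = softmax(o^{(t)}), and that the total loss is the sum of the per-step losses L^{(t)}. Here y^{(t)} is the quantity written y(t) in the text.

```latex
\begin{align}
L^{(t)} &= -\log \hat{y}^{(t)}_{y^{(t)}}
         = -o^{(t)}_{y^{(t)}} + \log \sum_{j} \exp\!\left(o^{(t)}_{j}\right), \\
\frac{\partial L^{(t)}}{\partial o^{(t)}_{i}}
        &= -\mathbf{1}_{i = y^{(t)}}
           + \frac{\exp\!\left(o^{(t)}_{i}\right)}{\sum_{j} \exp\!\left(o^{(t)}_{j}\right)}
         = \hat{y}^{(t)}_{i} - \mathbf{1}_{i = y^{(t)}}, \\
\left(\nabla_{o^{(t)}} L\right)_{i}
        &= \frac{\partial L}{\partial L^{(t)}}\,
           \frac{\partial L^{(t)}}{\partial o^{(t)}_{i}}
         = \hat{y}^{(t)}_{i} - \mathbf{1}_{i = y^{(t)}},
\end{align}
```

The last line uses \partial L / \partial L^{(t)} = 1, which holds when L is the plain sum of the L^{(t)}.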