I'll try explaining.
The possible values for y are 1 and -1.
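Now, if the prediction P is wrong, we need to correct ("punish") the learned weights: we move the weight vector toward y * x, i.e. away from the wrong prediction. For a positive example (y = 1) that was predicted as -1, w becomes w + x; for a negative example (y = -1) that was predicted as 1, w becomes w - x.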
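A minimal sketch of that mistake-driven update, assuming f(x) = dot(w, x) + b and a prediction of 1 when f(x) > 0, else -1. The helper names (dot, predict, update) are hypothetical, and the bias b is kept fixed so that it matches the inequality used below (many formulations also update b by y).

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predict(w: Vector, b: float, x: Vector) -> int:
    """Perceptron prediction: 1 if f(x) = dot(w, x) + b > 0, else -1."""
    return 1 if dot(w, x) + b > 0 else -1


def update(w: Vector, b: float, x: Vector, y: int) -> Vector:
    """Return the weight vector after seeing one labelled example (x, y).

    On a mistake, w moves toward y * x: x is added when y = 1 and
    subtracted when y = -1. On a correct prediction, w is unchanged.
    The bias b is only used for the prediction and is not updated here.
    """
    if predict(w, b, x) != y:
        return [wi + y * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical example: w = 0 predicts -1 for x, but the true label is 1.
print(update([0.0, 0.0], 0.0, [1.0, 2.0], 1))  # [1.0, 2.0]
```

Because the update only fires on mistakes, an example that is already classified correctly leaves w untouched.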
Now look at the impact of this new w (read the "Why does the algorithm work?" section and come back).
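As that section explains, when we evaluate f(x) again for the same misclassified positive example (y = 1, so w became w + x), we see that:
dot-product(w + x, x) + b = dot-product(w, x) + b + dot-product(x, x) > dot-product(w, x) + b
because dot-product(x, x) is the squared length of x, which is positive for any nonzero x,
i.e. f(x)new > f(x).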
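A quick numeric check of that inequality, using made-up values for w, b and x (they are purely illustrative). It shows that after the update w + x the score on the same x rises by exactly dot-product(x, x).

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def f(w: Vector, b: float, x: Vector) -> float:
    """Perceptron score f(x) = dot(w, x) + b."""
    return dot(w, x) + b


# Hypothetical example: a positive point (y = 1) that the current w misclassifies.
w = [-2.0, 0.5]
b = 0.0
x = [1.0, 2.0]

w_new = [wi + xi for wi, xi in zip(w, x)]  # the update w + x for y = 1

f_old = f(w, b, x)      # -2.0 + 1.0 + 0.0 = -1.0
f_new = f(w_new, b, x)  # -1.0 + 5.0 + 0.0 = 4.0

# f(x)new - f(x) equals dot(x, x), which is positive for any nonzero x.
print(f_old, f_new, f_new - f_old, dot(x, x))  # -1.0 4.0 5.0 5.0
```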
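This means f(x) has moved toward the positive side, so we now have a better chance of getting f(x) > 0, i.e. predicting y = 1 (which is the correct prediction). A single update does not guarantee this; it only pushes f(x) in the right direction.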
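To see why "a better chance" eventually becomes a correct prediction, here is a small sketch that keeps applying the update w + y * x to one misclassified point and records f(x) after each step. The function name steps_until_correct and the example values are hypothetical, a zero score is treated as a mistake (y * f(x) <= 0), and max_steps is only a safety cap.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def steps_until_correct(
    w: Vector, b: float, x: Vector, y: int, max_steps: int = 100
) -> Tuple[int, List[float]]:
    """Repeatedly update w on a single example (x, y) until y * f(x) > 0.

    Returns the number of updates and the score f(x) after each one.
    Each update changes f(x) by y * dot(x, x), so the score moves toward
    the correct side by the same amount every step. b stays fixed.
    """
    scores = [dot(w, x) + b]
    steps = 0
    while y * (dot(w, x) + b) <= 0 and steps < max_steps:
        w = [wi + y * xi for wi, xi in zip(w, x)]
        steps += 1
        scores.append(dot(w, x) + b)
    return steps, scores


# Positive example: f(x) climbs by dot(x, x) = 2.0 per update.
print(steps_until_correct([-5.0, 0.0], 0.0, [1.0, 1.0], 1))  # (3, [-5.0, -3.0, -1.0, 1.0])
```

For a negative example (y = -1) the same loop subtracts x, so f(x) falls by dot(x, x) per step until it becomes negative.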