Nikhil, I had some feedback on your post. I loved everything up to the section "Learning Rates and the Delta Rule." Starting in that section, I think the distinction between calculating the gradient and calculating the delta could be made clearer. For example, the delta rule is always the following,
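In the usual gradient-descent notation, with $\epsilon$ as the learning rate and $w_k$ as any single weight (symbol names assumed here, not necessarily those used in the post):

$$
\Delta w_k = -\epsilon \, \frac{\partial E}{\partial w_k}
$$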
What changes is the derivative of E with respect to the weight, which depends on the architecture of the neural network and the type of neurons in it.
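A minimal sketch of that distinction, assuming a single neuron with one input and the squared error E = 0.5 * (t - y)^2. The names `delta_rule`, `grad_linear`, `grad_sigmoid`, and `epsilon` are hypothetical and not taken from the post. The update function is identical in both cases; only the gradient function changes with the neuron type.

```python
import math


def delta_rule(w, grad, epsilon=0.1):
    """Same update for every neuron type: step against the gradient."""
    return w - epsilon * grad


def grad_linear(w, x, t):
    """dE/dw for a linear neuron y = w * x, with E = 0.5 * (t - y)**2."""
    y = w * x
    return -(t - y) * x


def grad_sigmoid(w, x, t):
    """dE/dw for a sigmoid neuron y = sigmoid(w * x), with the same E."""
    y = 1.0 / (1.0 + math.exp(-w * x))
    # The chain rule adds the sigmoid's derivative, y * (1 - y).
    return -(t - y) * y * (1.0 - y) * x


# Illustrative values only: one weight, one input, one target.
w, x, t = 0.5, 1.0, 1.0

w_linear = delta_rule(w, grad_linear(w, x, t))
w_sigmoid = delta_rule(w, grad_sigmoid(w, x, t))
```

Both calls go through the same `delta_rule`; the sigmoid neuron simply produces a smaller step here because its gradient carries the extra y * (1 - y) factor.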
Also, I think it's important to note that the gradients can be calculated automatically in any standard deep learning library. It's important to understand what is happening in back-propagation, but the focus isn't on the algebra.
Again, all this being said, I still think that overa...