Nikhil, I had some feedback on your post. I loved everything up to the section *Learning Rates and the Delta Rule*. Starting in that section, I think the distinction between calculating the gradient and calculating the delta could be made clearer. For example, the delta rule is always the following (writing $\epsilon$ for the learning rate):

$$\Delta w = -\epsilon \frac{\partial E}{\partial w}$$

What changes is the derivative of E w.r.t. the weight, which depends on the architecture of the neural network and the type of neurons in it.
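To make that split concrete, here is a minimal sketch in plain Python. It assumes a single neuron and the squared error E = 0.5 * (t - y)^2; the function names (`delta_rule`, `linear_grad`, `sigmoid_grad`) and the numbers are mine, purely for illustration, not anything from the post. The update function is identical in both cases; only the gradient function differs with the neuron type.

```python
import math

def delta_rule(weights, grads, learning_rate):
    """Weight update: the same formula whatever produced the gradient."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]

def linear_grad(weights, x, t):
    """dE/dw for a linear neuron y = w . x, with E = 0.5 * (t - y)^2."""
    y = sum(w * xi for w, xi in zip(weights, x))
    return [-(t - y) * xi for xi in x]

def sigmoid_grad(weights, x, t):
    """dE/dw for a sigmoid neuron y = 1 / (1 + exp(-w . x)), same E."""
    z = sum(w * xi for w, xi in zip(weights, x))
    y = 1.0 / (1.0 + math.exp(-z))
    # Chain rule adds the extra factor y * (1 - y) from the sigmoid.
    return [-(t - y) * y * (1.0 - y) * xi for xi in x]

# Illustrative values (assumptions, not from the post).
weights = [0.5, -0.5]
x = [1.0, 2.0]
t = 1.0
learning_rate = 0.1

new_linear = delta_rule(weights, linear_grad(weights, x, t), learning_rate)
new_sigmoid = delta_rule(weights, sigmoid_grad(weights, x, t), learning_rate)
print(new_linear, new_sigmoid)
```

Swapping in a different neuron or a deeper network would mean writing a new gradient function, while `delta_rule` stays untouched.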

Also, I think it's important to note that the gradients can be calculated automatically in any standard deep learning library. It's important to understand what is happening in back-propagation, but the focus shouldn't be on the algebra.
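As a rough illustration of what "calculated automatically" looks like, here is a small sketch using JAX as one example library (the post does not name a library, so that choice is mine). The `error` function, the single sigmoid neuron, and the numbers are assumptions for illustration. `jax.grad` returns a function that computes the derivative of `error` with respect to its first argument, so no hand-derived algebra is needed.

```python
import jax
import jax.numpy as jnp

def error(w, x, t):
    """Squared error of a single sigmoid neuron (illustrative setup)."""
    y = jax.nn.sigmoid(jnp.dot(w, x))
    return 0.5 * (t - y) ** 2

# jax.grad differentiates with respect to the first argument (w) by default.
grad_error = jax.grad(error)

# Illustrative values (assumptions, not from the post).
w = jnp.array([0.5, -0.5])
x = jnp.array([1.0, 2.0])
t = 1.0
learning_rate = 0.1

# The update itself is still just the delta rule.
w_new = w - learning_rate * grad_error(w, x, t)
print(w_new)
```

If the neuron or the error function changes, only `error` needs editing; the gradient and the update line stay as they are.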

Again, all this being said, I still think that overa...