Hello Keshav,

Thank you for making it clearer now :)

So this is the summary of what I took away from the article. It would be great if someone could tell me if I went wrong somewhere!

- Gradient descent w/o momentum is slow to converge on ill-conditioned problems (long, narrow valleys), and doesn't converge at all if the step size is too large.
- Adding momentum damps the zig-zagging back and forth across the narrow ravines and troughs of our imaginary valley and builds up speed along the valley floor, so we reach the bottom much faster than with plain gradient descent.
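- The added z^(k+1) term [where the momentum parameter beta comes in] works like a velocity: z^(k+1) = beta*z^(k) + grad f(w^(k)) is a decaying running sum of past gradients, so it answers both "How fast do I go?" and "In which direction do I go?", and the weight update w^(k+1) = w^(k) - alpha*z^(k+1) simply takes a step along it. In gradient descent w/o beta (beta = 0), z^(k+1) is just the current gradient, so each step only knows the local slope and has no memory of where it was heading.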
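
To check the intuition in the bullets above numerically, here is a minimal Python sketch (assuming NumPy is available). It assumes the common momentum form z^(k+1) = beta*z^(k) + grad f(w^(k)), w^(k+1) = w^(k) - alpha*z^(k+1), and a hypothetical two-dimensional quadratic valley with curvatures 1 and 100. The helper name `run` and the starting point are illustrative choices, not taken from the article; the alpha/beta values are the standard best fixed settings for a quadratic with those two curvatures.

```python
import numpy as np

# Hypothetical ill-conditioned valley: f(w) = 0.5 * w^T A w, minimum at w = 0.
# Curvature 1 along the first axis and 100 along the second.
A = np.diag([1.0, 100.0])


def grad(w):
    """Gradient of the quadratic f(w) = 0.5 * w^T A w."""
    return A @ w


def run(alpha, beta, steps=100, w0=(1.0, 1.0)):
    """Run momentum for `steps` iterations and return the distance to the minimum.

    Assumed update form: z <- beta*z + grad f(w); w <- w - alpha*z.
    With beta = 0, z is just the current gradient, i.e. plain gradient descent.
    """
    w = np.array(w0, dtype=float)
    z = np.zeros_like(w)
    for _ in range(steps):
        z = beta * z + grad(w)  # velocity: decaying running sum of past gradients
        w = w - alpha * z       # step along the velocity, not just the local slope
    return float(np.linalg.norm(w))


# Best fixed step for plain gradient descent on this quadratic: 2 / (1 + 100).
gd = run(alpha=2 / 101, beta=0.0)
# Tuned momentum: alpha = (2 / (1 + 10))**2, beta = ((10 - 1) / (10 + 1))**2,
# where 1 and 10 are the square roots of the two curvatures.
momentum = run(alpha=(2 / 11) ** 2, beta=(9 / 11) ** 2)
# A step just above 2 / 100 makes plain gradient descent diverge.
too_big = run(alpha=0.021, beta=0.0)

print(f"plain GD: {gd:.3g}  momentum: {momentum:.3g}  GD, step too large: {too_big:.3g}")
```

With these settings, plain gradient descent is still noticeably far from the minimum after 100 steps, while momentum ends up many orders of magnitude closer; the slightly-too-large step makes plain gradient descent blow up along the steep direction. Setting beta = 0 in the same loop recovers plain gradient descent, which is why one function covers both cases.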

I didn't follow most of the fairly dense math in the derivations, but I hope I got the intuition right?
