Understanding how Long Short-Term Memory (LSTM) networks work
Let’s walk through the actual mechanism of LSTMs. We will first briefly look at the overall structure of an LSTM cell, and then discuss each of the operations taking place within it, along with an example of text generation.
This article is an excerpt from the book Natural Language Processing with TensorFlow written by Thushan Ganegedara. The book emphasizes both the theory and practice of natural language processing. It introduces the reader to existing TensorFlow functions and explains how to apply them when writing NLP algorithms, using specific examples to make the concepts and techniques concrete.
LSTMs are mainly composed of the following three gates:
Input gate: A gate that outputs a value between 0 (the current input is not written to the cell state) and 1 (the current input is fully written to the cell state). A sigmoid activation is used to squash the output to between 0 and 1.