This article is a summary of a paper that proposes a method for applying dropout to LSTMs and shows how it can reduce overfitting in tasks such as language modeling, speech recognition, image caption generation, and machine translation.
Dropout is a regularization method that temporarily removes (drops out) units in a neural network, along with all their incoming and outgoing connections. Conventional dropout does not work well with RNNs because the recurrence amplifies the noise, which hurts learning.
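As a minimal sketch of the dropout operator itself (assuming the standard "inverted" variant, where surviving units are rescaled so the expected activation stays unchanged):

```python
import random

def dropout(units, p, training=True):
    """Inverted dropout (assumption: standard scaling variant).

    Each unit is zeroed with probability p; survivors are scaled by
    1/(1-p) so the expected value of every unit is unchanged. At test
    time (training=False) the operator is the identity.
    """
    if not training or p == 0.0:
        return list(units)
    return [0.0 if random.random() < p else u / (1.0 - p) for u in units]
```

With `p = 0.5`, each surviving unit is doubled and the rest are zeroed; with `training=False`, the input passes through untouched.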
- In the modified dropout algorithm, dropout is applied only to the non-recurrent connections (i.e., between LSTM layers, not within an LSTM layer's recurrent connections).
- The dropout operator corrupts the information carried by some units (but not all), forcing them to perform their intermediate computations more robustly.
- The information is corrupted L+1 times, where L is the number of layers; this count is independent of the number of timesteps the information traverses.
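The scheme above can be sketched for one timestep of a stacked RNN. This is a toy illustration, not the paper's implementation: `rnn_cell` is a hypothetical stand-in for a real LSTM cell, and the key point is only *where* dropout is applied — on the vertical, layer-to-layer path, never on the recurrent state:

```python
import random

def dropout(units, p):
    # Inverted dropout (assumption: standard scaling variant).
    if p == 0.0:
        return list(units)
    return [0.0 if random.random() < p else u / (1.0 - p) for u in units]

def rnn_cell(x, h):
    # Hypothetical stand-in for an LSTM cell: averages input and state.
    return [(xi + hi) / 2.0 for xi, hi in zip(x, h)]

def stacked_step(x, hidden, p):
    """One timestep of an L-layer stacked RNN.

    Dropout is applied only to the non-recurrent (layer-to-layer)
    connections; the recurrent state h_{t-1} enters each cell
    undropped. Counting the input, the signal is corrupted L+1 times
    per timestep, independent of how many timesteps it traverses.
    """
    inp = dropout(x, p)               # corruption 1: the input
    new_hidden = []
    for h_prev in hidden:
        h = rnn_cell(inp, h_prev)     # recurrent connection: no dropout
        new_hidden.append(h)
        inp = dropout(h, p)           # corruption between layers only
    return inp, new_hidden            # inp feeds the output/softmax layer
```

Because the recurrent path is untouched, the hidden state can carry information across many timesteps without being repeatedly degraded, which is exactly what conventional dropout fails to preserve.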
In language modeling, image caption generation, speech recognition, and machine translation, this form of dropout enables training larger networks and improves test performance, e.g., lowering perplexity in language modeling and raising frame accuracy in speech recognition.