The paper trains an LSTM (a Neural Turing Machine, or NTM, is tested too) to output the parameter update at each time step, taking the place of the hand-written steepest descent step. Aptly titled “Learning to learn by gradient descent by gradient descent”.
Anybody interested in trying “Learning to learn to learn by gradient descent by gradient descent by gradient descent”?
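For anyone who wants to see the shape of the idea, here is a rough sketch of the loop: at each step the optimizer receives the gradient and its own recurrent state, and returns the update to apply. Everything here is an assumption for illustration rather than the paper's code: the toy quadratic objective, the names `stand_in_update`, `run`, and `phi`, and above all the update rule itself, which is a fixed momentum-like recurrence standing in for a trained LSTM.

```python
from typing import Callable, List, Tuple

Vector = List[float]

# Toy objective (assumed for illustration): f(theta) = sum_i a_i * theta_i^2
CURVATURES: Vector = [1.0, 0.5, 2.0]


def loss(theta: Vector) -> float:
    return sum(a * t * t for a, t in zip(CURVATURES, theta))


def grad(theta: Vector) -> Vector:
    return [2.0 * a * t for a, t in zip(CURVATURES, theta)]


def stand_in_update(
    g: Vector, state: Vector, phi: Tuple[float, float]
) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in for the learned optimizer.

    A real version would be an LSTM with weights phi; this is only a
    recurrent rule with two parameters (decay, step) so the loop runs.
    """
    decay, step = phi
    new_state = [decay * s + gi for s, gi in zip(state, g)]
    update = [-step * s for s in new_state]
    return update, new_state


def run(
    update_rule: Callable[[Vector, Vector, Tuple[float, float]], Tuple[Vector, Vector]],
    phi: Tuple[float, float],
    theta0: Vector,
    n_steps: int = 50,
) -> Tuple[Vector, List[float]]:
    theta = list(theta0)
    state = [0.0] * len(theta)  # recurrent state carried across time steps
    losses = [loss(theta)]
    for _ in range(n_steps):
        # The optimizer sees the gradient and its state, and emits the update.
        update, state = update_rule(grad(theta), state, phi)
        theta = [t + u for t, u in zip(theta, update)]
        losses.append(loss(theta))
    return theta, losses


theta, losses = run(stand_in_update, (0.5, 0.1), [1.0, -2.0, 0.5])
print(losses[0], losses[-1])
```

The "by gradient descent by gradient descent" part would come from fitting `phi` itself with gradient descent on the losses collected along the trajectory; that outer loop is left out of this sketch.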