A novel regularization is proposed that enforces the output of a model to be isotropically smooth around each input data point irrespective of the availability of the training label. This is achieved by investigating the local anisotropy around each input data point and smoothing the model in the most anisotropic direction.
To investigate the local anisotropy around each data point, the method finds the direction of a small perturbation to which the model is most sensitive. Sensitivity is measured as the KL divergence between the output class probability distribution for the input and the distribution for the input plus the perturbation. The perturbation that maximizes this divergence is called the 'virtual adversarial perturbation'. The model is trained to minimize the supervised loss (cross entropy, MSE, etc.) as well as its sensitivity to the virtual adversarial perturbation.
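The search for the most sensitive direction can be sketched in a few lines. This is only an illustration under stated assumptions: the model is a toy linear softmax classifier, the gradient is taken by finite differences instead of backpropagation, and the names `predict`, `virtual_adversarial_perturbation`, `vat_penalty`, `eps`, `xi` and `n_iter` are made up for this example.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, x):
    # Toy model (assumption): linear logits followed by softmax.
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def kl(p, q):
    # KL(p || q) for two discrete distributions with positive entries.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return list(v) if n == 0.0 else [c / n for c in v]

def virtual_adversarial_perturbation(W, x, eps=0.1, xi=1e-2, n_iter=5,
                                     h=1e-4, seed=0):
    """Approximate the perturbation of norm eps that maximizes the KL term."""
    rng = random.Random(seed)
    p = predict(W, x)  # treated as fixed; no label is needed

    def sensitivity(r):
        return kl(p, predict(W, [a + b for a, b in zip(x, r)]))

    d = normalize([rng.gauss(0.0, 1.0) for _ in x])  # random start direction
    for _ in range(n_iter):
        r = [xi * c for c in d]
        grad = []
        for i in range(len(x)):
            r_plus, r_minus = list(r), list(r)
            r_plus[i] += h
            r_minus[i] -= h
            # Central finite difference stands in for backpropagation here.
            grad.append((sensitivity(r_plus) - sensitivity(r_minus)) / (2 * h))
        d = normalize(grad)  # power-iteration style update of the direction
    return [eps * c for c in d]

def vat_penalty(W, x, eps=0.1):
    # Sensitivity term to be added to the supervised loss during training.
    r_adv = virtual_adversarial_perturbation(W, x, eps=eps)
    x_adv = [a + b for a, b in zip(x, r_adv)]
    return kl(predict(W, x), predict(W, x_adv))
```

In training, a term like `vat_penalty` would be added to the supervised loss with some weight, and because it never touches a label it can be evaluated on unlabeled points too.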
One way to think about why this works is that VAT pu...
In the modern literature (last few years), the first successful neural models for translation worked w...
[Research paper] Learning to learn by gradient descent by gradient descent
The paper trains an LSTM (an NTM, or Neural Turing Machine, is tested too) to output the parameter update at each time step of an optimization run, in place of a hand-designed steepest-descent update. It is aptly titled “Learning to learn by gradient descent by gradient descent”.
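To make the setup concrete, here is a rough sketch of the inner loop that such a learned optimizer plugs into. The update rule takes the current gradient and its own state and returns an update plus a new state. In the paper that rule is a trained LSTM; the hand-written momentum rule below is only a stand-in so the loop runs, and the names `run_optimizee` and `momentum_rule` are hypothetical.

```python
def run_optimizee(grad_fn, theta, update_rule, state, steps):
    """Apply theta <- theta + update for a fixed number of steps."""
    for _ in range(steps):
        grad = grad_fn(theta)
        # The rule sees only the gradient and its own recurrent state.
        update, state = update_rule(grad, state)
        theta = [t + u for t, u in zip(theta, update)]
    return theta

def momentum_rule(lr=0.1, beta=0.5):
    # Stand-in for the learned optimizer: a fixed, stateful update rule.
    def rule(grad, state):
        velocity = [beta * s - lr * g for s, g in zip(state, grad)]
        return velocity, velocity
    return rule

def quadratic_grad(theta):
    # Gradient of f(theta) = sum(theta_i ** 2), a simple test objective.
    return [2.0 * t for t in theta]

final_theta = run_optimizee(quadratic_grad, [1.0, -2.0],
                            momentum_rule(), [0.0, 0.0], steps=50)
```

Learning the optimizer then amounts to replacing `momentum_rule` with a parameterized network and training its parameters by gradient descent on the losses seen along the run, which is where the title comes from.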
Anybody interested in trying Learning to learn to learn by gradient descent by gradient descent by gradient descent?
One cool thing is that they found that the NTM, with access to a complete memory bank, produced behavior similar to that of a second-order optimization algorithm like L-BFGS (I think because being able to store a Jacobian matrix should allow one to estimate a Hessian matrix?).
A Clockwork RNN
A clockwork RNN is a recurrent neural network (RNN) architecture designed to remember things easily over long periods of time. It is a follow-up to the Long Short-Term Memory (LSTM) architecture, which was also co-authored by Jürgen Schmidhuber.
In my opinion, the paper is quite interesting and deserves more attention than it has received.
The nodes in the RNN are divided into K groups. Group k has a time period T[k], which means that its nodes are updated only once every T[k] time steps. Specifically, in the paper, T[k] = 2 ** k.
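The update schedule can be sketched as follows. This is a simplified illustration, not the paper's implementation: each group holds a single scalar state, k is assumed to start at 0, and the rule that a group reads only from itself and from slower groups is my recollection of the paper rather than something stated above. The names `periods`, `active_groups` and `clockwork_step` are hypothetical.

```python
import math

def periods(K):
    # T[k] = 2 ** k, with k assumed to run from 0 to K - 1.
    return [2 ** k for k in range(K)]

def active_groups(t, T):
    # Group k is updated only at time steps divisible by its period.
    return [k for k, period in enumerate(T) if t % period == 0]

def clockwork_step(h, x, t, T, w_in, w_rec):
    """One time step; h holds one scalar state per group for brevity."""
    new_h = list(h)  # inactive groups simply keep their previous state
    for k in active_groups(t, T):
        # Assumption: group k reads from itself and from slower groups (j >= k).
        pre = w_in[k] * x + sum(w_rec[k][j] * h[j] for j in range(k, len(h)))
        new_h[k] = math.tanh(pre)
    return new_h

T = periods(3)
schedule = [active_groups(t, T) for t in range(5)]
```

Slow groups change rarely, so whatever they store persists across many time steps without having to survive repeated updates.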
PixelCNN + PixelRNN + PixelCNN 2.0
PixelCNN 2.0 is the state-of-the-art (as of October 2016 :P) in image generation. The original PixelRNN paper is available here: Pixel RNN, and the follow-up paper is available here: PixelCNN 2.0. Here are some generated samples,