The paper proposes a regularizer that encourages a model's output to be isotropically smooth around each input data point, whether or not a training label is available for that point. It does so by probing the local anisotropy around each input data point and smoothing the model in the most anisotropic direction.
To probe the local anisotropy around a data point, the method finds the direction of a small perturbation to which the model is most sensitive. Sensitivity is measured as the KL divergence between the model's output class distribution at the input and its output class distribution at the perturbed input. The resulting perturbation is called the 'virtual adversarial perturbation'. The model is trained to minimize the supervised loss (cross-entropy, MSE, etc.) together with its sensitivity to the virtual adversarial perturbation.
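As a rough illustration of the idea above, here is a minimal sketch in plain Python. It assumes a toy linear-softmax classifier `p(y|x) = softmax(W x)` so that the gradient of the KL term can be written by hand; the names `predict`, `virtual_adversarial_perturbation`, `vat_penalty`, and the parameters `eps`, `xi`, `n_power` are illustrative choices, not taken from the paper's code. The most sensitive direction is approximated by starting from a random unit vector and following the gradient of the KL divergence, which is one common way to do it.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, x):
    """Toy classifier (an assumption for this sketch): softmax(W x)."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in W])

def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v] if n > 0 else list(v)

def virtual_adversarial_perturbation(W, x, eps=0.5, xi=1e-3, n_power=1, seed=0):
    """Approximate the norm-eps perturbation that changes the output most."""
    rng = random.Random(seed)
    p = predict(W, x)                      # output at the clean input (no label used)
    d = unit([rng.gauss(0.0, 1.0) for _ in x])
    for _ in range(n_power):
        q = predict(W, [a + xi * b for a, b in zip(x, d)])
        # For softmax(W x), the gradient of KL(p || q) w.r.t. the perturbation
        # is W^T (q - p); a general model would use automatic differentiation.
        g = [sum(W[k][j] * (q[k] - p[k]) for k in range(len(W)))
             for j in range(len(x))]
        d = unit(g)
    return [eps * c for c in d]

def vat_penalty(W, x, eps=0.5):
    """Sensitivity term: KL between outputs at x and at x + r_adv."""
    r = virtual_adversarial_perturbation(W, x, eps)
    return kl(predict(W, x), predict(W, [a + b for a, b in zip(x, r)]))
```

In training, `vat_penalty` would be added to the supervised loss with some weight; since it never looks at a label, it can be evaluated on unlabeled points as well.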
One way to think about why this works is that VAT pu...
[Research paper] Learning to learn by gradient descent by gradient descent
The paper trains an LSTM (a Neural Turing Machine, or NTM, is tested too) to output the update for a steepest-descent-style iteration at each time step. It is aptly titled “Learning to learn by gradient descent by gradient descent”.
Anybody interested in trying Learning to learn to learn by gradient descent by gradient descent by gradient descent?
One cool thing is that they found that the NTM, with access to a complete memory bank, produced behavior similar to that of a second-order optimization algorithm like L-BFGS (I think because being able to store a Jacobian matrix should allow one to estimate a Hessian matrix?).
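To make the "optimizer that is itself optimized" setup concrete, here is a toy sketch in plain Python. Everything in it is a simplification chosen for illustration: `learned_update` is a two-parameter stateful rule standing in for the LSTM, the optimizee `f` is a fixed quadratic, and the meta-gradient is taken by finite differences rather than by backpropagating through the unrolled steps.

```python
def f(theta):
    """Optimizee: a simple quadratic (an assumption for this sketch)."""
    return sum(t * t for t in theta)

def grad_f(theta):
    return [2.0 * t for t in theta]

def learned_update(grad, state, phi):
    """Stand-in for the LSTM: maps (gradient, state) to (update, new state).

    phi = [step size, decay] are the optimizer's own trainable parameters.
    """
    lr, decay = phi
    new_state = [decay * s + g for s, g in zip(state, grad)]
    update = [-lr * s for s in new_state]
    return update, new_state

def meta_loss(phi, theta0, steps=10):
    """Sum of optimizee losses along a trajectory driven by the optimizer."""
    theta, state, total = list(theta0), [0.0] * len(theta0), 0.0
    for _ in range(steps):
        update, state = learned_update(grad_f(theta), state, phi)
        theta = [t + u for t, u in zip(theta, update)]
        total += f(theta)
    return total

def meta_train(phi, theta0, meta_lr=1e-4, iters=50, h=1e-5):
    """Gradient descent on the optimizer's parameters (finite differences)."""
    phi = list(phi)
    for _ in range(iters):
        base = meta_loss(phi, theta0)
        grads = []
        for i in range(len(phi)):
            bumped = list(phi)
            bumped[i] += h
            grads.append((meta_loss(bumped, theta0) - base) / h)
        phi = [p - meta_lr * g for p, g in zip(phi, grads)]
    return phi
```

Calling `meta_train([0.05, 0.0], [1.0, -2.0])` adjusts the optimizer's parameters so that the unrolled trajectory reaches lower loss; the paper plays the same game with a recurrent network in place of the two-parameter rule.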
A clockwork RNN is a recurrent neural network (RNN) architecture designed to remember things easily over long periods of time. It is a follow-up to the Long Short-Term Memory (LSTM) architecture, which Jürgen Schmidhuber also co-authored.
In my opinion, the paper is quite interesting and deserves more attention than it has received.
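For readers who have not seen the architecture, the core mechanism is that the hidden state is split into modules, each with its own clock period, and a module is only updated on time steps divisible by its period. The sketch below is a deliberately tiny version of that rule: it assumes one scalar unit per module and scalar inputs, and the names `clockwork_step`, `periods`, `w_in`, and `w_rec` are made up for this example.

```python
import math

def clockwork_step(t, x, h_prev, periods, w_in, w_rec):
    """One time step of a toy clockwork RNN with one unit per module."""
    h = list(h_prev)
    for i, period_i in enumerate(periods):
        if t % period_i != 0:
            continue  # module i is not clocked at step t; it keeps its old state
        total = w_in[i] * x
        for j, period_j in enumerate(periods):
            # A module listens to itself and to slower (longer-period) modules.
            if period_j >= period_i:
                total += w_rec[i][j] * h_prev[j]
        h[i] = math.tanh(total)
    return h
```

Slow modules change rarely, so whatever they store survives many time steps untouched, which is where the long-range memory comes from.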
In general, deep learning models cannot be naturally adapted to solve structured learning problems.
In this paper, the authors introduce a variant of deep recurrent neural networks that learns to parse a sentence by learning the transitions of a shift-reduce parser. One of their main contributions is a batched version of this algorithm, which works even though the parse structure varies from example to example.
They use this unusual architecture for solving the natural language inference problem.
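The batching idea can be sketched without any neural network at all. In the toy version below, each example carries its own sequence of SHIFT/REDUCE transitions, all examples advance in lockstep, and the REDUCE operations from one time step are gathered so the composition function is called once per step for the whole batch. The `compose_batch` function here just builds tuples; it is a placeholder for the learned composition network, and all names are invented for this sketch.

```python
SHIFT, REDUCE = "S", "R"

def compose_batch(lefts, rights):
    """Placeholder for the learned composition function, applied to a batch."""
    return [(left, right) for left, right in zip(lefts, rights)]

def batched_shift_reduce(batch_tokens, batch_transitions):
    """Run shift-reduce parsing on several sentences in lockstep."""
    stacks = [[] for _ in batch_tokens]
    # Reverse each buffer so that pop() yields the next unread token.
    buffers = [list(reversed(tokens)) for tokens in batch_tokens]
    n_steps = max(len(trans) for trans in batch_transitions)
    for t in range(n_steps):
        reduce_ids, lefts, rights = [], [], []
        for i, trans in enumerate(batch_transitions):
            if t >= len(trans):
                continue  # this example has already finished
            if trans[t] == SHIFT:
                stacks[i].append(buffers[i].pop())
            else:
                right = stacks[i].pop()
                left = stacks[i].pop()
                reduce_ids.append(i)
                lefts.append(left)
                rights.append(right)
        # One batched call covers every example that reduces at this step.
        for i, node in zip(reduce_ids, compose_batch(lefts, rights)):
            stacks[i].append(node)
    return [stack[-1] for stack in stacks]
```

For instance, `batched_shift_reduce([["the", "cat", "sat"], ["a", "b", "c"]], ["SSRSR", "SSSRR"])` builds a left-branching tree for the first sentence and a right-branching tree for the second, even though both go through the same loop.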
PixelCNN 2.0 is the state of the art (as of October 2016 :P) in image generation. The original PixelRNN paper is available here: Pixel RNN, and the follow-up paper is available here: PixelCNN 2.0. Here are some generated samples,