NVP stands for non-volume-preserving transformations. This work introduces an ingenious way to create non-linear but exactly invertible transformations to map from the latent variables z to the data x. Though it is not mentioned in the paper but the transformations seem to be inspired from Fiestel structures used in encryption, an example being the DES encryption algorithm.
Once they have an invertible and non-linear transformation from z to x, they use the change of variable formula to get an expression for the density of any datapoint x. Thereafter they train to maximize the likelihood of the data. Inference of z given x is trivial here because of the invertibility of the transformation.
In general deep learning models cannot be naturally adapted to find solutions to structured learning problems.
In this paper, they introduce a variant of deep recurrent neural networks, which can learn to parse a sentence by learning transitions in a shift-reduce parser. One of their main contributions is to batch this algorithm. In spite of variability in structures between examples, they have managed to invent a batched algorithm.
They use this unusual architecture for solving the natural language inference problem.
Variational autoencoders were introduced to efficiently to solve the inference problem in deep generative models. A generative model generates data by sampling as follows: z ~ p(z), x ~ p(x|z). p in this case can be modeled using deep neural networks. But using deep neural networks makes it difficult to infer z, given an x. This paper introduces a novel method based on calculus of variations to approximately infer z given x. The authors start with an intuitive objective to optimize, reduce it to the problem of optimization of a variational lower bound and finally illustrate an example using deep neural networks to model the probability distributions. In contrast to GANs (which can easily diverge even with small variations), variational autoencoders are easier to train. Similar to GANs, trained variational autoencoders have be used to generate new images and interpolate between images.