# Introduction

- The paper proposes an adversarial framework for estimating generative models: a generative model tries to capture the data distribution, while a discriminative model tries to distinguish samples drawn from the generative model from samples drawn from the true data distribution.
- Link to the paper

# Adversarial Net

- Two models: Generative Model (*G*) and Discriminative Model (*D*). Both are multi-layer perceptrons.
- *G* takes as input a noise variable *z* and outputs a data sample *x* (= *G(z)*).
- *D* takes as input a data sample *x* and predicts whether it came from true data or from *G*.
- *G* tries to minimise *log(1-D(G(z)))* while *D* tries to maximise the probability of correct classification.
- Think of it as a minimax game between 2 players and the global optimum would be when *G* generates perfect samples and *D* cannot distinguish between the samples (thereby always returning 0.5 as the probability of a sample coming from true data).
- Alternate between *k* steps of training *D* and 1 step of training *G* so that *D* is maintained near its optimal solution.
- When starting training, the loss *log(1-D(G(z)))* would saturate as *G* would be weak. Instead maximise *log(D(G(z)))*.
- The paper contains the theoretical proof for global optimum of the minimax game.
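
The alternating scheme described in this section can be sketched on a one-dimensional toy problem. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the name `train_toy_gan`, the linear generator `G(z) = a*z + b`, the logistic-regression discriminator `D(x) = sigmoid(w*x + c)`, the N(4, 1) target distribution, plain gradient ascent, and all hyperparameters are illustrative choices (the paper uses multi-layer perceptrons). The generator step uses the non-saturating objective, maximising *log(D(G(z)))*.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def train_toy_gan(steps=200, k=1, batch=64, lr=0.05, seed=0):
    """Alternate k discriminator updates with 1 generator update (toy 1-D setup)."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0  # hypothetical generator parameters: G(z) = a*z + b
    w, c = 0.0, 0.0  # hypothetical discriminator parameters: D(x) = sigmoid(w*x + c)

    for _ in range(steps):
        # k steps of gradient ascent on log D(x) + log(1 - D(G(z))) w.r.t. (w, c)
        for _ in range(k):
            x_real = rng.normal(4.0, 1.0, size=batch)  # assumed "true data"
            x_fake = a * rng.normal(size=batch) + b
            d_real = sigmoid(w * x_real + c)
            d_fake = sigmoid(w * x_fake + c)
            # d/ds log sigmoid(s) = 1 - sigmoid(s); d/ds log(1 - sigmoid(s)) = -sigmoid(s)
            grad_w = np.mean((1.0 - d_real) * x_real) - np.mean(d_fake * x_fake)
            grad_c = np.mean(1.0 - d_real) - np.mean(d_fake)
            w += lr * grad_w
            c += lr * grad_c

        # 1 step of gradient ascent on log D(G(z)) w.r.t. (a, b)
        z = rng.normal(size=batch)
        d_fake = sigmoid(w * (a * z + b) + c)
        # d/dx log D(x) = (1 - D(x)) * w, then chain rule through x = a*z + b
        grad_a = np.mean((1.0 - d_fake) * w * z)
        grad_b = np.mean((1.0 - d_fake) * w)
        a += lr * grad_a
        b += lr * grad_b

    return a, b, w, c
```

Calling `train_toy_gan(steps=500, k=1)` returns the final `(a, b, w, c)`. Because a linear discriminator can only separate the two distributions by their location, this toy is meant to show the update order and the gradients, not to produce a good generator.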

# Experiments

- Datasets: MNIST, Toronto Face Database, CIFAR-10
- Generator model uses ReLU and sigmoid activations.
- Discriminator model uses maxout activations, with dropout applied during training.
- Evaluation Metric: Fit a Gaussian Parzen window to samples obtained from *G* and compare the log-likelihood of the test data under the resulting density estimate.
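
The Parzen window metric can be illustrated with a short NumPy sketch. It assumes an isotropic Gaussian kernel with a single bandwidth `sigma`; the function name `parzen_log_likelihood` and its argument layout are hypothetical, and the bandwidth is left as a free parameter here (the paper selects it by cross-validation on a validation set).

```python
import numpy as np


def parzen_log_likelihood(test_x, samples, sigma):
    """Mean log-likelihood of test_x under a Gaussian Parzen window on samples.

    test_x:  array of shape (m, d), held-out data points
    samples: array of shape (n, d), samples drawn from the generator
    sigma:   kernel bandwidth (standard deviation of each Gaussian)
    """
    test_x = np.asarray(test_x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape

    # Squared distances between every test point and every sample: shape (m, n)
    diff = test_x[:, None, :] - samples[None, :, :]
    exponents = -np.sum(diff ** 2, axis=2) / (2.0 * sigma ** 2)

    # Numerically stable log-sum-exp over the n kernels
    m = exponents.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.sum(np.exp(exponents - m), axis=1))

    # Normalisation: average over n kernels, each a d-dimensional Gaussian
    log_norm = np.log(n) + 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    return float(np.mean(log_sum - log_norm))
```

The pairwise-difference array has shape `(m, n, d)`, so for large sample sets the test points would need to be processed in batches.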

# Strengths

- Computational advantages
- Backprop is sufficient for training with no need for Markov chains or performing inference.
- A variety of functions can be used in the model.
- Since *G* is trained only using the gradients from *D*, fewer chances of directly copying features from the true data.
- Can represent sharp (even degenerate) distributions.

# Weaknesses

- *D* must be well synchronised with *G*.
- While *G* may learn to sample data points that are indistinguishable from true data, no explicit representation can be obtained.

# Possible Extensions

- Conditional generative models.
- Inference network to predict *z* given *x*.
- Implement a stochastic extension of the deterministic Multi-Prediction Deep Boltzmann Machines.
- Using discriminator net or inference net for feature selection.
- Accelerating training by ensuring better coordination between *G* and *D* or by determining better distributions to sample *z* from during training.