- The paper proposes an adversarial framework for estimating generative models: a generative model learns to capture the data distribution while a discriminative model learns to distinguish samples drawn from the generative model from samples drawn from the true data distribution.
- [Link to the paper](https://arxiv.org/abs/1406.2661)
- Two models: a Generative Model (G) and a Discriminative Model (D).
- Both are multi-layer perceptrons.
- G takes as input a noise variable z and outputs a data sample x = G(z).
- D takes as input a data sample x and predicts the probability that it came from the true data rather than from G.
- G tries to minimise log(1 - D(G(z))) while D tries to maximise the probability of assigning the correct label to both true data and samples from G.
- Think of it as a minimax game between two players; the global optimum is reached when G generates perfect samples and D cannot distinguish between the two sources, always returning 0.5 as the probability of a sample coming from the true data (the value function is written out after this list).
- Alternate between k steps of training D and 1 step of training G, so that D is maintained near its optimal solution while G changes slowly (a training-loop sketch follows this list).
- Early in training, when G is weak, D can reject G's samples with high confidence, so the loss log(1 - D(G(z))) saturates and provides little gradient; instead, train G to maximise log(D(G(z))).
- The paper proves that the global optimum of the minimax game is achieved exactly when G recovers the data distribution.
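For reference, the value function of the minimax game as written in the paper:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$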
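A minimal sketch of the alternating training procedure, including the non-saturating G loss. This is hypothetical PyTorch code, not the paper's implementation; the toy data, network sizes, and optimiser settings are illustrative assumptions (k = 1 is what the paper's experiments used).

```python
import torch
import torch.nn as nn

noise_dim, data_dim, k = 16, 2, 1  # illustrative sizes; k = 1 as in the paper's experiments

# Stand-ins for G and D; the paper's nets are larger MLPs.
G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3, momentum=0.9)
opt_g = torch.optim.SGD(G.parameters(), lr=1e-3, momentum=0.9)
eps = 1e-8  # numerical guard inside the logs

def real_batch(n=64):
    # Hypothetical "true" data distribution: a Gaussian blob.
    return torch.randn(n, data_dim) * 0.5 + 2.0

for step in range(1000):
    # k steps on D: ascend log D(x) + log(1 - D(G(z))).
    for _ in range(k):
        x, z = real_batch(), torch.randn(64, noise_dim)
        d_loss = -(torch.log(D(x) + eps)
                   + torch.log(1 - D(G(z).detach()) + eps)).mean()
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 1 step on G with the non-saturating loss: ascend log D(G(z))
    # instead of descending log(1 - D(G(z))).
    z = torch.randn(64, noise_dim)
    g_loss = -torch.log(D(G(z)) + eps).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```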
- Datasets
- MNIST, Toronto Face Database, CIFAR-10
- Generator net uses ReLU and sigmoid activations.
- Discriminator net uses maxout activations and dropout (a sketch of a maxout unit follows this list).
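Maxout is not a standard built-in activation, so a sketch may help; the layer sizes below are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Maxout unit: a linear map producing `pieces` candidates per output
    feature, followed by an element-wise max over the candidates."""
    def __init__(self, in_features, out_features, pieces=2):
        super().__init__()
        self.pieces = pieces
        self.linear = nn.Linear(in_features, out_features * pieces)

    def forward(self, x):
        y = self.linear(x).view(x.size(0), -1, self.pieces)
        return y.max(dim=-1).values

# A discriminator in the spirit of the paper: maxout layers with dropout,
# ending in the probability that the input is real (sizes are assumptions).
D = nn.Sequential(
    Maxout(784, 240, pieces=5),
    nn.Dropout(0.5),
    Maxout(240, 240, pieces=5),
    nn.Dropout(0.5),
    nn.Linear(240, 1),
    nn.Sigmoid(),
)
```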
- Evaluation Metric
- Fit a Gaussian Parzen window to samples obtained from G and report the log-likelihood of the test set under it, with the kernel bandwidth chosen by cross-validation on a validation set (a sketch follows).
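A minimal sketch of this metric, assuming an isotropic Gaussian kernel; the function name is mine, not the paper's:

```python
import numpy as np
from scipy.special import logsumexp

def parzen_log_likelihood(test_x, gen_samples, sigma):
    """Mean log-likelihood of test_x under a Gaussian Parzen window
    fit to gen_samples, with kernel bandwidth sigma."""
    m, d = test_x.shape
    # Pairwise squared distances between test points and G's samples: (m, n).
    sq_dists = ((test_x[:, None, :] - gen_samples[None, :, :]) ** 2).sum(-1)
    # Log of the kernel density estimate at each test point.
    log_p = logsumexp(-sq_dists / (2 * sigma ** 2), axis=1)
    log_p -= np.log(gen_samples.shape[0]) + 0.5 * d * np.log(2 * np.pi * sigma ** 2)
    return log_p.mean()
```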
- Computational advantages
- Backprop is sufficient for training; there is no need for Markov chains or approximate inference.
- A variety of functions can be used in the model.
- Since G is trained only with gradients flowing through D and never sees the training data directly, there is little chance of it directly copying features of the true data.
- Can represent sharp (even degenerate) distributions.
- Disadvantages
- D must be well synchronised with G during training; in particular, G must not be trained too much without updating D, or it may collapse too many values of z to the same value of x.
- While G may learn to sample data points that are indistinguishable from true data, there is no explicit representation of the distribution it has learnt; one can only sample from it.
- Future Work
- Conditional generative models p(x | c), obtained by feeding the condition c as an extra input to both G and D (a sketch follows this list).
- Learned approximate inference: training an auxiliary network to predict z given x.
- Implementing a stochastic extension of the deterministic Multi-Prediction Deep Boltzmann Machine.
- Using features from the discriminator net or the inference net to improve classifiers when limited labelled data is available (semi-supervised learning).
- Accelerating training by ensuring better coordination between G and D, or by determining better distributions to sample z from during training.
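A minimal sketch of the conditional extension mentioned above, assuming a one-hot class label as the condition c; all names and sizes here are illustrative, not from the paper:

```python
import torch
import torch.nn as nn

noise_dim, cond_dim, data_dim = 16, 10, 784  # illustrative sizes

# The condition c is concatenated onto the inputs of both G and D,
# so that G models p(x | c).
G = nn.Sequential(nn.Linear(noise_dim + cond_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Sigmoid())
D = nn.Sequential(nn.Linear(data_dim + cond_dim, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())

z = torch.randn(32, noise_dim)
c = torch.eye(cond_dim)[torch.randint(0, cond_dim, (32,))]  # one-hot labels
fake_x = G(torch.cat([z, c], dim=1))       # conditional samples
p_real = D(torch.cat([fake_x, c], dim=1))  # D also sees the condition
```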