- The paper explores conditional image generation by adapting and improving the PixelCNN architecture.
- [Link to the paper](https://arxiv.org/abs/1606.05328)
- Models images pixel by pixel by decomposing the joint image distribution as a product of per-pixel conditionals (equation 1 in the paper, reproduced below).
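For an n x n image with pixels x_1, ..., x_{n^2} taken in raster-scan order, the factorization is:

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})
```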
- PixelRNN uses two-dimensional LSTM layers while PixelCNN uses masked convolutional layers.
- PixelRNN gives better results but PixelCNN is faster to train.
- PixelRNN outperforms PixelCNN because every LSTM layer has access to the entire available context (a larger receptive field) and because LSTMs contain multiplicative units (gates) that allow modelling more complex interactions.
- To compensate for these, the paper uses deeper networks and gated activation units (equation 2 in the paper, reproduced below), respectively.
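For reference, the gated activation unit of equation 2, where \(\ast\) denotes convolution, \(\odot\) element-wise multiplication, \(\sigma\) the sigmoid, and k indexes the layer:

```latex
\mathbf{y} = \tanh(W_{k,f} \ast \mathbf{x}) \odot \sigma(W_{k,g} \ast \mathbf{x})
```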
- Stacking masked convolutions leads to a blind spot in the receptive field: some pixels above and to the right of the current pixel precede it in raster order but are never seen.
- These can be removed by combining two convolutional network stacks:
  - Horizontal stack - conditions on the pixels to the left in the current row.
  - Vertical stack - conditions on all rows above the current row.
- Every layer in the horizontal stack takes as input the output of the previous layer as well as that of the vertical stack.
- Residual connections are used in the horizontal stack but not in the vertical stack (where they did not seem to improve results in initial experiments); a sketch of one such layer follows.
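A minimal PyTorch sketch of one gated layer with the two stacks. This is an illustrative reimplementation, not the authors' code: the class, argument, and variable names are made up, and the padding-and-cropping trick is one common way to realize the causal shifts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedPixelCNNLayer(nn.Module):
    """One gated layer combining a vertical stack (sees all rows above the
    current pixel) and a horizontal stack (sees pixels to its left in the
    current row). Sketch only; names and sizes are illustrative."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        k = kernel_size
        # Vertical stack: (k//2 + 1) x k kernel, made causal via padding below.
        self.v_conv = nn.Conv2d(channels, 2 * channels, (k // 2 + 1, k))
        # Horizontal stack: 1 x (k//2 + 1) kernel over the current row.
        self.h_conv = nn.Conv2d(channels, 2 * channels, (1, k // 2 + 1))
        self.v_to_h = nn.Conv2d(2 * channels, 2 * channels, 1)  # stack link
        self.h_res = nn.Conv2d(channels, channels, 1)  # 1x1 before the residual

    def forward(self, v, h):
        kh = self.v_conv.kernel_size[0]
        kw = self.h_conv.kernel_size[1]
        side = self.v_conv.kernel_size[1] // 2

        # Pad the top by the full kernel height and crop, so every output row
        # depends only on rows strictly above it.
        v_feat = self.v_conv(F.pad(v, (side, side, kh, 0)))[:, :, : v.size(2), :]
        # Pad the left so every output column depends only on columns up to and
        # including itself ('B'-type masking; the network's first layer would
        # shift one pixel further to exclude the current pixel, 'A'-type).
        h_feat = self.h_conv(F.pad(h, (kw - 1, 0, 0, 0)))
        # Every horizontal layer also takes the vertical stack's output as input.
        h_feat = h_feat + self.v_to_h(v_feat)

        # Gated activation units (equation 2): split the channels into a tanh
        # half and a sigmoid half and multiply element-wise.
        v_f, v_g = torch.chunk(v_feat, 2, dim=1)
        h_f, h_g = torch.chunk(h_feat, 2, dim=1)
        v_out = torch.tanh(v_f) * torch.sigmoid(v_g)
        h_gated = torch.tanh(h_f) * torch.sigmoid(h_g)

        # Residual connection on the horizontal stack only.
        return v_out, h + self.h_res(h_gated)
```

Stacking several such layers, with the image first passed through an 'A'-masked convolution to produce the initial v and h feature maps, gives the full network.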
- The model can also capture the conditional distribution of images given a high-level description of the image, represented as a latent vector h (equation 4 in the paper, reproduced below).
- This conditioning does not depend on the location of the pixel in the image.
- To make the conditioning depend on location as well, map h to a spatial representation s = m(h) (equation 5 in the paper).
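For reference, equation 4 adds h to both gates through linear projections V, and equation 5 replaces the V^T h terms with unmasked 1x1 convolutions over the spatial map s:

```latex
\mathbf{y} = \tanh(W_{k,f} \ast \mathbf{x} + V_{k,f}^{T}\mathbf{h}) \odot \sigma(W_{k,g} \ast \mathbf{x} + V_{k,g}^{T}\mathbf{h}) \tag{4}
\mathbf{y} = \tanh(W_{k,f} \ast \mathbf{x} + V_{k,f} \ast \mathbf{s}) \odot \sigma(W_{k,g} \ast \mathbf{x} + V_{k,g} \ast \mathbf{s}), \quad \mathbf{s} = m(\mathbf{h}) \tag{5}
```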
- PixelCNN auto-encoder: start with a traditional auto-encoder architecture, replace the deconvolutional decoder with a conditional PixelCNN, and train the network end-to-end; a minimal sketch follows.
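A minimal, runnable PyTorch sketch of the idea under simplifying assumptions: single-channel images, a toy convolutional encoder, and a single 'A'-masked convolution standing in for the full conditional PixelCNN decoder (all names and layer sizes are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConvA(nn.Conv2d):
    """'A'-type masked convolution: each output position sees only pixels
    above it, plus pixels to its left in the same row."""
    def __init__(self, in_ch, out_ch, k):
        super().__init__(in_ch, out_ch, k, padding=k // 2)
        mask = torch.ones_like(self.weight)
        mask[:, :, k // 2, k // 2:] = 0  # current pixel and everything right of it
        mask[:, :, k // 2 + 1:, :] = 0   # all rows below
        self.register_buffer("mask", mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias, padding=self.padding)

class PixelCNNAutoEncoder(nn.Module):
    """Encoder -> latent h; the decoder is a (heavily simplified) conditional
    PixelCNN predicting 256-way logits per pixel from the masked context
    plus h. Replaces the usual deconvolutional decoder."""
    def __init__(self, latent_dim=64, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        self.masked = MaskedConvA(1, hidden, 7)
        self.cond = nn.Linear(latent_dim, hidden)  # h enters as a per-channel bias
        self.logits = nn.Conv2d(hidden, 256, 1)    # 256 intensity levels per pixel

    def forward(self, x):
        h = self.encoder(x)                        # high-level description of x
        ctx = self.masked(x) + self.cond(h)[:, :, None, None]
        return self.logits(F.relu(ctx))            # (B, 256, H, W) logits

# End-to-end training with teacher forcing: predict each pixel of x from its
# masked context and the latent code h.
model = PixelCNNAutoEncoder()
x = torch.rand(8, 1, 32, 32)
loss = F.cross_entropy(model(x), (x[:, 0] * 255).long())
```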
- For unconditional modelling, Gated PixelCNN either outperforms PixelRNN or performs almost as well, while taking much less time to train.
- When conditioning on ImageNet class labels, the log-likelihood did not improve much, but the visual quality of the generated samples improved significantly.
- The paper also includes sample images generated by conditioning on embeddings of human portraits and by a PixelCNN auto-encoder trained on ImageNet patches.