PixelCNN 2.0 is the state of the art (as of October 2016 :P) in image generation. The original PixelRNN paper is available here: Pixel RNN, and the follow-up paper is available here: PixelCNN 2.0. Here are some generated samples,
and results from some interpolation experiments!
Introduction: In all these models (PixelCNN, PixelRNN, and PixelCNN 2.0), the image is generated pixel by pixel. Since we generate it sequentially, we need to fix an order in which the pixels are generated. The chosen order is color by color within a pixel, left to right, top to bottom. For example, in a 2x2 image, the order of generation would be (1,1,R), (1,1,G), (1,1,B), (1,2,R), (1,2,G), (1,2,B), (2,1,R), (2,1,G), (2,1,B), (2,2,R), (2,2,G), (2,2,B). Each of these values can take 256 levels, but you can also choose to quantize to far fewer levels, such as 32 or 4. When we generate the value at a particular position, we have already generated the values at all previous positions, so we can condition on them: we predict the current value given the previous values.
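The raster-scan ordering above is easy to enumerate explicitly. As a sketch (function name and 1-indexed (row, col, color) tuples are my own choice, matching the tuples in the text):

```python
def generation_order(h, w):
    """Enumerate the autoregressive ordering for an h x w RGB image:
    color within pixel, left to right, top to bottom."""
    return [(r, c, ch)
            for r in range(1, h + 1)       # rows, top to bottom
            for c in range(1, w + 1)       # columns, left to right
            for ch in ('R', 'G', 'B')]     # colors within a pixel

order = generation_order(2, 2)
# The first three entries are the three colors of the top-left pixel:
# (1, 1, 'R'), (1, 1, 'G'), (1, 1, 'B')
```

When sampling, the model is run once per entry of this list, each time conditioned on all earlier entries.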
Masking: In all the models, the above conditioning is implemented using masks. The basic idea is to zero out all the values that are not supposed to be available to the model: when generating a pixel in the middle of the image, every pixel that comes later in the generation order is zeroed out.
You have to be a little careful, since the masks are actually 3D (the 2D image plus a third dimension for color). For each color, values from the other colors may or may not be available; the paper explains this as masks of type A and type B. In the implementation, though, each mask is a single 3D mask made of type A and type B masks put together.
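A minimal NumPy sketch of such a mask, applied to a convolution filter of shape (height, width, in_channels, out_channels). The function name and the assumption that channels are grouped evenly into R, G, B are mine, not the paper's; type A (used in the first layer) blocks the current color at the center position, while type B allows it:

```python
import numpy as np

def make_mask(kh, kw, n_in, n_out, mask_type):
    """Mask for a conv filter of shape (kh, kw, n_in, n_out).
    Assumes channels are grouped by color in R, G, B order (a sketch,
    not the paper's exact code)."""
    mask = np.ones((kh, kw, n_in, n_out), dtype=np.float32)
    yc, xc = kh // 2, kw // 2
    # Zero out everything strictly below the center row ...
    mask[yc + 1:, :, :, :] = 0.0
    # ... and to the right of the center in the center row.
    mask[yc, xc + 1:, :, :] = 0.0
    # At the center, restrict color-to-color connections.
    def color(i, n):        # channel index -> color group (0=R, 1=G, 2=B)
        return i * 3 // n
    for i in range(n_in):
        for o in range(n_out):
            if mask_type == 'A':
                allowed = color(i, n_in) < color(o, n_out)   # strictly earlier colors
            else:  # type 'B'
                allowed = color(i, n_in) <= color(o, n_out)  # same color also allowed
            if not allowed:
                mask[yc, xc, i, o] = 0.0
    return mask
```

The mask is multiplied elementwise into the filter weights before each convolution, so the zeroed connections contribute nothing regardless of what the weights learn.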
Implementations: When a new paper with good results is published, there is a flurry of activity to reproduce its results and to try out variations and extensions. Here are implementations of PixelRNN and PixelCNN in TensorFlow and Theano.