Analysis of Classic Generative Model Networks: DCGAN

In the original GAN model, both the generator and the discriminator are shallow networks. To generate higher-resolution images, deeper models are needed. Alec Radford et al. of indico proposed the DCGAN (Deep Convolutional GAN) model in 2016. Both the discriminator and the generator in this model are fully convolutional neural networks: except for the last layer of the discriminator and the first layer of the generator, every layer is a convolutional layer. This design improves GAN's ability to generate larger images. After training on the LSUN (Large-scale Scene Understanding) dataset, DCGAN can generate realistic 64 × 64 images, surpassing previous GAN models.

def discriminator(self, image, reuse=False):
    # Reuse the same weights the second time the discriminator is built
    # (it is applied once to real images and once to generated ones).
    with tf.variable_scope("discriminator") as scope:
        if reuse:
            scope.reuse_variables()

        # Four stride-2 convolutions halve the resolution each time:
        # 64x64 -> 32x32 -> 16x16 -> 8x8 -> 4x4. All layers use LeakyReLU;
        # batch normalization is applied to every layer except the first.
        h0 = lrelu(conv2d(image, self.df_dim, name='d_h0_conv'))
        h1 = lrelu(self.d_bn1(conv2d(h0, self.df_dim*2, name='d_h1_conv')))
        h2 = lrelu(self.d_bn2(conv2d(h1, self.df_dim*4, name='d_h2_conv')))
        h3 = lrelu(self.d_bn3(conv2d(h2, self.df_dim*8, name='d_h3_conv')))
        # Flatten (4 * 4 * df_dim*8 = 8192 when df_dim = 64) and map to one logit.
        h4 = linear(tf.reshape(h3, [-1, 8192]), 1, 'd_h3_lin')

        return tf.nn.sigmoid(h4), h4
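
The discriminator above relies on helper functions (lrelu, conv2d, linear) and batch-normalization objects (d_bn1 through d_bn3) that the post does not show. A minimal sketch of what they might look like in TensorFlow 1.x, modeled on common DCGAN implementations (the exact signatures and defaults here are assumptions):

import tensorflow as tf

def lrelu(x, leak=0.2, name="lrelu"):
    # LeakyReLU: max(x, leak * x); DCGAN uses a slope of 0.2.
    return tf.maximum(x, leak * x, name=name)

def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name="conv2d"):
    # 5x5 convolution with stride 2, halving the spatial resolution.
    with tf.variable_scope(name):
        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
                            initializer=tf.truncated_normal_initializer(stddev=stddev))
        conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')
        b = tf.get_variable('b', [output_dim], initializer=tf.constant_initializer(0.0))
        return tf.nn.bias_add(conv, b)

def linear(input_, output_size, scope="linear", stddev=0.02, with_w=False):
    # Fully connected layer; optionally also returns the weight matrix and bias.
    shape = input_.get_shape().as_list()
    with tf.variable_scope(scope):
        w = tf.get_variable('w', [shape[1], output_size],
                            initializer=tf.random_normal_initializer(stddev=stddev))
        b = tf.get_variable('b', [output_size], initializer=tf.constant_initializer(0.0))
        out = tf.matmul(input_, w) + b
        return (out, w, b) if with_w else out

class batch_norm(object):
    # Thin wrapper so that d_bn1 = batch_norm(name='d_bn1') can be called
    # like a function on a layer's output.
    def __init__(self, epsilon=1e-5, momentum=0.9, name="batch_norm"):
        self.epsilon, self.momentum, self.name = epsilon, momentum, name

    def __call__(self, x, train=True):
        return tf.contrib.layers.batch_norm(x, decay=self.momentum,
                                            epsilon=self.epsilon, scale=True,
                                            is_training=train, scope=self.name)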


def generator(self, z):
    with tf.variable_scope("generator"):
        # Project the noise vector z (100-dim) to a 4x4x(gf_dim*8) tensor.
        self.z_, self.h0_w, self.h0_b = linear(z, self.gf_dim*8*4*4, 'g_h0_lin', with_w=True)
        self.h0 = tf.reshape(self.z_, [-1, 4, 4, self.gf_dim * 8])
        h0 = tf.nn.relu(self.g_bn0(self.h0))

        # Four fractionally-strided (transposed) convolutions, each doubling
        # the resolution: 4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64.
        self.h1, self.h1_w, self.h1_b = conv2d_transpose(h0, [self.batch_size, 8, 8, self.gf_dim*4], name='g_h1', with_w=True)
        h1 = tf.nn.relu(self.g_bn1(self.h1))
        h2, self.h2_w, self.h2_b = conv2d_transpose(h1, [self.batch_size, 16, 16, self.gf_dim*2], name='g_h2', with_w=True)
        h2 = tf.nn.relu(self.g_bn2(h2))
        h3, self.h3_w, self.h3_b = conv2d_transpose(h2, [self.batch_size, 32, 32, self.gf_dim*1], name='g_h3', with_w=True)
        h3 = tf.nn.relu(self.g_bn3(h3))
        h4, self.h4_w, self.h4_b = conv2d_transpose(h3, [self.batch_size, 64, 64, 3], name='g_h4', with_w=True)

        # The output layer uses tanh (pixels in [-1, 1]) and no batch normalization.
        return tf.nn.tanh(h4)
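
Likewise, conv2d_transpose is a helper around tf.nn.conv2d_transpose that the post does not define. A plausible sketch (again, the signature and defaults are assumptions):

def conv2d_transpose(input_, output_shape, k_h=5, k_w=5, d_h=2, d_w=2,
                     stddev=0.02, name="conv2d_transpose", with_w=False):
    # Fractionally-strided ("transposed") 5x5 convolution with stride 2,
    # doubling the spatial resolution.
    with tf.variable_scope(name):
        # Note the filter layout: [height, width, output_channels, input_channels].
        w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
                            initializer=tf.random_normal_initializer(stddev=stddev))
        deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape,
                                        strides=[1, d_h, d_w, 1])
        b = tf.get_variable('b', [output_shape[-1]], initializer=tf.constant_initializer(0.0))
        deconv = tf.nn.bias_add(deconv, b)
        return (deconv, w, b) if with_w else deconv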


As the generator code above shows, the generator G expands a 100-dimensional noise vector into a 64 × 64 × 3 image, and the whole upsampling process uses fractionally-strided convolutions.
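
To make the shape arithmetic concrete, here is a small standalone example (names are illustrative) showing how a single stride-2 transposed convolution doubles the resolution:

import numpy as np
import tensorflow as tf

# One fractionally-strided convolution step: an 8x8 feature map with stride 2
# and 'SAME' padding becomes 16x16.
x = tf.placeholder(tf.float32, [1, 8, 8, 256])
w = tf.get_variable('w_demo', [5, 5, 128, 256],  # [k_h, k_w, out_ch, in_ch]
                    initializer=tf.random_normal_initializer(stddev=0.02))
y = tf.nn.conv2d_transpose(x, w, output_shape=[1, 16, 16, 128],
                           strides=[1, 2, 2, 1], padding='SAME')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, {x: np.zeros((1, 8, 8, 256), np.float32)})
    print(out.shape)  # (1, 16, 16, 128)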

# Discriminator loss: label real images 1 and generated images 0.
# D_logits are the raw discriminator outputs on real images; D_logits_ on fakes.
self.d_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=self.D_logits, labels=tf.ones_like(self.D)))
self.d_loss_fake = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=self.D_logits_, labels=tf.zeros_like(self.D_)))
self.d_loss = self.d_loss_real + self.d_loss_fake

# Generator loss: fool the discriminator into labeling generated images 1.
self.g_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=self.D_logits_, labels=tf.ones_like(self.D_)))
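
For completeness, the corresponding optimization step might look as follows. Each Adam optimizer updates only its own network's variables; the learning rate 0.0002 and beta1 = 0.5 are the values recommended in the DCGAN paper, while the variable names here are assumptions:

# Split the trainable variables using the 'd_'/'g_' prefixes of the layer names above.
t_vars = tf.trainable_variables()
d_vars = [v for v in t_vars if 'd_' in v.name]  # discriminator weights
g_vars = [v for v in t_vars if 'g_' in v.name]  # generator weights

# Alternate these two ops during training: one step updates D, the other updates G.
d_optim = tf.train.AdamOptimizer(0.0002, beta1=0.5).minimize(self.d_loss, var_list=d_vars)
g_optim = tf.train.AdamOptimizer(0.0002, beta1=0.5).minimize(self.g_loss, var_list=g_vars)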

Improvements over the original GAN:
(1) Replace pooling layers with strided convolutions in the discriminator and fractionally-strided (transposed) convolutions in the generator.
(2) Add batch normalization in both the generator and the discriminator.
(3) Remove the fully connected layers of the GAN; global average pooling can take their place.
(4) The output layer of the generator uses the tanh activation function; all other generator layers use ReLU.
(5) All layers of the discriminator use the LeakyReLU activation function.

Source: blog.csdn.net/weixin_38052918/article/details/107917695