Adversarial Neural Network Learning (6): Generating Diverse Faces with BEGAN (TensorFlow Implementation)

I. Background

BEGAN, the Boundary Equilibrium GAN, was proposed by David Berthelot et al. [1] in March 2017. A traditional GAN uses its discriminator to judge whether the distribution of generated images matches the distribution of real images. BEGAN replaces this direct density-estimation view: the authors argue that if the distributions of the reconstruction errors on real and generated images are close, the underlying data distributions can be considered close as well. They also improve the network architecture and obtain good experimental results.

This experiment uses the CelebA dataset [2] and BEGAN to generate diverse faces. The code is based on [3] with some improvements, with the goal of implementing the whole pipeline in as little code as possible (the dataset and the implementation are described in detail below).

[1] Paper: https://arxiv.org/abs/1703.10717

[2] CelebA dataset: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

[3] Main reference code: https://github.com/JorgeCeja/began-tensorflow

II. BEGAN Principles

There are already plenty of BEGAN write-ups online, so I will not repeat the theory at length here; instead I recommend a good explainer:

[4] BEGAN explained (a Chinese-language post)

In the BEGAN paper [1], the authors state their contributions quite clearly:

In this paper, we make the following contributions: 

• A GAN with a simple yet robust architecture, standard training procedure with fast and stable convergence.

• An equilibrium concept that balances the power of the discriminator against the generator.

• A new way to control the trade-off between image diversity and visual quality.

• An approximate measure of convergence. To our knowledge the only other published measure is from Wasserstein GAN [1] (WGAN), which will be discussed in the next section. (The [1] here is the paper's own reference numbering; the convergence measure is indeed inspired by WGAN.)

The paper also describes the model structure: the discriminator is an auto-encoder, a loss derived from the Wasserstein distance is used to match the auto-encoder loss distributions, and an equilibrium term is added to balance the discriminator and the generator.

We use an auto-encoder as a discriminator as was first proposed in EBGAN [21]. While typical GANs try to match data distributions directly, our method aims to match auto-encoder loss distributions using a loss derived from the Wasserstein distance. This is done using a typical GAN objective with the addition of an equilibrium term to balance the discriminator and the generator. Our method has an easier training procedure and uses a simpler neural network architecture compared to typical GAN techniques.
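
For concreteness, here are the key equations from the paper; they map one-to-one onto model_loss and the k_t update in the code later in this post. L(·) denotes the discriminator's autoencoder reconstruction loss:

L(v) = |v - D(v)|^η,  η ∈ {1, 2}                     (the code below uses η = 1, an L1 loss)

L_D = L(x) - k_t · L(G(z_D))                         (discriminator loss)
L_G = L(G(z_G))                                      (generator loss)
k_{t+1} = k_t + λ_k · (γ · L(x) - L(G(z_G)))         (equilibrium control; the code uses λ_k = 1e-3, γ = 0.5)

M_global = L(x) + |γ · L(x) - L(G(z_G))|             (convergence measure)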

The authors also give the network architecture (figure from the paper, omitted here).

A detailed walkthrough of BEGAN can be found in [4]. As for implementations, there is not much code online, and the top GitHub repositories are all quite large, which makes them unfriendly to read through. I mainly followed [3], trimming the redundant parts and making small modifications. A few other repositories that are easier to read, for reference:

[5]https://github.com/artcg/BEGAN

[6]https://github.com/Heumi/BEGAN-tensorflow

[7]https://github.com/k920049/BEGAN/blob/master/model/BEGAN.py

The goal of this implementation is to generate diverse faces with BEGAN. An earlier post in this series, Adversarial Neural Network Learning (2): Generating Cage Faces with DCGAN (TensorFlow implementation), also generated faces, but the difference is that the DCGAN experiment could only generate one person's face, whereas BEGAN can generate many different faces. The experiment is based on the CelebA dataset and keeps the code as short as possible.

III. BEGAN Implementation

1. File structure

The project files are organized as follows:

-- main.py                          (entry point and training loop)
-- model.py                         (BEGAN model definition)
-- utils.py                         (helper functions)
-- data                             (training data folder)
    |------ img_align_celeba_png
                |------ image01.png
                |------ image02.png
                |------ ...

2. The dataset

The experiment uses the CelebA dataset, whose official page is http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html. Opening that page shows that CelebA is a face dataset (screenshot omitted).

The official site describes the dataset as follows:

CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations, including

  • 10,177 number of identities,

  • 202,599 number of face images, and

  • 5 landmark locations, 40 binary attributes annotations per image.

In other words, the dataset covers 10,177 identities with 202,599 face images in total. To download the face data, scroll down the official page and choose Align&Cropped Images (screenshot omitted).

If that download link does not work, the dataset is also available on Baidu Pan: https://pan.baidu.com/s/1eSNpdRG#list/path=%2FCelebA. After downloading, unpack the archive, create a data folder in the project root, and place the images under './data/img_align_celeba_png/*.png'.
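
Before moving on, a quick sanity check (a minimal snippet of my own, assuming the layout above) confirms that the images are where the code expects them:

from glob import glob
from PIL import Image

files = glob('./data/img_align_celeba_png/*.png')
print(len(files))                  # expect 202599
print(Image.open(files[0]).size)   # expect (178, 218)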

With the data in place, it is worth opening the folder for a look: there are indeed 202,599 images, each 178×218 pixels, with the face roughly centered and considerable variety across faces. (Dataset preview image omitted.)

3. Helper file: utils.py

This file holds the dataset preprocessing functions: automatically downloading the dataset (I removed that part since I downloaded the data myself; the download code is listed separately in section V), center-cropping the images and resizing them to 64×64, splitting the data into batches, and plotting the final results. The code is as follows:

import math
import os

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


# Read the image at image_path and crop it; since the face is almost always
# centered, a plain center crop is sufficient
def get_image(image_path, width, height, mode):
    """
    Read image from image_path
    :param image_path: Path of image
    :param width: Width of image
    :param height: Height of image
    :param mode: Mode of image
    :return: Image data
    """
    image = Image.open(image_path)

    if image.size != (width, height):  # HACK - Check if image is from the CELEBA dataset
        # Remove most pixels that aren't part of a face
        face_width = face_height = 108
        j = (image.size[0] - face_width) // 2
        i = (image.size[1] - face_height) // 2
        image = image.crop([j, i, j + face_width, i + face_height])
        image = image.resize([width, height], Image.BILINEAR)

    return np.array(image.convert(mode))


# Load a list of image files as one float32 batch
def get_batch(image_files, width, height, mode):
    data_batch = np.array(
        [get_image(sample_file, width, height, mode) for sample_file in image_files]).astype(np.float32)

    # Make sure the images are in 4 dimensions
    if len(data_batch.shape) < 4:
        data_batch = data_batch.reshape(data_batch.shape + (1,))

    return data_batch


# Dataset class, built on the two functions above
class Dataset(object):

    def __init__(self, data_files):
        """
        param data_files: List of files in the database
        """
        IMAGE_WIDTH = 64
        IMAGE_HEIGHT = 64

        self.image_mode = 'RGB'
        image_channels = 3

        self.data_files = data_files
        self.shape = len(data_files), IMAGE_WIDTH, IMAGE_HEIGHT, image_channels

    def get_batches(self, batch_size):
        """
        Generate batches
        :param batch_size: Batch Size
        :return: Batches of data
        """
        IMAGE_MAX_VALUE = 255

        current_index = 0
        while current_index + batch_size <= self.shape[0]:
            data_batch = get_batch(
                self.data_files[current_index:current_index + batch_size],
                *self.shape[1:3], 
                self.image_mode)

            current_index += batch_size

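            # normalize pixel values from [0, 255] to [-0.5, 0.5]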
            yield data_batch / IMAGE_MAX_VALUE - 0.5


# Tile a batch of generated images into one square grid
def images_square_grid(images, mode):
    """
    Save images as a square grid
    :param images: Images to be used for the grid
    :param mode: The mode to use for images
    :return: Image of images in a square grid
    """
    # Get maximum size for square grid of images
    save_size = math.floor(np.sqrt(images.shape[0]))

    # Scale to 0-255
    images = (((images - images.min()) * 255) /
              (images.max() - images.min())).astype(np.uint8)

    # Put images in a square arrangement
    images_in_square = np.reshape(
        images[:save_size * save_size],
        (save_size, save_size, images.shape[1], images.shape[2], images.shape[3]))
    if mode == 'L':
        images_in_square = np.squeeze(images_in_square, 4)

    # Combine images to grid image
    new_im = Image.new(
        mode, (images.shape[1] * save_size, images.shape[2] * save_size))
    for col_i, col_images in enumerate(images_in_square):
        for image_i, image in enumerate(col_images):
            im = Image.fromarray(image, mode)
            new_im.paste(
                im, (col_i * images.shape[1], image_i * images.shape[2]))

    return new_im


# Plot the final results and save them to disk
def save_plot(data, title, image_mode=None, isImage=False):
    """
    Save images or plot to file on the out folder
    Can also save stacked plots
    """
    if not os.path.exists('out/'):
        os.makedirs('out/')

    fig = plt.figure()

    if isImage:
        cmap = None if image_mode == 'RGB' else 'gray'
        plt.imshow(data, cmap=cmap)

    else:

        if type(data) == list:
            for i in data:
                plt.plot(i)
        else:
            plt.plot(data)

    fig.savefig('out/' + title)
    plt.close(fig)


# Render and save generator output, using the two functions above
def show_generator_output(sess, generator, input_z, example_z, out_channel_dim, image_mode, num):
    """
    Show example output for the generator
    :param sess: TensorFlow session
    :param n_images: Number of Images to display
    :param input_z: Input Z Tensor
    :param out_channel_dim: The number of channels in the output image
    :param image_mode: The mode to use for images ("RGB" or "L")
    """

    samples = sess.run(
        generator(input_z, out_channel_dim, False),
        feed_dict={input_z: example_z})

    images_grid = images_square_grid(samples, image_mode)
    save_plot(images_grid, '{}.png'.format(num), image_mode, True)


# Gaussian smoothing of a curve
def smooth(data, degree=5):
    """
    By Scott W Harden from www.swharden.com
    """
    window = degree * 2 - 1
    weight = np.array([1.0] * window)
    weightGauss = []

    for i in range(window):
        i = i - degree + 1
        frac = i / float(window)
        gauss = 1 / (np.exp((4 * frac) ** 2))
        weightGauss.append(gauss)

    weight = np.array(weightGauss) * weight
    smoothed = [0.0] * (len(data) - window)

    for i in range(len(smoothed)):
        smoothed[i] = sum(np.array(data[i:i + window]) * weight) / sum(weight)

    return smoothed
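
As a quick usage sketch (my own snippet, not part of the reference code), the Dataset class can be exercised on its own before wiring up the model:

from glob import glob
import utils

dataset = utils.Dataset(glob('./data/img_align_celeba_png/*.png'))
print(dataset.shape)  # (202599, 64, 64, 3)

# get_batches is a generator; each batch is normalized to [-0.5, 0.5]
batch = next(dataset.get_batches(16))
print(batch.shape, batch.min(), batch.max())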

4. Model file: model.py

Next is the BEGAN model file, which defines the network structure. The code:

import tensorflow as tf
from tensorflow.python.ops import math_ops
from tensorflow.python.framework import ops
import numpy as np


class BEGAN(object):
    def __init__(self, place_holder=''):
        self.place_holder = place_holder

    def model_inputs(self, image_width, image_height, image_channels, z_dim):
        """
        Create the model inputs/tensors
        """
        inputs_real = tf.placeholder(
            tf.float32, (None, image_width, image_height, image_channels), name='input_real')
        inputs_z = tf.placeholder(tf.float32, (None, z_dim), name='input_z')
        learning_rate = tf.placeholder(tf.float32, [], name='learning_rate')
        k_t = tf.placeholder(tf.float32, name='k_t')

        return inputs_real, inputs_z, learning_rate, k_t

    # default alpha is 0.2; 0.01 works best for this example
    # Function from TensorFlow v1.4 for backwards compatibility
    def leaky_relu(self, features, alpha=0.01, name=None):
        with ops.name_scope(name, "LeakyRelu", [features, alpha]):
            features = ops.convert_to_tensor(features, name="features")
            alpha = ops.convert_to_tensor(alpha, name="alpha")

            return math_ops.maximum(alpha * features, features)

    def fully_connected(self, x, output_shape):
        # flatten and dense
        shape = x.get_shape().as_list()
        dim = np.prod(shape[1:])

        x = tf.reshape(x, [-1, dim])
        x = tf.layers.dense(x, output_shape, activation=None)

        return x

    def decoder(self, h, n, img_dim, channel_dim):
        """
        Reconstruction network
        """
        h = tf.layers.dense(h, img_dim * img_dim * n, activation=None)
        h = tf.reshape(h, (-1, img_dim, img_dim, n))

        conv1 = tf.layers.conv2d(
            h, n, 3, padding="same", activation=self.leaky_relu)
        conv1 = tf.layers.conv2d(
            conv1, n, 3, padding="same", activation=self.leaky_relu)

        upsample1 = tf.image.resize_nearest_neighbor(
            conv1, size=(img_dim * 2, img_dim * 2))

        conv2 = tf.layers.conv2d(
            upsample1, n, 3, padding="same", activation=self.leaky_relu)
        conv2 = tf.layers.conv2d(
            conv2, n, 3, padding="same", activation=self.leaky_relu)

        upsample2 = tf.image.resize_nearest_neighbor(
            conv2, size=(img_dim * 4, img_dim * 4))

        conv3 = tf.layers.conv2d(
            upsample2, n, 3, padding="same", activation=self.leaky_relu)
        conv3 = tf.layers.conv2d(
            conv3, n, 3, padding="same", activation=self.leaky_relu)

        conv4 = tf.layers.conv2d(conv3, channel_dim, 3,
                                 padding="same", activation=None)

        return conv4

    def encoder(self, images, n, z_dim, channel_dim):
        """
        Feature extraction network
        """
        conv1 = tf.layers.conv2d(
            images, n, 3, padding="same", activation=self.leaky_relu)

        conv2 = tf.layers.conv2d(
            conv1, n, 3, padding="same", activation=self.leaky_relu)
        conv2 = tf.layers.conv2d(
            conv2, n * 2, 3, padding="same", activation=self.leaky_relu)

        subsample1 = tf.layers.conv2d(
            conv2, n * 2, 3, strides=2, padding='same')

        conv3 = tf.layers.conv2d(subsample1, n * 2, 3,
                                 padding="same", activation=self.leaky_relu)
        conv3 = tf.layers.conv2d(
            conv3, n * 3, 3, padding="same", activation=self.leaky_relu)

        subsample2 = tf.layers.conv2d(
            conv3, n * 3, 3, strides=2, padding='same')

        conv4 = tf.layers.conv2d(subsample2, n * 3, 3,
                                 padding="same", activation=self.leaky_relu)
        conv4 = tf.layers.conv2d(
            conv4, n * 3, 3, padding="same", activation=self.leaky_relu)

        h = self.fully_connected(conv4, z_dim)

        return h

    def discriminator(self, images, z_dim, channel_dim, reuse=False):
        """
        Create the discriminator network: The autoencoder
        """
        with tf.variable_scope('discriminator', reuse=reuse):
            x = self.encoder(images, 64, z_dim, channel_dim)
            x = self.decoder(x, 64, 64 // 4, channel_dim)

            return x

    def generator(self, z, channel_dim, is_train=True):
        """
        Create the generator network: Only the encoder part
        """
        reuse = not is_train
        with tf.variable_scope('generator', reuse=reuse):
            x = self.decoder(z, 64, 64 // 4, channel_dim)

            return x

    def model_loss(self, input_real, input_z, channel_dim, z_dim, k_t):
        """
        Get the loss for the discriminator and generator
        """
        g_model_fake = self.generator(input_z, channel_dim, is_train=True)
        d_model_real = self.discriminator(input_real, z_dim, channel_dim)
        d_model_fake = self.discriminator(
            g_model_fake, z_dim, channel_dim, reuse=True)

        # l1 loss
        d_real = tf.reduce_mean(tf.abs(input_real - d_model_real))
        d_fake = tf.reduce_mean(tf.abs(g_model_fake - d_model_fake))

        d_loss = d_real - k_t * d_fake
        g_loss = d_fake

        return d_loss, g_loss, d_real, d_fake

    def model_opt(self, d_loss, g_loss, learning_rate, beta1, beta2=0.999):
        """
        Get optimization operations
        """
        # Get variables
        g_vars = tf.get_collection(
            tf.GraphKeys.GLOBAL_VARIABLES, "generator")
        d_vars = tf.get_collection(
            tf.GraphKeys.GLOBAL_VARIABLES, "discriminator")

        # Optimize
        d_train_opt = tf.train.AdamOptimizer(
            learning_rate, beta1=beta1, beta2=beta2).minimize(d_loss, var_list=d_vars)
        g_train_opt = tf.train.AdamOptimizer(
            learning_rate, beta1=beta1, beta2=beta2).minimize(g_loss, var_list=g_vars)

        return d_train_opt, g_train_opt
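
Before committing to a long training run, a small shape smoke test (my own sketch, not part of the reference code) confirms that the graph wires up correctly: the generator output and the discriminator reconstruction must both be 64×64×3, and both losses must be scalars:

import tensorflow as tf
from model import BEGAN

model = BEGAN()
with tf.Graph().as_default():
    inputs_real, inputs_z, lrate, k_t = model.model_inputs(64, 64, 3, 64)
    d_loss, g_loss, d_real, d_fake = model.model_loss(inputs_real, inputs_z, 3, 64, k_t)
    print(inputs_real.get_shape())                 # (?, 64, 64, 3)
    print(d_loss.get_shape(), g_loss.get_shape())  # () ()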

5. Main script: main.py

定义完上述两个文件之后,需要在主文件中定义训练过程,并对BEGAN的训练结果进行绘图,先直接给出BEGAN的main.py的代码:

from model import BEGAN
import tensorflow as tf
from glob import glob
import numpy as np
import utils
import math
import os


def train(model, epoch_count, batch_size, z_dim, start_learning_rate, beta1, beta2, get_batches, data_shape, image_mode):

    input_real, input_z, lrate, k_t = model.model_inputs(*(data_shape[1:]), z_dim)

    d_loss, g_loss, d_real, d_fake = model.model_loss(
        input_real, input_z, data_shape[3], z_dim, k_t)

    d_opt, g_opt = model.model_opt(d_loss, g_loss, lrate, beta1, beta2)

    losses = []
    iter = 0

    epoch_drop = 3

    lam = 1e-3
    gamma = 0.5
    k_curr = 0.0

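    # fixed noise vectors, reused throughout training to visualize progress on the same inputs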
    test_z = np.random.uniform(-1, 1, size=(16, z_dim))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        for epoch_i in range(epoch_count):

            learning_rate = start_learning_rate * \
                math.pow(0.2, math.floor((epoch_i + 1) / epoch_drop))
            for batch_images in get_batches(batch_size):
                iter += 1
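                # rescale images from [-0.5, 0.5] (see Dataset.get_batches) to [-1, 1]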
                batch_images *= 2

                batch_z = np.random.uniform(-1, 1, size=(batch_size, z_dim))

                _, d_real_curr = sess.run([d_opt, d_real], feed_dict={
                                          input_z: batch_z, input_real: batch_images, lrate: learning_rate, k_t: k_curr})

                _, d_fake_curr = sess.run([g_opt, d_fake], feed_dict={
                                          input_z: batch_z, input_real: batch_images, lrate: learning_rate, k_t: k_curr})

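                # BEGAN equilibrium control: nudge k_t so that gamma * L(x) tracks L(G(z))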
                k_curr = k_curr + lam * (gamma * d_real_curr - d_fake_curr)

                # save convergence measure
                if iter % 100 == 0:
                    measure = d_real_curr + \
                        np.abs(gamma * d_real_curr - d_fake_curr)
                    losses.append(measure)

                    print("Epoch {}/{}, batch {}...".format(epoch_i + 1, epoch_count, iter),
                          'Convergence measure: {:.4}'.format(measure))

                # save test and batch images
                if iter % 700 == 0:
                    utils.show_generator_output(
                        sess, model.generator, input_z, batch_z, data_shape[3], image_mode, 'batch-' + str(iter))

                    utils.show_generator_output(
                        sess, model.generator, input_z, test_z, data_shape[3], image_mode, 'test-' + str(iter))

        print('Training steps: ', iter)

        losses = np.array(losses)

        utils.save_plot([losses, utils.smooth(losses)],
                         'convergence_measure.png')


if __name__ == '__main__':
    batch_size = 16
    z_dim = 64  
    learning_rate = 0.0001
    beta1 = 0.5
    beta2 = 0.999
    epochs = 20

    data_dir = './data/'

    model = BEGAN()

    celeba_dataset = utils.Dataset(glob(os.path.join(data_dir, 'img_align_celeba_png/*.png')))

    with tf.Graph().as_default():
        train(model,
              epochs,
              batch_size,
              z_dim,
              learning_rate,
              beta1,
              beta2,
              celeba_dataset.get_batches,
              celeba_dataset.shape,
              celeba_dataset.image_mode)

Run main.py directly to start the experiment.

IV. Experimental Results

The experiment is set to 20 epochs with a batch size of 16 images, so each epoch contains roughly 12,600 batches (202,599 / 16 ≈ 12,662). Every 700 batches the generator produces one grid of faces from freshly sampled noise and another grid from the fixed noise vectors defined before training, i.e. two result grids per checkpoint. Once the 20 epochs finish, the convergence curve is plotted automatically.

Faces generated from randomly sampled noise over the course of training (image grids omitted):

The grids above come from fresh random noise at each checkpoint; for the fixed noise vectors, the progression looks like this (image grids omitted):

I had not finished training at that point, but after roughly 8,000 batches the generated faces already look quite good. As the image quality improves, however, noise artifacts become increasingly visible.


Second update

In the end I trained for 253,240 batches in total, which took about two days on a GPU. The final training curve (y-axis: the convergence measure; x-axis: training iteration; blue: raw curve; orange: smoothed curve) is shown below (plot omitted):

The generated faces (top row: faces from random noise; bottom row: faces from the fixed noise; taken at roughly batches 100000, 150000, 200000, and 250000) are shown below (images omitted):

V. Analysis

1. BEGAN can generate many different faces, and the experiment works well.

2. The file structure is described in section III.1.

3. The original code includes a Python downloader for the dataset. Since I downloaded the data myself, I removed that part from utils.py; it is reproduced below (with small modifications of my own). If you are interested, running it will download the data into './data/':

import os
import hashlib
from urllib.request import urlretrieve
import zipfile
import shutil
from tqdm import tqdm


def download_extract(database_name, data_path):
    """
    Download and extract database
    :param database_name: Database name
    """
    url = 'https://s3-us-west-1.amazonaws.com/udacity-dlnfd/datasets/celeba.zip'
    hash_code = '00d2c5bc6d35e252742224ab0c1e8fcb'
    extract_path = os.path.join(data_path, 'img_align_celeba')
    save_path = os.path.join(data_path, 'celeba.zip')
    extract_fn = _unzip

    if os.path.exists(extract_path):
        print('Found {} Data'.format(database_name))
        return

    if not os.path.exists(data_path):
        os.makedirs(data_path)

    if not os.path.exists(save_path):
        with DLProgress(unit='B', unit_scale=True, miniters=1, desc='Downloading {}'.format(database_name)) as pbar:
            urlretrieve(
                url,
                save_path,
                pbar.hook)

    assert hashlib.md5(open(save_path, 'rb').read()).hexdigest() == hash_code, \
        '{} file is corrupted.  Remove the file and try again.'.format(
            save_path)

    os.makedirs(extract_path)
    try:
        extract_fn(save_path, extract_path, database_name, data_path)
    except Exception as err:
        # Remove extraction folder if there is an error
        shutil.rmtree(extract_path)
        raise err

    # Remove compressed data
    os.remove(save_path)


def _unzip(save_path, _, database_name, data_path):
    """
    Unzip wrapper with the same interface as _ungzip
    :param save_path: The path of the gzip files
    :param database_name: Name of database
    :param data_path: Path to extract to
    :param _: HACK - Used to have to same interface as _ungzip
    """
    print('Extracting {}...'.format(database_name))
    with zipfile.ZipFile(save_path) as zf:
        zf.extractall(data_path)


class DLProgress(tqdm):
    """
    Handle Progress Bar while Downloading
    """
    last_block = 0

    def hook(self, block_num=1, block_size=1, total_size=None):
        """
        A hook function that will be called once on establishment of the network connection and
        once after each block read thereafter.
        :param block_num: A count of blocks transferred so far
        :param block_size: Block size in bytes
        :param total_size: The total size of the file. This may be -1 on older FTP servers which do not return
                            a file size in response to a retrieval request.
        """
        self.total = total_size
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num


if __name__ == '__main__':
    data_dir='./data/'
    download_extract('celeba', data_dir)

Original post: blog.csdn.net/z704630835/article/details/83790098