Using StyleGAN with Ease (4): Optimizing the Training of the StyleGAN Inverse Network

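In the previous post we showed how to build an inverse network for StyleGAN, train it into a reasonably good model, and use that model to extract latent codes from target images. For details, see: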

https://blog.csdn.net/weixin_41943311/article/details/102370766

During training we used a few tricks that raised the accuracy from 0.8933 to a final 0.9501. The step-by-step optimization is summarized below:

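(1) In the first run, we used StyleGAN to generate 1200 dlatents and their corresponding face images and saved them to files. At training time we loaded the data from those files and trained the lotus model. Because the lotus model outputs an 18x512 tensor rather than a class label, we used mean_squared_error as the loss function and adam as the optimizer, with epochs = 503 and batch_size = 6. The resulting accuracy was 0.8933.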

model.compile(optimizer="adam", loss="mean_squared_error", metrics=["mae", "acc"])
model.fit(X_train, Y_train, epochs=503, batch_size=6)
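
To make the two reported numbers concrete, here is a minimal NumPy sketch of what the loss and the "acc" metric measure on an 18x512 output. It rests on an assumption worth checking against your installed version: as far as I can tell, Keras 2.x resolves the string "acc" to categorical accuracy when the loss is not a cross-entropy and the last dimension is not 1, which means it compares the argmax over the 512 values of each row. The helper names mse_loss and argmax_accuracy and the random test data are made up for illustration.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error averaged over every element of the tensors."""
    return float(np.mean((y_true - y_pred) ** 2))

def argmax_accuracy(y_true, y_pred):
    """Fraction of (sample, layer) rows whose argmax over the last axis agrees.

    Assumption: this mirrors what Keras reports as "acc" for this model.
    """
    same = np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)
    return float(np.mean(same))

# Hypothetical data: 4 samples shaped like StyleGAN dlatents (18, 512)
rng = np.random.RandomState(0)
y_true = rng.randn(4, 18, 512)
y_pred = y_true + 0.1 * rng.randn(4, 18, 512)  # predictions with small noise

print('mse:', mse_loss(y_true, y_pred))
print('acc:', argmax_accuracy(y_true, y_pred))
```

If that assumption holds, "acc" is only a rough proxy for how close the predicted dlatents are, so it is worth reading it together with the loss and mae values.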

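(2) In the second run, we increased batch_size from 6 to 8. We used StyleGAN to generate 1280 dlatents and their corresponding face images and saved them to files. The loss function was again mean_squared_error and the optimizer adam, with epochs = 599 and batch_size = 8. The resulting accuracy was 0.9201.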

model.compile(optimizer="adam", loss="mean_squared_error", metrics=["mae", "acc"])
model.fit(X_train, Y_train, epochs=599, batch_size=8)
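
When comparing the first two runs, it can help to count gradient updates rather than epochs, since the dataset size and batch size both changed. The sketch below is plain arithmetic on the numbers quoted above; the helper name updates_per_run is hypothetical, and it assumes every epoch passes over the whole dataset once.

```python
import math

def updates_per_run(num_samples, batch_size, epochs):
    """Return (steps per epoch, total gradient updates) for one training run."""
    steps = math.ceil(num_samples / batch_size)
    return steps, steps * epochs

run1 = updates_per_run(1200, 6, 503)  # first run: 1200 samples, batch_size = 6
run2 = updates_per_run(1280, 8, 599)  # second run: 1280 samples, batch_size = 8
print(run1, run2)
```

Under these assumptions both runs perform a similar number of updates (on the order of 100,000), so the accuracy difference is not simply a matter of one run training for far longer.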

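My HP laptop has a fairly low-end NVIDIA GeForce GTX 1060 with only 6 GB of video memory, so once batch_size is raised to 9 or more the system reports an out-of-memory error. batch_size = 8 is therefore the largest value I could use.

Some people recommend a GTX 1080 Ti or RTX 2080 Ti with 11 GB of video memory, with which batch_size can reportedly go as high as 48. (The usable batch size does not scale linearly with memory, because a large share of video memory is taken up by the network itself.)

A larger batch_size speeds up training and convergence. But as Yann LeCun put it: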

Training with large minibatches is bad for your health. More importantly, it's bad for your test error. Friends don't let friends use minibatches larger than 32. Let's face it: the only people have switched to minibatch sizes larger than one since 2012 is because GPUs are inefficient for batch sizes smaller than 32. That's a terrible reason. It just means our hardware sucks.

His advice is to keep batch_size at or below 32. A smaller batch_size also tends to help the algorithm escape local optima (saddle points).

(3) In the third run, we introduced callbacks. We still trained on the 1280 dlatents and corresponding face images generated by StyleGAN.

# Callbacks for model.fit()
# Stop training once monitor has shown no improvement for patience consecutive epochs
es = keras.callbacks.EarlyStopping(
    monitor='loss',
    patience=60,
    verbose=0,
    mode='auto'
)

# save_best_only=True keeps only the model that performs best on the training set (according to monitor); period is the number of epochs between checkpoints
mc = keras.callbacks.ModelCheckpoint(
    'resnet50_model_face.h5',
    monitor='acc',
    verbose=0,
    save_best_only=True,
    save_weights_only=False,
    mode='auto',
    period=2
)

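# When patience epochs pass without the monitored value improving, the learning rate is reduced
# factor: the multiplier applied at each reduction, i.e. learning_rate = lr * factor
# After a reduction, normal monitoring resumes only after cooldown epochs
# min_delta: threshold used to decide whether the monitored value has reached a plateau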
rp = keras.callbacks.ReduceLROnPlateau(
    monitor='loss',
    factor=0.30,
    patience=20,
    verbose=0,
    mode='auto',
    min_delta=0.0001,
    cooldown=0,
    min_lr=0
)

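# Log the run for TensorBoard; this uses a fair amount of memory
tb = keras.callbacks.TensorBoard(log_dir='./logs',  # log directory
                 histogram_freq=1,  # how often (in epochs) to compute histograms; 0 disables them
                 batch_size=32,     # batch size of the data used to compute histograms
                 write_graph=True,  # whether to store the network graph
                 write_grads=False, # whether to visualize gradient histograms
                 write_images=False,# whether to visualize the weights as images
                 embeddings_freq=0,
                 embeddings_layer_names=None,
                 embeddings_metadata=None)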

# callbacks = [es, mc, rp, tb]
callbacks = [mc, rp]

model.compile(optimizer="adam", loss="mean_squared_error", metrics=["ce", "mae", "acc"])
model.fit(X_train, Y_train, epochs=680, batch_size=8, shuffle=True, callbacks=callbacks)

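Several callbacks can be used together. Choosing the callbacks and their parameters turned out to be interesting; here are a few rough lessons for reference:

(3.1) EarlyStopping does little to improve accuracy; its main purpose is to end training sooner. When a "saddle point" plateau lasts a long time, a short patience (for example, patience = 20) often ends training before acc or loss has reached a satisfactory level.

(3.2) TensorBoard consumes quite a lot of memory; consider leaving it out when memory is tight.

(3.3) ModelCheckpoint is very useful: it saves the model that performs best on the training set. Saving a model takes time and therefore slows training down. Fortunately, the first save is the slow one and later saves are much faster. I suggest setting period = 2.

(3.4) ReduceLROnPlateau raises accuracy markedly. With a suitable factor it keeps the model from oscillating back and forth around a local optimum and speeds up further convergence. Some articles suggest that factor = 0.10 works especially well; in this experiment on the StyleGAN inverse network, factor = 0.30 worked better.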
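
The behaviour described in (3.4) can be sketched without Keras. The following is a simplified pure-Python re-implementation of the plateau logic as I understand it from Keras 2.x, not the library code itself. The function name simulate_reduce_lr, the starting learning rate of 0.001 (the usual Adam default) and the toy loss curve are assumptions for illustration.

```python
def simulate_reduce_lr(losses, lr=0.001, factor=0.30, patience=20,
                       min_delta=1e-4, cooldown=0, min_lr=0.0):
    """Return the learning rate in effect after each epoch of `losses`."""
    best = float('inf')   # best (lowest) loss seen so far
    wait = 0              # epochs since the last improvement
    cooldown_left = 0     # epochs remaining in the cooldown phase
    schedule = []
    for loss in losses:
        if cooldown_left > 0:
            cooldown_left -= 1
            wait = 0
        if loss < best - min_delta:
            # An improvement larger than min_delta resets the counter
            best = loss
            wait = 0
        elif cooldown_left == 0:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # shrink, but never below min_lr
                cooldown_left = cooldown
                wait = 0
        schedule.append(lr)
    return schedule

# Toy loss curve: one improvement, then a long plateau
losses = [1.0, 0.5] + [0.5] * 6
schedule = simulate_reduce_lr(losses, patience=3)
print(schedule)
```

With factor = 0.30 each reduction keeps 30% of the previous learning rate, so the step size shrinks more gently than with factor = 0.10; this may be part of why the larger factor behaved better here, although the sketch does not prove that.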

With batch_size = 8, EarlyStopping (patience = 60), and ReduceLROnPlateau (factor = 0.30, patience = 12), training reached accuracy = 0.9266.
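
The effect of the EarlyStopping patience discussed in (3.1) can be illustrated with a small simulation. This is a simplified sketch of the stopping rule with monitor='loss', assuming an improvement means a strictly lower loss; the function name early_stopping_epoch and the synthetic loss curve are hypothetical.

```python
def early_stopping_epoch(losses, patience=60, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss + min_delta < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Synthetic curve: a 30-epoch plateau followed by a late improvement
losses = [1.0] + [1.0] * 30 + [0.5]
stop_short = early_stopping_epoch(losses, patience=20)
stop_long = early_stopping_epoch(losses, patience=60)
print(stop_short, stop_long)
```

In this toy curve the short patience ends training in the middle of the plateau, while the longer patience survives long enough to see the later drop in loss.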

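With batch_size = 8 and ReduceLROnPlateau (factor = 0.30, patience = 20), training finally reached accuracy = 0.9501. We also observed a fairly sharp jump in accuracy from about 0.84 to about 0.92, which is where training escaped a local "saddle point".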
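
For the ModelCheckpoint settings in (3.3), the following sketch shows which epochs would actually be written to disk when save_best_only=True is combined with a period. It is a simplified model of the Keras 2.x behaviour as I understand it (the monitored value is only examined on epochs where the period has elapsed); the function name checkpoint_epochs and the accuracy values are made up.

```python
def checkpoint_epochs(acc_history, period=2):
    """Return the 0-based epochs at which a 'best so far' model would be saved."""
    best = float('-inf')
    since_last = 0   # epochs since the metric was last examined
    saved = []
    for epoch, acc in enumerate(acc_history):
        since_last += 1
        if since_last >= period:
            since_last = 0
            if acc > best:       # monitor='acc' means higher is better
                best = acc
                saved.append(epoch)
    return saved

acc_history = [0.80, 0.84, 0.95, 0.92, 0.93, 0.94]  # hypothetical values
saved_p2 = checkpoint_epochs(acc_history, period=2)
saved_p1 = checkpoint_epochs(acc_history, period=1)
print(saved_p2, saved_p1)
```

The trade-off is visible in the toy data: period = 2 halves the number of checks and saves, but a peak that falls on a skipped epoch is never stored.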

Below is the complete training source code for the lotus network model (with comments):

# -*- coding: UTF-8 -*-

import os,sys
import numpy as np
import scipy
from scipy import ndimage
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image

import keras
import pickle
import PIL.Image
import random

import dnnlib
import dnnlib.tflib as tflib
import config
import glob

# Pre-trained model used by StyleGAN to generate face images in the forward direction
#Model = './cache/generator_yellow.pkl'
Model = './cache/karras2019stylegan-ffhq-1024x1024.pkl'
#Model = './cache/2019-03-08-stylegan-animefaces-network-02051-021980.pkl'

synthesis_kwargs = dict(output_transform=dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True), minibatch_size=8)
_Gs_cache = dict()

# Prefix for the image file names
PREFIX = 'Person'
#PREFIX = 'Animation'

# Number of images in the training and test sets
NUM_FIGURES = 1280

# Paths of the training and test sets; the train and test folders are created here
train_path_face = './dataset/train/face/'
train_path_face_dlatents = './dataset/train/face/dlatents/'
test_path_face = './dataset/test/face/'
test_path_face_dlatents = './dataset/test/face/dlatents/'

# Load the pre-trained StyleGAN network model
def load_Gs(model):
    if model not in _Gs_cache:
        # Use the path passed in as an argument rather than the global Model
        model_file = glob.glob(model)
        if len(model_file) != 1:
            raise Exception('Failed to find the model')

        # Close the file once the pickle has been read
        with open(model_file[0], "rb") as f:
            _G, _D, Gs = pickle.load(f)
        # _G = Instantaneous snapshot of the generator. Mainly useful for resuming a previous training run.
        # _D = Instantaneous snapshot of the discriminator. Mainly useful for resuming a previous training run.
        # Gs = Long-term average of the generator. Yields higher-quality results than the instantaneous snapshot.

        # Print network details.
        Gs.print_layers()

        _Gs_cache[model] = Gs
    return _Gs_cache[model]


# Generate images with StyleGAN and save the dlatents and images to files
def generate_dlatents_and_figures(Gs, w, h, num):

    # Each generated latent vector has 512 elements
    for i in range(num):
        # Generate latents.
        SEED = i + 200
        rnd = np.random.RandomState(SEED)
        latents = rnd.randn(1, Gs.input_shape[1])

        # Following the StyleGAN architecture, map z to w
        dlatents = Gs.components.mapping.run(latents, None)

        # Save the dlatents to a file
        save_name = PREFIX + '_' + str(SEED) + '.npy'

        os.makedirs(train_path_face_dlatents, exist_ok=True)
        save_path = os.path.join(train_path_face_dlatents, save_name)
        np.save(save_path, dlatents)

        os.makedirs(test_path_face_dlatents, exist_ok=True)
        save_path = os.path.join(test_path_face_dlatents, save_name)
        np.save(save_path, dlatents)

        # Generate the image from w
        images = Gs.components.synthesis.run(dlatents, randomize_noise=False, **synthesis_kwargs)

        # Save the image to a file
        save_name = PREFIX + '_' + str(SEED) + '.png'

        os.makedirs(train_path_face, exist_ok=True)
        save_path = os.path.join(train_path_face, save_name)
        PIL.Image.fromarray(images[0], 'RGB').save(save_path)

        os.makedirs(test_path_face, exist_ok=True)
        save_path = os.path.join(test_path_face, save_name)
        PIL.Image.fromarray(images[0], 'RGB').save(save_path)


# Prepare the training set for ResNet50: StyleGAN-generated images are the inputs and the
# corresponding dlatents are the targets, so the trained ResNet50 acts as an inverse network for StyleGAN
def DataSet():

    # os.listdir(path) lists every entry under path; keep image files only,
    # otherwise the dlatents subfolder would be counted as a sample
    os.makedirs(train_path_face, exist_ok=True)
    imglist_train_face = [f for f in os.listdir(train_path_face) if f.endswith(('png', 'jpg'))]
    os.makedirs(train_path_face_dlatents, exist_ok=True)

    # Read the names of all image files under /test/face
    os.makedirs(test_path_face, exist_ok=True)
    imglist_test_face = [f for f in os.listdir(test_path_face) if f.endswith(('png', 'jpg'))]
    os.makedirs(test_path_face_dlatents, exist_ok=True)

    # Define two numpy arrays, X_train and Y_train

    # X_train holds the training images; each image is converted to a numpy array
    # X_train has shape (N, 256, 256, 3)
    # The default ResNet50 input size is (224, 224); here we use (256, 256)
    # 3 is the number of image channels (RGB)

    # Y_train holds the dlatents that correspond to each training image
    # Y_train has shape (N, 18, 512), matching StyleGAN's dlatents
    X_train = np.empty((len(imglist_train_face), 256, 256, 3))
    Y_train = np.empty((len(imglist_train_face), 18, 512))

    # count is a counter that is incremented for every image added
    count = 0
    # Iterate over all images under /train/face, i.e. the whole training set
    for img_name in imglist_train_face:
        # Build the path of the image
        if img_name.endswith('png') or img_name.endswith('jpg'):
            img_path = os.path.join(train_path_face, img_name)
            # Read the image with image.load_img() and resize it to the target size
            # image is a module in tensorflow.keras.preprocessing
            img = image.load_img(img_path, target_size=(256, 256))
            # Convert the image to a numpy array and divide by 255 to normalize
            # After conversion, img has shape (256, 256, 3)
            img = image.img_to_array(img) / 255.0

            # Store the processed image in X_train
            X_train[count] = img

            # Store the corresponding label in Y_train
            # Here we load the dlatents that StyleGAN used to generate this image
            only_name = os.path.splitext(img_name)[0]
            img_name = only_name + '.npy'
            img_path = os.path.join(train_path_face_dlatents, img_name)
            Y_train[count] = np.load(img_path)
            count += 1

    # Prepare the test set
    # Note: the loading loop below is disabled, so X_test and Y_test stay uninitialized placeholders
    X_test = np.empty((len(imglist_test_face), 256, 256, 3))
    Y_test = np.empty((len(imglist_test_face), 18, 512))
    """
    count = 0
    for img_name in imglist_test_face:
        if img_name.endswith('png') or img_name.endswith('jpg'):
            img_path = os.path.join(test_path_face, img_name)
            img = image.load_img(img_path, target_size=(256, 256))
            img = image.img_to_array(img) / 255.0
            X_test[count] = img

            only_name = os.path.splitext(img_name)[0]
            img_name = only_name + '.npy'
            img_path = os.path.join(test_path_face_dlatents, img_name)
            Y_test[count] = np.load(img_path)
            count += 1
    """
    # Shuffle the training data
    index = [i for i in range(len(X_train))]
    random.shuffle(index)
    X_train = X_train[index]
    Y_train = Y_train[index]

    # Shuffle the test data
    index = [i for i in range(len(X_test))]
    random.shuffle(index)
    X_test = X_test[index]
    Y_test = Y_test[index]

    return X_train, Y_train, X_test, Y_test

# Define lotus, the inverse network model for StyleGAN
# All helper functions below are built from native Keras functions
def lotus_body(x):

    # input: (none, 256, 256, 3), output: (none, 8, 8, 2048)
    # include_top=False and weights=None are required so that the input can be set to 256x256x3
    # ResNet outputs C5, whose shape is (none, 8, 8, 2048)
    resnet = keras.applications.resnet50.ResNet50(include_top=False, weights=None, input_tensor=x, input_shape=(256,256,3))
    y = resnet.output
    print('ResNet50 C5 shape : ', y.shape)

    # output: (none, 8, 8, 144)
    # Output feature maps = filters = 144; height and width of the 2D convolution output: 8 - 1 + 1 = 8
    y = keras.layers.convolutional.Conv2D(144, (1, 1), padding='same', activation='relu')(y)

    # output: (none, 96, 96), since 8 * 8 * 144 = 96 * 96
    y = keras.layers.Reshape((96, 96))(y)

    for i in range(3):
        # output: (none, 96, 96)
        # Output feature maps = filters = 96; length of the 1D convolution output: 96 - 1 + 1 = 96
        y = keras.layers.local.LocallyConnected1D(96, 1)(y)

        # output: (none, 96, 96); (2, 1) moves the input's second dimension to the output's first dimension and the input's first dimension to the second
        y = keras.layers.Permute((2, 1))(y)

    # output:(none, 96, 96)
    y = keras.layers.local.LocallyConnected1D(96, 1)(y)

    # output: (none, 18, 512)
    y = keras.layers.Reshape((18, 512))(y)
    print('lotus body output shape : ', y.shape)

    return y

# Callbacks for model.fit()
# Stop training once monitor has shown no improvement for patience consecutive epochs
es = keras.callbacks.EarlyStopping(
    monitor='loss',
    patience=60,
    verbose=0,
    mode='auto'
)

# save_best_only=True keeps only the model that performs best on the training set (according to monitor); period is the number of epochs between checkpoints
mc = keras.callbacks.ModelCheckpoint(
    'resnet50_model_face.h5',
    monitor='acc',
    verbose=0,
    save_best_only=True,
    save_weights_only=False,
    mode='auto',
    period=2
)

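# When patience epochs pass without the monitored value improving, the learning rate is reduced
# factor: the multiplier applied at each reduction, i.e. learning_rate = lr * factor
# After a reduction, normal monitoring resumes only after cooldown epochs
# min_delta: threshold used to decide whether the monitored value has reached a plateau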
rp = keras.callbacks.ReduceLROnPlateau(
    monitor='loss',
    factor=0.30,
    patience=20,
    verbose=0,
    mode='auto',
    min_delta=0.0001,
    cooldown=0,
    min_lr=0
)

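# Log the run for TensorBoard; this uses a fair amount of memory
tb = keras.callbacks.TensorBoard(log_dir='./logs',  # log directory
                 histogram_freq=1,  # how often (in epochs) to compute histograms; 0 disables them
                 batch_size=32,     # batch size of the data used to compute histograms
                 write_graph=True,  # whether to store the network graph
                 write_grads=False, # whether to visualize gradient histograms
                 write_images=False,# whether to visualize the weights as images
                 embeddings_freq=0,
                 embeddings_layer_names=None,
                 embeddings_metadata=None)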

# callbacks = [es, mc, rp, tb]
callbacks = [mc, rp]

# Main program
def main():
    tflib.init_tf()

    # Run this the first time to generate the training data; comment it out afterwards so the main program only trains
    # generate_dlatents_and_figures(load_Gs(Model), w=1024, h=1024, num=NUM_FIGURES)

    X_train, Y_train, X_test, Y_test = DataSet()

    inputs = keras.Input(shape=(256, 256, 3))
    model = keras.Model(inputs, lotus_body(inputs))

    # Use mean_squared_error as the loss function
    model.compile(optimizer="adam", loss="mean_squared_error", metrics=["ce", "mae", "acc"])
    training = model.fit(X_train, Y_train, epochs=680, batch_size=8, shuffle=True, callbacks=callbacks)

    # Plot the training curves to see how training went
    plt.plot(training.history['acc'])
    plt.plot(training.history['loss'])
    plt.title('model acc and loss')
    plt.xlabel('epoch')
    plt.ylabel('acc / loss')
    plt.legend(['acc', 'loss'], loc='upper left')
    plt.show()

    # model.evaluate(X_test, Y_test, batch_size=32)

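    # Save the trained model to a file
    # Note: this is the same file name that ModelCheckpoint uses, so the best checkpoint is overwritten by the final model
    model.save('resnet50_model_face.h5')
    print('Training completed!')

    # Use the inverse network lotus to generate a StyleGAN comparison sample
    # model = keras.models.load_model('resnet50_model_face.h5')

    # Raw string so that the backslashes in the Windows path are not treated as escapes
    img_path = r"f:\AI\stylegan-master\dataset\train\face\Person_397.png"
    img = image.load_img(img_path, target_size=(256, 256))
    plt.imshow(img)
    img = image.img_to_array(img) / 255.0
    img = np.expand_dims(img, axis=0)  # add the batch axis as a fourth dimension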

    predict_dlatents = model.predict(img)
    print('predict_dlatents.shape: ', predict_dlatents.shape)

    # Generate the image from w
    Gs = load_Gs(Model)
    resnet50_images = Gs.components.synthesis.run(predict_dlatents, randomize_noise=False, **synthesis_kwargs)

    # Create a blank canvas
    num = len(resnet50_images)
    canvas = PIL.Image.new('RGB', (1024, 1024 * num ), 'white')

    # Paste the images onto the canvas
    for row, resnet50_image in enumerate(list(resnet50_images)):
        canvas.paste(PIL.Image.fromarray(resnet50_image, 'RGB'), (0, 1024 * row))

    canvas.save(os.path.join(config.family_dir, 'resnet50_001.png'))
    print('resnet50_001.png generated.')


if __name__ == "__main__":
    main()
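
The reshapes inside lotus_body above work only because the element counts match: 8 x 8 x 144 = 96 x 96 = 18 x 512 = 9216. The NumPy sketch below checks that arithmetic on a dummy array. It reproduces only the shape changes (the convolution and locally connected layers are left out), and the variable names are illustrative.

```python
import numpy as np

# Stand-in for the Conv2D output: (batch, 8, 8, 144)
c5_projected = np.zeros((1, 8, 8, 144))

# Reshape((96, 96)): 8 * 8 * 144 elements rearranged into a 96 x 96 grid
as_square = c5_projected.reshape(1, 96, 96)

# Permute((2, 1)): swap the two non-batch axes
swapped = np.transpose(as_square, (0, 2, 1))

# Reshape((18, 512)): the same 9216 elements laid out like StyleGAN dlatents
dlatents_like = swapped.reshape(1, 18, 512)

print(as_square.shape, swapped.shape, dlatents_like.shape)
```

If the number of Conv2D filters or the input image size is changed, these products no longer agree and the Reshape layers would need to be adjusted accordingly.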

(End)


amao93
