Source-code walkthrough: image segmentation with the FCN network

FCN Network Introduction

The FCN network comes from the paper Fully Convolutional Networks for Semantic Segmentation. Traditional CNNs end with fully connected layers and perform image-level classification; semantic segmentation, by contrast, classifies every pixel in the image. Fully connected layers can capture global semantic information, but they destroy the local spatial information of the pixels. The FCN network was the first to replace the fully connected layers with convolutional layers, achieving end-to-end pixel-level prediction, as shown below.
(Figure: the FCN architecture, with the fully connected layers of a classification CNN replaced by convolutions to produce a pixel-level prediction map.)
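
To make the "convolutionalization" concrete, here is a minimal sketch (mine, not from the paper or the repo below) showing that a fully connected layer over a fixed-size feature map is equivalent to a convolution whose kernel covers the whole map; on larger inputs the convolutional form produces a spatial grid of predictions instead of a single vector:

import tensorflow as tf

feat = tf.placeholder(tf.float32, [None, 7, 7, 512])    # e.g. the final VGG feature map

# Classification head: flatten + fully connected, one 4096-vector per image.
w_fc = tf.get_variable("w_fc", [7 * 7 * 512, 4096])
fc = tf.matmul(tf.reshape(feat, [-1, 7 * 7 * 512]), w_fc)       # shape [N, 4096]

# FCN head: the same weights reshaped into a 7x7 convolution kernel. On a 7x7
# input the output is [N, 1, 1, 4096] and equals fc; on a larger input it
# becomes a dense grid of predictions, which is what segmentation needs.
w_conv = tf.reshape(w_fc, [7, 7, 512, 4096])
conv = tf.nn.conv2d(feat, w_conv, strides=[1, 1, 1, 1], padding="VALID")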

FCN network source code

For the FCN network structure itself, please refer to other blog posts; this article focuses on the TensorFlow implementation. The code comes from (shekkizh/FCN.tensorflow); its structure is relatively clear and well suited to an entry-level analysis. The code is divided into three parts: inference (builds the network structure and returns the prediction), train (computes gradients and updates the parameters), and main (handles the data and computes the loss).

inference section

def vgg_net(weights, image):
    """
    Use vgg_net as the base network, initialize it with the parameters from the
    pre-trained model, and return the results of every layer.
    :param weights:     pre-trained model parameters, used as the initial weights and biases
    :param image:       input to the network
    :return:            the computed results of every layer of the network
    """
    # structure of the VGG-19 network; the trailing fully connected layers are discarded
    layers = (
        'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',

        'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',

        'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3',
        'relu3_3', 'conv3_4', 'relu3_4', 'pool3',

        'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3',
        'relu4_3', 'conv4_4', 'relu4_4', 'pool4',

        'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3',
        'relu5_3', 'conv5_4', 'relu5_4'
    )

    net = {}
    current = image
    for i, name in enumerate(layers):
        kind = name[:4]     # the layer type is given by the first four characters of the name
        if kind == 'conv':
            kernels, bias = weights[i][0][0][0][0]      # extract the weight and bias parameters from the pre-trained model

            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            # the mat file stores kernels as [W, H, in, out]; TensorFlow expects [H, W, in, out], hence the transpose
            kernels = utils.get_variable(np.transpose(kernels, (1, 0, 2, 3)), name=name + "_w")
            bias = utils.get_variable(bias.reshape(-1), name=name + "_b")

            '''Create a variable initialized with the values in weights.
            If you do not want to use the values from the pre-trained model, you can initialize
            the weights from a truncated normal distribution and the biases with a constant.
            def get_variable(weights, name):
                init = tf.constant_initializer(weights, dtype=tf.float32)
                var = tf.get_variable(name=name, initializer=init,  shape=weights.shape)
                return var
            '''

            current = utils.conv2d_basic(current, kernels, bias)    # convolution with stride=1
        elif kind == 'relu':
            current = tf.nn.relu(current, name=name)
            if FLAGS.debug:
                # add current to TensorBoard to visualize the distribution of the
                # activations and how they change during training
                utils.add_activation_summary(current)
        elif kind == 'pool':
            current = utils.avg_pool_2x2(current)   # 2x2 average pooling
        net[name] = current     # store the output of this layer

    return net


def inference(image, keep_prob):
    """
    Compute the output of the FCN network.
    Semantic segmentation network definition
    :param image: input color image, values in 0-255, shape=[N, IMAGE_SIZE, IMAGE_SIZE, 3]
    :param keep_prob: dropout keep probability
    :return:
    """

    print("setting up vgg initialized conv layers ...")
    # read the imagenet-vgg-verydeep-19.mat file from FLAGS.model_dir; if it does not
    # exist, first download it from the MODEL_URL link
    # the function looks complicated, but it really just reads a mat file
    # (scipy.io.loadmat) and returns the result
    model_data = utils.get_model_data(FLAGS.model_dir, MODEL_URL)

    mean = model_data['normalization'][0][0][0]
    mean_pixel = np.mean(mean, axis=(0, 1))     # extract the image mean from the pre-trained model file

    # this indexing depends on the storage format of the pre-trained mat file; to understand
    # the format you can write a script and parse it step by step, but it may not be worth it
    # reference: https://zhuanlan.zhihu.com/p/28897952
    weights = np.squeeze(model_data['layers'])

    processed_image = utils.process_image(image, mean_pixel)    # image - mean_pixel, subtract the mean

    with tf.variable_scope("inference"):
        image_net = vgg_net(weights, processed_image)   # compute the outputs of all VGG layers
        conv_final_layer = image_net["conv5_3"]         # take the conv5_3 result; its resolution is 1/16 of the input image

        pool5 = utils.max_pool_2x2(conv_final_layer)    # max pool, the resolution drops again (to 1/32)

        # blocks like the one below first create the variables and then compute;
        # TensorFlow-Slim is recommended, it would make the code much more concise
        W6 = utils.weight_variable([7, 7, 512, 4096], name="W6")    # conv7*7-4096, relu, dropout
        b6 = utils.bias_variable([4096], name="b6")
        conv6 = utils.conv2d_basic(pool5, W6, b6)
        relu6 = tf.nn.relu(conv6, name="relu6")
        if FLAGS.debug:
            utils.add_activation_summary(relu6)
        relu_dropout6 = tf.nn.dropout(relu6, keep_prob=keep_prob)

        W7 = utils.weight_variable([1, 1, 4096, 4096], name="W7")   # conv1*1-4096, relu, dropout
        b7 = utils.bias_variable([4096], name="b7")
        conv7 = utils.conv2d_basic(relu_dropout6, W7, b7)
        relu7 = tf.nn.relu(conv7, name="relu7")
        if FLAGS.debug:
            utils.add_activation_summary(relu7)
        relu_dropout7 = tf.nn.dropout(relu7, keep_prob=keep_prob)

        W8 = utils.weight_variable([1, 1, 4096, NUM_OF_CLASSESS], name="W8")    # conv1*1-NUM_OF_CLASSESS, the per-class score map
        b8 = utils.bias_variable([NUM_OF_CLASSESS], name="b8")
        conv8 = utils.conv2d_basic(relu_dropout7, W8, b8)
        # annotation_pred1 = tf.argmax(conv8, dimension=3, name="prediction1")

        '''conv8 is already the FCN prediction, but at this point its resolution is 1/32 of the
        original image. To obtain more accurate boundary information, upsampling this result
        directly with linear interpolation does not work well, so transposed convolutions are
        used and fused with intermediate-layer results to raise the output resolution.
        (Interpolated upsampling plus a 3*3 convolution can also replace the transposed convolution.)'''

        # now to upscale to actual image size
        # fuse with the result of the VGG intermediate layer pool4; the resolution goes from 1/32 to 1/16
        deconv_shape1 = image_net["pool4"].get_shape()                  # shape of pool4
        W_t1 = utils.weight_variable([4, 4, deconv_shape1[3].value, NUM_OF_CLASSESS], name="W_t1")
        b_t1 = utils.bias_variable([deconv_shape1[3].value], name="b_t1")
        # k_size=4, stride=2, so the resolution doubles
        conv_t1 = utils.conv2d_transpose_strided(conv8, W_t1, b_t1, output_shape=tf.shape(image_net["pool4"]))
        fuse_1 = tf.add(conv_t1, image_net["pool4"], name="fuse_1")     # add the pool4 result

        # fuse with the result of the VGG intermediate layer pool3; the resolution goes from 1/16 to 1/8
        deconv_shape2 = image_net["pool3"].get_shape()
        W_t2 = utils.weight_variable([4, 4, deconv_shape2[3].value, deconv_shape1[3].value], name="W_t2")
        b_t2 = utils.bias_variable([deconv_shape2[3].value], name="b_t2")
        # k_size=4, stride=2, so the resolution doubles
        conv_t2 = utils.conv2d_transpose_strided(fuse_1, W_t2, b_t2, output_shape=tf.shape(image_net["pool3"]))
        fuse_2 = tf.add(conv_t2, image_net["pool3"], name="fuse_2")     # add the pool3 result

        # restore the original image size; the resolution goes from 1/8 to 1/1
        shape = tf.shape(image)
        deconv_shape3 = tf.stack([shape[0], shape[1], shape[2], NUM_OF_CLASSESS])
        W_t3 = utils.weight_variable([16, 16, NUM_OF_CLASSESS, deconv_shape2[3].value], name="W_t3")
        b_t3 = utils.bias_variable([NUM_OF_CLASSESS], name="b_t3")
        # k_size=16, stride=8, so the resolution grows 8x (in SAME mode the resolution change depends only on the stride)
        conv_t3 = utils.conv2d_transpose_strided(fuse_2, W_t3, b_t3, output_shape=deconv_shape3, stride=8)

        # conv_t3.shape=[N, H, W, NUM_OF_CLASSESS]
        # take the class with the largest score as the predicted label, annotation_pred.shape=[N, H, W]
        annotation_pred = tf.argmax(conv_t3, dimension=3, name="prediction")

    return tf.expand_dims(annotation_pred, dim=3), conv_t3

1. The FLAGS.debug==True blocks are TensorBoard related, mainly for visualization (variable distributions, how variables change during training, etc.); they can be ignored on a first reading of the code.
2. Transposed convolution is prone to the checkerboard effect. One way to avoid it is to make k_size an integer multiple of the stride; interpolated upsampling plus a convolution can also replace the transposed convolution (see the sketch after this list).
3. When fusing the intermediate layers, tf.add() adds the results directly; tf.concat() can also be used to concatenate them.
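
For item 2, a minimal sketch of the interpolation-plus-convolution substitute (upsample_2x is a hypothetical helper, not part of the FCN.tensorflow repo); item 3's concatenation variant is shown as a comment:

import tensorflow as tf

def upsample_2x(x, out_channels, name):
    # bilinear interpolation followed by a 3x3 convolution: a common substitute
    # for transposed convolution that avoids checkerboard artifacts
    new_size = tf.stack([2 * tf.shape(x)[1], 2 * tf.shape(x)[2]])
    x = tf.image.resize_bilinear(x, new_size)
    w = tf.get_variable(name + "_w", [3, 3, x.get_shape()[3].value, out_channels])
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")

# fusion by concatenation instead of addition: the channels are stacked, and a
# following convolution learns how to combine them
# fuse_1 = tf.concat([conv_t1, image_net["pool4"]], axis=3)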

train section

def train(loss_val, var_list):
    """
    Compute and apply the gradients. Optimizer.minimize() directly combines the
    two steps compute_gradients() and apply_gradients().
    :param loss_val: the loss value
    :param var_list: variables to update, usually all trainable variables obtained with tf.trainable_variables()
    :return: the gradient update op
    """
    optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)             # create the Adam optimizer
    grads = optimizer.compute_gradients(loss_val, var_list=var_list)    # compute the gradients of the variables
    # a tf.clip_by_value() step could be added here to clip the gradients and
    # guard against situations like vanishing or exploding gradients
    if FLAGS.debug:
        # print(len(var_list))
        for grad, var in grads:
            utils.add_gradient_summary(grad, var)       # add the gradients to TensorBoard for visualization
    # apply the gradient updates
    return optimizer.apply_gradients(grads)

1. This part can be tweaked slightly: choose another optimization method (Momentum, Adagrad, etc.), or decay the learning rate (exponential decay, polynomial decay, etc.), as sketched below.
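
As a sketch of these variations (the decay numbers are invented for illustration; only standard TensorFlow 1.x tf.train APIs are used), train() could be rewritten like this:

import tensorflow as tf

def train_with_decay(loss_val, var_list):
    # exponential learning-rate decay: lr * 0.9^(global_step / 10000)
    global_step = tf.train.get_or_create_global_step()
    learning_rate = tf.train.exponential_decay(FLAGS.learning_rate, global_step,
                                               decay_steps=10000, decay_rate=0.9)
    # swap Adam for Momentum (tf.train.AdagradOptimizer etc. work the same way)
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    grads = optimizer.compute_gradients(loss_val, var_list=var_list)
    # the optional gradient clipping mentioned in the comment inside train()
    grads = [(tf.clip_by_value(g, -5.0, 5.0), v) for g, v in grads if g is not None]
    return optimizer.apply_gradients(grads, global_step=global_step)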

main section

def main(argv=None):
    # placeholders for dropout, the input image, and the ground truth; actual values
    # are fed in through sess.run() during training or prediction
    keep_probability = tf.placeholder(tf.float32, name="keep_probabilty")
    image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
    annotation = tf.placeholder(tf.int32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 1], name="annotation")

    # pred_annotation.shape=[N, H, W, 1], the network's prediction: the class with the highest score for each pixel
    # logits.shape=[N, H, W, NUM_OF_CLASSESS], the per-class score of every pixel
    pred_annotation, logits = inference(image, keep_probability)    # compute the FCN network's prediction

    tf.summary.image("input_image", image, max_outputs=2)
    tf.summary.image("ground_truth", tf.cast(annotation, tf.uint8), max_outputs=2)
    tf.summary.image("pred_annotation", tf.cast(pred_annotation, tf.uint8), max_outputs=2)  # again for visualization

    # compute the cross entropy between logits and annotation
    '''sparse_softmax_cross_entropy_with_logits() requires labels.shape=[d_0, d_1, ..., d_{r-1}] and
    logits.shape=[d_0, d_1, ..., d_{r-1}, num_classes]. Compared with softmax_cross_entropy_with_logits()
    it performs one extra label-encoding step: the labels are first one-hot encoded (e.g. labels=[3] in a
    10-class problem becomes [0,0,0,1,0,0,0,0,0,0]), then the softmax is computed, then the cross entropy;
    the result has one value per position and the same shape as labels'''
    loss = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                          labels=tf.squeeze(annotation, squeeze_dims=[3]),
                                                                          name="entropy")))
    loss_summary = tf.summary.scalar("entropy", loss)

    trainable_var = tf.trainable_variables()    # get all trainable variables
    if FLAGS.debug:
        for var in trainable_var:
            utils.add_to_regularization_and_summary(var)
    train_op = train(loss, trainable_var)       # compute and apply the gradient updates

    print("Setting up summary op...")
    summary_op = tf.summary.merge_all()         # merge all summaries

    print("Setting up image reader...")
    train_records, valid_records = scene_parsing.read_dataset(FLAGS.data_dir)   # read the dataset; here only the paths of images and labels are collected
    print(len(train_records))
    print(len(valid_records))

    # the dataset handling here only needs a quick look; roughly: get the file names, read the
    # images, resize them, and take batch_size images at a time in sequence as the network input;
    # after an epoch ends, shuffle the sequence and pick data again
    # here all training and validation images are read and loaded into memory, but if memory is
    # limited or the dataset is very large, the images can instead be read at feed time
    print("Setting up dataset reader")
    image_options = {'resize': True, 'resize_size': IMAGE_SIZE}     # images are resized when read
    if FLAGS.mode == 'train':
        train_dataset_reader = dataset.BatchDatset(train_records, image_options)
    validation_dataset_reader = dataset.BatchDatset(valid_records, image_options)

    sess = tf.Session()

    print("Setting up Saver...")
    saver = tf.train.Saver()

    # create two summary writers to show training loss and validation loss in the same graph
    # need to create two folders 'train' and 'validation' inside FLAGS.logs_dir
    train_writer = tf.summary.FileWriter(FLAGS.logs_dir + '/train', sess.graph)     # used to write summaries to files
    validation_writer = tf.summary.FileWriter(FLAGS.logs_dir + '/validation')

    sess.run(tf.global_variables_initializer())     # initialize the variables
    ckpt = tf.train.get_checkpoint_state(FLAGS.logs_dir)
    if ckpt and ckpt.model_checkpoint_path:         # if a checkpoint exists, load it to restore the variables in the current sess
        saver.restore(sess, ckpt.model_checkpoint_path)
        print("Model restored...")

    # training mode
    if FLAGS.mode == "train":
        for itr in xrange(MAX_ITERATION):
            # read FLAGS.batch_size images and labels from the training set; the reader
            # manages the reading sequence internally
            train_images, train_annotations = train_dataset_reader.next_batch(FLAGS.batch_size)
            feed_dict = {image: train_images, annotation: train_annotations, keep_probability: 0.85}

            sess.run(train_op, feed_dict=feed_dict)         # train step

            if itr % 10 == 0:   # every 10 iterations compute the current loss and write it to file
                train_loss, summary_str = sess.run([loss, loss_summary], feed_dict=feed_dict)
                print("Step: %d, Train_loss:%g" % (itr, train_loss))
                train_writer.add_summary(summary_str, itr)  # write summary_str to file

            if itr % 500 == 0:  # every 500 iterations compute the loss on validation data and write it to file
                valid_images, valid_annotations = validation_dataset_reader.next_batch(FLAGS.batch_size)
                valid_loss, summary_sva = sess.run([loss, loss_summary],
                                                   feed_dict={image: valid_images, annotation: valid_annotations,
                                                              keep_probability: 1.0})
                print("%s ---> Validation_loss: %g" % (datetime.datetime.now(), valid_loss))

                # add validation loss to TensorBoard
                validation_writer.add_summary(summary_sva, itr)
                saver.save(sess, FLAGS.logs_dir + "model.ckpt", itr)
    # visualization (test) mode
    elif FLAGS.mode == "visualize":
        # get FLAGS.batch_size random images from the validation set
        valid_images, valid_annotations = validation_dataset_reader.get_random_batch(FLAGS.batch_size)
        pred = sess.run(pred_annotation, feed_dict={image: valid_images, annotation: valid_annotations,
                                                    keep_probability: 1.0})     # compute the prediction
        valid_annotations = np.squeeze(valid_annotations, axis=3)               # ground truth
        pred = np.squeeze(pred, axis=3)

        # save the original images, the ground-truth segmentations, and the predicted segmentations
        for itr in range(FLAGS.batch_size):
            utils.save_image(valid_images[itr].astype(np.uint8), FLAGS.logs_dir, name="inp_" + str(5+itr))
            utils.save_image(valid_annotations[itr].astype(np.uint8), FLAGS.logs_dir, name="gt_" + str(5+itr))
            utils.save_image(pred[itr].astype(np.uint8), FLAGS.logs_dir, name="pred_" + str(5+itr))
            print("Saved image: %d" % itr)

1. For handling image data it is recommended to use the OpenCV library; the scipy.misc calls in the code have compatibility problems with recent versions (a minimal OpenCV sketch follows this list).
2. In image segmentation both the input and the label are images, so the data handling is relatively simple; there is none of the complex conversion of label boxes into loss targets that object detection requires. Depending on how your data is stored, it is worth writing your own class for reading and augmenting images (cropping, flipping, channel conversion, etc.) to get familiar with basic image-processing knowledge.
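
For item 1, a minimal OpenCV-based reading and augmentation sketch (read_and_augment is a hypothetical helper; the repo's BatchDatset class is organized differently):

import cv2
import numpy as np

def read_and_augment(path, size):
    img = cv2.imread(path, cv2.IMREAD_COLOR)       # BGR, uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)     # channel conversion to RGB
    img = cv2.resize(img, (size, size))            # resize, like the 'resize' image_options
    if np.random.rand() < 0.5:                     # random horizontal flip
        img = img[:, ::-1, :]
    return img.astype(np.float32)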
