Deep Learning: Implementing Image Semantic Segmentation

Implementing the FCN8s network on top of VGG, using TensorFlow and Python, for image semantic segmentation:

FCN paper: https://arxiv.org/abs/1411.4038

Dataset: the split files under VOC2012/ImageSets/Segmentation, where train.txt lists 1464 images and val.txt lists 1449 images.

# The 21 PASCAL VOC classes (20 object classes plus background)
classes = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable',
           'dog', 'horse', 'motorbike', 'person', 'potted plant',
           'sheep', 'sofa', 'train', 'tv/monitor']

# RGB color for each class
colormap = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0], [0, 0, 128],
            [128, 0, 128], [0, 128, 128], [128, 128, 128], [64, 0, 0], [192, 0, 0],
            [64, 128, 0], [192, 128, 0], [64, 0, 128], [192, 0, 128],
            [64, 128, 128], [192, 128, 128], [0, 64, 0], [128, 64, 0],
            [0, 192, 0], [128, 192, 0], [0, 64, 128]]
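
The label images in VOC2012/SegmentationClass store classes as the RGB colors above, so they must be mapped back to class indices before training. A common recipe for this (the helper below is illustrative, not code from the original post) encodes each color as a single integer and uses a lookup table:

import numpy as np

# Lookup table from encoded RGB color to class index; colors not in the
# colormap (e.g. the void/border color) fall through to class 0 here.
cm2lbl = np.zeros(256 ** 3, dtype=np.uint8)
for i, cm in enumerate(colormap):
    cm2lbl[(cm[0] * 256 + cm[1]) * 256 + cm[2]] = i

def image2label(im):
    # im: HxWx3 uint8 label image from VOC2012/SegmentationClass
    data = im.astype('int32')
    idx = (data[:, :, 0] * 256 + data[:, :, 1]) * 256 + data[:, :, 2]
    return cm2lbl[idx]  # HxW array of class indices in [0, 20]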

Pretrained model: the VGG16 model from the model zoo

Dataset preprocessing: use the convert_fcn_dataset.py script to generate TFRecord files.
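
convert_fcn_dataset.py itself is not listed in this post; below is a minimal sketch of how such a script might serialize the image/label pairs into a TFRecord file (the feature keys and function names here are assumptions):

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_tfrecord(pairs, out_path):
    # pairs: list of (jpeg_path, png_label_path) tuples taken from train.txt / val.txt
    writer = tf.python_io.TFRecordWriter(out_path)
    for img_path, label_path in pairs:
        with open(img_path, 'rb') as f:
            img_raw = f.read()
        with open(label_path, 'rb') as f:
            label_raw = f.read()
        example = tf.train.Example(features=tf.train.Features(feature={
            'image/encoded': _bytes_feature(img_raw),
            'label/encoded': _bytes_feature(label_raw),
        }))
        writer.write(example.SerializeToString())
    writer.close()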

Understanding FCN8s:

This is a fully convolutional network built on vgg_16. The input in the diagram is 500x500 (inputs are first resized to the multiple of 32 nearest to the original size, e.g. 512x512). At the end of the network there is no flattening step: the spatial layout is preserved, leaving a 16x16x4096 feature map. Since there are 21 classes in total, the classification output is 16x16x21.

This result is first 2x upsampled (also called deconvolution, or transposed convolution), giving 32x32x21. The feature maps from vgg_16/pool4 are then taken and passed through a classifier (a 1x1 convolution) to produce another 32x32x21 tensor, and these two same-shaped tensors are added element-wise, which again yields 32x32x21. That sum is then 2x upsampled to 64x64x21. Next, the feature maps from vgg_16/pool3 are likewise passed through a classifier (a 1x1 convolution) to get a 64x64x21 result, and the two same-shaped tensors are again added element-wise, still giving 64x64x21. Finally, an 8x upsampling of this result recovers an output the same size as the original input: 512x512x21.

In this way, after two 2x upsamplings and one 8x upsampling, we arrive at FCN8s.

Code implementation (the snippets below assume import tensorflow as tf, slim = tf.contrib.slim, and that logits and end_points come from slim's vgg_16):

# Take the feature maps from end_points['vgg_16/pool4'] and pass them through a 1x1 convolution
# that maps them to 21 class channels. The weights use zeros_initializer, so at initialization this
# skip branch contributes nothing to the sum below. The result, aux_logits_16s, is 32x32x21.
pool4_feature = end_points['vgg_16/pool4']
with tf.variable_scope('vgg_16/fc8'):
    aux_logits_16s = slim.conv2d(pool4_feature, number_of_classes, [1, 1],
                                 activation_fn=None,
                                 weights_initializer=tf.zeros_initializer(),
                                 scope='conv_pool4')


upsample_filter_np_x2 = bilinear_upsample_weights(2,  # bilinear interpolation weights for 2x upsampling
                                                  number_of_classes)

upsample_filter_tensor_x2 = tf.Variable(upsample_filter_np_x2, name='vgg_16/fc8/t_conv_x2')  # 2x upsampling filter
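
The helper bilinear_upsample_weights is called above but not listed in the post; here is a minimal sketch following the standard construction for FCNs, which initializes a transposed-convolution kernel with bilinear interpolation weights:

import numpy as np

def bilinear_upsample_weights(factor, number_of_classes):
    # Kernel size for a given integer upsampling factor.
    filter_size = 2 * factor - factor % 2
    center = factor - 1 if filter_size % 2 == 1 else factor - 0.5
    og = np.ogrid[:filter_size, :filter_size]
    # 2D bilinear kernel: peak 1.0 at the center, falling off linearly.
    upsample_kernel = (1 - abs(og[0] - center) / factor) * \
                      (1 - abs(og[1] - center) / factor)
    weights = np.zeros((filter_size, filter_size,
                        number_of_classes, number_of_classes),
                       dtype=np.float32)
    # Each class channel is upsampled independently (no cross-class mixing).
    for i in range(number_of_classes):
        weights[:, :, i, i] = upsample_kernel
    return weights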

# Apply a 2x upsampling to the final logits of vgg_16; the result has the same shape as aux_logits_16s, 32x32x21
upsampled_logits = tf.nn.conv2d_transpose(logits, upsample_filter_tensor_x2,
                                          output_shape=tf.shape(aux_logits_16s),
                                          strides=[1, 2, 2, 1],
                                          padding='SAME')

# Add the two same-shaped feature maps element-wise; the result, still 32x32x21, goes into upsampled_logits
upsampled_logits = upsampled_logits + aux_logits_16s

# Likewise, take the feature maps from end_points['vgg_16/pool3'] and pass them through a 1x1
# convolution that maps them to 21 class channels. Again the weights use zeros_initializer, so this
# branch initially contributes nothing. The result, aux_logits_8s, is 64x64x21.
pool3_feature = end_points['vgg_16/pool3']
with tf.variable_scope('vgg_16/fc8'):
    aux_logits_8s = slim.conv2d(pool3_feature, number_of_classes, [1, 1],
                                activation_fn=None,
                                weights_initializer=tf.zeros_initializer(),
                                scope='conv_pool3')

# Apply another 2x upsampling to upsampled_logits; the result has the same shape as aux_logits_8s, 64x64x21
upsampled_logits = tf.nn.conv2d_transpose(upsampled_logits, upsample_filter_tensor_x2,
                                          output_shape=tf.shape(aux_logits_8s),
                                          strides=[1, 2, 2, 1],
                                          padding='SAME')

# Add the two same-shaped feature maps element-wise; the result, still 64x64x21, again goes back into upsampled_logits
upsampled_logits = upsampled_logits + aux_logits_8s

upsample_factor = 8  # FCN8s recovers the input resolution with a final 8x upsampling

upsample_filter_np_x8 = bilinear_upsample_weights(upsample_factor,  # bilinear interpolation weights for 8x upsampling
                                                  number_of_classes)

upsample_filter_tensor_x8 = tf.Variable(upsample_filter_np_x8, name='vgg_16/fc8/t_conv_x8')  # 8x upsampling filter
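
upsampled_logits_shape, used as the output_shape below, is not defined in the snippet; one plausible construction (an assumption, not the original author's code) scales the current spatial dimensions by upsample_factor:

current_shape = tf.shape(upsampled_logits)
# Target shape [batch, H * 8, W * 8, number_of_classes] for the final transposed convolution.
upsampled_logits_shape = tf.stack([current_shape[0],
                                   current_shape[1] * upsample_factor,
                                   current_shape[2] * upsample_factor,
                                   current_shape[3]])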

# Finally, apply an 8x upsampling to the 64x64x21 upsampled_logits, which yields a result the same
# size as the model input, 512x512x21
upsampled_logits = tf.nn.conv2d_transpose(upsampled_logits, upsample_filter_tensor_x8,
                                          output_shape=upsampled_logits_shape,
                                          strides=[1, upsample_factor, upsample_factor, 1],
                                          padding='SAME')

Training: train.py
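
train.py is not reproduced here; below is a minimal sketch of the typical FCN training loss on the upsampled logits, using per-pixel softmax cross entropy (variable names such as annotation are assumptions):

# annotation: [batch, H, W] int32 ground-truth class indices (hypothetical name)
flat_logits = tf.reshape(upsampled_logits, [-1, number_of_classes])
flat_labels = tf.reshape(annotation, [-1])
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=flat_logits,
                                                               labels=flat_labels)
loss = tf.reduce_mean(cross_entropy)
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)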

Generating the semantic segmentation images and the CRF-refined images.
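
The CRF refinement step is commonly done with the pydensecrf package; here is a minimal sketch (the package choice and parameter values are assumptions, not necessarily what the original code uses):

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, probs, n_classes=21, n_iters=5):
    # image: HxWx3 uint8 RGB input; probs: n_classes x H x W softmax output of the FCN
    h, w = image.shape[:2]
    d = dcrf.DenseCRF2D(w, h, n_classes)
    d.setUnaryEnergy(unary_from_softmax(probs))
    # Smoothness kernel: nearby pixels tend to share a label.
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel: pixels with similar color tend to share a label.
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=np.ascontiguousarray(image), compat=10)
    q = d.inference(n_iters)
    return np.argmax(q, axis=0).reshape(h, w)  # HxW refined label map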


Reposted from blog.csdn.net/weixin_41694971/article/details/81349993