Deep Learning: Implementing Image Semantic Segmentation

Implementing the FCN8s network on top of VGG, using TensorFlow and Python, for image semantic segmentation:

FCN paper: https://arxiv.org/abs/1411.4038

Dataset: the split files under VOC2012/ImageSets/Segmentation, where train.txt lists 1464 images and val.txt lists 1449 images.

# The 21 PASCAL VOC classes (20 object classes plus background)
classes = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable',
           'dog', 'horse', 'motorbike', 'person', 'potted plant',
           'sheep', 'sofa', 'train', 'tv/monitor']

# RGB color for each class
colormap = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0], [0, 0, 128],
            [128, 0, 128], [0, 128, 128], [128, 128, 128], [64, 0, 0], [192, 0, 0],
            [64, 128, 0], [192, 128, 0], [64, 0, 128], [192, 0, 128],
            [64, 128, 128], [192, 128, 128], [0, 64, 0], [128, 64, 0],
            [0, 192, 0], [128, 192, 0], [0, 64, 128]]
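
The label images in VOC2012/SegmentationClass store classes as the RGB colors above, so they must be mapped back to class indices before training. A common recipe for this (the helper below is illustrative, not code from the original post) encodes each color as a single integer and uses a lookup table:

import numpy as np

# Lookup table from encoded RGB color to class index; colors not in the
# colormap (e.g. the void/border color) fall through to class 0 here.
cm2lbl = np.zeros(256 ** 3, dtype=np.uint8)
for i, cm in enumerate(colormap):
    cm2lbl[(cm[0] * 256 + cm[1]) * 256 + cm[2]] = i

def image2label(im):
    # im: HxWx3 uint8 label image from VOC2012/SegmentationClass
    data = im.astype('int32')
    idx = (data[:, :, 0] * 256 + data[:, :, 1]) * 256 + data[:, :, 2]
    return cm2lbl[idx]  # HxW array of class indices in [0, 20]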

Pretrained model: the VGG16 model from the model zoo

Dataset preprocessing: use the convert_fcn_dataset.py script to generate TFRecord files.
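
convert_fcn_dataset.py itself is not listed in this post; below is a minimal sketch of how such a script might serialize the image/label pairs into a TFRecord file (the feature keys and function names here are assumptions):

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_tfrecord(pairs, out_path):
    # pairs: list of (jpeg_path, png_label_path) tuples taken from train.txt / val.txt
    writer = tf.python_io.TFRecordWriter(out_path)
    for img_path, label_path in pairs:
        with open(img_path, 'rb') as f:
            img_raw = f.read()
        with open(label_path, 'rb') as f:
            label_raw = f.read()
        example = tf.train.Example(features=tf.train.Features(feature={
            'image/encoded': _bytes_feature(img_raw),
            'label/encoded': _bytes_feature(label_raw),
        }))
        writer.write(example.SerializeToString())
    writer.close()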

Understanding FCN8s:

This is a fully convolutional network built on vgg_16. The input in the diagram is 500x500 (inputs are first resized to the multiple of 32 nearest to the original size, e.g. 512x512). At the end of the network there is no flattening step: the spatial layout is preserved, leaving a 16x16x4096 feature map. Since there are 21 classes in total, the classification output is 16x16x21.

This result is first 2x upsampled (also called deconvolution, or transposed convolution), giving 32x32x21. The feature maps from vgg_16/pool4 are then taken and passed through a classifier (a 1x1 convolution) to produce another 32x32x21 tensor, and these two same-shaped tensors are added element-wise, which again yields 32x32x21. That sum is then 2x upsampled to 64x64x21. Next, the feature maps from vgg_16/pool3 are likewise passed through a classifier (a 1x1 convolution) to get a 64x64x21 result, and the two same-shaped tensors are again added element-wise, still giving 64x64x21. Finally, an 8x upsampling of this result recovers an output the same size as the original input: 512x512x21.

In this way, after two 2x upsamplings and one 8x upsampling, we arrive at FCN8s.

Code implementation (the snippets below assume import tensorflow as tf, slim = tf.contrib.slim, and that logits and end_points come from slim's vgg_16):

# Take the feature maps from end_points['vgg_16/pool4'] and pass them through a 1x1 convolution
# that maps them to 21 class channels. The weights use zeros_initializer, so at initialization this
# skip branch contributes nothing to the sum below. The result, aux_logits_16s, is 32x32x21.
pool4_feature = end_points['vgg_16/pool4']
with tf.variable_scope('vgg_16/fc8'):
    aux_logits_16s = slim.conv2d(pool4_feature, number_of_classes, [1, 1],
                                 activation_fn=None,
                                 weights_initializer=tf.zeros_initializer(),
                                 scope='conv_pool4')


upsample_filter_np_x2 = bilinear_upsample_weights(2,  # bilinear interpolation weights for 2x upsampling
                                                  number_of_classes)

upsample_filter_tensor_x2 = tf.Variable(upsample_filter_np_x2, name='vgg_16/fc8/t_conv_x2')  # 2x upsampling filter
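
The helper bilinear_upsample_weights is called above but not listed in the post; here is a minimal sketch following the standard construction for FCNs, which initializes a transposed-convolution kernel with bilinear interpolation weights:

import numpy as np

def bilinear_upsample_weights(factor, number_of_classes):
    # Kernel size for a given integer upsampling factor.
    filter_size = 2 * factor - factor % 2
    center = factor - 1 if filter_size % 2 == 1 else factor - 0.5
    og = np.ogrid[:filter_size, :filter_size]
    # 2D bilinear kernel: peak 1.0 at the center, falling off linearly.
    upsample_kernel = (1 - abs(og[0] - center) / factor) * \
                      (1 - abs(og[1] - center) / factor)
    weights = np.zeros((filter_size, filter_size,
                        number_of_classes, number_of_classes),
                       dtype=np.float32)
    # Each class channel is upsampled independently (no cross-class mixing).
    for i in range(number_of_classes):
        weights[:, :, i, i] = upsample_kernel
    return weights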

# Apply a 2x upsampling to the final logits of vgg_16; the result has the same shape as aux_logits_16s, 32x32x21
upsampled_logits = tf.nn.conv2d_transpose(logits, upsample_filter_tensor_x2,
                                          output_shape=tf.shape(aux_logits_16s),
                                          strides=[1, 2, 2, 1],
                                          padding='SAME')

# Add the two same-shaped feature maps element-wise; the result, still 32x32x21, goes into upsampled_logits
upsampled_logits = upsampled_logits + aux_logits_16s

# Likewise, take the feature maps from end_points['vgg_16/pool3'] and pass them through a 1x1
# convolution that maps them to 21 class channels. Again the weights use zeros_initializer, so this
# branch initially contributes nothing. The result, aux_logits_8s, is 64x64x21.
pool3_feature = end_points['vgg_16/pool3']
with tf.variable_scope('vgg_16/fc8'):
    aux_logits_8s = slim.conv2d(pool3_feature, number_of_classes, [1, 1],
                                activation_fn=None,
                                weights_initializer=tf.zeros_initializer(),
                                scope='conv_pool3')

# Apply another 2x upsampling to upsampled_logits; the result has the same shape as aux_logits_8s, 64x64x21
upsampled_logits = tf.nn.conv2d_transpose(upsampled_logits, upsample_filter_tensor_x2,
                                          output_shape=tf.shape(aux_logits_8s),
                                          strides=[1, 2, 2, 1],
                                          padding='SAME')

# Add the two same-shaped feature maps element-wise; the result, still 64x64x21, again goes back into upsampled_logits
upsampled_logits = upsampled_logits + aux_logits_8s

upsample_factor = 8  # FCN8s recovers the input resolution with a final 8x upsampling

upsample_filter_np_x8 = bilinear_upsample_weights(upsample_factor,  # bilinear interpolation weights for 8x upsampling
                                                  number_of_classes)

upsample_filter_tensor_x8 = tf.Variable(upsample_filter_np_x8, name='vgg_16/fc8/t_conv_x8')  # 8x upsampling filter
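
upsampled_logits_shape, used as the output_shape below, is not defined in the snippet; one plausible construction (an assumption, not the original author's code) scales the current spatial dimensions by upsample_factor:

current_shape = tf.shape(upsampled_logits)
# Target shape [batch, H * 8, W * 8, number_of_classes] for the final transposed convolution.
upsampled_logits_shape = tf.stack([current_shape[0],
                                   current_shape[1] * upsample_factor,
                                   current_shape[2] * upsample_factor,
                                   current_shape[3]])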

# Finally, apply an 8x upsampling to the 64x64x21 upsampled_logits, which yields a result the same
# size as the model input, 512x512x21
upsampled_logits = tf.nn.conv2d_transpose(upsampled_logits, upsample_filter_tensor_x8,
                                          output_shape=upsampled_logits_shape,
                                          strides=[1, upsample_factor, upsample_factor, 1],
                                          padding='SAME')

Training: train.py
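
train.py is not reproduced here; below is a minimal sketch of the typical FCN training loss on the upsampled logits, using per-pixel softmax cross entropy (variable names such as annotation are assumptions):

# annotation: [batch, H, W] int32 ground-truth class indices (hypothetical name)
flat_logits = tf.reshape(upsampled_logits, [-1, number_of_classes])
flat_labels = tf.reshape(annotation, [-1])
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=flat_logits,
                                                               labels=flat_labels)
loss = tf.reduce_mean(cross_entropy)
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)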

Generating the semantic segmentation images and the CRF-refined images.
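
The CRF refinement step is commonly done with the pydensecrf package; here is a minimal sketch (the package choice and parameter values are assumptions, not necessarily what the original code uses):

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, probs, n_classes=21, n_iters=5):
    # image: HxWx3 uint8 RGB input; probs: n_classes x H x W softmax output of the FCN
    h, w = image.shape[:2]
    d = dcrf.DenseCRF2D(w, h, n_classes)
    d.setUnaryEnergy(unary_from_softmax(probs))
    # Smoothness kernel: nearby pixels tend to share a label.
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel: pixels with similar color tend to share a label.
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=np.ascontiguousarray(image), compat=10)
    q = d.inference(n_iters)
    return np.argmax(q, axis=0).reshape(h, w)  # HxW refined label map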


Reposted from blog.csdn.net/weixin_41694971/article/details/81349993