A Walkthrough of the Open-Source PointNet Code: /pointnet/sem_seg/model.py

This post covers /pointnet/sem_seg/model.py, the model file for point cloud semantic segmentation in scenes.

Original paper: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas, Stanford University)

def placeholder_inputs(batch_size, num_point):
    # Each point carries 9 values rather than the usual 3, hence the last dim.
    pointclouds_pl = tf.placeholder(tf.float32,
                                    shape=(batch_size, num_point, 9))
    # One integer semantic label per point.
    labels_pl = tf.placeholder(tf.int32,
                               shape=(batch_size, num_point))
    return pointclouds_pl, labels_pl

The placeholder_inputs function returns placeholders for the point clouds and their labels. Note that the size of the point cloud's third dimension, i.e., the number of values per point, is 9 rather than 3: per the paper, each point is a 9-dim vector of XYZ, RGB, and its location normalized to the room (from 0 to 1).
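A minimal usage sketch (TensorFlow 1.x; the batch size and point count here are hypothetical example values):

import tensorflow as tf

pointclouds_pl, labels_pl = placeholder_inputs(batch_size=24, num_point=4096)
print(pointclouds_pl.shape)   # (24, 4096, 9)  -- 9 values per point
print(labels_pl.shape)        # (24, 4096)     -- one integer label per point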

def get_model(point_cloud, is_training, bn_decay=None):
    """ ConvNet baseline, input is BxNx3 gray image """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value

    input_image = tf.expand_dims(point_cloud, -1)
    # CONV
    net = tf_util.conv2d(input_image, 64, [1,9], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv2', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv3', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv4', bn_decay=bn_decay)
    points_feat1 = tf_util.conv2d(net, 1024, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv5', bn_decay=bn_decay)

At the end, get_model returns the per-point predictions produced by the convolutions and other operations below.

The input point cloud point_cloud has 3 axes, i.e., B×N×9. tf.expand_dims(point_cloud, -1) appends an axis of size 1, turning it into input_image (B×N×9×1), so input_image has a channel count of 1.
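A quick shape check of this step (TF 1.x, with hypothetical sizes B=2, N=4):

import tensorflow as tf

point_cloud = tf.zeros((2, 4, 9))              # B x N x 9
input_image = tf.expand_dims(point_cloud, -1)  # append a size-1 channel axis
print(input_image.shape)                       # (2, 4, 9, 1)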

The tf_util.conv2d function is defined in /pointnet/tf_util.py; its signature is:

def conv2d(inputs,
           num_output_channels,
           kernel_size,
           scope,
           stride=[1, 1],
           padding='SAME',
           use_xavier=True,
           stddev=1e-3,
           weight_decay=0.0,
           activation_fn=tf.nn.relu,
           bn=False,
           bn_decay=None,
           is_training=None):

In get_model, every convolution uses a 1×1 kernel except conv1, which uses 1×9. For a 1×1 kernel, the padding argument makes no difference ('VALID' and 'SAME' are equivalent). After the five convolutions, each layer's output shape is:

input_image:B×N×9×1

conv1:B×N×1×64

conv2:B×N×1×64

conv3:B×N×1×64

conv4:B×N×1×128

conv5(points_feat1):B×N×1×1024

All five convolution layers use batch normalization (bn=True). The final output, points_feat1, is the local feature vector of each point.
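The shape arithmetic of conv1 can be reproduced with plain tf.nn.conv2d (a sketch; tf_util.conv2d wraps this call with weight creation, BN, and ReLU). The 1×9 kernel with padding='VALID' collapses the width-9 axis to 1, mixing each point's 9 input values into each of the 64 output channels:

import tensorflow as tf

x = tf.zeros((2, 4, 9, 1))        # B x N x 9 x 1, hypothetical B=2, N=4
kernel = tf.zeros((1, 9, 1, 64))  # height 1, width 9, in_channels 1, out_channels 64
y = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='VALID')
print(y.shape)                    # (2, 4, 1, 64) = B x N x 1 x 64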

    # MAX
    pc_feat1 = tf_util.max_pool2d(points_feat1, [num_point,1], padding='VALID', scope='maxpool1')
    # FC
    pc_feat1 = tf.reshape(pc_feat1, [batch_size, -1])
    pc_feat1 = tf_util.fully_connected(pc_feat1, 256, bn=True, is_training=is_training, scope='fc1', bn_decay=bn_decay)
    pc_feat1 = tf_util.fully_connected(pc_feat1, 128, bn=True, is_training=is_training, scope='fc2', bn_decay=bn_decay)
    print(pc_feat1)

Next comes the crucial step: max pooling. As a symmetric function, max pooling is the key to coping with the unordered nature of point clouds (a symmetric function is one whose value is independent of the order of its inputs). tf_util.max_pool2d(points_feat1, [num_point,1], padding='VALID', scope='maxpool1') takes, in each of points_feat1's 1024 channels, the maximum over all points, which is naturally independent of point order. The result pc_feat1 is a 1024-dim vector: the global feature vector of each block. Note that the points within a block do not all share the same label (for the block concept, see https://blog.csdn.net/shaozhenghan/article/details/81087024). Two fully connected layers then reduce this global feature vector to 128 dims.
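A small demonstration (TF 1.x, with made-up sizes) that max pooling over the point axis is a symmetric function: permuting the points leaves the pooled feature unchanged.

import numpy as np
import tensorflow as tf

feats = np.random.rand(1, 8, 1, 16).astype(np.float32)   # B x N x 1 x C
perm = np.random.permutation(8)                          # a random point order
pooled    = tf.nn.max_pool(feats,          ksize=[1, 8, 1, 1], strides=[1, 1, 1, 1], padding='VALID')
pooled_pm = tf.nn.max_pool(feats[:, perm], ksize=[1, 8, 1, 1], strides=[1, 1, 1, 1], padding='VALID')
with tf.Session() as sess:
    a, b = sess.run([pooled, pooled_pm])
print(np.allclose(a, b))   # True: the per-channel maximum ignores point order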

    # CONCAT 
    pc_feat1_expand = tf.tile(tf.reshape(pc_feat1, [batch_size, 1, 1, -1]), [1, num_point, 1, 1])
    points_feat1_concat = tf.concat(axis=3, values=[points_feat1, pc_feat1_expand])

Next comes the feature concatenation. pc_feat1 is first reshaped to B×1×1×128, and tf.tile() replicates it num_point times along the N axis, giving pc_feat1_expand the shape B×N×1×128. tf.concat() then joins the local and global features, so the concatenated points_feat1_concat has shape B×N×1×1152.
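A shape sketch of this step (hypothetical B=2, N=4):

import tensorflow as tf

points_feat1 = tf.zeros((2, 4, 1, 1024))     # per-point local features
pc_feat1 = tf.zeros((2, 128))                # per-block global feature after fc2
pc_feat1_expand = tf.tile(tf.reshape(pc_feat1, [2, 1, 1, -1]), [1, 4, 1, 1])
print(pc_feat1_expand.shape)                 # (2, 4, 1, 128)
concat = tf.concat(axis=3, values=[points_feat1, pc_feat1_expand])
print(concat.shape)                          # (2, 4, 1, 1152) = 1024 + 128 channels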

    # CONV 
    net = tf_util.conv2d(points_feat1_concat, 512, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv6')
    net = tf_util.conv2d(net, 256, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv7')
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training, scope='dp1')
    net = tf_util.conv2d(net, 13, [1,1], padding='VALID', stride=[1,1],
                         activation_fn=None, scope='conv8')
    net = tf.squeeze(net, [2])

    return net

The concatenated features pass through three more convolutions, conv6 through conv8, again with 1×1 kernels. Note the dropout (keep_prob=0.7) applied after conv7. conv8 outputs net with shape B×N×1×13; the 13 channels correspond to the 13 semantic class labels. Finally, net = tf.squeeze(net, [2]) removes the size-1 dim on axis 2, leaving B×N×13.
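The squeeze step in isolation (hypothetical B=2, N=4):

import tensorflow as tf

logits = tf.zeros((2, 4, 1, 13))       # B x N x 1 x 13 from conv8
print(tf.squeeze(logits, [2]).shape)   # (2, 4, 13) -- the B,N,13 that get_loss expects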

Notes on model.py:

The model structure here does not match the original paper! There are two main differences:

1. No transformation is applied to the input or to the point features (/pointnet/models/transform_nets.py). The paper remarks in Section 5.2: "It's interesting to see that the most basic architecture already achieves quite reasonable results. Using input transformation gives a 0.8% performance boost."

2. Contrary to the architecture diagram in the paper, the 64-dim local features are not concatenated with the 1024-dim global feature and then convolved. Instead, the 1024-dim global feature is first reduced to 128 dims by two FC layers and only then concatenated with the 1024-dim local features before the final convolutions. The structure as described in the paper lives in /pointnet/models/pointnet_seg.py.

def get_loss(pred, label):
    """ pred: B,N,13
        label: B,N """
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred, labels=label)
    return tf.reduce_mean(loss)

The get_loss(pred, label) function. Note that tf.nn.sparse_softmax_cross_entropy_with_logits, unlike softmax_cross_entropy_with_logits, handles the sparse encoding internally, so labels only need to be distinct integers denoting distinct classes, e.g., [1,2,3,3,2,1], rather than one-hot vectors.
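A quick check (TF 1.x) that the sparse variant with integer labels agrees with softmax_cross_entropy_with_logits fed the equivalent one-hot labels:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.2]])
labels = tf.constant([0, 1])          # integer class ids, not one-hot
sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
dense = tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=tf.one_hot(labels, depth=3))
with tf.Session() as sess:
    print(sess.run([sparse, dense]))  # the two per-example losses match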

Reposted from blog.csdn.net/shaozhenghan/article/details/81098350