PointNet Open-Source Code Walkthrough: /pointnet/sem_seg/model.py

This article walks through /pointnet/sem_seg/model.py, the model file used for point cloud semantic segmentation (Semantic Segmentation in Scenes).

Original paper: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas, Stanford University)

def placeholder_inputs(batch_size, num_point):
    pointclouds_pl = tf.placeholder(tf.float32,
                                    shape=(batch_size, num_point, 9))
    labels_pl = tf.placeholder(tf.int32,
                                shape=(batch_size, num_point))
    return pointclouds_pl, labels_pl

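placeholder_inputs returns the placeholders for the point cloud and its labels. Note that the size of the third dimension of the point cloud, i.e. the number of features per point, is 9 rather than 3.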

def get_model(point_cloud, is_training, bn_decay=None):
    """ ConvNet baseline, input is BxNx3 gray image """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value

    input_image = tf.expand_dims(point_cloud, -1)
    # CONV
    net = tf_util.conv2d(input_image, 64, [1,9], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv2', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv3', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv4', bn_decay=bn_decay)
    points_feat1 = tf_util.conv2d(net, 1024, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv5', bn_decay=bn_decay)

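get_model returns the per-point predictions (unnormalized class scores) produced by the convolutions and the other layers described below.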

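The input point_cloud has three axes, B×N×9 (the "BxNx3 gray image" in the docstring does not match the 9-feature placeholder above). tf.expand_dims(point_cloud, -1) appends an axis of size 1 to produce input_image with shape B×N×9×1, so input_image has a single channel.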
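The shape change can be checked with a minimal NumPy sketch. np.expand_dims is used here as a stand-in for tf.expand_dims, and the values of B and N are hypothetical, chosen only for illustration.

```python
import numpy as np

B, N = 2, 5  # hypothetical batch size and number of points per block
point_cloud = np.zeros((B, N, 9), dtype=np.float32)

# Append a trailing axis of size 1, mirroring tf.expand_dims(point_cloud, -1).
input_image = np.expand_dims(point_cloud, -1)

print(point_cloud.shape)   # (2, 5, 9)
print(input_image.shape)   # (2, 5, 9, 1)
```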

tf_util.conv2d is defined in /pointnet/tf_util.py. Its signature is:

def conv2d(inputs,
           num_output_channels,
           kernel_size,
           scope,
           stride=[1, 1],
           padding='SAME',
           use_xavier=True,
           stddev=1e-3,
           weight_decay=0.0,
           activation_fn=tf.nn.relu,
           bn=False,
           bn_decay=None,
           is_training=None):

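In get_model, every convolution uses a 1×1 kernel except conv1, which uses a 1×9 kernel. For the 1×1 layers with stride 1, padding='VALID' and padding='SAME' produce the same output shape, so the choice makes no difference there. For conv1, 'VALID' is what collapses the size-9 axis to size 1. The output shapes after each of the five convolutions are: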

input_image: B×N×9×1

conv1: B×N×1×64

conv2: B×N×1×64

conv3: B×N×1×64

conv4: B×N×1×128

conv5 (points_feat1): B×N×1×1024

All five convolutions use Batch Normalization (bn=True). The final output, points_feat1, holds a 1024-dimensional local feature vector for each point.
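One way to read conv1 is as a linear layer applied to each point independently, with weights shared across all points. The NumPy sketch below illustrates this under simplifying assumptions: it omits the bias, batch normalization and ReLU that tf_util.conv2d applies, uses random weights, and B and N are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, C_in, C_out = 2, 5, 9, 64  # hypothetical B and N; 9 and 64 follow conv1

point_cloud = rng.standard_normal((B, N, C_in))
input_image = point_cloud[..., np.newaxis]            # B x N x 9 x 1

# TensorFlow conv2d kernels are laid out as
# [kernel_height, kernel_width, in_channels, out_channels].
kernel = rng.standard_normal((1, 9, 1, C_out))

# With VALID padding and stride 1, a 1x9 kernel fits at exactly one position
# along the size-9 axis, so the convolution is a sum over that axis.
conv_out = np.einsum('bnwc,hwco->bnho', input_image, kernel)  # B x N x 1 x 64

# The same numbers come from a per-point matrix multiply with a 9 x 64 matrix.
dense_out = point_cloud @ kernel.reshape(C_in, C_out)         # B x N x 64

print(conv_out.shape)  # (2, 5, 1, 64)
```

The later 1×1 convolutions follow the same pattern: each one is a shared per-point linear map over the channel axis.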

    # MAX
    pc_feat1 = tf_util.max_pool2d(points_feat1, [num_point,1], padding='VALID', scope='maxpool1')
    # FC
    pc_feat1 = tf.reshape(pc_feat1, [batch_size, -1])
    pc_feat1 = tf_util.fully_connected(pc_feat1, 256, bn=True, is_training=is_training, scope='fc1', bn_decay=bn_decay)
    pc_feat1 = tf_util.fully_connected(pc_feat1, 128, bn=True, is_training=is_training, scope='fc2', bn_decay=bn_decay)
    print(pc_feat1)

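Next comes a crucial step: max pooling. As a symmetric function, max pooling is the key to handling the unordered nature of point clouds. A symmetric function is one whose value does not depend on the order of its inputs. tf_util.max_pool2d(points_feat1, [num_point,1], padding='VALID', scope='maxpool1') takes, for each of the 1024 channels of points_feat1, the maximum over all points, so the result is naturally independent of the point order. The returned pc_feat1 is a 1024-dimensional vector per block (shape B×1×1×1024 before the reshape); it is the global feature vector of each block. Note that the points within one block do not all share the same label (for the notion of a block, see my post at https://blog.csdn.net/shaozhenghan/article/details/81087024). Two fully connected layers then reduce this global feature vector to 128 dimensions.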
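The order independence of max pooling is easy to verify numerically. The following is a small NumPy sketch, not the TensorFlow code: a max over the point axis stands in for max_pool2d with a [num_point, 1] window, and the feature values and sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, C = 2, 6, 1024  # hypothetical batch size and point count
points_feat = rng.standard_normal((B, N, 1, C))

# Max over the point axis: one value per channel, per block.
global_feat = points_feat.max(axis=1, keepdims=True)        # B x 1 x 1 x C

# Shuffle the points inside every block and pool again.
perm = rng.permutation(N)
global_feat_shuffled = points_feat[:, perm].max(axis=1, keepdims=True)

print(global_feat.shape)                                    # (2, 1, 1, 1024)
print(np.array_equal(global_feat, global_feat_shuffled))    # True
```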

    # CONCAT 
    pc_feat1_expand = tf.tile(tf.reshape(pc_feat1, [batch_size, 1, 1, -1]), [1, num_point, 1, 1])
    points_feat1_concat = tf.concat(axis=3, values=[points_feat1, pc_feat1_expand])

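Then comes feature concatenation. pc_feat1 is first reshaped to B×1×1×128, and tf.tile() repeats it num_point times along the N axis, so pc_feat1_expand has shape B×N×1×128. tf.concat() then joins the local features with the global feature along the channel axis, giving points_feat1_concat with shape B×N×1×1152 (1024 + 128).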
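The shapes in this step can be traced with NumPy equivalents of the same calls. This is only a shape sketch with placeholder values; np.tile and np.concatenate stand in for tf.tile and tf.concat, and B and N are hypothetical.

```python
import numpy as np

B, N = 2, 5  # hypothetical batch size and point count
points_feat1 = np.zeros((B, N, 1, 1024), dtype=np.float32)  # per-point features
pc_feat1 = np.ones((B, 128), dtype=np.float32)              # per-block feature

# Reshape to B x 1 x 1 x 128, then repeat N times along the point axis.
pc_feat1_expand = np.tile(pc_feat1.reshape(B, 1, 1, -1), (1, N, 1, 1))

# Concatenate along the channel axis: 1024 + 128 = 1152 channels per point.
points_feat1_concat = np.concatenate([points_feat1, pc_feat1_expand], axis=3)

print(pc_feat1_expand.shape)      # (2, 5, 1, 128)
print(points_feat1_concat.shape)  # (2, 5, 1, 1152)
```

After this step every point carries its own local feature plus an identical copy of the block's global feature.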

    # CONV 
    net = tf_util.conv2d(points_feat1_concat, 512, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv6')
    net = tf_util.conv2d(net, 256, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv7')
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training, scope='dp1')
    net = tf_util.conv2d(net, 13, [1,1], padding='VALID', stride=[1,1],
                         activation_fn=None, scope='conv8')
    net = tf.squeeze(net, [2])

    return net

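The concatenated features pass through three convolution layers, conv6 to conv8, each still using a 1×1 kernel. Note that dropout (keep_prob=0.7) is applied to the output of conv7, before conv8. conv8 outputs net with shape B×N×1×13, where the 13 channels correspond to the 13 semantic class labels. Finally, net = tf.squeeze(net, [2]) removes axis 2, which has size 1, leaving a tensor of shape B×N×13.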
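As a rough illustration of what the B×N×13 output means, the sketch below squeezes a random score tensor and picks the highest-scoring class per point. Taking the argmax is an assumption about how predictions would typically be read off; this file itself only returns the scores.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, NUM_CLASSES = 2, 5, 13  # hypothetical B and N; 13 classes as in conv8

net = rng.standard_normal((B, N, 1, NUM_CLASSES))

# Drop the size-1 axis, mirroring tf.squeeze(net, [2]).
net = np.squeeze(net, axis=2)          # B x N x 13

# Assumed post-processing: the predicted label is the highest-scoring class.
pred_labels = net.argmax(axis=-1)      # B x N, integers in [0, 12]

print(net.shape)          # (2, 5, 13)
print(pred_labels.shape)  # (2, 5)
```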

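Notes on model.py:

The model structure here does not match the one in the original paper! There are two main differences:

1. No transformation (/pointnet/models/transform_nets.py) is applied to the input or to the point features! Section 5.2 of the paper says: It’s interesting to see that the most basic architecture already achieves quite reasonable results. Using input transformation gives a 0.8% performance boost.

2. The architecture diagram in the paper concatenates the 64-dimensional local features with the 1024-dimensional global feature and then convolves. This model instead first reduces the 1024-dimensional global feature to 128 dimensions with two FC layers, then concatenates it with the 1024-dimensional local features, and then convolves. The structure described in the paper is implemented in /pointnet/models/pointnet_seg.py.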

def get_loss(pred, label):
    """ pred: B,N,13
        label: B,N """
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred, labels=label)
    return tf.reduce_mean(loss)

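The get_loss(pred, label) function. Note that tf.nn.sparse_softmax_cross_entropy_with_logits differs from softmax_cross_entropy_with_logits in that it takes integer class indices and handles the lookup internally, so labels only need to contain a distinct integer for each class, e.g. [1,2,3,3,2,1], rather than one-hot vectors.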
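The relationship between the two label formats can be checked with a small NumPy re-implementation. This is an illustrative sketch, not TensorFlow's code: sparse_ce and dense_ce are hypothetical helper names, and the logits and labels are random.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row maximum first for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def sparse_ce(logits, labels):
    # labels are integer class indices with shape B x N.
    logp = log_softmax(logits)
    return -np.take_along_axis(logp, labels[..., None], axis=-1)[..., 0]

def dense_ce(logits, onehot):
    # onehot has shape B x N x K with a single 1 per point.
    return -(onehot * log_softmax(logits)).sum(axis=-1)

rng = np.random.default_rng(0)
B, N, K = 2, 4, 13  # hypothetical B and N; 13 classes
logits = rng.standard_normal((B, N, K))
labels = rng.integers(0, K, size=(B, N))
onehot = np.eye(K)[labels]

loss_sparse = sparse_ce(logits, labels).mean()   # plays the role of tf.reduce_mean
loss_dense = dense_ce(logits, onehot).mean()
print(np.isclose(loss_sparse, loss_dense))       # True
```

Both paths give the same mean loss; the integer form simply avoids building the B×N×13 one-hot tensor.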
