This post walks through /pointnet/sem_seg/model.py, the model file used for point cloud semantic segmentation (Semantic Segmentation in Scenes).
Original paper: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas, Stanford University)
def placeholder_inputs(batch_size, num_point):
    pointclouds_pl = tf.placeholder(tf.float32,
                                    shape=(batch_size, num_point, 9))
    labels_pl = tf.placeholder(tf.int32,
                               shape=(batch_size, num_point))
    return pointclouds_pl, labels_pl
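placeholder_inputs returns the placeholders for the point cloud and its labels. Note that the third dimension of the point cloud, i.e. the number of values per point, is 9 rather than 3 (in the paper's setup each point carries XYZ, RGB, and its normalized location in the room).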
def get_model(point_cloud, is_training, bn_decay=None):
    """ ConvNet baseline, input is BxNx3 gray image """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    input_image = tf.expand_dims(point_cloud, -1)
    # CONV
    net = tf_util.conv2d(input_image, 64, [1,9], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv2', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv3', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv4', bn_decay=bn_decay)
    points_feat1 = tf_util.conv2d(net, 1024, [1,1], padding='VALID', stride=[1,1],
                                  bn=True, is_training=is_training, scope='conv5', bn_decay=bn_decay)
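get_model ultimately returns the per-point predictions produced by the convolutions and the other layers described below.
The input point_cloud has three axes, B×N×9 (the "BxNx3" in the docstring does not match the 9-value points used here). tf.expand_dims(point_cloud, -1) appends a size-1 axis to form input_image (B×N×9×1), so input_image has a single channel.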
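To make the role of the trailing size-1 axis concrete, each point's 9 values can be read as one row of a one-channel image. The sketch below uses plain Python lists instead of tensors; `expand_last_axis` and `conv_1x9_valid` are hypothetical helper names, the weights are made-up numbers, and bias, batch norm, and ReLU are left out. Under those assumptions it illustrates that a 1×9 kernel with VALID padding reduces to one dot product per point and per output channel, with the same weights shared across all points.

```python
def expand_last_axis(point_cloud):
    """Mimic tf.expand_dims(x, -1) on nested lists: B x N x 9 -> B x N x 9 x 1."""
    return [[[[v] for v in point] for point in block] for block in point_cloud]


def conv_1x9_valid(input_image, kernels):
    """1x9 VALID convolution without bias or activation (illustration only).

    input_image: B x N x 9 x 1 nested lists.
    kernels: C_out lists of 9 weights each (hypothetical values).
    Returns B x N x 1 x C_out nested lists.
    """
    out = []
    for block in input_image:
        out_block = []
        for point in block:
            values = [v[0] for v in point]  # the 9 attributes of this point
            # A 1x9 window fits exactly once across width 9, so output width is 1.
            channels = [sum(w * x for w, x in zip(k, values)) for k in kernels]
            out_block.append([channels])
        out.append(out_block)
    return out


# One block (B=1) with two points (N=2), 9 values per point.
cloud = [[[1.0] * 9, [2.0] * 9]]
image = expand_last_axis(cloud)
kernels = [[1.0] * 9, [0.5] * 9]  # two output channels instead of 64
features = conv_1x9_valid(image, kernels)
print(features)  # [[[[9.0, 4.5]], [[18.0, 9.0]]]]
```

Because the weights do not depend on the point index, this layer behaves like a small fully connected layer applied to every point independently.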
The tf_util.conv2d function is defined in /pointnet/tf_util.py; its signature is:
def conv2d(inputs,
           num_output_channels,
           kernel_size,
           scope,
           stride=[1, 1],
           padding='SAME',
           use_xavier=True,
           stddev=1e-3,
           weight_decay=0.0,
           activation_fn=tf.nn.relu,
           bn=False,
           bn_decay=None,
           is_training=None):
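Except for conv1, which uses a 1×9 kernel, every convolution in get_model uses a 1×1 kernel. For the 1×1 layers the padding argument makes no difference between VALID and SAME; for conv1, VALID padding is what collapses the width-9 axis to 1. The output shape after each of the five convolutions is:
input_image: B×N×9×1
conv1: B×N×1×64
conv2: B×N×1×64
conv3: B×N×1×64
conv4: B×N×1×128
conv5 (points_feat1): B×N×1×1024
All five convolutions use Batch Normalization (bn=True). The final output, points_feat1, holds the local feature vector of each point.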
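The shapes listed above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the standard VALID-padding rule, out = (in - kernel) // stride + 1, on a (B, H, W, C) layout; `conv2d_valid_shape` is a hypothetical helper, and B=24, N=4096 are only example values, not something fixed by model.py.

```python
def conv2d_valid_shape(in_shape, num_out, kernel, stride=(1, 1)):
    """Output shape of a VALID convolution on a (B, H, W, C) input."""
    b, h, w, _ = in_shape
    out_h = (h - kernel[0]) // stride[0] + 1
    out_w = (w - kernel[1]) // stride[1] + 1
    return (b, out_h, out_w, num_out)


B, N = 24, 4096  # example batch size and points per block (assumed values)

# (scope, output channels, kernel size), as in get_model
layers = [
    ("conv1", 64, (1, 9)),
    ("conv2", 64, (1, 1)),
    ("conv3", 64, (1, 1)),
    ("conv4", 128, (1, 1)),
    ("conv5", 1024, (1, 1)),
]

shape = (B, N, 9, 1)
trace = {"input_image": shape}
for name, channels, kernel in layers:
    shape = conv2d_valid_shape(shape, channels, kernel)
    trace[name] = shape

for name, s in trace.items():
    print(name, s)
```

Only conv1 changes the spatial size (width 9 becomes 1); the 1×1 layers change the channel count alone.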
    # MAX
    pc_feat1 = tf_util.max_pool2d(points_feat1, [num_point,1], padding='VALID', scope='maxpool1')
    # FC
    pc_feat1 = tf.reshape(pc_feat1, [batch_size, -1])
    pc_feat1 = tf_util.fully_connected(pc_feat1, 256, bn=True, is_training=is_training, scope='fc1', bn_decay=bn_decay)
    pc_feat1 = tf_util.fully_connected(pc_feat1, 128, bn=True, is_training=is_training, scope='fc2', bn_decay=bn_decay)
    print(pc_feat1)
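Next comes a crucial step: max pooling. As a symmetric function, max pooling is the key to handling the unordered nature of point clouds. A symmetric function is one whose value does not depend on the order of its inputs. tf_util.max_pool2d(points_feat1, [num_point,1], padding='VALID', scope='maxpool1') takes, for each of the 1024 channels of points_feat1, the maximum over all points, so the result cannot depend on point order. After the reshape, pc_feat1 is a 1024-dimensional vector per block: the global feature vector of that block. Note that the points within a block do not all share the same label (for the notion of a block, see my post https://blog.csdn.net/shaozhenghan/article/details/81087024). Two fully connected layers then reduce this global feature vector to 128 dimensions.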
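The order-independence claim is easy to verify on toy data. The following sketch works on a single block with plain Python lists; `global_max_pool` is a hypothetical stand-in for the pooling step, and the sizes (6 points, 4 channels) are chosen only to keep the example small, where the model uses N points and 1024 channels.

```python
import random


def global_max_pool(points_feat):
    """Channel-wise maximum over all points: N x C -> C."""
    num_channels = len(points_feat[0])
    return [max(p[c] for p in points_feat) for c in range(num_channels)]


random.seed(0)
# Toy per-point features: N=6 points, C=4 channels.
points_feat = [[random.random() for _ in range(4)] for _ in range(6)]

# Reorder the points; the feature values themselves are unchanged.
shuffled = points_feat[:]
random.shuffle(shuffled)

pooled = global_max_pool(points_feat)
pooled_shuffled = global_max_pool(shuffled)
assert pooled == pooled_shuffled  # same global feature regardless of point order
```

A sum or a mean over points would be symmetric as well; max is simply the choice this model makes.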
    # CONCAT
    pc_feat1_expand = tf.tile(tf.reshape(pc_feat1, [batch_size, 1, 1, -1]), [1, num_point, 1, 1])
    points_feat1_concat = tf.concat(axis=3, values=[points_feat1, pc_feat1_expand])
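Next comes the feature concatenation. pc_feat1 is first reshaped to B×1×1×128, and tf.tile() replicates it num_point times along the N axis, so pc_feat1_expand has shape B×N×1×128. tf.concat() then joins the local features with the global feature along the channel axis, giving points_feat1_concat the shape B×N×1×1152 (1024 + 128).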
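The tile-then-concatenate step can be sketched for a single block without TensorFlow. `tile_and_concat` is a hypothetical helper, the feature values are placeholders, and the batch and size-1 axes are dropped for brevity; the point is only that every point receives its own local feature followed by the same copy of the block's global feature.

```python
def tile_and_concat(points_feat, global_feat):
    """Append the shared global feature to every point's local feature.

    points_feat: N x C_local nested lists.
    global_feat: list of C_global values (one vector per block).
    Returns N x (C_local + C_global) nested lists.
    """
    return [local + global_feat for local in points_feat]


N = 5  # toy number of points in the block
points_feat = [[float(i)] * 1024 for i in range(N)]  # placeholder local features
global_feat = [0.5] * 128                            # placeholder global feature

concat = tile_and_concat(points_feat, global_feat)
print(len(concat), len(concat[0]))  # 5 1152
```

The last 128 entries are identical for all points in the block, which is how block-level context reaches the per-point classifier.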
    # CONV
    net = tf_util.conv2d(points_feat1_concat, 512, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv6')
    net = tf_util.conv2d(net, 256, [1,1], padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training, scope='conv7')
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training, scope='dp1')
    net = tf_util.conv2d(net, 13, [1,1], padding='VALID', stride=[1,1],
                         activation_fn=None, scope='conv8')
    net = tf.squeeze(net, [2])
    return net
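The concatenated features then pass through three convolutions, conv6 to conv8, each still with a 1×1 kernel. Note that dropout (keep_prob=0.7) is applied to the output of conv7. conv8 outputs net with shape B×N×1×13, where the 13 channels correspond to the 13 semantic class labels. net = tf.squeeze(net, [2]) then removes the size-1 axis at index 2, leaving B×N×13.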
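The returned B×N×13 tensor holds raw class scores (logits), not labels. One common way to turn them into per-point labels, shown here as an assumption rather than as part of model.py, is to take the argmax over the last axis. `predict_labels` is a hypothetical helper operating on nested lists.

```python
def predict_labels(logits):
    """logits: B x N x C nested lists -> B x N integer labels via argmax."""
    return [
        [max(range(len(point)), key=point.__getitem__) for point in block]
        for block in logits
    ]


NUM_CLASSES = 13

# One block (B=1) with two points (N=2); made-up scores.
point_a = [0.0] * NUM_CLASSES
point_a[4] = 2.5   # class 4 scores highest for the first point
point_b = [0.0] * NUM_CLASSES
point_b[12] = 1.0  # class 12 scores highest for the second point

labels = predict_labels([[point_a, point_b]])
print(labels)  # [[4, 12]]
```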
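Notes on model.py:
The model structure here does not match the original paper! There are two main differences:
1. No transformation is applied to the input or to the point features (/pointnet/models/transform_nets.py). Section 5.2 of the paper says: It’s interesting to see that the most basic architecture already achieves quite reasonable results. Using input transformation gives a 0.8% performance boost.
2. The paper's architecture figure concatenates the 64-dimensional local features with the 1024-dimensional global feature and then applies convolutions. This file instead first reduces the 1024-dimensional global feature to 128 dimensions with two FC layers, concatenates it with the 1024-dimensional local features, and then applies convolutions. The structure described in the paper is implemented in /pointnet/models/pointnet_seg.py.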
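The second difference comes down to which features are concatenated, which a short calculation makes explicit. The numbers are the ones quoted in the notes above; `concat_channels` is a hypothetical helper used only for this comparison.

```python
def concat_channels(local_dim, global_dim):
    """Channel count per point after concatenating local and global features."""
    return local_dim + global_dim


# Paper's architecture figure: 64-d local feature + 1024-d global feature.
paper_concat = concat_channels(64, 1024)

# sem_seg/model.py: 1024-d local feature + global feature reduced to 128-d.
model_concat = concat_channels(1024, 128)

print(paper_concat, model_concat)  # 1088 1152
```

So the paper's variant keeps the global feature at full width and pairs it with an early, low-dimensional local feature, while this file does the opposite.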
def get_loss(pred, label):
    """ pred: B,N,13
        label: B,N """
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=pred, labels=label)
    return tf.reduce_mean(loss)
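The get_loss(pred, label) function. Note that tf.nn.sparse_softmax_cross_entropy_with_logits differs from softmax_cross_entropy_with_logits in that it takes integer class indices as labels rather than one-hot vectors, so labels only needs one integer per point to denote its class, for example [1,2,3,3,2,1].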
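To illustrate the difference between the two label formats, here is a pure-Python sketch of softmax cross-entropy. The helper names (`log_softmax`, `sparse_xent`, `dense_xent`, `mean_loss`) are hypothetical and this is not TensorFlow's implementation; it only shows that an integer label and its one-hot encoding give the same loss, and that the final value is the mean over all B×N points.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one logit vector."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def sparse_xent(logits, label):
    """Cross-entropy with an integer class index as the label."""
    return -log_softmax(logits)[label]


def dense_xent(logits, one_hot):
    """Cross-entropy with a one-hot (or soft) label vector."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))


def mean_loss(pred, labels):
    """pred: B x N x C logits, labels: B x N ints -> mean loss over all points."""
    losses = [
        sparse_xent(p, y)
        for block, ys in zip(pred, labels)
        for p, y in zip(block, ys)
    ]
    return sum(losses) / len(losses)


logits = [2.0, 0.5, -1.0]
loss_sparse = sparse_xent(logits, 1)
loss_dense = dense_xent(logits, [0.0, 1.0, 0.0])
assert abs(loss_sparse - loss_dense) < 1e-12  # same value, different label format

# With all-zero logits over 13 classes, every point's loss is log(13).
uniform = mean_loss([[[0.0] * 13, [0.0] * 13]], [[3, 7]])
assert abs(uniform - math.log(13)) < 1e-12
```

The integer labels must lie in the range 0 to C-1, since each one is used as an index into the class scores.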