MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps

1. Abstract

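MotionNet takes a sequence of LiDAR point clouds as input and outputs a bird's eye view (BEV) map that encodes the category and motion information of objects. Its backbone is a spatio-temporal pyramid network (STPN). Spatial and temporal consistency losses regularize training so that the predictions stay smooth across space and time. Code: https://github.com/pxiangwu/MotionNet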

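2. Introduction

Environmental state estimation has two parts: 1. perception, which separates foreground objects from the background; 2. prediction, which forecasts the future trajectories of those objects [5][22]. Existing approaches include camera-based 2D object detection [20,27,41,63], point-cloud-based 3D object detection [19,46,64], and fusion-based object detection [6,23,24]. The detected bounding boxes are then fed into an object tracker. Some methods output bounding boxes together with their trajectories [4,31,59]. This state-estimation strategy tends to fail in real-world open scenarios.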

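An occupancy grid map (OGM) can be used to represent 3D environment information: it uniformly discretizes the 3D point cloud into 2D grid cells. An OGM can specify the drivable area, but it has two drawbacks: consistency across consecutive time steps is hard to guarantee, and it provides no object category information. To address this, the environment is represented as a BEV map. The BEV map is similar to an OGM but extends it with three kinds of information: occupancy, motion, and category. This makes it possible to determine the drivable area and to describe the motion of each object. The contributions are: 1. MotionNet, a bounding-box-free network for joint perception and motion prediction based on BEV maps; 2. a spatio-temporal pyramid network; 3. spatial and temporal consistency losses that constrain network training.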

3. Method

The pipeline has three parts: 1. converting the raw 3D point clouds into BEV maps; 2. a backbone, the spatio-temporal pyramid network (STPN); 3. heads for classification and motion prediction.

3.1 Ego-motion compensation

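The input to the network is a sequence of 3D point clouds. Each frame is captured in its own coordinate system, so the past frames must be synchronized to the current frame: all point coordinates are expressed in the current coordinate system.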
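
The coordinate change described above can be sketched with homogeneous transforms. This is a minimal illustration, not the repository's code: the function name and the 4x4 ego-pose matrices (`T_world_from_past`, `T_world_from_current`) are assumptions, and a real dataset would supply the poses from its localization data.

```python
import numpy as np

def to_current_frame(points_past, T_world_from_past, T_world_from_current):
    """Re-express points from a past sweep in the current ego frame.

    points_past: (N, 3) array in the past frame's coordinates.
    T_world_from_*: hypothetical 4x4 homogeneous ego poses (frame -> world).
    """
    # Chain the transforms: past frame -> world -> current frame.
    T_current_from_past = np.linalg.inv(T_world_from_current) @ T_world_from_past
    homogeneous = np.hstack([points_past, np.ones((len(points_past), 1))])
    return (homogeneous @ T_current_from_past.T)[:, :3]

# Example: the ego vehicle moved 2 m forward along x between the two sweeps.
T_past = np.eye(4)
T_current = np.eye(4)
T_current[0, 3] = 2.0
static_point = np.array([[10.0, 0.0, 0.0]])
compensated = to_current_frame(static_point, T_past, T_current)
```

In this example a static point that was 10 m ahead in the past frame ends up 8 m ahead in the current frame, so after compensation any remaining displacement between frames comes from the object itself rather than from the ego vehicle.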

3.2 BEV-based representation

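Unlike 2D images, 3D point clouds are sparse and irregularly scattered, so they cannot be processed directly with standard convolutions. To handle this, the point cloud is converted into a BEV map. The points are first quantized into regular voxels, and each voxel is represented by a simple binary state that indicates whether it contains at least one point. The 3D voxel lattice is then treated as a 2D pseudo-image whose channels correspond to the height dimension, so that 2D convolutions can be applied.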
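
The quantization step can be sketched as follows. This is only an illustration under assumed parameters: the function name, the crop ranges, and the voxel size are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def points_to_binary_bev(points, x_range, y_range, z_range, voxel_size):
    """Quantize (N, 3) points into a binary occupancy pseudo-image.

    Returns an array of shape (X, Y, Z); the last axis (height bins)
    plays the role of the image channels.
    """
    lower = np.array([x_range[0], y_range[0], z_range[0]], dtype=float)
    upper = np.array([x_range[1], y_range[1], z_range[1]], dtype=float)
    size = np.asarray(voxel_size, dtype=float)
    shape = np.round((upper - lower) / size).astype(int)

    # Drop points outside the region of interest.
    inside = np.all((points >= lower) & (points < upper), axis=1)
    idx = np.floor((points[inside] - lower) / size).astype(int)
    idx = np.minimum(idx, shape - 1)  # guard against rounding at the upper edge

    bev = np.zeros(shape, dtype=np.float32)
    bev[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0  # occupied if at least one point
    return bev

# Toy example: a 2 m x 2 m x 1 m region with 1 m x 1 m x 0.5 m voxels.
points = np.array([
    [0.1, 0.1, 0.1],
    [0.2, 0.1, 0.1],   # falls in the same voxel as the first point
    [1.9, 1.9, 0.9],
    [5.0, 5.0, 5.0],   # outside the region, ignored
])
bev = points_to_binary_bev(points, (0, 2), (0, 2), (0, 1), (1.0, 1.0, 0.5))
```

Because the state is binary, two points that land in the same voxel contribute a single occupied cell, which keeps the representation cheap to build.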

3.3 Spatio-temporal pyramid network

3.4 Output heads

The STPN is followed by three heads: 1. a cell-classification head, whose output is H × W × C, where C is the number of cell categories; 2. a motion-prediction head, whose output is N × H × W × 2, where N is the number of future frames; 3. a state-estimation head (static or moving), whose output is H × W.
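
To make the three output shapes concrete, the sketch below applies per-cell linear projections to a random feature map. It only illustrates the tensor shapes: the sizes, the random weights, and the use of a single linear layer per head are assumptions, and the notes do not specify how the actual heads are built.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, D = 4, 4, 8   # hypothetical BEV size and feature channel count
C, N = 5, 3         # hypothetical number of categories and future frames

features = rng.standard_normal((H, W, D))  # stand-in for the STPN output

def linear_head(x, out_channels):
    """Per-cell linear projection (equivalent to a 1x1 convolution)."""
    weight = rng.standard_normal((x.shape[-1], out_channels)) * 0.1
    return x @ weight

class_logits = linear_head(features, C)                    # (H, W, C)
motion = linear_head(features, N * 2).reshape(H, W, N, 2)  # (H, W, N, 2)
motion = np.moveaxis(motion, 2, 0)                         # (N, H, W, 2)
state_logit = linear_head(features, 1)[..., 0]             # (H, W)
```

Each BEV cell therefore gets one category score vector, one 2D displacement per future frame, and one static-or-moving score.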

3.5 Loss functions

Classification and state estimation use a cross-entropy loss; motion prediction uses an L1 loss.
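
The two basic losses can be written down in a few lines. This is a plain NumPy sketch with unweighted means; any per-class or per-cell weighting used in practice is not covered by these notes and is left out here.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy over cells; logits (..., C), integer labels (...)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-picked.mean())

def l1_loss(pred, target):
    """Mean absolute error between predicted and ground-truth displacements."""
    return float(np.abs(pred - target).mean())

# Uniform logits over 3 classes give a loss of log(3) for any label.
ce = cross_entropy(np.zeros((2, 2, 3)), np.zeros((2, 2), dtype=int))
# Predicting 1 m everywhere when the true displacement is 0 gives a loss of 1.
l1 = l1_loss(np.ones((1, 2, 2, 2)), np.zeros((1, 2, 2, 2)))
```

State estimation fits the same `cross_entropy` function if the static and moving states are treated as a two-class problem.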

3.5.1

Spatial consistency loss (for the cells belonging to the same rigid object, their predicted motions should be very close without much divergence)
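
One way to express this idea is to penalize the difference between the predicted motions of cell pairs that belong to the same object. The sketch below is an assumption-laden illustration: the instance labeling, the use of all cell pairs, and the plain L1 distance are choices made here, not details confirmed by these notes.

```python
import numpy as np

def spatial_consistency(motion, instance_ids):
    """Average L1 gap between motions of cell pairs within the same object.

    motion: (H, W, 2) predicted displacements for one future frame.
    instance_ids: (H, W) ints, 0 = background, k > 0 = object k
    (a hypothetical labeling).
    """
    total, count = 0.0, 0
    for k in np.unique(instance_ids):
        if k == 0:
            continue
        vecs = motion[instance_ids == k]        # (M, 2) motions of object k
        if len(vecs) < 2:
            continue                            # a single cell has no pairs
        i, j = np.triu_indices(len(vecs), k=1)  # all unordered cell pairs
        total += np.abs(vecs[i] - vecs[j]).sum()
        count += len(i)
    return total / count if count else 0.0

ids = np.array([[1, 1], [0, 2]])
motion = np.zeros((2, 2, 2))
motion[0, 0] = [1.0, 0.0]
motion[0, 1] = [3.0, 0.0]   # disagrees with the other cell of object 1
loss_inconsistent = spatial_consistency(motion, ids)

motion[0, 1] = [1.0, 0.0]   # now both cells of object 1 agree
loss_consistent = spatial_consistency(motion, ids)
```

Enumerating all pairs is quadratic in the object size, so a practical version would likely sample a subset of pairs per object.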

Foreground temporal consistency loss (assume that there will be no sharp change of motions between two consecutive frames)
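
The smoothness assumption can be sketched by comparing each object's average predicted motion at two consecutive time steps. The per-object mean, the matching of objects by instance id, and the L1 distance are assumptions made for this illustration.

```python
import numpy as np

def foreground_temporal_consistency(motion_t, motion_next, ids_t, ids_next):
    """L1 gap between each object's mean predicted motion at two consecutive steps.

    motion_*: (H, W, 2) predicted displacements.
    ids_*: (H, W) integer instance labels, 0 = background
    (a hypothetical labeling shared across the two steps).
    """
    shared = np.intersect1d(ids_t, ids_next)  # objects present at both steps
    shared = shared[shared != 0]
    if shared.size == 0:
        return 0.0
    gaps = [
        np.abs(motion_t[ids_t == k].mean(axis=0)
               - motion_next[ids_next == k].mean(axis=0)).sum()
        for k in shared
    ]
    return float(np.mean(gaps))

# Object 1 occupies different cells at the two steps and speeds up slightly.
ids_t = np.array([[1, 1], [0, 0]])
ids_next = np.array([[0, 1], [0, 1]])
motion_t = np.tile([1.0, 0.0], (2, 2, 1))
motion_next = np.tile([1.5, 0.0], (2, 2, 1))
loss = foreground_temporal_consistency(motion_t, motion_next, ids_t, ids_next)
```

Using the object-level mean avoids having to match individual cells across time, since an object covers different cells as it moves.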

Background temporal consistency loss

total loss

4. Experiments
