Article Interpretation: "FlowNet3D: Learning Scene Flow in 3D Point Clouds"

1. Summary

Many applications in robotics and human-computer interaction can benefit from understanding 3D motion in dynamic environments, broadly known as scene flow. While most previous methods focus on stereo and RGB-D images as input, few try to estimate scene flow directly from point clouds. In this work, we propose a novel deep neural network named FlowNet3D that learns scene flow from point clouds in an end-to-end fashion. Our network simultaneously learns deep hierarchical features of point clouds and flow embeddings that represent point cloud motion, supported by two newly proposed learning layers for point clouds. We evaluate the network on challenging synthetic data from FlyingThings3D and on real lidar scans from KITTI. Trained on synthetic data only, the network successfully generalizes to real lidar data, outperforming various baseline algorithms and showing results comparable to the state of the art. We also demonstrate two applications of the scene flow output (point cloud registration and motion segmentation) to show its potentially wide use.

2. Introduction

We propose a novel network named FlowNet3D, which enables end-to-end scene flow estimation from a pair of consecutive point cloud frames.
We introduce two new learning layers on point clouds:

  • Flow embedding layer: learns to correlate two frames of point clouds and outputs flow embedding features that encode the motion of each point
  • Set upconv layer: propagates feature vectors from one set of points to another set of points (used to upsample and obtain scene flow for the entire point cloud)

We show how the proposed FlowNet3D, trained on a large-scale synthetic dataset (FlyingThings3D), can be applied to real point clouds from the KITTI dataset. Compared with traditional methods, it achieves large improvements in 3D scene flow estimation.

3. Problem description

Consider two point clouds sampled from a dynamic 3D scene at two consecutive time frames: P = {x_i | i = 1, ..., n1} (point cloud 1) and Q = {y_j | j = 1, ..., n2} (point cloud 2), where x_i, y_j ∈ R^3 are the XYZ coordinates of points. Note that, due to object motion and viewpoint changes, the two point clouds do not necessarily have the same number of points or any correspondence between their points.
Now suppose a sampled point with coordinates x_i moves to position x'_i in the second frame; the translational motion vector of this point is then d_i = x'_i - x_i. Our goal is, given point clouds P and Q, to recover the scene flow of every sampled point in the first frame: D = {d_i | i = 1, ..., n1}.

4. Network structure

In this section, we introduce FlowNet3D, an end-to-end scene flow estimation network that operates on point clouds. The model has three key modules, which are used for:

  • (1) Point cloud feature learning
  • (2) Point cloud information mixing
  • (3) Scene flow upsampling (refinement)

These modules are built on three key deep point cloud processing layers: the set conv layer, the flow embedding layer, and the set upconv layer. A minimal sketch of how the three stages chain together is given below.
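The following is a hypothetical PyTorch-style sketch of the overall forward pass; `set_conv`, `flow_embedding`, `set_upconv`, and `flow_head` stand for the layers sketched in the subsections below, and the single-layer-per-stage structure is a simplification (the actual network stacks several set conv and set upconv layers at multiple scales).

```python
def flownet3d_forward(xyz1, feats1, xyz2, feats2,
                      set_conv, flow_embedding, set_upconv, flow_head):
    # 1) Point feature learning: applied to both frames with shared weights.
    xyz1_s, f1 = set_conv(xyz1, feats1)   # subsampled frame-1 points + features
    xyz2_s, f2 = set_conv(xyz2, feats2)   # subsampled frame-2 points + features
    # 2) Point mixture: correlate the two frames into flow embeddings.
    emb = flow_embedding(xyz1_s, f1, xyz2_s, f2)
    # 3) Flow refinement: upsample embeddings back to every frame-1 point.
    f_up = set_upconv(xyz1, xyz1_s, emb)
    # A final per-point regressor (e.g. a linear layer) predicts 3D flow.
    return flow_head(f_up)                # (B, n1, 3) scene flow
```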

4.1 Hierarchical Point Cloud Feature Learning

set abstraction = sampling + grouping + PointNet

Since a point cloud is a collection of irregular, unordered points, traditional convolution is not applicable. We therefore follow the recently proposed PointNet++ architecture, a network that learns hierarchical features with translation invariance. Although the set conv layer was designed for 3D classification and segmentation, we find that its feature learning is also very effective for the scene flow task.

This layer first applies farthest point sampling to the input point cloud to obtain n' regions (with region centers x'_j), then groups the points of each region (a neighborhood specified by radius r), and finally extracts a local feature with a symmetric function (a shared MLP followed by max pooling).
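To make this concrete, here is a minimal PyTorch sketch of a set conv layer. Shapes, layer widths, and the use of k-nearest-neighbor grouping (instead of strict radius grouping) are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

def farthest_point_sample(xyz, n_samples):
    """Iteratively pick the point farthest from those already chosen."""
    B, N, _ = xyz.shape
    idx = torch.zeros(B, n_samples, dtype=torch.long)
    dist = torch.full((B, N), float('inf'))
    farthest = torch.zeros(B, dtype=torch.long)            # start from point 0
    for i in range(n_samples):
        idx[:, i] = farthest
        centroid = xyz[torch.arange(B), farthest].unsqueeze(1)   # (B, 1, 3)
        dist = torch.minimum(dist, ((xyz - centroid) ** 2).sum(-1))
        farthest = dist.argmax(-1)
    return idx

class SetConv(nn.Module):
    """Sampling + grouping + shared MLP + max pool (PointNet++ set abstraction)."""
    def __init__(self, in_ch, out_ch, n_samples, k=16):
        super().__init__()
        self.n_samples, self.k = n_samples, k
        # shared MLP h applied to (relative xyz, point feature)
        self.mlp = nn.Sequential(nn.Linear(in_ch + 3, out_ch), nn.ReLU(),
                                 nn.Linear(out_ch, out_ch))

    def forward(self, xyz, feats):                         # (B, N, 3), (B, N, C)
        B = xyz.shape[0]
        centers = farthest_point_sample(xyz, self.n_samples)
        new_xyz = xyz[torch.arange(B).unsqueeze(1), centers]       # (B, n', 3)
        d = torch.cdist(new_xyz, xyz)                              # (B, n', N)
        knn = d.topk(self.k, largest=False).indices                # (B, n', k)
        grouped_xyz = xyz[torch.arange(B)[:, None, None], knn] - new_xyz.unsqueeze(2)
        grouped_feats = feats[torch.arange(B)[:, None, None], knn]
        h = self.mlp(torch.cat([grouped_xyz, grouped_feats], -1)) # (B, n', k, out)
        return new_xyz, h.max(dim=2).values                # max-pool over each group
```

Max pooling over each group is the symmetric function mentioned above: it makes the output invariant to the ordering of points within a neighborhood.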

4.2 Point cloud information mixing

With point clouds, there is no way to know which point in the next frame a given sampled point moved to; moreover, due to sampling sparsity and occlusion, a point with the same semantics may not even exist in the next frame, so the true corresponding point is hard to find. The authors therefore propose a soft strategy: find several similar points in the second frame and estimate the scene flow by letting them "vote", i.e., by aggregating over the neighborhood.
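Below is a minimal sketch of this idea as a flow embedding layer, reusing the kNN grouping from the set conv sketch above. For each point in frame 1 it mixes the features of nearby frame-2 points together with the displacement to them; names and shapes are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class FlowEmbedding(nn.Module):
    """Soft correlation of two frames: no hard correspondence is committed to."""
    def __init__(self, ch, out_ch, k=16):
        super().__init__()
        # shared MLP over (feature of p_i, feature of q_j, displacement y_j - x_i)
        self.mlp = nn.Sequential(nn.Linear(2 * ch + 3, out_ch), nn.ReLU(),
                                 nn.Linear(out_ch, out_ch))
        self.k = k

    def forward(self, xyz1, feats1, xyz2, feats2):
        B = xyz1.shape[0]
        d = torch.cdist(xyz1, xyz2)                       # (B, n1, n2)
        knn = d.topk(self.k, largest=False).indices       # k soft matches per point
        disp = xyz2[torch.arange(B)[:, None, None], knn] - xyz1.unsqueeze(2)
        f2 = feats2[torch.arange(B)[:, None, None], knn]
        f1 = feats1.unsqueeze(2).expand(-1, -1, self.k, -1)
        h = self.mlp(torch.cat([f1, f2, disp], dim=-1))   # (B, n1, k, out)
        return h.max(dim=2).values                        # "vote" by max-pooling
```

Max-pooling over the k candidates implements the voting: the network aggregates evidence from several plausible matches instead of picking a single corresponding point.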

4.3 Point Cloud Upsampling

The set upconv layer is introduced to upsample the result, propagating the motion information of a subsampled set of points to the entire point cloud.
This layer maps the points of the flow embedding layer back to the original number of input points, so that each output represents the scene flow of one point.
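A minimal sketch of such a set upconv layer follows: it pushes features from a sparse point set (the flow embeddings) back onto a denser one. Again this is a hypothetical simplification using kNN grouping; the key point, as in the paper, is that aggregation is learned rather than a fixed 3D interpolation.

```python
import torch
import torch.nn as nn

class SetUpConv(nn.Module):
    """Propagate features from a sparse point set to a denser target set."""
    def __init__(self, in_ch, out_ch, k=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_ch + 3, out_ch), nn.ReLU(),
                                 nn.Linear(out_ch, out_ch))
        self.k = k

    def forward(self, dense_xyz, sparse_xyz, sparse_feats):
        B = dense_xyz.shape[0]
        d = torch.cdist(dense_xyz, sparse_xyz)            # (B, N, n')
        knn = d.topk(self.k, largest=False).indices       # neighbors in sparse set
        disp = sparse_xyz[torch.arange(B)[:, None, None], knn] - dense_xyz.unsqueeze(2)
        f = sparse_feats[torch.arange(B)[:, None, None], knn]
        h = self.mlp(torch.cat([f, disp], dim=-1))        # (B, N, k, out)
        return h.max(dim=2).values                        # one feature per dense point
```

The learned MLP lets the network decide how to weight neighboring flow embeddings when propagating them to points that were dropped during downsampling.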
