3D detection: the point cloud model PointPillars

PointPillars

A model from industry: https://arxiv.org/abs/1812.05784

Common approaches to 3D object detection:

  • 3D convolution
  • projection onto the front view plane
  • operating on the bird's-eye view

PointPillars still follows the 3D-to-2D roadmap: first convert the point cloud into a 2D pseudo-image, then process it with 2D methods.

Feature Net

Converts the point cloud data into image-like data.

This is where the name "pillars" comes from. The nature of point clouds means that the points in each column are sparse.

A cluster of points is viewed as one column (i.e., a pillar), and each pillar contains some number of points. The points in all the pillars together make up the point cloud.

First, project the points onto an H×W grid in the x-y plane. This partitions the space into H×W pillars.

Each raw point cloud point has 4 dimensions (x, y, z, r), where r is the reflectance. We extend each point to 9 dimensions (x, y, z, r, x_c, y_c, z_c, x_p, y_p), where the subscript c denotes the offset of the point from the arithmetic mean of the points in its pillar, and the subscript p denotes the offset from the x-y center of the pillar's grid cell. At most N points are sampled from each pillar; pillars with fewer than N points are zero-padded. This forms a (D, P, N) tensor, where D = 9, N is the number of points sampled per pillar (a set value), and P is the total number of pillars, H × W.

In this way, the point cloud data is expressed as a (D, P, N) tensor.
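As an illustration, here is a minimal NumPy sketch of this pillarization step (the grid ranges, pillar size, and function name are illustrative assumptions, not the exact configuration):

```python
import numpy as np

def build_pillars(points, x_range=(0.0, 69.12), y_range=(-39.68, 39.68),
                  pillar_size=0.16, max_points=100):
    """Group points into x-y pillars and decorate each point to 9 dims
    (x, y, z, r, x_c, y_c, z_c, x_p, y_p). `points` is (M, 4): x, y, z, reflectance."""
    ix = ((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    iy = ((points[:, 1] - y_range[0]) / pillar_size).astype(int)
    pillars = {}
    for p, gx, gy in zip(points, ix, iy):
        pillars.setdefault((gx, gy), []).append(p)

    coords, features = [], []
    for (gx, gy), pts in pillars.items():
        pts = np.asarray(pts)[:max_points]          # sample at most N points per pillar
        mean = pts[:, :3].mean(axis=0)              # arithmetic mean -> x_c, y_c, z_c offsets
        cx = x_range[0] + (gx + 0.5) * pillar_size  # grid-cell center -> x_p, y_p offsets
        cy = y_range[0] + (gy + 0.5) * pillar_size
        dec = np.hstack([pts, pts[:, :3] - mean,
                         pts[:, 0:1] - cx, pts[:, 1:2] - cy])
        padded = np.zeros((max_points, 9), dtype=np.float32)  # zero-pad pillars with < N points
        padded[:len(dec)] = dec
        features.append(padded)
        coords.append((gx, gy))
    # stack to (P, N, 9), then transpose to (D, P, N)
    return np.stack(features).transpose(2, 0, 1), np.asarray(coords)
```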

Then a convolution produces a (C, P, N) tensor. A max operation is applied over the N dimension, giving a (C, P) tensor, which is then reshaped back onto the grid to obtain a (C, H, W) tensor.

At this point, the point cloud data is completely expressed as a (C, H, W) tensor.
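A minimal PyTorch sketch of these two steps, assuming the (D, P, N) tensor and the (P, 2) integer grid coordinates produced above (batch dimension omitted; names are illustrative):

```python
import torch
import torch.nn as nn

class PillarFeatureNet(nn.Module):
    """The 'convolution' step: a linear layer shared across points (equivalent
    to a 1x1 conv), then max over N, then scatter back onto the H x W grid."""
    def __init__(self, d_in=9, c_out=64):
        super().__init__()
        self.linear = nn.Linear(d_in, c_out)
        self.bn = nn.BatchNorm1d(c_out)

    def forward(self, x, coords, h, w):
        # x: (D, P, N) -> (P, N, D) so the shared linear layer acts per point
        x = self.linear(x.permute(1, 2, 0))          # (P, N, C)
        x = torch.relu(self.bn(x.permute(0, 2, 1)))  # (P, C, N)
        x = x.max(dim=2).values                      # max over N -> (P, C)
        canvas = x.new_zeros(x.shape[1], h, w)       # scatter to (C, H, W)
        canvas[:, coords[:, 1], coords[:, 0]] = x.t()
        return canvas
```

The max over N acts as a symmetric pooling, so the pillar feature does not depend on the order of the points inside the pillar.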

Backbone

The backbone performs feature extraction.

It is divided into two parts:

  • a top-down network that produces feature maps at progressively lower spatial resolution
  • a second network that upsamples and concatenates them into fine-grained features

The top-down portion can be described as a series of blocks Block(S, L, F), where S is the stride relative to the pseudo-image (i.e., the feature tensor produced by the Feature Net), L is the number of 3×3 2D convolution layers in the block, and F is the number of output channels.

The backbone outputs a (6C, H/2, W/2) tensor.
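A rough PyTorch sketch of such a backbone, under assumed settings (the layer counts follow the paper's car configuration; the upsampling is simplified here to plain transposed convolutions):

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, stride, num_layers):
    """One top-down Block(S, L, F): a strided 3x3 conv followed by further 3x3 convs."""
    layers = [nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
              nn.BatchNorm2d(c_out), nn.ReLU()]
    for _ in range(num_layers):
        layers += [nn.Conv2d(c_out, c_out, 3, padding=1),
                   nn.BatchNorm2d(c_out), nn.ReLU()]
    return nn.Sequential(*layers)

class Backbone(nn.Module):
    """Three top-down blocks at strides 2/4/8 relative to the pseudo-image,
    each upsampled to half resolution and concatenated into (6C, H/2, W/2)."""
    def __init__(self, c=64):
        super().__init__()
        self.block1 = conv_block(c, c, stride=2, num_layers=3)          # (C,  H/2, W/2)
        self.block2 = conv_block(c, 2 * c, stride=2, num_layers=5)      # (2C, H/4, W/4)
        self.block3 = conv_block(2 * c, 4 * c, stride=2, num_layers=5)  # (4C, H/8, W/8)
        self.up1 = nn.ConvTranspose2d(c, 2 * c, 1, stride=1)
        self.up2 = nn.ConvTranspose2d(2 * c, 2 * c, 2, stride=2)
        self.up3 = nn.ConvTranspose2d(4 * c, 2 * c, 4, stride=4)

    def forward(self, x):  # x: (B, C, H, W) pseudo-image
        x1 = self.block1(x)
        x2 = self.block2(x1)
        x3 = self.block3(x2)
        # upsample all three to stride 2 and concatenate -> (B, 6C, H/2, W/2)
        return torch.cat([self.up1(x1), self.up2(x2), self.up3(x3)], dim=1)
```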

Detection

Detection uses an SSD-style head. The height z is handled as a separate regression.
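As a minimal sketch of such a head (channel and anchor counts are assumptions, not the repo's exact values):

```python
import torch.nn as nn

class SSDHead(nn.Module):
    """SSD-style head: plain 1x1 convs over the (6C, H/2, W/2) backbone output.
    Per anchor it predicts class scores, 7 box residuals (x, y, z, w, l, h, theta)
    -- z and h come from this same regression branch -- and a 2-way direction score."""
    def __init__(self, c_in=384, num_anchors=2, num_classes=1):
        super().__init__()
        self.cls = nn.Conv2d(c_in, num_anchors * num_classes, 1)
        self.box = nn.Conv2d(c_in, num_anchors * 7, 1)
        self.dir = nn.Conv2d(c_in, num_anchors * 2, 1)

    def forward(self, x):
        return self.cls(x), self.box(x), self.dir(x)
```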

Experimental Details

In the point-cloud-to-pseudo-image part, C = 64. In the backbone part, S differs between the car network and the pedestrian/cyclist network.

Loss design

A 3D box is determined by (x, y, z, w, l, h, θ). Compared with a 2D box, which is similarly determined by (x, y, w, h), a 3D box adds the z direction and an angle θ (a rotation about the z axis) that expresses the 3D box's orientation.

The loss consists of three parts:

  • Localization loss, measuring whether the 3D box is drawn in the right place
  • Classification loss, measuring whether the object class inside the box is determined correctly
  • Direction loss. Although the localization loss already accounts for the angle, it cannot distinguish a flipped box: for the car inside a 3D box, heading north and heading south produce exactly the same marked 3D box.

Localization loss:
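From the paper, the box regression residuals (anchor values superscripted $a$, ground truth $gt$, with diagonal $d^a = \sqrt{(w^a)^2 + (l^a)^2}$) are:

$$\Delta x = \frac{x^{gt}-x^{a}}{d^{a}},\quad \Delta y = \frac{y^{gt}-y^{a}}{d^{a}},\quad \Delta z = \frac{z^{gt}-z^{a}}{h^{a}}$$

$$\Delta w = \log\frac{w^{gt}}{w^{a}},\quad \Delta l = \log\frac{l^{gt}}{l^{a}},\quad \Delta h = \log\frac{h^{gt}}{h^{a}},\quad \Delta\theta = \sin\left(\theta^{gt}-\theta^{a}\right)$$

$$\mathcal{L}_{loc} = \sum_{b \in (x,y,z,w,l,h,\theta)} \text{SmoothL1}(\Delta b)$$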

Classification loss:
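The paper uses the focal loss:

$$\mathcal{L}_{cls} = -\alpha_{a}\left(1 - p^{a}\right)^{\gamma}\log p^{a}$$

with $\alpha = 0.25$ and $\gamma = 2$, where $p^{a}$ is the predicted class probability of an anchor.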

Focal loss gives different weights to the losses of different samples; the weight depends on the model's current predicted probability for the sample, through a coefficient of the form (1 − p)^γ. The smaller p is, the larger the loss weight, which amplifies the loss of hard examples, so that the model adapts better to hard-to-classify samples.
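A minimal sketch of a binary focal loss in PyTorch (not the repo's exact implementation):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """The (1 - p)^gamma factor shrinks the loss of easy examples
    (p close to the target) and amplifies hard ones."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```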

Direction loss: obtained from a softmax classification over the heading direction.

Code analysis

todo
Code: https://github.com/traveller59/second.pytorch

```
python ./pytorch/train.py evaluate --config_path=./configs/car.fhd.config --model_dir=/path/to/model_dir --measure_time=True --batch_size=1
```

Original post: https://www.cnblogs.com/sdu20112013/p/12455629.html