[BEV Perception] 2-BEV Perception Algorithm Data Form

1 Image

A camera generates an image by mapping points in the three-dimensional world (in meters) onto the two-dimensional image plane (in pixels).
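As a rough illustration of this mapping, here is a minimal sketch of a pinhole camera projection. The pinhole model and the intrinsics (`FX`, `FY`, `CX`, `CY`) are assumptions chosen for the example, not values from any particular camera. It also shows why depth is lost: two points on the same viewing ray land on the same pixel.

```python
def project_point(point_xyz, fx, fy, cx, cy):
    """Project a 3D point in the camera frame (meters) to pixel coordinates.

    Assumes an ideal pinhole camera with no lens distortion.
    """
    x, y, z = point_xyz
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return u, v


# Hypothetical intrinsics: focal lengths and principal point, in pixels.
FX = FY = 1000.0
CX, CY = 640.0, 360.0

near = (1.0, 0.5, 10.0)  # 10 m in front of the camera
far = (2.0, 1.0, 20.0)   # on the same viewing ray, twice as far away

pixel_near = project_point(near, FX, FY, CX, CY)
pixel_far = project_point(far, FX, FY, CX, CY)
print(pixel_near, pixel_far)  # both are (740.0, 410.0)
```

Because both points map to the same pixel, the image alone cannot tell them apart. BEV methods that start from images have to recover this missing depth in some way.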

Disadvantages of images: the 3D-to-2D mapping loses spatial (depth) information. Compared with the 360° coverage of point cloud collection, a single vehicle-mounted camera has a limited field of view.

Advantages of images: rich texture, low cost

Image-based tasks and base models are relatively mature and complete, which makes them relatively easy to extend to BEV perception algorithms.

1.1 How to obtain image features?

Use a 2D image-processing network to extract features from the image.


Whether it is the backbone in BEVFormer or the encoder in BEVFusion, the essence is the same: image features are extracted by an existing 2D image network such as ResNet.

2 Point cloud

The basic unit of a point cloud is a point, and a collection of points is called a point cloud.

Point cloud characteristics: sparsity, disorder (no inherent ordering), and a 3D representation.

2.1 Sparsity

1 Occlusion: objects block the sensor's line of sight, resulting in missing point cloud data.
2 Ray divergence: the rays spread apart with distance, so the sampling interval is large at long range (undersampling) and small at short range.
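The effect of ray divergence can be sketched numerically. The gap between two neighboring rays grows roughly linearly with range (arc length = range × angle). The 0.2° angular resolution below is an assumed value for illustration, not the specification of any particular sensor.

```python
import math


def beam_spacing(range_m, angular_resolution_deg):
    """Approximate gap in meters between two neighboring rays at a given range.

    Uses the arc-length approximation: spacing = range * angle (in radians).
    """
    return range_m * math.radians(angular_resolution_deg)


ANGULAR_RESOLUTION_DEG = 0.2  # assumed value, for illustration only

for range_m in (5, 20, 80):
    gap = beam_spacing(range_m, ANGULAR_RESOLUTION_DEG)
    print(f"range {range_m:>2} m -> neighboring samples about {gap:.3f} m apart")
```

Under this assumption, a target at 80 m is sampled 16 times more coarsely than one at 5 m, which is why distant objects are covered by only a handful of points.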

2.2 Disorder

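A point cloud is an unordered set: permuting the points does not change the scene it describes, e.g. {1,2,3,4,5} = {1,4,5,3,2}.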
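One practical consequence is that whatever aggregates the points should give the same result for any ordering. A minimal sketch, assuming per-coordinate max pooling as the aggregation (a symmetric function of the kind used in PointNet-style networks), contrasted with simple concatenation, which does depend on order:

```python
def max_pool_points(points):
    """Per-coordinate maximum over all points: independent of point order."""
    return tuple(max(coords) for coords in zip(*points))


def concatenate_points(points):
    """Flatten points in the given order: the result changes if points are shuffled."""
    return tuple(value for point in points for value in point)


cloud = [(1.0, 2.0, 0.5), (4.0, 0.0, 1.5), (3.0, 5.0, 0.2)]
shuffled = [cloud[2], cloud[0], cloud[1]]  # same points, different order

print(max_pool_points(cloud) == max_pool_points(shuffled))        # True
print(concatenate_points(cloud) == concatenate_points(shuffled))  # False
```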

2.3 Why use point cloud?

Point clouds directly contain depth information, which images lose in the 3D-to-2D mapping.

2.4 How to extract point cloud features?

Whichever extraction method is used, features are not extracted from a single point in isolation (that would be meaningless); instead, points are aggregated in some way.

For example, a single point sampled from the point cloud cannot tell you whether it belongs to a car or a person; that has to be judged together with the local spatial information around it.

Point-based

Select some key points from the existing point cloud, then aggregate each key point (green in the original figure) with its nearby points (yellow, inside a ball around the key point).
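The point-based idea can be sketched as a ball query followed by an aggregation. This is only an illustration under simple assumptions: the key points are picked by hand, the radius is arbitrary, and the "feature" is just the neighbor count and centroid, where a real network would apply a learned function to each group.

```python
import math


def ball_query(points, center, radius):
    """Return every point that lies within `radius` of `center`."""
    return [p for p in points if math.dist(p, center) <= radius]


def aggregate_around_keypoints(points, keypoints, radius):
    """For each key point, summarize its neighborhood as (count, centroid).

    Key points are taken from the cloud itself, so each ball contains
    at least the key point.
    """
    summaries = []
    for keypoint in keypoints:
        neighbors = ball_query(points, keypoint, radius)
        count = len(neighbors)
        centroid = tuple(sum(coords) / count for coords in zip(*neighbors))
        summaries.append((count, centroid))
    return summaries


# Toy cloud: two small clusters and one isolated point (hypothetical data).
points = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0),
    (10.0, 10.0, 0.0), (10.5, 10.0, 0.0),
    (30.0, 0.0, 0.0),
]
keypoints = [points[0], points[3]]  # hand-picked key points

summaries = aggregate_around_keypoints(points, keypoints, radius=1.0)
for keypoint, (count, centroid) in zip(keypoints, summaries):
    print(keypoint, "->", count, "neighbors, centroid", centroid)
```

The isolated point at (30, 0, 0) falls in neither ball, so it contributes to no feature. Only the neighborhoods of the chosen key points are described.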

Voxel-based

Starting from the scene, the scene is divided into many small blocks (voxels), and the points within each block's spatial range are aggregated. (For example, in the original figure, the 3x3 grid below is aggregated into the grid above it.)
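A minimal sketch of the voxel-based idea: assign each point to a grid cell by integer-dividing its coordinates by the voxel size, then aggregate the points in each cell. Averaging is used here as a stand-in for the aggregation, and the 1 m voxel size is an assumed value. Real networks typically use a learned per-voxel encoder.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group points by voxel index and return the mean point of each voxel."""
    groups = defaultdict(list)
    for point in points:
        # floor() keeps negative coordinates in the correct cell.
        index = tuple(math.floor(coord / voxel_size) for coord in point)
        groups[index].append(point)
    return {
        index: tuple(sum(coords) / len(group) for coords in zip(*group))
        for index, group in groups.items()
    }


# Toy cloud (hypothetical data): the first two points share a voxel.
points = [
    (0.2, 0.3, 0.1),
    (0.8, 0.6, 0.4),
    (1.5, 0.2, 0.3),
    (-0.5, 0.1, 0.2),
]

voxels = voxelize(points, voxel_size=1.0)
for index, mean_point in sorted(voxels.items()):
    print(index, "->", mean_point)
```

Only occupied voxels appear in the result. Because point clouds are sparse, most of the grid is empty, which is one reason voxel-based methods usually store and process only the non-empty cells.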

3 Image + point cloud

Origin blog.csdn.net/guai7guai11/article/details/132090277