3D object detection dataset KITTI (label format analysis, point cloud to image, point cloud to BEV)

This article introduces how to understand and use the KITTI dataset for 3D object detection, covering an overview of KITTI, downloading the dataset, label format analysis, projecting point clouds onto images, and converting point clouds to BEV.

Table of contents

1. The effect of 3D box visualization on the KITTI dataset

2. Watch a video first to get an overview of KITTI

3. Go to the KITTI official website and download the dataset

4. Label format

5. Calibration parameter analysis

6. Point cloud 3D results --> Image 3D results (coordinate system conversion)

7. Image 3D results --> Point cloud 3D results (coordinate system conversion)

8. Point cloud 3D results --> Image BEV bird's-eye view results (coordinate system conversion)


1. The effect of 3D box visualization on the KITTI dataset

2. Watch a video first to get an overview of KITTI

Introduction to the KITTI dataset


3. Go to the KITTI official website and download the dataset

The KITTI Vision Benchmark Suite (cvlibs.net)

The official website requires registering an account to download the data; alternatively, the dataset can be downloaded from Baidu Netdisk. The file formats are as follows:

Image format: xxx.jpg

Point cloud format: xxx.bin (point clouds are stored in binary .bin format)

Calibration parameters: xxx.txt (each file contains the intrinsic parameters of each camera, the rectification matrix, the matrix from lidar coordinates to camera coordinates, and the matrix from IMU coordinates to lidar coordinates)

Label format: xxx.txt (each line contains the category, the truncation level, the occlusion level, the observation angle, the coordinates of the upper-left and lower-right corners of the 2D box, the size of the 3D object - height, width and length, the center coordinates of the 3D object - x, y, z, and the yaw angle rotation_y; a confidence score is appended only in detection result files)

4. Label format

At this point you can watch this video:

Introduction to multiple BEV open source datasets such as Nuscenes and KITTI
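To make the field layout concrete, here is a minimal parsing sketch in Python; the file path is a placeholder, and the field order follows the official KITTI label description (the score field only appears in detection result files, not ground truth).

```python
def parse_kitti_label(label_path):
    """Parse one KITTI label file into a list of object dictionaries."""
    objects = []
    with open(label_path, "r") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 15:
                continue  # skip empty or malformed lines
            obj = {
                "type": fields[0],                               # e.g. Car, Pedestrian, Cyclist, DontCare
                "truncated": float(fields[1]),                   # 0 (not truncated) .. 1 (fully truncated)
                "occluded": int(fields[2]),                      # 0/1/2/3 occlusion level
                "alpha": float(fields[3]),                       # observation angle, [-pi, pi]
                "bbox": [float(v) for v in fields[4:8]],         # 2D box: left, top, right, bottom (pixels)
                "dimensions": [float(v) for v in fields[8:11]],  # 3D size: height, width, length (m)
                "location": [float(v) for v in fields[11:14]],   # 3D center x, y, z in camera coords (m)
                "rotation_y": float(fields[14]),                 # yaw around the camera Y axis, [-pi, pi]
            }
            if len(fields) > 15:
                obj["score"] = float(fields[15])                 # present only in detection result files
            objects.append(obj)
    return objects


# Hypothetical path; adjust to where the dataset is extracted.
for obj in parse_kitti_label("training/label_2/000000.txt"):
    print(obj["type"], obj["location"], obj["dimensions"])
```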

5. Calibration parameter analysis

Now take a look at the calibration parameters. P0-P3: the projection (intrinsic) matrices of cameras 0-3;

R0_rect: the rectification rotation matrix of the reference (left) camera;

Tr_velo_to_cam: the transformation matrix from the lidar coordinate system to the camera coordinate system;

Tr_imu_to_velo: the transformation matrix from the IMU coordinate system to the lidar coordinate system.
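As a reference, below is a minimal sketch of reading these matrices from one calib file with numpy; the file path is a placeholder and the keys match the names listed above.

```python
import numpy as np

def load_kitti_calib(calib_path):
    """Read one KITTI calib .txt file into numpy matrices."""
    raw = {}
    with open(calib_path, "r") as f:
        for line in f:
            if ":" not in line:
                continue
            key, value = line.split(":", 1)
            raw[key.strip()] = np.array([float(v) for v in value.split()])

    return {
        "P0": raw["P0"].reshape(3, 4),                           # projection matrix of camera 0
        "P2": raw["P2"].reshape(3, 4),                           # projection matrix of camera 2 (left color)
        "R0_rect": raw["R0_rect"].reshape(3, 3),                 # rectification rotation of the reference camera
        "Tr_velo_to_cam": raw["Tr_velo_to_cam"].reshape(3, 4),   # lidar -> camera extrinsics
        "Tr_imu_to_velo": raw["Tr_imu_to_velo"].reshape(3, 4),   # IMU -> lidar extrinsics
    }

# Hypothetical path; adjust to your dataset layout.
calib = load_kitti_calib("training/calib/000000.txt")
```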

6. Point cloud 3D results --> Image 3D results (coordinate system conversion)

Given a point cloud 3D result, how do we project it onto the image? Essentially this is a coordinate-system conversion problem. The process is as follows:

  1. The point cloud coordinates (x, y, z) are known and are currently in the lidar coordinate system.
  2. To transform from the lidar coordinate system to the camera coordinate system, use the Tr_velo_to_cam matrix from the calibration parameters, obtaining the camera coordinates (x1, y1, z1).
  3. To rectify the camera coordinates, use the R0_rect matrix from the calibration parameters, obtaining the rectified camera coordinates (x2, y2, z2).
  4. To convert from the camera coordinate system to the image coordinate system, use the P0 matrix from the calibration parameters, i.e. the camera projection (intrinsic) matrix (P2 when projecting onto the left color image), obtaining the image coordinates (u, v). A code sketch follows this list.
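Below is a minimal numpy sketch of these steps using homogeneous coordinates; `P` is whichever projection matrix matches the target image (e.g. P2 for the left color image), and the calibration matrices are assumed to be loaded as in section 5.

```python
import numpy as np

def project_velo_to_image(points_xyz, P, R0_rect, Tr_velo_to_cam):
    """Project N x 3 lidar points into pixel coordinates: uv ~ P @ R0 @ Tr @ X."""
    n = points_xyz.shape[0]
    pts_hom = np.hstack([points_xyz, np.ones((n, 1))])       # N x 4 homogeneous lidar points

    Tr = np.vstack([Tr_velo_to_cam, [0, 0, 0, 1]])           # 4 x 4 lidar -> camera transform
    R0 = np.eye(4)
    R0[:3, :3] = R0_rect                                     # 4 x 4 rectification rotation

    pts_cam = R0 @ Tr @ pts_hom.T                            # 4 x N points in rectified camera coords
    pts_img = P @ pts_cam                                    # 3 x N after projection (P is 3 x 4)
    uv = (pts_img[:2] / pts_img[2]).T                        # perspective divide -> N x 2 pixel coords
    depth = pts_cam[2]                                       # keep depth to filter points behind the camera
    return uv, depth
```

Points with depth <= 0 lie behind the camera and should be discarded before drawing them on the image.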

7. Image 3D results --> Point cloud 3D results (coordinate system conversion)

Given an image 3D result, how do we project it into the point cloud? Essentially this is also a coordinate-system conversion problem, the reverse of the process above. The process is as follows:

  1. The image coordinates (u, v) are known and are currently in the image coordinate system.
  2. To convert from the image coordinate system to the camera coordinate system, use the inverse of the intrinsic part of the P0 matrix from the calibration parameters; this also requires a known depth, since a pixel alone does not determine a 3D point. This yields the camera coordinates (x, y, z).
  3. To undo the rectification of the camera coordinates, use the inverse of the R0_rect matrix from the calibration parameters, obtaining the camera coordinates (x1, y1, z1).
  4. To convert from the (unrectified) camera coordinate system to the lidar coordinate system, use the inverse of the Tr_velo_to_cam matrix from the calibration parameters, obtaining the lidar coordinates (x2, y2, z2). A code sketch follows this list.
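Below is a minimal sketch of the reverse direction, assuming the depth of each pixel is known and ignoring the small translation terms in the fourth column of P; the function names are illustrative.

```python
import numpy as np

def pixel_to_rect(u, v, depth, P):
    """Back-project a pixel (u, v) with a known depth to rectified camera coordinates.
    Only the 3 x 3 intrinsic part of P is inverted; the small offsets in P[:, 3] are ignored."""
    K = P[:3, :3]
    return np.linalg.inv(K) @ np.array([u * depth, v * depth, depth])

def rect_to_velo(pts_rect, R0_rect, Tr_velo_to_cam):
    """Map N x 3 rectified-camera points back to lidar coordinates (the inverse of section 6)."""
    pts_cam = np.linalg.inv(R0_rect) @ pts_rect.T            # undo rectification, 3 x N

    Tr = np.vstack([Tr_velo_to_cam, [0, 0, 0, 1]])           # 4 x 4 lidar -> camera
    Tr_inv = np.linalg.inv(Tr)                               # camera -> lidar

    pts_hom = np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    return (Tr_inv @ pts_hom)[:3].T                          # N x 3 lidar coordinates
```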

8. Point cloud 3D results --> Image BEV bird's-eye view results (coordinate system conversion)

Thought process:

  1. Read the point cloud data. The point cloud is stored as an n × 4 array, where n is the number of points in the current file and the 4 columns are (x, y, z, intensity), i.e. the 3D spatial coordinates and the reflection intensity of each point.
  2. We only need the first two columns to get the coordinate points (x, y).
  3. Then draw a scatter plot of the coordinate points (x, y), as sketched below.
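A minimal sketch with numpy and matplotlib (the .bin path is a placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt

# Load one KITTI point cloud: n x 4 float32 array of (x, y, z, intensity).
points = np.fromfile("training/velodyne/000000.bin", dtype=np.float32).reshape(-1, 4)

x, y = points[:, 0], points[:, 1]                       # first two columns: x (forward), y (left) in lidar coords
plt.figure(figsize=(8, 8))
plt.scatter(x, y, s=0.1, c=points[:, 3], cmap="gray")   # color each point by its reflection intensity
plt.gca().set_aspect("equal")
plt.xlabel("x (m)")
plt.ylabel("y (m)")
plt.title("KITTI point cloud BEV")
plt.show()
```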

The BEV bird's-eye view result looks as follows:

3D datasets such as Nuscenes and Waymo will be introduced later.


Original post: blog.csdn.net/qq_41204464/article/details/132776800