(1) KITTI dataset for 3D object detection

Introduction to KITTI dataset

Basic data

KITTI is an open-source dataset created by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago; it was first released on March 20, 2012.

The corresponding paper, Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite, was published at CVPR 2012.

The KITTI data was collected around Karlsruhe, Germany, and covers traffic scenes such as urban, suburban, and highway roads. It was recorded during the daytime on 2011-09-26/28/29/30 and 2011-10-03.

The platform used for data collection is shown in the figure below.

The platform includes:

  • 2 grayscale cameras, 1.4 megapixels
  • 2 color cameras, 1.4 megapixels
  • 4 Edmund Optics lenses
  • 1 Velodyne 64-beam 3D laser scanner
  • 1 OXTS RT3003 inertial navigation system

As can be seen from the figure:

  • In the camera coordinate system, the Z axis points forward and the Y axis points down; the whole frame is right-handed.
  • The X axis of the lidar points straight ahead, the Z axis points vertically up, and the Y axis follows the right-hand rule.
  • The orientation of the IMU/GPS coordinate system is consistent with that of the lidar.

To sum up, the KITTI platform consists of 4 cameras, 1 lidar, and 1 IMU/GPS inertial navigation system. What needs to be clarified is the coordinate-system relationship and the time synchronization between these 6 sensors.

For the mounting dimensions of the sensors, refer to the figure below.

The data collected by each sensor is stored in 6 subfolders under the [date]_drive_[sequence]_sync directory:

- image_00
- image_01
- image_02
- image_03
- oxts
- velodyne_points
  • Timestamps: there are three timestamp files under the velodyne_points folder

    • timestamps_start.txt, the time at which the laser scanner starts a full revolution
    • timestamps_end.txt, the time at which that revolution ends
    • timestamps.txt, the time at which the laser points straight ahead and triggers the cameras
  • Image data

    • The images are cropped to remove the engine hood and the sky
    • The images have already been undistorted
  • OXTS data: each frame stores 30 different field values, including latitude/longitude, speed, and acceleration (see the sketch after this list)

  • Lidar data

    • Binary files of floating-point numbers; each point consists of 4 floats: the x, y, z coordinates in the lidar coordinate system and the laser reflectance r
    • Each revolution yields approximately 120,000 3D points
    • The lidar rotates counterclockwise around its vertical axis
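
As a small illustration of the OXTS format described above, the sketch below reads a single OXTS frame with numpy. It assumes the raw-data layout in which each frame is a text file of 30 space-separated values whose first six fields are latitude, longitude, altitude, roll, pitch, and yaw (documented in the dataset's dataformat.txt); the helper name is mine.

import numpy as np

def load_oxts_frame(oxts_txt):
    # one OXTS frame is a single text file of 30 space-separated values
    values = np.loadtxt(oxts_txt)
    assert values.shape[0] == 30, "each OXTS frame is expected to contain 30 fields"
    # assumed field order: lat, lon, alt, roll, pitch, yaw, ... (see dataformat.txt)
    lat, lon, alt, roll, pitch, yaw = values[:6]
    return {"lat": lat, "lon": lon, "alt": alt,
            "roll": roll, "pitch": pitch, "yaw": yaw, "raw": values}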

Sensor Calibration and Time Synchronization

The whole system treats one full lidar revolution as one frame. When the lidar rotates to a specific position, a spring-plate (reed) contact physically triggers the cameras to expose. The IMU cannot be triggered this way, but it samples at 100 Hz and records a timestamp for every measurement, so the IMU reading closest in time to the camera exposure is selected as the data for the current frame; the maximum time error is 5 ms.
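
As a minimal sketch of this nearest-timestamp matching (my own illustration, not the official devkit code), the snippet below picks, for every camera timestamp, the index of the closest IMU sample; it assumes both timestamp lists have already been converted to seconds.

import numpy as np

def match_imu_to_camera(cam_times, imu_times, max_err=0.005):
    # pairwise |cam_i - imu_j|, then take the closest IMU sample per camera frame
    cam_times = np.asarray(cam_times, dtype=np.float64)
    imu_times = np.asarray(imu_times, dtype=np.float64)
    diff = np.abs(cam_times[:, None] - imu_times[None, :])
    idx = diff.argmin(axis=1)
    err = diff[np.arange(len(cam_times)), idx]
    if (err > max_err).any():
        print("warning: some matches exceed", max_err, "s")  # 5 ms bound mentioned above
    return idx, err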

Camera calibration: the camera intrinsic and extrinsic parameters are calibrated with the method from A Toolbox for Automatic Calibration of Range and Camera Sensors Using a Single Shot. All camera centers are aligned, i.e. they lie on the same XY plane, which makes image rectification (cropping the sky and the hood) straightforward.

Before data collection starts each day, the whole acquisition system is calibrated to avoid positional drift between the sensors.

The daily date_calib.zip archive contains 3 text files:

  • calib_cam_to_cam.txt
    For the last three parameter types in this file: $s^{(i)}_{rect}$ is the size of the rectified image (with the redundant sky and hood removed); $\mathbf{R}^{(i)}_{rect}$ is the rectifying rotation matrix; $\mathbf{P}^{(i)}_{rect}$ is the projection matrix after rectification. Here $i\in\{0,1,2,3\}$ is the camera index: 0 is the left grayscale camera, 1 the right grayscale camera, 2 the left color camera, and 3 the right color camera. The left grayscale camera 0 is used as the reference camera coordinate system. A 3D point $\mathbf{x}=(x,y,z,1)^T$ in the reference frame is projected into the image of the $i$-th camera by
    $$\mathbf{y}=\mathbf{P}^{(i)}_{rect}\,\mathbf{R}^{(0)}_{rect}\,\mathbf{x}$$

  • calib_velo_to_cam.txt: the lidar is likewise calibrated relative to the left grayscale camera 0. The calibration consists of a rotation matrix $\mathbf{R}^{cam}_{velo}$ and a translation vector $\mathbf{t}^{cam}_{velo}$ from the laser to the camera, so the homogeneous transformation matrix can be written as
    $$\mathbf{T}^{cam}_{velo}=\begin{pmatrix}\mathbf{R}^{cam}_{velo} & \mathbf{t}^{cam}_{velo}\\ \mathbf{0} & 1\end{pmatrix}$$
    With this, a 3D point in the lidar coordinate system is transformed into the $i$-th camera image by
    $$\mathbf{y}=\mathbf{P}^{(i)}_{rect}\,\mathbf{R}^{(0)}_{rect}\,\mathbf{T}^{cam}_{velo}\,\mathbf{x}$$

  • calib_imu_to_velo.txt: this file stores the homogeneous transformation $\mathbf{T}^{velo}_{imu}$ from the IMU coordinate system to the lidar coordinate system. The formula for transforming a 3D point in the IMU frame to pixel coordinates in the $i$-th image can therefore be written as (see the sketch after this list)
    $$\mathbf{y}=\mathbf{P}^{(i)}_{rect}\,\mathbf{R}^{(0)}_{rect}\,\mathbf{T}^{cam}_{velo}\,\mathbf{T}^{velo}_{imu}\,\mathbf{x}$$
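
To make the chain of transforms above concrete, here is a minimal sketch (the helper names are mine, not the devkit's) that stacks R and t into the 4x4 homogeneous matrix from the formula and pads the 3x3 rectifying rotation to 4x4 so that the matrices can be multiplied directly.

import numpy as np

def to_homogeneous(R, t):
    # stack a 3x3 rotation R and a 3-vector translation t into a 4x4 transform
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t).reshape(3)
    return T

def pad_rotation(R_rect_0):
    # expand the 3x3 rectifying rotation to 4x4 for chaining with 4x4 transforms
    R4 = np.eye(4)
    R4[:3, :3] = R_rect_0
    return R4

# A homogeneous IMU point x_imu (4x1) then maps to the pixels of camera i as
#   y = P_rect_i @ pad_rotation(R_rect_0) @ T_cam_velo @ T_velo_imu @ x_imu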

For 3D object detection

The tasks supported by the initial dataset were stereo, optical flow, and odometry. Later, tasks such as depth estimation, 2D object detection, 3D object detection, BEV object detection, semantic segmentation, instance segmentation, and multi-object tracking were added.

For object detection, 8 categories are defined:

Car / Van / Truck / Pedestrian / Person_sitting / Cyclist / Tram / Misc (other)

3D objects are annotated in the lidar point cloud, but it is worth noting that a 3D detection box consists of only 7 degrees of freedom: the center coordinates x, y, z, the length/width/height of the box, and the yaw (heading) angle of the box.


A point that easily causes confusion is which of the x/y/z axes length/width/height correspond to. As the figure below shows, length corresponds to dx, width to dy, and height to dz, with the axes taken in the lidar coordinate system.

The KITTI 3D object detection dataset contains 7481 training frames and 7518 test frames. Although 8 object classes are annotated, only Car and Pedestrian are annotated thoroughly enough and are officially used by KITTI to evaluate algorithms; the KITTI 3D benchmark evaluates 3 classes of detection boxes: Car, Pedestrian, and Cyclist.

The folders included in the downloaded 3D object detection dataset are:

  • image_2, left color camera images in png format
  • label_2, the object labels annotated in the left color images
  • calib, the coordinate transformation relationships between the sensors
  • velodyne, lidar point cloud data
  • plane, the road-surface plane equation in the lidar coordinate system

The content of each line in the label file is as follows:

Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01

The fields included are as follows (a parsing sketch follows the list):

  • type, the object class, e.g. Car/Van..., 1 field
  • truncated, a float in [0, 1], the fraction of the object lying outside the image boundary, 1 field
  • occluded, an integer in {0, 1, 2, 3}: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown, 1 field
  • alpha, the observation angle of the object, in $[-\pi, \pi]$, 1 field
  • bbox, the pixel coordinates of the 2D bounding box, x1, y1, x2, y2, 4 fields
  • dimensions, the 3D object size: height, width, length, in meters, 3 fields
  • location, the 3D center coordinates x, y, z of the object in the camera coordinate system, in meters, 3 fields
  • rotation_y, the yaw angle, in $[-\pi, \pi]$, 1 field
  • score, the confidence of the detection, used for computing the ROC curve or mAP; it appears only in result files, not in ground-truth labels
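
As a small sketch of how such a line can be split into named fields (the function name is mine; it assumes a ground-truth file, which has 15 columns and no score):

def parse_kitti_label_line(line):
    v = line.split()
    return {
        "type": v[0],
        "truncated": float(v[1]),
        "occluded": int(v[2]),
        "alpha": float(v[3]),
        "bbox": [float(x) for x in v[4:8]],          # x1, y1, x2, y2
        "dimensions": [float(x) for x in v[8:11]],   # height, width, length
        "location": [float(x) for x in v[11:14]],    # x, y, z in camera coordinates
        "rotation_y": float(v[14]),
    }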

Points in the camera coordinate system can be transformed into image pixel coordinates with the matrices given in calib.

The difference between rotation_y and alpha is that alpha is measured relative to the ray from the camera center to the object center, while rotation_y is the yaw of the object around the Y axis of the camera coordinate system. Taking a car as an example, its rotation_y is zero whenever the car heads along the X axis of the camera coordinate system, no matter where it sits in the X-Z plane; alpha, on the other hand, is zero only when such a car is located on the Z axis of the camera, and becomes non-zero as soon as it moves off the Z axis.
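
The two angles are linked through the object's position in the camera frame; a minimal sketch, assuming the commonly used relation alpha = rotation_y - arctan2(x, z):

import numpy as np

def alpha_from_rotation_y(rotation_y, x, z):
    # x, z are the object's center coordinates in the camera frame
    alpha = rotation_y - np.arctan2(x, z)
    return (alpha + np.pi) % (2 * np.pi) - np.pi  # wrap back into [-pi, pi]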

The laser point cloud can be projected into the left color image with: X = P2 * R0_rect * Tr_velo_to_cam * Y.

R0_rect is the 3x3 rectifying rotation matrix, and Tr_velo_to_cam is the 3x4 transformation matrix from the lidar to the camera coordinate system.
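
A minimal sketch of this projection (my own helper, assuming the object-detection calib file format where each line reads "key: values" and P2 is 3x4, R0_rect 3x3, Tr_velo_to_cam 3x4):

import numpy as np

def project_velo_to_image2(velo_xyz, calib_file):
    # parse the calib file into named float arrays
    calib = {}
    with open(calib_file) as f:
        for line in f:
            if ':' in line:
                key, val = line.split(':', 1)
                calib[key] = np.array(val.split(), dtype=np.float32)
    P2 = calib['P2'].reshape(3, 4)
    R0 = np.eye(4)
    R0[:3, :3] = calib['R0_rect'].reshape(3, 3)
    Tr = np.eye(4)
    Tr[:3, :4] = calib['Tr_velo_to_cam'].reshape(3, 4)

    pts = np.hstack([velo_xyz, np.ones((velo_xyz.shape[0], 1))]).T  # 4xN homogeneous
    cam = R0 @ Tr @ pts                  # lidar -> rectified camera coordinates
    keep = cam[2] > 0                    # keep only points in front of the camera
    img = P2 @ cam[:, keep]              # project with the camera-2 matrix
    uv = (img[:2] / img[2]).T            # normalize by depth -> Nx2 pixel coordinates
    return uv, keep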

Hands-on code

Reading the point cloud with open3d:

import numpy as np
import struct
import open3d as o3d

def convert_kitti_bin_to_pcd(binFilePath):
    # each point is stored as 4 float32 values: x, y, z, intensity
    size_float = 4
    list_pcd = []
    with open(binFilePath, "rb") as f:
        byte = f.read(size_float * 4)
        while byte:
            x, y, z, intensity = struct.unpack("ffff", byte)
            list_pcd.append([x, y, z])
            byte = f.read(size_float * 4)
    np_pcd = np.asarray(list_pcd)
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np_pcd)
    return pcd

bs = "/xx/xx/data/code/mmdetection3d/demo/data/kitti/000008.bin"
pcds = convert_kitti_bin_to_pcd(bs)
o3d.visualization.draw_geometries([pcds])

# save
o3d.io.write_point_cloud('000008.pcd', pcds, write_ascii=False, compressed=False, print_progress=False)

Reading with numpy:

def load_bin(bin_file):
    # read Nx4 float32 (x, y, z, intensity) and drop the intensity column
    data = np.fromfile(bin_file, np.float32).reshape((-1, 4))
    data = data[:, :-1]
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(data)
    return pcd

The code above reads and visualizes the point cloud data and saves it as a point cloud in pcd format.

The sample selected from the 3D detection training set is training/image_2/000008.png,

The corresponding label file 000008.txt is:

Car 0.88 3 -0.69 0.00 192.37 402.31 374.00 1.60 1.57 3.23 -2.70 1.74 3.68 -1.29
Car 0.00 1 2.04 334.85 178.94 624.50 372.04 1.57 1.50 3.68 -1.17 1.65 7.86 1.90
Car 0.34 3 -1.84 937.29 197.39 1241.00 374.00 1.39 1.44 3.08 3.81 1.64 6.15 -1.31
Car 0.00 1 -1.33 597.59 176.18 720.90 261.14 1.47 1.60 3.66 1.07 1.55 14.44 -1.25
Car 0.00 0 1.74 741.18 168.83 792.25 208.43 1.70 1.63 4.08 7.24 1.55 33.20 1.95
Car 0.00 0 -1.65 884.52 178.31 956.41 240.18 1.59 1.59 2.47 8.48 1.75 19.96 -1.25

Drawing the 2D boxes onto 000008.png:

import cv2
import numpy as np

# img_s: path to the image, label_s: path to the corresponding label file
img = cv2.imread(img_s)
with open(label_s) as f:
    lines = f.read().split("\n")[:-1]
for item in lines:
    # fields 4-7 are the 2D box pixel coordinates x1, y1, x2, y2
    boxes = [float(x) for x in item.split()[4:8]]
    bb = np.array(boxes, dtype=np.int32)
    cv2.rectangle(img, (bb[0], bb[1]), (bb[2], bb[3]), (0, 0, 255), 1)
cv2.imwrite("/xx/xx/data/code/mmdetection3d/demo/data/kitti/08_res.png", img)

Note the two detection boxes in the lower left corner; their edges are not annotated very tightly.

The corresponding point cloud data is:

Note that the 3D center coordinates given in the KITTI 3D detection labels are in the camera coordinate system, so they must be transformed into the lidar frame before being drawn in the point cloud.

The following code, which uses Open3D to visualize the detection boxes, can be used as a reference:

"""
from https://github.com/dtczhl/dtc-KITTI-For-
Beginners.git
"""
import matplotlib.pyplot as plt
import matplotlib.image as mping
import numpy as np
import os
import open3d as o3d
import open3d.visualization as o3d_vis
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon


MARKER_COLOR = {
    'Car': [1, 0, 0],               # red
    'DontCare': [0, 0, 0],          # black
    'Pedestrian': [0, 0, 1],        # blue
    'Van': [1, 1, 0],               # yellow
    'Cyclist': [1, 0, 1],           # magenta
    'Truck': [0, 1, 1],             # cyan
    'Misc': [0.5, 0, 0],            # maroon
    'Tram': [0, 0.5, 0],            # green
    'Person_sitting': [0, 0, 0.5]}  # navy

# image border width
BOX_BORDER_WIDTH = 5

# point size
POINT_SIZE = 0.005


def show_object_in_image(img_filename, label_filename):
    img = mping.imread(img_filename)
    with open(label_filename) as f_label:
        lines = f_label.readlines()
        for line in lines:
            line = line.strip('\n').split()
            left_pixel, top_pixel, right_pixel, bottom_pixel = [int(float(line[i])) for i in range(4, 8)]
            box_border_color = MARKER_COLOR[line[0]]
            for i in range(BOX_BORDER_WIDTH):
                img[top_pixel+i, left_pixel:right_pixel, :] = box_border_color
                img[bottom_pixel-i, left_pixel:right_pixel, :] = box_border_color
                img[top_pixel:bottom_pixel, left_pixel+i, :] = box_border_color
                img[top_pixel:bottom_pixel, right_pixel-i, :] = box_border_color
    plt.imshow(img)
    plt.show()

def show_object_in_point_cloud(point_cloud_filename, label_filename, calib_filename):
    pc_data = np.fromfile(point_cloud_filename, '<f4')  # little-endian float32
    pc_data = np.reshape(pc_data, (-1, 4))
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(pc_data[:,:-1])
    pc_color = np.ones((len(pc_data), 3))
    calib = load_kitti_calib(calib_filename)
    rot_axis = 2
    with open(label_filename) as f_label:
        lines = f_label.readlines()
        bboxes_3d = []
        box_corners = []  # keep each box's corners and color for point coloring below
        for line in lines:
            line = line.strip('\n').split()
            point_color = MARKER_COLOR[line[0]]
            veloc, dims, rz, box3d_corner = camera_coordinate_to_point_cloud(line[8:15], calib['Tr_velo_to_cam'])
            bboxes_3d.append(np.concatenate((veloc, dims, np.array([rz]))))
            box_corners.append((box3d_corner, point_color))
        bboxes_3d = np.array(bboxes_3d)
        print(bboxes_3d.shape)
        lines = []
        for i in range(len(bboxes_3d)):
            center = bboxes_3d[i, 0:3]
            dim = bboxes_3d[i, 3:6]
            yaw = np.zeros(3)
            yaw[rot_axis] = bboxes_3d[i, 6]
            rot_mat = o3d.geometry.get_rotation_matrix_from_xyz(yaw)
            # bottom center to gravity center
            center[rot_axis] += dim[rot_axis] / 2

            box3d = o3d.geometry.OrientedBoundingBox(center, rot_mat, dim)

            line_set = o3d.geometry.LineSet.create_from_oriented_bounding_box(box3d)
            # MARKER_COLOR values are already in [0, 1]; use each box's own color
            line_set.paint_uniform_color(box_corners[i][1])
            lines.append(line_set)
        
        # color the points that fall inside any labelled box
        for i, v in enumerate(pc_data):
            for box3d_corner, point_color in box_corners:
                if point_in_cube(v[:3], box3d_corner):
                    pc_color[i, :] = point_color
                    break
                
        cloud.colors = o3d.utility.Vector3dVector(pc_color)
        o3d_vis.draw([*lines, cloud])

def point_in_cube(point, cube):
    z_min = np.amin(cube[:, 2], 0)
    z_max = np.amax(cube[:, 2], 0)

    if point[2] > z_max or point[2] < z_min:
        return False

    point = Point(point[:2])
    polygon = Polygon(cube[:4, :2])

    return polygon.contains(point)


def load_kitti_calib(calib_file):
    """
    This script is copied from https://github.com/AI-liu/Complex-YOLO
    """
    with open(calib_file) as f_calib:
        lines = f_calib.readlines()

    P0 = np.array(lines[0].strip('\n').split()[1:], dtype=np.float32)
    P1 = np.array(lines[1].strip('\n').split()[1:], dtype=np.float32)
    P2 = np.array(lines[2].strip('\n').split()[1:], dtype=np.float32)
    P3 = np.array(lines[3].strip('\n').split()[1:], dtype=np.float32)
    R0_rect = np.array(lines[4].strip('\n').split()[1:], dtype=np.float32)
    Tr_velo_to_cam = np.array(lines[5].strip('\n').split()[1:], dtype=np.float32)
    Tr_imu_to_velo = np.array(lines[6].strip('\n').split()[1:], dtype=np.float32)

    return {'P0': P0, 'P1': P1, 'P2': P2, 'P3': P3, 'R0_rect': R0_rect,
            'Tr_velo_to_cam': Tr_velo_to_cam.reshape(3, 4),
            'Tr_imu_to_velo': Tr_imu_to_velo}


def camera_coordinate_to_point_cloud(box3d, Tr):
    """
    This script is copied from https://github.com/AI-liu/Complex-YOLO
    """
    def project_cam2velo(cam, Tr):
        T = np.zeros([4, 4], dtype=np.float32)
        T[:3, :] = Tr
        T[3, 3] = 1
        T_inv = np.linalg.inv(T)
        lidar_loc_ = np.dot(T_inv, cam)
        lidar_loc = lidar_loc_[:3]
        return lidar_loc.reshape(1, 3)

    def ry_to_rz(ry):
        angle = -ry - np.pi / 2

        if angle >= np.pi:
            angle -= np.pi
        if angle < -np.pi:
            angle = 2 * np.pi + angle
        return angle

    h, w, l, tx, ty, tz, ry = [float(i) for i in box3d]
    cam = np.ones([4, 1])
    cam[0] = tx
    cam[1] = ty
    cam[2] = tz
    t_lidar = project_cam2velo(cam, Tr)

    Box = np.array([[-l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2],
                    [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2],
                    [0, 0, 0, 0, h, h, h, h]])

    rz = ry_to_rz(ry)

    rotMat = np.array([
        [np.cos(rz), -np.sin(rz), 0.0],
        [np.sin(rz), np.cos(rz), 0.0],
        [0.0, 0.0, 1.0]])

    velo_box = np.dot(rotMat, Box)

    cornerPosInVelo = velo_box + np.tile(t_lidar, (8, 1)).T

    box3d_corner = cornerPosInVelo.transpose()
    dims = np.array([l, w, h])
    # t_lidar: the x, y coordinator of the center of the object
    # box3d_corner: the 8 corners
    print(t_lidar.shape)
    return t_lidar.reshape(-1), dims, rz, box3d_corner.astype(np.float32)


if __name__ == '__main__':

    # updates
    ROOT = "/media/lx/data/code/mmdetection3d/demo/data/kitti"
    IMG_DIR = f'{ROOT}/image_2'
    LABEL_DIR = f'{ROOT}/label_2'
    POINT_CLOUD_DIR = f'{ROOT}/velo'
    CALIB_DIR = f'{ROOT}/calib'

    # id for viewing
    file_id = 8

    img_filename = os.path.join(IMG_DIR, '{0:06d}.png'.format(file_id))
    label_filename = os.path.join(LABEL_DIR, '{0:06d}.txt'.format(file_id))
    pc_filename = os.path.join(POINT_CLOUD_DIR, '{0:06d}.bin'.format(file_id))
    calib_filename = os.path.join(CALIB_DIR, '{0:06d}.txt'.format(file_id))

    # show object in image
    show_object_in_image(img_filename, label_filename)

    # show object in point cloud
    show_object_in_point_cloud(pc_filename, label_filename, calib_filename)

The visualized result is:

The above should give a basic understanding of how the KITTI dataset is used for 3D object detection tasks. Later, when discussing multi-modal methods, we will add how to combine the 2D detection boxes to identify and localize targets.



Origin blog.csdn.net/lx_ros/article/details/132592963