Introduction to the KITTI dataset
Basic data
KITTI is an open-source dataset jointly released by the Karlsruhe Institute of Technology (Germany) and the Toyota Technological Institute at Chicago. It was first released on 2012-03-20, and the accompanying paper, "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite", was published at CVPR 2012.
The dataset was collected around Karlsruhe, Germany, and covers urban, suburban, and highway traffic scenes. The data were recorded during the daytime on 2011-09-26/28/29/30 and 2011-10-03.
The KITTI data collection platform is shown in the figure below.
The platform includes:
- 2 grayscale cameras, 1.4 megapixels each
- 2 color cameras, 1.4 megapixels each
- 4 Edmund Optics lenses
- 1 Velodyne 64-beam 3D laser scanner
- 1 OXTS RT3003 GPS/IMU inertial navigation system
As can be seen from the figure above:
- In the camera coordinate system, the Z axis points forward and the Y axis points down; the whole coordinate system is right-handed.
- In the lidar coordinate system, the X axis points forward, the Z axis points up, and the Y axis follows the right-hand rule.
- The IMU/GPS coordinate system has the same orientation as the lidar coordinate system.
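To make these axis conventions concrete, the nominal mapping from lidar coordinates to camera coordinates is just an axis permutation; the sketch below is a minimal illustration that ignores the small mounting offsets and rotations captured by the real calibration files.
import numpy as np

# Nominal axis mapping (ignoring the real extrinsics in calib_velo_to_cam.txt):
#   camera x (right)   ~ -lidar y
#   camera y (down)    ~ -lidar z
#   camera z (forward) ~  lidar x
R_nominal = np.array([[0., -1.,  0.],
                      [0.,  0., -1.],
                      [1.,  0.,  0.]])

p_velo = np.array([10.0, 2.0, -1.0])   # 10 m ahead, 2 m to the left, 1 m below the lidar
p_cam = R_nominal @ p_velo             # -> roughly [-2, 1, 10] in camera axes
print(p_cam)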
To sum up, the KITTI acquisition platform consists of 4 cameras, 1 lidar, and 1 IMU/GPS inertial navigation system. What needs to be clarified is the coordinate-system relationships and the time synchronization among these 6 sensors.
The dimensional parameters of the sensors can be found in the figure below.
The data collected by each sensor is stored in a directory named [date]_drive_[sequence]_sync, which contains 6 subfolders:
- image_00
- image_01
- image_02
- image_03
- oxts
- velodyne_points
- Timestamps: the velodyne_points folder contains three timestamp files. timestamps_start.txt records the time at which the laser scanner starts a full sweep, timestamps_end.txt the time at which the sweep ends, and timestamps.txt the time at which the laser faces forward and triggers the cameras.
- Image data
  - The images are cropped to remove the hood and the sky.
  - The images are already undistorted.
- OXTS data: each frame stores 30 different field values, including latitude/longitude, velocity, and acceleration.
- Lidar data
  - A binary file of floating-point numbers; each point consists of 4 floats: the x, y, z coordinates in the lidar coordinate system and the laser reflectance r.
  - Each scan yields approximately 120,000 3D points.
  - The lidar rotates counterclockwise around its vertical axis.
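As a quick illustration of the raw-data formats above, the following sketch parses a timestamps file and a single OXTS frame. It is a minimal sketch under stated assumptions: the timestamps carry nanosecond precision (truncated here to microseconds, since Python's datetime cannot store nanoseconds), and only the first six of the 30 OXTS fields (lat, lon, alt, roll, pitch, yaw) are named explicitly; the full field list is documented in dataformat.txt inside the oxts folder.
from datetime import datetime
import numpy as np

def load_timestamps(path):
    """Parse a KITTI timestamps file (e.g. velodyne_points/timestamps.txt)."""
    stamps = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            date_part, frac = line.split(".")
            # datetime only supports microseconds, so drop the last 3 digits
            stamps.append(datetime.strptime(f"{date_part}.{frac[:6]}",
                                            "%Y-%m-%d %H:%M:%S.%f"))
    return stamps

def load_oxts_frame(path):
    """Read one oxts/data/*.txt frame: 30 whitespace-separated values."""
    values = np.loadtxt(path)
    assert values.shape == (30,)
    # the first six fields are lat, lon, alt, roll, pitch, yaw;
    # see dataformat.txt for the remaining fields
    lat, lon, alt, roll, pitch, yaw = values[:6]
    return values, (lat, lon, alt, roll, pitch, yaw)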
Sensor Calibration and Time Synchronization
The system treats one full lidar rotation as one frame. When the lidar rotates to a specific position, a physical spring contact triggers the cameras to take a picture. The IMU cannot be triggered this way, but it samples at 100 Hz and timestamps every measurement; the IMU reading closest in time to the camera trigger is selected as the data for the current frame, so the maximum timing error is 5 ms.
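This nearest-timestamp selection can be written as a simple search over the IMU timestamps; a minimal sketch, assuming both streams have already been converted to seconds:
import numpy as np

def match_nearest(imu_times, cam_time):
    """Return the index of the IMU sample closest in time to a camera trigger.

    imu_times: sorted 1-D array of IMU timestamps in seconds (100 Hz stream)
    cam_time:  camera trigger timestamp in seconds
    """
    i = np.searchsorted(imu_times, cam_time)
    i = int(np.clip(i, 1, len(imu_times) - 1))
    # pick whichever neighbor is closer; with 100 Hz data the error is at most 5 ms
    if abs(imu_times[i - 1] - cam_time) <= abs(imu_times[i] - cam_time):
        i -= 1
    return i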
Camera calibration: the camera intrinsics and extrinsics are calibrated with the method from "A toolbox for automatic calibration of range and camera sensors using a single shot". All camera centers are aligned, i.e. they lie on the same XY plane, which makes image rectification (removing the sky and the hood) convenient.
Before data collection starts each day, the entire acquisition system is calibrated to avoid positional drift between the sensors. The daily date_calib.zip archive contains 3 text files:
- calib_cam_to_cam.txt
  For the last three parameters of each camera: $S^{(i)}_{rect}$ is the image size after rectification (with the redundant sky and hood removed), $\mathbf{R}^{(i)}_{rect}$ is the rectifying rotation matrix, and $\mathbf{P}^{(i)}_{rect}$ is the projection matrix after rectification. $i \in \{0,1,2,3\}$ is the camera index: 0 is the left grayscale camera, 1 the right grayscale camera, 2 the left color camera, and 3 the right color camera. The left grayscale camera 0 serves as the reference camera coordinate system. A 3D point $\mathbf{x}=(x,y,z,1)^T$ in the reference coordinate system is projected into the image of the $i$-th camera by
  $$\mathbf{y}=\mathbf{P}^{(i)}_{rect}\mathbf{R}^{(0)}_{rect}\mathbf{x}$$
- calib_velo_to_cam.txt
  The lidar is likewise calibrated relative to the left grayscale camera 0. The calibration consists of the rotation matrix $\mathbf{R}^{cam}_{velo}$ and the translation vector $\mathbf{t}^{cam}_{velo}$ from the lidar to the camera, which form the homogeneous transformation
  $$\mathbf{T}^{cam}_{velo}=\begin{pmatrix}\mathbf{R}^{cam}_{velo} & \mathbf{t}^{cam}_{velo}\\ 0 & 1\end{pmatrix}$$
  With it, a 3D point in the lidar coordinate system is projected into the image of the $i$-th camera by $\mathbf{y}=\mathbf{P}^{(i)}_{rect}\mathbf{R}^{(0)}_{rect}\mathbf{T}^{cam}_{velo}\mathbf{x}$.
- calib_imu_to_velo.txt
  This file stores the homogeneous transformation $\mathbf{T}^{velo}_{imu}$ from the IMU coordinate system to the lidar coordinate system, so a 3D point in the IMU coordinate system is projected to pixel coordinates in the $i$-th image by $\mathbf{y}=\mathbf{P}^{(i)}_{rect}\mathbf{R}^{(0)}_{rect}\mathbf{T}^{cam}_{velo}\mathbf{T}^{velo}_{imu}\mathbf{x}$.
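To make this chain of transforms concrete, here is a minimal numpy sketch that composes them. The numbers and variable names (T_velo_to_cam, P_rect_02, etc.) are purely illustrative placeholders; in practice they come from calib_velo_to_cam.txt, calib_imu_to_velo.txt, and calib_cam_to_cam.txt.
import numpy as np

def to_homogeneous(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative values only; real values are read from the calibration files.
T_velo_to_cam = to_homogeneous(np.eye(3), np.array([0.27, 0.0, -0.08]))
T_imu_to_velo = to_homogeneous(np.eye(3), np.array([-0.81, 0.32, 0.80]))
R_rect_00 = np.eye(4)                       # rectifying rotation, padded to 4x4
P_rect_02 = np.array([[720., 0., 610., 45.],
                      [0., 720., 173., 0.],
                      [0., 0., 1., 0.]])    # 3x4 projection of camera 2 (placeholder numbers)

x_imu = np.array([5.0, 0.0, 1.0, 1.0])      # homogeneous 3D point in IMU coordinates
y = P_rect_02 @ R_rect_00 @ T_velo_to_cam @ T_imu_to_velo @ x_imu
u, v = y[:2] / y[2]                         # pixel coordinates in image_02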
3D object detection
The tasks supported by the initial dataset were stereo, optical flow, and odometry. Depth estimation, 2D object detection, 3D object detection, BEV object detection, semantic segmentation, instance segmentation, and multi-object tracking were added later.
The object detection annotations define 8 categories: Car/Van/Truck/Pedestrian/Person_sitting/Cyclist/Tram/Misc (other).
The 3D objects are annotated in the lidar point cloud, but note that each 3D detection box has only 7 degrees of freedom: the coordinates x, y, z of the box center, the length/width/height of the box, and the heading (yaw) angle; there is no pitch or roll.
One point that easily causes ambiguity is which of x, y, z the length/width/height correspond to. As the figure below shows, length corresponds to dx, width to dy, and height to dz; the 3D box in the figure is drawn in the lidar coordinate system.
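To make the correspondence concrete, here is a minimal sketch (the helper name box7dof_to_corners is illustrative) that turns a 7-DOF box given in the lidar frame into its 8 corners; note that the KITTI label files themselves store the box in camera coordinates, as described below.
import numpy as np

def box7dof_to_corners(x, y, z, dx, dy, dz, yaw):
    """8 corners of a 7-DOF box (center, length dx, width dy, height dz, yaw) in lidar coordinates."""
    # corners of an axis-aligned box centered at the origin
    xs = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * dx / 2
    ys = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * dy / 2
    zs = np.array([ 1,  1,  1,  1, -1, -1, -1, -1]) * dz / 2
    corners = np.vstack([xs, ys, zs])                      # 3x8
    rot = np.array([[np.cos(yaw), -np.sin(yaw), 0.],
                    [np.sin(yaw),  np.cos(yaw), 0.],
                    [0.,           0.,          1.]])
    return (rot @ corners).T + np.array([x, y, z])         # 8x3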
The KITTI 3D object detection dataset contains 7481 training frames and 7518 test frames. Although the dataset annotates 8 classes, only Car and Pedestrian are labeled in sufficient numbers for KITTI's official algorithm evaluation. The KITTI 3D benchmark evaluates 3 classes of detection boxes: Car/Pedestrian/Cyclist.
The downloaded 3D object detection dataset contains the following folders:
- image_2: left color images in png format
- label_2: object labels annotated in the left color image
- calib: the coordinate transformations between the sensors
- velodyne: lidar point cloud data
- plane: the plane equation of the road surface, in the lidar coordinate system
The content of each line in the label file is as follows:
Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01
The fields are:
- type: the object type, e.g. Car/Van/..., 1 field
- truncated: a float in [0, 1], the fraction of the object that leaves the camera's field of view, 1 field
- occluded: an integer in {0, 1, 2, 3}; 0: fully visible, 1: partly occluded, 2: largely occluded, 3: unknown, 1 field
- alpha: the observation angle of the object, in $[-\pi, \pi]$, 1 field
- bbox: the pixel coordinates of the 2D detection box, x1, y1, x2, y2, 4 fields
- dimensions: the 3D object size, height, width, length, in meters, 3 fields
- location: the 3D center coordinates x, y, z of the object in the camera coordinate system, in meters, 3 fields
- rotation_y: the rotation of the object around the camera's y axis (yaw angle), in $[-\pi, \pi]$, 1 field
- score: the confidence of the detection, used to compute ROC curves or mAP (only present in result files), 1 field
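A small sketch of parsing one label line into a dictionary, following the field layout above (the helper name parse_kitti_label is just illustrative):
def parse_kitti_label(line):
    """Parse one line of a KITTI object label file into a dict."""
    v = line.split()
    return {
        "type": v[0],
        "truncated": float(v[1]),
        "occluded": int(v[2]),
        "alpha": float(v[3]),
        "bbox": [float(x) for x in v[4:8]],          # x1, y1, x2, y2
        "dimensions": [float(x) for x in v[8:11]],   # height, width, length (m)
        "location": [float(x) for x in v[11:14]],    # x, y, z in camera coordinates (m)
        "rotation_y": float(v[14]),
        "score": float(v[15]) if len(v) > 15 else None,  # only present in result files
    }

label = parse_kitti_label(
    "Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01")
print(label["type"], label["location"])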
Points in the camera coordinate system can be projected to image pixel coordinates using the projection matrix stored in the calib file.
The difference between rotation_y and alpha: alpha is the observation angle, measured relative to the ray from the camera center to the object center, while rotation_y is the object's yaw angle around the y axis of the camera coordinate system. Taking a car as an example: when the car is aligned with the x-axis direction of the camera coordinate system, its rotation_y is zero no matter where it lies in the x-z plane. By contrast, alpha is zero only when the car sits on the camera's z axis; as soon as it moves off the z axis, alpha is no longer zero.
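The two angles are related through the azimuth of the viewing ray to the object center; a small sketch of the commonly used conversion, where x and z are the location fields in camera coordinates:
import numpy as np

def alpha_from_rotation_y(rotation_y, x, z):
    """Observation angle alpha from rotation_y and the object center (x, z) in camera coordinates."""
    alpha = rotation_y - np.arctan2(x, z)
    # wrap into [-pi, pi]
    return (alpha + np.pi) % (2 * np.pi) - np.pi

# second label of 000008.txt below: rotation_y = 1.90, x = -1.17, z = 7.86 -> alpha is roughly 2.04
print(alpha_from_rotation_y(1.90, -1.17, 7.86))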
The laser point cloud can be projected into the left color image with X = P2 * R0_rect * Tr_velo_to_cam * Y, where R0_rect is the 3x3 rectification matrix and Tr_velo_to_cam is the 3x4 transformation matrix from the lidar coordinate system to the camera coordinate system (both must be padded to 4x4 homogeneous form for the multiplication).
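A minimal sketch of this projection, assuming an object-dataset calib file (with lines such as P2:, R0_rect:, Tr_velo_to_cam:) and a velodyne .bin file; the helper names are illustrative:
import numpy as np

def read_calib(calib_path):
    """Read an object-dataset calib file into the matrices needed for projection."""
    mats = {}
    with open(calib_path) as f:
        for line in f:
            if ':' not in line:
                continue
            key, vals = line.split(':', 1)
            mats[key] = np.array(vals.split(), dtype=np.float32)
    P2 = mats['P2'].reshape(3, 4)
    R0 = np.eye(4); R0[:3, :3] = mats['R0_rect'].reshape(3, 3)
    Tr = np.eye(4); Tr[:3, :4] = mats['Tr_velo_to_cam'].reshape(3, 4)
    return P2, R0, Tr

def project_velo_to_image(points, P2, R0, Tr):
    """Project Nx3 lidar points into image_2 pixel coordinates."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1), dtype=points.dtype)])  # Nx4
    cam = (P2 @ R0 @ Tr @ pts_h.T).T        # Nx3
    cam = cam[cam[:, 2] > 0]                # keep points in front of the camera
    return cam[:, :2] / cam[:, 2:3]         # (u, v)

# points = np.fromfile("000008.bin", np.float32).reshape(-1, 4)[:, :3]
# P2, R0, Tr = read_calib("000008.txt")
# uv = project_velo_to_image(points, P2, R0, Tr)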
Code practice
Reading the point cloud with open3d:
import numpy as np
import struct
import open3d as o3d

def convert_kitti_bin_to_pcd(binFilePath):
    """Read a KITTI velodyne .bin file point by point and build an open3d point cloud."""
    size_float = 4
    list_pcd = []
    with open(binFilePath, "rb") as f:
        byte = f.read(size_float * 4)
        while byte:
            x, y, z, intensity = struct.unpack("ffff", byte)
            list_pcd.append([x, y, z])
            byte = f.read(size_float * 4)
    np_pcd = np.asarray(list_pcd)
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np_pcd)
    return pcd

bs = "/xx/xx/data/code/mmdetection3d/demo/data/kitti/000008.bin"
pcds = convert_kitti_bin_to_pcd(bs)
o3d.visualization.draw_geometries([pcds])
# save
o3d.io.write_point_cloud('000008.pcd', pcds, write_ascii=False, compressed=False, print_progress=False)
Reading it with numpy:
def load_bin(bin_file):
    data = np.fromfile(bin_file, np.float32).reshape((-1, 4))
    data = data[:, :-1]
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(data)
    return pcd
The code above reads and visualizes the point cloud data and saves it as a pcd-format point cloud.
The sample selected from the 3D object detection training set is training/image_2/000008.png; the corresponding label file 000008.txt contains:
Car 0.88 3 -0.69 0.00 192.37 402.31 374.00 1.60 1.57 3.23 -2.70 1.74 3.68 -1.29
Car 0.00 1 2.04 334.85 178.94 624.50 372.04 1.57 1.50 3.68 -1.17 1.65 7.86 1.90
Car 0.34 3 -1.84 937.29 197.39 1241.00 374.00 1.39 1.44 3.08 3.81 1.64 6.15 -1.31
Car 0.00 1 -1.33 597.59 176.18 720.90 261.14 1.47 1.60 3.66 1.07 1.55 14.44 -1.25
Car 0.00 0 1.74 741.18 168.83 792.25 208.43 1.70 1.63 4.08 7.24 1.55 33.20 1.95
Car 0.00 0 -1.65 884.52 178.31 956.41 240.18 1.59 1.59 2.47 8.48 1.75 19.96 -1.25
Drawing these 2D boxes on 000008.png:
import cv2
import numpy as np

# img_s: path to 000008.png, label_s: path to the corresponding 000008.txt
img = cv2.imread(img_s)
with open(label_s) as f:
    lines = f.read().split("\n")[:-1]
for item in lines:
    boxes = item.split()[4:8]            # bbox fields: x1, y1, x2, y2
    boxes = [float(x) for x in boxes]
    bb = np.array(boxes, dtype=np.int32)
    p1 = (int(bb[0]), int(bb[1]))
    p2 = (int(bb[2]), int(bb[3]))
    cv2.rectangle(img, p1, p2, (0, 0, 255), 1)
cv2.imwrite("/xx/xx/data/code/mmdetection3d/demo/data/kitti/08_res.png", img)
Note the two detection boxes in the lower left corner: their edges are not annotated very precisely.
The corresponding point cloud data is:
The 3D box center coordinates given in the KITTI 3D object detection labels are expressed in the camera coordinate system, not the lidar coordinate system.
The following code, which uses Open3D to visualize the detection boxes, can be used as a reference:
"""
from https://github.com/dtczhl/dtc-KITTI-For-
Beginners.git
"""
import matplotlib.pyplot as plt
import matplotlib.image as mping
import numpy as np
import os
import open3d as o3d
import open3d.visualization as o3d_vis
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon
MARKER_COLOR = {
    'Car': [1, 0, 0],               # red
    'DontCare': [0, 0, 0],          # black
    'Pedestrian': [0, 0, 1],        # blue
    'Van': [1, 1, 0],               # yellow
    'Cyclist': [1, 0, 1],           # magenta
    'Truck': [0, 1, 1],             # cyan
    'Misc': [0.5, 0, 0],            # maroon
    'Tram': [0, 0.5, 0],            # green
    'Person_sitting': [0, 0, 0.5],  # navy
}
# image border width
BOX_BORDER_WIDTH = 5
# point size
POINT_SIZE = 0.005
def show_object_in_image(img_filename, label_filename):
    img = mping.imread(img_filename)
    with open(label_filename) as f_label:
        lines = f_label.readlines()
    for line in lines:
        line = line.strip('\n').split()
        left_pixel, top_pixel, right_pixel, bottom_pixel = [int(float(line[i])) for i in range(4, 8)]
        box_border_color = MARKER_COLOR[line[0]]
        for i in range(BOX_BORDER_WIDTH):
            img[top_pixel + i, left_pixel:right_pixel, :] = box_border_color
            img[bottom_pixel - i, left_pixel:right_pixel, :] = box_border_color
            img[top_pixel:bottom_pixel, left_pixel + i, :] = box_border_color
            img[top_pixel:bottom_pixel, right_pixel - i, :] = box_border_color
    plt.imshow(img)
    plt.show()
def show_object_in_point_cloud(point_cloud_filename, label_filename, calib_filename):
    pc_data = np.fromfile(point_cloud_filename, '<f4')  # little-endian float32
    pc_data = np.reshape(pc_data, (-1, 4))
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(pc_data[:, :-1])
    pc_color = np.ones((len(pc_data), 3))
    calib = load_kitti_calib(calib_filename)
    rot_axis = 2
    with open(label_filename) as f_label:
        lines = f_label.readlines()
    bboxes_3d = []
    corners = []
    colors = []
    for line in lines:
        line = line.strip('\n').split()
        point_color = MARKER_COLOR[line[0]]
        veloc, dims, rz, box3d_corner = camera_coordinate_to_point_cloud(line[8:15], calib['Tr_velo_to_cam'])
        bboxes_3d.append(np.concatenate((veloc, dims, np.array([rz]))))
        corners.append(box3d_corner)
        colors.append(point_color)
    bboxes_3d = np.array(bboxes_3d)
    print(bboxes_3d.shape)
    line_sets = []
    for i in range(len(bboxes_3d)):
        center = bboxes_3d[i, 0:3]
        dim = bboxes_3d[i, 3:6]
        yaw = np.zeros(3)
        yaw[rot_axis] = bboxes_3d[i, 6]
        rot_mat = o3d.geometry.get_rotation_matrix_from_xyz(yaw)
        # bottom center to gravity center
        center[rot_axis] += dim[rot_axis] / 2
        box3d = o3d.geometry.OrientedBoundingBox(center, rot_mat, dim)
        line_set = o3d.geometry.LineSet.create_from_oriented_bounding_box(box3d)
        line_set.paint_uniform_color(colors[i])  # MARKER_COLOR values are already in [0, 1]
        line_sets.append(line_set)
    # color the points falling inside each box
    for i, v in enumerate(pc_data):
        for box3d_corner, point_color in zip(corners, colors):
            if point_in_cube(v[:3], box3d_corner):
                pc_color[i, :] = point_color
                break
    cloud.colors = o3d.utility.Vector3dVector(pc_color)
    o3d_vis.draw([*line_sets, cloud])
def point_in_cube(point, cube):
    z_min = np.amin(cube[:, 2], 0)
    z_max = np.amax(cube[:, 2], 0)
    if point[2] > z_max or point[2] < z_min:
        return False
    point = Point(point[:2])
    polygon = Polygon(cube[:4, :2])
    return polygon.contains(point)
def load_kitti_calib(calib_file):
    """
    This script is copied from https://github.com/AI-liu/Complex-YOLO
    """
    with open(calib_file) as f_calib:
        lines = f_calib.readlines()
    P0 = np.array(lines[0].strip('\n').split()[1:], dtype=np.float32)
    P1 = np.array(lines[1].strip('\n').split()[1:], dtype=np.float32)
    P2 = np.array(lines[2].strip('\n').split()[1:], dtype=np.float32)
    P3 = np.array(lines[3].strip('\n').split()[1:], dtype=np.float32)
    R0_rect = np.array(lines[4].strip('\n').split()[1:], dtype=np.float32)
    Tr_velo_to_cam = np.array(lines[5].strip('\n').split()[1:], dtype=np.float32)
    Tr_imu_to_velo = np.array(lines[6].strip('\n').split()[1:], dtype=np.float32)
    return {
        'P0': P0, 'P1': P1, 'P2': P2, 'P3': P3, 'R0_rect': R0_rect,
        'Tr_velo_to_cam': Tr_velo_to_cam.reshape(3, 4),
        'Tr_imu_to_velo': Tr_imu_to_velo}
def camera_coordinate_to_point_cloud(box3d, Tr):
    """
    This script is copied from https://github.com/AI-liu/Complex-YOLO
    """
    def project_cam2velo(cam, Tr):
        T = np.zeros([4, 4], dtype=np.float32)
        T[:3, :] = Tr
        T[3, 3] = 1
        T_inv = np.linalg.inv(T)
        lidar_loc_ = np.dot(T_inv, cam)
        lidar_loc = lidar_loc_[:3]
        return lidar_loc.reshape(1, 3)

    def ry_to_rz(ry):
        angle = -ry - np.pi / 2
        if angle >= np.pi:
            angle -= np.pi
        if angle < -np.pi:
            angle = 2 * np.pi + angle
        return angle

    h, w, l, tx, ty, tz, ry = [float(i) for i in box3d]
    cam = np.ones([4, 1])
    cam[0] = tx
    cam[1] = ty
    cam[2] = tz
    t_lidar = project_cam2velo(cam, Tr)
    Box = np.array([[-l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2],
                    [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2],
                    [0, 0, 0, 0, h, h, h, h]])
    rz = ry_to_rz(ry)
    rotMat = np.array([
        [np.cos(rz), -np.sin(rz), 0.0],
        [np.sin(rz), np.cos(rz), 0.0],
        [0.0, 0.0, 1.0]])
    velo_box = np.dot(rotMat, Box)
    cornerPosInVelo = velo_box + np.tile(t_lidar, (8, 1)).T
    box3d_corner = cornerPosInVelo.transpose()
    dims = np.array([l, w, h])
    # t_lidar: the x, y coordinates of the center of the object
    # box3d_corner: the 8 corners
    print(t_lidar.shape)
    return t_lidar.reshape(-1), dims, rz, box3d_corner.astype(np.float32)
if __name__ == '__main__':
    # update the paths below for your environment
    ROOT = "/media/lx/data/code/mmdetection3d/demo/data/kitti"
    IMG_DIR = f'{ROOT}/image_2'
    LABEL_DIR = f'{ROOT}/label_2'
    POINT_CLOUD_DIR = f'{ROOT}/velo'
    CALIB_DIR = f'{ROOT}/calib'
    # id of the sample to view
    file_id = 8
    img_filename = os.path.join(IMG_DIR, '{0:06d}.png'.format(file_id))
    label_filename = os.path.join(LABEL_DIR, '{0:06d}.txt'.format(file_id))
    pc_filename = os.path.join(POINT_CLOUD_DIR, '{0:06d}.bin'.format(file_id))
    calib_filename = os.path.join(CALIB_DIR, '{0:06d}.txt'.format(file_id))
    # show objects in the image
    show_object_in_image(img_filename, label_filename)
    # show objects in the point cloud
    show_object_in_point_cloud(pc_filename, label_filename, calib_filename)
The visualized result is:
The above gives a basic picture of how the KITTI dataset is used for 3D object detection tasks. A later post on multi-modal methods will cover how to combine the 2D detection boxes to identify and locate targets.