Introduction to the KITTI dataset
The KITTI dataset was created jointly by the Karlsruhe Institute of Technology (KIT) in Germany and the Toyota Technological Institute at Chicago (TTI-C). It is a public dataset recorded in real traffic scenes with a fully equipped recording vehicle. It contains rich and diverse sensor data (stereo cameras, a 64-beam lidar, and an integrated GPS/IMU navigation and positioning system, which together cover most needs for image, point cloud, and positioning data), a large amount of annotated ground truth (including 2D and 3D detection bounding boxes and tracking tracklets), and some official development tools.
Data collection
Collection range
The data was collected in Karlsruhe, Germany. The schematic diagram of the collection area is as follows:
Collection platform
The schematic diagram of the collection platform is as follows:
Collection platform parameters:
The platform is a VW Passat station wagon that houses a PC with two six-core Intel Xeon X5650 processors and shock-absorbed RAID 5 hard disk storage with a capacity of 4 terabytes. The computer runs Ubuntu Linux (64-bit) and a real-time database to store the incoming data streams.
Sensor list:
- 2 × PointGray Flea2 grayscale cameras (FL2-14S3M-C), 1.4 Megapixels, 1/2” Sony ICX267 CCD, global shutter
- 2 × PointGray Flea2 color cameras (FL2-14S3C-C), 1.4 Megapixels, 1/2” Sony ICX267 CCD, global shutter
- 4 × Edmund Optics lenses, 4 mm, opening angle ~90°, vertical opening angle of region of interest (ROI) ~35°
- 1 × Velodyne HDL-64E rotating 3D laser scanner, 10 Hz, 64 beams, 0.09° angular resolution, 2 cm distance accuracy, collecting ~1.3 million points/second, field of view: 360° horizontal, 26.8° vertical, range: 120 m
- 1 × OXTS RT3003 inertial and GPS navigation system, 6 axis, 100 Hz, L1/L2 RTK, resolution: 0.02 m / 0.1°
Note: The definition of the coordinate systems in Fig. 3 is crucial for visualizing and analyzing the data later on, and for understanding and using the calibration matrices.
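As a rough aid, the axis conventions given in the KITTI paper can be written down in a few lines: the camera frame uses x = right, y = down, z = forward, while the Velodyne and GPS/IMU frames use x = forward, y = left, z = up. The helper below is a hypothetical name, not part of the official tools, and it only relabels axes. It ignores the translation and small rotation between the sensors, which must come from the calibration files.

```python
def velo_axes_to_cam_axes(point):
    """Relabel a Velodyne-frame point (x forward, y left, z up)
    using camera-frame axes (x right, y down, z forward).

    Illustrative only: a real conversion also applies the
    calibrated rotation and translation between the two sensors.
    """
    x_fwd, y_left, z_up = point
    return (-y_left, -z_up, x_fwd)


# A point 10 m ahead, 2 m to the left, 1 m above the lidar origin
print(velo_axes_to_cam_axes((10.0, 2.0, 1.0)))
```

In camera axes the same point has negative x (it lies to the left) and negative y (it lies above), and its depth z equals the forward distance.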
Data organization
Sample image display
This section mainly describes how the raw data is organized. The raw data was collected on September 26, 28, 29, and 30 and October 3, 2011. It contains 180 GB of data in total, divided into four sequence categories: Road, City, Residential, and Person.
The sample image is as follows:
Because the car hood and part of the sky have been cropped out, the image above is noticeably short in height relative to its width.
Data storage structure
For each of the above sequences, the dataset provides raw sensor data, 3D bounding boxes of objects, and calibration files. The directory structure is as follows:
Here, image_00 to image_03 hold the image sequences collected by the four cameras, stored in 8-bit PNG format. The oxts folder stores the GPS/IMU data, with 30 different GPS/IMU values recorded for each image frame. The velodyne_points folder stores the lidar data. date_drive_tracklets.zip stores the tracklet data, and date_calib.zip stores the calibration data. Note that the hardware was calibrated each day before data collection began.
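The layout above can be exercised with a minimal reading sketch. It rests on several assumptions that are not stated in this article: each sensor folder contains a data subfolder with files named by a 10-digit frame index, each lidar scan is a flat binary file of little-endian float32 values in (x, y, z, reflectance) order, and each GPS/IMU record is one line of whitespace-separated numbers. The function names are illustrative and not part of the official development tools.

```python
import struct
from pathlib import Path


def frame_paths(drive_dir, frame_index):
    """Build per-frame file paths inside one drive folder.

    Assumes the layout <sensor>/data/<10-digit index>.<ext>.
    """
    stem = f"{frame_index:010d}"
    drive_dir = Path(drive_dir)
    return {
        "image_00": drive_dir / "image_00" / "data" / f"{stem}.png",
        "oxts": drive_dir / "oxts" / "data" / f"{stem}.txt",
        "velodyne_points": drive_dir / "velodyne_points" / "data" / f"{stem}.bin",
    }


def load_velodyne_bin(path):
    """Read a scan as a list of (x, y, z, reflectance) tuples.

    Assumes consecutive little-endian float32 values, 16 bytes per point.
    """
    data = Path(path).read_bytes()
    usable = len(data) - len(data) % 16  # drop any trailing partial point
    return list(struct.iter_unpack("<4f", data[:usable]))


def load_oxts(path):
    """Read one GPS/IMU record and check that it holds 30 values."""
    values = [float(token) for token in Path(path).read_text().split()]
    if len(values) != 30:
        raise ValueError(f"expected 30 GPS/IMU values, got {len(values)}")
    return values
```

A caller would pass the path of one extracted drive folder to frame_paths and then hand the resulting paths to the two loaders. Images can be opened with any PNG-capable library.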
Data labels
For all moving objects in the field of view, the dataset provides 3D bounding box labels in Velodyne coordinates. Label categories include Car, Van, Truck, Pedestrian, Person (sitting), Cyclist, Tram, and Misc (e.g., Trailers, Segways).
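To make the 3D box labels more concrete, the sketch below computes the eight corners of one box in Velodyne coordinates. It assumes a parameterization often used for tracklet-style labels: height, width, and length in metres, a translation (tx, ty, tz) to the centre of the box's bottom face, and a yaw angle rz about the vertical z axis. The Box3D class and box_corners function are illustrative names, not the official development kit API.

```python
import math
from dataclasses import dataclass

# Category names as listed in the text above
LABEL_CATEGORIES = (
    "Car", "Van", "Truck", "Pedestrian",
    "Person (sitting)", "Cyclist", "Tram", "Misc",
)


@dataclass
class Box3D:
    category: str
    h: float   # height along z (up)
    w: float   # width along y (left)
    l: float   # length along x (forward)
    tx: float  # bottom-face centre, Velodyne x
    ty: float  # bottom-face centre, Velodyne y
    tz: float  # bottom-face centre, Velodyne z
    rz: float  # yaw about the z axis, in radians


def box_corners(box):
    """Return the 8 corners: 4 on the bottom face, then 4 on the top face."""
    if box.category not in LABEL_CATEGORIES:
        raise ValueError(f"unknown category: {box.category}")
    c, s = math.cos(box.rz), math.sin(box.rz)
    footprint = (
        (box.l / 2, box.w / 2), (box.l / 2, -box.w / 2),
        (-box.l / 2, -box.w / 2), (-box.l / 2, box.w / 2),
    )
    corners = []
    for dz in (0.0, box.h):
        for dx, dy in footprint:
            # Rotate the offset about z, then translate to the box position
            corners.append((box.tx + c * dx - s * dy,
                            box.ty + s * dx + c * dy,
                            box.tz + dz))
    return corners
```

For example, box_corners(Box3D("Car", 1.5, 2.0, 4.0, 10.0, 0.0, -1.7, 0.0)) gives the corners of an unrotated car whose bottom face is centred 10 m in front of the lidar.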
Using the development tools provided with the dataset, the labels can be visualized as shown below:
Development tools
The official KITTI website provides many practical development tools. Interested readers can consult the readme file shipped with them.
Benchmark
The KITTI dataset provides benchmarks for multiple computer vision tasks, such as 3D object detection, object tracking, and SLAM. For details, see the official KITTI website.
Introduction to proper nouns
- IMU: Inertial Measurement Unit
- GPS: Global Positioning System
- PointGray: Point Grey, the manufacturer of the Flea2 cameras
- Megapixels: millions of pixels, a measure of sensor resolution
- Edmund Optics lenses: lenses made by Edmund Optics
- global shutter: a sensor mode in which all pixels are exposed at the same time
- opening angle: the angular field of view of a lens
- Velodyne: the manufacturer of the lidar (laser scanner)
- field of view: the angular extent of the scene that a sensor can observe
Related basic knowledge
- Note that the color cameras lack in terms of resolution due to the Bayer pattern interpolation process and are less sensitive to light. This is the reason why we use two stereo camera rigs, one for grayscale and one for color [2].