Introduction to Computer Vision Datasets: KITTI Dataset

Introduction to the KITTI dataset

The KITTI dataset was jointly created by the Karlsruhe Institute of Technology (KIT) in Germany and the Toyota Technological Institute at Chicago (TTI-C). It is a public dataset recorded in real traffic scenes with a purpose-built, well-equipped recording vehicle. The dataset contains rich and diverse sensor data (stereo cameras, a 64-beam lidar, and an integrated GPS/IMU navigation and positioning system, which together cover the need for image, point cloud and positioning data), a large amount of annotated ground truth (including 2D and 3D detection bounding boxes and tracking tracklets), and some official development tools.

Data collection

Collection range

The data was collected in Karlsruhe, Germany. The schematic diagram of the collection area is as follows: [Figure: KITTI dataset collection range]

Collection platform

The schematic diagram of the collection platform is as follows:


Collection platform parameters:

The platform is a VW Passat station wagon that houses a PC with two six-core Intel Xeon X5650 processors and shock-absorbed RAID 5 hard disk storage with a capacity of 4 terabytes. The computer runs Ubuntu Linux (64-bit) and a real-time database to store the incoming data streams.

Sensor list:

2 × PointGray Flea2 grayscale cameras (FL2-14S3M-C), 1.4 Megapixels, 1/2" Sony ICX267 CCD, global shutter
2 × PointGray Flea2 color cameras (FL2-14S3C-C), 1.4 Megapixels, 1/2" Sony ICX267 CCD, global shutter
4 × Edmund Optics lenses, 4 mm, opening angle ~90°, vertical opening angle of region of interest (ROI) ~35°
1 × Velodyne HDL-64E rotating 3D laser scanner, 10 Hz, 64 beams, 0.09° angular resolution, 2 cm distance accuracy, collecting ~1.3 million points/second, field of view: 360° horizontal, 26.8° vertical, range: 120 m
1 × OXTS RT3003 inertial and GPS navigation system, 6 axis, 100 Hz, L1/L2 RTK, resolution: 0.02 m / 0.1°
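
As a rough sanity check on the Velodyne figures in the list above, the point budget per scan can be estimated from the listed scan rate and point rate. This is only back-of-the-envelope arithmetic under the assumption that the points are spread evenly over revolutions and beams; the function name is illustrative, and real scans may contain fewer points because beams without a return produce no point.

```python
def points_per_scan(points_per_second: int, scan_rate_hz: int, beams: int):
    """Estimate points per full revolution and per beam (even spread assumed)."""
    per_scan = points_per_second // scan_rate_hz
    per_beam = per_scan / beams
    return per_scan, per_beam


# Figures taken from the sensor list: ~1.3 million points/s, 10 Hz, 64 beams.
per_scan, per_beam = points_per_scan(1_300_000, 10, 64)
print(per_scan, per_beam)  # 130000 2031.25
```

So each 10 Hz scan holds on the order of 130,000 points, which is a useful number to keep in mind when sizing buffers for the lidar data.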


Note: The definition of the coordinate systems in Fig. 3 is crucial for visualizing and analyzing the data later on, and for understanding and using the calibration matrices.
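
To make the note concrete, here is a minimal sketch of how the axis conventions differ between the lidar and the cameras. It assumes the conventions commonly documented for KITTI (Velodyne: x forward, y left, z up; camera: x right, y down, z forward), which this article does not spell out. It only permutes axes and ignores the calibrated rotation and translation between the sensors, so it is not a substitute for the calibration matrices. The names `VELO_TO_CAM_AXES` and `velo_to_cam_axes` are illustrative.

```python
import numpy as np

# Assumed axis conventions (not stated in this article):
#   Velodyne: x forward, y left, z up
#   Camera:   x right,   y down, z forward
VELO_TO_CAM_AXES = np.array([
    [0.0, -1.0, 0.0],   # camera x (right)   = -velodyne y (left)
    [0.0, 0.0, -1.0],   # camera y (down)    = -velodyne z (up)
    [1.0, 0.0, 0.0],    # camera z (forward) =  velodyne x (forward)
])


def velo_to_cam_axes(points_velo: np.ndarray) -> np.ndarray:
    """Re-express (N, 3) points in camera-style axes (axis swap only)."""
    return points_velo @ VELO_TO_CAM_AXES.T


# A point 10 m straight ahead of the lidar lies along the camera's z axis.
print(velo_to_cam_axes(np.array([[10.0, 0.0, 0.0]])))
```

In practice the real transform comes from the calibration files; this sketch is only meant to show why mixing up the two frames produces clouds that look rotated by 90 degrees.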

Data organization

Sample image display

This section mainly describes how the raw data is organized. The raw data was collected on September 26, 28, 29 and 30 and on October 3, 2011. It contains about 180 GB of data in total, divided into four categories of sequences: Road, City, Residential and Person. Sample images are shown below:
Because the car hood and part of the sky have been cropped out, the images above are relatively small in height.

Data storage structure

For each of the above sequences, the dataset provides the raw sensor data, 3D bounding boxes of the objects, and calibration files. The directory structure is as follows:

Here, image_00 to image_03 contain the image sequences recorded by the four cameras, stored as 8-bit PNG files. The oxts folder stores the GPS/IMU data, with 30 different GPS/IMU values recorded for each image frame. The velodyne_points folder stores the lidar data. date_drive_tracklets.zip stores the tracklet data, and date_calib.zip stores the calibration data. Note that the collectors calibrated the hardware every day before recording started.
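
The following is a minimal sketch of how the lidar and GPS/IMU files might be read. It rests on two assumptions that this article does not state: that each lidar scan is a flat binary file of 32-bit floats with four values per point (x, y, z, reflectance), and that each OXTS record is one line of whitespace-separated text. The 30-value check follows the description above. The function names are illustrative and are not part of the official development tools.

```python
import numpy as np


def load_velodyne_bin(path):
    """Load one lidar scan as an (N, 4) array.

    Assumes a flat float32 file with x, y, z, reflectance per point.
    """
    scan = np.fromfile(path, dtype=np.float32)
    return scan.reshape(-1, 4)


def parse_oxts_line(line):
    """Parse one GPS/IMU text record into a list of 30 floats.

    Assumes whitespace-separated values on a single line.
    """
    values = [float(v) for v in line.split()]
    if len(values) != 30:
        raise ValueError(f"expected 30 values, got {len(values)}")
    return values
```

A call such as `load_velodyne_bin("velodyne_points/data/0000000000.bin")` would then return the point array for one frame; the exact file name pattern here is an assumption and should be checked against the downloaded data.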

Data labels

For all moving objects in the field of view, the dataset provides 3D bounding box labels in Velodyne coordinates. Label categories include Car, Van, Truck, Pedestrian, Person (sitting), Cyclist, Tram and Misc (e.g., Trailers, Segways).
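
To illustrate what a 3D bounding box in Velodyne coordinates amounts to geometrically, here is a hedged sketch that computes the eight corners of a box. The parameterization is an assumption made for illustration (position of the centre of the bottom face, length/width/height, and a yaw angle about the vertical z axis); the article itself does not specify how the labels are encoded, and `box_corners_velo` is a hypothetical helper.

```python
import numpy as np


def box_corners_velo(center_bottom, length, width, height, yaw):
    """Return the (8, 3) corners of a yaw-rotated 3D box.

    center_bottom: (x, y, z) of the centre of the box's bottom face.
    yaw: rotation about the vertical z axis, in radians.
    Assumed parameterization; check the dataset readme for the real one.
    """
    l2, w2 = length / 2.0, width / 2.0
    # Corners in the box's own frame: bottom face first, then top face.
    x = np.array([l2, l2, -l2, -l2, l2, l2, -l2, -l2])
    y = np.array([w2, -w2, -w2, w2, w2, -w2, -w2, w2])
    z = np.array([0.0, 0.0, 0.0, 0.0, height, height, height, height])
    c, s = np.cos(yaw), np.sin(yaw)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    corners = rot_z @ np.vstack([x, y, z])
    return corners.T + np.asarray(center_bottom, dtype=float)
```

For example, `box_corners_velo((10.0, 0.0, -1.7), 4.0, 1.8, 1.5, 0.0)` would describe a car-sized box 10 m ahead of the sensor under these assumptions; the numbers are made up for illustration.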

Through the development tools provided by the dataset, you can see the data labels as shown below:


Development tools

The official KITTI website provides many practical development tools; interested readers can consult the readme file included with them.

Benchmark

The KITTI dataset provides benchmarks for multiple computer vision tasks, such as 3D object detection, object tracking and SLAM. For details, see the official KITTI website.

Introduction to proper nouns

IMU: Inertial Measurement Unit
GPS: Global Positioning System
PointGray: Point Grey, the camera manufacturer
Megapixels: millions of pixels (image sensor resolution)
Edmund Optics lenses: lenses made by Edmund Optics
global shutter: all pixels of the sensor are exposed at the same time
opening angle: the angle of view covered by the lens
Velodyne: Velodyne (lidar brand)
field of view: the angular extent a sensor can observe

Related basic knowledge

Note that the color cameras lack in terms of resolution due to the Bayer pattern interpolation process and are less sensitive to light. This is the reason why we use two stereo camera rigs, one for grayscale and one for color [2].