Where are the most popular autonomous driving competitions? A roundup of nearly 20 autonomous driving datasets and benchmarks!


Recently, many readers have asked about autonomous driving competitions and wanted to study other teams' solutions and technology stacks. Today we take stock of the most commonly used datasets and benchmarks. If you have related work to share, please contact us at the end of the article!


1. nuScenes

Dataset link: nuScenes

The nuScenes benchmark hosts multiple tasks, including 2D/3D detection, tracking, prediction, lidar segmentation, panoptic segmentation, and planning/control.

The nuScenes dataset is a large-scale autonomous driving dataset with 3D object annotations and a standard benchmark for evaluating mainstream algorithms. Its key characteristics (a minimal devkit loading sketch follows the list):

● Full sensor suite (1 lidar, 5 radars, 6 cameras, IMU, GPS)

● 1000 scenes of 20s

● 1,400,000 camera images

● 390,000 lidar scans

● Two different cities: Boston and Singapore

● Left-hand traffic vs. right-hand traffic

● Detailed map information

● 1.4M 3D bounding boxes manually annotated for 23 object classes
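For reference, here is a minimal sketch of browsing the dataset with the official nuscenes-devkit (pip install nuscenes-devkit); the dataroot path is a placeholder you would point at your own download.

```python
from nuscenes.nuscenes import NuScenes

# Load the metadata tables (v1.0-mini is the small teaser split).
nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

# Each scene is a ~20 s sequence of keyframe "samples".
scene = nusc.scene[0]
sample = nusc.get('sample', scene['first_sample_token'])

# A sample bundles synchronized sensor data: 6 cameras, 1 lidar, 5 radars.
cam_front = nusc.get('sample_data', sample['data']['CAM_FRONT'])
lidar_top = nusc.get('sample_data', sample['data']['LIDAR_TOP'])

# Manually annotated 3D boxes (23 object classes) for this keyframe.
for token in sample['anns'][:5]:
    ann = nusc.get('sample_annotation', token)
    print(ann['category_name'], ann['translation'], ann['size'])
```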



2. KITTI

Dataset official website: The KITTI Vision Benchmark Suite (cvlibs.net)

The KITTI dataset was jointly created by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago. It is used to evaluate the in-vehicle performance of computer vision tasks such as stereo matching, optical flow, visual odometry, 3D object detection, and 3D tracking. KITTI contains real image data collected in urban, rural, and highway scenes, with up to 15 vehicles and 30 pedestrians per image and various degrees of occlusion and truncation. The dataset consists of 389 pairs of stereo images and optical flow maps, a 39.2 km visual odometry sequence, and over 200k 3D-labeled objects, sampled and synchronized at 10 Hz. The raw data is categorized as 'Road', 'City', 'Residential', 'Campus', and 'Person'. For 3D object detection, labels are subdivided into car, van, truck, pedestrian, pedestrian (sitting), cyclist, tram, and misc.

Because the amount of data is relatively small, much of today's algorithm validation has shifted to nuScenes.
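As a quick illustration of the 3D object label format (one object per line in training/label_2/*.txt, field order per the official devkit readme), here is a hedged parsing sketch; the file path is a placeholder.

```python
from dataclasses import dataclass

@dataclass
class KittiObject:
    cls: str            # car, van, truck, pedestrian, cyclist, tram, misc, ...
    truncated: float    # 0 (fully visible) .. 1 (fully truncated)
    occluded: int       # 0/1/2/3 occlusion level
    alpha: float        # observation angle
    bbox2d: tuple       # (x1, y1, x2, y2) in image pixels
    hwl: tuple          # 3D box height, width, length in meters
    xyz: tuple          # box location in camera coordinates
    rotation_y: float   # yaw around the camera Y axis

def load_kitti_labels(path: str):
    objects = []
    with open(path) as f:
        for line in f:
            v = line.split()
            objects.append(KittiObject(
                cls=v[0],
                truncated=float(v[1]),
                occluded=int(v[2]),
                alpha=float(v[3]),
                bbox2d=tuple(map(float, v[4:8])),
                hwl=tuple(map(float, v[8:11])),
                xyz=tuple(map(float, v[11:14])),
                rotation_y=float(v[14]),
            ))
    return objects

# Example (placeholder path):
# objects = load_kitti_labels('training/label_2/000000.txt')
```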


3. Waymo

Year: 2020;

Authors: Waymo LLC and Google LLC

Number of scenes: 1150 scenes in total, mainly collected from San Francisco, Mountain View, Phoenix, etc.;

Number of categories: 4 categories in total, namely Vehicles, Pedestrians, Cyclists and Signs;

Whether 360° collection: yes;

Total data: 2030 segments in total, each segment is 20 seconds long;

Total number of annotations: about 12,600,000 3D annotation boxes;

Sensor model: 1 mid-range LiDAR, 4 short-range LiDARs, and 5 cameras (front and side); the LiDARs and cameras are synchronized and calibrated;

Dataset link: https://waymo.com/open/;

Introduction: Waymo is one of the most important datasets in autonomous driving. It is large in scale and mainly supports research on perception. Waymo consists of two datasets, the Perception Dataset and the Motion Dataset. The Perception Dataset includes 3D annotations, 2D panoptic segmentation annotations, keypoint annotations, 3D semantic segmentation annotations, and more. The Motion Dataset mainly targets interaction tasks; it contains 103,354 20 s clips with annotations for different objects and the corresponding 3D map data.
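As an illustration, here is a minimal sketch of reading one Perception Dataset segment with the waymo-open-dataset package and TensorFlow; the segment filename is a placeholder.

```python
import tensorflow as tf
from waymo_open_dataset import dataset_pb2

# Placeholder path to a downloaded Perception Dataset segment.
segment = '/data/waymo/segment-XXXX_with_camera_labels.tfrecord'
dataset = tf.data.TFRecordDataset(segment, compression_type='')

for record in dataset.take(1):
    frame = dataset_pb2.Frame()
    frame.ParseFromString(record.numpy())
    # Each frame carries synchronized images from the 5 cameras and returns
    # from the 1 mid-range + 4 short-range lidars, plus 3D laser labels.
    print(frame.context.name, len(frame.images), len(frame.laser_labels))
```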


4. BDD100K

The BDD100K dataset was released in May 2018 by the Berkeley AI Research (BAIR) lab, together with an image annotation system. It contains 100,000 high-definition videos, each about 40 seconds at 720p and 30 fps. A keyframe is sampled at the 10th second of each video, yielding 100,000 images at 1280x720 resolution, which are then annotated. The dataset covers different weather conditions, scenes, and times of day, and is characterized by its large scale and diversity.

Main tasks: video understanding, drivable area, lane lines, semantic segmentation, instance segmentation, panoptic segmentation, MOT, detection, pose estimation, etc.;

Dataset link: Berkeley DeepDrive
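Below is a hedged sketch of reading the 2D detection labels; the filename and exact schema are assumptions based on the commonly distributed bdd100k_labels_images_train.json (one entry per image, with image-level attributes and a list of box labels).

```python
import json

# Placeholder path to the downloaded label file.
with open('bdd100k_labels_images_train.json') as f:
    images = json.load(f)

img = images[0]
print(img['name'], img['attributes'])          # e.g. weather, scene, timeofday
for label in img.get('labels', []):
    if 'box2d' in label:                        # 2D detection boxes
        b = label['box2d']
        print(label['category'], b['x1'], b['y1'], b['x2'], b['y2'])
```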


5. Lyft L5 Dataset

Year: 2019;

Author: Woven Planet Holdings;

Number of scenes: a total of 1805 scenes, outdoor;

Number of categories: 9 categories in total, including Car, Pedestrian, traffic lights, etc.;

Whether 360° collection: yes;

Total data: 46,000 images and corresponding point cloud data;

Total number of annotations: about 1,300,000 3D annotation boxes;

Sensor model: 2 LiDARs (40-beam and 64-beam), mounted on the roof and bumper, with 0.2° resolution, collecting about 216,000 points per sweep at 10 Hz; in addition, 6 360° cameras and 1 telephoto camera, captured at the same frequency as the LiDARs.

Dataset link: https://level-5.global/data/;

Introduction: Lyft L5 is a complete L5-level autonomous driving dataset, described as "the industry's largest public autonomous driving dataset", covering a Prediction Dataset and a Perception Dataset. The Prediction Dataset covers the various agents encountered by the autonomous test fleet along its Palo Alto route, such as cars, cyclists, and pedestrians. The Perception Dataset contains the real data collected by the fleet's LiDARs and cameras, with a large number of manually labeled 3D bounding boxes.
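The data can be browsed with the lyft-dataset-sdk, which mirrors the nuScenes devkit API; a brief sketch follows, with placeholder paths (the import path and arguments are assumptions based on that SDK).

```python
from lyft_dataset_sdk.lyftdataset import LyftDataset

# Placeholder paths to the extracted data and its JSON metadata tables.
lyft = LyftDataset(data_path='/data/lyft', json_path='/data/lyft/train_data', verbose=True)

# Same table structure as nuScenes: scenes -> samples -> annotations.
sample = lyft.get('sample', lyft.scene[0]['first_sample_token'])
for token in sample['anns'][:5]:
    print(lyft.get('sample_annotation', token)['category_name'])
```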

6. H3D dataset

Year: 2019;

Author: Honda Research Institute;

Number of scenes: a total of 160 scenes, outdoor;

Number of categories: 8 categories in total;

Whether 360° collection: No;

Total data: 27,000 images and corresponding point cloud data;

Total number of annotations: about 1,100,000 3D annotation boxes;

Sensor model: 3 Grasshopper 3 cameras with a resolution of 1920x1200 (the rear-facing camera has an 80° FOV, the other two 90°), a 64-beam Velodyne HDL-64E S2 LiDAR, and an ADMA-G GNSS+IMU;

Dataset link: http://usa.honda-ri.com/H3D;

Introduction: Honda Research Institute released its autonomous driving dataset H3D in March 2019. It provides 3D multi-object detection and tracking data collected with a 3D LiDAR scanner and contains 160 crowded, highly interactive traffic scenes with over 1 million labeled instances across 27,721 frames.

Main tasks: 3D object detection and tracking.

7. ApolloScape dataset

Year: 2019;

Author: Baidu Research;

Number of scenes: a total of 103 scenes, outdoor;

Number of categories: 26 categories in total, including small vehicles, big vehicles, pedestrian, motorcyclist, etc.;

Whether 360° collection: No;

Total data: 143,906 images and corresponding point cloud data;

Total number of annotations: unknown;

Sensor model: 2 VUX-1HA laser scanners, 6 VMX-CS6 cameras (the two front cameras have a resolution of 3384x2710), and an IMU/GNSS device. The laser scanners sweep the surroundings with two laser beams and, compared with the commonly used Velodyne HDL-64E, produce denser point clouds with higher accuracy (5 mm/3 mm);

Dataset link: http://apolloscape.auto/index.html;

Introduction: ApolloScape consists of RGB videos and corresponding dense point clouds. It contains more than 140K images, each with pixel-level semantic annotation. The data was collected in China, so compared with some foreign datasets ApolloScape contains more complex traffic scenes and a large number of diverse targets. Like KITTI, it is divided into Easy, Moderate, and Hard subsets.

The main tasks include: lane line, positioning, trajectory prediction, detection, tracking, binocular, scene recognition, etc.;


8. Argoverse dataset

Year: 2019;

Author: Argo AI, etc.;

Number of scenes: 113 scenes in total, outdoor, collected in Pittsburgh, Pennsylvania and Miami, Florida in the USA;

Number of categories: 15 categories in total, including Vehicle, Pedestrian, Stroller, Animal, etc.;

Whether 360° collection: yes;

Total data: 44,000 images and corresponding point cloud data;

Total number of annotations: about 993,000 3D annotation boxes;

Sensor model: similar to KITTI and nuScenes, the Argoverse dataset is equipped with two 32-beam VLP-32 LiDAR sensors, along with 7 high-resolution surround-view cameras (1920x1200) and 2 front-facing cameras (2056x2464);

Dataset link: https://www.argoverse.org/;

Main tasks: 3D tracking, motion prediction and other tasks

Introduction: The data in Argoverse comes from a subset of the areas in Miami and Pittsburgh where Argo AI's self-driving test vehicles operate, two U.S. cities with distinct urban driving challenges and local driving habits. It includes sensor recordings ("log segments") across different seasons, weather conditions, and times of day to provide a wide range of real-world driving scenarios. It contains 3D tracking annotations for 113 scenes, each 15-30 seconds long, covering a total of 11,052 tracked objects; about 70% of the annotated objects are vehicles, and the rest are pedestrians, bicycles, motorcycles, etc. In addition, Argoverse provides HD map data covering 290 km of lanes in Pittsburgh and Miami, including lane location, connectivity, traffic signals, elevation, and related information.


9. Argoverse 2 dataset

Argoverse 2 is a collection of open-source autonomous driving data and high-definition (HD) maps from six U.S. cities: Austin, Detroit, Miami, Pittsburgh, Palo Alto, and Washington, DC. This release builds on the original Argoverse ("Argoverse 1"), one of the first data releases to include HD maps for machine learning and computer vision research.

Argoverse 2 includes four open source datasets:

Argoverse 2 Sensor Dataset: Contains 1000 3D annotated scenes with lidar, stereo and ring camera images. This dataset improves upon the Argoverse 1 3D tracking dataset;

Argoverse 2 Motion Prediction Dataset: Contains 250,000 scenes with trajectory data for many object types. This dataset improves upon the Argoverse 1 motion prediction dataset;

Argoverse 2 Lidar Dataset: contains 20,000 unlabeled lidar sequences;

Argoverse 2 Map Changes Dataset: Contains 1000 scenes, 200 of which describe real-world HD map changes!

The Argoverse 2 datasets share a common HD map format that is richer than the HD maps in Argoverse 1. The Argoverse 2 datasets also share a common API that allows users to easily access and visualize data and maps.


10. Occ3D

Produced by Tsinghua University and NVIDIA, this is the first large-scale 3D occupancy prediction benchmark!

Dataset link: Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving (tsinghua-mars-lab.github.io)

The authors built two 3D occupancy prediction datasets, Occ3D-nuScenes and Occ3D-Waymo. Occ3D-nuScenes contains 600 scenes for training, 150 for validation, and 150 for testing, totaling 40,000 frames. It has 16 common classes plus an additional Generic Object (GO) class. Each sample covers a range of [-40m, -40m, -1m, 40m, 40m, 5.4m] with a voxel size of [0.4m, 0.4m, 0.4m]. Occ3D-Waymo contains 798 sequences for training and 202 for validation, accumulating 200,000 frames. It has 14 known object classes plus an additional GO class. Each sample covers a range of [-80m, -80m, -1m, 80m, 80m, 5.4m] with a very fine voxel size of [0.05m, 0.05m, 0.05m].
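As a small sanity check (not code from the benchmark itself), the occupancy grid shape implied by the quoted range and voxel size can be derived as follows:

```python
import numpy as np

def grid_shape(pc_range, voxel_size):
    """pc_range = [x_min, y_min, z_min, x_max, y_max, z_max]."""
    lo, hi = np.array(pc_range[:3]), np.array(pc_range[3:])
    return tuple(np.round((hi - lo) / np.array(voxel_size)).astype(int))

# Occ3D-nuScenes: [-40, -40, -1, 40, 40, 5.4] m at 0.4 m voxels -> (200, 200, 16)
print(grid_shape([-40, -40, -1, 40, 40, 5.4], [0.4, 0.4, 0.4]))
```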


11. nuPlan

nuPlan is the world's first large-scale planning benchmark for autonomous driving. Although there is a growing body of ML-based motion planners, the lack of established datasets, simulation frameworks, and metrics has limited progress in the field. Existing benchmarks for autonomous-vehicle motion prediction (Argoverse, Lyft, Waymo) focus on short-term motion prediction of other agents rather than long-term planning for the ego vehicle. As a result, previous work relied on L2-based metrics for open-loop evaluation, which are not suitable for fairly evaluating long-term planning. nuPlan overcomes these limitations by providing a training framework for developing machine-learning-based planners, a lightweight closed-loop simulator, metrics specific to motion planning, and an interactive tool for visualizing results.

nuPlan provides a large-scale dataset containing 1200 hours of human driving data from 4 cities in the US and Asia (Boston, Pittsburgh, Las Vegas, and Singapore), automatically labeled with a state-of-the-art offline perception system. In contrast to existing datasets of this size, it releases not only the 3D boxes of detected objects but also 10% of the raw sensor data (120 h).

Dataset link: nuPlan (nuscenes.org)


12. ONCE (One Million Scenes)

● Publisher: Huawei

● Release date: 2021

● Introduction: ONCE (One millioN sCenEs) is a 3D object detection dataset for autonomous driving scenarios. It consists of 1 million LiDAR scenes and 7 million corresponding camera images, drawn from 144 driving hours, 20 times longer than other available 3D autonomous driving datasets such as nuScenes and Waymo, and collected across a range of regions, time periods, and weather conditions. It comprises: 1 million LiDAR frames and 7 million camera images; a 200 km² driving area and 144 driving hours; 15k fully annotated scenes with 5 categories (cars, buses, trucks, pedestrians, cyclists); and diverse environments (day/night, sunny/rainy, urban/suburban).

● Download address: https://opendatalab.org.cn/ONCE

● Paper address: https://arxiv.org/pdf/2106.1103

13. Cityscapes

● Publisher: TU Darmstadt · Max Planck Institute for Informatics

● Release time: 2016

● Introduction: Cityscapes is a large database focused on semantic understanding of urban street scenes. It provides semantic, instance, and dense pixel annotations for 30 classes grouped into 8 categories (flat, human, vehicle, construction, object, nature, sky, and void). The dataset consists of approximately 5,000 finely annotated images and 20,000 coarsely annotated images. Data was captured in 50 cities over several months, during daytime and in good weather. It was originally recorded as video, so frames were manually selected to feature many dynamic objects, varying scene layouts, and varying backgrounds (see the small label-loading sketch after this entry).

● Download link: https://opendatalab.org.cn/CityScapes

● Paper address: https://arxiv.org/pdf/1604.0168
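The label-loading sketch mentioned above: a hedged example using the official cityscapesscripts helpers (pip install cityscapesscripts) to map a gtFine *_labelIds.png annotation to training IDs; the file name is a placeholder.

```python
import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import labels  # official Label definitions

# Map the stored label IDs to the train IDs used for evaluation.
id_to_trainid = {label.id: label.trainId for label in labels}

label_img = np.array(Image.open('aachen_000000_000019_gtFine_labelIds.png'))
train_ids = np.vectorize(id_to_trainid.get)(label_img)
print(np.unique(train_ids))   # the 19 evaluated classes plus the 255 ignore id
```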

14. YouTube Driving Dataset

● Publisher: The Chinese University of Hong Kong · University of California

● Release time: 2022

● Introduction: First-person driving videos were crawled from YouTube: 134 videos with a total length of more than 120 hours. The videos cover diverse driving scenarios with various weather conditions (sunny, rainy, snowy, etc.) and regions (rural and urban areas). One frame is sampled per second, resulting in a dataset of 1.3 million frames. The dataset is split into a training set with 70% of the data and a test set with 30%, and ACO training is performed on the training set.

● Download address: https://opendatalab.org.cn/YouTube_Driving_Dataset

● Paper address: https://arxiv.org/pdf/2204.02393.pdf

15. A2D2

● Publisher: Audi

● Release time: 2020

● Introduction: Audi released the Audi Autonomous Driving Dataset (A2D2) to support startups and academic researchers working on autonomous driving. Equipping vehicles with multimodal sensor suites, recording large datasets, and labeling them is time-consuming and laborious; A2D2 removes this high barrier to entry and lets researchers and developers focus on developing new technologies. The dataset provides 2D semantic segmentation, 3D point clouds, 3D bounding boxes, and vehicle bus data.

● Download address: https://opendatalab.org.cn/A2D2

● Paper address: https://arxiv.org/pdf/2004.0632

16. Cam2BEV

● Publisher: RWTH Aachen University

● Release time: 2020

This dataset contains two synthetic, semantically segmented subsets of road-scene images that were created for and used by the method described in the paper "A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird's Eye View". The dataset is available through the official code implementation of the Cam2BEV method on GitHub.

Dataset link: Cam2BEV-OpenDataLab

17. SemanticKITTI

● Publisher: University of Bonn

● Release time: 2019

This is a large-scale dataset based on the KITTI Vision Benchmark, using all sequences provided by the odometry task. Dense annotations are provided for every individual scan of sequences 00-10, which enables the use of multiple sequential scans for semantic scene interpretation tasks such as semantic segmentation and semantic scene completion. The remaining sequences, 11-21, are used as the test set and contain a large number of challenging traffic situations and environment types. Labels for the test set are not released; an evaluation server scores submissions and provides test-set results. (A small scan/label parsing sketch follows at the end of this entry.)

● Download address: https://opendatalab.org.cn/SemanticKITTI

● Paper address: https://arxiv.org/pdf/1904.0141
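The scan/label parsing sketch mentioned above: per the dataset documentation, velodyne/*.bin stores float32 (x, y, z, remission) points and labels/*.label stores one uint32 per point, with the lower 16 bits holding the semantic class and the upper 16 bits the instance id; paths are placeholders.

```python
import numpy as np

# Placeholder paths into an extracted SemanticKITTI sequence.
points = np.fromfile('sequences/00/velodyne/000000.bin', dtype=np.float32).reshape(-1, 4)
labels = np.fromfile('sequences/00/labels/000000.label', dtype=np.uint32)

semantic = labels & 0xFFFF        # semantic class id per point
instance = labels >> 16           # instance id per point
assert points.shape[0] == labels.shape[0]
print(points.shape, np.unique(semantic)[:10])
```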

18. OpenLane

● Publisher: Shanghai Artificial Intelligence Laboratory · Shanghai Jiao Tong University · SenseTime Research Institute

● Release time: 2022

OpenLane is the first real-world, and to date the largest, 3D lane dataset. It builds on the public Waymo Open Dataset perception data and provides lane and closest-in-path object (CIPO) annotations for 1000 road segments. In short, OpenLane has 200K frames and over 880K carefully annotated lanes. The dataset is publicly available to help the research community advance 3D perception and autonomous driving technologies.

● Download address: https://opendatalab.org.cn/OpenLane

● Paper address: https://arxiv.org/pdf/2203.11089.pdf

19. OpenLane-V2

● Publisher: Shanghai Artificial Intelligence Laboratory

● Release time: 2023

The world's first road-structure perception and reasoning benchmark for autonomous driving. Its first task is scene structure perception and reasoning, which requires the model to recognize the drivable state of lanes in the surrounding environment. The dataset's tasks include not only lane centerline and traffic element detection, but also recognition of the topological relationships among detected objects.

● Download link: https://opendatalab.org.cn/OpenLane-V2

