PersFormer: a 3D lane detection method and its open-source OpenLane dataset

Source | Computer Vision Deep Learning and Autonomous Driving

The March 2022 arXiv paper "PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark" is from authors at Shanghai AI Lab and several universities.


Recently, 3D lane detection methods have emerged to address the inaccurate estimation of lane layout in many autonomous driving scenarios (uphill/downhill, bumps, etc.). Previous work struggles in complex environments because the spatial transformation between the front view and the bird's-eye view (BEV) is designed too simplistically, and because ground-truth 3D lane datasets are lacking. In response, the authors propose PersFormer (Perspective Transformer): an end-to-end monocular 3D lane detector with a Transformer-based spatial feature transformation module. Taking the camera parameters as a reference, the model generates BEV features by attending to the relevant local regions of the front view. PersFormer adopts a unified 2D/3D anchor design and adds an auxiliary task to detect 2D and 3D lanes simultaneously, which enhances feature consistency and shares the benefits of multi-task learning.

Furthermore, the paper releases one of the first large-scale real-world 3D lane datasets, called OpenLane, with high-quality annotations and scene diversity. OpenLane contains 200,000 frames, over 880,000 instance-level lanes, and 14 lane categories (single white dashed, double yellow solid, left/right curb, etc.), along with scene tags and closest-in-path object (CIPO) annotations, to encourage the development of lane detection and other industry-relevant autonomous driving methods. On the 3D lane detection task, PersFormer significantly outperforms competing baseline algorithms on the OpenLane dataset as well as on Baidu Apollo's synthetic 3D lane dataset, and it is also on par with state-of-the-art algorithms on the OpenLane 2D task.

The project webpage: https://github.com/OpenPerceptionX/OpenLane.


The figure gives an intuitive illustration of the motivation for moving lane detection from the 2D view in (a) to the BEV in (b): under the flat-ground assumption, lanes diverge/converge in the projected BEV, whereas a 3D solution that takes height into account can accurately recover the parallel topology in such cases.

[Figure: lane detection in the 2D perspective view (a) vs. in the BEV (b)]

First, the spatial feature transformation is modeled as a learning process with an attention mechanism that captures the interactions between local regions in the front-view features and between the two views (front view and BEV), enabling the generation of fine-grained BEV feature representations. The paper builds a Transformer-based module to achieve this, adopting a deformable attention mechanism to significantly reduce computation and memory, and dynamically adjusting the keys through a cross-attention module to capture salient features in local regions. Compared with a direct one-to-one transformation via inverse perspective mapping (IPM), the generated features are more representative and robust, because the module attends to the surrounding local context and aggregates relevant information.
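To make this concrete, here is a minimal sketch of one deformable cross-attention step in PyTorch: each BEV query predicts a few sampling offsets around its front-view reference point, bilinearly samples the front-view feature map at those locations, and aggregates the samples with learned weights. The module, the tensor names, and the offset scale are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableCrossAttention(nn.Module):
    def __init__(self, dim=256, n_points=4):
        super().__init__()
        self.n_points = n_points
        # For each query, predict (dx, dy) offsets and an aggregation weight per sampling point.
        self.offset_head = nn.Linear(dim, n_points * 2)
        self.weight_head = nn.Linear(dim, n_points)
        self.value_proj = nn.Conv2d(dim, dim, 1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, fv_feat):
        # queries:    (B, Nq, C)   BEV queries
        # ref_points: (B, Nq, 2)   IPM-projected reference points in the front view, normalized to [0, 1]
        # fv_feat:    (B, C, H, W) front-view feature map
        B, Nq, C = queries.shape
        offsets = self.offset_head(queries).view(B, Nq, self.n_points, 2)
        weights = self.weight_head(queries).softmax(-1)             # (B, Nq, P)

        # Sampling locations = reference point + small learned offsets, mapped to [-1, 1] for grid_sample.
        loc = ref_points.unsqueeze(2) + 0.05 * offsets.tanh()       # (B, Nq, P, 2)
        grid = loc * 2.0 - 1.0

        value = self.value_proj(fv_feat)                            # (B, C, H, W)
        sampled = F.grid_sample(value, grid, align_corners=False)   # (B, C, Nq, P)
        sampled = sampled.permute(0, 2, 3, 1)                       # (B, Nq, P, C)

        out = (weights.unsqueeze(-1) * sampled).sum(dim=2)          # (B, Nq, C)
        return self.out_proj(out)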

The figure shows the entire PersFormer pipeline. Its core is to learn the spatial feature transformation from the front view to the BEV space: by attending to the local context around each reference point, the BEV feature generated at the target point becomes more representative. PersFormer consists of a self-attention module that lets the BEV queries interact with each other, and a cross-attention module that takes key-value pairs from the IPM-located front-view features to generate fine-grained BEV features.

[Figure: the PersFormer pipeline]

The backbone network takes the resized image as input and generates multi-scale front-view features. It adopts a popular ResNet variant; these features may suffer from scale variation, occlusion, and other defects inherent to feature extraction in the front-view space. Finally, the lane detection heads predict the 2D and 3D coordinates and the lane types. The 2D and 3D detection heads are based on LaneATT and 3D-LaneNet, respectively, with modifications to their structure and anchor design.
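As a small, runnable illustration of the multi-scale front-view feature extraction described above (the backbone variant, input size, and tapped layers are assumptions for the sketch; it presumes a recent torchvision):

import torch
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor

backbone = torchvision.models.resnet34(weights=None)
# Tap feature maps at strides 8 / 16 / 32 to form the multi-scale front-view pyramid.
extractor = create_feature_extractor(
    backbone, return_nodes={"layer2": "s8", "layer3": "s16", "layer4": "s32"})

image = torch.randn(1, 3, 360, 480)        # resized input image
feats = extractor(image)
for name, f in feats.items():
    print(name, tuple(f.shape))            # e.g. s8 -> (1, 128, 45, 60)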

As shown in the figure, the keys are generated in the cross-attention: a point (x, y) in the BEV space is projected to the corresponding point (u, v) in the front view through the intermediate state (x', y'). By learning offsets, the network learns the mapping from the green rectangular region to the yellow target reference point, with the associated blue boxes serving as the Transformer keys.

[Figure: key generation in the cross-attention module]
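A hedged sketch of the IPM-style projection that locates these reference points: a BEV ground point (x, y, z=0) in the ego frame is mapped to a front-view pixel (u, v) with camera intrinsics K and extrinsics [R|t]. All numbers below are made-up illustrative values, not OpenLane calibration.

import numpy as np

def bev_to_image(x, y, K, R, t, z=0.0):
    # Project a ground point in ego coordinates (x right, y forward, z up) to pixel coordinates.
    p_cam = R @ np.array([x, y, z]) + t        # ego frame -> camera frame
    uvw = K @ p_cam                            # pinhole projection (homogeneous pixels)
    return uvw[0] / uvw[2], uvw[1] / uvw[2]    # perspective divide -> (u, v)

# Illustrative camera: focal length 1000 px, principal point (640, 360),
# mounted 1.5 m above the ground and looking straight ahead (no pitch, for simplicity).
K = np.array([[1000., 0., 640.],
              [0., 1000., 360.],
              [0., 0., 1.]])
R = np.array([[1., 0., 0.],                    # ego x (right)   -> camera x (right)
              [0., 0., -1.],                   # ego z (up)      -> camera -y (image down)
              [0., 1., 0.]])                   # ego y (forward) -> camera z (forward)
t = np.array([0., 1.5, 0.])                    # = -R @ camera_position for a camera 1.5 m high

u, v = bev_to_image(x=1.0, y=20.0, K=K, R=R, t=t)
print(u, v)                                    # a point 20 m ahead, 1 m right -> (690.0, 435.0)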

A further goal is to unify the 2D and 3D lane detection tasks and optimize them via co-learning. On the one hand, 2D lane detection in the perspective view is still of interest on its own; on the other hand, unifying the 2D and 3D tasks is natural, since the BEV features used to predict the 3D output come from their counterparts in the 2D branch.

Unified anchor design in 2D and 3D: the curated anchors (red) are first placed in the BEV space (left) and then projected to the front view (right). The offsets x_i^k and u_i^k (dashed lines) are predicted to match the ground truth (yellow and green) to the anchors. This establishes the correspondence and lets the features be optimized jointly.

[Figure: unified anchor design in 2D and 3D]
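A small sketch of the unified anchor idea under illustrative assumptions (the anchor spacing, angles, and y samples are not the paper's exact values): anchors are straight rays placed in the BEV and sampled at fixed y positions; a ground-truth lane is assigned to its nearest anchor, and the per-point lateral offsets x_i^k become the regression targets. Projecting the same anchor points into the image with the camera parameters (as in the projection sketch above) yields the corresponding 2D offsets u_i^k.

import numpy as np

y_samples = np.linspace(3, 103, 10)                        # fixed longitudinal sample positions (m)
anchor_starts = np.arange(-10, 10.5, 0.5)                  # anchor start x positions (m)
anchor_angles = np.deg2rad([-20, -10, 0, 10, 20])          # anchor heading angles

# Each anchor k is a straight ray: x_k(y) = x0 + tan(angle) * (y - y0), evaluated at the y samples.
anchors = np.stack([x0 + np.tan(a) * (y_samples - y_samples[0])
                    for x0 in anchor_starts for a in anchor_angles])   # (K, 10)

def match_and_offsets(gt_x):
    # Assign a ground-truth lane (its x at each y sample) to the closest anchor.
    dists = np.abs(anchors - gt_x).mean(axis=1)
    k = int(dists.argmin())
    return k, gt_x - anchors[k]                            # per-point offsets x_i^k

gt_x = 1.8 + 0.02 * (y_samples - 3.0)                      # a nearly straight ground-truth lane
k, offsets = match_and_offsets(gt_x)
print(k, np.round(offsets, 2))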


As an example, the table compares OpenLane with other benchmarks:

[Table: comparison of OpenLane with other benchmarks]

The challenges of building a real-world 3D lane dataset lie primarily in obtaining an accurate localization system and in handling occlusion. The paper compares several popular sensor datasets, projects 3D object annotations onto the image plane, and constructs a 3D scene graph using a learning-based or SLAM algorithm. The figure shows the comparison between OpenLane and other lane datasets in terms of annotation:

[Figure: annotation comparison between OpenLane and other lane datasets]

An example of lane marking in OpenLane:

[Figure: lane annotation example in OpenLane]

The experimental results are as follows:

[Tables: experimental results on OpenLane and the Apollo synthetic 3D lane dataset]

This article is for academic sharing only; if there is any infringement, please contact us and it will be deleted.
