Author丨StrongerTang
Source丨https://zhuanlan.zhihu.com/p/496228909
Editor丨Computer Vision Workshop
Sharing a recent work on lane detection that I read some time ago; it was also accepted at the recently announced CVPR 2022. It was completed by Shanghai Jiao Tong University, East China Normal University, City University of Hong Kong, and SenseTime, and the code has been open-sourced.
Paper link: https://arxiv.org/abs/2203.02431
Code link: https://github.com/voldemortX/pytorch-auto-drive
Introduction
Lane detection strategies
As shown in the figure above, deep-learning-based lane detection methods can be divided into three categories: segmentation-based schemes (green in the figure), point-detection-based schemes (blue), and polynomial-curve-based schemes (yellow).
Among them, the segmentation-based and point-detection-based schemes generally perform better, but their representations are local and indirect, while the abstract coefficients (a, b, c, d) of polynomial curves are difficult to optimize. The article therefore proposes a scheme based on cubic Bézier curves, i.e. the red curve and dashed box in the figure above, since Bézier curves are easy to compute, stable, and free to transform. In addition, the authors design a feature flip fusion module based on deformable convolution to exploit the symmetry of lane lines.
The proposed scheme achieves new state-of-the-art performance on the LLAMAS lane detection benchmark while maintaining high speed (>150 FPS) and a small model size (<10 M parameters), and delivers competitive accuracy on the TuSimple and CULane datasets.
Background on Bézier curves
A Bézier curve (taking the cubic case as an example) is a smooth curve determined by four control points: a start point, an end point, and two intermediate control points. The curve passes through the start and end points, while the two intermediate points act as virtual handles that the curve does not generally touch. Moving the endpoints changes where the curve begins and ends; moving the intermediate control points (i.e. the virtual control handles) changes the curvature (degree of bending) of the curve while the start and end points stay fixed.
A Bézier curve of arbitrary order n can be expressed in the Bernstein basis as

B(t) = \sum_{i=0}^{n} \binom{n}{i} (1 - t)^{n - i} t^{i} P_i, \quad t \in [0, 1],

where the P_i are the n + 1 control points.
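As a minimal sketch of the formula above, the following numpy snippet evaluates a Bézier curve of any order from its control points (the function name and the example control points are chosen for illustration, not taken from the paper's code):

```python
import numpy as np
from math import comb

def bezier_point(control_points, t):
    """Evaluate an n-th order Bezier curve at parameter t in [0, 1].

    control_points: sequence of (x, y) control points, length n + 1.
    Uses the Bernstein-basis form:
        B(t) = sum_i C(n, i) * (1 - t)^(n - i) * t^i * P_i
    """
    pts = np.asarray(control_points, dtype=float)
    n = len(pts) - 1
    coeffs = np.array([
        comb(n, i) * (1 - t) ** (n - i) * t ** i
        for i in range(n + 1)
    ])
    return coeffs @ pts  # weighted sum of control points

# A cubic (n = 3) curve, as used in the paper: four control points.
ctrl = [(0.0, 0.0), (0.2, 0.5), (0.8, 0.5), (1.0, 1.0)]
start = bezier_point(ctrl, 0.0)  # the curve passes through the first point
end = bezier_point(ctrl, 1.0)    # ... and through the last point
mid = bezier_point(ctrl, 0.5)    # midpoint, pulled toward the two handles
```

Note that at t = 0 and t = 1 all Bernstein weights vanish except the first and last, which is why the curve is guaranteed to pass through its two endpoints.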
The article also compares Bézier curves with polynomial curves experimentally, as shown in the table below. The metrics are results on the TuSimple test set; lower is better.
Based on these experiments, the article chooses the classic cubic Bézier curve (n = 3): the experiments show that cubic order is sufficient for lane modeling and fits better than the cubic polynomial curve, which is the base equation in many previous schemes (as stated in the paper). In fact, from projects I have participated in and exchanges with peers, most schemes currently in actual mass production also use cubic polynomial curves. The article further points out that higher-order curves bring no corresponding performance gain and can instead cause instability due to the extra degrees of freedom.
The Proposed Architecture
For an input RGB image, feature extraction produces a feature map of size C x H/16 x W/16, which is fed into the feature flip fusion module; average pooling then reduces it to C x W/16, and finally a classification branch and a regression branch output the corresponding Bézier curve results.
Feature Flip Fusion
The feature flip fusion module is one of the main contributions of this paper.
By modeling lane lines as holistic curves, the paper focuses on the geometric characteristics of each lane line: thin, long, and continuous. Considering the global structure of lane lines from the perspective of a front-facing camera, the lanes of a road are spatially distributed in an approximately symmetric way, e.g. the presence of a lane line on the left often implies a corresponding lane line on the right. The article models this symmetry and designs the feature flip fusion module for this purpose.
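The core idea can be illustrated with a toy numpy sketch: mirror the feature map along the width axis and fuse it with the original, so each position also sees its symmetric counterpart. The real module additionally aligns the two maps with deformable convolution before fusing; that alignment step is omitted here:

```python
import numpy as np

def flip_fuse(feat):
    """Toy feature flip fusion (numpy sketch, no learned alignment).

    feat: (C, H, W) feature map. The map is flipped left-right and
    added to the original, so every spatial position aggregates
    information from its horizontal mirror position as well.
    """
    flipped = feat[:, :, ::-1]  # mirror along the width axis
    return feat + flipped

feat = np.arange(2 * 2 * 4, dtype=float).reshape(2, 2, 4)
fused = flip_fuse(feat)
```

Because the fusion is a sum of a map and its mirror, the toy output is exactly left-right symmetric; in the real module, the deformable-convolution alignment relaxes this to approximate symmetry.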
An auxiliary binary segmentation branch
The article also adds an auxiliary binary segmentation branch on the ResNet backbone to enhance the learning of spatial details. Experiments show that this extra branch only helps when combined with the feature flip fusion module: the localization ability of the segmentation task yields spatially more accurate feature maps, which in turn support more accurate fusion between the flipped feature maps.
This auxiliary binary segmentation branch is used only during training and is turned off at inference.
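A minimal sketch of this train-only design (the class name and the stand-in computations are hypothetical, purely to show the control flow, not the paper's implementation):

```python
import numpy as np

class LaneHeadWithAuxSeg:
    """Sketch: auxiliary segmentation output exists only in training."""

    def __init__(self, training=True):
        self.training = training

    def forward(self, feat):
        # Stand-in for the curve prediction head (always active).
        curves = feat.mean(axis=(1, 2))
        if self.training:
            # Stand-in for the binary segmentation logits; in training
            # these feed an extra segmentation loss term.
            seg_logits = feat.sum(axis=0)
            return curves, seg_logits
        # Inference: the auxiliary branch is switched off entirely,
        # so it adds no runtime cost to deployment.
        return curves, None

feat = np.ones((2, 3, 4))
curves_t, seg_t = LaneHeadWithAuxSeg(training=True).forward(feat)
curves_i, seg_i = LaneHeadWithAuxSeg(training=False).forward(feat)
```

The curve predictions are identical in both modes; only the auxiliary supervision signal differs.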
The article illustrates the impact of this design through the Grad-CAM visualization shown above, and details can be found in the original text.
Overall Loss
Because of the imbalance between positive and negative samples in lane detection datasets, a simple weighted cross-entropy loss is used for both classification and segmentation.
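As a from-scratch illustration of class weighting (a numpy sketch of binary cross-entropy with a positive-class weight, assuming a sigmoid output; the exact weighting scheme in the paper's code may differ):

```python
import numpy as np

def weighted_bce(logits, targets, pos_weight=1.0):
    """Binary cross-entropy with a weight on the positive class.

    Up-weighting positives (pos_weight > 1) compensates for the
    scarcity of positive samples relative to background.
    """
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid probabilities
    eps = 1e-12                        # numerical safety for log
    loss = -(pos_weight * targets * np.log(p + eps)
             + (1.0 - targets) * np.log(1.0 - p + eps))
    return loss.mean()

logits = np.array([2.0, -1.0, 0.5])
targets = np.array([1.0, 0.0, 1.0])
loss = weighted_bce(logits, targets, pos_weight=1.0)
```

With `pos_weight > 1`, mistakes on the rare positive class cost more than mistakes on the abundant negative class, which is the usual remedy for this kind of imbalance.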
Experiments
Results on the test sets of CULane and TuSimple. *Reproduced results in our code framework, best performance of three random runs. **Reported from reliable open-source code by the authors.
Ablation studies
Visual examples:
The article reports good results, but problems remain in scenes such as ramps and sharp turns; interested readers can run the code and see for themselves. (The four images below were produced with the article's open-source code and weights; if you are interested, you can run it yourself.)
Finally, since I have not read much lane detection work and my writing ability is limited, I welcome criticism and corrections for any mistakes.