Notes on the ECCV 2020 paper "Ultra Fast Structure-aware Deep Lane Detection", an ultra-fast lane detection algorithm


Preface

  With the rapid development of intelligent connectivity, artificial intelligence, and new energy technologies, autonomous driving is advancing at full speed.

  As an important component of autonomous driving, lane detection is one of its prerequisites. In recent years, lane detection has therefore become a hot topic in computer vision.

  For the task of lane line detection, there are currently two mainstream methods:

  • Traditional image processing methods

  • Deep segmentation method

  In the early days, many lane detection algorithms were based on traditional image processing. However, as research deepened, the scenes targeted by lane detection became more and more diverse, moving beyond the low-level understanding of lanes as simply "white and yellow lines".

  At present, more methods seek to detect lane positions semantically. Deep segmentation methods naturally have stronger semantic representation power than traditional image processing, so the mainstream modern approach treats lane detection as a pixel-level segmentation problem. But deep segmentation also has certain limitations for this task.

  This article first briefly introduces the limitations of deep segmentation, and then introduces the design ideas of an ultra-fast lane detection algorithm from ECCV 2020.

1. Limitations of deep segmentation

  1. Slow speed
      Because segmentation classifies pixel by pixel, every pixel in the image must be classified. Extracting the lane lines therefore requires very dense computation, which makes the method slow.

  2. Limited receptive field
      Another problem with segmentation is the receptive field. Segmentation results are generally produced by full convolution, and convolutions are fairly local operations, so the receptive field of each pixel is limited. This may not matter much in other segmentation problems, but it is a serious issue in lane detection.

2. Difficulties in current lane line detection

  Why does the local receptive field problem have such a large impact on lane detection? This is also a major difficulty of current lane detection.
The current difficulty is to find semantic lines, rather than only lines with a clear visual appearance. As shown below:

[Figure: lane lines under severe occlusion and extreme lighting]
  The lane lines in the picture are affected by illumination and occluded by surrounding vehicles, so their locations cannot be fully recognized from appearance alone. Under such severe occlusion and extreme lighting, lane recognition relies mainly on background and global information. Detecting the semantic lines in the figure therefore requires a good perception of the whole scene, i.e., a higher level of semantic analysis of lane lines.

3. Ultra-fast lane detection algorithm

  To solve the two problems of deep segmentation and to model lane lines simply, a new lane detection algorithm is proposed. The paper's innovations:

  • Using lane-line priors, a structural loss is designed to impose geometric constraints on the network output, addressing the problems mentioned above

  • Good results are obtained on CULane and TuSimple, with a detection speed above 300 FPS (on a GTX 1080 Ti)

1. Algorithm definition

  Lane detection is defined as finding the set of lane positions on certain rows of the image, i.e., row-wise position selection and classification. As shown below:

[Figure: row anchors and grid cells on the input image]
  As the figure shows, the predefined row positions are called row anchors. Each row anchor is divided into many cells, and lane detection is then described as selecting certain cells on the predefined row anchors.
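To make the formulation concrete, here is a minimal decoding sketch. The function name, the 0.5-cell offset, and the concrete numbers are my own illustrative choices, not from the paper: each row anchor yields a classification over w grid cells plus one extra class meaning "no lane on this row".

```python
import numpy as np

def decode_lane(row_logits, row_anchors, img_w, num_cells):
    """Decode one lane's row-anchor classifications into (x, y) points.

    row_logits: (h, num_cells + 1) scores per row anchor; the last
    class means "no lane on this row anchor".
    """
    points = []
    for logits, y in zip(row_logits, row_anchors):
        cell = int(np.argmax(logits))
        if cell == num_cells:          # "no lane" class selected
            continue
        # map the grid-cell index back to a pixel x-coordinate
        x = (cell + 0.5) * img_w / num_cells
        points.append((x, y))
    return points
```

A row anchor whose argmax falls on the extra class simply produces no point, which is how a lane that does not span the whole image is represented.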

2. How to solve the problem of speed

  Suppose the image to be processed is H * W. For segmentation, we must solve H * W classification problems, one per pixel. This scheme instead performs row-wise selection: it only needs to solve classification problems on h rows, where each row's classification is over w grid cells (plus one "no lane" class). The original H * W classification problems are thus reduced to h classification problems per lane. Moreover, since the row positions are set manually, h can be chosen as needed; in general h is much smaller than the image height H.
The following figure shows the comparison between this method and the lane line detection method based on segmentation.
[Figure: row-wise selection vs. segmentation]
  From the figure we can see that the segmentation algorithm must solve H * W classification problems, each (C+1)-way, while this algorithm only solves C * h classification problems, each (w+1)-way. Therefore,

  • Computational cost of the segmentation algorithm:

H * W * (C+1)

  • Computational cost of this algorithm:

C * h * (w+1)

  In general, the number of predefined row anchors and the number of grid cells are much smaller than the image size, so this method reduces the computational cost dramatically, solves the slow speed of segmentation, and greatly improves the speed of lane detection. The method can reach 300+ FPS.
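Plugging in one plausible CULane-style configuration (these concrete numbers are illustrative, not prescriptive) shows how large the saving is:

```python
# Cost comparison between per-pixel segmentation and row-wise selection.
# The numbers below (288x800 input, C = 4 lanes, h = 18 row anchors,
# w = 200 grid cells) are only one plausible configuration.
H, W, C = 288, 800, 4   # image height/width, number of lanes
h, w = 18, 200          # row anchors, grid cells per row anchor

seg_cost = H * W * (C + 1)    # per-pixel (C+1)-way classification
ours_cost = C * h * (w + 1)   # per-lane, per-row (w+1)-way classification

print(seg_cost, ours_cost, seg_cost / ours_cost)
```

Under these numbers the row-wise formulation needs roughly two orders of magnitude fewer classification outputs than segmentation.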

3. How to solve the "no-visual-clue" problem

  First, to be clear: the no-visual-clue problem means there is no usable appearance information at the target location. As shown in the figure below:
[Figure: a lane occluded by a car]
  From the figure we can see that a lane is occluded by a car, yet its position can still be determined from the other lanes, the road shape, and even the directions of the cars. So the key to solving the no-visual-clue problem is to use information from other locations.

  This paper explains how the algorithm solves the problem of "no visual clues" from two perspectives.

  • From the receptive-field perspective

  Since this algorithm does not use the fully convolutional form of segmentation but instead performs group classification with fully connected layers, the features it uses are global. When the method predicts the lane position on a given row, its receptive field is the full image, so contextual information from other locations in the image can be used to solve the no-visual-clue problem.
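As a tiny, hypothetical sketch (all dimensions are made up for illustration) of why a fully connected head has a whole-image receptive field: every output logit is a weighted sum over the entire flattened feature map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-in for a backbone feature map: (channels, height, width).
feat = rng.standard_normal((8, 9, 25))

C, h, w = 2, 4, 10   # lanes, row anchors, grid cells (illustrative only)

# Flattening destroys locality on purpose: the fully connected layer
# connects EVERY spatial position to EVERY output logit, so each
# row-anchor prediction can draw on global context.
flat = feat.reshape(-1)                          # (8*9*25,) = (1800,)
W_fc = rng.standard_normal((C * h * (w + 1), flat.size)) * 0.01
logits = (W_fc @ flat).reshape(C, h, w + 1)      # per-lane, per-row scores
```

Contrast this with a convolution, where each output only sees a small window of the input.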

  • From the learning perspective

  To incorporate prior constraints on lane lines, namely smoothness and rigidity, two loss functions are proposed in addition to the classification loss.

1. Similarity loss

  It derives from the fact that lanes are continuous: lane points on adjacent row anchors should be close to each other. Since a lane's position is represented by a classification vector, continuity is achieved by constraining the distributions of the classification vectors on adjacent row anchors. The L1 norm between the classifications of adjacent rows serves as a smoothness term, encouraging the lane positions on adjacent rows to be similar and to change smoothly.
L_sim = Σ_{j=1}^{h-1} || P_{i,j,:} − P_{i,j+1,:} ||_1

where P_{i,j,:} is the (w+1)-dimensional classification prediction for lane i on row anchor j, and the sum runs over all lanes and adjacent row anchors.
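A direct NumPy sketch of this loss; the tensor layout (C, h, w+1) and the plain sum reduction are assumptions for illustration:

```python
import numpy as np

def similarity_loss(P):
    """L_sim: sum of L1 distances between the classification vectors
    of adjacent row anchors.

    P: (C, h, w+1) array -- probabilities for C lanes, h row anchors,
    w grid cells plus a "no lane" class.
    """
    # Difference between each row anchor and the next, L1 over classes.
    return np.abs(P[:, :-1, :] - P[:, 1:, :]).sum()
```

Identical distributions on adjacent rows give zero loss; abrupt jumps in the predicted distribution are penalized.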
2. Shape loss

  This loss function focuses on the shape of the lane. Generally speaking, most lanes are straight; even curved ones appear mostly straight near the vehicle due to the perspective effect. The algorithm therefore uses a second-order difference to constrain the lane shape: for a straight line the second-order difference of the positions is zero, so penalizing its deviation from zero pushes the predicted lane toward being straight during optimization. Since taking the argmax of a classification is not differentiable, the lane location on each row anchor is computed as the expectation of the grid-cell positions under the softmax of the predictions.
Prob_{i,j,:} = softmax(P_{i,j,1:w})

Loc_{i,j} = Σ_{k=1}^{w} k · Prob_{i,j,k}

L_shp = Σ_{i=1}^{C} Σ_{j=1}^{h-2} || (Loc_{i,j} − Loc_{i,j+1}) − (Loc_{i,j+1} − Loc_{i,j+2}) ||_1
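A sketch of the shape loss under the same assumed (C, h, w+1) layout; the softmax expectation keeps the location differentiable, which an argmax would not:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shape_loss(logits):
    """L_shp: L1 norm of the second-order difference of the expected
    lane location across row anchors.

    logits: (C, h, w+1); the last ("no lane") class is dropped and the
    remaining w scores are turned into an expected grid position.
    """
    prob = softmax(logits[..., :-1], axis=-1)    # (C, h, w)
    cells = np.arange(1, prob.shape[-1] + 1)     # grid positions 1..w
    loc = (prob * cells).sum(axis=-1)            # expected location per row
    # Second-order difference: zero for a perfectly straight lane.
    second_diff = loc[:, :-2] - 2.0 * loc[:, 1:-1] + loc[:, 2:]
    return np.abs(second_diff).sum()
```

A prediction whose expected locations change linearly across row anchors (a straight line in the grid) incurs essentially zero loss.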
  The final structural loss is:
L_str = L_sim + λ · L_shp
  Because the segmentation method produces a binary lane segmentation map and models the problem pixel by pixel, it can hardly impose such high-level semantic constraints (smoothness, rigidity). This is another advantage of this algorithm.

4. Overall structure diagram

[Figure: overall network structure]

  The figure shows the main branch and the auxiliary branch: the feature extractor is shown in the blue box, the classification-based prediction in the green box, and the auxiliary segmentation task in the orange box.

  Note that to obtain global contextual information, the author adds a semantic segmentation branch during training to aggregate global and local information, and removes it at test time. In this way, even though an extra segmentation task is added, the running speed of the algorithm is not affected.
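A hypothetical skeleton of this train/test asymmetry (the class and method names are made up for illustration):

```python
# The auxiliary segmentation head only runs -- and only costs anything --
# during training; at test time only the row-anchor classifier executes.
class LaneNet:
    def __init__(self, training=True):
        self.training = training

    def forward(self, feat):
        logits = self.classify(feat)      # main row-anchor branch
        if self.training:
            seg = self.segment(feat)      # auxiliary branch, training only
            return logits, seg
        return logits                     # test time: no extra cost

    def classify(self, feat):
        return "group-classification logits"

    def segment(self, feat):
        return "per-pixel segmentation map"
```

The segmentation head thus acts purely as a training-time regularizer supplying dense supervision.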
  The final total loss is:
L_total = L_cls + α · L_str + β · L_seg

where L_cls is the group classification loss and L_seg is the auxiliary segmentation loss.

5. Algorithm shortcomings

  1. The structural loss mainly constrains the geometry of roughly vertical lane lines, so the algorithm's ability to detect lanes in other orientations is open to doubt

  2. The number of detectable lane lines has an upper limit that must be set manually (the parameter C mentioned above)


Origin blog.csdn.net/m0_46988935/article/details/109905176