Lane Line Detection: Overview of Traditional and Deep Learning Methods, Plus a Guided Reading of Two Papers (LaneATT and LaneNet)

Lane line detection is a fundamental module in autonomous driving, supporting lane keeping, adaptive cruise control, and automatic lane changes; it is also important for downstream decisions such as lane departure warning and trajectory planning in fully autonomous vehicles.

At present, there are two main families of lane line detection approaches: traditional methods and deep learning.

1. Traditional methods

(1) Edge detection + Hough transform
Method flow: convert the color image to grayscale → blur → edge detection → Hough transform.
In simple scenes, this method can usually detect the two lane lines of the ego lane, and occasionally the adjacent lanes (depending on the angle of the forward-looking camera). The Hough transform output (the slope of each line) can be used to further filter out the left and right lane lines. However, the method also depends on the edge-detection result, so parameter tuning (edge detection, Hough transform) and other tricks (ROI selection, etc.) are very important.
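A minimal sketch of this pipeline in OpenCV (the Canny thresholds, Hough parameters, and slope cutoffs are illustrative assumptions, not tuned values):

```python
import cv2
import numpy as np

def detect_lane_lines(bgr_image):
    """Grayscale -> blur -> Canny edges -> probabilistic Hough transform."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=20)
    left, right = [], []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            if x1 == x2:
                continue  # skip vertical segments
            slope = (y2 - y1) / (x2 - x1)
            # Filter by slope sign: negative -> left line, positive -> right
            if slope < -0.5:
                left.append((x1, y1, x2, y2))
            elif slope > 0.5:
                right.append((x1, y1, x2, y2))
    return left, right
```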

(2) Color threshold
Method flow: convert the image to another color space (usually HSV), set a threshold for each channel in the new color space (values above the threshold become 1, values below become 0), and combine the channel results.
This method depends on the choice of each channel's threshold, and only a few threshold parameters need tuning, but robustness is sometimes poor; for example, vehicles in front of the ego vehicle may also end up set to 1.
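A minimal sketch of HSV thresholding for white and yellow lane paint (the ranges are illustrative assumptions and usually need per-dataset tuning):

```python
import cv2

def threshold_lane_colors(bgr_image):
    """Binarize white and yellow lane paint in HSV space."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    white_mask = cv2.inRange(hsv, (0, 0, 200), (180, 30, 255))
    yellow_mask = cv2.inRange(hsv, (15, 80, 120), (35, 255, 255))
    return cv2.bitwise_or(white_mask, yellow_mask)  # 255 = lane paint, 0 = rest
```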

(3) Perspective transformation
Method flow: obtain the perspective transformation matrix → apply the perspective transform → detect lane lines.
The advantage of this method is that the image captured by the front-view camera can be converted into a bird's-eye view, in which multiple lines can be detected. The key (setting aside lane detection in the converted view) is the accuracy of the perspective transformation matrix. In the converted bird's-eye view, lane lines can be detected with either of the two methods above.
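A minimal sketch of the transform in OpenCV (the source/destination coordinates are illustrative assumptions; in practice they come from calibration or are hand-picked for a specific camera setup):

```python
import cv2
import numpy as np

# Four points on the road in the camera image and where they should land
# in the bird's-eye view (coordinates are illustrative, for a 1280x720 image).
src = np.float32([[580, 460], [700, 460], [1040, 680], [240, 680]])
dst = np.float32([[260, 0], [1020, 0], [1020, 720], [260, 720]])

M = cv2.getPerspectiveTransform(src, dst)
M_inv = cv2.getPerspectiveTransform(dst, src)  # to project results back later

def to_birds_eye(image):
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, M, (w, h))
```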

Related traditional building blocks: edge detection, k-means, Gaussian probability models.

(1) Udacity lane line detection

  • Camera calibration (OpenCV)

Reasons for calibration: sensor manufacturing error, nonlinear radial distortion, tangential distortion

References: https://zhuanlan.zhihu.com/p/87334006 ; https://blog.csdn.net/a083614/article/details/78579163

  • ROI selection, image perspective transformation, converting the image to a top view
  • Binarize and find lane lines


  • Sliding-window search and polynomial fitting
  • Least-squares fitting of the window center points to form the lane line (a sketch of these two steps follows below)
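A minimal sketch of the sliding-window search plus least-squares fit on a binarized bird's-eye image (the window count, margin, and pixel threshold are illustrative assumptions):

```python
import numpy as np

def fit_lane_sliding_windows(binary_birds_eye, n_windows=9, margin=80,
                             min_pixels=50):
    """Sliding-window search, then least-squares polynomial fit (x = f(y))."""
    h, w = binary_birds_eye.shape
    # Start from the column with the most lit pixels in the lower image half.
    histogram = binary_birds_eye[h // 2:, :].sum(axis=0)
    x_current = int(np.argmax(histogram))
    ys, xs = np.nonzero(binary_birds_eye)
    window_height = h // n_windows
    selected = np.zeros(len(ys), dtype=bool)
    for i in range(n_windows):
        y_low, y_high = h - (i + 1) * window_height, h - i * window_height
        inside = ((ys >= y_low) & (ys < y_high) &
                  (xs >= x_current - margin) & (xs < x_current + margin))
        selected |= inside
        if inside.sum() > min_pixels:
            x_current = int(xs[inside].mean())  # recenter the next window
    # np.polyfit solves the least-squares problem over the collected pixels.
    return np.polyfit(ys[selected], xs[selected], 2)
```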

(2) Lane line detection based on projection

  • Image calibration and ROI perspective transform

[Figure: original image]

[Figure: binarized image]

  • Project the binarized points onto the abscissa (a column-wise histogram)


Then select the corresponding maxima (histogram peaks) as candidate lane-line positions.

  • RANSAC (Random Sample Consensus) polynomial fitting of the points near each peak (see the sketch after this walkthrough)


Finally, the fitted curves are projected back onto the original image.
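A minimal sketch of the peak-selection and RANSAC fitting steps using scikit-learn (the window width, polynomial degree, and residual threshold are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def fit_lane_ransac(binary_birds_eye, peak_window=60):
    """Column histogram -> peak -> RANSAC polynomial fit of nearby points."""
    histogram = binary_birds_eye.sum(axis=0)
    peak_x = int(np.argmax(histogram))
    ys, xs = np.nonzero(binary_birds_eye)
    near = np.abs(xs - peak_x) < peak_window  # keep points near the peak only
    # Fit x = f(y) with a 2nd-order polynomial inside RANSAC so outliers
    # (arrows, shadows, other markings) are rejected.
    model = make_pipeline(PolynomialFeatures(degree=2),
                          RANSACRegressor(residual_threshold=10.0))
    model.fit(ys[near].reshape(-1, 1), xs[near])
    return model  # model.predict(y_values) gives the lane's x positions
```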

In real scenarios, the robustness of traditional methods is poor. Besides illumination and adjacent vehicles, direction arrows painted in the middle of the lane and sidewalks are also challenges that such algorithms struggle to handle.

2. Deep learning methods

The wave of deep learning has brought great progress to lane line detection.

Persistent limitations of lane line detection work (i.e., issues that generally remain to this day):


There is no clear common baseline for lane line detection work; different methods and application scenarios each have their own limitations. For example:

  • Output type: segmentation mask / point set / vectorized line
  • Instantiation: whether each lane line forms a separate instance
  • Classification: whether lane lines are classified by type (single white, double yellow, etc.)
  • Predefined parameters: whether only a fixed number of lane lines can be detected
  • Lane markings: whether driving markings on the road surface are also detected

These choices affect both data labeling and the network's output form, while what is ultimately needed is the lane line's equation in the world coordinate system. Neural networks are better suited to extracting image-level features, and directly regressing equation parameters imposes more restrictions; therefore, relatively complex post-processing is usually required after network inference to recover real-world coordinates.

The lane detection problem is often formulated as a segmentation task, where, given an input image, the output is a segmentation map with per-pixel predictions.

Performance:


When judging whether a prediction counts as true or false, there are two main criteria:

  • End point: judge whether the distance between the line's endpoints and its enclosing area exceeds a threshold
  • IoU: directly compute the intersection-over-union of the overlapping areas

IoU = (A ∩ B) / (A ∪ B) = S_I / (S_A + S_B − S_I)
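For mask-style outputs, IoU reduces to a few lines of NumPy (a minimal sketch, assuming boolean masks):

```python
import numpy as np

def mask_iou(pred_mask, gt_mask):
    """IoU of two boolean lane masks: |A ∩ B| / |A ∪ B|."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union > 0 else 0.0
```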

Work pipeline:


Current mainstream pipelines are divided into multi-stage and single-stage.

  • The multi-stage approach has two parts: binary semantic segmentation generates a mask, and a line is then fitted to that mask. The segmentation step mainly uses CNNs, with accuracy improved by methods such as SCNN (Spatial As Deep: Spatial CNN for Traffic Scene Understanding), CNN+RNN (Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks), and GANs (EL-GAN: Embedding Loss Driven Generative Adversarial Networks for Lane Detection). For fitting lines to the mask, a learned transformation matrix first converts the segmentation result to a bird's-eye view; points are then sampled uniformly and fitted by least squares, typically with a cubic equation (a sketch follows this list).
  • The single-stage approach regresses the line parameters directly, i.e., it modifies the head of the CNN and uses a dedicated layer to output the parameters.
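A minimal sketch of that uniform-sampling plus least-squares fitting step (the band count and the use of a per-band mean are illustrative assumptions):

```python
import numpy as np

def fit_cubic_from_mask(lane_mask_birds_eye, n_samples=20):
    """Uniformly sample points from a bird's-eye lane mask, fit a cubic."""
    ys, xs = np.nonzero(lane_mask_birds_eye)
    # One representative x (the mean) per uniformly spaced row band.
    bands = np.linspace(ys.min(), ys.max(), n_samples + 1)
    pts_y, pts_x = [], []
    for lo, hi in zip(bands[:-1], bands[1:]):
        in_band = (ys >= lo) & (ys < hi)
        if in_band.any():
            pts_y.append((lo + hi) / 2)
            pts_x.append(xs[in_band].mean())
    # Least-squares cubic: x = a*y^3 + b*y^2 + c*y + d
    return np.polyfit(pts_y, pts_x, 3)
```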

Datasets:

  • Balance the data across scene categories, e.g., highways, side roads, winding mountain roads, night, rainy days
  • Inspect the data and filter out poor-quality frames; for example, highway footage at night and videos of driving in the rain are often blurry
  • De-duplicate similar frames before labeling, e.g., keep one frame out of every ten; if many low-speed frames are nearly identical, the measured accuracy will be inflated
  • Augment under-represented categories, e.g., inspect the histogram of lane-line coefficients and apply small rotations so each coefficient's distribution becomes more reasonable
  • Scale and normalize the data to speed up convergence

Commonly used lane line detection datasets include TuSimple, CULane, and BDD100K. TuSimple contains about 7K images of US highway scenes. CULane contains about 133K images, collected in Beijing, covering eight hard conditions such as crowded roads and darkness at night. BDD100K contains about 120M frames and covers many autonomous-driving perception tasks; the dataset is relatively new, and its adoption rate among algorithms is still low. (Figures may have been updated since.)


Papers with Code: https://paperswithcode.com/task/lane-detection/latest#code


Paper 1: LaneATT — "Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection"

Paper address:

https://arxiv.org/pdf/2010.12035.pdf

Github address:

https://github.com/lucastabelini/LaneATT

Shortcoming of existing methods: they struggle to maintain real-time efficiency together with high accuracy.

LaneATT: an anchor-based, single-stage deep lane detection model that, similar to other general-purpose object detectors, uses anchors for a feature pooling step.

An RGB image from a front-facing camera mounted on the vehicle is received as input; the output is the set of lane boundary lines (lanes). To generate these outputs, a convolutional neural network (CNN), the backbone, generates a feature map, which is then pooled to extract features for each anchor. These features are combined with a set of global features produced by the attention module. By combining local and global features, the model can more easily use information from other lanes, which may be necessary under occlusion or when no lane markings are visible. Finally, the combined features are passed to fully connected layers to predict the final output lanes.


To summarize the pipeline: the backbone network generates a feature map from the input image; each anchor is projected onto the feature map, and this projection is used to pool features that are concatenated with another feature set created in the attention module; finally, using the resulting features, two layers, one for classification and the other for regression, make the final prediction.
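A heavily simplified sketch of such an anchor-based head (the shapes, the attention form, and the output sizes are assumptions for illustration; the official implementation linked above differs in detail):

```python
import torch
import torch.nn as nn

class AnchorLaneHead(nn.Module):
    """Sketch: per-anchor local features plus attention-mixed global
    features feed a classification head and a regression head."""
    def __init__(self, n_anchors, feat_dim, n_offsets):
        super().__init__()
        self.attention = nn.Linear(feat_dim, n_anchors)      # scores over anchors
        self.cls_head = nn.Linear(2 * feat_dim, 2)           # lane vs. background
        self.reg_head = nn.Linear(2 * feat_dim, n_offsets + 1)  # x-offsets + length

    def forward(self, local_feats):
        # local_feats: (batch, n_anchors, feat_dim), pooled along each
        # anchor's projection onto the backbone feature map.
        weights = torch.softmax(self.attention(local_feats), dim=-1)
        global_feats = weights @ local_feats  # mix other anchors' features
        combined = torch.cat([local_feats, global_feats], dim=-1)
        return self.cls_head(combined), self.reg_head(combined)
```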

Reproducibility:

Model efficiency is measured in multiply-accumulate operations (MACs) and frames per second (FPS).

Conclusions:

  • TuSimple: the method achieves the second-highest F1 (only 0.02% behind), while being much faster than the top-F1 method (171 vs. 30 FPS);
  • CULane: among real-time methods, it reaches a high level in both speed and accuracy (about 170 FPS, against the previous state of the art);
  • LLAMAS: it achieves a high F1 (above 93%) on the benchmark.

Paper 2: "Towards End-to-End Lane Detection: an Instance Segmentation Approach" — end-to-end lane detection via instance segmentation

Paper address:

Towards End-to-End Lane Detection: an Instance Segmentation Approach — https://arxiv.org/abs/1802.05591

Github address:

https://github.com/MaybeShewill-CV/lanenet-lane-detection

  1. Two network models: LaneNet and H-Net


LaneNet is a multi-task model that combines semantic segmentation with a per-pixel vector (embedding) representation, and finally uses clustering to complete the instance segmentation of lane lines. H-Net is a small network responsible for predicting the transformation matrix H, which is used to re-model all pixels belonging to the same lane line (expressing the x coordinate as a function of the y coordinate).

LaneNet


Instance segmentation task → semantic segmentation (one LaneNet branch) plus clustering (another LaneNet branch extracts per-pixel embeddings, which are clustered with Mean-Shift).

Embedding branch: produces an embedded representation of each pixel.

Segmentation branch: binary classification, judging whether each pixel belongs to a lane line or to the background.

Training yields the embedding vectors used for clustering.
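A minimal sketch of the clustering step on the embedding output, using scikit-learn's MeanShift (the bandwidth is an illustrative assumption):

```python
import numpy as np
from sklearn.cluster import MeanShift

def cluster_lane_instances(embeddings, binary_mask, bandwidth=1.5):
    """Cluster lane-pixel embeddings into lane instances with Mean-Shift.
    embeddings: (H, W, D) per-pixel vectors; binary_mask: (H, W) lane/bg."""
    ys, xs = np.nonzero(binary_mask)
    lane_vectors = embeddings[ys, xs]      # (N, D), lane pixels only
    labels = MeanShift(bandwidth=bandwidth).fit_predict(lane_vectors)
    instance_map = np.zeros(binary_mask.shape, dtype=np.int32)
    instance_map[ys, xs] = labels + 1      # 0 stays background
    return instance_map
```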

Designing the semantic segmentation model

1. Handle occlusion: restore (estimate) lane lines and dashed lines occluded by vehicles;

2. Solve the unbalanced sample distribution (lane-line pixels are far fewer than background pixels): weight the loss with bounded inverse class weighting:

w_class = 1 / ln(c + p_class)

where p is the frequency of the corresponding class in the overall sample and c is a hyperparameter (1.02 in the paper); the scheme comes from "ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation".
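A minimal sketch of this weighting in PyTorch (the class frequencies below are illustrative numbers):

```python
import torch
import torch.nn as nn

def bounded_inverse_class_weights(class_freqs, c=1.02):
    """w_class = 1 / ln(c + p_class), p_class = class frequency."""
    return 1.0 / torch.log(c + class_freqs)

# Example: ~97% background pixels vs. ~3% lane pixels (illustrative).
weights = bounded_inverse_class_weights(torch.tensor([0.97, 0.03]))
seg_loss = nn.CrossEntropyLoss(weight=weights)
```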

Pixel embedding:

To distinguish which lane each lane pixel belongs to, the embedding branch assigns an embedding vector to every pixel; the loss is designed so that the embedding vectors of pixels on the same lane line are as close together as possible, while those of different lane lines are as far apart as possible.

The loss for this part is composed of a pull (variance) term, a push (distance) term, and their sum:

L_var = (1/C) Σ_{c=1..C} (1/N_c) Σ_{i=1..N_c} [ ||μ_c − x_i|| − δ_v ]_+²

L_dist = (1/(C(C−1))) Σ_{c_A ≠ c_B} [ δ_d − ||μ_{c_A} − μ_{c_B}|| ]_+²

L = L_var + L_dist

where C is the number of lane lines, N_c the number of pixels belonging to lane line c, μ_c the mean embedding vector of lane line c, x_i a pixel embedding, δ_v and δ_d the pull/push margins, and [x]_+ = max(0, x).
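A sketch of this loss in PyTorch (the margin values and the simple double loop are illustrative; it assumes one image's lane-pixel embeddings with per-pixel instance ids):

```python
import torch

def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=3.0):
    """Pull pixels toward their lane's mean embedding; push lane means apart.
    embeddings: (N, D) lane-pixel embeddings; labels: (N,) lane instance ids."""
    lanes = labels.unique()
    means = torch.stack([embeddings[labels == c].mean(dim=0) for c in lanes])
    # Variance (pull) term.
    l_var = 0.0
    for k, c in enumerate(lanes):
        dist = (embeddings[labels == c] - means[k]).norm(dim=1)
        l_var = l_var + ((dist - delta_v).clamp(min=0) ** 2).mean()
    l_var = l_var / len(lanes)
    # Distance (push) term over all ordered pairs of different lanes.
    l_dist = 0.0
    n = len(lanes)
    for i in range(n):
        for j in range(n):
            if i != j:
                gap = (means[i] - means[j]).norm()
                l_dist = l_dist + (delta_d - gap).clamp(min=0) ** 2
    if n > 1:
        l_dist = l_dist / (n * (n - 1))
    return l_var + l_dist
```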

H-Net:

The output of LaneNet is a collection of pixels for each lane line; a lane-line curve still needs to be regressed from these pixels. The traditional approach projects the image into a bird's-eye view and then fits a 2nd- or 3rd-order polynomial. In that approach the transformation matrix H is computed only once and reused for all images, which leads to errors when the ground plane varies (slopes, hills).

To solve this problem, the paper trains a neural network, H-Net, that predicts the transformation matrix H. The network's input is the image, and its output is the matrix H:

H = [ a  b  c
      0  d  e
      0  f  1 ]

The matrix is constrained by the zero entries so that horizontal lines remain horizontal under the transformation (i.e., the transformed y coordinate is not affected by the x coordinate).

The transformation matrix H therefore has only 6 free parameters, so the output of H-Net is a 6-dimensional vector. H-Net consists of 6 ordinary convolutional layers followed by a fully connected layer.

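A sketch of how the 6 predicted parameters could be assembled into H and used for a least-squares polynomial fit in the transformed view (an illustrative reconstruction under the constraints above, not the paper's training code):

```python
import torch

def build_h(params):
    """Assemble H = [[a, b, c], [0, d, e], [0, f, 1]] from a 6-vector."""
    a, b, c, d, e, f = params
    zero, one = torch.zeros_like(a), torch.ones_like(a)
    return torch.stack([torch.stack([a, b, c]),
                        torch.stack([zero, d, e]),
                        torch.stack([zero, f, one])])

def fit_through_h(h_matrix, lane_pixels, degree=3):
    """Project lane pixels with H, then least-squares fit x' = f(y').
    lane_pixels: (N, 2) tensor of (x, y) image coordinates."""
    ones = torch.ones(lane_pixels.shape[0], 1)
    pts = torch.cat([lane_pixels, ones], dim=1).T   # homogeneous, (3, N)
    proj = h_matrix @ pts
    xp, yp = proj[0] / proj[2], proj[1] / proj[2]
    # Closed-form least squares: w = argmin ||Y w - x'||^2
    Y = torch.stack([yp ** k for k in range(degree + 1)], dim=1)
    return torch.linalg.lstsq(Y, xp.unsqueeze(1)).solution
```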

Conclusions:

Using H-Net is better than a fixed transformation matrix; a third-order polynomial fit is better than second-order; and using a perspective transform is better than not using one.

Challenges faced by deep learning for lane line recognition:

(1) The slender shape of lane lines requires a more powerful fusion of high-level and low-level features, so as to obtain the global spatial structure and precise localization of details at the same time.

(2) The appearance of lane lines carries many uncertainties, such as occlusion, wear, and discontinuities where the road changes. The network needs strong inference ability for these situations.

(3) When the vehicle drifts or changes lanes, the ego lane switches, and the lane lines also switch to the left/right accordingly. Methods that assign fixed indices to lane lines in advance become ambiguous during lane changes.

(End)


Source: blog.csdn.net/weixin_48936263/article/details/123902963