"Towards End-to-End Lane Detection: an Instance Segmentation Approach" paper reading

Summary

        Automatic lane keeping is an important driver-assistance function, and it relies on the lane detection algorithm. Traditional lane detection algorithms depend on hand-crafted features and rules plus post-processing, so they are computationally expensive and do not generalize across scenarios. More recently, deep learning has been used for pixel-level lane segmentation; thanks to large receptive fields, even lane lines without distinctive local features can be detected. However, these methods can only detect a predefined, fixed number of lanes (e.g., only the ego-lanes) and cannot cope with lane changes. In this paper, the authors propose an end-to-end approach that casts lane detection as an instance segmentation problem, where each lane forms its own instance. They also propose H-Net, a network that learns a homography matrix H conditioned on the image content, replacing the fixed transformation matrix so the method can cope with camera-pose changes while the vehicle is moving. The network runs at 50 fps, handles an arbitrary number of lanes, and is robust to camera-pose changes. The authors evaluate on the TuSimple dataset and report competitive results.

Introduction

        Detecting lane lines with cameras is a fundamental and important function in autonomous driving. Traditional lane detection methods rely on hand-crafted features and rules, such as color-based features combined with the Hough transform and particle filters, followed by post-processing to filter out false detections. Their drawback is that they adapt poorly to varying road conditions, so robustness suffers.
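        To make the classical pipeline concrete, here is a minimal, NumPy-only sketch of Hough voting over an edge mask (the function name, bin counts, and thresholds are illustrative choices, not from the paper): each edge pixel votes for all lines passing through it, and peaks in the accumulator correspond to straight lane candidates.

```python
import numpy as np

def hough_accumulator(edge_mask, n_theta=180, n_rho=200):
    """Vote every edge pixel into a (rho, theta) accumulator; a peak
    corresponds to a line rho = x*cos(theta) + y*sin(theta)."""
    h, w = edge_mask.shape
    diag = np.hypot(h, w)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_bins = np.linspace(-diag, diag, n_rho)
    ys, xs = np.nonzero(edge_mask)
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)
    for j, t in enumerate(thetas):
        rho = xs * np.cos(t) + ys * np.sin(t)
        idx = np.clip(np.digitize(rho, rho_bins) - 1, 0, n_rho - 1)
        np.add.at(acc, (idx, j), 1)  # one vote per pixel per angle
    return acc, rho_bins, thetas
```

A production system would run this on a Canny edge map (e.g., via OpenCV) and then apply non-maximum suppression to the accumulator; this sketch only shows the voting step.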

        Recently, deep learning has been used to achieve pixel-level lane segmentation, but these methods can only detect a predefined, fixed number of lane lines and cannot handle lane changes. Motivated by this, the authors treat lane detection as an instance segmentation problem, with each lane line as an instance, and design a multi-task network containing a lane segmentation branch and a lane embedding branch. The segmentation branch predicts whether each pixel belongs to a lane, and the per-pixel embeddings are used to cluster lane pixels into different instances, so any number of lanes can be handled.
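        At inference time, lane pixels must be grouped into instances using their embeddings. The following NumPy sketch is a deliberate simplification of the paper's clustering procedure (a greedy threshold-based grouping rather than the authors' exact scheme; the threshold value is arbitrary):

```python
import numpy as np

def cluster_embeddings(embeddings, thresh=1.5):
    """Greedy clustering: repeatedly seed a new lane from an unassigned
    pixel and absorb all unassigned pixels within `thresh` of the seed."""
    labels = -np.ones(len(embeddings), dtype=int)  # -1 means unassigned
    lane = 0
    while (labels == -1).any():
        seed = embeddings[labels == -1][0]
        near = np.linalg.norm(embeddings - seed, axis=1) < thresh
        labels[(labels == -1) & near] = lane
        lane += 1
    return labels
```

Because the number of lanes is whatever the loop discovers, this is how the method avoids committing to a fixed lane count.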

        Each lane instance then needs a parametric representation, obtained with a curve-fitting algorithm. Usually the image is first warped into a bird's-eye view with a transformation matrix and the lane is fitted there, which improves fitting quality. This transformation matrix is normally fixed, but in practice it changes while the vehicle is moving, due to bumps and similar disturbances. If the fixed matrix is still used, the bird's-eye view is wrong and the lane fitting result suffers. To address this, the authors train a network that takes the image as input and outputs the corresponding transformation matrix. The overall framework is shown in the figure below.

Method

        LaneNet outputs the different lane instances, each as a set of pixels, and these pixel sets must be processed to fit the lane curves. First, H-Net estimates a suitable transformation matrix, which is applied to the input image to obtain a bird's-eye view; the lane fitting is then performed in that view.

LaneNet

        LaneNet has two parts: one for lane segmentation and one for clustering lane instances. To improve both speed and accuracy, the two parts are combined into a single multi-task network, as shown in the figure below.

        The binary segmentation branch outputs a binary map marking whether each pixel belongs to a lane. When generating the ground truth, the authors also label lane lines that are occluded by other objects, or that are not physically marked but that a human would perceive as present, so the trained model gains a similar ability to "predict" them. Because lane pixels occupy only a small fraction of the image, the training samples are highly imbalanced; the authors therefore use bounded inverse class weighting when computing the cross-entropy loss.
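        Bounded inverse class weighting (the scheme from ENet, w = 1 / ln(c + p_class)) gives the rare lane class a much larger weight while the ln(c + .) term keeps weights bounded. A small NumPy sketch, with an illustrative weighted cross-entropy (function names are my own, not the paper's):

```python
import numpy as np

def bounded_inverse_class_weights(class_probs, c=1.02):
    """w_class = 1 / ln(c + p_class): rarer classes (small p) get larger
    weights, and the constant c keeps the weights finite and bounded."""
    return 1.0 / np.log(c + np.asarray(class_probs, dtype=float))

def weighted_cross_entropy(probs, targets, weights):
    """Per-pixel cross-entropy, each pixel scaled by its class weight.
    probs: (N, 2) softmax outputs; targets: (N,) labels in {0, 1}."""
    eps = 1e-12
    picked = probs[np.arange(len(targets)), targets]
    return float(np.mean(weights[targets] * -np.log(picked + eps)))
```

With, say, 98% background and 2% lane pixels, the lane weight comes out roughly 18x the background weight, which counteracts the imbalance.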

        The instance segmentation branch outputs an embedding for each pixel, which is used for clustering the lanes. The training loss is designed so that embeddings of pixels on the same lane are pulled close together, while embeddings of pixels on different lanes are pushed far apart. This design has become a classic and reappears in later lane detection papers.
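        A minimal NumPy sketch of this pull/push embedding loss (in the spirit of the discriminative loss the paper builds on; the margin values here are illustrative defaults, not necessarily the paper's):

```python
import numpy as np

def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=3.0):
    """Variance term pulls pixels toward their lane's mean embedding
    (only beyond margin delta_v); distance term pushes different lanes'
    means apart (only when closer than 2*delta_d)."""
    lanes = np.unique(labels)
    means = np.stack([embeddings[labels == l].mean(axis=0) for l in lanes])
    # pull: penalize pixels farther than delta_v from their lane mean
    l_var = 0.0
    for m, l in zip(means, lanes):
        d = np.linalg.norm(embeddings[labels == l] - m, axis=1)
        l_var += np.mean(np.maximum(d - delta_v, 0.0) ** 2)
    l_var /= len(lanes)
    # push: penalize pairs of lane means closer than 2*delta_d
    l_dist, n = 0.0, len(lanes)
    if n > 1:
        for i in range(n):
            for j in range(n):
                if i != j:
                    d = np.linalg.norm(means[i] - means[j])
                    l_dist += np.maximum(2.0 * delta_d - d, 0.0) ** 2
        l_dist /= n * (n - 1)
    return l_var + l_dist
```

Because both terms are hinged, the loss reaches zero once clusters are tight and well separated, after which clustering by a simple distance threshold becomes reliable.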

        The network is based on ENet: the first two encoder stages are shared, while the third stage and the subsequent decoders are split into two branches trained separately. The binary segmentation branch outputs a map with the same width and height as the input and one channel; the instance segmentation branch outputs the same spatial size with N channels (the embedding dimension). The losses of the two branches are weighted equally.

Curve Fitting Based on H-NET

        The curve is generally not fitted in the original image, because the perspective effect would require a higher-order polynomial; instead, the detected lane pixels are transformed into a bird's-eye view, where a second- or third-order polynomial suffices.

        To mitigate the problem that the true transformation matrix changes while the vehicle moves (due to bumps and the like), the authors propose the H-Net network: given an input image, it outputs a homography matrix H with 6 degrees of freedom. The remaining entries of H are fixed to zero precisely so that horizontal lines in the original image remain horizontal after the transformation to the bird's-eye view.
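        A small sketch of this constrained homography (the coefficient ordering is my own convention; the key point is the zeros in the bottom-left, which make the transformed y-coordinate independent of x, so horizontal lines stay horizontal):

```python
import numpy as np

def build_h(coeffs):
    """Assemble the 6-DoF homography with zeros in the bottom-left:
    H = [[a, b, c], [0, d, e], [0, f, 1]]."""
    a, b, c, d, e, f = coeffs
    return np.array([[a, b, c],
                     [0.0, d, e],
                     [0.0, f, 1.0]])

def warp_point(H, x, y):
    """Apply the homography to a point in homogeneous coordinates."""
    px, py, pw = H @ np.array([x, y, 1.0])
    return px / pw, py / pw
```

With this structure, the warped y' = (d*y + e) / (f*y + 1) depends only on y, which is exactly the "horizontal stays horizontal" constraint.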


        The H-Net network structure is shown below.

        The estimated H maps the lane pixels into bird's-eye-view coordinates, where an n-th order curve is fitted by least squares. When training H-Net, the fitted points are projected back and compared against the original points, so the loss is small exactly when the transformed points are well fitted by the n-th order curve.
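        The fit-then-project-back idea can be sketched as follows (a NumPy illustration under my own conventions, fitting x as a polynomial in y as is typical for near-vertical lanes; the real H-Net trains H end-to-end through this closed-form least-squares step):

```python
import numpy as np

def fit_and_loss(H, pts, order=2):
    """Warp lane pixels with H, least-squares fit x' = poly(y') in the
    bird's-eye view, project the fitted points back, and return the
    mean squared error against the original pixels."""
    n = len(pts)
    homo = np.column_stack([pts, np.ones(n)])          # (x, y, 1)
    warped = (H @ homo.T).T
    warped = warped / warped[:, 2:3]                   # normalize
    xs, ys = warped[:, 0], warped[:, 1]
    # closed-form least squares for the polynomial coefficients
    Y = np.column_stack([ys ** k for k in range(order, -1, -1)])
    w = np.linalg.lstsq(Y, xs, rcond=None)[0]
    x_fit = Y @ w
    # project fitted points back to the image and compare with originals
    back = np.linalg.inv(H) @ np.column_stack([x_fit, ys, np.ones(n)]).T
    back = (back / back[2]).T
    loss = float(np.mean((back[:, 0] - pts[:, 0]) ** 2))
    return w, loss
```

Because the least-squares solution is a differentiable closed form, the gradient of this loss can flow back into the network that predicts H.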

Origin blog.csdn.net/lwx309025167/article/details/127284049