Lane line detection (1) - PINet paper reading

Table of contents

Summary

Introduction

method

network structure

Resizing Network

Predicting Network

Loss Function

implementation details

experiment

Conclusion


《Key Points Estimation and Point Instance Segmentation Approach for Lane Detection》

Paper: https://arxiv.org/abs/2002.06604

Code: https://github.com/koyeongmin/PINet_new

Summary

        This 2020 paper proposes the Point Instance Network (PINet), which detects lane lines by combining key point detection with instance segmentation. The network contains several stacked hourglass networks that are trained at the same time, so the trained model can be cut down to fewer modules and used directly without retraining, which reduces the amount of computation. The network adapts to any number of lane lines and achieves good accuracy and a low false positive rate on the TuSimple and CULane datasets. The framework diagram of the network is as follows.

Introduction

        Lane line detection is a basic and important capability of an autonomous driving system. The authors propose a network that detects lane line key points from an RGB input image and then uses the embedding features output by the network to cluster those key points into individual lane instances. The network can also be cut: after removing some modules it can be used without further training, which helps when the computing power of the target platform is limited.

        Many existing methods are based on CNNs and semantic segmentation. Their data annotation is complex, and the network output has the same size as the input, so the output contains a lot of useless information and is expensive to compute. The large number of points handled in post-processing adds further cost, and the overall computational load limits practical deployment.

        The hourglass network is often used for key point detection tasks such as pose estimation and object detection. It uses a series of downsampling and upsampling operations to capture information at different scales. A stacked hourglass network contains several hourglass modules trained with the same loss function, which makes it easy to cut modules away to control the number of network parameters.

        For an introduction to the hourglass network, see the CSDN blog post "Hourglass Network (pose estimation)" by hxxjxw. I think of it as a form of multi-scale features plus ResNet: a single module introduces multi-scale, multi-level features, and stacking multiple modules one after another improves performance by deepening the network.

        The authors argue that false detections have a larger impact on downstream modules, and that the false detection rate of many SOTA methods at the time was relatively high, so they also use the false positive rate as a comparison metric in the experiments.

        The following figure shows the network structure. The authors list four main contributions: 1) the network outputs key points instead of regions, so the output size is greatly reduced; 2) the stacked hourglass modules can be cut to reduce network parameters, and the cut network can be used directly without retraining; 3) there are no restrictions on the number or orientation of lane lines; 4) it has a lower false detection rate and good accuracy on public datasets.

method

        The network has three outputs: confidence, offset, and embedding features. Confidence and offset are used to locate the lane line key points, and a YOLO-style loss function is applied to them. The embedding features are used to group key points into lanes in post-processing, and their loss function borrows from the instance segmentation method in SGPN.
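        To make the post-processing idea concrete, here is a minimal sketch of grouping key points by their embedding features. It assumes a simple greedy rule with a distance threshold; the function name `group_by_embedding`, the threshold value, and the 2-dimensional toy embeddings are all made up for illustration (the network described here outputs 4 embedding channels), and the official code may group points differently.

```python
import math

def group_by_embedding(points, embeddings, threshold=0.5):
    """Greedily assign each key point to the first lane whose reference
    embedding lies within `threshold`; otherwise start a new lane.
    (Hypothetical helper, not taken from the PINet code.)"""
    lanes = []       # each lane is a list of (x, y) points
    references = []  # embedding of the first point of each lane
    for point, emb in zip(points, embeddings):
        for lane, ref in zip(lanes, references):
            if math.dist(emb, ref) < threshold:
                lane.append(point)
                break
        else:
            # No existing lane is close enough in embedding space.
            lanes.append([point])
            references.append(emb)
    return lanes

# Toy data: two points per lane, embeddings clearly separated.
points = [(10, 30), (12, 20), (50, 30), (48, 20)]
embeddings = [(0.0, 0.1), (0.05, 0.1), (0.9, 0.8), (0.95, 0.85)]
lanes = group_by_embedding(points, embeddings)
print(lanes)  # [[(10, 30), (12, 20)], [(50, 30), (48, 20)]]
```

        Because the number of lanes is simply the number of groups that appear, nothing in this step fixes the lane count in advance.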

network structure

Resizing Network

        The input RGB image is 512x256. It is fed into the resizing network, which reduces it to 64x32. This part of the network is as follows.
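        A quick arithmetic check of what this resizing implies, using only the two sizes given above. The variable names are my own.

```python
INPUT_W, INPUT_H = 512, 256   # input size stated in the text
GRID_W, GRID_H = 64, 32       # resizing-network output size stated in the text

# Downsampling factor along each axis.
scale_x = INPUT_W // GRID_W
scale_y = INPUT_H // GRID_H

# Number of input pixels summarized by one output grid cell.
pixels_per_cell = scale_x * scale_y
print(scale_x, scale_y, pixels_per_cell)  # 8 8 64
```

        So each output cell stands for an 8x8 patch of the input image, which is why the output is so much smaller than a full-resolution segmentation map.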

Predicting Network

        Next comes the predicting network, which consists of 4 hourglass modules. Each module includes an encoder, a decoder, and three output branches, as shown in the figure below.
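        Because every hourglass module has its own output branches, the stack can be truncated after any module and the output of the last kept module used as the prediction. The sketch below shows only that control flow with toy stand-ins; `run_stack` and `make_toy_module` are hypothetical names, and the toy modules just add a number instead of running convolutions.

```python
def run_stack(features, modules, keep=None):
    """Run the first `keep` modules in order and return the output of the
    last one that was run. Each module is a callable that returns
    (next_features, output)."""
    keep = len(modules) if keep is None else keep
    if not 1 <= keep <= len(modules):
        raise ValueError("keep must be between 1 and the number of modules")
    output = None
    for module in modules[:keep]:
        features, output = module(features)
    return output

def make_toy_module(step):
    """Toy stand-in for an hourglass module: 'refines' features by adding
    `step` and reports the refined value as its confidence output."""
    def module(features):
        refined = features + step
        return refined, {"confidence": refined}
    return module

modules = [make_toy_module(s) for s in (1, 2, 3, 4)]
full = run_stack(0, modules)             # all four modules
clipped = run_stack(0, modules, keep=2)  # only the first two modules
print(full, clipped)  # {'confidence': 10} {'confidence': 3}
```

        The cut network reuses the trained weights of the kept modules as they are; only the later modules are dropped.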

        Each colored block in the figure above represents a bottleneck. Its composition is shown in the figure below; transposed convolution is used for upsampling.

        The following table details the input and output sizes of one hourglass module. Of the three output branches, confidence has 1 channel, offset has 2 channels, and embedding has 4 channels. Stacking more hourglass modules improves detection, so the deeper network can serve as a teacher, and knowledge distillation can be used to improve networks that contain fewer hourglass modules.

Loss Function

        The network outputs a 64x32 grid of cells. Each cell has 7 channels covering confidence, offset, and embedding features, and each of the three has its own loss function. Distillation is also used during training, with a corresponding distillation loss. See the paper for the exact definitions; a brief introduction follows.
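        As a rough picture of how confidence and offset turn into key points, here is a minimal decoding sketch. It assumes the offset is a fraction of the cell size measured from the cell's top-left corner (a YOLO-like convention) and that a cell counts as a key point when its confidence exceeds a threshold; the threshold value and the function name `decode_keypoints` are my own assumptions, so check the paper and code for the exact convention.

```python
def decode_keypoints(confidence, offset_x, offset_y, threshold=0.5, scale=8):
    """Turn per-cell confidence and offsets into pixel coordinates.

    All three inputs are 2-D lists indexed as [row][col]. `scale` is the
    ratio between the input image and the grid (512 / 64 = 8)."""
    keypoints = []
    for row, conf_row in enumerate(confidence):
        for col, conf in enumerate(conf_row):
            if conf > threshold:
                x = (col + offset_x[row][col]) * scale
                y = (row + offset_y[row][col]) * scale
                keypoints.append((x, y))
    return keypoints

# Toy 2x2 grid instead of the real 64x32 one.
confidence = [[0.1, 0.9], [0.8, 0.2]]
offset_x = [[0.0, 0.5], [0.25, 0.0]]
offset_y = [[0.0, 0.5], [0.75, 0.0]]
keypoints = decode_keypoints(confidence, offset_x, offset_y)
print(keypoints)  # [(12.0, 4.0), (2.0, 14.0)]
```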

The loss func for confidence is as follows

The loss func for offset is as follows

        The loss function for the embedding features is designed so that features belonging to the same instance are as close together as possible, and features belonging to different instances are as far apart as possible.
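        The pull-together, push-apart goal can be sketched with a pairwise loss. This is only an illustration under my own assumptions: it uses the plain Euclidean distance, a hinge with a margin of 1.0 for points from different lanes, and an average over all ordered pairs. The exact form and constants are in the paper.

```python
import math

def embedding_pair_loss(feat_a, feat_b, same_instance, margin=1.0):
    """Pull features together for the same lane; push them at least
    `margin` apart for different lanes."""
    distance = math.dist(feat_a, feat_b)
    if same_instance:
        return distance
    return max(0.0, margin - distance)

def embedding_loss(features, instance_ids, margin=1.0):
    """Average the pairwise loss over all ordered pairs of key points."""
    n = len(features)
    total = 0.0
    for i in range(n):
        for j in range(n):
            same = instance_ids[i] == instance_ids[j]
            total += embedding_pair_loss(features[i], features[j], same, margin)
    return total / (n * n)

# Well-separated lanes: same-lane points coincide, the other lane is far away.
good = embedding_loss([(0, 0), (0, 0), (3, 4)], [0, 0, 1])
# Same lane but far apart in embedding space: penalized.
bad = embedding_loss([(0, 0), (3, 4)], [0, 0])
print(good, bad)  # 0.0 2.5
```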

        A distillation loss function is introduced to make the output of each hourglass module as close as possible to the output of the last hourglass module, so that the hourglass network can be cut and used directly without further training.
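        The idea of pulling every module towards the last one can be sketched as follows. This is a simplified stand-in that compares raw outputs with a mean squared difference; the paper's actual distillation term may be defined on different quantities, so treat the function below as an assumption-laden illustration rather than the published loss.

```python
def distillation_loss(module_outputs):
    """Sum of mean squared differences between each earlier module's
    output and the last module's output. Each output is a flat list of
    numbers of the same length."""
    teacher = module_outputs[-1]
    total = 0.0
    for student in module_outputs[:-1]:
        squared = [(s - t) ** 2 for s, t in zip(student, teacher)]
        total += sum(squared) / len(teacher)
    return total

# Three toy modules; the last one acts as the teacher.
outputs = [[0.0, 0.0], [0.5, 1.0], [1.0, 1.0]]
loss = distillation_loss(outputs)
print(loss)  # 1.125
```

        In a real training loop the teacher output would normally be treated as a constant so that gradients only update the earlier modules.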

implementation details

        The input image size is 512x256, in BGR channel order (to be confirmed against the code), with pixel values normalized to (0,1).

        In the datasets, lane markings that are close to horizontal have too few annotated points, so the authors interpolate additional points for them.
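        A minimal sketch of this kind of densification, assuming plain linear interpolation between consecutive annotated points and a hypothetical `max_gap` of 8 pixels (one grid cell). The authors' actual interpolation scheme may differ.

```python
import math

def densify(points, max_gap=8.0):
    """Insert linearly interpolated points between consecutive annotated
    points so that no horizontal gap exceeds `max_gap` pixels."""
    if not points:
        return []
    dense = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Number of extra points needed between this pair.
        steps = max(0, math.ceil(abs(x1 - x0) / max_gap) - 1)
        for k in range(1, steps + 1):
            t = k / (steps + 1)
            dense.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
        dense.append((x1, y1))
    return dense

# A nearly horizontal segment: 32 pixels wide but only 8 pixels tall.
dense = densify([(0, 100), (32, 108)])
print(dense)
# [(0, 100), (8.0, 102.0), (16.0, 104.0), (24.0, 106.0), (32, 108)]
```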

        The authors note that scene types are unevenly distributed in the datasets. They mark the samples that perform poorly during training and increase the probability of selecting them in later training, similar to the hard negative mining technique.
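        One simple way to raise the selection probability of poorly performing samples is weighted sampling. The sketch below is an assumption on my part: the loss threshold, the `boost` factor, and the helper names are invented for illustration and do not come from the paper or its code.

```python
import random

def sampling_weights(losses, boost=3.0, loss_threshold=1.0):
    """Give samples whose last recorded loss exceeds the threshold
    `boost` times the selection weight of the others."""
    return [boost if loss > loss_threshold else 1.0 for loss in losses]

def draw_batch(sample_ids, weights, batch_size, rng):
    """Draw a batch with replacement, proportionally to the weights."""
    return rng.choices(sample_ids, weights=weights, k=batch_size)

# Toy per-sample losses from a previous pass; samples 1 and 3 are "hard".
losses = [0.2, 2.5, 0.4, 1.8]
weights = sampling_weights(losses)
batch = draw_batch(list(range(4)), weights, batch_size=8, rng=random.Random(0))
print(weights)  # [1.0, 3.0, 1.0, 3.0]
print(batch)    # hard samples tend to appear more often
```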

experiment

        The experiments use the TuSimple and CULane datasets, each with its own evaluation metrics. The results show that PINet achieves quite good accuracy while keeping a low false positive rate.
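        For reference, the false positive and false negative rates are commonly defined for the TuSimple benchmark as the share of wrongly predicted lanes among all predicted lanes, and the share of missed ground-truth lanes among all ground-truth lanes. This post does not spell the metrics out, so the sketch below reflects that common definition as I understand it, and the numbers are made up.

```python
def tusimple_style_rates(num_pred, num_wrong_pred, num_gt, num_missed_gt):
    """Return (false_positive_rate, false_negative_rate).

    FP = wrongly predicted lanes / all predicted lanes
    FN = missed ground-truth lanes / all ground-truth lanes"""
    fp = num_wrong_pred / num_pred if num_pred else 0.0
    fn = num_missed_gt / num_gt if num_gt else 0.0
    return fp, fn

# Made-up counts, only to show the calculation.
fp, fn = tusimple_style_rates(num_pred=100, num_wrong_pred=5,
                              num_gt=98, num_missed_gt=3)
print(round(fp, 4), round(fn, 4))
```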

        The experiments also verify that the distillation loss reduces the difference between the outputs of the first three hourglass modules and the output of the last hourglass module.

Conclusion

        The paper proposes a new lane line detection network based on key point extraction and instance segmentation, which can handle lane lines in any direction and in any number. By using stacked hourglass modules and applying a distillation strategy during training, the model can be cut to match the computing power of the deployment platform and used directly without retraining the weights. Experiments verify that it achieves good accuracy and a low false detection rate.

Origin blog.csdn.net/lwx309025167/article/details/126694916