1024 special: build your first lane line detection network, LaneNet




Resource summary:

  • Paper: https://arxiv.org/abs/1802.05591
  • GitHub project: https://github.com/MaybeShewill-CV/lanenet-lane-detection
  • LaneNet resource collection (Baidu cloud): https://pan.baidu.com/s/17dy1oaYKj5XruxAL38ggRw (extraction code: 1024)
  • LaneNet paper translation: LaneNet, a lane line detection network


1. Detailed explanation of LaneNet algorithm

1.1 Introduction to LaneNet

Traditional lane line detection methods rely on hand-crafted features such as color features, structure tensors, and contours, often combined with a Hough transform, various edge operators, or Kalman filters. After the lane lines are identified, post-processing is used to filter out false detections and group the detections into the final lanes. However, because road scenes vary widely, these traditional methods suffer from poor robustness.

More recent methods use deep learning models trained for pixel-level lane segmentation. However, they are limited to detecting a predefined, fixed number of lanes (for example, only the current lane) and cannot cope with lane changes.

Based on this, Davy Neven et al. proposed a new lane line detection network, LaneNet, in 2018. LaneNet makes two main contributions:

  • Lane detection is cast as an instance segmentation problem, in which each lane forms its own instance, so the network can be trained end-to-end.
  • A second network, H-Net, is trained to learn the parameters of a perspective transformation for a given input image. The learned transformation allows lane lines to be fitted well even on sloped roads, improving robustness.

1.2 Overall structure analysis

The authors propose a multi-branch network consisting of a binary segmentation branch (lane segmentation) and an instance segmentation branch (lane embedding), which together enable end-to-end detection of an arbitrary number of lane lines. The binary segmentation branch outputs all lane pixels, and the instance segmentation branch assigns those pixels to individual lane instances. The overall structure is shown below:

[Figure: overall LaneNet pipeline with the binary segmentation branch, the lane embedding branch, and H-Net for lane fitting]

In parallel, the input image is fed to the H-Net network, which learns the parameters of the perspective transformation matrix H. The pixels of each lane instance are then fitted with a curve in the transformed view, yielding the continuous lane lines shown in the figure above.

1.3 LaneNet network structure

The overall network structure of LaneNet is as follows:
[Figure: LaneNet two-branch network structure]
Binary segmentation network

One branch of LaneNet is a binary segmentation network that separates lane pixels from the background. Since there are only two classes (lane/background) and they are highly imbalanced, the design follows ENet and uses the standard cross-entropy loss function, weighted to compensate for the class imbalance.
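To make the class-imbalance handling concrete, below is a minimal TensorFlow 1.x sketch of a class-weighted cross-entropy loss. The weighting scheme (bounded inverse class frequency, as used in ENet) is applied here only for illustration; the function and variable names are assumptions, not the repository's exact implementation.

```python
import tensorflow as tf

def weighted_binary_seg_loss(logits, labels, c=1.02):
    """Class-weighted cross-entropy for the binary lane/background branch (sketch).

    logits: [N, H, W, 2] raw scores; labels: [N, H, W] int values in {0, 1}.
    Weights follow an ENet-style bounded inverse class frequency:
        w_class = 1 / ln(c + p_class)
    """
    labels_flat = tf.reshape(tf.cast(labels, tf.int32), [-1])
    # Empirical class frequencies in the current batch.
    counts = tf.cast(tf.bincount(labels_flat, minlength=2), tf.float32)
    freqs = counts / tf.reduce_sum(counts)
    class_weights = 1.0 / tf.log(c + freqs)                 # shape [2]
    pixel_weights = tf.gather(class_weights, labels_flat)   # weight per pixel

    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels_flat, logits=tf.reshape(logits, [-1, 2]))
    return tf.reduce_mean(ce * pixel_weights)
```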

Instance segmentation network

This branch follows "Semantic Instance Segmentation with a Discriminative Loss Function": it uses a one-shot method based on distance metric learning that can be integrated into a standard feed-forward network, making it suitable for real-time processing. After training, the branch outputs an embedding for each lane pixel. Based on the idea that pixels of the same lane should be close together in embedding space while pixels of different lanes should be far apart, a clustering loss is used so that each lane can later be clustered into its own instance.

Cluster loss function

The loss function is as follows:

$$L_{var} = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\big[\lVert \mu_c - x_i \rVert - \delta_v\big]_+^2$$

$$L_{dist} = \frac{1}{C(C-1)}\sum_{c_A=1}^{C}\sum_{\substack{c_B=1 \\ c_B \neq c_A}}^{C}\big[\delta_d - \lVert \mu_{c_A} - \mu_{c_B} \rVert\big]_+^2$$

$$[x]_+ = \max(0, x), \qquad L_{total} = L_{var} + L_{dist}$$

Here, the parameters are defined as follows:

  • $C$: the number of lane instances;
  • $N_c$: the number of pixels in lane instance $c$;
  • $\mu_c$: the mean embedding (center) of lane instance $c$, with $x_i$ the embedding of pixel $i$;
  • $\delta_v$, $\delta_d$: the margins of the variance and distance terms;
  • $L_{var}$: the variance loss, which pulls embeddings toward their instance center (reduces intra-class distance);
  • $L_{dist}$: the distance loss, which pushes the centers of different lane instances apart (increases inter-class distance between different lane lines); see the sketch below.
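To make these terms concrete, here is a minimal NumPy sketch of the loss written directly from the formulas above. The function name and default margin values are illustrative; the project's own TensorFlow implementation lives in lanenet_discriminative_loss.py.

```python
import numpy as np

def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=3.0):
    """L_total = L_var + L_dist for a single image (NumPy sketch).

    embeddings: [num_pixels, D] pixel embeddings from the instance branch.
    labels:     [num_pixels] lane instance id per pixel (background excluded).
    delta_v, delta_d: margins of the variance and distance terms.
    """
    instance_ids = np.unique(labels)
    C = len(instance_ids)
    centers = []

    # L_var: pull every pixel embedding toward its instance center.
    l_var = 0.0
    for c in instance_ids:
        x_c = embeddings[labels == c]                 # [N_c, D]
        mu_c = x_c.mean(axis=0)                       # instance center
        centers.append(mu_c)
        dist = np.linalg.norm(x_c - mu_c, axis=1)
        l_var += np.mean(np.maximum(dist - delta_v, 0.0) ** 2)
    l_var /= C

    # L_dist: push the centers of different instances apart.
    l_dist = 0.0
    for a in range(C):
        for b in range(C):
            if a != b:
                d = np.linalg.norm(centers[a] - centers[b])
                l_dist += np.maximum(delta_d - d, 0.0) ** 2
    if C > 1:
        l_dist /= C * (C - 1)

    return l_var + l_dist
```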

Network structure diagram

LaneNet's architecture is based on the encoder-decoder network ENet, which consists of five stages. The first three stages form the encoder, which downsamples twice; the last two stages form the decoder, which upsamples twice.

[Figure: ENet encoder-decoder architecture (5 stages)]

LaneNet modifies this network into a two-branch network. Because ENet's encoder contains far more parameters than its decoder, fully sharing the encoder between the two tasks leads to unsatisfactory results. LaneNet therefore shares only the first two stages (1 and 2) between the branches, while stage 3 of the ENet encoder and the full ENet decoder serve as the backbone of each individual branch. The last layer of the segmentation branch outputs a single-channel image for binary segmentation, and the last layer of the instance (embedding) branch outputs an N-channel image, where N is the embedding dimension. The loss terms of the two branches are equally weighted and backpropagated through the network.
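At inference time the two branches are combined: lane pixels from the binary mask are grouped into instances by clustering their embeddings. The sketch below uses DBSCAN from scikit-learn purely for illustration; the algorithm choice, thresholds, and function name are assumptions, and the repository's actual post-processing is implemented in lanenet_postprocess.py.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_lane_pixels(binary_mask, embeddings, eps=0.5, min_samples=100):
    """Group lane pixels into instances by clustering their embeddings (sketch).

    binary_mask: [H, W] boolean output of the segmentation branch.
    embeddings:  [H, W, N] output of the instance (embedding) branch.
    Returns an [H, W] int map: 0 is background, 1..K are lane instance ids.
    """
    ys, xs = np.nonzero(binary_mask)
    feats = embeddings[ys, xs]                        # [num_lane_pixels, N]
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)

    instance_map = np.zeros(binary_mask.shape, dtype=np.int32)
    for k, lane_id in enumerate(np.unique(labels[labels >= 0]), start=1):
        sel = labels == lane_id
        instance_map[ys[sel], xs[sel]] = k            # -1 (noise) stays background
    return instance_map
```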

1.4 H-Net network structure

The output of the LaneNet network is a set of pixels for each lane line. The usual next step is to transform the image into a bird's-eye view and fit each curved lane line with a second- or third-order polynomial. However, the perspective transformation matrix commonly used for this is fixed in advance and never changes, so the fit becomes inaccurate when the horizon shifts (for example on uphill or downhill slopes) and robustness suffers. The authors therefore propose the H-Net model, which learns the parameters H of the perspective transformation matrix.

H has six degrees of freedom; zeros are placed in the matrix to enforce the constraint that horizontal lines remain horizontal under the transformation:

$$H = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & f & 1 \end{bmatrix}$$

H-Net is a small network consisting of consecutive blocks of 3x3 convolutions, batch normalization, and ReLU. Max-pooling layers reduce the spatial resolution, and two fully connected layers are added at the end. The complete structure is shown below:

[Figure: H-Net network structure]

The last fully connected layer has 6 output nodes, corresponding to the 6 parameters of the H matrix.
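To illustrate how the six predicted parameters are used, the sketch below builds the constrained H matrix, warps the pixels of one lane instance into the transformed view, fits a third-order polynomial, and projects the fitted points back into the original image. The function name and details are assumptions for illustration, not the paper's or repository's exact code.

```python
import numpy as np

def fit_lane_with_hnet(h_params, lane_pixels, degree=3):
    """Fit one lane using the 6 parameters predicted by H-Net (sketch).

    h_params:    [a, b, c, d, e, f] from H-Net's last fully connected layer.
    lane_pixels: [N, 2] array of (x, y) pixel coordinates of one lane instance.
    Returns fitted lane points projected back into the original image.
    """
    a, b, c, d, e, f = h_params
    # Constrained perspective matrix: the zeros keep horizontal lines horizontal.
    H = np.array([[a, b, c],
                  [0, d, e],
                  [0, f, 1]], dtype=np.float64)

    # Warp the lane pixels into the transformed (bird's-eye-like) view.
    pts = np.hstack([lane_pixels, np.ones((len(lane_pixels), 1))])  # homogeneous coords
    warped = (H @ pts.T).T
    warped = warped[:, :2] / warped[:, 2:3]

    # Least-squares fit of x as a polynomial in y in the transformed view.
    coeffs = np.polyfit(warped[:, 1], warped[:, 0], degree)
    ys = np.linspace(warped[:, 1].min(), warped[:, 1].max(), 50)
    xs = np.polyval(coeffs, ys)

    # Project the fitted points back with the inverse transformation.
    fitted = np.stack([xs, ys, np.ones_like(ys)], axis=1)
    back = (np.linalg.inv(H) @ fitted.T).T
    return back[:, :2] / back[:, 2:3]
```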

1.5 LaneNet performance advantages

Detection speed. Tested on an Nvidia 1080 Ti graphics card, it takes 19 ms to process a 512x512 color image, i.e., about 50 frames per second.

[Table: LaneNet speed measurements]

Detection accuracy. Using LaneNet combined with a third-order polynomial fit and the transformation matrix learned by H-Net, detection accuracy reaches 96.4% on the tuSimple challenge, placing fourth, only 0.5% behind first place. The results are shown in the table below.

[Table: tuSimple benchmark results]

2. Hands-on: running LaneNet yourself

2.1 Project introduction

The project is open-sourced on GitHub and has earned 1.3k stars. If you want to try it, clone it from https://github.com/MaybeShewill-CV/lanenet-lane-detection ; if you cannot reach GitHub, you can also download it from my Baidu cloud disk: LaneNet resource collection, extraction code: 1024.

The code structure and functions of each part are as follows:

lanenet-lane-detection
├── config                              // configuration files
├── data                                // sample images and curve-fitting parameter files
├── data_provider                       // data loading and tfrecords generation
├── lanenet_model
│   ├── lanenet.py                      // network layout: inference/compute_loss/compute_acc
│   ├── lanenet_front_end.py            // backbone layout
│   ├── lanenet_back_end.py             // network heads and loss computation: inference/compute_loss
│   ├── lanenet_discriminative_loss.py  // discriminative_loss implementation
│   └── lanenet_postprocess.py          // post-processing: clustering and curve fitting
├── model                               // directory for saved models
├── semantic_segmentation_zoo           // backbone network definitions
│   ├── __init__.py
│   ├── vgg16_based_fcn.py              // VGG16 backbone
│   ├── mobilenet_v2_based_fcn.py       // MobileNetV2 backbone
│   └── cnn_basenet.py                  // basic building blocks
├── tools                               // main training/testing entry points
│   ├── train_lanenet.py                // training
│   ├── test_lanenet.py                 // testing
│   ├── evaluate_dataset.py             // dataset accuracy evaluation
│   ├── evaluate_lanenet_on_tusimple.py // save detection results on the dataset
│   ├── evaluate_model_utils.py         // evaluation helpers: calculate_model_precision/calculate_model_fp/calculate_model_fn
│   └── generate_tusimple_dataset.py    // convert the raw dataset into the required format
├── showname.py                         // inspect model variable names
├── change_name.py                      // rename model variables
├── freeze_graph.py                     // generate a pb file
├── convert_weights.py                  // convert weights for model pretraining
└── convert_pb.py                       // generate a pb file

2.2 Environment setup

According to the open source author's description, the test environment is:

  • ubuntu 16.04
  • python3.5
  • cuda-9.0
  • cudnn-7.0
  • GTX-1070 GPU
  • tensorflow 1.12.0

The environment and configuration I use are:

  • ubuntu 16.04
  • PyCharm 2020
  • python 3.6
  • tensorflow 1.13.1 (GPU)
  • cuda-10.0
  • cudnn 7.6.4
  • opencv 4.0.0
  • RTX 2070 GPU

If you want to try it, you can refer to either of the two configurations above, or experiment with other versions yourself.

2.3 Preparation

If you want to train the model yourself, you can download the TuSimple dataset. Alternatively, you can directly use the pretrained model released by the author, feed it images, and look at the detection results. For convenience, let us load the pretrained model and test it locally.

(1) Download the TuSimple dataset; if you are not going to train, you can skip this step.

(2) Download the pretrained model, download link: LaneNet resource collection, extraction code: 1024.

After downloading, put the model folder tusimple_lanenet into the model directory of the project, as shown in the figure below:

[Figure: the tusimple_lanenet model files placed under the project's model directory]

2.4 Model testing

After completing the environment configuration and model deployment, we can test!

(1) First, test an image from the TuSimple dataset

  • Step 1: create a Mytest folder inside the data directory of the project, and then pick any image from the TuSimple dataset, e.g. 1.jpg, as shown in the figure below:

[Figure: 1.jpg placed in data/Mytest]

  • Step 2: open the downloaded project in PyCharm. After configuring the environment, open the terminal, as shown in the figure below:

[Figure: PyCharm terminal]

  • Step 3: enter the following command in the terminal to run the program:
python tools/test_lanenet.py --weights_path model/tusimple_lanenet/tusimple_lanenet.ckpt --image_path data/Mytest/1.jpg

The final lane line detection effect is as follows:
[Figure: lane detection result on 1.jpg]

(2) Test your own pictures

  • Step 1: pick a lane image you took yourself, e.g. 2.jpg, and put it into the newly created Mytest folder, as shown in the figure below:

[Figure: 2.jpg placed in data/Mytest]
  • Step 2: open the terminal, enter the command, and run the program:

python tools/test_lanenet.py --weights_path model/tusimple_lanenet/tusimple_lanenet.ckpt --image_path data/Mytest/2.jpg

The detection results of pictures taken by yourself are as follows:

[Figure: lane detection result on 2.jpg]
Test analysis:

It can be seen from the figure that, when detecting our own picture, the detected result largely coincides with the actual lane lines, but the detections extend beyond the road into the sky.

The main reason is that we did not build and train on our own dataset to obtain a model better suited to our images. Since the model used here was trained on the TuSimple dataset, it works very well on TuSimple images such as 1.jpg above.

If we want better results on our own pictures, we need our own dataset. A better workflow is:

  • First, train on the TuSimple dataset and use the resulting model as a pretrained model. This step has already been done, so you can simply download the pretrained model.
  • Then, starting from the pretrained model, load your own dataset and continue training until the desired effect is achieved.

This transfer-learning approach often gets twice the result with half the effort!

Happy 1024 (Programmer's Day) to everyone! If you like this post, give it a like; your support is the biggest motivation for my writing!
