Lane line detection with semantic segmentation: LaneNet (TensorFlow edition)


  An end-to-end model comprising two networks, LaneNet + H-Net. LaneNet performs instance segmentation of the lane lines, while H-Net is a small network that predicts a transformation matrix H; H is used to remap all pixels belonging to one lane line before curve fitting.

 

 

 

 

A multi-task model combines semantic segmentation of pixels with a per-pixel vector representation; clustering is then used to complete the instance segmentation of the lane lines.

 

 

       Instance segmentation is decomposed into a semantic segmentation task and a clustering task. The semantic segmentation branch performs binary classification of the input image, deciding whether each pixel belongs to a lane line or to the background. The pixel embedding branch produces a per-pixel embedding vector so that, after segmentation, the lane pixels can be separated into different lane instances; it is trained to yield vectors suitable for clustering. Finally, the results of the two branches are combined with the MeanShift clustering algorithm to obtain the instance segmentation result.
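As an illustration of this post-processing step, here is a minimal mean-shift sketch in NumPy (a simplified stand-in for the MeanShift implementation used in practice; the 2-D toy embeddings and the bandwidth value are made up for the example):

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=50):
    """Naive mean-shift: shift every point toward the mean of its
    neighbours within `bandwidth`, then merge nearby modes into labels."""
    shifted = points.copy()
    for _ in range(n_iter):
        for i, p in enumerate(shifted):
            d = np.linalg.norm(points - p, axis=1)
            near = d < bandwidth                 # flat kernel
            shifted[i] = points[near].mean(axis=0)
    # merge converged modes that lie within one bandwidth of each other
    labels = -np.ones(len(points), dtype=int)
    modes = []
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return labels

# two toy "lanes" as tight blobs in a 2-D embedding space
emb = np.vstack([np.random.RandomState(0).randn(20, 2) * 0.1 + [0, 0],
                 np.random.RandomState(1).randn(20, 2) * 0.1 + [3, 3]])
labels = mean_shift(emb, bandwidth=1.0)
```

In the real pipeline the embedding dimension is higher (4 in the settings below) and only pixels classified as lane by the segmentation branch are clustered.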

      Once the lane instances are obtained, each lane line needs a parametric description. This is done with a curve fitting algorithm; conventional choices include cubic polynomials, splines, and clothoid curves. To improve the quality of the fit, the image is usually transformed to a bird's-eye view before fitting, and the fitted curve is then inverse-transformed back to the original image.

       1. Semantic Segmentation

        Training outputs a binarized segmentation map in which white represents the lane lines and black represents the background.

       When designing this part of the model, the main considerations were the following:

       1) When constructing the labels, to handle occlusion, the pixels belonging to each lane line are connected into a single line even through occluded regions. The benefit is that the network still predicts the lane position when the lane line is occluded.

       2) The loss is cross-entropy. To address the uneven sample distribution (far fewer pixels belong to lane lines than to the background), the loss is weighted using bounded inverse class weighting:

         W_class = 1 / ln(c + p(class))

         where p(class) is the probability of the corresponding class in the overall sample, and c is a hyperparameter.
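A minimal NumPy sketch of the bounded inverse class weighting (the 2% lane / 98% background split is an assumed example; c = 1.02 is the value suggested in the ENet paper):

```python
import numpy as np

def bounded_inverse_class_weights(class_probs, c=1.02):
    """w_class = 1 / ln(c + p(class)). The constant c bounds the weight,
    so rare classes get a large but finite weight."""
    return 1.0 / np.log(c + np.asarray(class_probs, dtype=float))

# e.g. lane pixels are ~2% of the image, background ~98%
w = bounded_inverse_class_weights([0.02, 0.98])
# w[0] (lane) is much larger than w[1] (background)
```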


       2. Instance Segmentation
       While the semantic segmentation branch identifies which pixels are lane pixels, separating them into individual lanes (that is, knowing which lane each pixel belongs to) requires a lane instance embedding branch. It is trained with one-shot distance metric learning, a method that integrates easily into a standard feed-forward network and supports real-time processing. The branch is trained with a clustering-style loss so that it outputs an embedding for each lane-line pixel in which embeddings of pixels from the same lane are close together and embeddings of pixels from different lanes are far apart; based on this property, each lane line can be recovered by clustering.

       Roughly, it works as follows:

       Two forces contend in the loss. One is a variance (pull) term, which pulls each pixel's embedding toward the mean embedding of its lane line; the pull activates only when the embedding is farther than a threshold δv from that mean. The other is a distance (push) term, which pushes the mean embeddings of different lane lines apart; the push activates only when two cluster centers are closer than a threshold δd. With C lane lines, N_c pixels in lane c, embeddings x_i, and cluster means μ_c, the total loss L has the form of the discriminative loss used in the paper:

       L_var = (1/C) Σ_c (1/N_c) Σ_i [ ‖μ_c − x_i‖ − δv ]₊²

       L_dist = (1/(C(C−1))) Σ_{cA ≠ cB} [ δd − ‖μ_cA − μ_cB‖ ]₊²

       L = L_var + L_dist

 

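The pull/push idea above can be sketched in NumPy as follows (a simplified, non-TensorFlow illustration; the small regularization term on the cluster means that some formulations add is omitted here):

```python
import numpy as np

def discriminative_loss(embeddings, labels, delta_v=0.5, delta_d=3.0):
    """Variance (pull) + distance (push) terms.
    embeddings: (n_pixels, dim); labels: lane id for each pixel."""
    lane_ids = np.unique(labels)
    means = np.array([embeddings[labels == c].mean(axis=0) for c in lane_ids])
    # pull: penalise embeddings farther than delta_v from their lane mean
    l_var = 0.0
    for mu, c in zip(means, lane_ids):
        d = np.linalg.norm(embeddings[labels == c] - mu, axis=1)
        l_var += np.mean(np.maximum(d - delta_v, 0.0) ** 2)
    l_var /= len(lane_ids)
    # push: penalise pairs of lane means closer than delta_d
    l_dist, n_pairs = 0.0, 0
    for i in range(len(lane_ids)):
        for j in range(len(lane_ids)):
            if i != j:
                d = np.linalg.norm(means[i] - means[j])
                l_dist += np.maximum(delta_d - d, 0.0) ** 2
                n_pairs += 1
    if n_pairs:
        l_dist /= n_pairs
    return l_var + l_dist

lanes = np.array([0] * 5 + [1] * 5)
emb = np.array([[0., 0.]] * 5 + [[10., 10.]] * 5)
loss_far = discriminative_loss(emb, lanes)         # well separated: no penalty
loss_near = discriminative_loss(emb / 10.0, lanes) # centers ~1.41 apart: push active
```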

 

3. Clustering

Clustering can be seen as post-processing: the embedding branch of the previous step already provides feature vectors well suited to clustering, so any clustering algorithm can be applied to those vectors to complete the instance segmentation.

The clustering terminates when the distance between lane cluster centers (i.e., between the lane lines) is greater than δd and every lane-line pixel in a cluster lies within δv of its lane center. Setting δd > 6·δv as the termination condition, the loss above is iterated until it holds.

 Network Architecture

  The model is an encoder-decoder based on ENet. ENet consists of 5 stages, of which stage 2 and stage 3 are basically identical; stages 1-3 form the encoder and stages 4-5 the decoder.

LaneNet shares stages 1 and 2 between the two branches, while stage 3 and the subsequent decoder layers are trained separately in each branch. The semantic segmentation branch outputs a two-channel image W×H×2; the embedding branch outputs an N-channel image W×H×N. The losses of the two branches carry equal weight.

 

 

 

 Lane curve fitting with H-Net
LaneNet outputs the set of pixels for each lane line. The conventional treatment is to convert the image to a bird's-eye view, so that curved lanes can be fitted with a 2nd- or 3rd-order polynomial (the fit is simpler). However, the transformation matrix H is computed only once and the same matrix is used for every image, which introduces errors when the ground plane changes (mountains, hills).

To solve this, a neural network, H-Net, is trained to predict the transformation matrix H. The network input is the image and the output is H:

 

The transformation matrix is constrained by fixing certain entries to zero, so that horizontal lines remain horizontal under the transform (the transformed y coordinate does not depend on x).

In other words, the transformation parameters learned by H-Net generalize better. The constrained matrix H has only 6 free parameters, so H-Net outputs a 6-dimensional vector. H-Net consists of 6 plain convolutional layers followed by one fully connected layer.
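Following the constraint described above, a sketch of assembling H from the 6 predicted coefficients (`build_h` is a hypothetical helper name; the zero placement reflects the horizontal-line constraint, since y′ then depends only on y):

```python
import numpy as np

def build_h(coeffs):
    """Assemble the constrained homography from H-Net's 6 outputs.
    Zeros in the bottom-left keep y' independent of x, so horizontal
    lines stay horizontal under the transform."""
    a, b, c, d, e, f = coeffs
    return np.array([[a,  b,  c],
                     [0., d,  e],
                     [0., f,  1.]])

H_id = build_h([1., 0., 0., 1., 0., 0.])  # identity transform as a sanity check
```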

Curve Fitting

The process re-predicts each x coordinate from its y coordinate:

  • For a lane line containing N pixels, each pixel p_i = [x_i, y_i, 1]^T ∈ P is first transformed with the H predicted by H-Net:

                                     P′ = H P

  • The polynomial parameters are then fitted with least squares:

                                    w = (Yᵀ Y)⁻¹ Yᵀ x′

  • From the fitted parameters w = [α, β, γ]ᵀ, the transformed x coordinate is re-predicted as x_i′*:

                                  x_i′* = α y_i′² + β y_i′ + γ

  • Finally, each x_i′* is projected back to the original image:

                                     p_i* = H⁻¹ p_i′*

Fitting loss:

Loss = (1/N) Σ_{i=1}^{N} (x_i* − x_i)²
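The four steps above can be sketched end to end in NumPy (`fit_lane` is a hypothetical helper name; a 2nd-order fit is shown, matching the formula with w = [α, β, γ]ᵀ, and the identity H in the demo stands in for an H-Net prediction):

```python
import numpy as np

def fit_lane(pixels, H, order=2):
    """Project lane pixels with H, least-squares fit x' = poly(y'),
    then project the fitted points back with H^-1.
    pixels: (N, 2) array of (x, y) image coordinates."""
    N = len(pixels)
    P = np.vstack([pixels.T, np.ones(N)])      # homogeneous coords, 3 x N
    Pp = H @ P
    Pp = Pp / Pp[2]                            # normalise homogeneous scale
    xp, yp = Pp[0], Pp[1]
    Y = np.vander(yp, order + 1)               # columns [y'^2, y', 1] for order 2
    w = np.linalg.lstsq(Y, xp, rcond=None)[0]  # w = (Y^T Y)^-1 Y^T x'
    x_star = Y @ w                             # re-predicted x'*
    back = np.linalg.inv(H) @ np.vstack([x_star, yp, np.ones(N)])
    back = back / back[2]
    return back[:2].T                          # fitted (x*, y) per pixel

# demo: points on x = y^2 with identity H are fitted exactly
pts = np.array([[float(y * y), float(y)] for y in range(5)])
fitted = fit_lane(pts, np.eye(3))
```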

 

Model and training settings (frame rate reaches 50 fps)

LaneNet

Dataset: TuSimple. The embedding dimension is 4 (4-channel output), δv = 0.5, δd = 3; input images are resized to 512×256; Adam optimizer, batch size 8, learning rate 5e-4.

H-Net

Dataset: TuSimple. 3rd-order polynomial fit; input images 128×64; Adam optimizer, batch size 10, learning rate 5e-5.

Evaluation metrics:

Let G1 denote the set of pixels with value 1 (lane) in the GT binary map and G0 the background pixels; P1 and P0 are the corresponding sets in the prediction.

accuracy = 2/(1/recall + 1/precision)

recall = |P1 ∩ G1| / |G1|         # fraction of GT lane pixels classified correctly

precision = |P0 ∩ G0| / |G0|      # fraction of GT background pixels classified correctly

fp = (|P1| − |P1 ∩ G1|) / |P1|    # false-detection rate among predicted lane pixels

fn = (|G1| − |P1 ∩ G1|) / |G1|    # miss rate among GT lane pixels
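The metrics above can be sketched in NumPy from a pair of binary masks (`lane_metrics` is a hypothetical helper name; note that "precision" here follows the post's definition, i.e. background accuracy):

```python
import numpy as np

def lane_metrics(pred, gt):
    """Pixel-level metrics from binary masks (1 = lane, 0 = background)."""
    p1, g1 = (pred == 1), (gt == 1)
    p0, g0 = ~p1, ~g1
    recall = (p1 & g1).sum() / g1.sum()           # GT lane pixels found
    precision = (p0 & g0).sum() / g0.sum()        # GT background pixels kept
    accuracy = 2 / (1 / recall + 1 / precision)   # harmonic mean of the two
    fp = (p1.sum() - (p1 & g1).sum()) / p1.sum()  # false rate among predicted lane
    fn = (g1.sum() - (p1 & g1).sum()) / g1.sum()  # miss rate among GT lane
    return accuracy, recall, precision, fp, fn

# demo: a perfect prediction scores 1.0 / 1.0 / 1.0 with zero fp and fn
gt = np.array([[1, 0, 0], [0, 1, 0]])
pred = gt.copy()
acc, rec, prec, fp, fn = lane_metrics(pred, gt)
```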

Related experiments:

 1. Replacing the backbone with mobilenet_v2

 2. Adjusting the embedding dimension

 3. Adjusting the preprocessing

 4. Replacing the upsampling method

 5. Learning-rate decay schedules

 6. Adjusting the deconvolution kernel size

Code structure:

lanenet_detection

    ├── config // configuration files
    ├── data // sample images and curve-fit parameter files
    ├── data_provider // data loading and tfrecords creation
    ├── lanenet_model
    │   ├── lanenet.py // network layout: inference/compute_loss/compute_acc
    │   ├── lanenet_front_end.py // backbone layout
    │   ├── lanenet_back_end.py // network heads and loss computation: inference/compute_loss
    │   ├── lanenet_discriminative_loss.py // discriminative_loss implementation
    │   └── lanenet_postprocess.py // post-processing: clustering and curve fitting
    ├── model // directory for saved models
    ├── semantic_segmentation_zoo // backbone network definitions
    │   ├── __init__.py
    │   ├── vgg16_based_fcn.py // VGG backbone
    │   ├── mobilenet_v2_based_fcn.py // mobilenet_v2 backbone
    │   └── cnn_basenet.py // basic building blocks
    ├── tools // training and testing entry points
    │   ├── train_lanenet.py // training
    │   ├── test_lanenet.py // testing
    │   ├── evaluate_dataset.py // dataset accuracy evaluation
    │   ├── evaluate_lanenet_on_tusimple.py // save detection results on the dataset
    │   ├── evaluate_model_utils.py // evaluation helpers: calculate_model_precision/calculate_model_fp/calculate_model_fn
    │   └── generate_tusimple_dataset.py // convert the raw dataset format
    ├── showname.py // inspect model variable names
    ├── change_name.py // rename model variables
    ├── freeze_graph.py // generate a pb file
    ├── convert_weights.py // convert weights for model pretraining
    └── convert_pb.py // generate a pb file

 


Origin www.cnblogs.com/jimchen1218/p/11811958.html