Deep learning notes (xiv): lane detection with SCNN

Paper: Spatial As Deep: Spatial CNN for Traffic Scene Understanding

Code: https://github.com/XingangPan/SCNN


Dataset: CULane

Overview

A general CNN is built by stacking convolutional layers. However, this layer-by-layer scheme is not effective at capturing the spatial relationships among pixels across the rows and columns of an image. Such spatial relationships are crucial for semantic objects like lane markings, which have strong shape priors but weak appearance coherence.

In this paper, the authors propose a network architecture called SCNN (Spatial CNN), which replaces conventional layer-by-layer convolutions with slice-by-slice convolutions within feature maps, enabling message passing between pixels across rows and columns of a layer. SCNN is particularly suited to targets with a long continuous shape or large structure, whose spatial relationships are strong but whose appearance clues are weak.

Experiments were conducted on a lane detection dataset and a semantic segmentation dataset. The results show that SCNN significantly improves performance. SCNN also won first place in the TuSimple Benchmark Lane Detection Challenge, with an accuracy of 96.53%.

 

As the figure shows, lane markings are long, continuous shapes that may be partially occluded. A human can easily fill in the occluded parts from context information, but a conventional CNN generally cannot.

In a layer-by-layer CNN, a convolution layer receives input from the preceding layer, applies convolution and nonlinear activation, and sends the result to the next layer; this process is done sequentially. Similarly, SCNN views the rows or columns of a feature map as "layers" and applies convolution, nonlinear activation, and sum operations sequentially, which forms a deep neural network. In this way information can be propagated between neurons in the same layer.

CULane 

The authors also release the CULane dataset. Earlier datasets, such as TuSimple (6,408 images), feature simple scenes and leave worn lane markings unannotated, even though a human can easily infer and complete such markings by eye.

To collect data, we mounted cameras on six different vehicles driven by different drivers and recorded videos during driving in Beijing on different days. More than 55 hours of video were collected and 133,235 frames were extracted, which is more than 20 times the size of the TuSimple dataset. The dataset is divided into 88,880 frames for the training set, 9,675 for the validation set, and 34,680 for the test set. The images were undistorted using tools in (Scaramuzza, Martinelli, and Siegwart 2006) and have a resolution of 1640 × 590.

 

The CULane dataset annotates occluded and worn lane markings by estimating their positions, as in panels 2 and 4 of the figure above. So that the algorithm can learn to recognize barriers, only one side of a barrier is annotated, as in panel 1. CULane annotates only the four lane markings that demand the most attention.

SCNN

Take SCNN_D as an example. Suppose that after a series of convolutions the feature map has size $C \times H \times W$. Along the $H$ dimension, the feature map is cut into $H$ slices. The first slice is passed through a $1 \times w \times C$ convolution and a nonlinear activation, and the result is added to the next slice (in general $w$ is a fairly wide kernel; the authors' experiments show that $w = 9$ works best). This operation continues downward until the last slice has been updated. The letters D, U, R, L in the figure denote SCNN applied in the downward, upward, rightward, and leftward directions, respectively.
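As a concrete illustration, here is a minimal PyTorch sketch of the downward pass (the module name and shapes are my own rendering; the released code is in Torch7):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCNN_D(nn.Module):
    """Downward slice-by-slice message passing (one of the four directions)."""

    def __init__(self, channels, w=9):
        super().__init__()
        # 1 x w convolution along the width axis, shared by all slices;
        # padding keeps the width unchanged (w should be odd).
        self.conv = nn.Conv2d(channels, channels, kernel_size=(1, w),
                              padding=(0, w // 2), bias=False)

    def forward(self, x):
        # x: (N, C, H, W); treat the H axis as a stack of H slices.
        slices = list(x.split(1, dim=2))      # H tensors of shape (N, C, 1, W)
        for i in range(1, len(slices)):
            # conv + nonlinearity on the previous (already updated) slice,
            # added to the current slice as a message.
            slices[i] = slices[i] + F.relu(self.conv(slices[i - 1]))
        return torch.cat(slices, dim=2)
```

SCNN_U runs the same loop in reverse order, and SCNN_R / SCNN_L split the feature map along the width axis with a $w \times 1$ kernel; the four directions D, U, R, L are applied one after another on the top hidden layer.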

The authors note that, compared with conventional convolution, SCNN has three advantages:

  • Computational efficiency. This is in comparison with MRF/CRF-based message passing.

  • Message as residual. Messages are propagated as residuals added onto the original features, which makes the deep network easier to train.
  • Flexibility. Usually the top hidden layer contains information that is both rich and highly semantic, and thus is an ideal place to apply SCNN.

Training and testing

In both tasks, we train the models using standard SGD with batch size 12, base learning rate 0.01, momentum 0.9, and weight decay 0.0001. The learning rate policy is "poly" with power and iteration number set to 0.9 and 60K respectively.
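Those hyperparameters translate to a PyTorch setup roughly like the following (a sketch; the placeholder model stands in for the real network, and per-iteration scheduler stepping is assumed):

```python
import torch
import torch.nn as nn

# stand-in for the real SCNN network
model = nn.Sequential(nn.Conv2d(3, 5, kernel_size=3, padding=1))

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

max_iter, power = 60_000, 0.9

# "poly" policy: lr = base_lr * (1 - iter / max_iter) ** power
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1 - it / max_iter) ** power)

# in the training loop, call optimizer.step() and then scheduler.step()
# once per iteration, stopping at max_iter
```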

As shown in Fig. 5(b), for each lane marking whose existence value is larger than 0.5, we search the corresponding probability map every 20 rows for the position with the highest response. These positions are then connected by cubic splines, which are the final predictions.
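A rough sketch of that decoding step for a single lane (the 0.3 response cut-off is my own assumption; the paper only specifies the 20-row sampling and the spline fit):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def decode_lane(prob_map, existence, row_step=20):
    """Sample peak positions every `row_step` rows, then fit a cubic spline.

    prob_map:  (H, W) probability map for one lane marking
    existence: scalar score from the lane-existence branch
    """
    if existence <= 0.5:
        return None                                # lane judged absent
    rows, cols = [], []
    for y in range(0, prob_map.shape[0], row_step):
        x = int(prob_map[y].argmax())              # highest response in row y
        if prob_map[y, x] > 0.3:                   # assumed "no response" cut-off
            rows.append(y)
            cols.append(x)
    if len(rows) < 2:
        return None
    spline = CubicSpline(rows, cols)               # connect sampled points
    ys = np.arange(rows[0], rows[-1] + 1)
    return np.stack([spline(ys), ys], axis=1)      # (x, y) points of the lane
```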

During training, the line width of the targets is set to 16 pixels, and the input and target images are rescaled to 800 × 288. Considering the imbalanced labels between background and lane markings, the loss of the background class is multiplied by 0.4.
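That weighting is just a class-weighted cross-entropy; a minimal sketch, assuming the five-class CULane layout (background plus four lanes):

```python
import torch
import torch.nn as nn

# class 0 = background (weight 0.4), classes 1-4 = the four lane markings
criterion = nn.CrossEntropyLoss(
    weight=torch.tensor([0.4, 1.0, 1.0, 1.0, 1.0]))

logits = torch.randn(2, 5, 288, 800)           # (N, classes, H, W) at 800 x 288
target = torch.randint(0, 5, (2, 288, 800))    # per-pixel class labels
loss = criterion(logits, target)
```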

Evaluation metrics

The authors use intersection over union (IoU) against a threshold (0.3 or 0.5) to judge whether a lane marking has been detected.

\begin{equation}
\label{a}
\begin{split}
& Precision = \frac{TP}{TP+FP} \\
& Recall = \frac{TP}{TP+FN} \\
& F\text{-}measure = (1+\beta^2) \frac{Precision \cdot Recall}{\beta^2 \cdot Precision + Recall} \\
& F\text{-}measure = \frac{2}{\frac{1}{Precision} + \frac{1}{Recall}}, \quad \text{if } \beta = 1
\end{split}
\end{equation}
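A sketch of how these numbers can be computed, assuming the predicted and ground-truth curves have already been rendered as fixed-width pixel masks (the paper draws them 30 pixels wide); the greedy matching is my simplification of the official evaluation:

```python
import numpy as np

def lane_iou(pred_mask, gt_mask):
    """IoU between two boolean (H, W) masks rendered from lane curves."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union if union > 0 else 0.0

def f_measure(pred_masks, gt_masks, iou_thr=0.5, beta=1.0):
    """Greedily match predictions to ground truth by IoU, then score."""
    matched, tp = set(), 0
    for pred in pred_masks:
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gt_masks):
            iou = lane_iou(pred, gt)
            if j not in matched and iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:                       # matched above the threshold: TP
            matched.add(best_j)
            tp += 1
    fp, fn = len(pred_masks) - tp, len(gt_masks) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0
```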

Ablation Study

The authors demonstrate the strength of SCNN from six angles:

  1. Effectiveness of multidirectional SCNN: SCNN outperforms simply stacking extra conventional convolutional layers.

  2. Effects of kernel width w: a kernel width of $w = 9$ works best.

  3. Spatial CNN on different positions: applying the SCNN structure to the last hidden layer works better than applying it to the output layer.

  4. Effectiveness of sequential propagation: sequential updates (updating one slice before moving on to the next) work much better than updating all slices in parallel, which shows that a pixel is not merely affected by nearby pixels but also receives information from positions further away.

  5. Comparison with state-of-the-art methods: the baseline here is network (a) above, DeepLab.


  6. Computational efficiency over other methods: for this one, see the table below...

The authors also ran experiments on semantic segmentation on Cityscapes, which I will not go into here.

SCNN for TuSimple

。。。

 
