Paper walkthrough: D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satelli

Reference links:

A team from Beijing University of Posts and Telecommunications won the CVPR 2018 DeepGlobe road extraction challenge with this work on recognizing roads in satellite imagery.

Paper link: D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction

GitHub repository (Python 2.7, PyTorch 0.2.0)


D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction

Abstract (keywords): semantic segmentation neural network, LinkNet architecture, dilated convolution

1. Introduction

Most road extraction methods fall into three categories: 1. generating pixel-level labels of roads; 2. detecting road skeletons; 3. combining the first two.

The paper formulates road extraction as a binary semantic segmentation task that generates pixel-level labels of roads.

Road segmentation from remote sensing images is challenging for three reasons: 1. The images are high resolution, so the network needs a receptive field large enough to cover the whole image. 2. Roads in satellite images are typically slender and complex, and cover only a small part of the image. 3. Roads have natural connectivity and long span.

D-LinkNet overview

D-LinkNet uses LinkNet with a pretrained encoder as its backbone and adds dilated convolution layers in the center part. LinkNet is an efficient semantic segmentation network that combines the advantages of skip connections, residual blocks, and an encoder-decoder architecture. It handles high-resolution input and runs fast.

Dilated convolution is an effective way to adjust the receptive fields of feature points without reducing the resolution of the feature maps.

Transfer learning is an effective way to improve network performance when the amount of data is limited. The D-LinkNet encoder is a ResNet34 pretrained on ImageNet.

2. Method

2.1. Network Architecture

In the challenge, the provided images and masks are 1024 × 1024, so D-LinkNet is designed to take 1024 × 1024 images as input and preserve detailed spatial information. D-LinkNet is split into three parts, A, B, and C: the encoder, the center part, and the decoder.

LinkNet has only parts A and C. D-LinkNet adds part B (the center part), which enlarges the receptive field while preserving detailed spatial information.

Given the challenges of road segmentation in remote sensing images discussed earlier, it is important to enlarge the receptive field of the feature points in the center part of the network while retaining detailed information. Pooling layers can multiply the receptive field of the feature points, but they also reduce the resolution of the feature maps and lose spatial information. A dilated convolution layer is therefore a desirable alternative to a pooling layer.

D-LinkNet uses several dilated convolution layers with skip connections in the center part.

Expanded view of the center part:

As the figure shows, the center part can be unrolled into a parallel structure that effectively fuses multi-scale features. Dilated convolution enlarges the receptive field: from top to bottom, the branches have receptive fields of 31, 15, 7, 3, and 1, and the outputs of all branches are summed to produce the fused feature. Dilated convolutions can be stacked in two ways, cascade mode and parallel mode. The center part uses both, and each path has a different receptive field, so the network can combine features from different scales.
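The receptive fields quoted above can be reproduced with a short calculation. This is a minimal sketch, assuming 3 × 3 kernels with stride 1 and dilation rates 1, 2, 4, 8 stacked in cascade; the function name `cascade_receptive_field` is made up for illustration and does not come from the paper's code.

```python
def cascade_receptive_field(dilations, kernel_size=3):
    """Receptive field of stacked stride-1 convolutions with the given dilation rates."""
    rf = 1  # a single feature point sees itself
    for d in dilations:
        # each dilated layer widens the field by (kernel_size - 1) * dilation
        rf += (kernel_size - 1) * d
    return rf


# Assumed dilation rates of the cascaded layers in the center part.
rates = [1, 2, 4, 8]

# Branch i passes through the first i dilated layers; i = 0 is the identity path.
branch_fields = [cascade_receptive_field(rates[:i]) for i in range(len(rates) + 1)]
print(branch_fields)  # [1, 3, 7, 15, 31]
```

Under these assumptions, the shortest path (no dilated layer) sees 1 position and the deepest path sees 31, which matches the list read bottom to top.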

In the figure, 1, 2, 4, 8 are the dilation rates. Next, a brief look at how dilated convolution enlarges the receptive field.

Dilated convolution (also called atrous convolution) inserts holes into the standard convolution kernel to enlarge the receptive field. The dilation rate is the spacing between kernel elements (a normal convolution has dilation rate 1).

Pooling shrinks the image to enlarge the receptive field, and upsampling then enlarges the image again. Shrinking and then re-enlarging inevitably loses some information. Dilated convolution obtains a larger receptive field without pooling, so it can see more context.

The benefit of dilated convolution is that it enlarges the receptive field without the information loss of pooling, so each convolution output covers a wider range of the input.
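To make the "holes" concrete, here is a minimal 1-D sketch in plain Python. The helper names `dilated_kernel_span` and `dilated_conv1d` are hypothetical, and the example uses no padding and stride 1; it only illustrates the indexing, not the implementation used in the paper.

```python
def dilated_kernel_span(kernel_size, dilation):
    """Number of input positions covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """1-D dilated cross-correlation with no padding and stride 1."""
    span = dilated_kernel_span(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # kernel taps are placed `dilation` positions apart in the input
        out.append(sum(w * signal[start + j * dilation] for j, w in enumerate(kernel)))
    return out


signal = [0, 1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

print(dilated_kernel_span(3, 1), dilated_kernel_span(3, 2))  # 3 5
print(dilated_conv1d(signal, kernel, dilation=1))  # [3, 6, 9, 12, 15, 18]
print(dilated_conv1d(signal, kernel, dilation=2))  # [6, 9, 12, 15]
```

With dilation 2 the same three weights cover five input positions, so the receptive field grows while the number of parameters stays the same.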

Dilated convolution reference links

The D-LinkNet decoder uses transposed convolution layers for upsampling, restoring the feature map resolution from 32 × 32 to 1024 × 1024.
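Going from 32 to 1024 is a factor of 32, that is, five doublings. The sketch below applies the standard transposed-convolution output-size formula (dilation 1) to check this. The kernel, stride, and padding values are assumptions chosen so that each layer doubles the size; the post does not state the actual layer settings.

```python
def conv_transpose_out(size, kernel, stride, padding, output_padding=0):
    """Output length of a transposed convolution (dilation 1) along one axis."""
    return (size - 1) * stride - 2 * padding + (kernel - 1) + output_padding + 1


# Assumed layer settings (not stated in the post); each one doubles the size.
decoder_layers = [
    dict(kernel=3, stride=2, padding=1, output_padding=1),  # four decoder stages
] * 4 + [
    dict(kernel=4, stride=2, padding=1),  # final upsampling to input resolution
]

size = 32
sizes = [size]
for layer in decoder_layers:
    size = conv_transpose_out(size, **layer)
    sizes.append(size)

print(sizes)  # [32, 64, 128, 256, 512, 1024]
```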

3. Experimental Section

For the DeepGlobe Road Extraction Challenge, PyTorch was used as the deep learning framework. All models were trained on four NVIDIA GTX1080 GPUs.

3.1 Dataset

The DeepGlobe Road Extraction dataset contains 6226 training images, 1243 validation images, and 1101 test images, all at 1024 × 1024 resolution. It is a binary segmentation problem: roads are labeled as foreground and everything else as background.

3.2. Implementation details

Data augmentation: the authors augmented the data aggressively, including horizontal flip, vertical flip, diagonal flip, strong color jittering, image shifting, and scaling.
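The three flips are simple to sketch on nested lists. The point of the example is that the same geometric transform must be applied to the image and its mask so the labels stay aligned. The function names and the 50% probabilities are assumptions for illustration; color jittering, shifting, and scaling are left out.

```python
import random


def hflip(grid):
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Mirror the rows top to bottom."""
    return [row[:] for row in grid[::-1]]


def diagonal_flip(grid):
    """Transpose the grid (flip along the main diagonal)."""
    return [list(col) for col in zip(*grid)]


def augment_pair(image, mask, rng):
    """Apply the same random flips to an image and its mask (assumed 50% chance each)."""
    for flip in (hflip, vflip, diagonal_flip):
        if rng.random() < 0.5:
            image, mask = flip(image), flip(mask)
    return image, mask


image = [[1, 2], [3, 4]]
mask = [[0, 1], [1, 0]]
aug_image, aug_mask = augment_pair(image, mask, random.Random(0))
print(aug_image, aug_mask)
```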

For training, the loss function is BCE (binary cross entropy) plus Dice coefficient loss, and Adam is the optimizer.
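As a rough illustration of the combined loss, here is a plain-Python sketch over flattened predictions and labels. The equal weighting of the two terms, the `smooth` constant, and the function names are assumptions, not details confirmed by the post.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy; pred holds probabilities in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(pred)


def dice_loss(pred, target, smooth=1.0):
    """1 - Dice coefficient; `smooth` (an assumed value) avoids division by zero."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def bce_dice_loss(pred, target):
    """Sum of the two terms with equal weight (an assumption)."""
    return bce_loss(pred, target) + dice_loss(pred, target)


target = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]
bad = [0.2, 0.8, 0.1, 0.9]
print(bce_dice_loss(good, target) < bce_dice_loss(bad, target))  # True
```

The Dice term measures overlap with the road pixels directly, which helps when roads cover only a small part of each image, while the BCE term supplies a per-pixel signal.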

3.3 Results

A deep U-Net served as the baseline model. A pretrained LinkNet34 was also tested, alongside the authors' own D-LinkNet.

U-Net wrongly recognizes some background as road, and LinkNet34 has problems with road connectivity. D-LinkNet avoids both errors.

4. Conclusion

D-LinkNet can handle road properties such as narrowness, connectivity, complexity, and long span to some extent.


Origin blog.csdn.net/qq_41647438/article/details/105229562