Deep Learning for Semantic Segmentation of High-Resolution Remote Sensing Images [repost]

Original link: https://www.cnblogs.com/wzp-749195/p/11114624.html

       As we all know, deep learning has achieved great success in computer vision, and it has likewise driven rapid progress in the automatic interpretation of remote sensing images. I have done some modest work in this field and published several papers, and I continue to follow it closely.

During this business trip to Beijing I finally had time to settle down and take a serious look at where deep learning currently stands in semantic segmentation. I would like to share some of my experience here so that we can learn from each other. This post discusses the progress of existing state-of-the-art methods for semantic segmentation of remote sensing images, and where the field is heading.

       Without further ado, here are the semantic segmentation networks that I currently find the most stable and accurate: 1. U-Net; 2. DeepLab (with MobileNet, ResNet-18, ResNet-50, Inception-v3, and other feature extractors); 3. CEnet.

Below I explain each of these networks briefly, as a starting point for discussion. If anything here is wrong, I would welcome corrections from experts. Email: 1044625113@qq.com, Phone: 15211874660. If you need the complete code for semantic segmentation of remote sensing images, you can also contact me.

      1. U-Net network

      U-Net gets its name from its U-shaped architecture. There are plenty of introductions to it on CSDN blogs and in papers, so I will not go into detail here.

      Its architecture is shown below:

Figure 1. U-Net semantic segmentation network (from a CSDN blog post on U-Net)

      The architecture is elegant. This is the design from the original paper, and there is plenty of room to improve on it. For example, the feature-extraction blocks can be replaced with residual blocks (ResNet). What is the benefit? Mainly that the network can be made deeper while avoiding vanishing gradients, so it can learn deeper features, which helps accuracy.

I have read several implementations. In the feature-fusion layers, two approaches are common. The first is a direct sum: encoder and decoder features are added element-wise. The other is the usual concat. As for their pros and cons, my understanding is as follows. Concat retains more features, since it simply stacks the two feature tensors along the channel dimension; the results are good, but it costs more GPU memory. Addition saves GPU memory, but is it better than concat? I have not run that experiment; you can run the code and try it yourself.
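To make the memory argument concrete, here is a minimal MATLAB sketch of the two fusion strategies on plain arrays. The feature-map size (64x64 with 128 channels) and the follow-up 3x3 convolution with 128 filters are assumptions chosen for illustration; they do not come from any particular U-Net implementation.

```matlab
% Hypothetical encoder/decoder feature maps: 64x64 spatial size, 128 channels.
enc = rand(64, 64, 128, 'single');   % encoder (skip-connection) features
dec = rand(64, 64, 128, 'single');   % decoder features at the same resolution

fusedSum = enc + dec;                % element-wise addition: still 128 channels
fusedCat = cat(3, enc, dec);         % channel concatenation: 256 channels

sizeSum = size(fusedSum);            % [64 64 128]
sizeCat = size(fusedCat);            % [64 64 256]

% Weights of an assumed 3x3 convolution with 128 filters applied after fusion
% (bias terms ignored). Concatenation doubles the input channels, so it
% doubles both the activation memory and the weights of the next layer.
weightsAfterSum = 3 * 3 * size(fusedSum, 3) * 128;   % 147456
weightsAfterCat = 3 * 3 * size(fusedCat, 3) * 128;   % 294912
```

In a Deep Learning Toolbox layer graph, the same two choices correspond to `additionLayer` and `depthConcatenationLayer`.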

 

      2. DeepLab network

Figure 3. DeepLabv3+ semantic segmentation network (from the original paper)

      As the DeepLab figure shows, the network is simple and elegant, without many complicated combinations. Its core is four atrous (dilated) convolution branches with dilation rates of 1, 6, 12, and 18. As for why these four values, the authors justify them only experimentally: they report that these four rates gave the highest accuracy. In addition, atrous convolutions with different rates have different receptive fields, so they capture features at different scales. The second important part is the concat with the features upsampled by a factor of four. More importantly, this combines encoder and decoder features, so in essence the network is a variant of U-Net, except that the feature-extraction network can be swapped freely. Here I have implemented four classic backbones: MobileNet, Inception-v3, ResNet-18, and ResNet-50.
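The relationship between dilation rate and receptive field can be checked with a little arithmetic. For a k x k kernel with dilation rate d, the kernel taps span k + (k - 1)(d - 1) positions along each axis. The sketch below evaluates this standard formula for a 3x3 kernel at the four rates mentioned above; it is only an illustration and not part of the author's code.

```matlab
% Span of a dilated convolution kernel along one axis:
%   effective = k + (k - 1) * (d - 1)
k = 3;                      % kernel size of the atrous branches
rates = [1 6 12 18];        % dilation rates discussed in the text

effective = k + (k - 1) .* (rates - 1);   % [3 13 25 37]

for i = 1:numel(rates)
    fprintf('rate %2d -> kernel spans %2d x %2d feature-map cells\n', ...
        rates(i), effective(i), effective(i));
end
```

The number of weights per kernel stays at 3 x 3 regardless of the rate; only the spacing between the taps grows.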

      The key part of DeepLabv3 is the ASPP (atrous spatial pyramid pooling) module. The core code is as follows:

%% Build the atrous spatial pyramid pooling (ASPP) block, the core of DeepLab
function LayerGraph = ASPP_layer(LayerGraph)
% Create the ASPP layers
dilate_size2 = 6;
dilate_size3 = 12;
dilate_size4 = 18;


% Scale 1: 1x1 convolution branch
convLayer_scale1 = convolution2dLayer(1,256,...  % 1x1 kernel; 256 filters as in the paper
    'Padding','same',...
    'BiasL2Factor',0,...
    'Name','convLayer_scale1');

% convLayer_scale1 = groupedConvolution2dLayer(1,1,40,'Padding','same', 'Name','convLayer_scale1');

bn_scale1 = batchNormalizationLayer('Name','bn_scale1');
% relu_scale1 = clippedReluLayer(6,'Name','relu_scale1');
relu_scale1 = reluLayer('Name','relu_scale1');
scale_net1 = [convLayer_scale1;bn_scale1;relu_scale1];


% Scale 2: atrous convolution branch (dilation 6)
convLayer_scale2 = convolution2dLayer(3,256,...
    'Padding','same',...
    'DilationFactor', dilate_size2,...
    'BiasL2Factor',0,...
    'Name','convLayer_scale2');

% convLayer_scale2 = groupedConvolution2dLayer(3,1,40,'Padding','same', 'DilationFactor', dilate_size2, 'Name','convLayer_scale2');

bn_scale2 = batchNormalizationLayer('Name','bn_scale2');
% relu_scale2 = clippedReluLayer(6,'Name','relu_scale2');
relu_scale2 = reluLayer('Name','relu_scale2');
scale_net2 = [convLayer_scale2;bn_scale2;relu_scale2];


% Scale 3: atrous convolution branch (dilation 12)
convLayer_scale3 = convolution2dLayer(3,256,...
    'Padding','same',...
    'DilationFactor', dilate_size3,...
    'BiasL2Factor',0,...
    'Name','convLayer_scale3');

% convLayer_scale3 = groupedConvolution2dLayer(3,1,40,'Padding','same', 'DilationFactor', dilate_size3, 'Name','convLayer_scale3');

bn_scale3 = batchNormalizationLayer('Name','bn_scale3');
% relu_scale3 = clippedReluLayer(6,'Name','relu_scale3');
relu_scale3 = reluLayer('Name','relu_scale3');
scale_net3 = [convLayer_scale3;bn_scale3;relu_scale3];


% Scale 4: atrous convolution branch (dilation 18)
convLayer_scale4 = convolution2dLayer(3,256,...
    'Padding','same',...
    'DilationFactor', dilate_size4,...
    'BiasL2Factor',0,...
    'Name','convLayer_scale4');

% convLayer_scale4 = groupedConvolution2dLayer(3,1,40,'Padding','same', 'DilationFactor', dilate_size4, 'Name','convLayer_scale4');

bn_scale4 = batchNormalizationLayer('Name','bn_scale4');
% relu_scale4 = clippedReluLayer(6,'Name','relu_scale4');
relu_scale4 = reluLayer('Name','relu_scale4');
scale_net4 = [convLayer_scale4; bn_scale4; relu_scale4];


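% Add the four branches to the existing layer graph and feed each one
% from the backbone output layer ('mixed10')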
LayerGraph = addLayers(LayerGraph, scale_net1);
LayerGraph = addLayers(LayerGraph, scale_net2 );
LayerGraph = addLayers(LayerGraph, scale_net3);
LayerGraph = addLayers(LayerGraph, scale_net4);

LayerGraph = connectLayers(LayerGraph, 'mixed10', 'convLayer_scale1');
LayerGraph = connectLayers(LayerGraph, 'mixed10', 'convLayer_scale2');
LayerGraph = connectLayers(LayerGraph, 'mixed10', 'convLayer_scale3');
LayerGraph = connectLayers(LayerGraph, 'mixed10', 'convLayer_scale4');

catFeature4 = depthConcatenationLayer(4,'Name','dec_cat_aspp');  % fuse the four branches along the channel dimension
LayerGraph = addLayers(LayerGraph, catFeature4);
LayerGraph = connectLayers(LayerGraph, 'relu_scale1', 'dec_cat_aspp/in1');
LayerGraph = connectLayers(LayerGraph, 'relu_scale2', 'dec_cat_aspp/in2');
LayerGraph = connectLayers(LayerGraph, 'relu_scale3', 'dec_cat_aspp/in3');
LayerGraph = connectLayers(LayerGraph, 'relu_scale4', 'dec_cat_aspp/in4');


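% 1x1 convolution that fuses the concatenated branches and reduces the channel count
convLayer_input = convolution2dLayer(1,256,...  % 1x1 conv: fewer channels, hence fewer parameters downstream
    'Stride',[1 1],...
    'Padding','same',...  % 'same' keeps the spatial size unchanged for a 1x1 kernel
    'BiasL2Factor',0,...
    'Name','Conv_block16');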
bn_layer1 = batchNormalizationLayer('Name','bn_block16');
% relu_layer1 = clippedReluLayer(6,'Name','relu_block16');
relu_layer1 = reluLayer('Name','relu_block16');

con_net = [convLayer_input; bn_layer1; relu_layer1];

LayerGraph = addLayers(LayerGraph, con_net);
LayerGraph = connectLayers(LayerGraph, 'dec_cat_aspp', 'Conv_block16');


% Upsample by a factor of 4
deconvLayer = transposedConv2dLayer(8,256,...   % 8x8 kernel
    'Stride',[4 4],... % 4x upsampling
    'Cropping','same',...
    'BiasL2Factor',0,...
    'Name','deconv_1');

decon_net = [deconvLayer;
    batchNormalizationLayer('Name','de_batch_1');
    reluLayer('Name','de_relu_1')];
%              clippedReluLayer(6,'Name','de_relu_1')];

LayerGraph = addLayers(LayerGraph, decon_net);
LayerGraph = connectLayers(LayerGraph, 'relu_block16', 'deconv_1');


end

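      This ASPP code is my own implementation, written from the authors' original paper with reference to implementations in other frameworks such as PyTorch, Keras, and Caffe. Feel free to use it directly!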
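As a sanity check on the shapes at the tail of the ASPP block, the sketch below works through the arithmetic in plain MATLAB. The 32x32 input size is an assumption for illustration. The output-size formula assumes that `'Cropping','same'` removes `kernel - stride` pixels in total so that the output equals the input size times the stride, which is how the MATLAB documentation describes that option.

```matlab
% Channel bookkeeping for the ASPP concatenation and the 1x1 reduction.
numBranches      = 4;       % four parallel branches
filtersPerBranch = 256;     % filters in each branch
catChannels   = numBranches * filtersPerBranch;   % 1024 channels after concat
reduceWeights = 1 * 1 * catChannels * 256;        % 262144 weights in the 1x1 conv

% Spatial size after the transposed convolution (8x8 kernel, stride 4).
inputSize = 32;             % assumed feature-map height/width entering deconv_1
kernel    = 8;
stride    = 4;
totalCrop = kernel - stride;                                   % 4
outputSize = stride * (inputSize - 1) + kernel - totalCrop;    % 128

fprintf('concat channels: %d, 1x1 conv weights: %d\n', catChannels, reduceWeights);
fprintf('upsampled size: %d (= %d x %d)\n', outputSize, stride, inputSize);
```

With these settings the transposed convolution upsamples the reduced feature map by exactly four, which is what lets it line up with lower-level encoder features at four times the resolution.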

 

      3. CEnet network

Figure 2. CEnet semantic segmentation network (from the original paper)

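      CEnet is used mainly for medical image segmentation and was published in a leading IEEE medical imaging journal. At first glance the network felt very familiar, and on closer inspection, isn't it a variant of PSPnet? The back end combines features from several max-pooling layers of different sizes, and the front end uses the multi-scale atrous convolutions from DeepLab.

What I find interesting is the size of the atrous convolutions. Because blood vessels are small, the authors keep the dilation small, with a maximum of only 5, which is quite different from DeepLab's parameters. This way of designing a network is worth learning from. For example, if we only need to extract the road network from remote sensing images, do we really need such large atrous convolution kernels? No! We should therefore choose network parameters according to the characteristics of the ground objects in the imagery; only then can we reach good accuracy. (This is purely my personal view; if anything is inappropriate, I welcome corrections from experts. Phone: 15211874660, Email: 1044625113)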
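One rough way to reason about this choice is to convert the span of a dilated kernel into ground distance. The sketch below does so under loudly stated assumptions: a 3x3 kernel, a backbone output stride of 16, and a ground sample distance of 0.5 m per pixel. None of these values come from the post, and the estimate ignores the receptive field that the backbone itself contributes.

```matlab
% Assumed values for illustration only (not from the post).
k            = 3;      % kernel size of the atrous convolution
outputStride = 16;     % assumed downsampling factor of the backbone
gsd          = 0.5;    % assumed ground sample distance, metres per pixel

% Ground distance spanned by the kernel taps for a given dilation rate d.
spanMetres = @(d) (k + (k - 1) * (d - 1)) * outputStride * gsd;

spanDeepLab = spanMetres(18);   % largest DeepLab rate: 37 cells -> 296 m
spanCEnet   = spanMetres(5);    % largest CEnet rate:   11 cells ->  88 m

fprintf('rate 18 spans about %g m, rate 5 spans about %g m\n', ...
    spanDeepLab, spanCEnet);
```

Under these assumptions, the largest DeepLab rate covers several hundred metres on the ground, far wider than a typical road, which is consistent with the argument for smaller rates on thin linear objects.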

      For the complete CEnet implementation, see my GitHub repository (https://github.com/wzp8023391/CEnet; if you find it useful, please give it a star).

 

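      4. Other networks

      I will not say more about other semantic segmentation networks such as PSPnet; you can read the papers. Looking back, have you noticed a problem? All current semantic segmentation networks are designed by hand! How well will a given design perform? Who knows. Run some experiments; if it works, fine, and if not, drop it.

The result is a flood of "filler" papers (forgive the word; I did the same thing myself back when I needed to graduate). To give one example: CEnet combines dilated convolution with max pooling, but why these parameters, and why do it this way? The authors never fully explain it. Experimental evidence is acceptable, of course, but we would prefer theory, and this is one of the main things deep learning is criticized for.

Hand-designed networks look a great deal like the hand-designed features of years past! Of course, the masters are always the masters: the automated deep learning work (Auto-DeepLab) that Prof. Fei-Fei Li co-authored is what I personally find most promising. Below is a diagram of an automatically searched semantic segmentation network (CVPR 2019 oral). The key idea is to search automatically for the best combination of network components and so obtain the best segmentation network. This is very interesting, and I see it as a guiding light for the future of semantic segmentation.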

 

Figure 3. Auto-DeepLab semantic segmentation network (from the original paper)

 

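      5. Experimental summary

       Taking an open-source full land-cover classification task as an example, here is a comparison of these classic networks:

Figure 4. Original true-color high-resolution image

Figure 5. DeepLabv3+ segmentation result with Inceptionv3 as the feature extractor

Figure 6. DeepLabv3+ segmentation result with mobilenetv2 as the feature extractor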

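       Judging from the three figures above, Inceptionv3 works better than mobilenetv2 as the feature extractor. In terms of segmentation speed, mobilenet is about three times as fast as Inception, which is very efficient. The trade-off between accuracy and efficiency depends, of course, on your own needs.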
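The post compares the results visually and does not state which accuracy metric was used. If you want to put a number on such a comparison, per-class intersection over union (IoU) is a common choice. The sketch below computes it on two tiny, made-up label maps; the class ids and pixel values are purely illustrative.

```matlab
% Toy 4x4 label maps with class ids 1..3 (values are made up for illustration).
truth = [1 1 2 2;
         1 1 2 2;
         3 3 2 2;
         3 3 3 3];
pred  = [1 1 2 2;
         1 2 2 2;
         3 3 2 2;
         3 3 3 2];

numClasses = 3;
iou = zeros(1, numClasses);
for c = 1:numClasses
    inter = nnz((truth == c) & (pred == c));   % pixels labelled c in both maps
    uni   = nnz((truth == c) | (pred == c));   % pixels labelled c in either map
    iou(c) = inter / uni;
end
meanIoU = mean(iou);

fprintf('per-class IoU: %s, mean IoU: %.4f\n', mat2str(iou, 4), meanIoU);
```

Running the same computation on the outputs of two backbones against the same ground truth gives a like-for-like accuracy figure to set against their inference times.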

      

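      That is all for now; I will keep updating this post when I have time. QQ: 1044625113. When adding me on QQ, please include the note "semantic segmentation discussion".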


Origin www.cnblogs.com/xiexiaokui/p/12151688.html