Lightweight Networks: Xception

Xception: Deep Learning with Depthwise Separable Convolutions is a 2017 paper from Google.

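Xception is not a model compression technique. It falls under “design strategies for CNN architectures with few parameters”. (Worth noting: the paper reports that Xception has about the same number of parameters as Inception V3, so the gain comes from using the parameters more efficiently rather than from a smaller model.)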

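Xception is an improvement on Inception v3. It is an “Extreme Inception”, hence the name Xception. The main idea is to draw on (rather than directly adopt) depthwise separable convolution to replace the convolution operations in Inception v3.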

(Why “Extreme”? Because Xception makes a stronger version of the Inception hypothesis:
we make the following hypothesis: that the mapping of cross-channels correlations and spatial correlations in the feature maps of convolutional neural networks can be entirely decoupled)

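Key contribution:
1. Draws on (rather than directly adopts) depthwise separable convolution to improve Inception V3

(PS: in other words, it changes how the convolution is done. MobileNet's key contribution is the same.)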

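Since Xception improves on Inception v3, the hypothesis (the idea) behind Inception deserves a mention.
“the fundamental hypothesis behind Inception is that cross-channel correlations and spatial correlations are sufficiently decoupled that it is preferable not to map them jointly”
Put simply: when convolving, it works better to separate the convolution across channels from the convolution across space. (There is no theoretical proof, only experimental evidence. Treat it as a theorem and accept it; most neural network papers are like this nowadays (laugh-cry face).)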

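In the figures below, Figure 1 is the Inception module and Figure 2 is the author's simplified Inception module (only the branches that start with a 1*1 convolution are kept; with the avg pool branch still attached, the next assumption would be hard to make~~~).
Starting from the simplified Inception module, the next step is to merge the three 1*1 convolutions of the first stage into a single 1*1 convolution, and let each of the three 3*3 convolutions that follow “handle” its own non-overlapping share of the output channels, as shown in Figure 3.
Finally Xception enters. It proposes an “extreme” version of an Inception module: first use a 1*1 convolution to map the cross-channel correlations, then apply a separate 3*3 spatial convolution to every output channel, as shown in Figure 4. The author notes that this is almost identical to a depthwise separable convolution.
The use of depthwise separable convolution in network design traces back to Rigid-Motion Scattering for Image Classification; exactly which year it was first proposed is not clear to me. Related work on (spatially) separable convolutions goes back to at least 2012, and AlexNet is a related example: because of memory limits it split its convolutions into two groups. For a deeper look, see Section 2 (Prior work) of the paper, which gives a detailed account with references.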
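The step from Figure 3 to Figure 4 can be read as turning one knob: the number of independent channel segments that the 3*3 convolutions operate on. Below is a minimal sketch of how the weight count changes along that knob. The function name `segmented_weights` and the channel counts are hypothetical, chosen for illustration only, and biases are ignored.

```python
def segmented_weights(c_in, c_out, k, segments):
    """Count weights in a 1x1 convolution (c_in -> c_out) followed by k x k
    convolutions applied independently to `segments` non-overlapping groups
    of the c_out channels. Biases are ignored."""
    if c_out % segments != 0:
        raise ValueError("segments must divide c_out evenly")
    per_group = c_out // segments
    pointwise = c_in * c_out                            # cross-channel mixing
    spatial = segments * k * k * per_group * per_group  # one k x k conv per group
    return pointwise + spatial


# Hypothetical layer: 768 channels in and out, 3x3 spatial kernels.
for segments in (1, 3, 768):
    print(segments, segmented_weights(768, 768, 3, segments))
```

With one segment the 3*3 stage is a regular convolution (5,898,240 weights here), with three segments it matches the Figure 3 reformulation (2,359,296), and with one segment per channel it becomes the “extreme” case of Figure 4 (596,736), where almost all the weights sit in the 1*1 convolution.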

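[Figures 1 to 4: the Inception module, the simplified Inception module, the reformulated module, and the “extreme” version (images not included)]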

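As said above, Xception draws on the depthwise separable convolution of Rigid-Motion Scattering for Image Classification rather than copying it. That is because the “extreme” Inception module differs from the depthwise separable convolution as it is usually implemented in two ways:
First: the usual depthwise separable convolution does the per-channel (depthwise) convolution first (see the MobileNet article for details) and then the 1*1 convolution, while the “extreme” Inception module does the reverse, 1*1 convolution first and then the per-channel convolution. The paper argues this difference is unimportant, because the operations are meant to be stacked.
Second: the usual depthwise separable convolution has no activation function between the two convolutions, whereas in the Inception-style module the 1*1 convolution is followed by a ReLU non-linearity. The paper tests this experimentally and reports that leaving out the intermediate activation converges faster and reaches better final accuracy, so the final Xception architecture is built from depthwise separable convolution layers without a ReLU in between.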
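The two differences (order of the operations, and the activation in between) can be sketched with plain nested lists. This is a toy illustration, not the paper's code: all function names are hypothetical, the depthwise step uses a 3*3 kernel with no padding and stride 1, and the input values are made up so that the effect of the ReLU is visible.

```python
def pointwise(x, w):
    """1x1 convolution: mixes channels at every pixel.
    x is [C_in][H][W]; w is [C_out][C_in]."""
    height, width = len(x[0]), len(x[0][0])
    return [[[sum(w_row[c] * x[c][i][j] for c in range(len(x)))
              for j in range(width)] for i in range(height)] for w_row in w]


def depthwise(x, kernels):
    """One 3x3 convolution per channel (no padding, stride 1).
    x is [C][H][W]; kernels is [C][3][3]."""
    out = []
    for chan, k in zip(x, kernels):
        height, width = len(chan), len(chan[0])
        out.append([[sum(k[a][b] * chan[i + a][j + b]
                         for a in range(3) for b in range(3))
                     for j in range(width - 2)] for i in range(height - 2)])
    return out


def relu(x):
    return [[[max(0.0, v) for v in row] for row in chan] for chan in x]


def usual_separable(x, dw_kernels, pw_weights):
    # Depthwise first, then 1x1, with no activation in between.
    return pointwise(depthwise(x, dw_kernels), pw_weights)


def extreme_inception(x, pw_weights, dw_kernels):
    # 1x1 first, then ReLU, then depthwise.
    return depthwise(relu(pointwise(x, pw_weights)), dw_kernels)


# Toy input: channel 0 is all +1, channel 1 is all -1 (3x3 each).
x = [[[1.0] * 3 for _ in range(3)], [[-1.0] * 3 for _ in range(3)]]
ones = [[[1] * 3 for _ in range(3)] for _ in range(2)]  # all-ones 3x3 kernels
identity = [[1, 0], [0, 1]]                             # 1x1 conv that changes nothing

print(usual_separable(x, ones, identity))    # [[[9.0]], [[-9.0]]]
print(extreme_inception(x, identity, ones))  # [[[9.0]], [[0.0]]]
```

With an identity 1*1 convolution the order alone makes no difference here; the outputs diverge only because the intermediate ReLU zeroes the negative channel before the spatial convolution sees it.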

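The Xception architecture is shown in the figure below. It has 36 convolutional layers in total, organized into an Entry flow, a Middle flow and an Exit flow.
The Entry flow contains 8 conv layers, the Middle flow contains 3*8 = 24, and the Exit flow contains 4, so Xception has 8 + 24 + 4 = 36 convolutional layers.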
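The layer count can be checked with a few lines of arithmetic. The per-flow breakdown in the comments follows my reading of the paper's architecture figure and should be treated as an assumption; the 1*1 convolutions on the residual shortcuts are not counted.

```python
# Convolutional layers per flow (residual-shortcut 1x1 convs not counted).
entry_flow = 2 + 3 * 2   # 2 regular convs, then 3 modules of 2 separable convs
middle_flow = 8 * 3      # 8 identical modules of 3 separable convs each
exit_flow = 2 + 2        # 1 module of 2 separable convs, then 2 more separable convs

total = entry_flow + middle_flow + exit_flow
print(entry_flow, middle_flow, exit_flow, total)  # 8 24 4 36
```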
[Figure: the Xception architecture, showing Entry flow, Middle flow and Exit flow (image not included)]

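Experiments:
On ImageNet the paper compares VGG-16, ResNet-152, Inception V3 and Xception. The results are shown in the figure below.
[Figure: ImageNet classification results for VGG-16, ResNet-152, Inception V3 and Xception (image not included)]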

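Random thoughts:
The idea behind Xception comes mainly from the depthwise convolution operation. In essence it assumes and accepts that “the fundamental hypothesis behind Inception is that cross-channel correlations and spatial correlations are sufficiently decoupled that it is preferable not to map them jointly”: when convolving, it works better to separate the convolution across channels from the convolution across space.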

It is not just Xception: MobileNet likewise builds its network around the idea of depthwise convolution, and both papers mainly discuss the image classification task.

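Since convolutional neural networks also do well in other areas of vision, could we take depthwise convolution and use it to “modify” the corresponding networks for detection, tracking, super-resolution and a whole series of other tasks, and then... you get the idea.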
