Mobilenets v2

论文：

《Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation》

和MobileNet V1相比，MobileNet V2主要的改进有两点：

1、Linear Bottlenecks。也就是去掉了小维度输出层后面的非线性激活层，目的是为了保证模型的表达能力。

2、Inverted Residual block。该结构和传统residual block中维度先缩减再扩增正好相反，因此shotcut也就变成了连接的是维度缩减后的feature map。

Linear Bottlenecks

在MobileNet V1中除了引入depthwise separable convolution代替传统的卷积，还做了一个实验是用width multiplier参数来做模型通道的缩减，相当于给模型“瘦身”，这样特征信息就能更集中在缩减后的通道中，但是如果此时加上一个非线性激活层，比如ReLU，就会有较大的信息丢失。

因此为了减少信息丢失，就有了文中的linear bottleneck，意思就是bottleneck的输出不接非线性激活层，所以是linear，而什么是bottleneck的输出？就是维度缩减那一层的输出。原文是这么说的：assuming the manifold of interest is low-dimensional we can capture this by inserting linear bottleneck layers into the convolutional blocks. Experimental evidence suggests that using linear layers is crucial as it prevents nonlinearities from destroying too much information.

To summarize, we have highlighted two properties that are indicative of the requirement that the manifold of interest should lie in a low-dimensional subspace of the higher-dimensional activation space:
1. If the manifold of interest remains non-zero volume after ReLU transformation, it corresponds to
a linear transformation.
2. ReLU is capable of preserving complete information about the input manifold, but only if the input manifold lies in a low-dimensional subspace of the input space.
第一点的意思是：对于ReLU层输出的非零值而言，ReLU层起到的就是一个线性变换的作用，这个从ReLU的曲线就能看出来。
第二点的意思是：ReLU层可以保留input manifold的信息，但是只有当input manifold是输入空间的一个低维子空间时才有效。

因此在MobileNet V2中，执行降维的卷积层后面不会接类似ReLU这样的非线性激活层，也就是linear bottleneck的含义。

Inverted residuals (残差反转)

残差和Inverted residuals区别：在residual block中是先降维再升维，在inverted residual block中是先升维再降维

MobileNetV1、MobileNetV2 和 ResNet 微结构对比：

3ec872278bbff15e9a2e80ebdf7a519aaf3cf949

MobileNets的核心分离卷积（Separable Convolution）可以在牺牲较小性能的前提下有效的减少参数量。但是它也存在局限，表现为Depthwise卷积的Kernel数取决于上一层的Depth，无法随意改变。MobileNetV2克服了这一局限。

Separable Convolution将传统的卷积运算用两步卷积运算代替：Depthwise convolution与Pointwise convolution，如下图所示

mobilenet-v1-conv

从图中可以明确的看出，由于输入图片为RGB三通道，Depthwise conv的Filter数量只能为3。Depthwise convolution的Filter数量局限于上一层的输出通道数，无法自由改变。

MobileNetV2在Depthwise convolution之前添加一层Pointwise convolution，如下图所示。

mobilenet-v2-conv

添加了这一层Pointwise convolution之后，Depthwise convolution的Filter数量取决于之前的Pointwise的通道数。而这个通道数是可以任意指定的，因此解除了3x3卷积核个数的限制。

网络结构

第6行的input应该是14*14*64，而不是28*28*64，因为第5行采用的是stride=2的卷积

表中的t表示expansion factor，也就是每个inverted residual block的第一个1*1卷积的升维比率；

c是每个inverted residual block的最后一层输出channel

n表示block的重复个数，

s表示stride

0d81eda742bf793af0c9dd9e2a523135bf8a3a2a

实验对比：

几个加速模型在ImageNet数据集上的Top1准确率以及模型大小、速度的对比

è¿éåå¾çæè¿°

关于SSD和SSDLite在关于参数量和计算量上的对比。SSDLite是将SSD网络中的3*3卷积用depthwise separable convolution代替得到的。

几个常见目标检测模型的对比。

è¿éåå¾çæè¿°

参考：

https://yinguobing.com/bottlenecks-block-in-mobilenetv2/

https://blog.csdn.net/u014380165/article/details/79200958

https://yq.aliyun.com/articles/422590

网络结构

猜你喜欢