CV study notes 3 - MobileNets paper notes

Original:  https://blog.csdn.net/Jesse_Mx/article/details/70766871

Paper address:  https://arxiv.org/abs/1704.04861


Foreword

This paper presents MobileNets, a lightweight deep neural network that Google designed for mobile phones and other embedded devices. Personally, I feel the work leans toward model compression: the core idea is a clever factorization of the convolution kernel, which effectively reduces the number of network parameters. Google did not release official code right after publication, but has since open-sourced a TensorFlow implementation (2017.6.15); see the specific address below. In addition, searching "MobileNets" on GitHub turns up a number of personal implementations, some of which provide good pretrained models. I ran one of the Caffe models and found that inference speed did not improve much; judging from online discussion, this is a limitation of the Caffe framework, and a significant speedup requires TensorFlow.

 

Summary

We present MobileNets, a class of efficient models for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. We introduce two simple global hyperparameters that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right-sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy trade-offs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications, including object detection, fine-grained classification, face attributes, and large-scale geo-localization.

 

 

Introduction and Background

This part says that with the development of deep learning, convolutional neural networks have become increasingly common. The current overall trend is to achieve higher accuracy through deeper and more complex networks, but such networks often have little advantage in model size and speed. Applications on embedded platforms such as robots and autonomous driving have limited hardware resources and badly need lightweight, low-latency network models (with acceptable accuracy), which is the main contribution of this work.

There has already been some work on building small and efficient neural networks, such as SqueezeNet, Google Inception, and Flattened networks. These approaches fall roughly into two kinds: compressing a pretrained model, and directly training a small network. MobileNets focuses primarily on optimizing latency while also taking model size into account, unlike some models that have few parameters but can still be slow.

 

MobileNets model structure

Depthwise separable convolution

The MobileNets model is based on depthwise separable convolutions, which factorize a standard convolution into a depthwise convolution and a pointwise convolution (1 × 1 convolution). The depthwise convolution applies a single kernel to each input channel, and the 1 × 1 pointwise convolution then combines the depthwise outputs across channels. Below, the paper shows that this factorization reduces the amount of computation and shrinks the model. Figure 2 illustrates how a standard convolution is factorized.

 

 

Intuitively, this factorization is indeed equivalent in effect. For example, suppose the input image is 11 × 11 × 3 and the standard convolution is 3 × 3 × 3 × 16 (assume a stride of 2 and padding of 1); then the output is 6 × 6 × 16. Now take the same input image and first apply a depthwise convolution of size 3 × 3 × 1 × 3 (the input has 3 channels, each with its own corresponding kernel, computed like a for loop over channels) to get an intermediate 6 × 6 × 3 output, then apply a pointwise convolution of size 1 × 1 × 3 × 16 to obtain the same 6 × 6 × 16 output. The classic animated convolution diagram can also help with parsing this; it is placed here first.
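To make the shape bookkeeping above concrete, here is a minimal pure-Python sketch (illustrative code, not from the paper) that runs the 11 × 11 × 3 example through a 3 × 3 depthwise convolution with stride 2 and padding 1, then a 1 × 1 × 3 × 16 pointwise convolution, and checks that the output shape is 6 × 6 × 16:

```python
def conv_out_size(n, k, s, p):
    """Spatial output size of a convolution."""
    return (n + 2 * p - k) // s + 1

def depthwise(inp, kernels, stride, pad):
    """Depthwise conv: one kernel per input channel, channels kept separate.
    inp: [C][H][W], kernels: [C][k][k] -> output [C][H'][W']."""
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    k = len(kernels[0])
    Ho, Wo = conv_out_size(H, k, stride, pad), conv_out_size(W, k, stride, pad)
    out = [[[0.0] * Wo for _ in range(Ho)] for _ in range(C)]
    for c in range(C):                      # the "for loop over channels"
        for i in range(Ho):
            for j in range(Wo):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        y = i * stride + di - pad
                        x = j * stride + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            acc += inp[c][y][x] * kernels[c][di][dj]
                out[c][i][j] = acc
    return out

def pointwise(inp, weights):
    """1x1 conv: mixes channels at each spatial position.
    inp: [M][H][W], weights: [N][M] -> output [N][H][W]."""
    M, H, W = len(inp), len(inp[0]), len(inp[0][0])
    N = len(weights)
    return [[[sum(weights[n][m] * inp[m][i][j] for m in range(M))
              for j in range(W)] for i in range(H)] for n in range(N)]

# 11x11x3 all-ones input, 3x3 depthwise kernels, then 1x1x3x16 pointwise.
inp = [[[1.0] * 11 for _ in range(11)] for _ in range(3)]
dw = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
mid = depthwise(inp, dw, stride=2, pad=1)   # intermediate: 3 x 6 x 6
pw = [[1.0] * 3 for _ in range(16)]
out = pointwise(mid, pw)                    # final: 16 x 6 x 6
print(len(mid), len(mid[0]), len(mid[0][0]))   # 3 6 6
print(len(out), len(out[0]), len(out[0][0]))   # 16 6 6
```

The spatial size follows (11 + 2·1 − 3) / 2 + 1 = 6, matching the standard-convolution output, while the channel count goes 3 → 3 (depthwise) → 16 (pointwise).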

 

 

Next, the authors calculate how much this factorization reduces the amount of computation. First, note a clerical error in the paper: part of the subscript notation under the figure is wrong and should be corrected to D_G × D_G × N.

 

First consider the standard convolution. Suppose the input F has dimensions D_F × D_F × M, and the standard convolution kernel K produces an output G of dimensions D_G × D_G × N. The number of kernel parameters is D_K × D_K × M × N. Measured in multiply-adds, the computational cost is D_K × D_K × M × N × D_F × D_F.

Now factorize the kernel. By the same formula, the computational cost of the depthwise convolution is D_K × D_K × M × D_F × D_F, and the cost of the pointwise convolution is M × N × D_F × D_F.

Comparing the two gives:

(D_K × D_K × M × D_F × D_F + M × N × D_F × D_F) / (D_K × D_K × M × N × D_F × D_F) = 1/N + 1/D_K²

 

MobileNets uses a large number of 3 × 3 kernels, which greatly reduces computation (to between 1/8 and 1/9 of the original) with only a small drop in accuracy, a real advantage over other methods.
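The cost formulas above can be checked numerically. The sketch below (layer sizes are made up but typical) computes both costs and confirms the ratio equals 1/N + 1/D_K², landing between 1/8 and 1/9 for 3 × 3 kernels:

```python
def standard_cost(Dk, M, N, Df):
    """Multiply-adds of a standard DkxDk convolution, M -> N channels."""
    return Dk * Dk * M * N * Df * Df

def separable_cost(Dk, M, N, Df):
    """Multiply-adds of the depthwise + pointwise factorization."""
    return Dk * Dk * M * Df * Df + M * N * Df * Df

# Hypothetical mid-network layer: 3x3 kernel, 512 -> 512 channels, 14x14 maps.
Dk, M, N, Df = 3, 512, 512, 14
ratio = separable_cost(Dk, M, N, Df) / standard_cost(Dk, M, N, Df)
print(ratio)   # 1/N + 1/Dk^2 ~= 0.113, i.e. between 1/9 and 1/8
```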

 

Model structure and training

The MobileNets structure is built on the depthwise separable convolutions described above (only the first layer is a standard convolution). This network lets us explore the topology and find a good fitting network. Its specific architecture is given in Table 1. Except for the final fully connected layer, every layer is followed by batchnorm and ReLU, and the final output is fed into a softmax for classification. Figure 3 contrasts the structure of a standard convolution with the factorized convolution; both include BN and ReLU layers. By the authors' counting, MobileNets has 28 layers in total (1 + 2 × 13 + 1 = 28).

 

 

MobileNet spends 95% of its computation time in the 1 × 1 convolutions, which hold 75% of the parameters. The authors train with the TensorFlow framework; since the model is not very prone to overfitting, they use relatively little data augmentation and regularization.

 

Width multiplier

 

This introduces the model's first hyperparameter, the width multiplier α. To build smaller and less computationally expensive networks, the authors introduce α to scale the input and output channel counts, reducing the number of feature maps and making the network thinner. Its range is (0, 1], so the input and output channel counts become αM and αN. In effect, this is equivalent to sparsifying the pointwise (channel-mixing) matrix. Under α, the computational cost of a depthwise separable layer in MobileNets becomes:

D_K × D_K × αM × D_F × D_F + αM × αN × D_F × D_F

Applying the width multiplier further reduces computation, by a factor of roughly α².
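A quick numerical check of the α formula (with hypothetical layer sizes): since the pointwise term dominates the cost, shrinking the channels by α cuts the total by close to α².

```python
def layer_cost(Dk, M, N, Df, alpha=1.0):
    """Multiply-adds of one depthwise separable layer under width multiplier alpha."""
    dw = Dk * Dk * (alpha * M) * Df * Df       # depthwise term
    pw = (alpha * M) * (alpha * N) * Df * Df   # pointwise term
    return dw + pw

Dk, M, N, Df = 3, 512, 512, 14                 # hypothetical sizes
full = layer_cost(Dk, M, N, Df)
half = layer_cost(Dk, M, N, Df, alpha=0.5)
print(half / full)                             # ~0.254, close to 0.5**2 = 0.25
```

The gap to an exact α² comes from the depthwise term, which scales only linearly in α but is a small fraction of the total.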

 

Resolution multiplier

 

The second hyperparameter is the resolution multiplier ρ, which scales the resolution of the input data layer; for example, a 224 × 224 input feature map can be reduced to 192 × 192. This likewise reduces computation. With α and ρ acting together, the computational cost of a MobileNets layer becomes:

D_K × D_K × αM × ρD_F × ρD_F + αM × αN × ρD_F × ρD_F

Note that the resolution multiplier only affects the amount of computation; it does not change the number of parameters. ρ is set implicitly: values of {1, 6/7, 5/7, 4/7} correspond to input resolutions of {224, 192, 160, 128}. Like α, the savings from ρ are roughly a factor of ρ². Table 3 shows the effect of the two hyperparameters on reducing network computation.
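For ρ the scaling is in fact exact: both terms of the formula carry (ρD_F)², so the resolution multiplier reduces the cost by exactly ρ², independent of α. A small check with made-up sizes:

```python
def layer_cost(Dk, M, N, Df, alpha=1.0, rho=1.0):
    """Multiply-adds of one layer under width multiplier alpha and resolution multiplier rho."""
    sp = (rho * Df) ** 2   # spatial positions after resolution scaling
    return Dk * Dk * (alpha * M) * sp + (alpha * M) * (alpha * N) * sp

Dk, M, N, Df = 3, 512, 512, 14            # hypothetical sizes
base = layer_cost(Dk, M, N, Df)
scaled = layer_cost(Dk, M, N, Df, rho=6/7)
print(scaled / base)                      # (6/7)**2 ~= 0.735, exactly rho^2
```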

 

Experimental analysis

Model selection

In Table 4, for the same MobileNets architecture, using depthwise separable convolutions instead of full convolutions drops accuracy by only 1% while using just 1/7 of the parameters.

 

In Table 5, a deep and thin network is 3% more accurate than a shallow and fat one.

 

Model shrinking hyperparameters

In Table 6, as the width multiplier α decreases, model accuracy drops as the model gets thinner.

 

In Table 7, as the resolution multiplier ρ decreases, model accuracy drops as the input resolution falls.

 

Table 8 compares MobileNets against VGG and GoogleNet on the ImageNet dataset.

 

Object detection

The experiments here mainly use MobileNets as the base network for the object detectors Faster R-CNN and SSD, compared against other models on the COCO dataset. (Why not also compare on PASCAL VOC, which would be more intuitive? No frame rate is given either, so the speed remains unclear.)

 

 


Origin www.cnblogs.com/adelaide/p/11350452.html