The paper "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" proposes MobileNets, which are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. It introduces two simple global hyperparameters that efficiently trade off computational cost against accuracy.
The network has 28 layers; 1.0 MobileNet-224 achieves accuracy comparable to GoogleNet and VGG16 with only 4.2M parameters.
MobileNet Architecture
Computational cost
Cost of a standard convolution
For a D_F × D_F × M input feature map, a standard convolution with N kernels of size D_K × D_K × M produces a D_F × D_F × N output feature map.
To keep the output spatial size equal to the input's, appropriate padding is needed; with stride 1, each kernel slides over the input feature map D_F × D_F times, so the cost of a standard convolution with a single kernel is D_K · D_K · M · D_F · D_F.
The cost for all N kernels is D_K · D_K · M · N · D_F · D_F.
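The cost formula above can be sketched as a small helper. The function name and the 14 × 14 × 512 example layer below are illustrative choices, not taken from the paper's code.

```python
def standard_conv_cost(DF, M, DK, N):
    """Mult-adds of a standard convolution over a DF x DF x M input
    with N kernels of size DK x DK x M (stride 1, 'same' padding)."""
    return DK * DK * M * N * DF * DF

# Hypothetical example layer: 14x14x512 input, 512 3x3 kernels.
print(standard_conv_cost(14, 512, 3, 512))  # 462422016
```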
Depthwise separable convolution
For a D_F × D_F × M input feature map, first perform a spatial (depthwise) convolution over the M channels with M kernels of size D_K × D_K, one per channel, using stride 1 and appropriate padding, yielding an intermediate D_F × D_F × M feature map; then perform a standard (pointwise) convolution on the intermediate result with N kernels of size 1 × 1 × M, producing the D_F × D_F × N output feature map.
Cost of the depthwise convolution: D_K · D_K · M · D_F · D_F
Cost of the pointwise convolution: M · N · D_F · D_F
Total cost: D_K · D_K · M · D_F · D_F + M · N · D_F · D_F
Cost ratio relative to standard convolution: (D_K · D_K · M · D_F · D_F + M · N · D_F · D_F) / (D_K · D_K · M · N · D_F · D_F) = 1/N + 1/D_K²
MobileNets use 3 × 3 depthwise separable convolutions, whose computational cost is 1/8 to 1/9 that of standard convolutions, with only a small loss in accuracy.
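The 1/N + 1/D_K² ratio can be checked numerically. The function names and the 14 × 14 × 512 example layer are hypothetical:

```python
def standard_conv_cost(DF, M, DK, N):
    # standard convolution: DK*DK*M mult-adds per output position, N filters
    return DK * DK * M * N * DF * DF

def depthwise_separable_cost(DF, M, DK, N):
    depthwise = DK * DK * M * DF * DF   # one DKxDK filter per channel
    pointwise = M * N * DF * DF         # N 1x1xM filters
    return depthwise + pointwise

ratio = depthwise_separable_cost(14, 512, 3, 512) / standard_conv_cost(14, 512, 3, 512)
print(ratio)  # matches 1/N + 1/DK^2 = 1/512 + 1/9, roughly 1/9
```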
Two hyperparameters
Two hyperparameters, the width multiplier α and the resolution multiplier ρ, balance the model's computational cost against its accuracy.
Width multiplier α
For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN.
The cost becomes: D_K · D_K · αM · D_F · D_F + αM · αN · D_F · D_F
Resolution multiplier ρ
In practice, ρ is set implicitly by choosing the input resolution.
The cost becomes: D_K · D_K · αM · ρD_F · ρD_F + αM · αN · ρD_F · ρD_F
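The effect of both multipliers can be sketched in one formula; the function name and the example layer are illustrative. ρ scales the cost by exactly ρ², while α scales it by roughly α² (exactly α² in the dominant pointwise term):

```python
def cost_with_multipliers(DF, M, DK, N, alpha=1.0, rho=1.0):
    # width multiplier alpha thins channels; resolution multiplier rho
    # shrinks the feature map
    aM, aN, rDF = alpha * M, alpha * N, rho * DF
    return DK * DK * aM * rDF * rDF + aM * aN * rDF * rDF

base = cost_with_multipliers(14, 512, 3, 512)
print(cost_with_multipliers(14, 512, 3, 512, rho=0.5) / base)    # exactly 0.25
print(cost_with_multipliers(14, 512, 3, 512, alpha=0.5) / base)  # slightly above 0.25
```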
Architecture
Standard convolution vs. depthwise separable convolution
MobileNets architecture
Implementation
From keras_applications/mobilenet.py
Depthwise separable convolution block
def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha,
                          depth_multiplier=1, strides=(1, 1), block_id=1):
    channel_axis = 1 if backend.image_data_format() == 'channels_first' else -1
    # The width multiplier alpha thins the pointwise (output) channels.
    pointwise_conv_filters = int(pointwise_conv_filters * alpha)

    # Strided blocks pad explicitly and use 'valid' padding so the
    # output resolution is exactly halved.
    if strides == (1, 1):
        x = inputs
    else:
        x = layers.ZeroPadding2D(((0, 1), (0, 1)),
                                 name='conv_pad_%d' % block_id)(inputs)

    # Depthwise 3x3 convolution: one spatial filter per input channel.
    x = layers.DepthwiseConv2D((3, 3),
                               padding='same' if strides == (1, 1) else 'valid',
                               depth_multiplier=depth_multiplier,
                               strides=strides,
                               use_bias=False,
                               name='conv_dw_%d' % block_id)(x)
    x = layers.BatchNormalization(
        axis=channel_axis, name='conv_dw_%d_bn' % block_id)(x)
    x = layers.ReLU(6., name='conv_dw_%d_relu' % block_id)(x)

    # Pointwise 1x1 convolution: mixes channels and sets the output width.
    x = layers.Conv2D(pointwise_conv_filters, (1, 1),
                      padding='same',
                      use_bias=False,
                      strides=(1, 1),
                      name='conv_pw_%d' % block_id)(x)
    x = layers.BatchNormalization(axis=channel_axis,
                                  name='conv_pw_%d_bn' % block_id)(x)
    return layers.ReLU(6., name='conv_pw_%d_relu' % block_id)(x)
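Counting weights makes the block's savings concrete. The helper names below are illustrative; BatchNormalization parameters are ignored, and biases are omitted to match use_bias=False above:

```python
def separable_block_params(M, N, DK=3):
    # depthwise DKxDK kernels (one per input channel) + N pointwise 1x1xM kernels
    return DK * DK * M + M * N

def standard_conv_params(M, N, DK=3):
    # a standard convolution needs N full DKxDKxM kernels
    return DK * DK * M * N

# Example: a 512 -> 512 layer, as in blocks 7-11 of the model below.
print(separable_block_params(512, 512))  # 266752
print(standard_conv_params(512, 512))    # 2359296, about 8.8x more
```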
MobileNets implementation
x = _conv_block(img_input, 32, alpha, strides=(2, 2))
x = _depthwise_conv_block(x, 64, alpha, depth_multiplier, block_id=1)

x = _depthwise_conv_block(x, 128, alpha, depth_multiplier,
                          strides=(2, 2), block_id=2)
x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, block_id=3)

x = _depthwise_conv_block(x, 256, alpha, depth_multiplier,
                          strides=(2, 2), block_id=4)
x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, block_id=5)

x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                          strides=(2, 2), block_id=6)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=7)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=8)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=9)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=10)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=11)

x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier,
                          strides=(2, 2), block_id=12)
x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13)

shape = (int(1024 * alpha), 1, 1)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Reshape(shape, name='reshape_1')(x)
x = layers.Dropout(dropout, name='dropout')(x)
x = layers.Conv2D(classes, (1, 1),
                  padding='same',
                  name='conv_preds')(x)
x = layers.Activation('softmax', name='act_softmax')(x)
x = layers.Reshape((classes,), name='reshape_2')(x)
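As a sanity check on the whole architecture, the mult-adds of 1.0 MobileNet-224 can be tallied from the block list above. The layer tuples are read off the code (stride-2 blocks halve the resolution, starting from a 224 × 224 input), and the total lands near the 569 million mult-adds reported in the paper:

```python
# (input channels M, output channels N, output resolution DF) per block,
# for alpha = 1, rho = 1, input 224x224.
blocks = [
    (32, 64, 112),
    (64, 128, 56), (128, 128, 56),
    (128, 256, 28), (256, 256, 28),
    (256, 512, 14),
    (512, 512, 14), (512, 512, 14), (512, 512, 14),
    (512, 512, 14), (512, 512, 14),
    (512, 1024, 7), (1024, 1024, 7),
]

total = 3 * 3 * 3 * 32 * 112 * 112       # first standard 3x3 conv, stride 2
for M, N, DF in blocks:
    total += 3 * 3 * M * DF * DF         # depthwise 3x3
    total += M * N * DF * DF             # pointwise 1x1
total += 1024 * 1000                     # final 1x1 classifier conv, 1000 classes

print(total)  # about 569 million
```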