The paper "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" proposes MobileNets, which are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. It introduces two simple global hyperparameters that efficiently trade off computational cost against accuracy.
The network has 28 layers; 1.0 MobileNet-224 achieves accuracy comparable to GoogleNet and VGG16 with only 4.2M parameters.
MobileNet Architecture
Computational cost
Cost of a standard convolution
For a D_F × D_F × M input feature map, a standard convolution with N kernels of size D_K × D_K × M produces a D_F × D_F × N output feature map.
To keep the output spatial size equal to the input's, appropriate padding is needed; with stride 1, each kernel slides over the input feature map D_F × D_F times, so the cost of a standard convolution with a single kernel is D_K · D_K · M · D_F · D_F.
The cost for all N kernels is D_K · D_K · M · N · D_F · D_F.
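The cost formula above can be sketched as a small helper. The function name and the 14 × 14 × 512 example layer below are illustrative choices, not taken from the paper's code.

```python
def standard_conv_cost(DF, M, DK, N):
    """Mult-adds of a standard convolution over a DF x DF x M input
    with N kernels of size DK x DK x M (stride 1, 'same' padding)."""
    return DK * DK * M * N * DF * DF

# Hypothetical example layer: 14x14x512 input, 512 3x3 kernels.
print(standard_conv_cost(14, 512, 3, 512))  # 462422016
```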
Depthwise separable convolution
For a D_F × D_F × M input feature map, first perform a spatial (depthwise) convolution over the M channels with M kernels of size D_K × D_K, one per channel, using stride 1 and appropriate padding, yielding an intermediate D_F × D_F × M feature map; then perform a standard (pointwise) convolution on the intermediate result with N kernels of size 1 × 1 × M, producing the D_F × D_F × N output feature map.
Cost of the depthwise convolution: D_K · D_K · M · D_F · D_F
Cost of the pointwise convolution: M · N · D_F · D_F
Total cost: D_K · D_K · M · D_F · D_F + M · N · D_F · D_F
Cost ratio relative to standard convolution: (D_K · D_K · M · D_F · D_F + M · N · D_F · D_F) / (D_K · D_K · M · N · D_F · D_F) = 1/N + 1/D_K²
MobileNets use 3 × 3 depthwise separable convolutions, whose computational cost is 1/8 to 1/9 that of standard convolutions, with only a small loss in accuracy.
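The 1/N + 1/D_K² ratio can be checked numerically. The function names and the 14 × 14 × 512 example layer are hypothetical:

```python
def standard_conv_cost(DF, M, DK, N):
    # standard convolution: DK*DK*M mult-adds per output position, N filters
    return DK * DK * M * N * DF * DF

def depthwise_separable_cost(DF, M, DK, N):
    depthwise = DK * DK * M * DF * DF   # one DKxDK filter per channel
    pointwise = M * N * DF * DF         # N 1x1xM filters
    return depthwise + pointwise

ratio = depthwise_separable_cost(14, 512, 3, 512) / standard_conv_cost(14, 512, 3, 512)
print(ratio)  # matches 1/N + 1/DK^2 = 1/512 + 1/9, roughly 1/9
```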
Two hyperparameters
Two hyperparameters, the width multiplier α and the resolution multiplier ρ, balance the model's computational cost against its accuracy.
Width multiplier α
For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN.
The cost becomes: D_K · D_K · αM · D_F · D_F + αM · αN · D_F · D_F
Resolution multiplier ρ
In practice, ρ is set implicitly by choosing the input resolution.
The cost becomes: D_K · D_K · αM · ρD_F · ρD_F + αM · αN · ρD_F · ρD_F
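The effect of both multipliers can be sketched in one formula; the function name and the example layer are illustrative. ρ scales the cost by exactly ρ², while α scales it by roughly α² (exactly α² in the dominant pointwise term):

```python
def cost_with_multipliers(DF, M, DK, N, alpha=1.0, rho=1.0):
    # width multiplier alpha thins channels; resolution multiplier rho
    # shrinks the feature map
    aM, aN, rDF = alpha * M, alpha * N, rho * DF
    return DK * DK * aM * rDF * rDF + aM * aN * rDF * rDF

base = cost_with_multipliers(14, 512, 3, 512)
print(cost_with_multipliers(14, 512, 3, 512, rho=0.5) / base)    # exactly 0.25
print(cost_with_multipliers(14, 512, 3, 512, alpha=0.5) / base)  # slightly above 0.25
```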
Architecture
Standard convolution vs. depthwise separable convolution
MobileNets architecture
Implementation
From keras_applications/mobilenet.py
Depthwise separable convolution block
def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha,
                          depth_multiplier=1, strides=(1, 1), block_id=1):
    channel_axis = 1 if backend.image_data_format() == 'channels_first' else -1
    # The width multiplier alpha thins the pointwise (output) channels.
    pointwise_conv_filters = int(pointwise_conv_filters * alpha)

    # Strided blocks pad explicitly and use 'valid' padding so the
    # output resolution is exactly halved.
    if strides == (1, 1):
        x = inputs
    else:
        x = layers.ZeroPadding2D(((0, 1), (0, 1)),
                                 name='conv_pad_%d' % block_id)(inputs)

    # Depthwise 3x3 convolution: one spatial filter per input channel.
    x = layers.DepthwiseConv2D((3, 3),
                               padding='same' if strides == (1, 1) else 'valid',
                               depth_multiplier=depth_multiplier,
                               strides=strides,
                               use_bias=False,
                               name='conv_dw_%d' % block_id)(x)
    x = layers.BatchNormalization(
        axis=channel_axis, name='conv_dw_%d_bn' % block_id)(x)
    x = layers.ReLU(6., name='conv_dw_%d_relu' % block_id)(x)

    # Pointwise 1x1 convolution: mixes channels and sets the output width.
    x = layers.Conv2D(pointwise_conv_filters, (1, 1),
                      padding='same',
                      use_bias=False,
                      strides=(1, 1),
                      name='conv_pw_%d' % block_id)(x)
    x = layers.BatchNormalization(axis=channel_axis,
                                  name='conv_pw_%d_bn' % block_id)(x)
    return layers.ReLU(6., name='conv_pw_%d_relu' % block_id)(x)
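Counting weights makes the block's savings concrete. The helper names below are illustrative; BatchNormalization parameters are ignored, and biases are omitted to match use_bias=False above:

```python
def separable_block_params(M, N, DK=3):
    # depthwise DKxDK kernels (one per input channel) + N pointwise 1x1xM kernels
    return DK * DK * M + M * N

def standard_conv_params(M, N, DK=3):
    # a standard convolution needs N full DKxDKxM kernels
    return DK * DK * M * N

# Example: a 512 -> 512 layer, as in blocks 7-11 of the model below.
print(separable_block_params(512, 512))  # 266752
print(standard_conv_params(512, 512))    # 2359296, about 8.8x more
```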
MobileNets implementation
x = _conv_block(img_input, 32, alpha, strides=(2, 2))
x = _depthwise_conv_block(x, 64, alpha, depth_multiplier, block_id=1)

x = _depthwise_conv_block(x, 128, alpha, depth_multiplier,
                          strides=(2, 2), block_id=2)
x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, block_id=3)

x = _depthwise_conv_block(x, 256, alpha, depth_multiplier,
                          strides=(2, 2), block_id=4)
x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, block_id=5)

x = _depthwise_conv_block(x, 512, alpha, depth_multiplier,
                          strides=(2, 2), block_id=6)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=7)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=8)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=9)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=10)
x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=11)

x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier,
                          strides=(2, 2), block_id=12)
x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13)

shape = (int(1024 * alpha), 1, 1)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Reshape(shape, name='reshape_1')(x)
x = layers.Dropout(dropout, name='dropout')(x)
x = layers.Conv2D(classes, (1, 1),
                  padding='same',
                  name='conv_preds')(x)
x = layers.Activation('softmax', name='act_softmax')(x)
x = layers.Reshape((classes,), name='reshape_2')(x)
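As a sanity check on the whole architecture, the mult-adds of 1.0 MobileNet-224 can be tallied from the block list above. The layer tuples are read off the code (stride-2 blocks halve the resolution, starting from a 224 × 224 input), and the total lands near the 569 million mult-adds reported in the paper:

```python
# (input channels M, output channels N, output resolution DF) per block,
# for alpha = 1, rho = 1, input 224x224.
blocks = [
    (32, 64, 112),
    (64, 128, 56), (128, 128, 56),
    (128, 256, 28), (256, 256, 28),
    (256, 512, 14),
    (512, 512, 14), (512, 512, 14), (512, 512, 14),
    (512, 512, 14), (512, 512, 14),
    (512, 1024, 7), (1024, 1024, 7),
]

total = 3 * 3 * 3 * 32 * 112 * 112       # first standard 3x3 conv, stride 2
for M, N, DF in blocks:
    total += 3 * 3 * M * DF * DF         # depthwise 3x3
    total += M * N * DF * DF             # pointwise 1x1
total += 1024 * 1000                     # final 1x1 classifier conv, 1000 classes

print(total)  # about 569 million
```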