《Deep Learning with Depthwise Separable Convolutions》提出了Xception网络架构,它和Inception V3具有差不多的参数,由于模型参数的利用效率更高,使得它比Inception V3表现更好。
灵感来源
VGG
借鉴了VGG的模块堆叠思想
Inception
Inception系列网络第一次阐明了将卷积在通道和空间上进行分解的优势。
Depthwise separable convolutions
这是Xception的根基。高效的移动端模型MobileNets使用了depthwise separable convolutions,使得模型尺寸和计算代价都大大降低。这是作者打算用depthwise separable convolutions替代Inception模块的重要动机。
Residual connections
残差连接可以加速网络训练,提升网络性能。
Xception架构
作者做出如下假设:在卷积网络的特征图中,跨通道相关性和空间相关性的映射可以完全分离。
细节理解
Xception架构其实就是带有残差连接的深度可分离卷积层的线性堆叠。
depthwise separable convolution
depthwise separable convolution由两个过程组成:channel-wise spatial convolution和1x1 convolution。
tensorflow实现详解见:https://blog.csdn.net/mao_xiao_feng/article/details/78002811
channel-wise spatial convolution
滤波器分别对每个通道的特征图进行空间卷积。假设有m个滤波器,特征图有n个,则执行channel-wise spatial convolution后得到的mxn个新的特征图。
tensorflow实现详解见:https://blog.csdn.net/mao_xiao_feng/article/details/78003476
1x1 convolution
然后对mxn个新的特征图进行1x1 convolution。
Depthwise separable convolution与Inception模块的区别
- Depthwise separable convolution先进行逐通道的空间卷积,再进行1x1卷积。而Inception先执行1x1卷积。
- Inception模块的每个卷积后面都跟一个ReLU。separable convolution的实现是不包含非线性激活函数的。
用Keras实现Xception
来自:keras_application/xception.py
# Determine proper input shape
input_shape = _obtain_input_shape(input_shape,
default_size=299,
min_size=71,
data_format=backend.image_data_format(),
require_flatten=False,
weights=weights)
if input_tensor is None:
img_input = layers.Input(shape=input_shape)
else:
if not backend.is_keras_tensor(input_tensor):
img_input = layers.Input(tensor=input_tensor, shape=input_shape)
else:
img_input = input_tensor
x = layers.Conv2D(32, (3, 3),
strides=(2, 2),
use_bias=False,
name='block1_conv1')(img_input)
x = layers.BatchNormalization(name='block1_conv1_bn')(x)
x = layers.Activation('relu', name='block1_conv1_act')(x)
x = layers.Conv2D(64, (3, 3), use_bias=False, name='block1_conv2')(x)
x = layers.BatchNormalization(name='block1_conv2_bn')(x)
x = layers.Activation('relu', name='block1_conv2_act')(x)
residual = layers.Conv2D(128, (1, 1),
strides=(2, 2),
padding='same',
use_bias=False)(x)
residual = layers.BatchNormalization()(residual)
x = layers.SeparableConv2D(128, (3, 3),
padding='same',
use_bias=False,
name='block2_sepconv1')(x)
x = layers.BatchNormalization(name='block2_sepconv1_bn')(x)
x = layers.Activation('relu', name='block2_sepconv2_act')(x)
x = layers.SeparableConv2D(128, (3, 3),
padding='same',
use_bias=False,
name='block2_sepconv2')(x)
x = layers.BatchNormalization(name='block2_sepconv2_bn')(x)
x = layers.MaxPooling2D((3, 3),
strides=(2, 2),
padding='same',
name='block2_pool')(x)
x = layers.add([x, residual])
residual = layers.Conv2D(256, (1, 1), strides=(2, 2),
padding='same', use_bias=False)(x)
residual = layers.BatchNormalization()(residual)
x = layers.Activation('relu', name='block3_sepconv1_act')(x)
x = layers.SeparableConv2D(256, (3, 3),
padding='same',
use_bias=False,
name='block3_sepconv1')(x)
x = layers.BatchNormalization(name='block3_sepconv1_bn')(x)
x = layers.Activation('relu', name='block3_sepconv2_act')(x)
x = layers.SeparableConv2D(256, (3, 3),
padding='same',
use_bias=False,
name='block3_sepconv2')(x)
x = layers.BatchNormalization(name='block3_sepconv2_bn')(x)
x = layers.MaxPooling2D((3, 3), strides=(2, 2),
padding='same',
name='block3_pool')(x)
x = layers.add([x, residual])
residual = layers.Conv2D(728, (1, 1),
strides=(2, 2),
padding='same',
use_bias=False)(x)
residual = layers.BatchNormalization()(residual)
x = layers.Activation('relu', name='block4_sepconv1_act')(x)
x = layers.SeparableConv2D(728, (3, 3),
padding='same',
use_bias=False,
name='block4_sepconv1')(x)
x = layers.BatchNormalization(name='block4_sepconv1_bn')(x)
x = layers.Activation('relu', name='block4_sepconv2_act')(x)
x = layers.SeparableConv2D(728, (3, 3),
padding='same',
use_bias=False,
name='block4_sepconv2')(x)
x = layers.BatchNormalization(name='block4_sepconv2_bn')(x)
x = layers.MaxPooling2D((3, 3), strides=(2, 2),
padding='same',
name='block4_pool')(x)
x = layers.add([x, residual])
for i in range(8):
residual = x
prefix = 'block' + str(i + 5)
x = layers.Activation('relu', name=prefix + '_sepconv1_act')(x)
x = layers.SeparableConv2D(728, (3, 3),
padding='same',
use_bias=False,
name=prefix + '_sepconv1')(x)
x = layers.BatchNormalization(name=prefix + '_sepconv1_bn')(x)
x = layers.Activation('relu', name=prefix + '_sepconv2_act')(x)
x = layers.SeparableConv2D(728, (3, 3),
padding='same',
use_bias=False,
name=prefix + '_sepconv2')(x)
x = layers.BatchNormalization(name=prefix + '_sepconv2_bn')(x)
x = layers.Activation('relu', name=prefix + '_sepconv3_act')(x)
x = layers.SeparableConv2D(728, (3, 3),
padding='same',
use_bias=False,
name=prefix + '_sepconv3')(x)
x = layers.BatchNormalization(name=prefix + '_sepconv3_bn')(x)
x = layers.add([x, residual])
residual = layers.Conv2D(1024, (1, 1), strides=(2, 2),
padding='same', use_bias=False)(x)
residual = layers.BatchNormalization()(residual)
x = layers.Activation('relu', name='block13_sepconv1_act')(x)
x = layers.SeparableConv2D(728, (3, 3),
padding='same',
use_bias=False,
name='block13_sepconv1')(x)
x = layers.BatchNormalization(name='block13_sepconv1_bn')(x)
x = layers.Activation('relu', name='block13_sepconv2_act')(x)
x = layers.SeparableConv2D(1024, (3, 3),
padding='same',
use_bias=False,
name='block13_sepconv2')(x)
x = layers.BatchNormalization(name='block13_sepconv2_bn')(x)
x = layers.MaxPooling2D((3, 3),
strides=(2, 2),
padding='same',
name='block13_pool')(x)
x = layers.add([x, residual])
x = layers.SeparableConv2D(1536, (3, 3),
padding='same',
use_bias=False,
name='block14_sepconv1')(x)
x = layers.BatchNormalization(name='block14_sepconv1_bn')(x)
x = layers.Activation('relu', name='block14_sepconv1_act')(x)
x = layers.SeparableConv2D(2048, (3, 3),
padding='same',
use_bias=False,
name='block14_sepconv2')(x)
x = layers.BatchNormalization(name='block14_sepconv2_bn')(x)
x = layers.Activation('relu', name='block14_sepconv2_act')(x)
if include_top:
x = layers.GlobalAveragePooling2D(name='avg_pool')(x)
x = layers.Dense(classes, activation='softmax', name='predictions')(x)
else:
if pooling == 'avg':
x = layers.GlobalAveragePooling2D()(x)
elif pooling == 'max':
x = layers.GlobalMaxPooling2D()(x)