【图像分类】【深度学习】【轻量级网络】【Pytorch版本】MobileNets_V3模型算法详解

文章目录

【图像分类】【深度学习】【轻量级网络】【Pytorch版本】MobileNets_V3模型算法详解
前言
MobleNet_V3讲解
MobleNet_V3 Pytorch代码
完整代码
总结

前言

MobileNets_V3是由谷歌公司的Howard, Andrew等人在《Searching for MobileNetV3【ICCV-2019】》【论文地址】一文中设计出的模型，它是使用平台感知的网络架构搜索(network architecture search,NAS)通过优化每个网络块来搜索全局网络结构；使用NetAdapt算法搜索每个网络层的过滤器数量，对对平台感知的NAS的进行额外补充。
论文的目标是开发最佳的移动计算机视觉架构，以优化移动设备上的精确延迟交换，即如何进行网络结构的搜索，但是本文暂时不做深入研究，想深入了解的同学可以自行对应阅读论文或者查看其他参考资料，后续博主会补上相关内容。本篇博文主要内容是介绍论文的设计成果——MobileNets_V3。

MobleNet_V3讲解

移动端模型通常建立在更多更加高效的构建网络块上：

MobleNet_V1【参考】引入了将空间滤波与特征生成机制分离的深度可分离卷积(Depthwise separable convolutions)高效替换了传统卷积层。
MobleNet_V2【参考】新引入了线性瓶颈层(Linear Bottlenecks)和反残差结构(inverted-residual)来构造更加高效的网络块结构。

为了开发最佳的移动计算机视觉体系结构，本文开始探索自动搜索算法和网络设计如何协同工作，以利用互补的方法改善整体技术水平，以优化移动设备上的精度和延迟，最终提出了MobleNet_V2体系架构。

SE模块(Squeeze Excitation)

对所通道输出的特征图进行加权: SE模块显式地建立特征通道之间的相互依赖关系，通过学习能够计算出每个通道的重要程度，然后依照重要程度对各个通道上的特征进行加权，从而突出重要特征，抑制不重要的特征。
SE模块的示意图如下图所示：

压缩(squeeze)： 由于卷积只是在局部空间内进行操作，很难获得全局的信息发现通道之间的关系特征，因此采用全局平局池化将每个通道上的空间特征编码压缩为一个全局特征完成特征信息的进行融合。
激励(excitation)： 接收每个通道的全局特征后，采用俩个全连接层预测每个通道的重要性(激励)。为了降低计算量，第一个全连接层带有缩放超参数起到减少通道、降低维度的作用；第二个全连接层则恢复原始维度，以保证通道的重要性与通道的特征图数量完全匹配。
加权(scale)： 计算出通道的重要性后，下一步对通道的原始特征图进行加权操作，各通道权重分别和对应通道的原始特征图相乘获得新的加权特征图。

MobileNets_V3中的SE模块：

重新设计激活函数

算法内部微结构变化，把全部sigmoid使用hard-sigmoid替换，把部分relu6使用hard-swish替换。
在MobileNets_V2都是使用ReLU6激活函数，但现在比较常用的是swish激活函数，即x乘上sigmoid激活函数:
${\rm{swish}}(x) = x\sigma (x)$
其中sigmoid激活函数：
$\sigma (x) = \frac{1}{ {1 + {e^{ - x}}}}$
使用swish激活函数替换ReLU6确实能够提高网络的准确率，但是也存在2个问题：计算、求导复杂；对量化过程不友好(对移动端设备，基本上为了加速都会对它进行量化操作)。
因此，MobileNets_V3提出了h_swish激活函数，即x乘上h_sigmoid激活函数:
${\rm{h\_swish}}(x) = x\frac{ {RuLU6(x + 6)}}{6}$
其中h_sigmoid激活函数：
${\rm{h\_sigmoid}}(x) = \frac{ {RuLU6(x + 6)}}{6}$
ReLU6激活函数公式为： $min (ma x (x, 0), 6)$ 。
以下是论文中提供的示意图，从图中可以看出sigmoid和h-sigmoid曲线比较接近，swish和h-swish激活函数曲线非常相似。

在原论文中，MobileNets_V3将h-swish替换swish，将h-sigmoid替换sigmoid，替换之后网络的推理速度加快，对量化过程比较友好。

反向残差结构(Inverted Residuals)

相对于MobileNets_V2，MobileNets_V3的反向残差结构发生改变，在MobileNets_V2的反向残差结构基础上加入了SE模块。
以下是论文中提供的示意图：

输入特征图先经过1x1卷积上升通道数(升维)，然后在高维空间下使用深度卷积，再经过SE注意力机制优化特征图数据，最后经过1x1点卷积下降通道数(降维)。卷积步长为1且输入和输出特征图的形状完全一致时使用残差连接输入和输出的特征图；当步长为2(下采样阶段)则直接输出特征图。
MobileNets_V3中更新后的Inverted Residuals模块：

注意：并不是所有的Inverted Residuals模块都有SE模块；也不是所有的Inverted Residuals激活函数都是H-swish，也可能是ReLU。

重新设计耗时层结构

网络起始位置：减少第一个卷积层的卷积核个数，将在V1和V2版本中卷积核个数从32个降低到16个，准确率保持不变。
网络末尾位置：简化网络末尾的输出层，如下图所示，使用NAS搜索的网络结构的最后部分(Original last Stage)比较耗时，因此针对该部分进行精简，删除多余的卷积层(Efficient Last Stage)，准确率没有变化。

在精度基本没有影响的前提下，速度有明显的提升。

MobleNet_V3模型结构

下图是原论文给出的关于MobleNet_V3 Large模型结构的详细示意图：

下图是原论文给出的关于MobleNet_V3 Small模型结构的详细示意图：

bneck就是Inverted Residuals结构；
exp size代表bneck的升维的卷积的个数，第一个bneck结构并没有升维，直接对特征图进行深度卷积处理，没有1x1卷积层；
out代表的输出特征矩阵的通道数；
HS代表的是hard swish激活函数，RE代表的是ReLU激活函数;
NBN表示不使用BN结构，在最后的两个1×1卷积相当于全连接；

MobileNets_V3在图像分类中分为两部分：backbone部分： 主要由普通卷积层、反残差结构组成，分类器部分：由池化层(汇聚层)和1×1卷积层(全连接)层组成。

MobleNet_V3 Pytorch代码

普通卷积块： 卷积层+BN层+激活函数(ReLU或H_sigmoid或Identity)

# 普通卷积块
class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes,
                 out_planes,
                 kernel_size=3,
                 stride=1,
                 groups=1,
                 norm_layer=None,
                 activation_layer=None):
        padding = (kernel_size - 1) // 2
        # 是否有归一化层
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # 是否有激活函数
        if activation_layer is None:
            activation_layer = nn.ReLU6
        # 卷积层+BN+ReLU6
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer(inplace=True))

SE模块： 全局平均池化+1×1卷积+ReLU激活函数+1×1卷积+H_sigmoid激活函数

# SE模块
class SqueezeExcitation(nn.Module):
    def __init__(self, input_c, squeeze_factor = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)

    def forward(self, x):
        # 全局平均池化层
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        # 全连接层1(降维)
        scale = self.fc1(scale)
        # relu激活函数
        scale = F.relu(scale, inplace=True)
        # 全连接层2(激励)
        scale = self.fc2(scale)
        # H_sigmoid激活函数
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x

反向残差结构配置参数

# 反残差结构配置参数
class InvertedResidualConfig:
    def __init__(self,
                 input_c,       # 输入channel
                 kernel,        # 卷积核大小
                 expanded_c,    # 1×1卷积升维后的channel
                 out_c,         # 输出channel
                 use_se,        # 是否使用se模块(不是所有反残差结构都有)
                 activation,    # 激活函数
                 stride,        # 步长
                 width_multi    # 控制模型通道的扩缩比例
                 ):
        # 根据扩缩比计算后的新输入channel
        self.input_c = self.adjust_channels(input_c, width_multi)
        self.kernel = kernel
        # 根据扩缩比计算后的新的升维后channel
        self.expanded_c = self.adjust_channels(expanded_c, width_multi)
        self.out_c = self.adjust_channels(out_c, width_multi)
        self.use_se = use_se
        # 是否使用H-swish
        self.use_hs = activation == "HS"
        self.stride = stride
    
    # 根据扩缩比计算新channel
    @staticmethod
    def adjust_channels(channels, width_multi):
        return _make_divisible(channels * width_multi, 8)

反向残差结构： 1×1点卷积层组+3×3深度卷积层组+SE模块(可选)+1×1点卷积层组

# 反向残差结构
class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf,
                 norm_layer):
        super(InvertedResidual, self).__init__()
        # 步长必须是1或2
        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")
        # 步长为1,且输入输出的特征图形状完全一致
        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        layers = []
        # 激活函数选择Hardswish还是ReLU
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU

        # 第一个反残差结构是没有1×1点卷积做升维操作的,因此输出channel=输入channel
        if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))

        # 深度卷积
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.expanded_c,
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))
        # SE模块
        if cnf.use_se:
            layers.append(SqueezeExcitation(cnf.expanded_c))

        # 点卷积 nn.Identity相当于恒等函数f(x)=x,可以理解为没有激活函数
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.out_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))
        self.block = nn.Sequential(*layers)
        # 输出channel
        self.out_channels = cnf.out_c

    def forward(self, x) :
        result = self.block(x)
        # 输入输出形状完全一致
        if self.use_res_connect:
            result += x
        return result

完整代码

from typing import Callable, List, Optional

import torch
from torch import nn, Tensor
from torch.nn import functional as F
from functools import partial
from torchsummary import summary

def _make_divisible(ch, divisor=8, min_ch=None):
    '''
    int(ch + divisor / 2) // divisor * divisor)
    目的是为了让new_ch是divisor的整数倍
    类似于四舍五入:ch超过divisor的一半则加1保留;不满一半则归零舍弃
    '''
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch

# 卷积组
class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes,
                 out_planes,
                 kernel_size=3,
                 stride=1,
                 groups=1,
                 norm_layer=None,
                 activation_layer=None):
        padding = (kernel_size - 1) // 2
        # 是否有归一化层
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # 是否有激活函数
        if activation_layer is None:
            activation_layer = nn.ReLU6
        # 卷积层+BN+ReLU6
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer(inplace=True))


class SqueezeExcitation(nn.Module):
    def __init__(self, input_c, squeeze_factor = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)

    def forward(self, x):
        # 全局平均池化层
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        # 全连接层1(降维)
        scale = self.fc1(scale)
        # relu激活函数
        scale = F.relu(scale, inplace=True)
        # 全连接层2(激励)
        scale = self.fc2(scale)
        # H_sigmoid激活函数
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x

# 反残差结构配置参数
class InvertedResidualConfig:
    def __init__(self,
                 input_c,       # 输入channel
                 kernel,        # 卷积核大小
                 expanded_c,    # 1×1卷积升维后的channel
                 out_c,         # 输出channel
                 use_se,        # 是否使用se模块(不是所有反残差结构都有)
                 activation,    # 激活函数
                 stride,        # 步长
                 width_multi    # 控制模型通道的扩缩比例
                 ):
        # 根据扩缩比计算后的新输入channel
        self.input_c = self.adjust_channels(input_c, width_multi)
        self.kernel = kernel
        # 根据扩缩比计算后的新的升维后channel
        self.expanded_c = self.adjust_channels(expanded_c, width_multi)
        self.out_c = self.adjust_channels(out_c, width_multi)
        self.use_se = use_se
        # 是否使用H-swish
        self.use_hs = activation == "HS"
        self.stride = stride

    # 根据扩缩比计算新channel
    @staticmethod
    def adjust_channels(channels, width_multi):
        return _make_divisible(channels * width_multi, 8)

# 反残差结构
class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf,
                 norm_layer):
        super(InvertedResidual, self).__init__()
        # 步长必须是1或2
        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")
        # 步长为1,且输入输出的特征图形状完全一致
        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        layers = []
        # 激活函数选择Hardswish还是ReLU
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU

        # 第一个反残差结构是没有1×1点卷积做升维操作的,因此输出channel=输入channel
        if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))

        # 深度卷积
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.expanded_c,
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))
        # SE模块
        if cnf.use_se:
            layers.append(SqueezeExcitation(cnf.expanded_c))

        # 点卷积 nn.Identity相当于恒等函数f(x)=x,可以理解为没有激活函数
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.out_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))
        self.block = nn.Sequential(*layers)
        # 输出channel
        self.out_channels = cnf.out_c
        # # 是否下采样
        # self.is_strided = cnf.stride > 1

    def forward(self, x) :
        result = self.block(x)
        # 输入输出形状完全一致
        if self.use_res_connect:
            result += x
        return result

class MobileNetV3(nn.Module):
    def __init__(self,
                 inverted_residual_setting,
                 last_channel,
                 num_classes = 1000,
                 block=None,
                 norm_layer=None):
        super(MobileNetV3, self).__init__()
        # 网络配置参数不能为空
        if not inverted_residual_setting:
            raise ValueError("The inverted_residual_setting should not be empty.")
        # 网络配置参数是list保存的,且每个元素都是反残差结构配置参数
        elif not (isinstance(inverted_residual_setting, List) and
                  all([isinstance(s, InvertedResidualConfig) for s in inverted_residual_setting])):
            raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig]")

        # 反残差结构
        if block is None:
            block = InvertedResidual
        # BN层
        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)

        layers = []

        # 构建网络第一个卷积层
        firstconv_output_c = inverted_residual_setting[0].input_c
        layers.append(ConvBNActivation(3,
                                       firstconv_output_c,
                                       kernel_size=3,
                                       stride=2,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        # 堆叠构建反残差结构
        for cnf in inverted_residual_setting:
            layers.append(block(cnf, norm_layer))

        # 构建网络的最后几层
        lastconv_input_c = inverted_residual_setting[-1].out_c
        # 1×1卷积升维
        lastconv_output_c = 6 * lastconv_input_c
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)
        # 全局平均池化
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        # 俩个全连接层(非1×1卷积充当)
        self.classifier = nn.Sequential(nn.Linear(lastconv_output_c, last_channel),
                                        nn.Hardswish(inplace=True),
                                        nn.Dropout(p=0.2, inplace=True),
                                        nn.Linear(last_channel, num_classes))

        # 权重初始化
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x) :
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def forward(self, x) :
        return self._forward_impl(x)


def mobilenet_v3_large(num_classes = 1000,
                       reduced_tail = False):
    """
    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth
    """
    # 通道扩缩比
    width_multi = 1.0
    # 用于创建一个函数对象，该对象是原函数的一个部分应用
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)
    # 用于减少主干网络C4部分的信道冗余
    reduce_divider = 2 if reduced_tail else 1
    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, False, "RE", 1),
        bneck_conf(16, 3, 64, 24, False, "RE", 2),      # C1
        bneck_conf(24, 3, 72, 24, False, "RE", 1),
        bneck_conf(24, 5, 72, 40, True, "RE", 2),       # C2
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 3, 240, 80, False, "HS", 2),     # C3
        bneck_conf(80, 3, 200, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 480, 112, True, "HS", 1),
        bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
    ]
    # 分类器中第一个全连接的输出通道
    last_channel = adjust_channels(1280 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)


def mobilenet_v3_small(num_classes = 1000,
                       reduced_tail = False):
    """
    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth
    """
    # 通道扩缩比
    width_multi = 1.0
    # 用于创建一个函数对象，该对象是原函数的一个部分应用
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)
    # 用于减少主干网络C4部分的信道冗余
    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, True, "RE", 2),       # C1
        bneck_conf(16, 3, 72, 24, False, "RE", 2),      # C2
        bneck_conf(24, 3, 88, 24, False, "RE", 1),
        bneck_conf(24, 5, 96, 40, True, "HS", 2),       # C3
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 120, 48, True, "HS", 1),
        bneck_conf(48, 5, 144, 48, True, "HS", 1),
        bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1),
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1)
    ]
    # 分类器中第一个全连接的输出通道
    last_channel = adjust_channels(1024 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)
if __name__ == '__main__':
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = mobilenet_v3_small().to(device)
    summary(model, input_size=(3, 224, 224))

summary可以打印网络结构和参数，方便查看搭建好的网络结构。

总结

尽可能简单、详细的介绍了SE模块的原理和结构，讲解了MobileNets_V3模型的结构和pytorch代码。