mobileNet_v2_v3: network code implementation and network structure

  1. In ResNet:
    the original BottleNeck:

What it does: channel dimension goes down --> stays unchanged --> goes back up.
How it is implemented: 1x1 conv --> 3x3 conv --> 1x1 conv.

  2. In MobileNet_v2:
    the inverted residual block, Inverted_residual, is introduced:

What it does: channel dimension goes up --> stays unchanged --> goes back down.
How it is implemented: 1x1 conv --> 3x3 conv --> 1x1 conv.

So the so-called inverted residual block first raises and then lowers the number of feature channels, which is the opposite of what the original residual block does; hence the name inverted residual.
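
To make the contrast concrete, here is a minimal sketch (my own illustration, not from the original papers; BN and activation layers are omitted, and the channel widths are just example values) of the two channel flows in PyTorch:

import torch
from torch import nn

# ResNet-style bottleneck: channels go down -> stay -> up (e.g. 256 -> 64 -> 64 -> 256)
resnet_bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),                          # 1x1 reduce
    nn.Conv2d(64, 64, kernel_size=3, padding=1),                # 3x3
    nn.Conv2d(64, 256, kernel_size=1),                          # 1x1 expand
)

# MobileNetV2 inverted residual: channels go up -> stay -> down (e.g. 24 -> 144 -> 144 -> 24)
inverted_residual = nn.Sequential(
    nn.Conv2d(24, 144, kernel_size=1),                          # 1x1 pointwise expand
    nn.Conv2d(144, 144, kernel_size=3, padding=1, groups=144),  # 3x3 depthwise
    nn.Conv2d(144, 24, kernel_size=1),                          # 1x1 pointwise project (linear)
)

print(resnet_bottleneck(torch.rand(1, 256, 14, 14)).shape)   # torch.Size([1, 256, 14, 14])
print(inverted_residual(torch.rand(1, 24, 14, 14)).shape)    # torch.Size([1, 24, 14, 14])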

  3. MobileNet_v3
    adds the channel-attention module SE_layer inside the Inverted_residual block,
    and uses Hardswish and Hardsigmoid (recalled in the snippet below) as the activation functions of different layers.
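
For reference, the two "hard" activations are hard-swish(x) = x * ReLU6(x + 3) / 6 and hard-sigmoid(x) = ReLU6(x + 3) / 6 (definitions from the MobileNetV3 paper); the quick numerical check below is my own addition, not part of the original post:

import torch
import torch.nn.functional as F

x = torch.linspace(-6, 6, steps=25)
# hard-sigmoid(x) = ReLU6(x + 3) / 6, provided as F.hardsigmoid / nn.Hardsigmoid
print(torch.allclose(F.hardsigmoid(x), F.relu6(x + 3) / 6))    # True
# hard-swish(x) = x * ReLU6(x + 3) / 6, provided as F.hardswish / nn.Hardswish
print(torch.allclose(F.hardswish(x), x * F.relu6(x + 3) / 6))  # True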

In fact, in MobileNet_v3 one could
follow the inverted_residual blocks with
ResNeXt + CBAM to form a new backbone;

which amounts to: raise the channel dimension --> keep it unchanged --> project it back down --> multi-branch parallel paths + channel and spatial attention,
... and a new network is born.

Presumably, before long, things like this can simply be handed off to chat_gpt.

1. Linear BottleNeck

The linear BottleNeck was introduced in MobileNetV2: Inverted Residuals and Linear Bottlenecks.

A linear bottleneck block is a bottleneck block without the final activation.

In Section 3.2 of the paper, the authors explain in detail why a non-linearity before the output hurts performance.

In short: a non-linearity like ReLU, which sets everything < 0 to 0, destroys information. Empirical results show that dropping the last activation is the right choice when the block's input has fewer channels than its output, so all that is needed is to remove the final nn.ReLU from the BottleNeck.

First, a note on ReLU6. A convolution is usually followed by a ReLU non-linearity; MobileNet v1 uses ReLU6, which is an ordinary ReLU whose output is clipped at a maximum of 6. The motivation is numerical resolution on mobile devices running in low-precision float16: if the ReLU activation range is left unbounded, the output goes from 0 to positive infinity, and when the activations become very large and spread over a wide range, float16 cannot describe such a range accurately and precision is lost.
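
As a small illustration (my own snippet, not part of the original post), ReLU6 is simply ReLU with its output clipped at 6, i.e. equivalent to clamping the activation to the range [0, 6]:

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.5, 3.0, 7.5, 100.0])
print(F.relu6(x))                         # tensor([0.0000, 0.5000, 3.0000, 6.0000, 6.0000])
print(torch.clamp(x, min=0.0, max=6.0))   # same result: ReLU6 == clip(x, 0, 6)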

The MobileNetV2 paper proposes dropping the final ReLU6 and producing a plain linear output. The reasoning: on the region that ReLU keeps non-zero it is just a linear transform, and ReLU can preserve the complete information only when the input lies in a low-dimensional subspace.

When reading MobileNet v1 I already wondered why the later ReLU was not removed: Xception had shown experimentally that adding ReLU after a depthwise convolution makes results worse, and its author conjectured that the depthwise output is too shallow, so applying ReLU loses information. MobileNet even cites the Xception paper, yet still puts a ReLU after the depthwise convolution. In MobileNet v2 this ReLU is finally removed (not the one immediately after the depthwise conv, but the last one of the block), and a large part of the paper is spent justifying it (rather involved proofs that you will not want to re-derive yourself, giving a theoretical argument for why removing the ReLU is reasonable).

In short, the conclusion is: remove that last ReLU and the results improve.

2. mobileNet_v2

2.1 Code implementation with comments

# time: 2023/3/20
#       4:53 p.m.

import torch
from torch import nn

#from .utils import  load_state_dict_from_url


__all__ = ['MobileNetV2', 'mobilenet_v2']


def _make_divisiable(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v
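
# Illustrative values (my own examples, not from the original source):
#   _make_divisiable(37, 8)        -> 40
#   _make_divisiable(24, 8)        -> 24
#   _make_divisiable(32 * 0.75, 8) -> 24
# i.e. channels are rounded to a multiple of `divisor`, but never reduced by more than 10%.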


# This class builds a 3*3 (or 1*1, if passed in) conv + bn + relu6 block; it is instantiated when building the InvertedResidual module.
class ConvBNReLU(nn.Sequential):
    # nn.Sequential builds the block from the modules passed to the constructor below, in order
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None):

        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        # call the parent (nn.Sequential) constructor so that the layers below become part of this module
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            norm_layer(out_planes),
            nn.ReLU6(inplace=True)
        )


# Inverted residual block. For the channels of the input feature map:
# 1. pw: a 1*1 pointwise conv increases the number of channels;
# 2. dw: a 3*3 depthwise conv, where the number of kernels equals the number of output channels,
#    but each kernel has (input channels / groups) channels; this step does not change the number
#    of output channels, only the channels per kernel, and padding/stride decide whether the spatial size changes;
# 3. a 1*1 pointwise conv reduces the number of channels again.
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None ):
        super(InvertedResidual, self).__init__()

        self.stride = stride
        assert stride in [1, 2]

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        # hidden_dim controls how many channels the first pw conv expands to
        hidden_dim = int(round(inp * expand_ratio))
        # whether to use the residual (shortcut) connection: requires stride == 1 and input channels == output channels
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:  # only expand the input channels when the expand ratio is not 1
            # pw: pointwise conv, followed by ReLU6
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer))

        layers.extend([
            # dw: groups splits the input channels into groups, which changes the channels per kernel
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer),
            # pw-linear: note, no ReLU6 activation here
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            norm_layer(oup),
        ])

        # note: assemble the layers above, in order, into one module,
        # and keep it as an attribute so the class's methods can call it
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:  # when the residual condition is met, add the shortcut connection
            return x + self.conv(x)
        else:
            return self.conv(x)
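
# Illustrative usage (kept as comments so the module behaves the same on import; the numbers
# are my own example values, not from the original source):
#   block = InvertedResidual(inp=24, oup=24, stride=1, expand_ratio=6)   # channels: 24 -> 144 -> 144 -> 24
#   y = block(torch.rand(1, 24, 56, 56))                                 # y.shape == torch.Size([1, 24, 56, 56])
#   Since stride == 1 and inp == oup, the shortcut x + self.conv(x) is used.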







class MobileNetV2(nn.Module):
    def __init__(self,  num_classes=4, width_mult=1.0, inverted_residual_setting=None,
                 round_nearest=8, block=None, norm_layer=None):
        """

        Args:
            num_classes: (int) number of output classes
            width_mult: (float) scales the number of channels in each layer
            inverted_residual_setting: list of parameters used to instantiate the InvertedResidual blocks
            round_nearest: round the channel count of each layer to a multiple of this number (or to the nearest such multiple)
            block: the basic module used to build the network; here the InvertedResidual module is used
            norm_layer: which normalization layer to use
        """

        super(MobileNetV2, self).__init__()

        if block is None:
            block = InvertedResidual  # the basic module used to build the network

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        input_channel = 32  # number of input channels fed to the first InvertedResidual block
        last_channel = 1280

        if inverted_residual_setting is None:
            # the parameters are given as a list and instantiate the different inverted residual blocks
            # note: t: expansion factor of the 1*1 pw conv (the input channels are expanded t times)
            #  c: number of output channels of this stage
            #  n: how many bottlenecks are stacked in this stage
            #  s: stride; it only applies to the DW conv of the first bottleneck in the stage, the repeated bottlenecks all use stride 1
            inverted_residual_setting = [
                    # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]
        # check that the setting is not empty and that every entry has 4 elements
        if len(inverted_residual_setting) == 0 or  len(inverted_residual_setting[0]) != 4:
            raise  ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}" .format(inverted_residual_setting))


        # round the first input channel count to the nearest multiple of round_nearest (= 8); the same is done for the last layer's output channels
        input_channel = _make_divisiable(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisiable(last_channel * max(1.0, width_mult), round_nearest)
        # note: build the first layer of the network; the input is fixed to 3 channels here
        features = [ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer)]

        # instantiate the different inverted residual blocks according to inverted_residual_setting
        for t, c, n, s in inverted_residual_setting:
            # determine the output channel count
            output_channel = _make_divisiable(c * width_mult, round_nearest)
            for i in range(n):  # how many InvertedResidual blocks are stacked in this stage
                stride = s if i == 0 else 1  # only the first block in the stage uses stride s; the rest use stride 1
                # block: def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None):
                features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer))
                # the output channels of this stage become the input channels of the next block
                input_channel = output_channel

        # build the last conv layer, which fixes the final number of output channels
        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1, norm_layer=norm_layer))

        # assemble the modules above, in order, into the whole network
        # note: the structure is kept as an attribute so later methods can use it
        self.features = nn.Sequential(*features)  # * unpacks the list of modules

        # build the final classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, num_classes),
        )

        # weight initialization
        for m in self.modules():  # self.modules() is inherited from nn.Module and returns every module in the network
            # initialize convolution weights
            if isinstance(m, nn.Conv2d):  # if m is an instance of nn.Conv2d
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            # initialize batch-norm weights
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            # initialize linear-layer weights
            elif  isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x):
        # This exists since TorchScript doesn't support inheritance, so the superclass method
        # (this one) needs to have a name other than `forward` that can be accessed in a subclass
        x = self.features(x)

        # use adaptive average pooling to reduce the final feature map to spatial size (1, 1)
        # Cannot use "squeeze" as batch-size can be 1 => must use reshape with x.shape[0]
        x = nn.functional.adaptive_avg_pool2d(x, 1).reshape(x.shape[0], -1)
        x = self.classifier(x)

        return x

    def forward(self, x):
        return self._forward_impl(x)



def mobilenet_v2(pretrained=False, progress=True,  **kwargs):
    """
    Constructs a MobileNetV2 architecture from
    `"MobileNetV2: Inverted Residuals and Linear Bottlenecks" <https://arxiv.org/abs/1801.04381>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    model = MobileNetV2(**kwargs)
    # if pretrained:
    #     state_dict = load_state_dict_from_url(model_urls['mobilenet_v2'],
    #                                           progress=progress)
    #     # load the pretrained weights into the model
    #     model.load_state_dict(state_dict)

    return  model


if __name__ == "__main__":
    # cnn_net = model_res18(num_classes=4,)
    net_mobileV2 = mobilenet_v2(pretrained=False)


    image = torch.rand(8, 3, 96, 24)

    out2 = net_mobileV2(image)


    print(out2.shape)
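
    # Additional sanity checks (my own additions, not in the original post):
    # out2.shape is expected to be torch.Size([8, 4]), since num_classes defaults to 4 here.
    print(sum(p.numel() for p in net_mobileV2.parameters()))  # total number of parameters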

2.2 Network structure

MobileNetV2(
  (features): Sequential(
    (0): ConvNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
    (1): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)
          (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (3): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(144, 144, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=144, bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(144, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (4): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(144, 144, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=144, bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(144, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (5): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (6): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (7): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=192, bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (8): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (9): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (10): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (11): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (12): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (13): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (14): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(576, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (15): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (16): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (17): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (18): ConvNormActivation(
      (0): Conv2d(320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.2, inplace=False)
    (1): Linear(in_features=1280, out_features=1000, bias=True)
  )
)

3. mobileNet_v3_small

3.1 Code with comments

# author: Chu Yun,  contact {chuyunxinlan at gmail dot com}
# time: 2023/3/21
#       6:18 p.m.

# reference the  official pytorch code

from typing import Any, Callable, List, Optional, Sequence
from types import  FunctionType

import torch
from torch import Tensor
from torch.hub import load_state_dict_from_url  # needed by _mobilenet_v3 when pretrained=True



__all__ = ["MobileNetV3", "mobilenet_v3_large", "mobilenet_v3_small"]


model_urls = {
    "mobilenet_v3_large": "https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth",
    "mobilenet_v3_small": "https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth",
}



def _log_api_usage_once(obj: Any) -> None:

    """
    Logs API usage(module and name) within an organization.
    In a large ecosystem, it's often useful to track the PyTorch and
    TorchVision APIs usage. This API provides the similar functionality to the
    logging module in the Python stdlib. It can be used for debugging purpose
    to log which methods are used and by default it is inactive, unless the user
    manually subscribes a logger via the `SetAPIUsageLogger method <https://github.com/pytorch/pytorch/blob/eb3b9fe719b21fae13c7a7cf3253f970290a573e/c10/util/Logging.cpp#L114>`_.
    Please note it is triggered only once for the same API call within a process.
    It does not collect any data from open-source users since it is no-op by default.
    For more information, please refer to
    * PyTorch note: https://pytorch.org/docs/stable/notes/large_scale_deployments.html#api-usage-logging;
    * Logging policy: https://github.com/pytorch/vision/issues/5052;

    Args:
        obj (class instance or method): an object to extract info from.
    """
    if not obj.__module__.startswith("torchvision"):
        return
    name = obj.__class__.__name__
    if isinstance(obj, FunctionType):
        name = obj.__name__
    torch._C._log_api_usage_once(f"{
      
      obj.__module__}.{
      
      name}")



# Ensure the channel count is rounded to the nearest multiple of the divisor.
def _make_divisiable(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v



class ConvNormActivation(torch.nn.Sequential):
    """
    Configurable block used for Convolution-Normalization-Activation blocks.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the Convolution-Normalization-Activation block
        kernel_size: (int, optional): Size of the convolving kernel. Default: 3
        stride (int, optional): Stride of the convolution. Default: 1
        padding (int, tuple or str, optional): Padding added to all four sides of the input. Default: None, in which case it will be calculated as ``padding = (kernel_size - 1) // 2 * dilation``
        groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
        norm_layer (Callable[..., torch.nn.Module], optional): Norm layer that will be stacked on top of the convolution layer. If ``None`` this layer won't be used. Default: ``torch.nn.BatchNorm2d``
        activation_layer (Callable[..., torch.nn.Module], optional): Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the conv layer. If ``None`` this layer won't be used. Default: ``torch.nn.ReLU``
        dilation (int): Spacing between kernel elements. Default: 1
        inplace (bool): Parameter for the activation layer, which can optionally do the operation in-place. Default ``True``
        bias (bool, optional): Whether to use bias in the convolution layer. By default, biases are included if ``norm_layer is None``.

    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: int = 3,
        stride: int = 1,
        padding: Optional[int] = None,
        groups: int = 1,
        norm_layer: Optional[Callable[..., torch.nn.Module]] = torch.nn.BatchNorm2d,
        activation_layer: Optional[Callable[..., torch.nn.Module]] = torch.nn.ReLU,
        dilation: int = 1,
        inplace: Optional[bool] = True,
        bias: Optional[bool] = None,
    ) -> None:
        if padding is None:
            padding = (kernel_size - 1) // 2 * dilation
        if bias is None:
            bias = norm_layer is None
        layers = [
            torch.nn.Conv2d(
                in_channels,
                out_channels,
                kernel_size,
                stride,
                padding,
                dilation=dilation,
                groups=groups,
                bias=bias,
            )
        ]
        if norm_layer is not None:
            layers.append(norm_layer(out_channels))
        if activation_layer is not None:
            params = {} if inplace is None else {"inplace": inplace}
            layers.append(activation_layer(**params))
        super().__init__(*layers)
        _log_api_usage_once(self)
        self.out_channels = out_channels
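
# Illustrative usage (comments only; the values are my own examples, not from the source):
#   conv = ConvNormActivation(3, 16, kernel_size=3, stride=2, activation_layer=torch.nn.Hardswish)
#   conv(torch.rand(1, 3, 224, 224)).shape == torch.Size([1, 16, 112, 112])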


# note: channel-attention module; compared with the CBAM block, it lacks the spatial-attention part
class SqueezeExcitation(torch.nn.Module):
    """
    This block implements the Squeeze-and-Excitation block from https://arxiv.org/abs/1709.01507 (see Fig. 1).
    Parameters ``activation`` and ``scale_activation`` correspond to ``delta`` and ``sigma`` in eq. 3.

    Args:
        input_channels (int): Number of channels in the input image
        squeeze_channels (int): Number of squeeze channels
        activation (Callable[..., torch.nn.Module], optional): ``delta`` activation. Default: ``torch.nn.ReLU``
        scale_activation (Callable[..., torch.nn.Module]): ``sigma`` activation. Default: ``torch.nn.Sigmoid``
    """


    def __init__(self, input_channels:int,  squeeze_channels: int,
                 activation: Callable[...,torch.nn.Module] = torch.nn.ReLU,
                 scale_activation: Callable[..., torch.nn.Module] = torch.nn.Sigmoid,
                 ) -> None:
        super().__init__()
        _log_api_usage_once(self)   # record usage of this class
        self.avgpool = torch.nn.AdaptiveAvgPool2d(1)

        # 1*1 convolutions act as the fully connected layers
        self.fc1 = torch.nn.Conv2d(input_channels, squeeze_channels, 1)
        self.fc2 = torch.nn.Conv2d(squeeze_channels, input_channels, 1)

        self.activation  = activation()
        self.scale_activation = scale_activation()

    def _scale(self, input:Tensor) ->Tensor:
        scale = self.avgpool(input)
        scale = self.fc1(scale)
        scale = self.activation(scale)
        scale = self.fc2(scale)

        return self.scale_activation(scale)

    def forward(self, input:Tensor) -> Tensor:
        scale = self._scale(input)

        return  scale * input
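
# Illustrative usage (comments only; the values are my own examples, not from the source):
#   se = SqueezeExcitation(input_channels=96, squeeze_channels=24)
#   se(torch.rand(1, 96, 14, 14)).shape == torch.Size([1, 96, 14, 14])
# The block only re-weights the channels, so the input shape is preserved.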




class  InvertedResidualConfig:
     # Stores information listed at Tables 1 and 2 of the MobileNetV3 paper
     # Stores the configuration parameters; different parameter sets produce different InvertedResidual blocks.
     def __init__(self,
                  input_channels: int,  kernel: int,
                  expanded_channels: int,  out_channels: int,
                  use_se: bool,  activation:str,
                  stride: int, dilation: int,
                  width_mult: float,
                  ):
        self.input_channels = self.adjust_channels(input_channels, width_mult)
        self.kernel  = kernel
        self.expanded_channels = self.adjust_channels(expanded_channels, width_mult)
        self.out_channels      =  self.adjust_channels(out_channels, width_mult)

        self.use_se = use_se  # whether to use the SE channel-attention block
        self.use_hs = activation == "HS"  # activation type: "HS" means Hardswish is used
        self.stride = stride
        self.dilation = dilation  # whether dilated (atrous) convolution is used

     @staticmethod
     def adjust_channels(channels:int,  width_mult: float):
        # round the channel count to the nearest multiple of 8
        return  _make_divisiable(channels * width_mult, 8)



from  torch import  nn
from  functools import partial

class  InvertedResidual(nn.Module):
        # Implemented as described at section 5 of MobileNetV3 paper
        def __init__(self,
                     cnf: InvertedResidualConfig,
                     norm_layer: Callable[..., nn.Module],
                     se_layer: Callable[..., nn.Module] = partial(SqueezeExcitation, scale_activation = nn.Hardsigmoid)
                     ):
            super().__init__()
            if not ( 1 <= cnf.stride <= 2): # 限制滑动步长只在1, 2之间;
                raise ValueError(" illegal stride value")

            # note: the residual connection is used only when stride == 1 and the input and output channel counts are equal
            self.use_res_connect = cnf.stride == 1 and cnf.input_channels == cnf.out_channels

            layers: List[nn.Module] = []
            activation_layer = nn.Hardswish  if cnf.use_hs  else nn.ReLU


            # 1*1 pw: expand the channels when the expanded channel count differs from the input channel count
            if cnf.expanded_channels != cnf.input_channels:
                layers.append(
                    ConvNormActivation(
                        cnf.input_channels,
                        cnf.expanded_channels,
                        kernel_size= 1,
                        norm_layer=norm_layer,
                        activation_layer=activation_layer,
                    )
                )

            # depthwise: grouped convolution, the input channels are split into groups
            # when dilated convolution is used the stride is forced to 1; otherwise the configured stride is used
            stride = 1 if cnf.dilation > 1 else cnf.stride
            layers.append(
                ConvNormActivation(
                    cnf.expanded_channels,
                    cnf.expanded_channels,
                    kernel_size= cnf.kernel,
                    stride=stride,
                    dilation= cnf.dilation,
                    groups=cnf.expanded_channels,
                    norm_layer=norm_layer,
                    activation_layer= activation_layer,
                )
            )

            if cnf.use_se:  # whether to insert the SE channel-attention block in this layer
                squeeze_channels = _make_divisiable(cnf.expanded_channels // 4, 8)
                # set the expanded and squeezed channel counts inside the se_layer
                layers.append(se_layer(cnf.expanded_channels, squeeze_channels))


            # project: reduce the channel dimension
            layers.append(
                ConvNormActivation(
                    cnf.expanded_channels, cnf.out_channels, kernel_size=1, norm_layer=norm_layer, activation_layer=None
                )

            )

            # note: the self.block attribute
            # wrap the layers list with nn.Sequential() to form the basic block; this block is then used to build each stage of the network
            self.block = nn.Sequential(*layers)

            # these two attributes do not appear to be used anywhere
            self.out_channels = cnf.out_channels
            self._is_cn = cnf.stride > 1


        def forward(self, input: Tensor) -> Tensor:
            result = self.block(input)

            if self.use_res_connect:  # check whether the residual connection is used
                result  += input
            return  result
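
# Illustrative usage (comments only; my own example following the first mobilenet_v3_small row
# below: input 16, kernel 3, expanded 16, out 16, SE on, "RE", stride 2, dilation 1):
#   cnf = InvertedResidualConfig(16, 3, 16, 16, True, "RE", 2, 1, width_mult=1.0)
#   blk = InvertedResidual(cnf, norm_layer=partial(nn.BatchNorm2d, eps=0.001, momentum=0.01))
#   blk(torch.rand(1, 16, 112, 112)).shape == torch.Size([1, 16, 56, 56])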


class  MobileNetV3(nn.Module):
    def __init__(self,
                 # different parameter sets, given as a list, instantiate different blocks of the same class
                 inverted_residual_setting:List[InvertedResidualConfig],
                 last_channel: int,
                 num_classes: int = 1000,

                 # block is passed in as a callable that produces nn.Module instances
                 block: Optional[Callable[..., nn.Module]]  = None,
                 norm_layer: Optional[Callable[..., nn.Module]]  = None,

                 dropout: float = 0.2,
                 **kwargs: Any,  # extra parameters passed by keyword
                 ) -> None:
        """

        Args:
            inverted_residual_setting: list of configurations, used to instantiate the blocks that form the body of the network
            last_channel: channel count of the second-to-last layer
            num_classes:
            block: the basic module used to build the network
            norm_layer: normalization layer
            dropout: dropout probability
            **kwargs:
        """
        super().__init__()

        _log_api_usage_once(self)

        if not inverted_residual_setting:  # make sure the configuration passed in is not empty
            raise ValueError("The inverted residual setting should not be empty")


        elif not (  # and check again that the configuration is passed in as a sequence
            isinstance(inverted_residual_setting, Sequence)
            # where every entry is an instance of InvertedResidualConfig
            and all([isinstance(s, InvertedResidualConfig) for s in inverted_residual_setting])
        ):  # otherwise raise a type error
            raise TypeError("The inverted residual setting should be List[InvertedResidualConfig]")

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps= 0.001, momentum=0.01)

        layers: List[nn.Module] = []


        # note: build the first layer of the network
        firstconv_output_channels = inverted_residual_setting[0].input_channels
        layers.append(
            ConvNormActivation(
                3, firstconv_output_channels,
                kernel_size=3, stride=2,
                norm_layer=norm_layer, activation_layer=nn.Hardswish,
            )
        )

        # note: build the body of the network from the configuration
        for cnf in inverted_residual_setting:
            layers.append(block (cnf, norm_layer))


        # note: build the last conv layer of the feature extractor
        lastconv_input_channels  = inverted_residual_setting[-1].out_channels
        lastconv_output_channels = 6 * lastconv_input_channels

        layers.append(
            ConvNormActivation(
                lastconv_input_channels, lastconv_output_channels,
                kernel_size=1, norm_layer=norm_layer,
                activation_layer=nn.Hardswish,
            )
        )

        # note: keep the assembled backbone as an attribute
        self.features = nn.Sequential(*layers)

        # average pooling reduces the final feature map to 1*1
        self.avgpool  = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Linear(lastconv_output_channels, last_channel),
            nn.Hardswish(inplace=True),
            nn.Dropout(p=dropout, inplace=True),
            nn.Linear(last_channel, num_classes),
        )


        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')  # explicitly initialize the weights with Kaiming normal
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)

            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)


    def _forward_impl(self, x:Tensor ) -> Tensor:
        # feature extraction by the backbone
        x = self.features(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)

        x = self.classifier(x)

        return  x

    def forward(self, x:Tensor) -> Tensor:
        return  self._forward_impl(x)



# note: this function defines the configurations of the large and small mobileNet v3 models
def _mobilenet_v3_conf(
    arch: str,  width_mult: float = 1.0,  reduced_tail: bool = False, dilated: bool = False,
        **kwargs: Any
):
    # whether to halve the channel count of the last three stages
    reduce_divider = 2  if reduced_tail else 1
    dilation = 2 if dilated else 1


    bneck_conf = partial(InvertedResidualConfig, width_mult=width_mult)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_mult=width_mult)

    #     def __init__(self,
    #               input_channels: int,  kernel: int,
    #               expanded_channels: int,  out_channels: int,
    #               use_se: bool,  activation:str,
    #               stride: int, dilation: int,
    #               width_mult: float,
    #               ):  # expanded_channels is the intermediate expanded channel count, also used by the se_layer

    # meaning of the parameters: input channels; kernel size; expanded channels; output channels; use SE channel attention; activation type; stride; dilation; width_mult is bound via partial as an extra argument

    if arch == "mobilenet_v3_large":
        inverted_residual_setting = [   # RE: ReLU,  HS: Hardswish
            bneck_conf(16, 3, 16, 16,  False, "RE", 1,1),
            bneck_conf(16, 3, 64, 24,  False, "RE", 2,1),# C1
            bneck_conf(24, 3, 72, 24, False, "RE", 1, 1),
            bneck_conf(24, 5, 72, 40, True, "RE", 2, 1),  # C2
            bneck_conf(40, 5, 120, 40, True, "RE", 1, 1),
            bneck_conf(40, 5, 120, 40, True, "RE", 1, 1),
            bneck_conf(40, 3, 240, 80, False, "HS", 2, 1),  # C3
            bneck_conf(80, 3, 200, 80, False, "HS", 1, 1),
            bneck_conf(80, 3, 184, 80, False, "HS", 1, 1),
            bneck_conf(80, 3, 184, 80, False, "HS", 1, 1),
            bneck_conf(80, 3, 480, 112, True, "HS", 1, 1),
            bneck_conf(112, 3, 672, 112, True, "HS", 1, 1),
            bneck_conf(112,5, 672, 160//reduce_divider, True, "HS", 2, dilation), # c4
            bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1, dilation),
            bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1, dilation),
        ]
        # channel count of the layer before the classifier
        last_channel = adjust_channels(1280 // reduce_divider)  # C5

    elif arch == "mobilenet_v3_small":
        inverted_residual_setting = [
            bneck_conf(16, 3, 16, 16, True, "RE", 2, 1),  # C1
            bneck_conf(16, 3, 72, 24, False, "RE", 2, 1),  # C2
            bneck_conf(24, 3, 88, 24, False, "RE", 1, 1),
            bneck_conf(24, 5, 96, 40, True, "HS", 2, 1),  # C3
            bneck_conf(40, 5, 240, 40, True, "HS", 1, 1),
            bneck_conf(40, 5, 240, 40, True, "HS", 1, 1),
            bneck_conf(40, 5, 120, 48, True, "HS", 1, 1),
            bneck_conf(48, 5, 144, 48, True, "HS", 1, 1),
            bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2, dilation),  # C4
            bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1, dilation),
            bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1, dilation),
        ]
        last_channel = adjust_channels(1024//reduce_divider) #c5

    else:  # the given model name is not supported
        raise ValueError(f"Unsupported model type {arch}")

    return  inverted_residual_setting, last_channel


def _mobilenet_v3(
        arch: str,
        inverted_residual_setting: List[InvertedResidualConfig],
        last_channel: int,
        pretrained: bool,
        progress:bool,
        **kwargs:Any,
):
    # note: this function forwards the parameters and uses them to instantiate the network model class
    model = MobileNetV3(inverted_residual_setting, last_channel, **kwargs)
    if pretrained:  # fetch the pretrained weights for this model
        if model_urls.get(arch, None) is None:
            raise ValueError(f"No checkpoint is available for model type {arch}")
        state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
        model.load_state_dict(state_dict)

    return model




# Returns an instance of the network model class.
def mobilenet_v3_small(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> MobileNetV3:

    """
    Constructs a small MobileNetV3 architecture from
    `"Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    arch = "mobilenet_v3_small"
    # look up the configuration by model name; it returns the parameter list used to instantiate the blocks, plus the last channel count
    inverted_residual_setting, last_channel = _mobilenet_v3_conf(arch, **kwargs)

    return _mobilenet_v3(arch, inverted_residual_setting, last_channel, pretrained, progress, **kwargs)



if __name__ == "__main__":
    model = mobilenet_v3_small()
    print(model)
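
    # A quick forward-pass check (my own addition, not in the original post): with the
    # default num_classes=1000, a 224x224 input should give an output of shape [2, 1000].
    x = torch.rand(2, 3, 224, 224)
    print(model(x).shape)  # expected: torch.Size([2, 1000])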

3.2 Network structure

MobileNetV3(
  (features): Sequential(
    (0): ConvNormActivation(
      (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): Hardswish()
    )
    (1): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=16, bias=False)
          (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(16, 8, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(8, 16, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (2): ConvNormActivation(
          (0): Conv2d(16, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (2): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(16, 72, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(72, 72, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=72, bias=False)
          (1): BatchNorm2d(72, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): ConvNormActivation(
          (0): Conv2d(72, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (3): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(24, 88, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(88, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(88, 88, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=88, bias=False)
          (1): BatchNorm2d(88, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): ReLU(inplace=True)
        )
        (2): ConvNormActivation(
          (0): Conv2d(88, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(24, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (4): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): ConvNormActivation(
          (0): Conv2d(96, 96, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=96, bias=False)
          (1): BatchNorm2d(96, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(24, 96, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): ConvNormActivation(
          (0): Conv2d(96, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (5): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): ConvNormActivation(
          (0): Conv2d(240, 240, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=240, bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(240, 64, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(64, 240, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): ConvNormActivation(
          (0): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (6): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(40, 240, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): ConvNormActivation(
          (0): Conv2d(240, 240, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=240, bias=False)
          (1): BatchNorm2d(240, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(240, 64, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(64, 240, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): ConvNormActivation(
          (0): Conv2d(240, 40, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(40, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (7): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(40, 120, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): ConvNormActivation(
          (0): Conv2d(120, 120, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=120, bias=False)
          (1): BatchNorm2d(120, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(120, 32, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(32, 120, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): ConvNormActivation(
          (0): Conv2d(120, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(48, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (8): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(48, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): ConvNormActivation(
          (0): Conv2d(144, 144, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=144, bias=False)
          (1): BatchNorm2d(144, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(144, 40, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(40, 144, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): ConvNormActivation(
          (0): Conv2d(144, 48, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(48, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (9): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(48, 288, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(288, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): ConvNormActivation(
          (0): Conv2d(288, 288, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2), groups=288, bias=False)
          (1): BatchNorm2d(288, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(288, 72, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(72, 288, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): ConvNormActivation(
          (0): Conv2d(288, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (10): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): ConvNormActivation(
          (0): Conv2d(576, 576, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(576, 144, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(144, 576, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): ConvNormActivation(
          (0): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (11): InvertedResidual(
      (block): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (1): ConvNormActivation(
          (0): Conv2d(576, 576, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
          (2): Hardswish()
        )
        (2): SqueezeExcitation(
          (avgpool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Conv2d(576, 144, kernel_size=(1, 1), stride=(1, 1))
          (fc2): Conv2d(144, 576, kernel_size=(1, 1), stride=(1, 1))
          (activation): ReLU()
          (scale_activation): Hardsigmoid()
        )
        (3): ConvNormActivation(
          (0): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        )
      )
    )
    (12): ConvNormActivation(
      (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(576, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
      (2): Hardswish()
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=1)
  (classifier): Sequential(
    (0): Linear(in_features=576, out_features=1024, bias=True)
    (1): Hardswish()
    (2): Dropout(p=0.2, inplace=True)
    (3): Linear(in_features=1024, out_features=1000, bias=True)
  )
)


Reference:
https://blog.csdn.net/deephub/article/details/124684557#t3


Reposted from blog.csdn.net/chumingqian/article/details/129653646