[Image Classification] [Deep Learning] [Lightweight Network] [Pytorch Version] Detailed Explanation of ShuffleNet_V2 Model Algorithm


Preface

ShuffleNet_V2 is an improved model proposed by Ningning Ma and colleagues from Megvii Technology in the paper "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" [ECCV-2018] [ Paper Address ]. The paper puts forward two major principles for efficient network architecture design: first, use direct metrics (such as speed) rather than indirect metrics (such as FLOPs); second, evaluate such metrics on the target platform. It also proposes four cross-platform design guidelines and designs ShuffleNet_V2 under their guidance.


ShuffleNet_V2 explanation

Some earlier network models, such as MobileNet_v1, v2, ShuffleNet_v1, and Xception, used group convolution or depthwise separable convolution to reduce the number of floating-point operations (FLOPs) to some extent, but FLOPs is not a metric that directly measures model speed; it only reflects speed indirectly through the theoretical amount of computation.
On actual devices, because of various optimized computation operations, speed is also constrained by memory access cost (MAC) and platform characteristics, so the theoretical amount of computation cannot accurately measure model speed. In other words, models with the same FLOPs can run at different speeds. The figure below, from the original paper, shows the inference speed under different conditions:
Speed is not entirely determined by FLOPs: in the figure, the red boxes mark different devices, with the GPU results on the left and the ARM results on the right. Both plots show that models with the same MFLOPs can run at different speeds.

Batches/sec: the number of input batches the model processes per second, i.e., a direct throughput measure.

So what other factors affect the speed of running on a device? The paper attributes the discrepancy between direct and indirect metrics to two causes.

  • First, FLOPs do not account for several factors that significantly affect speed. One is memory access cost (MAC), which takes up a large share of the runtime in group convolution and is a potential bottleneck in GPU computation. Another is the degree of parallelism: at the same FLOPs, a highly parallel network executes faster.
  • Second, operations with the same FLOPs can run at different speeds on different platforms. For example, earlier work used tensor decomposition extensively to accelerate matrix multiplication. Although it can reduce FLOPs by 75%, the decomposed operation actually runs slower on GPU, because cuDNN is specially optimized for 3×3 convolution: in practice a 3×3 convolution is not 9 times as slow as a 1×1 convolution, so the decomposition brings no clear speedup.

Therefore, the paper proposes two major principles for efficient network architecture design: first, use direct metrics (such as speed) rather than indirect metrics (such as FLOPs); second, evaluate the metric on the target platform. It then proposes four cross-platform design guidelines and, under their guidance, designs the new network architecture ShuffleNet V2.
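As a concrete illustration of the first principle, speed can be measured directly with a simple timing loop like the one below (a minimal sketch; the warm-up count, iteration count, and input size are arbitrary choices for illustration, not values from the paper):

import time
import torch
import torchvision.models as models

def measure_latency(model, input_size=(1, 3, 224, 224), warmup=10, iters=50):
    # measure the average forward-pass latency: a direct speed metric
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()
    x = torch.randn(input_size, device=device)
    with torch.no_grad():
        for _ in range(warmup):
            model(x)  # warm up caches and CUDA kernels
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work to finish
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return elapsed / iters

# compare two lightweight models with comparable FLOPs but different designs
print(measure_latency(models.shufflenet_v2_x1_0()))
print(measure_latency(models.mobilenet_v2()))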

Four Practical Guidelines

The following figure is the original paper’s statistics on the running speed of ShuffleNet_V1 and MobileNet_V2 network components:

From the figure you can see the time spent on the various operations of the models on GPU/ARM. FLOPs account only for the convolution part, which does take up most of the total runtime, but the Elemwise and Data parts also consume considerable time, covering data input/output, data shuffling, and element-wise operations (tensor addition, activation functions, etc.).
By studying the runtime of ShuffleNet_V1 and MobileNet_V2 on specific platforms, and combining theory with experiment, the paper proposes four practical guidelines.

G1: Equal channel width minimizes memory access cost

Equal channel width minimizes memory access cost (MAC)
Current networks such as Xception [Reference], MobileNet_V1 [Reference], MobileNet_V2 [Reference], and ShuffleNet_V1 [Reference] all use depthwise separable convolution, in which the 1×1 pointwise convolution accounts for most of the computational complexity. Assume the input feature map is $h \times w \times c_1$ and the 1×1 pointwise convolution kernel is $c_1 \times c_2 \times 1 \times 1$, with the spatial size of the output feature map unchanged; then the FLOPs of the 1×1 pointwise convolution are $B = h \times w \times c_1 \times c_2$.

The FLOPs calculation here counts one multiply-add as a single floating-point operation.

Assuming the cache of the computing device is large enough to hold the entire feature maps and all parameters, the memory access cost (number of memory accesses) of the 1×1 pointwise convolution is $MAC = hwc_1 + hwc_2 + c_1c_2 = hw(c_1 + c_2) + c_1c_2$, which accounts for the input feature map, the output feature map, and the weight parameters.
With $B$ fixed, $c_2 = \frac{B}{hwc_1}$, and by the mean (AM-GM) inequality
$MAC = hw(c_1 + c_2) + c_1c_2 = \sqrt{(hw)^2(c_1 + c_2)^2} + \frac{B}{hw} \ge \sqrt{(hw)^2 \cdot 4c_1c_2} + \frac{B}{hw} = 2\sqrt{hwB} + \frac{B}{hw}$
The mean inequality gives $(c_1 + c_2)^2 \ge 4c_1c_2$ with equality when $c_1 = c_2$, at which point $MAC$ attains its minimum.
Under a given computational budget, $MAC$ therefore has a lower bound, reached when the input and output channel counts are equal.
To verify this conclusion, the paper performed an experimental analysis. The test network in the table below is built by stacking 10 repeated blocks; each block contains two convolutional layers, with $c_1$ input channels and $c_2$ output channels.

The data in the table show that as the ratio $c_1 : c_2$ approaches 1:1, MAC becomes smaller and the measured running speed of the network becomes faster.
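To make the G1 derivation concrete, the short sketch below evaluates $MAC = hw(c_1 + c_2) + c_1c_2$ for several channel ratios under a fixed budget $B = hwc_1c_2$ (the feature-map size and budget are arbitrary illustrative values):

# MAC of a 1×1 convolution at a fixed FLOPs budget B = h*w*c1*c2 (G1 sketch)
h = w = 56
B = 56 * 56 * 128 * 128  # fix the computational budget

for ratio in [1, 2, 4, 8]:  # channel ratio c1 : c2 = 1 : ratio
    c1 = (B / (h * w * ratio)) ** 0.5
    c2 = ratio * c1  # keeps h*w*c1*c2 == B for every ratio
    mac = h * w * (c1 + c2) + c1 * c2
    print(f"c1:c2 = 1:{ratio}  MAC = {mac:,.0f}")
# MAC is smallest at 1:1 and grows as the ratio becomes more unbalanced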

G2: Excessive group convolution increases memory access cost

Excessive group convolution increases MAC
Group convolution is at the core of modern network architecture design. It reduces computational complexity (FLOPs) through sparse connections between channels, i.e., each output is connected only to features within the same group. On the one hand, this allows more channels at the same cost, increasing network capacity and accuracy; on the other hand, the larger number of channels also brings a larger MAC.
For a 1×1 group convolution, the FLOPs are:
$B = h \times w \times 1 \times 1 \times \frac{c_1}{g} \times \frac{c_2}{g} \times g = \frac{hwc_1c_2}{g}$
and the MAC of the group convolution is:
$MAC = hw(c_1 + c_2) + \frac{c_1c_2}{g} = hwc_1 + \frac{Bg}{c_1} + \frac{B}{hw}$
With the input feature map $h \times w \times c_1$ and the budget $B$ fixed, the ratio $\frac{c_2}{g}$ is also fixed, so $MAC$ grows linearly with the number of groups $g$.
The paper designed an experiment that stacks 10 pointwise group convolution layers and, while keeping the computational cost (FLOPs) the same, measured the model's running time with different numbers of groups. The results are shown in the table below.

With the total computation fixed and only the number of groups changed, the more groups are used, the slower the actual running speed. The paper therefore recommends choosing the number of groups carefully for the target hardware platform and task, rather than simply choosing a large number of groups for the accuracy gain while ignoring the resulting increase in memory access cost (MAC).
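The linear growth of MAC with $g$ can be checked numerically; the sketch below plugs illustrative values into $MAC = hwc_1 + \frac{Bg}{c_1} + \frac{B}{hw}$:

# MAC of a 1×1 group convolution at fixed FLOPs B = h*w*c1*c2/g (G2 sketch)
h = w = 56
c1 = 128
B = h * w * c1 * 128  # fixed budget (at g = 1 this corresponds to c2 = 128)

for g in [1, 2, 4, 8]:
    mac = h * w * c1 + B * g / c1 + B / (h * w)
    print(f"g = {g}  MAC = {mac:,.0f}")
# with the input and B fixed, MAC grows linearly with the group number g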

G3: Network fragmentation reduces parallelism

Network fragmentation reduces degree of parallelism.
In the GoogLeNet series (Inception V1 [ Reference ], V2 [ Reference ], V3 [ Reference ], V4 [ Reference ], etc.), each building block of the network uses a multi-branch (multi-path) structure, which mostly uses small operators (fragmented operators) instead of large ones; each convolution or pooling operation in a block is called a fragmented operator. Previous work has shown that such fragmented structures can improve model accuracy, but they reduce efficiency because they are unfriendly to devices with strong parallelism such as GPUs.
To quantify network fragmentation, or how network branching affects efficiency, the paper evaluates a series of network building blocks with varying degrees of fragmentation.

Specifically, each building block in the comparative experiments consists of 1 to 4 1×1 convolutional layers arranged sequentially or in parallel.
To verify the impact of network branching on performance, the paper ran comparative experiments on networks with different degrees of branching, stacking each block 10 times. The results in the table below show that fragmentation slows down the GPU.

Excessive network branching significantly reduces running speed on GPU devices, while the slowdown on the ARM platform is comparatively mild.
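The effect of fragmentation can be sketched by timing one wide 1×1 convolution against four parallel narrower ones with the same total FLOPs (the channel counts, tensor size, and timing loop below are assumptions for illustration, not the paper's exact building blocks):

import time
import torch
import torch.nn as nn

class FourParallel(nn.Module):
    # 4-fragment block: four parallel 1×1 convolutions, outputs concatenated
    def __init__(self, c):
        super().__init__()
        self.paths = nn.ModuleList(nn.Conv2d(c, c // 4, 1) for _ in range(4))

    def forward(self, x):
        return torch.cat([p(x) for p in self.paths], dim=1)

def avg_time(m, x, iters=50):
    with torch.no_grad():
        for _ in range(10):
            m(x)  # warm-up
        if x.is_cuda:
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            m(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(8, 256, 28, 28, device=device)
one_big = nn.Conv2d(256, 256, 1).to(device)  # 1 fragment
fragmented = FourParallel(256).to(device)    # 4 fragments, same total FLOPs
print("1 fragment :", avg_time(one_big, x))
print("4 fragments:", avg_time(fragmented, x))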

G4: Element-level operations cannot be ignored

Element-wise operations are non-negligible
Element-wise operators also take up a considerable share of the time, especially on GPU devices: although their FLOPs are relatively small, their MAC is large. In particular, the paper also treats depthwise convolution as an element-wise operator, since it usually has a high MAC/FLOPs ratio.

The element-wise operators (element-level operations) in the paper include ReLU, AddTensor, AddBias, etc.

The paper experiments with ResNet's "bottleneck" unit: removing the ReLU and shortcut operations increases the running speed by about 20% on both GPU and ARM devices. The results are shown in the table below.
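A rough feel for G4 can be obtained by timing element-wise operations against a convolution on the same tensor; the tensor size below is arbitrary, and this is only a simplified stand-in for the paper's bottleneck experiment:

import time
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(8, 256, 56, 56, device=device)
y = torch.randn_like(x)
conv = nn.Conv2d(256, 256, 3, padding=1).to(device)

def avg_time(fn, iters=100):
    for _ in range(10):
        fn()  # warm-up
    if device.type == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

with torch.no_grad():
    print("3×3 conv  :", avg_time(lambda: conv(x)))
    print("tensor add:", avg_time(lambda: x + y))        # tiny FLOPs, large MAC
    print("ReLU      :", avg_time(lambda: torch.relu(x)))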

Model structure of ShuffleNet_V2

ShuffleNet_V1 uses two techniques: pointwise group convolution and a bottleneck-like structure. From the previous chapter we know that both pointwise group convolution and the bottleneck structure increase MAC (the G2 and G1 principles), a cost that cannot be ignored, especially for lightweight networks. In addition, using too many groups and the element-wise addition in residual connections are also undesirable (the G3 and G4 principles).
Therefore, the key to building an efficient network model is to keep a large number of equally wide channels, without too many dense convolutions or groups. To this end, the ShuffleNet_V2 basic unit introduces a channel split operation on top of the ShuffleNet_V1 basic unit.
The following figure, from the original paper, compares the ShuffleNet_V1 and ShuffleNet_V2 units. When stride = 1, the ShuffleNet_V2 basic unit splits the input feature map of $c$ channels into two branches via channel split: one branch of $c'$ channels takes the shortcut path, and the other branch of $c - c'$ channels takes the main path; for simplicity, $c' = c/2$ (satisfying the G3 principle). The channel split also achieves a grouping effect implicitly. Half of the feature maps bypass the current basic unit and pass directly to the next one, similar to DenseNet [Reference]. The main branch contains three convolutions with the same number of channels (satisfying the G1 principle), and the two 1×1 convolutions are no longer the group convolutions of ShuffleNet_V1 but ordinary convolutions (satisfying the G2 principle), which is why the channel shuffle of the main branch is moved to after the concatenation. Finally, the outputs of the two branches are concatenated instead of added as in ShuffleNet_V1, keeping the number of input and output channels of the basic unit the same (satisfying the G1 principle).
When stride = 2, the channel split is removed, and the 3×3 average pooling on the ShuffleNet_V1 shortcut branch is replaced with a combination of a 3×3 depthwise convolution and a 1×1 ordinary convolution.
The element-wise operator ReLU exists only in the right branch, and the three consecutive element-wise operators (concatenation, channel shuffle, and channel split) can be merged into a single element-wise operation (satisfying the G4 principle).

The following figure is a detailed schematic diagram of the ShuffleNet_V2 model structure given in the original paper:

For image classification, ShuffleNet_V2 is divided into two parts: a backbone, composed mainly of ShuffleNet_V2 basic units plus convolutional and pooling layers, and a classifier, composed of a global pooling layer and a fully connected layer.

The channel numbers of the ShuffleNet_V2 basic units are scaled by factors such as 0.5×, 1×, 1.5×, and 2× to generate ShuffleNet_V2 networks of different complexity.


ShuffleNet_V2 Pytorch code

Channel shuffle: exchanges information across channel groups for greater feature interaction and expressiveness.

import torch

def channel_shuffle(x, groups):
    # get the sizes of every dimension of the feature map
    batch_size, num_channels, height, width = x.shape
    # split the feature channels into groups
    channels_per_group = num_channels // groups
    # reshape: add a group dimension to the feature map
    x = x.view(batch_size, groups, channels_per_group, height, width)
    # channel shuffle (swap the group and channel dimensions of the tensor)
    x = torch.transpose(x, 1, 2).contiguous()
    # reshape: collapse back to the original number of dimensions
    x = x.view(batch_size, -1, height, width)
    return x

The code diagram of channel shuffling is shown below:
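As a quick sanity check (the tiny tensor below is chosen only to trace the reordering), shuffling four labelled channels with groups=2 turns the order [0, 1, 2, 3] into [0, 2, 1, 3]:

x = torch.arange(4).float().view(1, 4, 1, 1)   # channels labelled 0..3
print(channel_shuffle(x, groups=2).flatten())  # tensor([0., 2., 1., 3.])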

ShuffleUnit (basic unit): 1×1 convolution and 3×3 depthwise convolution + BN layer + activation function

import torch
import torch.nn as nn

class ShuffleUnit(nn.Module):
    def __init__(self, input_c: int, output_c: int, stride: int):
        super(ShuffleUnit, self).__init__()
        # the stride must be 1 or 2
        if stride not in [1, 2]:
            raise ValueError("illegal stride value.")
        self.stride = stride

        # the output channels must be divisible by 2
        assert output_c % 2 == 0
        branch_features = output_c // 2

        # when stride is 1, input_c must be twice branch_features
        # '<<' is a bit shift, a fast way of multiplying by 2
        assert (self.stride != 1) or (input_c == branch_features << 1)

        # shortcut branch
        if self.stride == 2:
            # downsampling: 3×3 depthwise convolution + 1×1 convolution
            self.branch1 = nn.Sequential(
                self.depthwise_conv(input_c, input_c, kernel_s=3, stride=self.stride, padding=1),
                nn.BatchNorm2d(input_c),
                nn.Conv2d(input_c, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(branch_features),
                nn.ReLU(inplace=True)
            )
        else:
            # no downsampling: identity mapping
            self.branch1 = nn.Sequential()

        # main branch
        self.branch2 = nn.Sequential(
            # 1×1 convolution + 3×3 depthwise convolution + 1×1 convolution
            nn.Conv2d(input_c if self.stride > 1 else branch_features, branch_features, kernel_size=1,
                      stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True),
            self.depthwise_conv(branch_features, branch_features, kernel_s=3, stride=self.stride, padding=1),
            nn.BatchNorm2d(branch_features),
            nn.Conv2d(branch_features, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True)
        )

    # depthwise convolution
    @staticmethod
    def depthwise_conv(input_c, output_c, kernel_s, stride, padding, bias=False):
        return nn.Conv2d(in_channels=input_c, out_channels=output_c, kernel_size=kernel_s,
                         stride=stride, padding=padding, bias=bias, groups=input_c)

    def forward(self, x):
        if self.stride == 1:
            # channel split
            x1, x2 = x.chunk(2, dim=1)
            # concatenate the shortcut branch and the main branch
            out = torch.cat((x1, self.branch2(x2)), dim=1)
        else:
            # the channel split is removed
            # concatenate the shortcut branch and the main branch
            out = torch.cat((self.branch1(x), self.branch2(x)), dim=1)
        # channel shuffle
        out = channel_shuffle(out, 2)
        return out
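A quick shape check (the input sizes below are chosen only for illustration) confirms that the stride-1 unit preserves resolution while the stride-2 unit halves it and doubles the channels:

x = torch.randn(1, 116, 28, 28)
print(ShuffleUnit(116, 116, stride=1)(x).shape)  # torch.Size([1, 116, 28, 28])
print(ShuffleUnit(116, 232, stride=2)(x).shape)  # torch.Size([1, 232, 14, 14])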

Complete code

from typing import List, Callable

import torch
from torch import Tensor
import torch.nn as nn
from torchsummary import summary

def channel_shuffle(x, groups):
    # get the sizes of every dimension of the feature map
    batch_size, num_channels, height, width = x.shape
    # split the feature channels into groups
    channels_per_group = num_channels // groups
    # reshape: add a group dimension to the feature map
    x = x.view(batch_size, groups, channels_per_group, height, width)
    # channel shuffle (swap the group and channel dimensions of the tensor)
    x = torch.transpose(x, 1, 2).contiguous()
    # reshape: collapse back to the original number of dimensions
    x = x.view(batch_size, -1, height, width)
    return x

class ShuffleUnit(nn.Module):
    def __init__(self, input_c: int, output_c: int, stride: int):
        super(ShuffleUnit, self).__init__()
        # the stride must be 1 or 2
        if stride not in [1, 2]:
            raise ValueError("illegal stride value.")
        self.stride = stride

        # the output channels must be divisible by 2
        assert output_c % 2 == 0
        branch_features = output_c // 2

        # when stride is 1, input_c must be twice branch_features
        # '<<' is a bit shift, a fast way of multiplying by 2
        assert (self.stride != 1) or (input_c == branch_features << 1)

        # shortcut branch
        if self.stride == 2:
            # downsampling: 3×3 depthwise convolution + 1×1 convolution
            self.branch1 = nn.Sequential(
                self.depthwise_conv(input_c, input_c, kernel_s=3, stride=self.stride, padding=1),
                nn.BatchNorm2d(input_c),
                nn.Conv2d(input_c, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(branch_features),
                nn.ReLU(inplace=True)
            )
        else:
            # no downsampling: identity mapping
            self.branch1 = nn.Sequential()

        # main branch
        self.branch2 = nn.Sequential(
            # 1×1 convolution + 3×3 depthwise convolution + 1×1 convolution
            nn.Conv2d(input_c if self.stride > 1 else branch_features, branch_features, kernel_size=1,
                      stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True),
            self.depthwise_conv(branch_features, branch_features, kernel_s=3, stride=self.stride, padding=1),
            nn.BatchNorm2d(branch_features),
            nn.Conv2d(branch_features, branch_features, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(branch_features),
            nn.ReLU(inplace=True)
        )

    # depthwise convolution
    @staticmethod
    def depthwise_conv(input_c, output_c, kernel_s, stride, padding, bias=False):
        return nn.Conv2d(in_channels=input_c, out_channels=output_c, kernel_size=kernel_s,
                         stride=stride, padding=padding, bias=bias, groups=input_c)

    def forward(self, x):
        if self.stride == 1:
            # channel split
            x1, x2 = x.chunk(2, dim=1)
            # concatenate the shortcut branch and the main branch
            out = torch.cat((x1, self.branch2(x2)), dim=1)
        else:
            # the channel split is removed
            # concatenate the shortcut branch and the main branch
            out = torch.cat((self.branch1(x), self.branch2(x)), dim=1)
        # channel shuffle
        out = channel_shuffle(out, 2)
        return out


class ShuffleNetV2(nn.Module):
    def __init__(self, stages_repeats, stages_out_channels, num_classes=1000, ShuffleUnit=ShuffleUnit):
        super(ShuffleNetV2, self).__init__()

        if len(stages_repeats) != 3:
            raise ValueError("expected stages_repeats as list of 3 positive ints")
        if len(stages_out_channels) != 5:
            raise ValueError("expected stages_out_channels as list of 5 positive ints")
        self._stage_out_channels = stages_out_channels

        # input channels
        input_channels = 3
        output_channels = self._stage_out_channels[0]

        self.conv1 = nn.Sequential(
            nn.Conv2d(input_channels, output_channels, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(inplace=True)
        )
        input_channels = output_channels
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # three stages built from stacked basic units
        self.stage2: nn.Sequential
        self.stage3: nn.Sequential
        self.stage4: nn.Sequential

        stage_names = ["stage{}".format(i) for i in [2, 3, 4]]
        for name, repeats, output_channels in zip(stage_names, stages_repeats,
                                                  self._stage_out_channels[1:]):
            # the first basic unit of each stage performs downsampling; the others do not
            seq = [ShuffleUnit(input_channels, output_channels, 2)]
            for i in range(repeats - 1):
                seq.append(ShuffleUnit(output_channels, output_channels, 1))
            setattr(self, name, nn.Sequential(*seq))
            input_channels = output_channels
        output_channels = self._stage_out_channels[-1]
        self.conv5 = nn.Sequential(
            nn.Conv2d(input_channels, output_channels, kernel_size=1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(inplace=True)
        )
        # global average pooling
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
        # fully connected layer
        self.fc = nn.Linear(output_channels, num_classes)
        # weight initialization
        self.init_params()

    def init_params(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.conv5(x)
        x = self.global_pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

def shufflenet_v2_x0_5(num_classes=1000):
    """
    weight: https://download.pytorch.org/models/shufflenetv2_x0.5-f707e7126e.pth
    """
    model = ShuffleNetV2(stages_repeats=[4, 8, 4],
                         stages_out_channels=[24, 48, 96, 192, 1024],
                         num_classes=num_classes)
    return model
def shufflenet_v2_x1_0(num_classes=1000):
    """
    weight: https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth
    """
    model = ShuffleNetV2(stages_repeats=[4, 8, 4],
                         stages_out_channels=[24, 116, 232, 464, 1024],
                         num_classes=num_classes)
    return model

def shufflenet_v2_x1_5(num_classes=1000):
    """
    weight: https://download.pytorch.org/models/shufflenetv2_x1_5-3c479a10.pth
    """
    model = ShuffleNetV2(stages_repeats=[4, 8, 4],
                         stages_out_channels=[24, 176, 352, 704, 1024],
                         num_classes=num_classes)
    return model

def shufflenet_v2_x2_0(num_classes=1000):
    """
    weight: https://download.pytorch.org/models/shufflenetv2_x2_0-8be3c8ee.pth
    """
    model = ShuffleNetV2(stages_repeats=[4, 8, 4],
                         stages_out_channels=[24, 244, 488, 976, 2048],
                         num_classes=num_classes)
    return model

if __name__ == '__main__':
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = shufflenet_v2_x2_0().to(device)
    summary(model, input_size=(3, 224, 224))

summary prints the network structure and parameter counts, making it easy to inspect the constructed network.
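If torchsummary is not installed, a plain parameter count gives a quick sanity check (this snippet is an added convenience, not part of the original code):

model = shufflenet_v2_x2_0()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f} M parameters")  # roughly 7.4 M for the 2.0x variant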


Summary

This post has introduced the reasoning behind the four practical guidelines as simply and thoroughly as possible, and explained the structure and PyTorch code of the ShuffleNet_V2 model.
