[Image Classification] [Deep Learning] [Lightweight Network] [Pytorch Version] Detailed Explanation of ShuffleNet_V1 Model Algorithm


Preface

ShuffleNet_V1 is a model proposed by Xiangyu Zhang and colleagues at Megvii Technology in the paper "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices" [CVPR-2018] [ Paper Address ]. It is a lightweight CNN that uses pointwise group convolution and channel shuffle to greatly reduce computational cost while maintaining accuracy.


ShuffleNet_V1 explanation

A standard convolution is a full-channel convolution: every output channel is computed from all input feature maps, which is a channel dense connection. Group convolution, by contrast, is a channel sparse connection. It splits the input feature maps into groups and convolves each group with its own kernels, which reduces the amount of computation. The core design idea of ShuffleNet_V1 is to rearrange (shuffle) the channels of different groups in a fixed, deterministic pattern, which addresses the drawback introduced by group convolution.

Group convolution

Group convolution has been shown to be effective in ResNeXt [ Reference ]. It divides the input feature maps into multiple groups and convolves each group independently. This reduces the amount of computation, and for a given budget it can improve the representation ability of the model.
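
As a rough illustration of the saving, the sketch below counts the weights of a dense 1×1 convolution and of a grouped one with the same input and output channels. The channel count (240), the group number (3), and the helper name count_params are choices made for this example only.

```python
import torch
import torch.nn as nn

def count_params(module):
    # Total number of learnable values in the module
    return sum(p.numel() for p in module.parameters())

in_chans, out_chans, n_groups = 240, 240, 3  # illustrative values

dense = nn.Conv2d(in_chans, out_chans, kernel_size=1, bias=False)
grouped = nn.Conv2d(in_chans, out_chans, kernel_size=1, groups=n_groups, bias=False)

dense_params = count_params(dense)      # in_chans * out_chans
grouped_params = count_params(grouped)  # in_chans * out_chans / n_groups

# The grouped layer still maps 240 channels to 240 channels
x = torch.randn(1, in_chans, 28, 28)
out_shape = tuple(grouped(x).shape)

print(dense_params, grouped_params, out_shape)
```

With g groups, each output channel only sees 1/g of the input channels, so the weight count (and the multiply-add count) of the layer drops by a factor of g.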

More recently, MobileNet [ Reference ] used depthwise separable convolution to build a lightweight model and achieved strong results among lightweight models. ShuffleNet_V1 generalizes group convolution and depthwise separable convolution in a new form.
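
A minimal sketch of the idea behind depthwise separable convolution, assuming a 64-to-128-channel layer chosen only for illustration: a standard 3×3 convolution is replaced by a 3×3 depthwise convolution (one filter per channel, written in PyTorch with groups equal to the channel count) followed by a 1×1 pointwise convolution.

```python
import torch
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

in_chans, out_chans = 64, 128  # illustrative sizes

# Standard 3x3 convolution: every output channel looks at every input channel
standard = nn.Conv2d(in_chans, out_chans, kernel_size=3, padding=1, bias=False)

# Depthwise separable version: per-channel 3x3 filter, then 1x1 channel mixing
separable = nn.Sequential(
    nn.Conv2d(in_chans, in_chans, kernel_size=3, padding=1, groups=in_chans, bias=False),
    nn.Conv2d(in_chans, out_chans, kernel_size=1, bias=False),
)

standard_params = count_params(standard)    # 64 * 128 * 9
separable_params = count_params(separable)  # 64 * 9 + 64 * 128

x = torch.randn(1, in_chans, 32, 32)
same_shape = standard(x).shape == separable(x).shape

print(standard_params, separable_params, same_shape)
```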

Channel Shuffle

Group convolution has a drawback: each output channel of a layer depends only on the input channels of its own group, that is, on a small fraction of the input channels, which hinders the exchange of information between channel groups.
The following figure from the ShuffleNet_V1 paper compares ordinary group convolution with group convolution plus channel shuffle:

In Figure (a), different colors represent different groups, and the input of each group is never mixed with the features of other groups. Each group minds its own business, so information is blocked between groups. If each group convolution is instead allowed to take features from different groups, as in Figure (b), where the output features of every group of GConv1 are split evenly by the number of groups and fed to each group of GConv2, then the output and input channels become fully related. This rearrangement can be implemented efficiently and elegantly by the channel shuffle operation in Figure (c).
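
The blocked information flow can be checked numerically. The sketch below stacks two 1×1 group convolutions and uses gradients to find which input channels influence a given output channel, with and without a shuffle in between. The function name inputs_reaching_output, the 4-channel / 2-group setup, and the all-ones weights are assumptions made for this illustration.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, n_groups):
    # Same view / transpose / view pattern as the article's channel_shuffle
    b, c, h, w = x.shape
    x = x.view(b, n_groups, c // n_groups, h, w)
    x = torch.transpose(x, 1, 2).contiguous()
    return x.view(b, -1, h, w)

def inputs_reaching_output(out_index, use_shuffle, chans=4, n_groups=2):
    conv_a = nn.Conv2d(chans, chans, kernel_size=1, groups=n_groups, bias=False)
    conv_b = nn.Conv2d(chans, chans, kernel_size=1, groups=n_groups, bias=False)
    # All-ones weights so that no connection is hidden by a zero weight
    nn.init.ones_(conv_a.weight)
    nn.init.ones_(conv_b.weight)

    x = torch.ones(1, chans, 1, 1, requires_grad=True)
    y = conv_a(x)
    if use_shuffle:
        y = channel_shuffle(y, n_groups)
    y = conv_b(y)

    # A non-zero gradient means the input channel affects this output channel
    y[0, out_index, 0, 0].backward()
    return [i for i in range(chans) if x.grad[0, i, 0, 0].item() != 0]

print(inputs_reaching_output(0, use_shuffle=False))  # [0, 1]
print(inputs_reaching_output(0, use_shuffle=True))   # [0, 1, 2, 3]
```

Without the shuffle, output channel 0 is reached only by the two input channels of its own group. With the shuffle, it is reached by all four.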

ShuffleNet Unit (ShuffleNet basic unit)

The unit builds on the ResNet [ Reference ] residual module and adds channel shuffle and depthwise separable convolution.
The following figure from the ShuffleNet_V1 paper shows the ShuffleNet unit in detail:

Figure (a) is a typical residual structure with depthwise separable convolution [ reference ], and ShuffleNet_V1 designs the ShuffleNet unit on this basis. Figure (b) shows the ShuffleNet unit with stride=1: it replaces the dense 1x1 convolutions with 1x1 group convolutions to reduce their cost, and adds Channel Shuffle to exchange information across channel groups. Figure (c) shows the ShuffleNet unit with stride=2. Because the feature map must be downsampled, a 3x3 average pooling with stride=2 is added to the residual (shortcut) branch of the structure in Figure (b), and the main-branch output features and the shortcut features are concatenated instead of added, which enlarges the channel dimension at little extra computational cost.
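
The channel arithmetic of the stride=2 unit can be sketched with plain tensors. The example assumes a 24-channel 56×56 input and a 240-channel target, and uses a random tensor as a stand-in for the main-branch output; only the shapes matter here.

```python
import torch
import torch.nn.functional as F

in_chans, out_chans = 24, 240  # illustrative values
inp = torch.randn(1, in_chans, 56, 56)

# Shortcut branch: 3x3 average pooling with stride 2 halves the spatial size
shortcut = F.avg_pool2d(inp, kernel_size=3, stride=2, padding=1)

# The main branch only has to produce the channels the shortcut does not supply
main_chans = out_chans - in_chans
main = torch.randn(1, main_chans, 28, 28)  # stand-in for the main-branch output

# Concatenation along the channel axis yields the target width
out = torch.cat((shortcut, main), dim=1)

print(tuple(shortcut.shape), main_chans, tuple(out.shape))
```

Because the shortcut contributes in_chans channels through pooling alone, the convolutions on the main branch only need to produce out_chans - in_chans channels.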

ShuffleNet_V1 model structure

The following figure is a detailed schematic diagram of the ShuffleNet_V1 model structure given in the original paper:

For image classification, ShuffleNet_V1 consists of two parts. The backbone is mainly composed of ShuffleNet basic units, a convolution layer, and a pooling layer. The classifier is composed of a global pooling layer and a fully connected layer.

In the ShuffleNet_V1 basic unit, the group number g controls the connection sparsity of the 1×1 convolutions. Under the same parameter budget, a larger g allows the network to use more channels: the number of output channels can grow while the number of network parameters stays roughly unchanged.
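
The trade-off can be made concrete with the per-unit cost estimate given in the ShuffleNet paper: for an input of size c×h×w and bottleneck width m, a ShuffleNet unit costs about hw(2cm/g + 9m) multiply-adds, compared with hw(2cm + 9m²) for a ResNet bottleneck. The sketch below evaluates the first formula for the Stage 2 widths listed in the paper for g = 1, 2, 3, 4, 8, assuming m = c/4 and a 28×28 feature map. The function name and the dictionary are illustrative.

```python
def shufflenet_unit_cost(c, m, h, w, g):
    # Two 1x1 group convolutions (2*c*m/g) plus one 3x3 depthwise convolution (9*m)
    return h * w * (2 * c * m / g + 9 * m)

# Stage 2 output channels for each group number, as listed in the paper
stage2_chans = {1: 144, 2: 200, 3: 240, 4: 272, 8: 384}

costs = {}
for g, c in stage2_chans.items():
    m = c // 4  # bottleneck width, assumed to be a quarter of the unit width
    costs[g] = shufflenet_unit_cost(c, m, h=28, w=28, g=g)
    print(f"g={g}: channels={c}, cost={costs[g]:.0f}")

spread = max(costs.values()) / min(costs.values())
```

Although the width grows from 144 to 384 channels, the estimated cost per unit stays within about 10% across all five settings, which is the sense in which a larger g "buys" more channels.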


ShuffleNet_V1 PyTorch code

The components of the ShuffleNet Unit: a 1×1 group convolution first reduces the channel dimension, a channel shuffle follows, a 3×3 depthwise convolution then extracts features, and a final 1×1 group convolution restores the channel dimension.

# 1×1 convolution (reduces or restores the channel dimension)
def conv1x1(in_chans, out_chans, n_groups=1):
    return nn.Conv2d(in_chans, out_chans, kernel_size=1, stride=1, groups=n_groups)

# 3×3 convolution (depthwise when n_groups equals the number of channels)
def conv3x3(in_chans, out_chans, stride, n_groups=1):
    # Attention: no matter what the stride is, the padding will always be 1.
    return nn.Conv2d(in_chans, out_chans, kernel_size=3, padding=1, stride=stride, groups=n_groups)

Channel shuffle: lets features from different groups interact, which improves expressiveness.

def channel_shuffle(x, n_groups):
    # Read all dimensions of the feature map
    batch_size, chans, height, width = x.shape
    # Number of channels in each group
    chans_group = chans // n_groups
    # Reshape: split the channel axis into (groups, channels per group)
    x = x.view(batch_size, n_groups, chans_group, height, width)
    # Channel shuffle: swap the two new axes
    x = torch.transpose(x, 1, 2).contiguous()
    # Reshape: merge the two axes back into a single channel axis
    x = x.view(batch_size, -1, height, width)
    return x
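
To see what the view / transpose / view sequence does, the same three steps can be applied to a vector of channel indices. The sketch assumes 6 channels and 3 groups, chosen only for illustration.

```python
import torch

n_groups, chans = 3, 6  # illustrative values

# Channel ids 0..5; before the shuffle the groups are [0, 1], [2, 3], [4, 5]
idx = torch.arange(chans)

# Step 1: split the channel axis into (groups, channels per group)
grouped = idx.view(n_groups, chans // n_groups)

# Step 2: swap the two axes; step 3: flatten back into one channel axis
shuffled = grouped.transpose(0, 1).contiguous().view(-1)

order = shuffled.tolist()
print(order)  # [0, 2, 4, 1, 3, 5]
```

After the shuffle, neighbouring channels come from different original groups, so the next group convolution receives a mix of groups in each of its own groups.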

The code diagram of channel shuffling is shown below:

ShuffleNet Unit (basic unit): group convolution layers and a depthwise convolution layer + BN layers + activation function

class ShuffleUnit(nn.Module):
    def __init__(self, in_chans, out_chans, stride, n_groups=1):
        super(ShuffleUnit, self).__init__()
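        # Number of channels after the 1×1 group convolution reduces the dimension
        self.bottle_chans = out_chans // 4
        # Number of groups for the group convolutions
        self.n_groups = n_groups
        # Whether to downsample
        if stride == 1:
            # No downsampling: shortcut and main-branch features have the same shape, so they are added
            self.end_op = 'Add'
            self.out_chans = out_chans
        elif stride == 2:
            # Downsampling: the shapes differ, so the shortcut is downsampled too and the two are concatenated
            self.end_op = 'Concat'
            self.out_chans = out_chans - in_chans
        # 1×1 convolution to reduce the dimension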
        self.unit_1 = nn.Sequential(conv1x1(in_chans, self.bottle_chans, n_groups=n_groups),
                                  nn.BatchNorm2d(self.bottle_chans),
                                  nn.ReLU())
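        # 3×3 depthwise convolution for feature extraction (groups = channels, one filter per channel)
        self.unit_2 = nn.Sequential(conv3x3(self.bottle_chans, self.bottle_chans, stride, n_groups=self.bottle_chans),
                                    nn.BatchNorm2d(self.bottle_chans))
        # 1×1 convolution to restore the dimension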
        self.unit_3 = nn.Sequential(conv1x1(self.bottle_chans, self.out_chans, n_groups=n_groups),
                                    nn.BatchNorm2d(self.out_chans))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, inp):
        # Shortcut branch: identity, or average pooling when downsampling
        if self.end_op == 'Add':
            residual = inp
        else:
            residual = F.avg_pool2d(inp, kernel_size=3, stride=2, padding=1)
        x = self.unit_1(inp)
        x = channel_shuffle(x, self.n_groups)
        x = self.unit_2(x)
        x = self.unit_3(x)
        # Merge the shortcut with the main branch (add or concatenate)
        if self.end_op == 'Add':
            return self.relu(residual + x)
        else:
            return self.relu(torch.cat((residual, x), 1))

Complete code

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init
from collections import OrderedDict
from torchsummary import summary

# 1×1 convolution (reduces or restores the channel dimension)
def conv1x1(in_chans, out_chans, n_groups=1):
    return nn.Conv2d(in_chans, out_chans, kernel_size=1, stride=1, groups=n_groups)

# 3×3 convolution (depthwise when n_groups equals the number of channels)
def conv3x3(in_chans, out_chans, stride, n_groups=1):
    # Attention: no matter what the stride is, the padding will always be 1.
    return nn.Conv2d(in_chans, out_chans, kernel_size=3, padding=1, stride=stride, groups=n_groups)

def channel_shuffle(x, n_groups):
    # Read all dimensions of the feature map
    batch_size, chans, height, width = x.shape
    # Number of channels in each group
    chans_group = chans // n_groups
    # Reshape: split the channel axis into (groups, channels per group)
    x = x.view(batch_size, n_groups, chans_group, height, width)
    # Channel shuffle: swap the two new axes
    x = torch.transpose(x, 1, 2).contiguous()
    # Reshape: merge the two axes back into a single channel axis
    x = x.view(batch_size, -1, height, width)
    return x

class ShuffleUnit(nn.Module):
    def __init__(self, in_chans, out_chans, stride, n_groups=1):
        super(ShuffleUnit, self).__init__()
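        # Number of channels after the 1×1 group convolution reduces the dimension
        self.bottle_chans = out_chans // 4
        # Number of groups for the group convolutions
        self.n_groups = n_groups
        # Whether to downsample
        if stride == 1:
            # No downsampling: shortcut and main-branch features have the same shape, so they are added
            self.end_op = 'Add'
            self.out_chans = out_chans
        elif stride == 2:
            # Downsampling: the shapes differ, so the shortcut is downsampled too and the two are concatenated
            self.end_op = 'Concat'
            self.out_chans = out_chans - in_chans
        # 1×1 convolution to reduce the dimension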
        self.unit_1 = nn.Sequential(conv1x1(in_chans, self.bottle_chans, n_groups=n_groups),
                                  nn.BatchNorm2d(self.bottle_chans),
                                  nn.ReLU())
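        # 3×3 depthwise convolution for feature extraction (groups = channels, one filter per channel)
        self.unit_2 = nn.Sequential(conv3x3(self.bottle_chans, self.bottle_chans, stride, n_groups=self.bottle_chans),
                                    nn.BatchNorm2d(self.bottle_chans))
        # 1×1 convolution to restore the dimension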
        self.unit_3 = nn.Sequential(conv1x1(self.bottle_chans, self.out_chans, n_groups=n_groups),
                                    nn.BatchNorm2d(self.out_chans))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, inp):
        # Shortcut branch: identity, or average pooling when downsampling
        if self.end_op == 'Add':
            residual = inp
        else:
            residual = F.avg_pool2d(inp, kernel_size=3, stride=2, padding=1)
        x = self.unit_1(inp)
        x = channel_shuffle(x, self.n_groups)
        x = self.unit_2(x)
        x = self.unit_3(x)
        # Merge the shortcut with the main branch (add or concatenate)
        if self.end_op == 'Add':
            return self.relu(residual + x)
        else:
            return self.relu(torch.cat((residual, x), 1))

class ShuffleNetV1(nn.Module):
    def __init__(self, n_groups, n_classes, stage_out_chans):
        super(ShuffleNetV1, self).__init__()
        # Number of input channels
        self.in_chans = 3
        # Number of groups
        self.n_groups = n_groups
        # Number of classes
        self.n_classes = n_classes

        self.conv1 = conv3x3(self.in_chans, 24, 2)
        self.maxpool = nn.MaxPool2d(3, 2, 1)

        # Stage 2
        op = OrderedDict()
        unit_prefix = 'stage_2_unit_'
        # The first unit of each stage downsamples; the remaining units do not
        op[unit_prefix+'0'] = ShuffleUnit(24, stage_out_chans[0], 2, self.n_groups)
        for i in range(3):
            op[unit_prefix+str(i+1)] = ShuffleUnit(stage_out_chans[0], stage_out_chans[0], 1, self.n_groups)
        self.stage2 = nn.Sequential(op)

        # Stage 3
        op = OrderedDict()
        unit_prefix = 'stage_3_unit_'
        op[unit_prefix+'0'] = ShuffleUnit(stage_out_chans[0], stage_out_chans[1], 2, self.n_groups)
        for i in range(7):
            op[unit_prefix+str(i+1)] = ShuffleUnit(stage_out_chans[1], stage_out_chans[1], 1, self.n_groups)
        self.stage3 = nn.Sequential(op)

        # Stage 4
        op = OrderedDict()
        unit_prefix = 'stage_4_unit_'
        op[unit_prefix+'0'] = ShuffleUnit(stage_out_chans[1], stage_out_chans[2], 2, self.n_groups)
        for i in range(3):
            op[unit_prefix+str(i+1)] = ShuffleUnit(stage_out_chans[2], stage_out_chans[2], 1, self.n_groups)
        self.stage4 = nn.Sequential(op)

        # Global average pooling
        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))
        # Fully connected layer
        self.fc = nn.Linear(stage_out_chans[-1], self.n_classes)
        # Initialize the weights
        self.init_params()

    # Weight initialization
    def init_params(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)
        x = self.global_pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

# Stage 2-4 output channels for each group number (g = 1, 2, 3, 4, 8)
stage_out_chans_list = [[144, 288, 576], [200, 400, 800], [240, 480, 960],
                        [272, 544, 1088], [384, 768, 1536]]
def shufflenet_v1_groups1(n_groups=1, n_classes=1000):
    model = ShuffleNetV1(n_groups=n_groups, n_classes=n_classes, stage_out_chans=stage_out_chans_list[n_groups-1])
    return model

def shufflenet_v1_groups2(n_groups=2, n_classes=1000):
    model = ShuffleNetV1(n_groups=n_groups, n_classes=n_classes, stage_out_chans=stage_out_chans_list[n_groups-1])
    return model

def shufflenet_v1_groups3(n_groups=3, n_classes=1000):
    model = ShuffleNetV1(n_groups=n_groups, n_classes=n_classes, stage_out_chans=stage_out_chans_list[n_groups-1])
    return model

def shufflenet_v1_groups4(n_groups=4, n_classes=1000):
    model = ShuffleNetV1(n_groups=n_groups, n_classes=n_classes, stage_out_chans=stage_out_chans_list[n_groups-1])
    return model

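def shufflenet_v1_groupsother(n_groups=8, n_classes=1000):
    # groups > 4: uses the last (widest) channel configuration.
    # n_groups must divide the 24 stem channels and every bottleneck width,
    # so the default is 8 (5 would make nn.Conv2d raise an error).
    model = ShuffleNetV1(n_groups=n_groups, n_classes=n_classes, stage_out_chans=stage_out_chans_list[-1])
    return model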

if __name__ == '__main__':
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = shufflenet_v1_groups1().to(device)
    summary(model, input_size=(3, 224, 224))

summary prints the network structure and the parameter counts, which makes it easy to inspect the network that was built.
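
If torchsummary is not installed, the total parameter count can also be obtained with plain PyTorch. The sketch below uses a small stand-in network (toy, a hypothetical name) rather than ShuffleNetV1 so that it stays self-contained; the same helper works on any nn.Module.

```python
import torch.nn as nn

def count_trainable_params(model):
    # Sum the element counts of all parameters that receive gradients
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Small stand-in model, used only to keep the example self-contained
toy = nn.Sequential(
    nn.Conv2d(3, 24, kernel_size=3, stride=2, padding=1),  # 3*24*9 weights + 24 biases
    nn.BatchNorm2d(24),                                    # 24 scales + 24 shifts
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(24, 10),                                     # 24*10 weights + 10 biases
)

total = count_trainable_params(toy)
print(total)  # 970
```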


Summary

This article introduced the principle and process of group convolution and channel shuffle as simply and thoroughly as possible, and explained the structure and PyTorch code of the ShuffleNet_V1 model.

Origin blog.csdn.net/yangyu0515/article/details/134929409