ShuffleNet-V1 paper reading and code implementation


Foreword

ShuffleNet-V1 represents another direction in the development of lightweight convolutional neural networks, following MobileNet as the next major lightweight architecture.


1. Paper reading summary

Paper address
Tricks: applying group convolution to the 1×1 (pointwise) convolutions; channel shuffle to improve information flow across channel groups.

1. Group Convolution

ResNeXt applies group convolution only to the 3×3 convolutions, so most of the computation ends up concentrated in the 1×1 (pointwise) convolutions. This paper therefore applies group convolution to the 1×1 convolutions as well, so that each convolution operates on only a small subset of channels, keeping channel connections sparse and reducing the amount of computation.
What is group convolution?
Assume the previous layer outputs N feature maps, i.e. the channel count is N. Let the number of groups be M. A grouped convolutional layer first splits the N channels into M groups; each group is convolved independently, connected only to its own N/M input channels. After each group's convolution completes, the M outputs are concatenated to form the output channels of this layer. A parameter-count comparison is sketched below.
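As a concrete illustration (my own sketch, not code from the paper), PyTorch's nn.Conv2d exposes this via the groups argument; with groups=8, each filter only sees 256/8 = 32 input channels, cutting the weight count by a factor of 8:

import torch.nn as nn

# Illustration only: a 1x1 convolution with 256 input and 256 output channels
standard = nn.Conv2d(256, 256, kernel_size=1)           # 256*256 weights + 256 biases
grouped = nn.Conv2d(256, 256, kernel_size=1, groups=8)  # 256*(256/8) weights + 256 biases

print(sum(p.numel() for p in standard.parameters()))  # 65792
print(sum(p.numel() for p in grouped.parameters()))   # 8448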

2. Channel Shuffle

If group convolution is stacked throughout the network, each output channel is computed only from the feature maps of a small, fixed subset of input channels, which blocks information flow between groups. Channel shuffle therefore divides the N output channels of each group convolution into subgroups and interleaves them, so that the next group convolution receives feature maps drawn from every group. For example, with N = 6 channels and g = 2 groups [0, 1, 2] and [3, 4, 5], the shuffled order is [0, 3, 1, 4, 2, 5]. The schematic is shown in Figure 1.
[Figure 1: channel shuffle between two stacked group convolutions]

3. Ablation Experiments

1) Number of groups in the group convolution (g = 1, 2, 3, 4, 8)
Using group convolution on the 1×1 convolutions is consistently stronger than not using it. In some models (such as ShuffleNet 0.5×), when the group number becomes large (e.g. g = 8), the classification score saturates or even drops: as the group number increases (and, at fixed complexity, the feature maps become wider), each convolutional filter sees fewer input channels, which can hurt representational power. For smaller models such as ShuffleNet 0.25×, larger group numbers tend to give better results, suggesting that wider feature maps bring more benefit to smaller models.
2) Shuffle vs. no shuffle:
Channel shuffle improves classification scores under all tested settings. In particular, when the group number is large (e.g. g = 8), the model with channel shuffle significantly outperforms its counterpart, which shows the importance of cross-group information exchange.

2. Code implementation

Code reference

1. Shuffle implementation

The code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F


class shuffle(nn.Module):
    def __init__(self, group=2):
        super(shuffle, self).__init__()
        self.group = group

    def forward(self, x):
        """Shuffle op: [N,C,H,W] -> [N,g,C/g,H,W] -> [N,C/g,g,H,W] -> [N,C,H,W]"""
        num, channel, height, width = x.size()
        # split channels into g groups, swap the group and channel axes,
        # then flatten back so channels from different groups interleave
        x = x.view(num, self.group, channel // self.group, height, width)
        x = x.permute(0, 2, 1, 3, 4)
        x = x.reshape(num, channel, height, width)
        return x
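A quick sanity check (my own usage sketch, relying on the imports and class above): with 6 channels and group=2, the groups [0, 1, 2] and [3, 4, 5] are interleaved.

x = torch.arange(6).float().reshape(1, 6, 1, 1)
print(shuffle(group=2)(x).flatten().tolist())  # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]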

2. Bottleneck module implementation

The code is as follows:

class bottleblock(nn.Module):
    def __init__(self, in_channel, out_channel, stride, group):
        super(bottleblock, self).__init__()
        self.stride = stride
        # Following the paper, the first pointwise layer of Stage 2
        # (input channels = 24) does not use group convolution
        if in_channel == 24:
            group = 1
        # 1x1 group convolution reducing channels to out_channel/4
        self.conv1_with_group = nn.Sequential(
            nn.Conv2d(in_channels=in_channel, out_channels=out_channel // 4, kernel_size=1, stride=1, groups=group, bias=False),
            nn.BatchNorm2d(out_channel // 4),
            nn.ReLU(inplace=True))
        self.shuffle = shuffle(group)
        # 3x3 depthwise convolution (groups = channels), no ReLU after it
        self.conv2_with_depth = nn.Sequential(
            nn.Conv2d(in_channels=out_channel // 4, out_channels=out_channel // 4, stride=stride, kernel_size=3, groups=out_channel // 4, padding=1, bias=False),
            nn.BatchNorm2d(out_channel // 4))
        # 1x1 group convolution restoring the output channel count
        self.conv3_with_group = nn.Sequential(
            nn.Conv2d(in_channels=out_channel // 4, out_channels=out_channel, kernel_size=1, stride=1, groups=group, bias=False),
            nn.BatchNorm2d(out_channel))
        # stride-2 blocks use a 3x3 average pooling on the shortcut path
        if stride == 2:
            self.shortcut = nn.AvgPool2d(stride=stride, kernel_size=3, padding=1)
        else:
            self.shortcut = nn.Sequential()

    def forward(self, a):
        x = self.conv1_with_group(a)
        x = self.shuffle(x)
        x = self.conv2_with_depth(x)
        x = self.conv3_with_group(x)
        residual = self.shortcut(a)
        if self.stride == 2:
            # stride-2 block: concatenate branch and shortcut along channels
            return F.relu(torch.cat([x, residual], 1))
        else:
            # stride-1 block: element-wise addition, as in ResNet
            return F.relu(residual + x)
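A shape check (my own sketch, reusing the classes above) shows the two shortcut types: a stride-1 block keeps the spatial size and adds, while a stride-2 block halves it and concatenates; this is why make_layer below builds the first block of each stage with its branch width reduced by in_channel.

block1 = bottleblock(in_channel=240, out_channel=240, stride=1, group=3)
block2 = bottleblock(in_channel=24, out_channel=240 - 24, stride=2, group=3)
print(block1(torch.randn(1, 240, 28, 28)).shape)  # torch.Size([1, 240, 28, 28])
print(block2(torch.randn(1, 24, 56, 56)).shape)   # torch.Size([1, 240, 28, 28])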


3. Shufflenet network implementation

The code is as follows:

class shufflenet(nn.Module):
    # Stage output channels for each group number g (Table 1 in the paper)
    stage_out_channels = {
        1: [144, 288, 576],
        2: [200, 400, 800],
        3: [240, 480, 960],
        4: [272, 544, 1088],
        8: [384, 768, 1536],
    }

    def __init__(self, num_class, group):
        super(shufflenet, self).__init__()
        self.num_class = num_class
        self.inchannel = 24
        block_num = [4, 8, 4]  # number of bottleneck blocks in Stages 2-4
        outchannel = self.stage_out_channels[group]
        # Stage 1: 3x3 conv with stride 2 followed by 3x3 max pooling
        self.initial = nn.Sequential(
            nn.Conv2d(kernel_size=3, padding=1, in_channels=3, out_channels=24, stride=2),
            nn.BatchNorm2d(24),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

        self.layer1 = self.make_layer(block_num[0], outchannel[0], group)
        self.layer2 = self.make_layer(block_num[1], outchannel[1], group)
        self.layer3 = self.make_layer(block_num[2], outchannel[2], group)

        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(outchannel[2], num_class)

    def make_layer(self, block_num, outchannel, group):
        layer_list = []
        for i in range(block_num):
            if i == 0:
                # first block of each stage downsamples and uses the concat
                # shortcut, so its branch outputs (outchannel - inchannel)
                stride = 2
                catchannel = self.inchannel
            else:
                stride = 1
                catchannel = 0
            layer_list.append(bottleblock(self.inchannel, outchannel - catchannel, stride, group))
            self.inchannel = outchannel
        return nn.Sequential(*layer_list)

    def forward(self, x):
        x = self.initial(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        # softmax here gives class probabilities; for training with
        # nn.CrossEntropyLoss you would return the raw logits instead
        return F.softmax(x, dim=1)
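A minimal usage sketch (my own example, assuming the classes above and an ImageNet-sized input): build ShuffleNet with g = 3 and run a forward pass.

net = shufflenet(num_class=1000, group=3)
out = net(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])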

Summary

This article introduced the core ideas of ShuffleNet-V1 together with a code implementation; comments and discussion are welcome!
Past review:
(1) CBAM paper interpretation + PyTorch implementation of CBAM-ResNeXt
(2) SENet paper interpretation and code examples
Next issue preview:
ShuffleNet-V2 paper reading and code implementation

Origin: blog.csdn.net/qq_44840741/article/details/121425959