12. CVPR 2023 | SCConv: a convolution module that tackles the redundancy of convolutional features + a brief discussion of large models and lightweight models

Venue:

CVPR, one of the three top conferences in the fields of AI and CV

paper:

"SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy"

Affiliations:

1. School of Communication and Electronic Engineering, East China Normal University, Shanghai, China.
2. Department of Computer Science and Technology, Tongji University, Shanghai, China.

0. Foreword:

I am paying attention to this article not only because it was accepted at CVPR, but also because of my work. Since starting work, I have mainly followed progress in the following areas of artificial intelligence:

1. New integral neural networks

2. Engineering experience in building my own CNN models

3. Effective information extraction by CNNs in the channel and spatial dimensions

4. Developing a new CNN model that can match or even surpass the benchmarks of the most advanced baseline models specialized for EEG signals

5. The latest progress on lightweight models around the world

Of the five points above, I will not discuss the first four here for various reasons, but I have said plenty about the fifth in my previous blog posts. It matters because AI model innovation currently moves along two tracks:

1. Develop a new large model

2. Explore a good lightweight model

First of all, I have not studied large models in detail, but I know them more or less. As far as I know, as of 2023 there are 188 large models from domestic universities and enterprises, such as Huawei's Pangu, Tencent's Hunyuan, and iFlytek's Spark; abroad there are 18, the most famous being OpenAI's ChatGPT, created by a student of the father of deep learning. I have to say, China is far ahead in this regard!

However, a Nature article published on November 8, 2023 rigorously pointed out that large models merely engage in role-play and do not really have self-awareness. It argues that the polite persona that models such as GPT-4, PaLM, and Llama present to humans is just an act: they have neither human emotions nor self-awareness of any kind. The article comes from Google DeepMind and EleutherAI, and its core claim is that a large model is just a role-playing engine. That is a harsh verdict, one that directly undercuts the future significance of large-model research and development as an AI direction. Domestic players have poured enormous money and manpower into large models, only to be slapped in the face by Nature in November. Far ahead indeed!

Ha, if you are interested, you can read this latest Nature article:

Shanahan M, McDonell K, Reynolds L. Role play with large language models. Nature. 2023 Nov;623(7987):493-498. doi: 10.1038/s41586-023-06647-8. Epub 2023 Nov 8. PMID: 37938776.

So I say: developing new lightweight CNN models, their structural designs, and their engineering implementations is the wise and reliably correct path.

1. Introduction:

Let's get back to business and talk about this SCConv lightweight module. Convolutional neural networks have achieved remarkable performance on various computer vision tasks, but at the cost of huge computational resources, partly due to the redundant features extracted by the convolutional layers.

In this paper, the authors exploit the spatial and channel redundancy between features for CNN compression and propose an efficient convolution module called SCConv (Spatial and Channel reconstruction Convolution) to reduce redundant computation and facilitate the learning of representative features. SCConv consists of two units: a Spatial Reconstruction Unit (SRU) and a Channel Reconstruction Unit (CRU). The SRU suppresses spatial redundancy with a separate-and-reconstruct method, while the CRU reduces channel redundancy with a split-transform-fuse strategy. In addition, SCConv is a plug-and-play architectural unit that can directly replace standard convolutions in various convolutional neural networks. Experimental results show that models embedding SCConv achieve better performance by reducing redundant features, while significantly lowering complexity and computational cost.

Let me first talk about redundancy in computing: redundancy means duplicated data and wasted space. In a neural network, it means the convolutional layers extract repeated features that occupy memory and squeeze out other effective features that would actually benefit classification, so model performance is limited. This is exactly focus 3 that I mentioned above, so there is a lot I can learn through this model. Below I briefly explain it.

2. Model design:

2.1 SCConv

As shown in the figure, SCConv consists of two units arranged in sequence: the spatial reconstruction unit (SRU) and the channel reconstruction unit (CRU). The input feature X first passes through the spatial reconstruction unit to obtain the spatially refined feature Xw, which then passes through the channel reconstruction unit to produce the channel-refined feature Y as the output.

The SCConv module takes advantage of the spatial and channel redundancy between features. It can be seamlessly integrated into any CNN framework to reduce redundancy between features and improve the representativeness of CNN features, as the sketch below illustrates.
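As a minimal sketch of what plug-and-play means here (my own illustrative example, not from the paper), the snippet below swaps the second 3×3 convolution of a basic residual block for the ScConv module replicated in Section 2.5. The BasicBlock class and its channel sizes are assumptions for illustration:

import torch
import torch.nn as nn
# assumes the ScConv class from Section 2.5 is in scope, e.g. from ScConv import ScConv

class BasicBlock(nn.Module):
    '''Illustrative residual block where ScConv stands in for a 3x3 convolution.'''
    def __init__(self, channels: int):
        super().__init__()
        self.conv1  = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1    = nn.BatchNorm2d(channels)
        self.scconv = ScConv(channels)        # drop-in replacement for the second 3x3 conv
        self.bn2    = nn.BatchNorm2d(channels)
        self.relu   = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.scconv(out))      # ScConv preserves (N, C, H, W)
        return self.relu(out + x)             # so the residual addition still works

block = BasicBlock(32)
print(block(torch.randn(1, 32, 56, 56)).shape)   # torch.Size([1, 32, 56, 56])

This works because ScConv keeps both the channel count and the spatial size unchanged, as the shape check at the end of Section 2.5 confirms.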

The author ran ablations on how SRU and CRU are combined, including:

  1. Using neither SRU nor CRU
  2. Using SRU alone
  3. Using CRU alone
  4. Using SRU and CRU in parallel
  5. Using CRU first, then SRU
  6. Using SRU first, then CRU

In the end, using SRU first and then CRU worked best.

2.2 SRU spatial reconstruction unit

Spatial Reconstruction Unit (SRU): this unit adopts a separate-and-reconstruct method.

The purpose of Separation is to split feature maps with high information content from those with low information content, with respect to spatial content. The authors use the scaling factors in Group Normalization to measure the information content of the different feature maps.

Separation formula:
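The formula can be reconstructed from the gating logic in the replication code of Section 2.5 (a reconstruction, not the paper's exact typesetting). Each channel's GN scaling factor $\gamma_i$ is normalized into an importance weight, the weights re-weight the normalized feature through a sigmoid, and a threshold gate then produces the two weight maps:

$W_\gamma = \{w_i\}, \qquad w_i = \frac{\gamma_i}{\sum_{j=1}^{C} \gamma_j}$

$W = \mathrm{Sigmoid}\big(W_\gamma \cdot \mathrm{GN}(X)\big)$

$W_1 = \mathrm{where}(W > t,\ 1,\ W), \qquad W_2 = \mathrm{where}(W > t,\ 0,\ W)$

where $t$ is the gate threshold (0.5 by default in the code below).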

The Reconstruction operation adds the more informative features to the less informative ones to generate features that carry more information while saving space. Concretely it is a cross reconstruction: the two differently weighted features are each split, cross-added to obtain Xw1 and Xw2, and then concatenated to form the spatially refined feature map Xw, as spelled out below.
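Written out to match the reconstruct method in the replication code of Section 2.5 (with $\otimes$ for element-wise multiplication, $\oplus$ for element-wise addition, and splits/concats along the channel dimension):

$X_1 = W_1 \otimes X, \qquad X_2 = W_2 \otimes X$

$X_1 \rightarrow [X_{11}, X_{12}], \qquad X_2 \rightarrow [X_{21}, X_{22}]$

$X^w_1 = X_{11} \oplus X_{22}, \qquad X^w_2 = X_{12} \oplus X_{21}, \qquad X^w = \mathrm{concat}(X^w_1, X^w_2)$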

After SRU processing, features with high information content are separated from those with low information content, reducing redundant features in the spatial dimension.

2.3 CRU channel reconstruction unit

Channel Reconstruction Unit (CRU): this unit adopts a split-transform-fuse method.

The Split operation divides the input spatially refined feature Xw into two parts along the channel dimension: one part has αC channels and the other (1−α)C channels, where α is a hyperparameter with 0 ≤ α ≤ 1. A 1×1 convolution then compresses the channel count of each part, yielding Xup and Xlow respectively.
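As a concrete instance of the channel bookkeeping (using the defaults of the replication code below, $\alpha = 1/2$ and squeeze ratio $r = 2$): for $C = 64$ input channels, the two parts start with 32 channels each, and the 1×1 squeeze convolutions compress them to 16 channels each:

$X^w \in \mathbb{R}^{C \times H \times W} \ \rightarrow\ X_{up} \in \mathbb{R}^{\frac{\alpha C}{r} \times H \times W}, \quad X_{low} \in \mathbb{R}^{\frac{(1-\alpha)C}{r} \times H \times W}$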

The Transform operation feeds Xup into the "rich feature extraction" branch, applying GWC (group-wise convolution) and PWC (point-wise convolution) in parallel and adding the results to obtain Y1. The input Xlow goes through a cheap PWC whose output is concatenated with Xlow itself, reusing the existing features, to obtain Y2.

The Fuse operation uses a simplified SKNet method to adaptively merge Y1 and Y2. Specifically, global average pooling first combines the global spatial information with the channel statistics to obtain the pooled S1 and S2. A Softmax over S1 and S2 then yields the feature weight vectors β1 and β2. Finally, the output is obtained as Y = β1·Y1 + β2·Y2, where Y is the channel-refined feature.
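Spelled out (following the description above; note that the replication code below implements a slightly merged variant that concatenates Y1 and Y2 before applying the softmax over channels):

$S_m = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} Y_m(i, j), \qquad m = 1, 2$

$\beta_1 = \frac{e^{S_1}}{e^{S_1} + e^{S_2}}, \qquad \beta_2 = \frac{e^{S_2}}{e^{S_1} + e^{S_2}}, \qquad \beta_1 + \beta_2 = 1$

$Y = \beta_1 Y_1 + \beta_2 Y_2$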

In fact, it is quite simple. The channel unit splits the channels of the feature map produced by the preceding spatial unit. The upper path acts as the backbone and the lower path as a supplement: the upper path runs group-wise convolution (GWC) and point-wise convolution (PWC) and adds their outputs, while the lower path runs only a PWC and concatenates its output with its input; the two branch features are then pooled, softmax-weighted, and merged.

The key question is: how should the initial channels be allocated between the upper and lower paths? The author's ablations give the answer; note that the replication code below defaults to an even split (α = 1/2).

2.4 Classification experiment

After determining the optimal channel split ratio and the order of SRU and CRU, the author compared the model against quite a few current SOTA CV models. The comparisons show that embedding SCConv yields better performance at significantly lower complexity and computational cost.

2.5 PyTorch Replication Code

'''
Description: 
LastEditTime: 2023-07-27 18:41:47
FilePath: /chengdongzhou/ScConv.py
'''
import torch
import torch.nn.functional as F
import torch.nn as nn 


class GroupBatchnorm2d(nn.Module):
    def __init__(self, c_num:int, 
                 group_num:int = 16, 
                 eps:float = 1e-10
                 ):
        super(GroupBatchnorm2d,self).__init__()
        assert c_num    >= group_num
        self.group_num  = group_num
        self.weight     = nn.Parameter( torch.randn(c_num, 1, 1)    )
        self.bias       = nn.Parameter( torch.zeros(c_num, 1, 1)    )
        self.eps        = eps
    def forward(self, x):
        N, C, H, W  = x.size()
        x           = x.view(   N, self.group_num, -1   )
        mean        = x.mean(   dim = 2, keepdim = True )
        std         = x.std (   dim = 2, keepdim = True )
        x           = (x - mean) / (std+self.eps)
        x           = x.view(N, C, H, W)
        return x * self.weight + self.bias


class SRU(nn.Module):
    def __init__(self,
                 oup_channels:int, 
                 group_num:int = 16,
                 gate_treshold:float = 0.5,
                 torch_gn:bool = True
                 ):
        super().__init__()
        
        self.gn             = nn.GroupNorm( num_channels = oup_channels, num_groups = group_num ) if torch_gn else GroupBatchnorm2d(c_num = oup_channels, group_num = group_num)
        self.gate_treshold  = gate_treshold
        self.sigomid        = nn.Sigmoid()

    def forward(self,x):
        gn_x        = self.gn(x)
        w_gamma     = self.gn.weight/sum(self.gn.weight)
        w_gamma     = w_gamma.view(1,-1,1,1)
        reweigts    = self.sigomid( gn_x * w_gamma )
        # Gate
        w1          = torch.where(reweigts > self.gate_treshold, torch.ones_like(reweigts), reweigts)  # above the gate threshold -> set to 1, otherwise keep the value
        w2          = torch.where(reweigts > self.gate_treshold, torch.zeros_like(reweigts), reweigts) # above the gate threshold -> set to 0, otherwise keep the value
        x_1         = w1 * x
        x_2         = w2 * x
        y           = self.reconstruct(x_1,x_2)
        return y
    
    def reconstruct(self,x_1,x_2):
        x_11,x_12 = torch.split(x_1, x_1.size(1)//2, dim=1)
        x_21,x_22 = torch.split(x_2, x_2.size(1)//2, dim=1)
        return torch.cat([ x_11+x_22, x_12+x_21 ],dim=1)


class CRU(nn.Module):
    '''
    alpha: 0<alpha<1
    '''
    def __init__(self, 
                 op_channel:int,
                 alpha:float = 1/2,
                 squeeze_radio:int = 2 ,
                 group_size:int = 2,
                 group_kernel_size:int = 3,
                 ):
        super().__init__()
        self.up_channel     = up_channel   =   int(alpha*op_channel)
        self.low_channel    = low_channel  =   op_channel-up_channel
        self.squeeze1       = nn.Conv2d(up_channel,up_channel//squeeze_radio,kernel_size=1,bias=False)
        self.squeeze2       = nn.Conv2d(low_channel,low_channel//squeeze_radio,kernel_size=1,bias=False)
        #up
        self.GWC            = nn.Conv2d(up_channel//squeeze_radio, op_channel,kernel_size=group_kernel_size, stride=1,padding=group_kernel_size//2, groups = group_size)
        self.PWC1           = nn.Conv2d(up_channel//squeeze_radio, op_channel,kernel_size=1, bias=False)
        #low
        self.PWC2           = nn.Conv2d(low_channel//squeeze_radio, op_channel-low_channel//squeeze_radio,kernel_size=1, bias=False)
        self.advavg         = nn.AdaptiveAvgPool2d(1)

    def forward(self,x):
        # Split
        up,low  = torch.split(x,[self.up_channel,self.low_channel],dim=1)
        up,low  = self.squeeze1(up),self.squeeze2(low)
        # Transform
        Y1      = self.GWC(up) + self.PWC1(up)
        Y2      = torch.cat( [self.PWC2(low), low], dim= 1 )
        # Fuse
        out     = torch.cat( [Y1,Y2], dim= 1 )
        out     = F.softmax( self.advavg(out), dim=1 ) * out
        out1,out2 = torch.split(out,out.size(1)//2,dim=1)
        return out1+out2


class ScConv(nn.Module):
    def __init__(self,
                op_channel:int,
                group_num:int = 4,
                gate_treshold:float = 0.5,
                alpha:float = 1/2,
                squeeze_radio:int = 2 ,
                group_size:int = 2,
                group_kernel_size:int = 3,
                 ):
        super().__init__()
        self.SRU = SRU( op_channel, 
                       group_num            = group_num,  
                       gate_treshold        = gate_treshold )
        self.CRU = CRU( op_channel, 
                       alpha                = alpha, 
                       squeeze_radio        = squeeze_radio ,
                       group_size           = group_size ,
                       group_kernel_size    = group_kernel_size )
    
    def forward(self,x):
        x = self.SRU(x)
        x = self.CRU(x)
        return x


if __name__ == '__main__':
    x       = torch.randn(1,32,16,16)
    model   = ScConv(32)
    print(model(x).shape)

Everything is encapsulated, so you can call it directly if you want. As a quick check, you can also compare it against a plain convolution, as sketched below.
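To make the "lower complexity and computational cost" claim from the introduction concrete, here is a small sanity check I would run (assuming the ScConv class above is in scope; the exact numbers depend on the default hyperparameters):

import torch
import torch.nn as nn

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

plain_conv = nn.Conv2d(32, 32, kernel_size=3, padding=1)   # the standard conv ScConv would replace
scconv     = ScConv(32)                                    # defined above

x = torch.randn(1, 32, 16, 16)
assert plain_conv(x).shape == scconv(x).shape              # both preserve (N, C, H, W)
print('3x3 conv parameters:', count_params(plain_conv))
print('ScConv parameters  :', count_params(scconv))

With the defaults, ScConv should come in well under the plain 3×3 convolution's parameter count, consistent with the paper's claim of reduced complexity.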

3. Goudan Scholar's baldness incident

Goudan Scholar accidentally singed his hair while exploring AI model technology last week and is now bald. I hope he recovers soon. Here is a handsome photo of Goudan Scholar from the night before he was burned.


Origin blog.csdn.net/mantoudamahou/article/details/134506203