[Neural Network] (18) EfficientNetV2: code reproduction and network analysis, with complete TensorFlow code

Hello everyone, today I will share how to build an EfficientNetV2 convolutional neural network model with TensorFlow.

EfficientNetV2 improves on EfficientNetV1 by introducing the Fused-MBConv module and a progressive learning strategy, which makes training faster. This article only covers how to build the network model; it does not cover the training process.

My article on EfficientNetV1 is here, if you are interested: https://blog.csdn.net/dgvv4/article/details/123553351

In EfficientNetV1 the authors focused on accuracy, the number of parameters, and FLOPs; in EfficientNetV2 they focus more on the training speed of the model.


1. Shortcomings of EfficientNetV1

(1) Training is very slow when the training images are large.

A straightforward remedy is to reduce the size of the training images. Smaller training images not only speed up training but also allow a larger batch_size.

(2) Depthwise convolution is slow in the shallow layers of the network.

Depthwise convolution often cannot make full use of existing accelerators. Although its theoretical computation cost is small, in practice it is not as fast as expected. The authors therefore introduced the Fused-MBConv module and replaced the MBConv modules in the shallow layers of the network with it.

(3) Scaling every stage equally is suboptimal.

In EfficientNetV1 the depth and width of every stage are scaled up by the same factors. However, the stages do not contribute equally to the training speed of the network, so applying the same scaling strategy to all of them is not ideal. The authors therefore adopt a non-uniform scaling strategy.
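
To make point (2) concrete, here is a rough sketch that counts convolution weights only (no bias, BatchNormalization, or SE) for the channel-expansion part of each module. The helper names and the example sizes (24 input channels, expansion 4, 3*3 kernel) are illustrative assumptions, not values taken from the paper.

```python
def mbconv_expand_weights(in_channel, expansion, k):
    """1x1 expansion conv followed by a k x k depthwise conv (weights only)."""
    mid = in_channel * expansion
    pointwise = 1 * 1 * in_channel * mid   # 1x1 standard convolution
    depthwise = k * k * mid                # one k x k filter per channel
    return pointwise + depthwise


def fused_expand_weights(in_channel, expansion, k):
    """Single k x k standard conv that expands the channels (weights only)."""
    mid = in_channel * expansion
    return k * k * in_channel * mid


mb = mbconv_expand_weights(24, 4, 3)
fused = fused_expand_weights(24, 4, 3)
print(mb, fused)
```

The fused convolution has several times more weights than the 1x1 plus depthwise pair, so its speed advantage in shallow layers comes from how well accelerators handle dense convolutions, not from a smaller theoretical cost.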


2. Innovations of EfficientNetV2

(1) A new family of networks, EfficientNetV2, is introduced, which outperforms earlier networks in training speed and number of parameters.

(2) An improved progressive learning method is proposed, which dynamically adjusts the strength of regularization according to the image size, improving both training speed and accuracy.

(3) In the experiments, compared with earlier networks, training is up to 11 times faster and the number of parameters is up to 6.8 times smaller.
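
As a rough illustration of point (2), progressive learning can be pictured as a schedule in which image size and regularization strength grow together over the training stages. The sketch below assumes a simple linear interpolation; the function name, the number of stages, and the start and end values are all hypothetical, and training itself is outside the scope of this article.

```python
def progressive_schedule(num_stages, size_start, size_end, drop_start, drop_end):
    """Linearly interpolate image size and dropout rate over the training stages."""
    schedule = []
    for i in range(num_stages):
        # fraction of the way through training: 0.0 at the first stage, 1.0 at the last
        t = i / (num_stages - 1) if num_stages > 1 else 1.0
        size = int(round(size_start + (size_end - size_start) * t))
        drop = drop_start + (drop_end - drop_start) * t
        schedule.append((size, round(drop, 3)))
    return schedule


# hypothetical values: 4 stages, images from 128 to 300 pixels, dropout from 0.1 to 0.3
schedule = progressive_schedule(4, 128, 300, 0.1, 0.3)
for size, drop in schedule:
    print(size, drop)
```

Early stages use small images with weak regularization, and later stages use larger images with stronger regularization.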

Differences from EfficientNetV1:

(1) Fused-MBConv modules are used in the shallow layers of the network and MBConv modules in the deep layers.

(2) A smaller channel expansion ratio is used.

(3) Smaller convolution kernels (3*3) are preferred.

(4) The last stride-1 stage of EfficientNetV1 is removed.


3. Core network modules

Depthwise separable convolution, the inverted residual structure, and the SE attention mechanism are not explained again here, since they have been covered in detail in previous articles. If anything is unclear, you can read: https://blog.csdn.net/dgvv4/article/details/123553351

3.1 Stochastic Depth

This Dropout method differs from the usual Dropout, which deactivates individual neurons at random. As shown in the figure below, the forward pass goes through many residual structures: the main branch performs the convolution operations, and the shortcut branch connects the input directly to the output.

With a certain probability, Stochastic Depth discards the output of the main branch and uses the output of the previous layer directly as the output of this layer, as if the layer did not exist. The depth of the network therefore becomes random, depending on how many layers are discarded. The drop probability in EfficientNetV2 ranges from 0 to 0.2.
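
One common reading of the 0 to 0.2 range is that the drop probability grows linearly with the block index, so early blocks are almost never dropped. The sketch below illustrates that assumption; the helper name is hypothetical, and the code in this article simply passes one constant dropout_rate to every block.

```python
def block_drop_rates(num_blocks, max_rate=0.2):
    """Drop probability for each block, rising linearly from 0 towards max_rate."""
    return [max_rate * i / num_blocks for i in range(num_blocks)]


# 40 = 2+4+4+6+9+15 blocks, the repeat counts used by the network in section 4.2
rates = block_drop_rates(40)
print(rates[0], rates[-1])
```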

The Stochastic Depth type of Dropout is only used inside the Fused-MBConv and MBConv modules; the Dropout layer before the last fully connected layer of the network is an ordinary one.

This method speeds up training and slightly improves accuracy. The code only needs one more parameter than the ordinary Dropout layer.

x = layers.Dropout(rate = dropout_rate,  # probability of dropping the main-branch output
                   noise_shape = (None,1,1,1))(x)  # one mask value per sample: drop the whole branch, not single neurons
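
To show what the (None,1,1,1) noise shape means, here is a small NumPy sketch of the same idea; it is an illustration, not the Keras implementation. One random draw is made per sample and broadcast over height, width, and channels, and the kept samples are rescaled by 1/(1-rate), which is the scaling that Keras Dropout applies during training. The function name is hypothetical.

```python
import numpy as np


def stochastic_depth(branch, rate, rng, training=True):
    """Drop the whole main-branch output per sample with probability `rate`."""
    if not training or rate == 0.0:
        return branch
    keep = 1.0 - rate
    # one Bernoulli draw per sample, broadcast over height, width and channels
    mask = rng.random((branch.shape[0], 1, 1, 1)) < keep
    return branch * mask / keep   # rescale so the expected value is unchanged


rng = np.random.default_rng(0)
branch = np.ones((8, 4, 4, 3), dtype=np.float32)   # 8 samples of shape [4,4,3]
out = stochastic_depth(branch, rate=0.5, rng=rng)
per_sample = out.reshape(8, -1)
print(per_sample[:, 0])   # each sample is either fully dropped or fully kept
```

Each sample's branch output is either all zeros, so only the shortcut survives the Add, or kept in full and rescaled.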


3.2 MBConv module

Basic module (stride=1): the input feature map first has its number of channels increased by a 1x1 convolution; a depthwise convolution is then applied in this high-dimensional space; the SE attention mechanism reweights the channels of the feature map; and a 1x1 convolution with a linear activation reduces the number of channels. If the output feature map has the same shape as the input feature map, a Stochastic Depth type Dropout layer is applied to the output of the 1x1 projection to prevent overfitting, and a residual connection adds the input to the output.

Downsampling module (stride=2): the process is the same as in the basic module, except that the Dropout layer and the residual connection are not used; the feature map is output directly after the 1x1 projection.

Code:

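#(4) Inverted residual module (MBConv)
def MBConv(x, expansion, kernel_size, stride, out_channel, dropout_rate):
    '''
    expansion: factor by which the first 1x1 convolution expands the channels
    kernel_size: kernel size of the depthwise convolution
    stride: stride of the depthwise convolution
    out_channel: number of channels after the second 1x1 convolution
    dropout_rate: probability of dropping the main branch so that the input passes straight to the output
    '''
    # shortcut branch
    residual = x

    # number of channels of the input feature map
    in_channel = x.shape[-1]

    # ① 1*1 standard convolution to expand the channels
    x = conv_block(inputs = x,
                   filters = in_channel * expansion,  # expand channels by a factor of expansion
                   kernel_size = (1,1),
                   stride = 1,
                   activation = True)

    # ② depthwise convolution + BN + swish
    x = layers.DepthwiseConv2D(kernel_size = kernel_size,
                               strides = stride,
                               padding = 'same',
                               use_bias = False)(x)

    x = layers.BatchNormalization()(x)

    x = swish(x)

    # ③ SE attention; takes the feature map x and the channel count of the MBConv input
    x = se_block(inputs = x, in_channel = in_channel)

    # ④ 1*1 standard convolution to reduce the channels, linear activation
    x = conv_block(inputs = x,
                   filters = out_channel,  # number of output channels
                   kernel_size = (1,1),
                   stride = 1,
                   activation = False)  # no swish activation

    # ⑤ use the residual connection only if stride=1 and the input and output shapes match
    if stride == 1 and residual.shape == x.shape:

        # apply dropout only if the rate is positive
        if dropout_rate > 0:

            # noise_shape=(None,1,1,1) drops the whole branch output per sample
            x = layers.Dropout(rate = dropout_rate,  # drop probability
                               noise_shape = (None,1,1,1))(x)

        # add the shortcut to the main branch
        x = layers.Add()([residual, x])

        return x

    # otherwise (e.g. stride=2), output the result of the 1*1 projection directly
    return x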
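
To follow how the feature-map shape changes inside an MBConv block, the pure-Python sketch below traces (height, width, channels) through the steps described above. It assumes 'same' padding, so the spatial size after a strided convolution is ceil(size / stride). The function name and the example sizes are illustrative.

```python
import math


def mbconv_shapes(h, w, in_channel, expansion, stride, out_channel):
    """Return the (height, width, channels) after each step of an MBConv block."""
    mid = in_channel * expansion
    # 'same' padding: output size is ceil(input size / stride)
    h2, w2 = math.ceil(h / stride), math.ceil(w / stride)
    steps = [
        ("input", (h, w, in_channel)),
        ("1x1 expand", (h, w, mid)),
        ("depthwise", (h2, w2, mid)),
        ("SE", (h2, w2, mid)),
        ("1x1 project", (h2, w2, out_channel)),
    ]
    # the shortcut is only added when the block leaves the shape unchanged
    residual = stride == 1 and steps[0][1] == steps[-1][1]
    return steps, residual


steps, residual = mbconv_shapes(28, 28, 64, 4, 2, 128)    # downsampling block
for name, shape in steps:
    print(name, shape)
print("residual:", residual)

steps_same, residual_same = mbconv_shapes(14, 14, 128, 4, 1, 128)   # basic block
```

The downsampling block changes both the resolution and the channel count, so it has no shortcut; the basic block keeps the shape and therefore uses one.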

3.3 Fused-MBConv module

No channel expansion (expansion == 1): the input feature map passes through a single 3*3 standard convolution, and a Stochastic Depth type Dropout layer is applied to the output feature map. When stride=1 and the input of the module has the same shape as the output of the convolution, the input and output are joined by a residual connection; in the stride=2 downsampling case, the convolution output is returned directly.

Channel expansion required (expansion != 1): the input feature map first passes through a 3*3 standard convolution that increases the number of channels, then through a 1*1 convolution that reduces it, and the resulting feature map goes through the Stochastic Depth type Dropout layer. When stride=1 and the input of the module has the same shape as the output of the 1*1 convolution, a residual connection joins the input and output; in the stride=2 downsampling case, the convolution output is returned directly.

Code:

#(5) Fused-MBConv module
def Fused_MBConv(x, expansion, kernel_size, stride, out_channel, dropout_rate):

    # shortcut branch
    residual = x

    # number of channels of the input feature map
    in_channel = x.shape[-1]

    # ① the expansion convolution is only needed when expansion != 1
    if expansion != 1:
        # 3*3 standard convolution to expand the channels
        x = conv_block(inputs = x,
                       filters = in_channel * expansion,  # expand channels by a factor of expansion
                       kernel_size = kernel_size,
                       stride = stride)

    # ② choose the type of the second convolution
    # if expansion == 1: 3*3 convolution + BN + activation
    # if expansion != 1: 1*1 convolution + BN, stride 1, no activation
    x = conv_block(inputs = x,
                   filters = out_channel,  # number of output channels of the module
                   kernel_size = (1,1) if expansion != 1 else kernel_size,
                   stride = 1 if expansion != 1 else stride,
                   activation = False if expansion != 1 else True)

    # ③ residual connection when stride=1 and the input and output shapes match
    if stride == 1 and residual.shape == x.shape:

        # apply dropout only if the rate is positive
        if dropout_rate > 0:
            x = layers.Dropout(rate = dropout_rate,  # probability of dropping the branch output
                               noise_shape = (None,1,1,1))(x)  # drop the whole branch, not single neurons

        # add the shortcut to the main branch
        outputs = layers.Add()([residual, x])

        return outputs

    # otherwise (e.g. stride=2), return the convolution output directly
    return x
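
The two branches of Fused-MBConv can be summarized without building any layers. The sketch below lists the convolutions that the function above would create for a given configuration, as (kernel size, stride, filters, uses activation) tuples; the function name is hypothetical and only meant to make the expansion == 1 and expansion != 1 cases easy to compare.

```python
def fused_mbconv_plan(in_channel, expansion, kernel_size, stride, out_channel):
    """List the convolutions of a Fused-MBConv block as (kernel, stride, filters, activation)."""
    if expansion == 1:
        # a single k x k convolution + BN + swish
        return [(kernel_size, stride, out_channel, True)]
    return [
        (kernel_size, stride, in_channel * expansion, True),  # k x k conv expands the channels
        (1, 1, out_channel, False),                           # 1x1 conv projects, no activation
    ]


plan_no_expand = fused_mbconv_plan(24, 1, 3, 1, 24)
plan_expand = fused_mbconv_plan(24, 4, 3, 2, 48)
print(plan_no_expand)
print(plan_expand)
```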

4. Code display

4.1 Network Structure Diagram

The EfficientNetV2 network structure diagram is as follows. In the operator column, MBConv4 means that the module expands the number of channels to 4 times the number of input channels; channels is the number of output channels of each module, and layers is how many times each module is repeated.
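
The stage settings can also be written down as data. The table in the sketch below is read off the calls in the complete code of section 4.2; the helper name is illustrative, and it assumes that only the first block of each stage applies the stride.

```python
# (module, expansion, out_channel, layers, stride of the first block)
STAGES = [
    ("Fused-MBConv", 1, 24, 2, 1),
    ("Fused-MBConv", 4, 48, 4, 2),
    ("Fused-MBConv", 4, 64, 4, 2),
    ("MBConv", 4, 128, 6, 2),
    ("MBConv", 6, 160, 9, 1),
    ("MBConv", 6, 256, 15, 2),
]


def summarize(input_size=224, stem_stride=2):
    """Return one row per stage with the feature-map size after that stage."""
    size = input_size // stem_stride      # stem convolution with stride 2
    rows = []
    for name, expansion, out_channel, layers, stride in STAGES:
        size //= stride                   # only the first block of a stage downsamples
        rows.append((name, expansion, out_channel, layers, size))
    return rows


rows = summarize()
total_blocks = sum(r[3] for r in rows)
for row in rows:
    print(row)
print("blocks:", total_blocks)
```

For a 224*224 input the feature map shrinks from 112 after the stem to 7 after the last stage, over 40 blocks in total.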


4.2 Complete code

The network is built with the Keras functional API. The code is as follows:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Model, layers

#(1) swish activation function
def swish(x):
    x = x * tf.nn.sigmoid(x)
    return x


#(2) standard convolution block
def conv_block(inputs, filters, kernel_size, stride, activation=True):

    # convolution + BN + activation
    x = layers.Conv2D(filters = filters,
                      kernel_size = kernel_size,
                      strides = stride,
                      padding = 'same',
                      use_bias = False)(inputs)

    x = layers.BatchNormalization()(x)

    if activation:  # apply swish only when activation is True
        x = swish(x)

    return x


#(3) SE attention mechanism
def se_block(inputs, in_channel, ratio=0.25):
    '''
    inputs: output feature map of the depthwise convolution
    in_channel: number of channels of the MBConv input feature map
    ratio: the first layer reduces the channels to this fraction of in_channel
    '''
    squeeze = int(in_channel * ratio)  # number of channels after the squeeze layer
    excitation = inputs.shape[-1]  # number of channels after the excitation layer

    # global average pooling [None,h,w,c]==>[None,c]
    x = layers.GlobalAveragePooling2D()(inputs)

    # [None,c]==>[None,1,1,c]
    x = layers.Reshape(target_shape=(1, 1, x.shape[-1]))(x)

    # [None,1,1,c]==>[None,1,1,squeeze]
    x = layers.Conv2D(filters = squeeze,  # reduce the number of channels
                      kernel_size = (1,1),
                      strides = 1,
                      padding = 'same')(x)

    x = swish(x)  # swish activation

    # [None,1,1,squeeze]==>[None,1,1,c]
    x = layers.Conv2D(filters = excitation,  # restore the number of channels
                      kernel_size = (1,1),
                      strides = 1,
                      padding = 'same')(x)

    x = tf.nn.sigmoid(x)  # sigmoid maps the channel weights to (0,1)

    # [None,h,w,c] * [None,1,1,c] ==> [None,h,w,c]
    outputs = layers.multiply([inputs, x])

    return outputs


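#(4) Inverted residual module (MBConv)
def MBConv(x, expansion, kernel_size, stride, out_channel, dropout_rate):
    '''
    expansion: factor by which the first 1x1 convolution expands the channels
    kernel_size: kernel size of the depthwise convolution
    stride: stride of the depthwise convolution
    out_channel: number of channels after the second 1x1 convolution
    dropout_rate: probability of dropping the main branch so that the input passes straight to the output
    '''
    # shortcut branch
    residual = x

    # number of channels of the input feature map
    in_channel = x.shape[-1]

    # ① 1*1 standard convolution to expand the channels
    x = conv_block(inputs = x,
                   filters = in_channel * expansion,  # expand channels by a factor of expansion
                   kernel_size = (1,1),
                   stride = 1,
                   activation = True)

    # ② depthwise convolution + BN + swish
    x = layers.DepthwiseConv2D(kernel_size = kernel_size,
                               strides = stride,
                               padding = 'same',
                               use_bias = False)(x)

    x = layers.BatchNormalization()(x)

    x = swish(x)

    # ③ SE attention; takes the feature map x and the channel count of the MBConv input
    x = se_block(inputs = x, in_channel = in_channel)

    # ④ 1*1 standard convolution to reduce the channels, linear activation
    x = conv_block(inputs = x,
                   filters = out_channel,  # number of output channels
                   kernel_size = (1,1),
                   stride = 1,
                   activation = False)  # no swish activation

    # ⑤ use the residual connection only if stride=1 and the input and output shapes match
    if stride == 1 and residual.shape == x.shape:

        # apply dropout only if the rate is positive
        if dropout_rate > 0:

            # noise_shape=(None,1,1,1) drops the whole branch output per sample
            x = layers.Dropout(rate = dropout_rate,  # drop probability
                               noise_shape = (None,1,1,1))(x)

        # add the shortcut to the main branch
        x = layers.Add()([residual, x])

        return x

    # otherwise (e.g. stride=2), output the result of the 1*1 projection directly
    return x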
    

#(5) Fused-MBConv module
def Fused_MBConv(x, expansion, kernel_size, stride, out_channel, dropout_rate):

    # shortcut branch
    residual = x

    # number of channels of the input feature map
    in_channel = x.shape[-1]

    # ① the expansion convolution is only needed when expansion != 1
    if expansion != 1:
        # 3*3 standard convolution to expand the channels
        x = conv_block(inputs = x,
                       filters = in_channel * expansion,  # expand channels by a factor of expansion
                       kernel_size = kernel_size,
                       stride = stride)

    # ② choose the type of the second convolution
    # if expansion == 1: 3*3 convolution + BN + activation
    # if expansion != 1: 1*1 convolution + BN, stride 1, no activation
    x = conv_block(inputs = x,
                   filters = out_channel,  # number of output channels of the module
                   kernel_size = (1,1) if expansion != 1 else kernel_size,
                   stride = 1 if expansion != 1 else stride,
                   activation = False if expansion != 1 else True)

    # ③ residual connection when stride=1 and the input and output shapes match
    if stride == 1 and residual.shape == x.shape:

        # apply dropout only if the rate is positive
        if dropout_rate > 0:
            x = layers.Dropout(rate = dropout_rate,  # probability of dropping the branch output
                               noise_shape = (None,1,1,1))(x)  # drop the whole branch, not single neurons

        # add the shortcut to the main branch
        outputs = layers.Add()([residual, x])

        return outputs

    # otherwise (e.g. stride=2), return the convolution output directly
    return x


#(6) Repeat each module num times
# Fused_MBConv stage
def Fused_stage(x, num, expansion, kernel_size, stride, out_channel, dropout_rate):

    for i in range(num):
        # only the first block of a stage uses the given stride; the rest use stride 1
        x = Fused_MBConv(x, expansion, kernel_size, stride if i == 0 else 1,
                         out_channel, dropout_rate)

    return x

# MBConv stage
def stage(x, num, expansion, kernel_size, stride, out_channel, dropout_rate):

    for i in range(num):
        # only the first block of a stage uses the given stride; the rest use stride 1
        x = MBConv(x, expansion, kernel_size, stride if i == 0 else 1,
                   out_channel, dropout_rate)

    return x


#(7) Backbone network
def efficientnetv2(input_shape, classes, dropout_rate):

    # build the input layer
    inputs = keras.Input(shape=input_shape)

    # standard convolution [224,224,3]==>[112,112,24]
    x = conv_block(inputs, filters=24, kernel_size=(3,3), stride=2)

    # [112,112,24]==>[112,112,24]
    x = Fused_stage(x, num=2, expansion=1, kernel_size=(3,3),
                    stride=1, out_channel=24, dropout_rate=dropout_rate)

    # [112,112,24]==>[56,56,48]
    x = Fused_stage(x, num=4, expansion=4, kernel_size=(3,3),
                    stride=2, out_channel=48, dropout_rate=dropout_rate)

    # [56,56,48]==>[28,28,64]
    x = Fused_stage(x, num=4, expansion=4, kernel_size=(3,3),
                    stride=2, out_channel=64, dropout_rate=dropout_rate)

    # [28,28,64]==>[14,14,128]
    x = stage(x, num=6, expansion=4, kernel_size=(3,3),
              stride=2, out_channel=128, dropout_rate=dropout_rate)

    # [14,14,128]==>[14,14,160]
    x = stage(x, num=9, expansion=6, kernel_size=(3,3),
              stride=1, out_channel=160, dropout_rate=dropout_rate)

    # [14,14,160]==>[7,7,256]
    x = stage(x, num=15, expansion=6, kernel_size=(3,3),
              stride=2, out_channel=256, dropout_rate=dropout_rate)

    # [7,7,256]==>[7,7,1280]
    x = conv_block(x, filters=1280, kernel_size=(1,1), stride=1)

    # [7,7,1280]==>[None,1280]
    x = layers.GlobalAveragePooling2D()(x)

    # ordinary dropout that deactivates individual neurons
    if dropout_rate > 0:
        x = layers.Dropout(rate=dropout_rate)(x)

    # [None,1280]==>[None,classes]
    logits = layers.Dense(classes)(x)

    # build the model
    model = Model(inputs, logits)

    return model


#(8) Build the model
if __name__ == '__main__':

    model = efficientnetv2(input_shape = [224,224,3],  # shape of the input image
                           classes = 1000,  # number of classes
                           dropout_rate = 0)

    model.summary()  # print the network architecture

4.3 View the network structure

The network architecture can be inspected with model.summary(). The model has about 21.6 million parameters.

--------------------------------
dense (Dense)                   (None, 1000)         1281000     global_average_pooling2d_30[0][0]
==================================================================================================
Total params: 21,612,360
Trainable params: 21,458,488
Non-trainable params: 153,872
__________________________________________________________________________________________________
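
Two of the numbers in this summary can be checked by hand. The sketch below is a sanity check, not part of the model: it recomputes the parameter count of the final Dense layer and the number of non-trainable parameters, assuming that these are exactly the moving mean and moving variance of every BatchNormalization layer. The stage table is read off the code in section 4.2, and the helper name is hypothetical.

```python
# (module, expansion, out_channel, layers), taken from the code in section 4.2
STAGES = [
    ("fused", 1, 24, 2), ("fused", 4, 48, 4), ("fused", 4, 64, 4),
    ("mbconv", 4, 128, 6), ("mbconv", 6, 160, 9), ("mbconv", 6, 256, 15),
]


def count_bn_channels(stem=24, head=1280):
    """Total number of channels that pass through a BatchNormalization layer."""
    total, in_channel = stem, stem               # BN after the stem convolution
    for kind, expansion, out_channel, layers in STAGES:
        for _ in range(layers):
            mid = in_channel * expansion
            if kind == "mbconv":
                total += 2 * mid + out_channel   # expand BN, depthwise BN, project BN
            elif expansion == 1:
                total += out_channel             # single fused conv + BN
            else:
                total += mid + out_channel       # fused expand BN, project BN
            in_channel = out_channel
    return total + head                          # BN after the final 1x1 convolution


dense = 1280 * 1000 + 1000                 # weights + biases of the classifier
non_trainable = 2 * count_bn_channels()    # moving mean and moving variance per channel
print(dense, non_trainable)
```

Both values agree with the summary: 1,281,000 parameters for the Dense layer and 153,872 non-trainable parameters.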

Origin blog.csdn.net/dgvv4/article/details/123598847