A detailed explanation of the RepVGG paper

        RepVGG was published at CVPR 2021. Like ResNet, it is an image classification network and is commonly used as a backbone for object detection. The paper proposes a technique called structural re-parameterization: after training, the model is replaced by an equivalent but simpler model, and that simpler model is used for inference (i.e., testing). The goal is to speed up inference and make the model more practical to deploy.

Paper address: https://arxiv.org/abs/2101.03697

Paper source code: https://github.com/megvii-model/RepVGG


Table of contents

1. Abstract and Introduction (What is RepVGG)

 Since complex networks can obviously achieve very high accuracy, why not just use them?

1.1 RepVGG model structure

2. Related research done by the author

3. Advantages of RepVGG

4. How to achieve structural reparameterization

5. Model deployment

6. Experimental comparison


1. Abstract and Introduction (What is RepVGG)

        These two sections present the background and the overall advantages of the RepVGG network. Because current architectures such as ResNet-101 are very large and slow at inference, the paper proposes a simple and efficient network: the inference-time structure consists only of 3x3 convolutions and ReLU, like VGG, while the training-time structure contains multiple residual branches. It also achieves a very good score on ImageNet.

   

 Since complex networks can obviously achieve very high accuracy, why not just use them?

        Although complex multi-branch models can achieve relatively high accuracy, they also have obvious shortcomings: 1) they slow down inference and reduce memory utilization (discussed later); 2) some operators increase memory consumption and are unfriendly to certain devices. This is why VGG and ResNet remain very popular in industry. As the paper notes, most work treats FLOPs as the main indicator of inference speed, but the author's experiments show that FLOPs do not correlate well with actual speed. For example, RepVGG-B3 has more FLOPs than EfficientNet yet runs faster.

        In view of these shortcomings, the author proposes to decouple the training-time multi-branch architecture from the inference-time plain architecture via structural re-parameterization: the multi-branch model used for training is converted into a plain, single-path model for inference. Because the inference-time structure resembles VGG, the network is named RepVGG. The idea is simple: a model is defined by a set of parameters, so if we can find another set of parameters that produces exactly the same outputs with a simpler structure, the model built from the new parameters can be used for inference. That is structural re-parameterization.

1.1 RepVGG model structure

        RepVGG keeps the idea of ResNet's residual connections, but uses them within each layer: the 3x3 convolution is placed in parallel with an identity branch and a 1x1 convolution branch. In the figure, B shows the structure used during training and C shows the structure used during inference. In the training-time model, each branch ends with a BN layer before the branch outputs are summed. How to turn B into C is the core of re-parameterization; the implementation details are covered below, so for now it is enough to grasp the overall process.
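To make the training-time structure concrete, here is a minimal PyTorch sketch of such a block (my own illustration, not the official implementation; the class and argument names are made up):

```python
import torch
import torch.nn as nn

# Sketch of a training-time RepVGG block: three parallel branches (3x3 conv, 1x1 conv,
# identity), each followed by BN, summed and passed through ReLU. The identity branch
# only exists when input/output channels match and stride is 1.
class RepVGGBlockSketch(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.branch_3x3 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.branch_1x1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, padding=0, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.branch_id = nn.BatchNorm2d(out_ch) if (in_ch == out_ch and stride == 1) else None
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.branch_3x3(x) + self.branch_1x1(x)
        if self.branch_id is not None:
            y = y + self.branch_id(x)
        return self.relu(y)
```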

2. Related research done by the author

        1. The development of networks from single-path to multi-branch) VGG is a single-path structure with no residual edges, and its accuracy was respectable at the time, but it was surpassed once ResNet and DenseNet were proposed. However, very large networks such as DenseNet may not be trainable on ordinary GPUs, which limits their practicality; they also suffer from slow inference and low parallelism.

        2. The effectiveness of single-path training) The author ran comparative experiments on single-path and multi-branch models: the multi-branch model clearly trains better and predicts more accurately. Therefore, the design does not abandon the multi-branch structure; it keeps it for training and fuses it into a single path at inference time.

       3. Model re-parameterization) The author notes that the re-parameterization used in DiracNet differs from RepVGG's: DiracNet re-parameterizes the network during training, whereas RepVGG remains a genuine multi-branch structure during training.

        4. Why use 3x3 convolution) 3x3 convolutions are heavily optimized by GPU compute libraries (e.g., via the Winograd algorithm), so their computational density is higher than that of other kernel sizes. The author also ran related experiments on this point.

3. Advantages of RepVGG

        1) Speed: after re-parameterization, inference speed improves greatly, which helps model deployment and practicality. 2) Memory savings: a multi-branch model must keep the outputs of every branch in memory until they are added, which leads to large memory consumption; a single-path model avoids this.

3) Flexibility: multi-branch structures require the input and output channels of the branches to match so they can be added, which makes the model inconvenient to modify. A single-path structure does not have this problem.

4. How to achieve structural reparameterization

        As mentioned above, each layer of RepVGG has three branches during training: identity, 1x1 convolution, and 3x3 convolution. During training, the output of such a layer is y = x + g(x) + f(x), where g is the 1x1 branch and f is the 3x3 branch, so every layer carries three blocks of parameters, and an n-layer training-time network can be viewed as an implicit ensemble of 3^n paths. We therefore re-parameterize the model so that at inference time each layer is a single 3x3 convolution, keeping the deployed model small and simple.
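Written out for one layer (my notation, not taken verbatim from the paper), the goal of re-parameterization is to find a single kernel W' and bias b' that reproduce the sum of the three BN-normalized branches:

```latex
y \;=\; \mathrm{BN}_{3}\bigl(x * W_{3\times3}\bigr)
   \;+\; \mathrm{BN}_{1}\bigl(x * W_{1\times1}\bigr)
   \;+\; \mathrm{BN}_{0}(x)
   \;=\; x * W' + b'
```

The two questions below address exactly these steps.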

        

         Two questions arise. First, every convolution is followed by a BN layer, so how do we fuse a convolution with its BN? Second, the branches use kernels of different sizes (3x3, 1x1, identity), so how do we fuse convolutions of different sizes into one?

        For the first question, let the input feature map be M and the convolution kernel be W (the branch convolutions carry no bias, since a BN layer always follows). The branch output before BN is simply Conv(M) = M * W.

         BN then normalizes each output channel i using its accumulated mean μ_i and variance σ_i², and applies a learned scale γ_i and shift β_i:

       BN(M * W)_i = γ_i · ((M * W)_i − μ_i) / sqrt(σ_i² + ε) + β_i

        Factoring out the feature map M and collecting the constant terms into a bias gives
        BN(M * W)_i = (M * W'_i) + b'_i,  where  W'_i = (γ_i / sqrt(σ_i² + ε)) · W_i  and  b'_i = β_i − γ_i · μ_i / sqrt(σ_i² + ε).

 So, for each of the three branches, convolution followed by BN is equivalent to a single convolution of M with a fused kernel plus a bias.
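In code, this fusion can be sketched as follows (a standard conv+BN folding routine written for illustration; it is not copied from the official repository, and it assumes the convolution has no bias, as in the block sketch above):

```python
import torch
import torch.nn as nn

# Fold a BN layer into the preceding convolution: returns (kernel, bias) such that
# conv(x, kernel, bias) == bn(conv(x)) when BN uses its running statistics.
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    std = torch.sqrt(bn.running_var + bn.eps)               # sqrt(sigma^2 + eps), per channel
    scale = (bn.weight / std).reshape(-1, 1, 1, 1)           # gamma / std, broadcast over the kernel
    fused_kernel = conv.weight * scale                        # W' = (gamma / std) * W
    fused_bias = bn.bias - bn.running_mean * bn.weight / std  # b' = beta - mu * gamma / std
    return fused_kernel, fused_bias
```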

For the second question: a 1x1 kernel can be expanded to 3x3 by zero-padding it (placing the 1x1 weight at the center and filling the rest with zeros), and the identity branch can be written as a 1x1 (and therefore 3x3) convolution whose kernel maps each channel to itself. The three resulting 3x3 kernels (and their biases) are then simply added, producing the single 3x3 convolution used at inference. This completes the re-parameterization.
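A sketch of this merging step (again my own illustration, reusing the fuse_conv_bn helper above; it assumes groups=1 and equal input/output channels for the identity branch):

```python
import torch
import torch.nn.functional as F

# Merge the BN-fused 3x3 and 1x1 branches (and optionally the identity-BN branch)
# into a single 3x3 kernel and bias.
def merge_branches(kernel_3x3, bias_3x3, kernel_1x1, bias_1x1, bn_id=None):
    # Pad the 1x1 kernel to 3x3 by placing its weight at the center.
    kernel_1x1_as_3x3 = F.pad(kernel_1x1, [1, 1, 1, 1])

    kernel = kernel_3x3 + kernel_1x1_as_3x3
    bias = bias_3x3 + bias_1x1

    if bn_id is not None:
        # The identity branch is a "do nothing" conv: a 3x3 kernel whose center is 1
        # on the matching input/output channel and 0 elsewhere, then fused with its BN.
        out_ch, in_ch = kernel_3x3.shape[0], kernel_3x3.shape[1]
        id_kernel = torch.zeros(out_ch, in_ch, 3, 3)
        for c in range(out_ch):
            id_kernel[c, c, 1, 1] = 1.0
        std = torch.sqrt(bn_id.running_var + bn_id.eps)
        kernel = kernel + id_kernel * (bn_id.weight / std).reshape(-1, 1, 1, 1)
        bias = bias + bn_id.bias - bn_id.running_mean * bn_id.weight / std

    return kernel, bias
```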

5. Model deployment

        The author designs both smaller and larger RepVGG variants. The number of layers in the five stages is [1, 2, 4, 14, 1] for the smaller models and [1, 4, 6, 16, 1] for the larger ones; the smaller family is called RepVGG-A and the larger family RepVGG-B. The channel widths of the stages are set to [64a, 128a, 256a, 512b], where a and b are width multipliers.

 Why this design? The author explains that the first stage processes the input at high resolution, so its channel count should not be too large or the computation would be excessive. The last stage is given width 512b because it contains only one layer, so making it wider adds capacity (more parameters) at little cost. The fourth stage contains the most blocks (14 or 16), following the architectural layout of ResNet.
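As a small illustration (my own code; the multiplier values passed below are example numbers, not necessarily those of any released variant), the stage widths follow directly from a and b:

```python
# Sketch: compute the per-stage channel widths [64a, 128a, 256a, 512b] described above.
def repvgg_stage_widths(a: float, b: float):
    return [int(64 * a), int(128 * a), int(256 * a), int(512 * b)]

print(repvgg_stage_widths(a=1.0, b=2.5))   # -> [64, 128, 256, 1280]
```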

6. Experimental comparison

        The author proposes seven models and compares them with mainstream models on ImageNet:

 

 The experiments show that RepVGG's inference speed is high: although it has more parameters than ResNet, replacing the training-time model with the re-parameterized inference model proves very effective. The figure below reports the speed of RepVGG-B0 without re-parameterization; compared with the figure above, the speed drops sharply, meaning re-parameterization brings a speedup of nearly 50%.

 In short, the structural re-parameterization technique in RepVGG is well worth learning from. It is very friendly to model deployment, and re-parameterization is becoming increasingly popular (YOLOv6, for example, uses this approach).
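As a closing illustration, the helpers sketched earlier (RepVGGBlockSketch, fuse_conv_bn, merge_branches, all hypothetical code of mine rather than the official repository's API) can be combined to convert a training-time block into a single 3x3 convolution and check that the two produce identical outputs:

```python
import torch
import torch.nn as nn

# Convert the (hypothetical) training-time block into one conv and verify equivalence.
block = RepVGGBlockSketch(in_ch=64, out_ch=64, stride=1).eval()
deploy_conv = nn.Conv2d(64, 64, 3, stride=1, padding=1, bias=True)

with torch.no_grad():
    k3, b3 = fuse_conv_bn(block.branch_3x3[0], block.branch_3x3[1])   # 3x3 conv + BN
    k1, b1 = fuse_conv_bn(block.branch_1x1[0], block.branch_1x1[1])   # 1x1 conv + BN
    kernel, bias = merge_branches(k3, b3, k1, b1, bn_id=block.branch_id)
    deploy_conv.weight.copy_(kernel)
    deploy_conv.bias.copy_(bias)

    x = torch.randn(1, 64, 32, 32)
    y_train = block(x)                      # multi-branch training-time output
    y_deploy = torch.relu(deploy_conv(x))   # plain single-conv inference-time output
    print(torch.allclose(y_train, y_deploy, atol=1e-5))   # should print True
```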
