Model Pruning with YOLOv5

Introduction

To understand model pruning, we take the object detection algorithm YOLOv5 as an example and walk through the pruning process.

1. Basic concepts

1. Model compression reduces a model's computational cost and storage footprint by compressing or streamlining it, thereby improving its efficiency and speed. There are two main approaches:

(1) Weight quantization: convert the model's floating-point parameters to low-precision ones, for example 32-bit floating-point numbers to 8-bit integers or 4-bit floating-point numbers (a minimal sketch follows this list).

(2) Model pruning: remove unnecessary parameters and connections from the model to reduce computation and memory usage, for example by deleting unimportant convolution kernels or pruning away sparse connections.
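
As a concrete illustration of weight quantization, here is a minimal sketch using PyTorch's built-in dynamic quantization to store a model's linear-layer weights as 8-bit integers. The toy model is a placeholder; this is generic PyTorch, not part of the yolov5prune workflow:

```python
import torch
import torch.nn as nn

# Toy FP32 model standing in for any trained network (placeholder, not YOLOv5).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Dynamic quantization: weights of the listed layer types are stored as int8
# and dequantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # the Linear layers are replaced by dynamically quantized ones
```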

2. Pruning and compressing YOLOv5

1. The object detection model YOLOv5 detects very well, but the model is fairly large, and simply shrinking the network input to reduce computation (e.g., 640 -> 320) degrades detection quality considerably. Instead, L1 regularization is added to constrain the BN layer scaling coefficients, making them sparse. After sparse training, channels whose coefficients are very small (and whose corresponding activations are therefore also very small) are cut out, which has little impact on model performance. Iterating this process repeatedly yields a very efficient model. Those are the basic steps.

2. Detailed explanation: the scaling factor γ in the BN layer is associated with each channel of the convolutional layer. These scaling factors are sparsity-regularized during training so that unimportant channels are identified automatically. Channels with small scaling factor values are then trimmed. Pruning yields a compact model, which is fine-tuned to reach accuracy comparable to (or even higher than) that of the normally trained full network.

Further explanation: add a sparsity penalty on the BN layers and train until the BN scaling factors become sparse. After sparse training, collect and sort the weights of all BN layers in the model, and from the specified number of channels to retain derive the weight threshold thres. Traverse the BN layer weights and create a mask for each layer (1 where the weight > thres, 0 where the weight <= thres). For the pruning operation, construct a new model structure according to each layer's mask (i.e., the number of channels retained per layer), take the indices of the non-zero values of BN weight * mask, and assign the weights, biases, and other values of the corresponding channels of the original conv, BN, and linear layers to each layer of the new model. Finally, load the pruned model and perform fine-tuning training. A sketch of the sparsity penalty and the threshold/mask logic follows.
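
This recipe follows the network-slimming idea: the training objective becomes the original loss plus λ · Σ|γ| over all BN scaling factors. Below is a minimal PyTorch sketch of the two mechanical pieces, the L1 subgradient on the BN gammas and the threshold/mask computation. The sparsity rate sr and prune_ratio are illustrative values, not the repo's defaults, and the real prune.py additionally has to propagate the kept-channel indices through YOLOv5's concat and shortcut connections:

```python
import torch
import torch.nn as nn

def add_bn_l1_grad(model: nn.Module, sr: float = 1e-4) -> None:
    """After loss.backward(), add the L1 subgradient sr * sign(gamma)
    to the gradient of every BN scaling factor."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(sr * torch.sign(m.weight.detach()))

def compute_bn_threshold(model: nn.Module, prune_ratio: float = 0.5) -> float:
    """Collect all BN gammas, sort them, and return the pruning threshold thres."""
    gammas = torch.cat([
        m.weight.detach().abs().flatten()
        for m in model.modules() if isinstance(m, nn.BatchNorm2d)
    ])
    sorted_gammas, _ = torch.sort(gammas)
    return sorted_gammas[int(len(sorted_gammas) * prune_ratio)].item()

def build_masks(model: nn.Module, thres: float) -> dict:
    """Per-BN-layer 0/1 masks: 1 keeps a channel, 0 prunes it."""
    return {
        name: (m.weight.detach().abs() > thres).float()
        for name, m in model.named_modules()
        if isinstance(m, nn.BatchNorm2d)
    }
```

The non-zero indices of each mask (mask.nonzero()) then determine which channels' conv, BN, and linear weights and biases get copied into the narrower model.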

PS: Regularization:

Regularization introduces an additional term into the original loss function to prevent overfitting and improve the model's generalization; that is, the objective becomes the original loss plus a penalty term. The most common penalties are L1 regularization and L2 regularization.
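
In code, this just means adding penalty terms to the task loss; a generic sketch (the l1/l2 coefficients are arbitrary illustrative hyperparameters):

```python
import torch
import torch.nn as nn

def regularized_loss(task_loss: torch.Tensor, model: nn.Module,
                     l1: float = 0.0, l2: float = 0.0) -> torch.Tensor:
    """Objective = original loss + optional L1 and L2 penalty terms."""
    penalty = sum(l1 * p.abs().sum() + l2 * p.pow(2).sum()
                  for p in model.parameters())
    return task_loss + penalty
```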

3. Here we use a dataset of pedestrians holding dangerous objects as the training set and yolov5prune as the training code (GitHub: midasklr/yolov5prune).

(1) First, run train.py for normal training.

Full model trained with YOLOv5: size 13.6 MB, speed 0.9 ms (413 images).

(2) Then perform sparsity training with train_sparsity.py.

Model after sparsity training: size 13.6 MB, speed 0.8 ms (413 images).

(3) After training completes, run the pruning operation with prune.py.

Model after pruning: size 9.67 MB, speed 0.8 ms (413 images).

(4) After pruning completes, fine-tune with finetune_pruned.py.

Model after pruning and fine-tuning: size 4.95 MB, speed 0.8 ms.
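
Fine-tuning is ordinary training resumed from the pruned weights, with the sparsity penalty switched off. Schematically (the checkpoint path and hyperparameters below are placeholders, not the repo's defaults; this assumes a YOLOv5-style checkpoint that stores the module under the "model" key):

```python
import torch

# Load the pruned checkpoint (path is a placeholder).
ckpt = torch.load("pruned_model.pt", map_location="cpu")
model = ckpt["model"].float().train()  # YOLOv5-style checkpoints keep the module here

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# From here, run the usual YOLOv5 training loop *without* the BN L1 term
# until accuracy recovers to (or surpasses) the dense baseline.
```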

 

Source: blog.csdn.net/qq_39149619/article/details/132035033