Start yolov5 model pruning from scratch

1 Introduction

[The whole process (normal training, sparse training, pruning, and fine-tuning) ran into more than 10 problems, including AttributeError, ModuleNotFoundError, RuntimeError, SyntaxError, and TypeError. The solutions are described below.]

The goal is to port an existing model to the ARM platform while preserving its accuracy and reducing its compute cost and inference time.

We previously ran experiments comparing YOLOv5, YOLOv7, and YOLOv8 on inference time and accuracy, and read a good deal of material, including many blog posts. Based on those results and on other people's experience, we felt that yolov5 generalizes better. Therefore, for training our own model and deploying it on X86 and ARM platforms, we chose to train and prune yolov5 so that a small, lightweight model can be deployed.

Of course, we also need to perform INT8 quantization on the final model to reduce the inference time of target detection.

2 Get the source code from GitHub

Download the source code from the following path:

https://github.com/midasklr/yolov5prune/tree/v6.0

This article prunes version 6.0 from the GitHub repository above.

3 Principles

[A brief summary of the principles of yolov5 model pruning, based on several blog posts and articles]

3.1 Principle

Principle paper: Learning Efficient Convolutional Networks through Network Slimming

ref: Pruning Filters for Efficient ConvNets( https://arxiv.org/abs/1608.08710 )

ref: https://blog.csdn.net/qq_42835363/article/details/129125376?spm=1001.2014.3001.5501

ref: https://blog.csdn.net/IEEE_FELLOW/article/details/117236025

ref: Model pruning on Yolov5_5.0

The input passes through the BN (Batch Normalization) layer to obtain a normalized distribution. The BN layer has two trainable parameters, γ (gamma) and β (beta).

When γ and β tend to 0, the normalized input is effectively multiplied by 0, so the convolution channel outputs 0 and carries no information. Removing such redundant channels can therefore be considered to have no impact on model performance.

In ordinary training, γ is generally distributed around 1 because of its initialization. To push γ toward 0, an L1 regularization term on γ is added to make the coefficients sparse. The paper calls training with this L1 penalty on γ sparse training.
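For reference, the BN transform and the sparse-training objective can be written in the notation of the Network Slimming paper. The symbols μ_B, σ_B², ε, W, f, l, Γ and λ come from the paper, not from this article; the --sr option set in section 4.4.1 appears to play the role of λ, but that mapping is an assumption.

```latex
\hat{z} = \frac{z_{in} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},
\qquad
z_{out} = \gamma \, \hat{z} + \beta

L = \sum_{(x, y)} l\big(f(x, W),\, y\big) + \lambda \sum_{\gamma \in \Gamma} |\gamma|
```

The first term of L is the ordinary training loss; the second term is the L1 penalty that pushes the γ values toward 0.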

After sparse training, the channels whose γ is close to 0 are pruned. Their activations are also very small, so removing them has little effect on the following layers. Repeating this process yields a small model. The steps are shown in Figure 1.

[Figure 1: steps of the pruning process (image missing)]

3.2 Network slimming process

① First initialize the network, add L1 regularization to the parameters of the BN layer and train the network.

② Collect the γ (gamma) values in the network and set a pruning rate to prune the network.

③ Finetune the pruned network to complete the pruning work.
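Step ② can be illustrated with a minimal pure-Python sketch. It assumes a single global threshold taken from the sorted |γ| values of all BN layers; the function name prune_masks, the layer names and the γ values are made up for illustration and are not taken from prune.py.

```python
def prune_masks(bn_gammas, percent):
    """Return the global threshold and per-layer keep masks.

    bn_gammas: dict mapping a layer name to its list of BN gamma values.
    percent:   fraction of all channels to prune (0 <= percent < 1).
    """
    # Pool |gamma| over every BN layer and sort ascending.
    pooled = sorted(abs(g) for gammas in bn_gammas.values() for g in gammas)
    # The value at the percent position is the global threshold.
    threshold = pooled[int(len(pooled) * percent)]
    # Keep the channels whose |gamma| reaches the threshold.
    masks = {name: [abs(g) >= threshold for g in gammas]
             for name, gammas in bn_gammas.items()}
    return threshold, masks


# Hypothetical gamma values for two BN layers.
gammas = {"conv1": [0.9, 0.01, 0.5, 0.002], "conv2": [0.03, 0.7, 0.8, 0.6]}
threshold, masks = prune_masks(gammas, percent=0.5)
print(threshold)       # 0.6
print(masks["conv1"])  # [True, False, False, False]
print(masks["conv2"])  # [False, True, True, True]
```

A real implementation also has to make sure that every layer keeps at least one channel and that layers joined by shortcut connections keep matching channel counts; this sketch ignores both.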

4 Specific implementation steps

4.1 Install virtual environment

Unzip the downloaded source code, enter the yolov5prune_6.0 directory, and run the following commands in order

# 1 Create the virtual environment
conda create -n yolov5prune
# 2 Activate the virtual environment
conda activate yolov5prune
# 3 Install the dependencies (from requirements.txt in the yolov5prune_6.0 root directory)
pip install -r requirements.txt

4.2 Configuration parameters

4.2.1 Dataset parameters

Your own dataset should be organized as follows

--datasTrain
------images
----------train     	# training images (.jpg)
----------val
----------test
------labels
----------train			# label files (.txt) for the training images
----------val
----------test
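Before training, it can help to check that the images and labels folders line up. The following is a minimal sketch, assuming .jpg images and one .txt label file per image with the same file stem; the helper name missing_labels is hypothetical and is not part of the project.

```python
import tempfile
from pathlib import Path


def missing_labels(root, split="train", image_suffix=".jpg"):
    """List images in images/<split> that have no labels/<split>/<stem>.txt."""
    root = Path(root)
    image_dir = root / "images" / split
    label_dir = root / "labels" / split
    return sorted(
        image.name
        for image in image_dir.glob(f"*{image_suffix}")
        if not (label_dir / f"{image.stem}.txt").is_file()
    )


# Demo on a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images" / "train").mkdir(parents=True)
    (root / "labels" / "train").mkdir(parents=True)
    for stem in ("a", "b"):
        (root / "images" / "train" / f"{stem}.jpg").touch()
    (root / "labels" / "train" / "a.txt").touch()
    unlabeled = missing_labels(root)

print(unlabeled)  # ['b.jpg']
```

An image without a label file is not necessarily an error, since it may be an intentional background image, so the list is a prompt for review rather than a hard failure.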

In the /yolov5prune_6.0/data/ directory, create the my_yolov5.yaml file following the structure in coco128.yaml. The contents are as follows

# Train/val/test sets as 
# 1) dir: path/to/imgs, 
# 2) file: path/to/imgs.txt, or 
# 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: /home/user/hlj/MyTrain/datasTrain3_More/  # dataset root dir
train: images/train/  # train images (relative to 'path') 128 images
val: images/val/      # val images (relative to 'path') 128 images
test:  images/test/   # test images (optional)


nc: 11  # number of classes
names: ['pedes', 'car', 'bus', 'truck', 'bike', 'elec', 'tricycle', 'coni', 'warm', 'tralight', 'specVehi']

4.2.2 Model structure parameters

Modify the number of classes in yolov5prune_6.0/models/yolov5s.yaml so that it matches the number of classes in your own dataset, as follows

nc: 11

4.2.3 Parameters in train.py

Set the parameters in train.py, mainly including the following:

'--weights', default='./yolov5s.pt'  # I commented this argument out because I want to train from scratch
'--cfg', default='./models/yolov5s.yaml'
'--data', default='./data/my_yolov5.yaml'
'--epochs', default=300 		# set fairly high because training starts from scratch
'--batch-size', default=-1
'--imgsz', default=640			# chosen with deployment in mind
4.3 Normal training

4.3.1 Preparation

Since I am working over SSH, first create a tmux session

tmux new -s prunesession

If you detach from the session (press ctrl+b, then d), use the following command to re-attach next time

tmux a -t prunesession

Inside the session, enter the project directory and activate the virtual environment (skip the activation if it is already active)

cd ../yolov5prune_6.0/
source activate yolov5prune

Delete the session after training

tmux kill-session -t prunesession

4.3.2 Training and problem solving

Execute the following command to perform training

python3 train.py

【Question 1】

After running the train.py file, the following error was reported

ModuleNotFoundError: No module named 'utils.loggers.wandb'

The error says a package is missing. Following a fix others have reported, download the official yolov5 v6.0 code (Ultralytics), then copy the entire wandb folder from yolov5_6.0\utils\loggers\ into yolov5prune_6.0\utils\loggers.

【Question 2】

After running python3 train.py again, the following error is reported. It shows that the '--weights' argument in train.py cannot simply be commented out.

AttributeError: 'Namespace' object has no attribute 'weights'

Therefore, set the 'weights' argument as follows, which means no pre-trained weights are used and the model is trained from scratch.

'--weights', default=''
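The reason is that argparse only creates attributes for arguments that are registered, so code that later reads opt.weights fails once the add_argument call is commented out. A stripped-down sketch follows; parse_opt here is a simplified stand-in, not the real train.py parser.

```python
import argparse


def parse_opt(argv=None):
    parser = argparse.ArgumentParser()
    # An empty default keeps opt.weights defined while requesting no checkpoint.
    parser.add_argument("--weights", type=str, default="", help="initial weights path")
    parser.add_argument("--cfg", type=str, default="./models/yolov5s.yaml")
    return parser.parse_args(argv)


opt = parse_opt([])
print(repr(opt.weights))  # ''
print(bool(opt.weights))  # False, so a "use pretrained weights" check is skipped

# A namespace without the registered argument has no such attribute at all.
print(hasattr(argparse.Namespace(), "weights"))  # False
```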

【Question 3】

For some reason, a numpy error was also reported on Ubuntu, as follows. The problem does not occur when running locally.

raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.

The cause is that newer numpy releases (1.24 and later) no longer provide np.int, which can be solved by modifying the source code.

In datasets.py in the yolov5prune_6.0/utils/ directory, change every ...astype(np.int) to ...astype(int), as shown below (lines 441, 483, and 854):

bi = np.floor(np.arange(n) / batch_size).astype(int)  # batch index
self.batch_shapes = np.ceil(np.array(shapes) * img_size / stride + pad).astype(int) * stride
b = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(int)

In general.py in the same directory, make the same change (lines 510 and 525):

classes = labels[:, 0].astype(int)  # labels = [class xywh]
class_counts = np.array([np.bincount(x[:, 0].astype(int), minlength=nc) for x in labels])

【Question 4】

File "/home/user/hlj/MyTrain/yolov5prune_6.0/utils/loss.py", line 217, in build_targets
indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) 
RuntimeError: result type Float can't be cast to the desired output type long int

Following https://blog.csdn.net/Thebest_jack/article/details/125649451, make the changes below:

Modify the source code of loss.py in the yolov5prune_6.0/utils/ directory,

# (1) Around line 183
for i in range(self.nl):
    anchors, shape = self.anchors[i], p[i].shape   # was: anchors = self.anchors[i]
    gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain
# (2) Near line 218, replace
# indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))
# with
indices.append((b, a, gj.clamp_(0, shape[2] - 1), gi.clamp_(0, shape[3] - 1)))  # image, anchor, grid indices

【Question 5】

During an epoch, the following error is reported:

File "..../yolov5prune_6.0/utils/plots.py", line 116, in text
w, h = self.font.getsize(text)  # text width, height
AttributeError: 'FreeTypeFont' object has no attribute 'getsize'

This happens because a newer version of Pillow is installed; Pillow 10 removed the getsize method. Downgrading to Pillow 9.5 solves the problem, see [Question 8]:

pip install Pillow==9.5

【Question 6】

After epoch 0 and its validation pass finish, the following error is reported

File ".....\yolov5prune_6.0\utils\callbacks.py", line 77, in run
    logger['callback'](*args, **kwargs)
TypeError: on_fit_epoch_end() missing 1 required positional argument: 'fi'

Copy the entire loggers folder from yolov5_6.0/utils/ in the official source code into the project, which should fix it. The error is probably caused by mismatched versions.

【Question 7】

yolov5prune_6.0/utils/general.py line471
return re.sub(pattern="[|@#!?·$€%&()=??^*;:,¨′><+]", repl="_", string=s)
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xa1 in position 6: invalid start byte

The file apparently contains bytes that are not valid UTF-8. I added the following encoding declaration, but that did not solve the problem. The function in question only cleans a string (it replaces special characters with underscores), so I edited that line directly, which has no effect on the rest of the program.

# -*- coding: utf-8 -*-
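One way to rewrite that line so the source file contains only ASCII bytes is to spell the non-ASCII characters as \u escapes. This is a sketch, not the exact edit I made; the character set is reconstructed from the garbled line above and is therefore an assumption, so adjust it as needed.

```python
import re

# Non-ASCII characters are written as \u escapes, so this file stays pure ASCII:
# \u00a1 inverted !, \u00b7 middle dot, \u20ac euro sign,
# \u00bf inverted ?, \u00a8 diaeresis, \u00b4 acute accent.
_SPECIAL = re.compile("[|@#!\u00a1\u00b7$\u20ac%&()=?\u00bf^*;:,\u00a8\u00b4><+]")


def clean_str(s):
    """Replace special characters with an underscore."""
    return _SPECIAL.sub("_", s)


print(clean_str("a|b@c"))     # a_b_c
print(clean_str("car(big)"))  # car_big_
```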

【Summary】

Despite this series of problems, and with [Question 5] left unresolved for now, python3 train.py eventually ran normally.
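The two version-related problems above ([Question 3] and [Question 5]) can be caught before training starts. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the version boundaries (numpy 1.24, Pillow 10) are the releases that removed np.int and getsize respectively.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '1.24.3' into (1, 24, 3); suffixes such as 'rc1' are ignored."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def compatibility_warnings(installed):
    """installed maps a distribution name to its version string."""
    warnings = []
    if "numpy" in installed and version_tuple(installed["numpy"]) >= (1, 24):
        warnings.append("numpy >= 1.24: np.int is gone, use int (Question 3)")
    if "Pillow" in installed and version_tuple(installed["Pillow"]) >= (10,):
        warnings.append("Pillow >= 10: FreeTypeFont.getsize is gone (Question 5)")
    return warnings


def installed_versions(names=("numpy", "Pillow")):
    """Look up the installed versions, skipping packages that are absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


for line in compatibility_warnings(installed_versions()):
    print(line)
```

Running it inside the activated yolov5prune environment prints one line per affected package and nothing when both versions are old enough.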

4.4 Sparse training

4.4.1 Parameter settings

Set the parameters of train_sparity.py

'--st', action='store_true',default=True,
'--sr', type=float, default=0.0001,
'--weights', type=str, default=ROOT / '',
'--cfg', type=str, default='./models/yolov5s.yaml',
'--data', type=str, default='./data/my_yolov5.yaml',
'--epochs', type=int, default=300
'--batch-size', type=int, default=-1,   # see [Question 8]
'--imgsz', '--img', '--img-size', type=int, default=640,
'--adam', action='store_true', default=True, 

4.4.2 Sparse training and problems

Execute the following command to perform sparse training

python train_sparity.py

【Question 8】

loggers.on_params_update({"batch_size": batch_size})
AttributeError: 'Loggers' object has no attribute 'on_params_update'

This seems to be caused by autobatch, so I first changed '--batch-size', type=int, default=-1 to a fixed value, default=2. After that, epoch 0 ran normally, but the problem from [Question 5] was still there. Although it does not affect training, it is an AttributeError and should be fixed. The solution is as follows:

# The installed Pillow is too new; the getsize attribute has been removed in recent releases.
pip3 uninstall pillow
pip3 install pillow==9.5

【Question 9】

After validation for epoch 0 finished, the following error was reported

File "/home/user/hlj/MyTrain/yolov5prune_6.0/utils/callbacks.py", line 77, in run
logger['callback'](*args, **kwargs)
TypeError: Loggers.on_fit_epoch_end() takes 5 positional arguments but 6 were given

This problem arises because, to solve [Question 6], I replaced the project's utils/loggers/__init__.py with the official file. In the pruning project's __init__.py the method is defined as def on_fit_epoch_end(self, vals, bn_weights, epoch, best_fitness, fi), while the official version has no bn_weights parameter. Copy the __init__.py from the pruning project back into this project.
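The mismatch can be confirmed by inspecting the method signature before training. In this sketch the two classes are hypothetical stand-ins: the first mirrors the signature implied by the error message (5 positional arguments including self), the second the signature quoted above.

```python
import inspect


class OfficialLoggers:
    """Stand-in for the official file: no bn_weights parameter."""

    def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
        pass


class PruneLoggers:
    """Stand-in for the pruning project's file: accepts bn_weights."""

    def on_fit_epoch_end(self, vals, bn_weights, epoch, best_fitness, fi):
        pass


def accepts_bn_weights(loggers_cls):
    """True if on_fit_epoch_end of the given class takes a bn_weights argument."""
    params = inspect.signature(loggers_cls.on_fit_epoch_end).parameters
    return "bn_weights" in params


print(accepts_bn_weights(OfficialLoggers))  # False
print(accepts_bn_weights(PruneLoggers))     # True
```

The same check can be pointed at the real Loggers class imported from utils.loggers, assuming that is where the project defines it.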

4.5 Pruning

4.5.1 Parameter settings

Set the pruning ratio; you can try values from small to large. Note that the model file given by cfg must correspond to the weights, otherwise mismatched keys are reported while prune.py runs. After pruning completes, the resulting model is saved as pruned_model.pt.

In the prune.py file, modify the following parameters

'--data', type=str, default=ROOT / 'data/my_yolov5.yaml',
'--weights', nargs='+', type=str, default=ROOT / 'runs/train/spaweight/last.pt'
'--cfg', type=str, default='./models/yolov5s.yaml',
'--percent', type=float, default=0.1,
'--batch-size', type=int, default=16, 
'--imgsz', '--img', '--img-size', type=int, default=640,

Then run

python prune.py

【Question 10】

SyntaxError: Non-UTF-8 code starting with '\xe5' in file /home/user/hlj/MyTrain/yolov5prune_6.0/prune.py on line 400, but no encoding declared; see https://peps.python.org/pep-0263/ for details

Solution: the line in question contains a comment in a non-UTF-8 encoding. Delete the comment or replace the Chinese text with English.

【Question 11】

return func(*args, **kwargs)
TypeError: run() got an unexpected keyword argument 'cfg'

The solution is to add the following parameter to the run() function in prune.py:

cfg = './models/yolov5s.yaml'

4.5.2 Pruning

Prune the sparsely trained model (best.pt).

If the parameters have already been set in the file, run prune.py directly

python prune.py

Otherwise, pass the weights obtained from sparse training on the command line

python prune.py --weights runs/train/exp_sparity/weights/best.pt --percent 0.5 --cfg models/yolov5s.yaml

After pruning is complete, the resulting model pruned_model.pt is saved in the root directory.

4.6 Fine-tune the pruned network

4.6.1 Parameter settings

Change the relevant parameters of finetune_pruned.py as follows

'--weights', type=str, default=ROOT / 'pruned_model.pt',
'--cfg', type=str, default='./models/yolov5s.yaml',
'--data', type=str, default=ROOT / 'data/my_yolov5.yaml', 
'--epochs', type=int, default=100
'--batch-size', type=int, default=16, 
'--imgsz', '--img', '--img-size', type=int, default=640,
'--adam', action='store_true', default=True, 
'--workers', type=int, default=8, 
'--project', default=ROOT / 'runs/finetune',

4.6.2 Fine-tuning

If the parameters in finetune_pruned.py have not been modified, pass them on the command line

python finetune_pruned.py --weights pruned_model.pt --adam --epochs 100

Since I modified the parameters in finetune_pruned.py directly, I simply ran

python finetune_pruned.py

During execution, the error from [Question 9] is reported again. After applying the same solution, fine-tuning starts normally.

4.7 Iterating sparse training -> pruning -> fine-tuning

This time I did only one round of sparse training -> pruning -> fine-tuning. Neither normal training nor sparse training used a pre-trained model. The comparison covers the fine-tuning results after pruning the model at different ratios (10%, 20%, 30%).

Table 2 Comparison of target detection model pruning

                 Normal     Sparse     Before    Fine-tuned   Fine-tuned   Fine-tuned
                 training   training   pruning   Prune0.1     Prune0.2     Prune0.3
modelSize (M)    13.8       13.8       13.8      12.3         10.5         8.68
P                0.915      0.923      0.919     0.916        0.899        0.892
R                0.832      0.808      0.816     0.829        0.84         0.842
mAP@0.5          0.885      0.879      0.881     0.887        0.887        0.888
mAP@0.5:.95      0.642      0.628      0.633     0.639        0.64         0.642

(1) Pruning 30% of the yolov5s model is feasible while keeping mAP essentially unchanged (mAP@0.5 is 0.888 after fine-tuning versus 0.885 for normal training).

(2) Because models are task-specific, it is usually hard to train a single model with both high accuracy and strong generalization, so pruning and fine-tuning often have to be done for each project's data and model.

(3) I tested the fine-tuned model on real-time video streams. Because the model was trained at an input size of 640, recall for distant targets is low, and detections are missed when intersection traffic is dense.
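The size reduction behind observation (1) can be worked out from the modelSize row of Table 2. This is plain arithmetic on the table values; the variable names are only illustrative.

```python
baseline_mb = 13.8  # model size before pruning, from Table 2
pruned_mb = {"Prune0.1": 12.3, "Prune0.2": 10.5, "Prune0.3": 8.68}

# Relative reduction in model size, in percent, rounded to one decimal.
reduction = {
    name: round(100 * (1 - size / baseline_mb), 1)
    for name, size in pruned_mb.items()
}

for name, percent in reduction.items():
    print(f"{name}: {percent}% smaller")
# Prune0.1: 10.9% smaller
# Prune0.2: 23.9% smaller
# Prune0.3: 37.1% smaller
```

So a pruning ratio of 0.3 shrinks the file by about 37%, which is more than the nominal ratio.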

Origin blog.csdn.net/qq_42835363/article/details/132500571