[Analysis] Caching training data in RAM to improve YOLOv5 training performance

The experiments use YOLOv5 v7.0 with the m model configuration, which is convenient for testing, and take the COCO 2017 dataset as an example.

As the results show, training from RAM brings no significant improvement. A likely reason is that the SSD already reads fast enough to keep up with the GPU's throughput, whereas training from RAM takes up a large amount of memory. If you have plenty of memory, you can try training from RAM; if memory is limited, consider putting the dataset on a faster solid-state drive instead.
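Whether the SSD can keep up with the GPU can be sanity-checked with a rough calculation. The sketch below assumes that COCO train2017 occupies about 18 GB on disk (the size of the official train2017 archive) and that mosaic augmentation, which is on by default in YOLOv5, reads about 4 images per training sample; the epoch time 28:29 comes from the batch-size-16 log. Both the 18 GB figure and the 4 reads per sample are assumptions, and the helper names are hypothetical.

```python
# Back-of-envelope check: how much disk bandwidth does one training epoch need?
# Assumptions (not measured in the original post): train2017 is about 18 GB of JPEGs,
# and mosaic augmentation reads 4 images for every training sample.


def mmss_to_seconds(mmss: str) -> int:
    """Convert a tqdm-style 'MM:SS' elapsed time into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def required_read_mb_per_s(dataset_bytes: float, epoch_time: str, reads_per_sample: int = 4) -> float:
    """Average read rate (MB/s) needed to stream the dataset during one epoch.

    Each sample triggers `reads_per_sample` image reads, so the bytes read per
    epoch are roughly dataset_bytes * reads_per_sample.
    """
    seconds = mmss_to_seconds(epoch_time)
    return dataset_bytes * reads_per_sample / seconds / 1e6


if __name__ == "__main__":
    coco_train_bytes = 18e9  # assumed on-disk size of train2017
    rate = required_read_mb_per_s(coco_train_bytes, "28:29")  # epoch time from the log
    print(f"about {rate:.1f} MB/s of image reads needed")
```

Under these assumptions the epoch needs only about 42 MB/s of image reads, a small fraction of what an NVMe SSD such as the TiPro7000 can typically deliver, which is consistent with the observation that caching in RAM barely changes the training time.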

In addition, when the system runs many processes, they compete for CPU time and disk bandwidth. This slows down data loading, leaves the GPU waiting for batches, and lowers training throughput. In that situation, training from RAM is recommended.
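To check whether data loading rather than the GPU is the bottleneck, one can time how long each training step waits for its next batch. The following is a minimal, framework-agnostic sketch; `profile_data_wait` is a hypothetical helper, not part of YOLOv5. With a real CUDA model, the step function should call `torch.cuda.synchronize()` before returning so that asynchronous GPU work is counted as compute time.

```python
import time
from typing import Callable, Iterable, Tuple


def profile_data_wait(loader: Iterable, step: Callable[[object], None]) -> Tuple[float, float]:
    """Split wall-clock time into time spent waiting for batches vs. time spent in `step`.

    Returns (wait_seconds, step_seconds). A large wait share suggests the input
    pipeline (disk reads, JPEG decoding, augmentation on the CPU) is starving the GPU.
    """
    wait = compute = 0.0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time blocked on the data pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # time spent in the training step itself
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait, compute


if __name__ == "__main__":
    def slow_loader(n: int, delay: float):
        """Fake loader that sleeps to mimic slow disk reads and decoding."""
        for i in range(n):
            time.sleep(delay)
            yield i

    wait, compute = profile_data_wait(slow_loader(5, 0.01), lambda batch: None)
    print(f"waiting for data: {100 * wait / (wait + compute):.0f}% of the time")
```

If the wait share stays high while other processes are running, moving the data closer to the GPU (RAM cache, faster SSD, more dataloader workers) is more likely to help than a faster GPU.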

""" The author's machine configuration """

CPU         : Intel Core i7-13700K
GPU         : NVIDIA GeForce RTX 4090
SSD         : ZHITAI TiPro7000
RAM         : Kingston FURY DDR5-6000 EXPO 16GB x 4
Motherboard : ASUS ROG STRIX Z690-G

Experiment   Batch size   Memory usage   Training time (mm:ss)
ram          16           28.7GB         28:29
disk         16           7.2GB          28:17
ram          auto         29.5GB         20:24
disk         auto         8.3GB          20:29
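The size of the effect can be quantified directly from the numbers in the table above. The short script below is only arithmetic on those four rows; the helper names are illustrative.

```python
# Relative training-time and memory differences (ram vs. disk), using the table's numbers.
RESULTS = {
    # (source, batch size): (memory usage in GB, training time "MM:SS")
    ("ram", "16"): (28.7, "28:29"),
    ("disk", "16"): (7.2, "28:17"),
    ("ram", "auto"): (29.5, "20:24"),
    ("disk", "auto"): (8.3, "20:29"),
}


def to_seconds(mmss: str) -> int:
    """Convert 'MM:SS' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def compare(batch: str) -> tuple:
    """Return (extra memory in GB, time difference in s, time difference in %) of ram vs. disk."""
    ram_mem, ram_time = RESULTS[("ram", batch)]
    disk_mem, disk_time = RESULTS[("disk", batch)]
    delta_s = to_seconds(ram_time) - to_seconds(disk_time)
    return round(ram_mem - disk_mem, 1), delta_s, 100 * delta_s / to_seconds(disk_time)


for batch in ("16", "auto"):
    extra_gb, delta_s, delta_pct = compare(batch)
    print(f"batch {batch}: ram uses {extra_gb} GB more, time diff {delta_s:+d} s ({delta_pct:+.2f}%)")
```

Both differences stay below 1% of the epoch time (+12 s at batch size 16, -5 s with AutoBatch), while the RAM runs use about 21 GB more memory.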

Moving the data into RAM
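In YOLOv5 v7.0, images are cached in RAM with the `--cache ram` option of `train.py`, and AutoBatch is selected with `--batch-size -1`. The post does not give the exact commands, so the hypothetical helper below only sketches the four runs: the weights, image size and epoch count are taken from the logs (yolov5m.pt, 640, 100 epochs), while the dataset YAML name `coco.yaml` is an assumption. Because the "From Disk" log shows no caching step, the disk runs are assumed to use no `--cache` flag at all.

```python
import shlex
from typing import List


def train_command(source: str, batch: str) -> List[str]:
    """Hypothetical builder for the YOLOv5 train.py command of one experiment.

    source: "ram" (images cached in memory) or "disk" (no caching).
    batch:  a batch size such as "16", or "auto" for AutoBatch.
    """
    cmd = ["python", "train.py",
           "--weights", "yolov5m.pt",
           "--data", "coco.yaml",  # assumed dataset YAML
           "--img", "640",
           "--epochs", "100",
           "--batch-size", "-1" if batch == "auto" else batch]  # -1 selects AutoBatch
    if source == "ram":
        # decoded, resized images are kept in RAM instead of being re-read from disk
        cmd += ["--cache", "ram"]
    return cmd


if __name__ == "__main__":
    for source in ("ram", "disk"):
        for batch in ("16", "auto"):
            print(shlex.join(train_command(source, batch)))
```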

Experimental results

Batch size 16, YOLOv5-7.0 m model
# From ram
Transferred 481/481 items from yolov5m.pt
AMP: checks passed 
optimizer: SGD(lr=0.01) with parameter groups 79 weight(decay=0.0), 82 weight(decay=0.0005), 82 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), 
				CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning G:\coco2017\labels\train2017.cache... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 00:00
train: 95.1GB RAM required, 42.9/63.7GB available, not caching images 
val: Scanning G:\coco2017\labels\val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 00:00
val: Caching images (4.1GB ram): 100%|██████████| 5000/5000 00:01

AutoAnchor: 4.45 anchors/target, 0.995 Best Possible Recall (BPR). Current anchors are a good fit to dataset 
Plotting labels to runs\train\exp5\labels.jpg... 
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs\train\exp5
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       0/99      5.72G    0.03863    0.05964    0.01552        206        640: 100%|██████████| 7393/7393 28:29
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 157/157 00:28
                   all       5000      36335       0.69      0.562      0.606      0.415


#---------------------------------------------------------------------------------------------------------------------------------

# From Disk
Transferred 481/481 items from yolov5m.pt
AMP: checks passed 
optimizer: SGD(lr=0.01) with parameter groups 79 weight(decay=0.0), 82 weight(decay=0.0005), 82 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), 
				CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning G:\coco2017\labels\train2017.cache... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 00:00
val: Scanning G:\coco2017\labels\val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 00:00

AutoAnchor: 4.45 anchors/target, 0.995 Best Possible Recall (BPR). Current anchors are a good fit to dataset 
Plotting labels to runs\train\exp4\labels.jpg... 
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs\train\exp4
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       0/99      5.72G    0.03863    0.05964    0.01552        206        640: 100%|██████████| 7393/7393 28:17
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 157/157 00:27
                   all       5000      36335       0.69      0.562      0.606      0.415
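Note that the "From ram" log contains the line `train: 95.1GB RAM required, 42.9/63.7GB available, not caching images`: with batch size 16, only the validation images (4.1GB) were actually cached, and the training images were still read from disk. The sketch below illustrates the kind of check behind that message, extrapolating the memory needed from a random sample of image sizes and comparing it, plus a safety margin, with the available RAM. It is a simplified illustration modelled on the log message, not YOLOv5's actual code; the function names, the 30-image sample and the 10% margin are assumptions.

```python
import random
from typing import Sequence, Tuple

GB = 1 << 30  # bytes per GiB; assumed to be the unit behind the log's "GB"


def estimate_cache_bytes(shapes: Sequence[Tuple[int, int]], n_images: int,
                         img_size: int = 640, channels: int = 3,
                         n_sample: int = 30, seed: int = 0) -> float:
    """Extrapolate the RAM needed to hold every image resized so its long side is img_size.

    shapes: (height, width) of images on disk (hypothetical input for illustration).
    A random subset of n_sample images is measured and scaled up to n_images.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(shapes), min(n_sample, len(shapes)))
    total = 0.0
    for h, w in sample:
        ratio = img_size / max(h, w)            # resize factor applied before caching
        total += h * w * channels * ratio ** 2  # uint8 bytes after resizing
    return total * n_images / len(sample)


def should_cache(required_bytes: float, available_bytes: float, safety_margin: float = 0.1) -> bool:
    """Cache only if the estimate plus a safety margin fits in the available RAM."""
    return required_bytes * (1 + safety_margin) < available_bytes


if __name__ == "__main__":
    required, available = 95.1 * GB, 42.9 * GB  # figures from the "From ram" log
    print("caching images" if should_cache(required, available) else "not caching images")
```

With the log's figures, the training set (95.1GB) does not fit into the 42.9GB of free memory, while the validation set (4.1GB) does.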

[Screenshots not preserved in this copy: AutoBatch from RAM, from RAM, AutoBatch from disk, from disk]


Original post: blog.csdn.net/ViatorSun/article/details/130198476