The experiments use YOLOv5 v7.0 with the m model configuration, which is convenient for testing. The COCO dataset is used as an example (the training logs in this post show the coco2017 train and val splits).
The results show that training from RAM brings no significant improvement. The likely reason is that the disk already reads fast enough to keep up with the GPU's throughput, while training from RAM takes up a lot of memory. If you have enough memory, you can try training from RAM. If memory is limited, consider switching to a faster solid-state drive as the data disk.
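Whether RAM training is even possible comes down to a simple comparison between the memory the dataset needs and the memory that is free. The sketch below reproduces that decision using the figures from the training log in this post ("95.1GB RAM required, 42.9/63.7GB available, not caching images"). The function name `can_cache_in_ram` and the 10% safety margin are assumptions made for illustration, not something taken from the YOLOv5 source.

```python
def can_cache_in_ram(required_gb: float, available_gb: float,
                     safety_margin: float = 0.1) -> bool:
    """Return True if the data fits in free RAM with some headroom.

    safety_margin is an assumed value (10% extra on top of the estimate).
    """
    return required_gb * (1 + safety_margin) < available_gb


# Figures taken from the training log in this post.
train_fits = can_cache_in_ram(required_gb=95.1, available_gb=42.9)
val_fits = can_cache_in_ram(required_gb=4.1, available_gb=42.9)

print(f"train set fits in RAM: {train_fits}")
print(f"val set fits in RAM:   {val_fits}")
```

With these numbers the train set does not fit while the val set does, which matches the log: only the 5000 val images are cached.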
In addition, if many processes are running on the system, CPU performance and disk read speed drop, which slows the data feed to the GPU and hurts compute performance. In that situation, training from RAM is recommended.
""" The author's host configuration """
CPU : Intel 13700k
GPU : Nvidia 4090
Disk : ZhiTai TiPro7000
Memory : Kingston FURY D5 6000 EXPO 16G x 4
Motherboard : ASUS ROG STRIX Z690-G
experiment | batch size | memory usage | training time per epoch (mm:ss) |
---|---|---|---|
ram | 16 | 28.7GB | 28:29 |
disk | 16 | 7.2GB | 28:17 |
ram | auto | 29.5GB | 20:24 |
disk | auto | 8.3GB | 20:29 |
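To put a number on "no significant improvement", the times in the table can be converted to seconds and compared. This is only a small arithmetic sketch; the helper names `to_seconds` and `relative_diff_percent` are made up for this example.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string such as '28:29' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_diff_percent(ram_time: str, disk_time: str) -> float:
    """Difference of the RAM run relative to the disk run, in percent.

    Positive means the RAM run was slower than the disk run.
    """
    ram_s, disk_s = to_seconds(ram_time), to_seconds(disk_time)
    return (ram_s - disk_s) / disk_s * 100


# (label, RAM time, disk time) copied from the table above.
runs = [("batch 16", "28:29", "28:17"), ("auto batch", "20:24", "20:29")]
for label, ram_t, disk_t in runs:
    diff = relative_diff_percent(ram_t, disk_t)
    print(f"{label}: RAM vs disk = {diff:+.2f}%")
```

In both cases the gap is under 1%, and it even changes sign between the two batch settings, so it is probably within run-to-run noise.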
Move Data to ram
Experimental results
Batch size 16, YOLOv5-7.0 m model
# From ram
Transferred 481/481 items from yolov5m.pt
AMP: checks passed
optimizer: SGD(lr=0.01) with parameter groups 79 weight(decay=0.0), 82 weight(decay=0.0005), 82 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01),
CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning G:\coco2017\labels\train2017.cache... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 00:00
train: 95.1GB RAM required, 42.9/63.7GB available, not caching images
val: Scanning G:\coco2017\labels\val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 00:00
val: Caching images (4.1GB ram): 100%|██████████| 5000/5000 00:01
AutoAnchor: 4.45 anchors/target, 0.995 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Plotting labels to runs\train\exp5\labels.jpg...
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs\train\exp5
Starting training for 100 epochs...
Epoch GPU_mem box_loss obj_loss cls_loss Instances Size
0/99 5.72G 0.03863 0.05964 0.01552 206 640: 100%|██████████| 7393/7393 28:29
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 157/157 00:28
all 5000 36335 0.69 0.562 0.606 0.415
#---------------------------------------------------------------------------------------------------------------------------------
# From Disk
Transferred 481/481 items from yolov5m.pt
AMP: checks passed
optimizer: SGD(lr=0.01) with parameter groups 79 weight(decay=0.0), 82 weight(decay=0.0005), 82 bias
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01),
CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning G:\coco2017\labels\train2017.cache... 117266 images, 1021 backgrounds, 0 corrupt: 100%|██████████| 118287/118287 00:00
val: Scanning G:\coco2017\labels\val2017.cache... 4952 images, 48 backgrounds, 0 corrupt: 100%|██████████| 5000/5000 00:00
AutoAnchor: 4.45 anchors/target, 0.995 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Plotting labels to runs\train\exp4\labels.jpg...
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs\train\exp4
Starting training for 100 epochs...
Epoch GPU_mem box_loss obj_loss cls_loss Instances Size
0/99 5.72G 0.03863 0.05964 0.01552 206 640: 100%|██████████| 7393/7393 28:17
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 157/157 00:27
all 5000 36335 0.69 0.562 0.606 0.415
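The logs above also give a rough idea of why the disk is not the bottleneck. The sketch below derives iterations per second and images per second from the "From Disk" run (7393 iterations, batch size 16, 28:17), then estimates the read bandwidth that rate requires. The average image size of 160 KB is an assumption about the COCO JPEG files, not a measured value, and `epoch_throughput` is a name invented for this example.

```python
def epoch_throughput(iterations: int, batch_size: int, elapsed_mmss: str) -> tuple:
    """Return (iterations per second, images per second) for one epoch.

    Slightly overestimates images/s because the last batch may be partial.
    """
    minutes, seconds = elapsed_mmss.split(":")
    elapsed_s = int(minutes) * 60 + int(seconds)
    return iterations / elapsed_s, iterations * batch_size / elapsed_s


# Assumption: average size of one JPEG on disk, in KB (not measured).
ASSUMED_AVG_IMAGE_KB = 160

# Values from the "From Disk" log: 7393 iterations, batch size 16, 28:17.
it_per_s, img_per_s = epoch_throughput(7393, 16, "28:17")
read_mb_per_s = img_per_s * ASSUMED_AVG_IMAGE_KB / 1024

print(f"{it_per_s:.2f} it/s, {img_per_s:.1f} images/s")
print(f"approx. read bandwidth needed: {read_mb_per_s:.1f} MB/s")
```

Under that assumption the training loop needs on the order of 10 MB/s of reads, which is far below what an NVMe solid-state drive can sustain. That is consistent with the explanation that the disk already keeps up with the GPU.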
Figures: AutoBatch from RAM; from RAM; AutoBatch from disk