1. First, here is my computer's configuration. Press Win+R, open cmd, type dxdiag, and the DirectX Diagnostic Tool opens so you can see it.
This is only to show that the machine has a graphics card installed. Setting up the Anaconda environment, creating the interpreter, and installing PyTorch are well covered by tutorials online, so I won't list those steps here.
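The same check can also be scripted. The sketch below is only an illustration and assumes the NVIDIA driver has put nvidia-smi on the PATH; the helper name query_gpus is made up for this example. It asks nvidia-smi for the name and total memory of each card and returns an empty list when the tool is missing.

```python
import shutil
import subprocess


def query_gpus():
    """Return one 'name, memory' string per GPU, or [] if nvidia-smi is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        proc = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return []
    if proc.returncode != 0:
        return []
    # One line per GPU; drop blank lines.
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = query_gpus()
    print(gpus if gpus else "no NVIDIA GPU reported by nvidia-smi")
```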
2. Run train.py and look at the output:
C:\Users\admin\.conda\envs\yolov7_ch\python.exe C:\dev\yolov7\train.py
YOLOR v0.1-116-g8c0bf3f torch 1.13.1+cu116 CUDA:0 (NVIDIA GeForce RTX 3080, 10239.5MB)
Namespace(adam=False, artifact_alias='latest', batch_size=16, bbox_interval=-1, bucket='', cache_images=False, cfg='', data='data/coco.yaml', device='', entity=None, epochs=300, evolve=False, exist_ok=False, freeze=[0], global_rank=-1, hyp='data/hyp.scratch.p5.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs\\train\\exp27', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=16, upload_dataset=False, v5_metric=False, weights='yolov7.pt', workers=1, world_size=1)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.2, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.15, copy_paste=0.0, paste_in=0.15, loss_ota=1
wandb: Install Weights & Biases for YOLOR logging with 'pip install wandb' (recommended)
Overriding model.yaml nc=80 with nc=1
from n params module arguments
0 -1 1 928 models.common.Conv [3, 32, 3, 1]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 36992 models.common.Conv [64, 64, 3, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 8320 models.common.Conv [128, 64, 1, 1]
5 -2 1 8320 models.common.Conv [128, 64, 1, 1]
6 -1 1 36992 models.common.Conv [64, 64, 3, 1]
7 -1 1 36992 models.common.Conv [64, 64, 3, 1]
8 -1 1 36992 models.common.Conv [64, 64, 3, 1]
9 -1 1 36992 models.common.Conv [64, 64, 3, 1]
10 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
11 -1 1 66048 models.common.Conv [256, 256, 1, 1]
12 -1 1 0 models.common.MP []
13 -1 1 33024 models.common.Conv [256, 128, 1, 1]
14 -3 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 147712 models.common.Conv [128, 128, 3, 2]
16 [-1, -3] 1 0 models.common.Concat [1]
17 -1 1 33024 models.common.Conv [256, 128, 1, 1]
18 -2 1 33024 models.common.Conv [256, 128, 1, 1]
19 -1 1 147712 models.common.Conv [128, 128, 3, 1]
20 -1 1 147712 models.common.Conv [128, 128, 3, 1]
21 -1 1 147712 models.common.Conv [128, 128, 3, 1]
22 -1 1 147712 models.common.Conv [128, 128, 3, 1]
23 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
24 -1 1 263168 models.common.Conv [512, 512, 1, 1]
25 -1 1 0 models.common.MP []
26 -1 1 131584 models.common.Conv [512, 256, 1, 1]
27 -3 1 131584 models.common.Conv [512, 256, 1, 1]
28 -1 1 590336 models.common.Conv [256, 256, 3, 2]
29 [-1, -3] 1 0 models.common.Concat [1]
30 -1 1 131584 models.common.Conv [512, 256, 1, 1]
31 -2 1 131584 models.common.Conv [512, 256, 1, 1]
32 -1 1 590336 models.common.Conv [256, 256, 3, 1]
33 -1 1 590336 models.common.Conv [256, 256, 3, 1]
34 -1 1 590336 models.common.Conv [256, 256, 3, 1]
35 -1 1 590336 models.common.Conv [256, 256, 3, 1]
36 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
37 -1 1 1050624 models.common.Conv [1024, 1024, 1, 1]
38 -1 1 0 models.common.MP []
39 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
40 -3 1 525312 models.common.Conv [1024, 512, 1, 1]
41 -1 1 2360320 models.common.Conv [512, 512, 3, 2]
42 [-1, -3] 1 0 models.common.Concat [1]
43 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
44 -2 1 262656 models.common.Conv [1024, 256, 1, 1]
45 -1 1 590336 models.common.Conv [256, 256, 3, 1]
46 -1 1 590336 models.common.Conv [256, 256, 3, 1]
47 -1 1 590336 models.common.Conv [256, 256, 3, 1]
48 -1 1 590336 models.common.Conv [256, 256, 3, 1]
49 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
50 -1 1 1050624 models.common.Conv [1024, 1024, 1, 1]
51 -1 1 7609344 models.common.SPPCSPC [1024, 512, 1]
52 -1 1 131584 models.common.Conv [512, 256, 1, 1]
53 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
54 37 1 262656 models.common.Conv [1024, 256, 1, 1]
55 [-1, -2] 1 0 models.common.Concat [1]
56 -1 1 131584 models.common.Conv [512, 256, 1, 1]
57 -2 1 131584 models.common.Conv [512, 256, 1, 1]
58 -1 1 295168 models.common.Conv [256, 128, 3, 1]
59 -1 1 147712 models.common.Conv [128, 128, 3, 1]
60 -1 1 147712 models.common.Conv [128, 128, 3, 1]
61 -1 1 147712 models.common.Conv [128, 128, 3, 1]
62[-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
63 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
64 -1 1 33024 models.common.Conv [256, 128, 1, 1]
65 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
66 24 1 65792 models.common.Conv [512, 128, 1, 1]
67 [-1, -2] 1 0 models.common.Concat [1]
68 -1 1 33024 models.common.Conv [256, 128, 1, 1]
69 -2 1 33024 models.common.Conv [256, 128, 1, 1]
70 -1 1 73856 models.common.Conv [128, 64, 3, 1]
71 -1 1 36992 models.common.Conv [64, 64, 3, 1]
72 -1 1 36992 models.common.Conv [64, 64, 3, 1]
73 -1 1 36992 models.common.Conv [64, 64, 3, 1]
74[-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
75 -1 1 65792 models.common.Conv [512, 128, 1, 1]
76 -1 1 0 models.common.MP []
77 -1 1 16640 models.common.Conv [128, 128, 1, 1]
78 -3 1 16640 models.common.Conv [128, 128, 1, 1]
79 -1 1 147712 models.common.Conv [128, 128, 3, 2]
80 [-1, -3, 63] 1 0 models.common.Concat [1]
81 -1 1 131584 models.common.Conv [512, 256, 1, 1]
82 -2 1 131584 models.common.Conv [512, 256, 1, 1]
83 -1 1 295168 models.common.Conv [256, 128, 3, 1]
84 -1 1 147712 models.common.Conv [128, 128, 3, 1]
85 -1 1 147712 models.common.Conv [128, 128, 3, 1]
86 -1 1 147712 models.common.Conv [128, 128, 3, 1]
87[-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
88 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
89 -1 1 0 models.common.MP []
90 -1 1 66048 models.common.Conv [256, 256, 1, 1]
91 -3 1 66048 models.common.Conv [256, 256, 1, 1]
92 -1 1 590336 models.common.Conv [256, 256, 3, 2]
93 [-1, -3, 51] 1 0 models.common.Concat [1]
94 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
95 -2 1 525312 models.common.Conv [1024, 512, 1, 1]
96 -1 1 1180160 models.common.Conv [512, 256, 3, 1]
97 -1 1 590336 models.common.Conv [256, 256, 3, 1]
98 -1 1 590336 models.common.Conv [256, 256, 3, 1]
99 -1 1 590336 models.common.Conv [256, 256, 3, 1]
100[-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
101 -1 1 1049600 models.common.Conv [2048, 512, 1, 1]
102 75 1 328704 models.common.RepConv [128, 256, 3, 1]
103 88 1 1312768 models.common.RepConv [256, 512, 3, 1]
104 101 1 5246976 models.common.RepConv [512, 1024, 3, 1]
105 [102, 103, 104] 1 32310 models.yolo.Detect [1, [[12, 16, 19, 36, 40, 28], [36, 75, 76, 55, 72, 146], [142, 110, 192, 243, 459, 401]], [256, 512, 1024]]
Model Summary: 407 layers, 37194710 parameters, 37194710 gradients
Transferred 554/560 items from yolov7.pt
Scaled weight_decay = 0.0005
Optimizer groups: 95 .bias, 95 conv.weight, 92 other
train: Scanning 'C:\img\yolo\people\labels\train.cache' images and labels... 10000 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 10000/10000 [00:00<?, ?it/s]
val: Scanning 'C:\img\yolo\people\labels\val.cache' images and labels... 300 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 300/300 [00:00<?, ?it/s]
autoanchor: Analyzing anchors... anchors/target = 4.43, Best Possible Recall (BPR) = 0.9948
Image sizes 640 train, 640 test
Using 1 dataloader workers
Logging results to runs\train\exp27
Starting training for 300 epochs...
Epoch gpu_mem box obj cls total labels img_size
0%| | 0/625 [00:03<?, ?it/s]
Traceback (most recent call last):
File "C:\dev\yolov7\train.py", line 619, in <module>
train(hyp, opt, device, tb_writer)
File "C:\dev\yolov7\train.py", line 362, in train
pred = model(imgs) # forward
File "C:\Users\admin\.conda\envs\yolov7_ch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\dev\yolov7\models\yolo.py", line 599, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "C:\dev\yolov7\models\yolo.py", line 625, in forward_once
x = m(x) # run
File "C:\Users\admin\.conda\envs\yolov7_ch\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "C:\dev\yolov7\models\common.py", line 507, in forward
return self.act(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 10.00 GiB total capacity; 9.25 GiB already allocated; 0 bytes free; 9.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Process finished with exit code 1
The run ends with an error, but the log shows that the GPU was detected and used:
YOLOR v0.1-116-g8c0bf3f torch 1.13.1+cu116 CUDA:0 (NVIDIA GeForce RTX 3080, 10239.5MB)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 10.00 GiB total capacity; 9.25 GiB already allocated; 0 bytes free; 9.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Is this problem related to video memory? According to the message it is: PyTorch tried to allocate another 26.00 MiB while 9.25 GiB of the card's 10.00 GiB was already allocated and 0 bytes were free.
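To read the numbers in that message more carefully, here is a small sketch that parses them and works out how much memory PyTorch had reserved but not handed out. The regular expressions assume the message layout shown in this log (torch 1.13.1); other versions may word it differently, and the helper name parse_oom is made up for this example.

```python
import re

# Error text copied from the traceback above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 10.00 GiB total "
    "capacity; 9.25 GiB already allocated; 0 bytes free; 9.30 GiB reserved in "
    "total by PyTorch)"
)

# Conversion factors from each unit in the message to MiB.
UNIT_TO_MIB = {"bytes": 1 / (1024 * 1024), "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
UNITS = r"(bytes|KiB|MiB|GiB)"


def parse_oom(message):
    """Extract the memory figures (in MiB) from an out-of-memory message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) " + UNITS,
        "total": r"([\d.]+) " + UNITS + r" total capacity",
        "allocated": r"([\d.]+) " + UNITS + r" already allocated",
        "free": r"([\d.]+) " + UNITS + r" free",
        "reserved": r"([\d.]+) " + UNITS + r" reserved",
    }
    figures = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            figures[key] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return figures


figures = parse_oom(MESSAGE)
# Memory held by PyTorch's caching allocator but not currently in use.
slack = figures["reserved"] - figures["allocated"]
print(f"requested: {figures['requested']:.1f} MiB")
print(f"reserved but unallocated: {slack:.1f} MiB")
```

The gap between reserved and allocated memory is only about 51 MiB against 9.25 GiB in use, so reserved memory is not much larger than allocated memory. On that reading the max_split_size_mb hint is probably not the main lever here; the card is simply close to full.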
3. Try increasing the Windows virtual memory on disk. It made little difference.
After changing the setting as shown in the figure, a restart is required. The change had no effect: the same error was reported.
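Since the error is about GPU memory rather than system memory, a common thing to try at this point is a smaller batch size. The sketch below is only an illustration, not something this run confirmed: the flag name --batch-size is inferred from batch_size in the Namespace printout, the values 8 and 4 are arbitrary examples, and the helper names are made up. It also shows how the batch size relates to the 625 iterations per epoch seen in the progress bar.

```python
import math

NUM_TRAIN_IMAGES = 10000   # from the "train: Scanning ..." log line
CURRENT_BATCH = 16         # batch_size in the Namespace printout


def steps_per_epoch(num_images, batch_size):
    """Number of iterations needed to see every image once."""
    return math.ceil(num_images / batch_size)


def train_command(batch_size, workers=1):
    """Build an argument list for train.py (--batch-size is an assumed flag name)."""
    return ["python", "train.py",
            "--batch-size", str(batch_size),
            "--workers", str(workers)]


for batch in (CURRENT_BATCH, 8, 4):
    steps = steps_per_epoch(NUM_TRAIN_IMAGES, batch)
    print(f"batch {batch:2d}: {steps} iterations per epoch ->", " ".join(train_command(batch)))
```

Halving the batch size doubles the number of iterations per epoch, so each epoch takes more steps, but each step needs less video memory.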
4. Check which module packages are installed. First, the result from pip:
(yolov7_ch) C:\dev\yolov7>pip list
Package Version
----------------------- ------------
absl-py 1.4.0
backcall 0.2.0
brotlipy 0.7.0
cachetools 5.3.0
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.0.4
colorama 0.4.6
cryptography 38.0.4
cv 1.0.0
cycler 0.11.0
decorator 5.1.1
fonttools 4.38.0
google-auth 2.16.1
google-auth-oauthlib 0.4.6
grpcio 1.51.3
idna 3.4
importlib-metadata 6.0.0
ipython 7.31.1
jedi 0.18.1
kiwisolver 1.4.4
Markdown 3.4.1
MarkupSafe 2.1.2
matplotlib 3.5.3
matplotlib-inline 0.1.6
numpy 1.21.6
oauthlib 3.2.2
opencv-contrib-python 4.7.0.72
opencv-python 4.7.0.72
packaging 23.0
panda 0.3.1
pandas 1.3.5
parso 0.8.3
pickleshare 0.7.5
Pillow 9.3.0
pip 22.3.1
prompt-toolkit 3.0.36
protobuf 3.20.3
psutil 5.9.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.21
Pygments 2.11.2
pyOpenSSL 22.0.0
pyparsing 3.0.9
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2022.7.1
PyYAML 6.0
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
scipy 1.7.3
seaborn 0.12.2
setuptools 65.6.3
six 1.16.0
tensorboard 2.11.2
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
torch 1.13.1+cu116
torchaudio 0.13.1+cu116
torchvision 0.14.1+cu116
tqdm 4.64.1
traitlets 5.7.1
typing_extensions 4.5.0
urllib3 1.26.14
wcwidth 0.2.5
Werkzeug 2.2.3
wheel 0.38.4
win-inet-pton 1.1.0
wincertstore 0.2
zipp 3.14.0
(yolov7_ch) C:\dev\yolov7>
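One thing worth reading out of this list is whether torch, torchaudio, and torchvision were all built for the same CUDA version. The sketch below checks the tag after the "+" in each version string; the three lines are copied from the listing above, and the helper name cuda_tag is made up for this example.

```python
# Lines copied from the pip list output above.
PIP_LIST_LINES = [
    "torch 1.13.1+cu116",
    "torchaudio 0.13.1+cu116",
    "torchvision 0.14.1+cu116",
]


def cuda_tag(version):
    """Return the local build tag after '+', e.g. 'cu116', or None if absent."""
    _, _, tag = version.partition("+")
    return tag or None


tags = {}
for line in PIP_LIST_LINES:
    name, version = line.split()
    tags[name] = cuda_tag(version)

# All three packages should carry the same CUDA tag.
consistent = len(set(tags.values())) == 1
print(tags, "consistent" if consistent else "MISMATCH")
```

Here all three packages carry cu116, which also matches the "torch 1.13.1+cu116" shown in the first line of the training log.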
5. Then the result from conda:
(yolov7_ch) C:\dev\yolov7>conda list
# packages in environment at C:\Users\admin\.conda\envs\yolov7_ch:
#
# Name Version Build Channel
absl-py 1.4.0 pypi_0 pypi
backcall 0.2.0 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
brotlipy 0.7.0 py37h2bbff1b_1003 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates 2023.01.10 haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cachetools 5.3.0 pypi_0 pypi
certifi 2022.12.7 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cffi 1.15.1 py37h2bbff1b_3 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
charset-normalizer 2.0.4 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
colorama 0.4.6 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cryptography 38.0.4 py37h21b164f_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
cv 1.0.0 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
decorator 5.1.1 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
fonttools 4.38.0 pypi_0 pypi
freetype 2.12.1 ha860e81_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
google-auth 2.16.1 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
grpcio 1.51.3 pypi_0 pypi
idna 3.4 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
importlib-metadata 6.0.0 pypi_0 pypi
ipython 7.31.1 py37haa95532_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jedi 0.18.1 py37haa95532_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
jpeg 9e h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
kiwisolver 1.4.4 pypi_0 pypi
lerc 3.0 hd77b12b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libdeflate 1.8 h2bbff1b_5 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libpng 1.6.37 h2a8f88b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libprotobuf 3.20.3 h23ce68f_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libtiff 4.5.0 h6c2663c_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libwebp 1.2.4 h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libwebp-base 1.2.4 h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
lz4-c 1.9.4 h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
markdown 3.4.1 pypi_0 pypi
markupsafe 2.1.2 pypi_0 pypi
matplotlib 3.5.3 pypi_0 pypi
matplotlib-inline 0.1.6 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
numpy 1.21.6 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
opencv-contrib-python 4.7.0.72 pypi_0 pypi
opencv-python 4.7.0.72 pypi_0 pypi
openssl 1.1.1t h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
packaging 23.0 pypi_0 pypi
panda 0.3.1 pypi_0 pypi
pandas 1.3.5 pypi_0 pypi
parso 0.8.3 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pickleshare 0.7.5 pyhd3eb1b0_1003 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pillow 9.3.0 py37hd77b12b_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pip 22.3.1 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
prompt-toolkit 3.0.36 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
protobuf 3.20.3 py37hd77b12b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
psutil 5.9.0 py37h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pygments 2.11.2 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyopenssl 22.0.0 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pyparsing 3.0.9 pypi_0 pypi
pysocks 1.7.1 py37_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python 3.7.16 h6244533_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-dateutil 2.8.2 pypi_0 pypi
pytz 2022.7.1 pypi_0 pypi
pyyaml 6.0 py37h2bbff1b_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
requests 2.28.1 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
seaborn 0.12.2 pypi_0 pypi
setuptools 65.6.3 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
six 1.16.0 pypi_0 pypi
sqlite 3.40.1 h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tensorboard 2.11.2 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tk 8.6.12 h2bbff1b_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
torch 1.13.1+cu116 pypi_0 pypi
torchaudio 0.13.1+cu116 pypi_0 pypi
torchvision 0.14.1+cu116 pypi_0 pypi
tqdm 4.64.1 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
traitlets 5.7.1 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
typing-extensions 4.5.0 pypi_0 pypi
urllib3 1.26.14 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
vc 14.2 h21ff451_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
vs2015_runtime 14.27.29016 h5e58377_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wcwidth 0.2.5 pyhd3eb1b0_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
werkzeug 2.2.3 pypi_0 pypi
wheel 0.38.4 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
win_inet_pton 1.1.0 py37haa95532_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wincertstore 0.2 py37haa95532_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xz 5.2.10 h8cc25b3_1 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
yaml 0.2.5 he774522_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zipp 3.14.0 pypi_0 pypi
zlib 1.2.13 h8cc25b3_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zstd 1.5.2 h19a0ad4_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
(yolov7_ch) C:\dev\yolov7>
A new problem appeared during the run. The solution suggested online is to set num_workers to 0. I did not know where this parameter is set, but it is related to this line of code in train.py:
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    ......
    parser.add_argument('--workers', type=int, default=8, help='maximum number of dataloader workers')
The default here is 8, and with that value running train.py reports:
[Errno 32] Broken pipe
This error appears because the computer cannot handle that many dataloader workers.
I changed default=8 to default=0, but the run still reported an error.
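Editing the default in the source is not the only way to change this value. As a minimal sketch, the cut-down parser below contains only the --workers line quoted above (the function name build_parser is made up) and shows that a value given on the command line takes priority over the default.

```python
import argparse


def build_parser():
    """A cut-down parser holding only the --workers argument quoted above."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--workers', type=int, default=8,
                        help='maximum number of dataloader workers')
    return parser


parser = build_parser()
# No arguments: the default written in the source file is used.
print(parser.parse_args([]).workers)
# Equivalent to running: python train.py --workers 0
print(parser.parse_args(['--workers', '0']).workers)
```

With 0 workers the data is loaded in the main process instead of in separate worker processes.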
Here is a test program. Create train_test.py with the following code, run it, and use nvidia-smi to check whether it is running on the GPU and on which one.
import torch
from torchvision.models import resnet152

if __name__ == '__main__':
    # Although cuda:0 is set here, the GPU actually used is GPU 1
    # is_available must be called; the bare function object is always truthy
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    if device.type == 'cuda':
        print(f'Current device: {torch.cuda.current_device()}')
    model = resnet152(num_classes=10)
    model.to(device)
    # Run 1000 forward passes through ResNet-152 with a batch size of 16
    for i in range(1000):
        X = torch.randn(16, 3, 224, 224).to(device)
        y = model(X)
        print(f'id:{i+1:3d}:{y}')
The program above confirms that the test really does run on the GPU. The device set here is cuda:0.
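One detail in a device-selection line of the form torch.device('cuda:0' if ... else 'cpu') is that torch.cuda.is_available is a function and has to be called with parentheses. The sketch below shows the difference without needing PyTorch at all: is_available here is a hypothetical stand-in that pretends CUDA is missing, and pick_device is a made-up helper.

```python
def pick_device(cuda_available):
    """Return the device string that would be handed to torch.device()."""
    return 'cuda:0' if cuda_available else 'cpu'


def is_available():
    """Stand-in for torch.cuda.is_available on a machine without CUDA (hypothetical)."""
    return False


# Without parentheses the function object itself is tested, and that is always truthy.
print(pick_device(is_available))      # cuda:0, even though CUDA is "missing"
# With parentheses the function is called and its result is tested.
print(pick_device(is_available()))    # cpu
```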
I found the solution to the next error in this blog post:
YOLOv7 error: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu), Ggggm_28's blog on CSDN, https://blog.csdn.net/Ggggm_28/article/details/129005769
In the yolov7/utils/loss.py file, add .to(torch.device('cuda:0')) after the corresponding code:
if (anchor_matching_gt > 1).sum() > 0:
    _, cost_argmin = torch.min(cost[:, anchor_matching_gt > 1], dim=0)
    matching_matrix[:, anchor_matching_gt > 1] *= 0.0
    matching_matrix[cost_argmin, anchor_matching_gt > 1] = 1.0
fg_mask_inboxes = matching_matrix.sum(0) > 0.0
fg_mask_inboxes = fg_mask_inboxes.to(torch.device('cuda:0'))
matched_gt_inds = matching_matrix[:, fg_mask_inboxes.to(torch.device('cuda:0'))].argmax(0)
from_which_layer = from_which_layer[fg_mask_inboxes.to(torch.device('cuda:0'))]
all_b = all_b[fg_mask_inboxes.to(torch.device('cuda:0'))]
all_a = all_a[fg_mask_inboxes.to(torch.device('cuda:0'))]
all_gj = all_gj[fg_mask_inboxes.to(torch.device('cuda:0'))]
all_gi = all_gi[fg_mask_inboxes.to(torch.device('cuda:0'))]
all_anch = all_anch[fg_mask_inboxes.to(torch.device('cuda:0'))]
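Hard-coding cuda:0 ties the fix to one particular GPU. A possible alternative, shown here only as a sketch and not as the YOLOv7 code, is to move the mask to whatever device the indexed tensor lives on. The helper index_on_same_device is hypothetical, and the import is guarded so the snippet still loads where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # lets the file load where PyTorch is not installed
    torch = None


def index_on_same_device(values, mask):
    """Index `values` with `mask` after moving the mask to the device of `values`."""
    return values[mask.to(values.device)]


selected = None
if torch is not None:
    values = torch.tensor([10, 20, 30])           # stays on the CPU
    mask = torch.tensor([True, False, True])
    # Put the mask on the GPU when there is one, to mimic the mismatch in the error.
    if torch.cuda.is_available():
        mask = mask.to('cuda:0')
    selected = index_on_same_device(values, mask).tolist()
    print(selected)
```

Because the mask follows the tensor it indexes, the same line works on the CPU, on cuda:0, or on any other card.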
Finally, it ran successfully. I used 10,000 training samples and 300 validation samples to train and see how it works.
Debugging the various errors took a long time. To be honest, my own level is not high enough, so I looked some problems up online and asked other people about others. When building the environment in particular, matching the versions of the module packages is very important. The environment I first created with Python 3.9 reported an error that I could not find a solution for, so I re-created it with Python 3.7. After many attempts it ran without problems, and the errors that came up were solved one by one as described above.
I hope this is a useful reference for others on the same learning journey.