First pitfall
After preparing the training set and configuring the yaml file, I ran the training command:
python train.py --data <path to my yaml file> --batch-size <number of images per forward pass before each backward pass> --device <my GPU index>
It failed with this error:
OSError: [WinError 1455] The paging file is too small to complete the operation.
After a lot of searching online, I found a solution. Search for "environment variables" in Cortana (the Windows search box).
Go to the Advanced tab of the System Properties dialog that opens.
Under Performance, click Settings, switch to the Advanced tab, and click Change in the Virtual memory section.
Because my training project is on the C: drive, I selected C: and chose a custom size. My disk is fairly large, so I allocated 100 GB to it for training.
The change only takes effect after a restart. Save all your files before restarting to avoid losing work!
With that, the first pitfall is cleared.
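Before handing 100 GB to the paging file, it may be worth checking that the drive actually has that much free space. The following is only a small sketch using Python's standard library; the helper name has_room_for_pagefile and the 10 GB safety reserve are my own assumptions, not part of Windows or YOLOv5.

```python
import shutil


def has_room_for_pagefile(drive: str, pagefile_gb: float, reserve_gb: float = 10.0) -> bool:
    """Return True if `drive` can hold a paging file of `pagefile_gb` GB
    while still leaving `reserve_gb` GB free.

    Both the helper and the default reserve are illustrative assumptions.
    """
    # shutil.disk_usage reports total, used and free space in bytes.
    free_gb = shutil.disk_usage(drive).free / 1024**3
    return free_gb >= pagefile_gb + reserve_gb
```

For the setup described here, the call would be has_room_for_pagefile("C:\\", 100). If it returns False, a smaller custom size or a different drive is probably the safer choice.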
Second pitfall
I then ran the training command a second time. This time both box_loss and obj_loss were nan, which was quite a surprise.
On the advice of a more experienced user, I opened train.py and found the statement
amp = check_amp(model) # check AMP
and replaced it with
amp = False
as shown in the picture.
That clears this pitfall. Setting amp to False turns off automatic mixed precision (AMP) training, but I do not yet understand why that was necessary here and will look into it later.
Training
I ran the training command a third time.
This time it ran normally.
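To double-check that the losses are no longer nan without watching the console, the logged results can be scanned with a few lines of Python. This is a rough sketch that assumes the run writes a CSV log (for example runs/train/exp/results.csv) with column names such as train/box_loss and train/obj_loss. The file location, the column names, and the helper columns_with_nan are assumptions on my part, so adjust them to whatever your run actually produces.

```python
import csv
import io
import math


def columns_with_nan(csv_text: str, keyword: str = "loss") -> list:
    """Return the sorted names of columns containing `keyword` that hold
    at least one nan value in the given CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    # Header names may be padded with spaces, so strip them.
    header = [name.strip() for name in next(reader)]
    bad = set()
    for row in reader:
        for name, value in zip(header, row):
            if keyword not in name:
                continue
            try:
                if math.isnan(float(value)):
                    bad.add(name)
            except ValueError:
                pass  # ignore empty or non-numeric cells
    return sorted(bad)
```

Reading the file and passing its text to the function, for example columns_with_nan(open("runs/train/exp/results.csv").read()), should return an empty list for a healthy run.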
Open cmd and enter the following command to view the training progress:
tensorboard --logdir=./runs/train/exp
The path ends in "exp" because the last folder in my training directory is the one from the successful run.
If this is not your first training run, change it to the name of the last folder in that directory.
After running this command, TensorBoard serves a local web page (by default at http://localhost:6006/) where the training can be previewed in real time.
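Finding the last folder by hand gets tedious after a few runs, so it can be looked up with a short script. The sketch below assumes the run folders under runs/train are named exp, exp2, exp3 and so on; the helper latest_run_dir is hypothetical and only illustrates the idea.

```python
import re
from pathlib import Path


def latest_run_dir(train_root):
    """Return the run folder with the highest numeric suffix
    (exp, exp2, exp3, ...) under `train_root`, or None if there is none."""
    root = Path(train_root)
    if not root.is_dir():
        return None
    best, best_n = None, 0
    for path in root.iterdir():
        match = re.fullmatch(r"exp(\d*)", path.name)
        if path.is_dir() and match:
            # A bare "exp" has no suffix and counts as run number 1.
            n = int(match.group(1) or 1)
            if n > best_n:
                best, best_n = path, n
    return best
```

The returned path can then be passed to tensorboard as the --logdir value, for example the result of latest_run_dir("./runs/train").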
That brings today's pitfall write-up to a successful close. On to the next giant pitfall.