Error example:
NaN appeared in the loss during training.
[train epoch 0] loss: 27.854: 6%|███████ | 7/126 [00:00<00:09, 12.64it/s]WARNING: non-finite loss, ending training tensor(nan, device='cuda:0', dtype=torch.float64, grad_fn=<MseLossBackward>)
[train epoch 0] loss: nan: 6%|███████▏
To resolve the error:
1. The learning rate is too large
Try a smaller learning rate. When you reduce the learning rate, you may also need to reduce the batch size and increase the number of epochs accordingly.
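To illustrate why an oversized learning rate ends in a non-finite loss, here is a minimal sketch on a toy problem: plain gradient descent on loss(w) = w * w, not the model from the log above. The helpers `train` and `find_stable_lr` are hypothetical names made up for this example, and halving is only one possible way to shrink the rate. In a real PyTorch loop the same finiteness test could be done with `torch.isfinite(loss)`.

```python
import math


def train(lr, steps=2000, w=1.0):
    """Run gradient descent on loss(w) = w * w.

    Returns (steps_completed, last_loss). Stops early as soon as the
    loss is no longer finite, much like the warning in the log.
    """
    for step in range(steps):
        loss = w * w
        if not math.isfinite(loss):
            return step, loss      # diverged: loss became inf or nan
        grad = 2.0 * w             # d(loss)/dw
        w = w - lr * grad          # standard gradient descent update
    return steps, w * w


def find_stable_lr(lr, min_lr=1e-6):
    """Halve the learning rate until training finishes with a finite loss.

    Returns the first learning rate that works, or None if none does.
    """
    while lr >= min_lr:
        _, loss = train(lr)
        if math.isfinite(loss):
            return lr
        lr /= 2.0
    return None
```

If `find_stable_lr` returns None even for very small rates, the learning rate is probably not the cause, and the data is the next thing to inspect.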
2. There is a problem with your data; check the dataset
If changing the learning rate repeatedly does not help, there is usually something wrong with your training dataset. If you are training with supervised data, check whether the dataset contains empty labels. I encountered this error.
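A check for empty labels can be sketched as follows. This assumes the dataset can be iterated as (features, label) pairs with numeric values, which may not match your own data format. `is_missing_or_non_finite` and `find_bad_samples` are hypothetical helper names for this example, not part of any library.

```python
import math


def is_missing_or_non_finite(value):
    """True if the value is empty (None or "") or is NaN / infinite."""
    if value is None or value == "":
        return True
    return not math.isfinite(float(value))


def find_bad_samples(samples):
    """Return the indices of samples with an empty label or bad values.

    `samples` is assumed to be an iterable of (features, label) pairs,
    where `features` is a sequence of numbers and `label` is a number.
    """
    bad = []
    for index, (features, label) in enumerate(samples):
        if is_missing_or_non_finite(label) or any(
            is_missing_or_non_finite(v) for v in features
        ):
            bad.append(index)
    return bad
```

Running this once over the training set before training starts lets you remove or repair the reported samples. A single NaN in the inputs or labels is enough to turn the whole loss into NaN.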