[Solution] NaN appears in the loss during neural network training

In this training loop, batch_size = 64, so the batches are

i=0 --> samples [0, 63],

i=64 --> samples [64, 127],

...

Training proceeds normally at first, but in the iteration i=320 --> samples [320, 383], the loss becomes NaN.
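As a rough illustration (not the original code), here is a minimal PyTorch-style sketch of this kind of mini-batch loop with a NaN check on the loss. The model, data, optimizer, and loss function are hypothetical placeholders.

```python
import torch

# Hypothetical placeholders standing in for the original model, data, and optimizer.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
inputs = torch.randn(1000, 10)
targets = torch.randn(1000, 1)

batch_size = 64
for i in range(0, len(inputs), batch_size):      # i = 0, 64, 128, ...
    x = inputs[i:i + batch_size]                 # samples [i, i + 63]
    y = targets[i:i + batch_size]

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)

    # Detect the failure described above: the loss turning into NaN mid-training.
    if torch.isnan(loss):
        print(f"loss became NaN at i={i} (batch [{i}, {i + batch_size - 1}])")
        break

    loss.backward()
    optimizer.step()
```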


Reason: gradient explosion

The referenced article explains the cause in more detail; here I only summarize it at a high level.

Inspecting the gradients shows that already at i=256 the parameter gradients are very large, on the order of 1e+16 or 1e+17.
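One way to observe this (a minimal sketch, not from the original post) is to print the largest absolute gradient of each parameter after backward(); values on the order of 1e+16 or 1e+17 point to exploding gradients. The model and data below are hypothetical placeholders just to make the snippet runnable.

```python
import torch

# Hypothetical placeholder model and loss, only to make the snippet self-contained.
model = torch.nn.Linear(10, 1)
loss = torch.nn.MSELoss()(model(torch.randn(64, 10)), torch.randn(64, 1))
loss.backward()

# Print the maximum absolute gradient per parameter.
# Magnitudes around 1e+16 or 1e+17, as reported above, indicate gradient explosion.
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name}: max |grad| = {param.grad.abs().max().item():.3e}")
```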


Method

Change batch_size from 64 to 32.

Note: this is the only method I have tried so far. You can also adjust the learning rate, or normalize/standardize the dataset. If changing batch_size stops working for me, I will try the other methods and update this post.
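As a rough sketch of the adjustments mentioned above (the specific values and placeholder names are illustrative, not from the original post):

```python
import torch

model = torch.nn.Linear(10, 1)   # hypothetical placeholder model

# 1. Reduce the batch size as described in the post: 64 -> 32.
batch_size = 32

# 2. Optionally lower the learning rate (value is illustrative only).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# 3. Optionally standardize the dataset: zero mean, unit variance per feature.
inputs = torch.randn(1000, 10) * 50 + 100            # hypothetical raw data
inputs = (inputs - inputs.mean(dim=0)) / (inputs.std(dim=0) + 1e-8)
```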


Origin: blog.csdn.net/azheng02/article/details/130521767