We generally assume that neural networks get stuck in local optima:
like a pothole in a 3D surface, one could swallow our network so it never climbs out.
Not so
In reality, the loss of our neural network is a function of a huge number of dimensions, one per parameter.
For the i-th dimension, we can roughly assume the loss has a 0.5 probability of curving up and a 0.5 probability of curving down at a given point.
For a genuine pothole (a local minimum) to form, the loss must curve upward in every single one of those dimensions at once.
That probability is roughly 0.5^n, and with n in the millions of parameters, you'd have better luck buying a lottery ticket.
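To make the odds concrete, here is a quick back-of-the-envelope calculation in Python (the 0.5 per-dimension probability is the rough assumption from above, not a measured value):

```python
# Sketch of the argument above: if the loss curves upward in each of n
# dimensions independently with probability 0.5, then the chance that
# ALL n curve upward at once (a true local minimum) is 0.5**n.
for n in (2, 10, 100, 1_000, 1_000_000):
    p = 0.5 ** n
    print(f"n = {n:>9}  P(all dimensions curve up) = {p:.3e}")
# n = 1_000_000 underflows to 0.0 in float64 -- far rarer than any lottery.
```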
So if training slows down, you have most likely hit a saddle point, not a local minimum.
When that happens, it's fine; just train for a few more epochs.
But whatever you do, don't increase the learning_rate.
I don't know exactly why, but after increasing the learning_rate, the acc quickly drops back to its initial value (for binary classification, that's 50%),
and training effectively restarts from scratch.
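For what it's worth, here is a minimal PyTorch-style sketch of the "just keep training" advice, with the learning_rate left untouched; `model`, `loader`, and the lr value are hypothetical placeholders, not anything from the original setup:

```python
import torch

# Minimal sketch: when the loss plateaus (likely a saddle point), run a
# few extra epochs at the SAME learning_rate instead of raising it.
model = torch.nn.Linear(10, 2)                       # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.CrossEntropyLoss()

def keep_training(model, optimizer, loader, extra_epochs=5):
    """Run a few more epochs; SGD's gradient noise tends to walk off saddles."""
    for _ in range(extra_epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()  # learning_rate unchanged: no acc crash back to 50%
```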