Deep Learning Theory - Overfitting, Underfitting, Regularization, Optimizers

Data augmentation: 1. Don't overdo it; excessive augmentation only increases training time without improving generalization. 2. Don't add irrelevant data.
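
A minimal sketch of a restrained augmentation pipeline, assuming torchvision; the specific transforms and parameters are illustrative, not prescribed by the post:

```python
# Minimal sketch (torchvision assumed): moderate, task-relevant augmentation,
# applied to training images only; validation/test data stays unaugmented.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),      # small geometric perturbation
    transforms.RandomHorizontalFlip(p=0.5),    # label-preserving for natural images
    transforms.ToTensor(),
])

test_transform = transforms.Compose([
    transforms.ToTensor(),                     # no augmentation at evaluation time
])
```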

L2 regularization: tends to capture the common characteristics of the training samples; it pushes the model toward small parameter values, which reduces the risk of overfitting.
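
A minimal sketch of how the penalty is usually applied in practice, assuming PyTorch; the model, learning rate, and lambda value are placeholders:

```python
# Minimal sketch (PyTorch assumed): two common ways to apply an L2 penalty.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)       # placeholder model
criterion = nn.MSELoss()

# (a) Built-in: weight_decay adds weight_decay * w to every gradient,
#     which corresponds to an L2 penalty on the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

# (b) Explicit: add lambda * sum(w^2) to the loss yourself.
def loss_with_l2(pred, target, lam=1e-4):
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return criterion(pred, target) + lam * l2
```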

 Several common optimizers

For sparse data, prefer an optimizer with an adaptive learning rate (e.g. Adagrad, RMSprop, Adam); it then needs no manual tuning of the learning rate, and the default hyperparameters usually work well.
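
A minimal sketch, assuming PyTorch: a large embedding table producing sparse gradients, paired with Adagrad at its default hyperparameters; the sizes and dummy loss are placeholders.

```python
# Minimal sketch (PyTorch assumed): sparse gradients from a large embedding
# table, optimized with Adagrad using its default hyperparameters.
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=100_000, embedding_dim=64, sparse=True)
optimizer = torch.optim.Adagrad(embedding.parameters())   # defaults, no tuning

token_ids = torch.randint(0, 100_000, (32,))   # a toy batch of indices
loss = embedding(token_ids).pow(2).mean()      # dummy loss for illustration
loss.backward()                                # yields sparse gradients
optimizer.step()
optimizer.zero_grad()
```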

Stochastic gradient descent (SGD) usually takes longer to train and tends to get stuck at saddle points, but with good initialization and a learning-rate schedule its results are more reliable.

Overall, Adam is currently the best choice.
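
A minimal sketch, assuming PyTorch, of the two setups contrasted above: SGD with momentum plus a learning-rate schedule, and Adam with its defaults; the model and hyperparameter values are placeholders.

```python
# Minimal sketch (PyTorch assumed): SGD with momentum and a schedule vs. Adam.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # placeholder model

# SGD: typically needs a tuned learning rate and a schedule to do its best.
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(sgd, step_size=30, gamma=0.1)

# Adam: adaptive per-parameter learning rates; the defaults (lr=1e-3) often work.
adam = torch.optim.Adam(model.parameters())
```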
