Deep learning hyperparameter tuning tips

My hyperparameter tuning has been weak recently, so I collected some tuning tips from around the internet and summarized them here, hoping they will help me improve.

Reference Zhihu: https://www.zhihu.com/question/25097993

0. First strip out redundant intermediate computations; add them back only after the hyperparameters have been tuned to the target accuracy. Saves time!!!

1. Pay more attention to the loss than to the accuracy.

2. Train on a small dataset first. If the accuracy does not improve, question whether the model is correct: a large network on a small dataset should overfit easily, so the training accuracy must be high (see the sketch below).
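A minimal PyTorch sketch of this sanity check, assuming `model`, `criterion`, and `full_dataset` are already defined (all names here are placeholders):

```python
import torch
from torch.utils.data import DataLoader, Subset

# Sanity check: a reasonably large network should drive the training loss
# close to zero on a tiny subset. If it cannot, suspect a bug in the model,
# the loss, or the data pipeline. `model`, `criterion`, `full_dataset` are
# assumed to exist.
tiny_set = Subset(full_dataset, range(64))                 # e.g. 64 samples
loader = DataLoader(tiny_set, batch_size=16, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

model.train()
for epoch in range(200):                                   # many passes over the tiny set
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss = {loss.item():.4f}")      # should approach 0
```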

3. Tune the learning rate first. lr settings:

Too large: the loss explodes or goes to NaN

Too small: the loss barely moves

Manual lr adjustment: if the loss is poorly designed, it can easily blow up at the start of training. Begin with a small lr to make sure it does not explode, slowly raise the lr once the loss is decreasing, and then slowly decay it again later (see the sketch below).
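A rough sketch of that manual schedule in PyTorch; the breakpoints and rates below are placeholders picked for illustration, and `optimizer` is assumed to be any `torch.optim` optimizer:

```python
def adjust_lr(optimizer, epoch):
    """Start small so the loss does not explode, raise lr once training
    is stable, then decay it slowly later. All numbers are placeholders."""
    if epoch < 5:
        lr = 1e-4                                   # small warm-up lr
    elif epoch < 30:
        lr = 1e-3                                   # raise once the loss is falling
    else:
        lr = 1e-3 * (0.5 ** ((epoch - 30) // 10))   # halve every 10 epochs
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr
```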

4. Compare the training-set and validation-set losses to judge whether the model is overfitting. Once it has trained enough, use early stopping to end training (a minimal sketch follows).
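A minimal early-stopping sketch based on the validation loss; `train_one_epoch` and `evaluate` are hypothetical helpers that return mean losses, and the patience of 10 epochs is just an example:

```python
import torch

max_epochs, patience = 100, 10
best_val, bad_epochs = float("inf"), 0

for epoch in range(max_epochs):
    train_loss = train_one_epoch(model, train_loader, optimizer, criterion)
    val_loss = evaluate(model, val_loader, criterion)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")    # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                   # no improvement for `patience` epochs
            print(f"early stopping at epoch {epoch}")
            break
```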

5. The choice of optimizer (SGD, Adam, etc.) is mostly personal preference and is generally not decisive for the network. I just use SGD + momentum without thinking too hard.

Adam works well but trains more slowly than SGD + momentum; you can generally try Adam first. Both are pretty good, so try both.

Optimizer comparison animations: https://img-blog.csdn.net/20160824161755284

https://img-blog.csdn.net/20160824161815758
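For reference, a typical way to set up either optimizer in PyTorch; the learning rates are common starting values, not recommendations from the original post, and `model` is assumed to exist:

```python
import torch

sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                      weight_decay=5e-4)              # SGD + momentum (+ optional weight decay)
adam = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam with its usual default lr
```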

6. Tune hyperparameters on the validation set!!!


Other notes:

1. Preprocessing: subtracting the mean and dividing by the std (zero-centering) is enough; I do not use PCA, whitening, and so on. My personal view: a CNN can learn its own encoder anyway, so PCA adds little; at worst the network learns an extra layer to do it.
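A typical zero-centering pipeline with torchvision; the mean/std values below are the widely used ImageNet statistics, shown only as an example (compute them from your own training set in practice):

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),                                 # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],       # per-channel zero-center
                         std=[0.229, 0.224, 0.225]),       # and scale by std
])
```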

2. shuffle, shuffle, shuffle.

Your own dataset is usually ordered, so neighbouring samples are correlated. Training generally wants weakly correlated data; otherwise it is as if the same piece of data were fed into the model several times in a row.
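In PyTorch this is usually just a matter of letting the DataLoader reshuffle every epoch; `train_dataset` is assumed to be any Dataset:

```python
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)  # reshuffles each epoch
```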

3. Just default to ReLU (in computer vision).

4. Just default to 3x3 convolutions.

5. Just default to Xavier initialization (a sketch follows).
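A small sketch of applying Xavier (Glorot) initialization to every conv/linear layer, assuming `model` is an `nn.Module`:

```python
import torch.nn as nn

def init_xavier(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)   # Xavier/Glorot uniform init
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_xavier)                    # recursively applies to all submodules
```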

6. I have never used batch normalization myself; I know it is good, I just skip it out of laziness. So I do encourage using batch normalization.

It can greatly speed up training and improve model performance (see the building-block sketch below).
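A sketch of the standard 3x3 Conv -> BatchNorm -> ReLU building block (which also covers tips 3 and 4 above):

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    """3x3 convolution followed by batch normalization and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),   # often lets you use a larger lr and converge faster
        nn.ReLU(inplace=True),
    )
```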

7. Don't take everything in papers at face value. If you have your own ideas about the structure or anything else, just try them.

8. With 95% probability, you will not need a model deeper than 40 layers.

9. Shortcut (residual) connections are useful (a minimal block is sketched below).
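A minimal residual block in the spirit of ResNet, as a sketch of what a shortcut connection looks like (assumes the input and output channel counts match):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # the shortcut: add the input back in
```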

10. Brute-force tuning is the most practical approach; after all, your own time matters most. The model you painstakingly tune today may be thrown away in two days anyway.

11. Read Google's Inception paper and study the structure carefully.

12. Just default to maxout (a rough sketch follows).
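A rough sketch of a maxout unit (Goodfellow et al.): each output takes the max over k linear projections of the input.

```python
import torch.nn as nn

class Maxout(nn.Module):
    def __init__(self, in_features, out_features, k=2):
        super().__init__()
        self.k = k
        self.linear = nn.Linear(in_features, out_features * k)

    def forward(self, x):
        out = self.linear(x)                      # (N, out_features * k)
        out = out.view(x.size(0), -1, self.k)     # (N, out_features, k)
        return out.max(dim=2).values              # max over the k linear pieces
```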

13. Dropout is also a great weapon against overfitting. If you do not know what rate to use, just set it to 0.5 (half and half), but remember to turn dropout off at test time (example below).
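A small example of dropout at p=0.5, with `train()` / `eval()` handling the "turn it off at test time" part (the layer sizes are arbitrary):

```python
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),        # drop half the activations during training
    nn.Linear(256, 10),
)

classifier.train()   # dropout active while training
# ... training loop ...
classifier.eval()    # dropout disabled for validation / testing
```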


To learn:

1. maxout https://blog.csdn.net/hjimce/article/details/50414467

2. Read the Google Inception paper
