Deep learning-01

Training strategy:

Overfitting

  1. Early stopping
  2. Regularization
  3. Dropout
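
Of these, early stopping is the easiest to add to any training loop. A minimal sketch, assuming a generic loop where `train_step` and `validate` are placeholder callbacks supplied by your own code: stop once validation loss has not improved for `patience` consecutive epochs.

```python
def train_with_early_stopping(train_step, validate, max_epochs=100, patience=5):
    """Stop when validation loss hasn't improved for `patience` epochs."""
    best_loss = float("inf")
    bad_epochs = 0
    for epoch in range(max_epochs):
        train_step(epoch)
        val_loss = validate(epoch)
        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0   # improvement: reset counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_loss           # stop early
    return max_epochs - 1, best_loss

# Toy run: validation loss decreases, then slowly rises (overfitting begins).
losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61]
epoch, best = train_with_early_stopping(lambda e: None, lambda e: losses[e])
```

Here training halts at epoch 8, five epochs after the best validation loss (0.55) was reached.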

Underfitting

  1. New activation function
  2. New optimizer, learning rate tuning
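
Why a new activation function can fix underfitting: the classic sigmoid saturates for large |x|, so its gradient vanishes and deep layers learn very slowly, while ReLU passes the gradient through unchanged for positive inputs. A small numpy illustration of the two gradients:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25, vanishes for large |x|

def relu_grad(x):
    return (x > 0).astype(float)  # constant 1 for positive inputs

x = np.array([-4.0, 4.0])
g_sig = sigmoid_grad(x)   # both entries are tiny (~0.018): vanishing gradient
g_relu = relu_grad(x)     # 0 for the negative input, 1 for the positive one
```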

Skills:

  1. Input normalization
  2. Batch Normalization
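
Both techniques are the same standardization idea applied at different places: input normalization rescales raw features once, using statistics fit on the training set, while batch normalization rescales hidden activations per mini-batch during training (with learnable scale/shift). A minimal numpy sketch, omitting BN's running statistics for inference:

```python
import numpy as np

def normalize_inputs(X):
    """Zero-mean, unit-variance scaling per feature (statistics from training data)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-8   # epsilon avoids division by zero
    return (X - mu) / sigma

def batch_norm(H, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization (training mode) over a mini-batch of activations."""
    mu = H.mean(axis=0)
    var = H.var(axis=0)
    H_hat = (H - mu) / np.sqrt(var + eps)
    return gamma * H_hat + beta    # learnable scale and shift

# Features on very different scales become comparable after normalization.
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
Xn = normalize_inputs(X)
```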

Skills (reproduced from: https://zhuanlan.zhihu.com/p/25928551)

  • The model is not the most important thing: there is no denying that good model design is crucial to good results, and it is a hot academic topic, but in practice the model accounts for relatively little of the total work. Although the second part introduces five CNN/RNN models and their variants, a plain CNN alone is enough to achieve very good results on a real text classification task. In our experiments, RCNN improved accuracy by only about 1%, which is not very significant. Best practice is to first tune the overall task to its best with a TextCNN model, and only then try to improve the model.

  • Understand your data: a big advantage of deep learning is that it no longer requires tedious, inefficient manual feature engineering, but if you treat it as a black box you will often end up doubting yourself. Make sure you understand your data; intuition about the data is always important, whether you use traditional methods or deep learning. Pay attention to bad-case analysis: understand whether your data suits the task and why predictions are right or wrong.

  • Pay attention to iteration quality: record and analyze every experiment. Iteration speed is key to the success of an algorithm project, but what matters is not only the speed of iteration, it is also its quality. Without a quick experimental-analysis routine, fast iteration only wastes your company's precious computing resources. Record every experiment, and make the analysis answer at least three questions: Why run this experiment? What is the conclusion? What should be tried next?

  • Hyperparameter tuning: hyperparameter tuning is the daily routine of every tuning engineer. "A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification" is recommended; it includes a number of hyperparameter comparison experiments. If you are just starting a text classification task, you may as well set the hyperparameters according to that paper's results. How to search hyperparameters quickly is itself an important question; you can also read the Zhihu column article "Deep Learning Network Parameter Tuning Techniques" by Xiao Se.

  • Be sure to use dropout: there are only two situations where you may not need it: the amount of data is very small, or you already use a better regularization method such as batch normalization (BN). In practice we tried dropout with different rates and 0.5 worked best, so if your computing resources are limited, the default 0.5 is a good choice.
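
A minimal numpy sketch of the standard "inverted dropout" formulation: each unit is zeroed with probability p during training and survivors are rescaled by 1/(1-p), so the expected activation is unchanged and the layer becomes an identity at test time.

```python
import numpy as np

def dropout(h, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with prob p, rescale survivors by
    1/(1-p) so expected activation is unchanged; identity at test time."""
    if not training or p == 0.0:
        return h
    rng = rng or np.random.default_rng(0)
    mask = (rng.random(h.shape) >= p).astype(h.dtype)
    return h * mask / (1.0 - p)

h = np.ones((1000, 64))
out = dropout(h, p=0.5)
# About half the units are zeroed; survivors are scaled to 2.0,
# so the mean activation stays close to the original 1.0.
```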

  • Fine-tuning is essential: as mentioned above, if you only use word2vec-trained word vectors as fixed feature representations, I bet you will lose a lot of accuracy.
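
The mechanical difference is small: a frozen embedding table is read-only, while a fine-tuned one receives gradient updates from the task loss like any other weight matrix. A toy numpy sketch with a hypothetical 5-word pretrained matrix and a made-up gradient, just to show which rows move:

```python
import numpy as np

# Hypothetical pretrained word2vec matrix: vocabulary of 5 words, dimension 3.
pretrained = np.random.default_rng(0).normal(size=(5, 3))

emb_frozen = pretrained.copy()      # used as fixed features, never updated
emb_finetune = pretrained.copy()    # updated together with the task loss

token_ids = np.array([1, 3])        # words appearing in one training batch
grad = np.full((2, 3), 0.1)         # stand-in gradient of the loss w.r.t. those rows
lr = 0.01

# Fine-tuning: only the rows seen in the batch move; the frozen copy is untouched.
emb_finetune[token_ids] -= lr * grad
```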

  • Softmax loss is not always required: it depends on your data. If your categories are not mutually exclusive, try training multiple binary classifiers instead, that is, formulate the problem as multi-label rather than multi-class. After making this change, our accuracy still improved by >1%.
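
The difference in one picture: softmax forces the label probabilities to compete and sum to 1, while independent sigmoids score each label on its own, so several labels can be predicted at once. A numpy sketch with a made-up logit vector where two labels are both strongly present:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.9, -3.0])   # first two labels both strongly present

p_softmax = softmax(logits)   # multi-class: probabilities compete, sum to 1
p_sigmoid = sigmoid(logits)   # multi-label: each label scored independently

# With sigmoids, both of the first two labels can exceed 0.5 at the same time.
pred = p_sigmoid > 0.5
```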

  • Class imbalance: this conclusion has been verified in many scenarios: if your loss is dominated by a few categories, it is usually harmful overall. It is recommended to try a bootstrap-style method to adjust the sample weights in the loss.
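
One common way to reweight the loss is per-class weights, e.g. inverse class frequency. A numpy sketch on a made-up 9:1 imbalanced batch where the model is confident on the majority class but wrong on the minority sample: the weighted loss surfaces the minority error that the uniform loss averages away.

```python
import numpy as np

def weighted_nll(probs, labels, class_weights):
    """Per-class weighted negative log-likelihood: up-weight rare classes so
    the loss is not dominated by the majority class."""
    w = class_weights[labels]
    true_probs = probs[np.arange(len(labels)), labels]
    return -(w * np.log(true_probs)).sum() / w.sum()

# 9 majority-class samples (confidently correct) and 1 minority-class
# sample that the model gets badly wrong.
probs = np.array([[0.9, 0.1]] * 10)
labels = np.array([0] * 9 + [1])

uniform = weighted_nll(probs, labels, np.array([1.0, 1.0]))
# Inverse-frequency weights: the minority error now dominates the loss.
balanced = weighted_nll(probs, labels, np.array([1 / 9, 1.0]))
```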

  • Avoid training oscillation: increase the randomness of sampling as much as possible so that the data distribution is closer to i.i.d.; the default shuffle mechanism makes training results more stable. If the training is still very volatile, consider adjusting the learning rate or mini-batch size.
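
The standard mechanism is to reshuffle the sample order before every epoch, so that each mini-batch is close to an i.i.d. draw from the training distribution rather than, say, a run of consecutive same-class examples. A stdlib-only sketch:

```python
import random

def epoch_batches(n_samples, batch_size, seed):
    """Reshuffle indices before an epoch, then cut them into mini-batches."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)   # different seed per epoch in practice
    return [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]

# A class-sorted dataset (all class-0 samples first, then all class-1):
# without shuffling, each batch would contain only a single class.
batches = epoch_batches(n_samples=8, batch_size=4, seed=1)
```

Every index appears exactly once per epoch; only the order (and hence the batch composition) changes between epochs.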

  • Don't draw conclusions before convergence: the winner is the one that performs best at the end, especially for tests of new ideas. Don't dismiss an approach too early; at least wait until it has converged.

Origin blog.csdn.net/lovoslbdy/article/details/104860571