Basics of Model Training and Hyperparameter Tuning in Deep Learning (continuously updated...)

1. SOTA

When reading papers, I often encounter expressions such as "SOTA model" and "SOTA results", and at first I thought SOTA was the name of a new model.

  • SOTA model: State-Of-The-Art model. For a given task in a field, this is the model that currently performs best compared with the other models for that task. "Best" can mean, among other things, the fastest speed, the lowest computational cost, the highest accuracy, or the smallest error.

  • SOTA result: State-Of-The-Art result. For the research task a paper addresses, this means that when the paper's results are compared with existing models and their published results, the paper's model achieves the best results so far.

2. Training set (train), validation set (val), test set (test)

  • Training set

    Use the data from the training set to train the model

  • Test set

  • Compute the trained model's error on the test set to evaluate the model's final performance.

  • Usually 80% of the dataset is used as the training set and 20% as the test set;

  • The dataset should usually be split before model building begins, to prevent data snooping bias; that is, we must avoid learning too much about the characteristics of the test samples, and avoid selecting models because we believe they will do well on the test data. Such results would be overly optimistic, and actual performance would fall short of them;

  • When building a model, the data usually needs processing, including cleaning and feature scaling (standardization or normalization). These operations should be performed on the training set only, and the parameters obtained from the training set then applied to the test set;

  • Since the test-set error serves as an approximation of the generalization error, we train the model and then use the test set to approximate its generalization ability;
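The split-before-preprocessing rule above can be sketched in a few lines. This is a minimal NumPy example using a synthetic array as a stand-in for a real dataset: an 80/20 split, with scaling statistics fitted on the training portion only.

```python
import numpy as np

# Toy dataset: 100 samples, 3 features (synthetic stand-in for real data).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

# 1) Shuffle, then hold out 20% as the test set *before* any preprocessing.
idx = rng.permutation(len(X))
split = int(len(X) * 0.8)               # 80% train / 20% test
X_train, X_test = X[idx[:split]], X[idx[split:]]

# 2) Fit scaling parameters on the training set ONLY...
mean, std = X_train.mean(axis=0), X_train.std(axis=0)

# 3) ...and apply those same parameters to both splits.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std   # no peeking at test statistics

print(X_train.shape, X_test.shape)      # (80, 3) (20, 3)
```

Fitting the scaler on the full dataset would leak test-set statistics into training, which is exactly the snooping bias described above.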

  • Validation set (val)

  • Before training, the dataset is divided into a training set and a test set. We let the model train on the training set, and then approximate its generalization ability on the test set.

  • Both the training set and the validation set play a role during training.

  • Because the validation set has no intersection with the training set, its data does not directly contribute to fitting the final model's weights.

  • The main function of validation is to check for overfitting and to adjust training hyperparameters. In object detection, val is really used to monitor the training process; once the hyperparameters are fixed, this monitoring is optional.

  • For example, suppose that during iterations 0-10,000 both the training loss and the validation loss keep decreasing, but from 10,000 to 20,000 the training loss keeps decreasing while the validation loss stops decreasing and starts to rise. This shows that continuing to train only fits the training set better while generalization gets worse, so it is better to keep the checkpoint from 10,000 iterations than the one from 20,000. This procedure is called early stopping, and validation data is essential to it. If you run the training demo that ships with caffe, you will use train_val.prototxt, where val stands for validation: the TEST layer of the network input actually feeds the validation set, not the test set. By observing the validation loss together with the training loss, you can select the model you need.
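The early-stopping logic described above can be sketched without any framework. In this minimal example the validation-loss values are synthetic stand-ins for a real curve, and `patience` is an illustrative setting:

```python
# Minimal patience-based early stopping on a synthetic validation-loss curve.
# In a real training loop, val_loss would be computed on the validation set
# after each epoch; here we fake a curve that improves, then overfits.
val_losses = [1.0, 0.8, 0.6, 0.5, 0.45, 0.44, 0.46, 0.50, 0.55, 0.60]

patience = 2          # stop after this many epochs without improvement
best_loss = float("inf")
best_epoch = 0
epochs_without_improvement = 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss:
        best_loss = val_loss
        best_epoch = epoch          # checkpoint the model weights here in practice
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"early stop at epoch {epoch}, restore epoch {best_epoch}")
            break
```

The loop stops as soon as the validation loss has failed to improve for `patience` consecutive epochs, and the checkpoint kept is the one from the best validation epoch, not the last one.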

  • Why do many people skip validation nowadays?

    • The reason is that the mechanisms inside models for preventing overfitting are now fairly mature; Dropout, BatchNorm, and the like work well. Moreover, in most cases an existing model is fine-tuned directly instead of being trained from scratch, and such training is also hard to overfit.
      
  • Suppose we are choosing between the YOLO family and the SSD family for an object detection task. We can train both on the training set, test the two trained models on the test set, and finally choose the one with the smaller test error (YOLO or SSD) as the model with the stronger generalization ability.

  • In addition, we need to do more than compare different models; we often need to configure the chosen model itself. Suppose the test error leads us to pick the YOLO family for the detection task. The darknet network still has many settings that must be chosen by hand, such as the number of layers, the number of neurons in each layer, and the regularization parameters. We call these hyperparameters. Different choices of hyperparameters matter greatly to the final model, and they always need tuning.
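A hyperparameter search usually starts by enumerating candidate settings. A small sketch with the standard library; the parameter names here are illustrative, not darknet's actual config keys:

```python
from itertools import product

# Hypothetical hyperparameter grid for a small detector backbone
# (names are illustrative examples, not real darknet config keys).
grid = {
    "num_layers": [18, 34, 53],
    "learning_rate": [1e-3, 1e-4],
    "weight_decay": [0.0, 5e-4],
}

# Enumerate every combination; each one is a candidate configuration to
# train on the training set and compare on the validation set.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(candidates))   # 3 * 2 * 2 = 12 combinations
```

Each candidate would then be trained and scored on the validation set, never the test set, before a single winner is chosen.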

  • Information leakage

    • We adjust hyperparameters to improve the model's generalization. If we use the test set directly as the generalization-error estimate and tune hyperparameters against the test error, we can drive the model's test error toward 0, yet when such a model is deployed in a real scenario its performance may be very poor. This phenomenon is known as information leakage.
    • Since the test set serves as an approximation of the generalization error, we must not leak information from it. The exercises we normally do correspond to the training set, and the final exam corresponds to the test set: the exam tests our final ability. Leaking the test set is like students seeing the exam questions in advance; even if they score highly on those questions, it does not mean their ability to learn is strong.
    • To sum up, while we study, the teacher prepares classroom examples and exercises to help us find and fill our gaps. These exercises can be regarded as the validation set. Using the validation set as the basis for adjusting the model also protects the information in the test set.
  • Train the model on the training set and evaluate it on the validation set. Once the best hyperparameters are found, test on the test set one last time. The error on the test set is used as an approximation of the generalization error, and this is the final reported performance of the model.
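The full protocol can be summarized in a tiny sketch. The model names and error numbers below are synthetic placeholders; the point is that selection reads only the validation column:

```python
# Sketch of the full protocol: pick a model by validation error,
# then touch the test set exactly once. All numbers are synthetic.
candidates = {
    "model_A": {"val_error": 0.12, "test_error": 0.15},
    "model_B": {"val_error": 0.09, "test_error": 0.11},
    "model_C": {"val_error": 0.10, "test_error": 0.10},
}

# Selection uses ONLY the validation error...
best_name = min(candidates, key=lambda name: candidates[name]["val_error"])

# ...and the test error of that single chosen model approximates the
# generalization error. (model_C's lower test error is never consulted
# during selection; using it would leak test-set information.)
final_error = candidates[best_name]["test_error"]
print(best_name, final_error)   # model_B 0.11
```

Note that model_C happens to have the lowest test error, but picking it on that basis would be exactly the information leakage described above.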

  • How to split the three sets

  • When the amount of data is not very large (on the order of 10,000 samples or fewer), split training : validation : test as 7:2:1 or 6:2:2;

  • When the amount of data is large enough, the ratio of training, validation, and test sets can be adjusted to 98:1:1.
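A three-way split by ratio reduces to computing slice boundaries; a minimal sketch for the 7:2:1 case (in practice the indices should be shuffled before slicing):

```python
# Three-way split by ratio (7:2:1 here, suited to a smaller dataset).
def split_sizes(n, train_frac=0.7, val_frac=0.2):
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    n_test = n - n_train - n_val      # remainder goes to the test set
    return n_train, n_val, n_test

indices = list(range(10_000))         # shuffle before slicing in practice
n_train, n_val, n_test = split_sizes(len(indices))
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]
print(n_train, n_val, n_test)         # 7000 2000 1000
```

Assigning the remainder to the test set guarantees the three parts always cover the whole dataset with no overlap, even when the fractions do not divide n evenly.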


Origin blog.csdn.net/crist_meng/article/details/123991181