Reasons for overfitting

1.1 What is overfitting

The so-called overfitting (overfit) is the following phenomenon: a hypothesis fits the training data better than other hypotheses do, but fits data outside the training set poorly. When this happens, we say the hypothesis overfits the training data.

As shown in the figure above, overfitting means the fitted function tries to pass through every training point, so the final curve fluctuates strongly. Over some small intervals the function value changes drastically, which means the derivative (in absolute value) is very large there. Since the independent variable can take both large and small values, only sufficiently large coefficients can produce such large derivatives; the short sketch below illustrates this effect.
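The following is a minimal sketch of that effect: a low-degree and a high-degree polynomial are fit to the same noisy samples and their coefficient magnitudes compared. The sine-plus-noise data, the chosen degrees, and the use of NumPy are illustrative assumptions, not details from the article.

```python
# Minimal sketch: compare coefficient magnitudes of a low-degree fit and a
# high-degree fit that chases every noisy point. Data and degrees are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 12)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)  # noisy samples

for degree in (3, 11):
    coeffs = np.polyfit(x, y, deg=degree)  # least-squares polynomial fit
    print(f"degree {degree}: largest |coefficient| = {np.abs(coeffs).max():.1f}")
# The degree-11 fit passes through every sample, noise included, and its
# coefficients typically come out orders of magnitude larger than those of
# the smoother degree-3 fit.
```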

 

1.2 Reasons for overfitting

  1. One possible cause of overfitting is that the VC dimension of the model is too high, i.e. the model's complexity (capacity) is stronger than the problem requires: there are too many parameters relative to the training data. 
  2. Another cause is noise in the data; fitting the noise exactly pulls the model away from the true underlying relationship. 
  3. A further cause is a limited amount of data, which makes it impossible for the model to capture the true distribution of the whole data set. 
  4. Finally, too many iterations of weight learning (overtraining) let the model fit the noise in the training data and the unrepresentative features of the training samples.

 

1.3 Solutions to overfitting

  1. Weight decay 
    Shrink each weight by a small factor during every iteration. This is equivalent to modifying the definition of the error E by adding a penalty term proportional to the total magnitude of the network weights (L2 regularization). The motivation is to keep the weights small, so that the learning process is biased away from overly complex decision surfaces; a minimal sketch is given after this list.

  2. Cross-validation 
    Cross-validation works well when additional data is available to provide a validation set, but with a small training set the overfitting problem is more serious, which motivates the 
    k-fold cross-validation method: 
    Divide the training examples into k partitions and run the cross-validation procedure k times, each time using a different partition as the validation set and combining the remaining k-1 partitions as the training set. Each sample is therefore used as a validation example in exactly one run and as a training example in the other k-1 runs. In each run, use the validation set to determine the number of iterations n* that gives the best validation performance, and then take the mean of these n* values as the final number of training iterations. A sketch of this procedure follows the list.

  3. Regularization (see the sketch below)
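For item 1, here is a minimal sketch of weight decay applied to plain gradient descent on a toy linear model. The data, learning rate, and decay strength are illustrative assumptions; the point is the extra weight_decay * w term added to the gradient, which is equivalent to adding an L2 penalty to the error E.

```python
# Weight decay sketch: shrink each weight a little on every update by adding
# weight_decay * w to the gradient of the squared error (L2 regularization).
# All data and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                      # toy training inputs
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=50)   # noisy targets

w = np.zeros(5)
lr, weight_decay = 0.05, 0.01
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)         # gradient of the squared error E
    grad += weight_decay * w                      # penalty term: keeps weights small
    w -= lr * grad
print(w)                                          # close to true_w, slightly shrunk
```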
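For item 2, the sketch below runs the described k-fold procedure: for each fold it tracks the validation error after every training iteration, records the iteration count n* with the lowest validation error, and finally averages n* over the folds. The plain gradient-descent linear model and all constants are assumptions made for illustration.

```python
# k-fold cross-validation to choose the number of training iterations n*.
# Model, data, and constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, -1.0, 2.0, 0.5]) + rng.normal(scale=0.3, size=60)

k, max_iters, lr = 5, 300, 0.05
folds = np.array_split(np.arange(len(y)), k)      # k index partitions
best_iters = []
for i in range(k):
    val_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    Xtr, ytr, Xva, yva = X[train_idx], y[train_idx], X[val_idx], y[val_idx]
    w = np.zeros(X.shape[1])
    val_errors = []
    for _ in range(max_iters):
        w -= lr * 2 * Xtr.T @ (Xtr @ w - ytr) / len(ytr)  # one training iteration
        val_errors.append(np.mean((Xva @ w - yva) ** 2))  # validation error after it
    best_iters.append(int(np.argmin(val_errors)) + 1)     # n* for this fold
n_star = int(round(np.mean(best_iters)))                  # final iteration count
print(best_iters, n_star)
```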
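For item 3, the article gives no detail, so as an assumption of what is meant, here is one common concrete form of regularization: L2-penalized least squares (ridge regression), which adds a lambda * ||w||^2 term to the squared error before solving.

```python
# Ridge regression sketch: the L2 penalty lam * ||w||^2 shrinks the solution.
# Data and the penalty strength lam are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.2, size=40)

lam = 1.0                                          # regularization strength
# Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
w_plain = np.linalg.solve(X.T @ X, X.T @ y)        # unregularized least squares
print(np.linalg.norm(w_ridge), np.linalg.norm(w_plain))  # ridge norm is smaller
```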

 
