Overfitting

Conceit

Before going into the details, let's use a real-life metaphor for the overfitting phenomenon. Put plainly, overfitting is a machine learning model being so confident in itself that it has become conceited. And conceit does harm.

We all know people who perform brilliantly inside their own small circle, yet keep running into walls out in the real, larger world. So in this introduction, we will treat conceit and overfitting as the same thing.

Overfitting in regression and classification

 

So in what ways is a machine learning model conceited? Here is some data. If you were asked to draw one line to describe this data, most people would draw it roughly like this. And yes, this line is what we hope the machine will learn: a line that summarizes the data. At this point the total error between the blue line and the data might be, say, 10.

Sometimes, though, the machine gets too hung up on this error and wants to push it even lower, to fulfil its mission of learning this particular batch of data, so it may end up learning something like this instead: a line that passes through almost every data point, so the error becomes much smaller. But is a smaller error really better?

Apparently our model is still too naive. The moment we use this model in the real world, its conceit shows. Add a dozen or so real data points: the error of the big blue line stays essentially unchanged, while the error of the red line suddenly soars. The conceited red line can no longer hold its head up, because it fails to express any data beyond the training data. This is what we call overfitting.
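To see the "small error on training data, big error on new data" pattern in numbers, here is a minimal sketch; the data, polynomial degrees, and noise level are invented for illustration, and the only assumption is that NumPy is available. The degree-9 fit plays the role of the conceited red line.

import numpy as np

np.random.seed(0)

# A small, noisy training set with a roughly linear underlying trend.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + np.random.normal(scale=0.2, size=x_train.shape)

# A dozen or so "real world" points from the same underlying trend.
x_test = np.linspace(0, 1, 12)
y_test = 2 * x_test + np.random.normal(scale=0.2, size=x_test.shape)

def fit_and_errors(degree):
    # degree 1 is the simple "blue line"; degree 9 chases every training
    # point like the overfitted "red line".
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.sum((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.sum((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in (1, 9):
    train_err, test_err = fit_and_errors(degree)
    print(f"degree {degree}: train error {train_err:.3f}, test error {test_err:.3f}")

The degree-9 line typically reaches a tiny training error, while its error on the fresh points is far larger than that of the simple degree-1 line.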

 

The same happens in classification problems. An overfitted dividing line might look like this. Add a dozen new data points, and we can clearly see that two of the yellow points are no longer well separated. That, too, is the trouble overfitting causes. So, given that we run into overfitting from time to time, what are the solutions?

Solutions

 

Method One: increase the amount of data. Most overfitting happens because there is too little data. If we had thousands of data points, the red line would slowly be straightened out and become less distorted.

 

Method Two:

 

Use regularization: L1 regularization, L2 regularization, and so on. These methods apply to most of machine learning, including neural networks, and they all work in a similar way. Simplify machine learning down to its key formula, y = Wx, where W stands for the parameters the machine needs to learn. When overfitting occurs, the values in W tend to become particularly large or particularly small. To stop W from changing so drastically, we have to do something about how the error is computed.

The original cost is cost = (predicted value - true value)². If W grows too large, we make the cost grow along with it, turning the cost into an instrument of punishment, so we bring W itself into the calculation: cost = (Wx - true y)² + abs(W), where abs is the absolute value. This form of regularization is called L1. L2 regularization is similar, except that the absolute value is replaced by a square. L3, L4 and so on just swap in the cube, the fourth power, and other similar forms. With these methods, we can make sure the learned line does not become too distorted.
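As a rough sketch of how the penalty enters the cost, assuming NumPy, with the data, the weight values, and the penalty strength lam all made up for illustration (real libraries fold the penalty into the training step rather than computing it by hand like this):

import numpy as np

def cost(W, x, y, penalty="none", lam=0.1):
    # Plain squared error: cost = (Wx - true y)^2, summed over the data.
    error = np.sum((W * x - y) ** 2)
    if penalty == "l1":
        # L1: add abs(W) so large weights make the cost larger too.
        error += lam * np.abs(W)
    elif penalty == "l2":
        # L2: same idea, but with W squared instead of the absolute value.
        error += lam * W ** 2
    return error

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# A reasonable weight versus an inflated one: the penalty barely touches
# the first but noticeably punishes the second.
for W in (2.0, 20.0):
    print(W, cost(W, x, y, "l1"), cost(W, x, y, "l2"))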

 

 

There is also a regularization method used specifically in neural networks, called dropout. During training, we randomly ignore some neurons and neural connections, making the neural network "incomplete", and train for one pass with this incomplete network.

On the second pass we randomly ignore a different set of neurons, getting another incomplete network. With these random drop-out rules, we can imagine that in every training pass, no prediction result ever depends on one specific group of neurons. It is like L1 and L2 regularization: over-relied-upon weights, that is, parameters whose values grow very large during training, are exactly the ones L1 and L2 punish. Dropout's approach is simply to give the neural network no chance to over-rely on anything in the first place.
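Here is a minimal sketch of the dropout idea on its own, outside any particular framework; the layer size, drop rate, and the "inverted dropout" scaling are illustrative assumptions:

import numpy as np

def dropout(activations, drop_rate=0.5, training=True):
    # During training, randomly zero out a fraction of the neurons so the
    # network cannot come to rely on any particular one of them.
    if not training or drop_rate == 0.0:
        return activations
    keep_prob = 1.0 - drop_rate
    mask = np.random.rand(*activations.shape) < keep_prob
    # "Inverted dropout": scale the survivors so the expected activation
    # stays the same when dropout is switched off at prediction time.
    return activations * mask / keep_prob

layer_output = np.random.rand(4, 6)           # a batch of hidden activations
print(dropout(layer_output, drop_rate=0.5))   # a different "incomplete" net each call
print(dropout(layer_output, training=False))  # untouched at prediction time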


Reposted from www.cnblogs.com/Lazycat1206/p/11911491.html