Andrew Ng Machine Learning (X): Diagnostics, Bias and Variance, Underfitting and Overfitting

In the first few weeks of the course we learned several supervised learning methods: linear regression, logistic regression, and neural networks. Looking back at the coursework, all of the sample data were used to train the model, and the model was validated simply by comparing its outputs against that same data and checking whether the accuracy looked right. Is this training procedure correct? Can accuracy alone serve as the criterion for evaluating a model? In this lesson you will learn how to evaluate our models, and how to choose correct and effective improvement strategies.

Click the course video to keep following Ng's course. The Python code for the coursework has been put on Github; click the course code link to view it on Github (if Github is not accessible, you can click to view it on Coding instead). Suggestions for fixing errors in the code are welcome.

Improvement strategies
To improve a prediction function, we usually have several means at our disposal:

  • Collect more data samples
  • Reduce the number of features, removing all but the main ones
  • Introduce more relevant features
  • Add polynomial features
  • Decrease the regularization parameter λ
  • Increase the regularization parameter λ

Andrew Ng says he has seen many developers blindly apply these improvement strategies, spending a great deal of time and effort without much effect. So we need some basis to help us choose the right strategy.

Partitioning the data set
To evaluate a model, we generally divide the data set into three parts: 60% as the training set, 20% as the cross-validation set, and 20% as the test set, and we use the error on each of these sets to evaluate the model. The error cost function has the same form as before (the error function for linear regression is shown below).

J_s(θ) = (1 / (2·m_s)) · Σ_i ( h_θ(x_s^(i)) − y_s^(i) )²,  s ∈ {train, cv, test}
Of the divided sets, we use the training set to train the parameters θ, the cross-validation set to select the model (for example, what degree of polynomial features to use), and the test set to evaluate the predictive power of the model.
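The 60/20/20 split above can be sketched with plain NumPy; the dataset here is random and purely illustrative:

```python
import numpy as np

# Hypothetical dataset: 100 samples, 2 features (values are made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.normal(size=100)

# Shuffle once, then cut at 60% and 80% to obtain a 60/20/20 split.
idx = rng.permutation(len(X))
n_train = int(0.6 * len(X))
n_cv = int(0.8 * len(X))

train_idx, cv_idx, test_idx = idx[:n_train], idx[n_train:n_cv], idx[n_cv:]
X_train, y_train = X[train_idx], y[train_idx]
X_cv, y_cv = X[cv_idx], y[cv_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_cv), len(X_test))  # → 60 20 20
```

Shuffling before cutting matters: if the raw data is ordered (e.g. by label), contiguous slices would give unrepresentative sets.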

Bias and variance
When our model performs poorly, there are usually two kinds of problem: a high-bias problem or a high-variance problem. Identifying which one we face helps us choose the right optimization, so let's look at what bias and variance mean.

  • Bias: describes the gap between the expected output of the model on a sample and the true result.
  • Variance: describes the stability of the model's output for a given input.
[Figure: bias and variance illustrated as shots at a target]

Like shooting at a target: bias describes whether our shots as a whole have deviated from the target, while variance describes how scattered the shots are. Let's look at the error curves on the training set and the cross-validation set in each case, to understand intuitively what high bias and high variance mean.

For polynomial regression: when the chosen degree is too low, the errors on both the training set and the cross-validation set are large; when the degree is just right, both errors are small; when the degree is too high, over-fitting occurs, and although the training-set error is small, the cross-validation error becomes large (diagram below).
[Figure: training and cross-validation error versus polynomial degree]
So we can compute J_train(θ) and J_cv(θ): if both are very large, we have a high-bias problem; if J_cv(θ) is much larger than J_train(θ), we have a high-variance problem.
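A minimal sketch of this degree sweep, on a synthetic 1-D task (the cubic signal, noise level, and set sizes are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 1-D regression task: a cubic signal plus noise.
def make_data(n):
    x = rng.uniform(-2, 2, n)
    y = x**3 - 2 * x + rng.normal(scale=0.5, size=n)
    return x, y

x_train, y_train = make_data(60)
x_cv, y_cv = make_data(20)

def j(theta, x, y):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    return np.mean((np.polyval(theta, x) - y) ** 2) / 2

# Fit polynomials of increasing degree on the training set only,
# then compare J_train and J_cv to locate under- and over-fitting.
for d in (1, 3, 9):
    theta = np.polyfit(x_train, y_train, d)
    print(d, round(j(theta, x_train, y_train), 3), round(j(theta, x_cv, y_cv), 3))
```

Degree 1 under-fits (both errors large, high bias); degree 9 drives the training error down while the cross-validation error stays worse (high variance); a middle degree is chosen by the cross-validation set.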

For the regularization parameter λ, the same analysis applies: when λ is too small, over-fitting tends to occur, i.e. a high-variance problem; when λ is too large, under-fitting tends to occur, i.e. a high-bias problem.
[Figure: training and cross-validation error versus regularization parameter λ]
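A sweep over λ can be sketched with closed-form ridge regression; the data and the candidate λ values are illustrative only (and for brevity the intercept term is regularized along with the rest, which course convention would exempt):

```python
import numpy as np

rng = np.random.default_rng(2)

# Ridge regression in closed form: theta = (X^T X + lam*I)^{-1} X^T y.
def ridge_fit(X, y, lam):
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def cost(theta, X, y):
    return np.mean((X @ theta - y) ** 2) / 2

# Degree-7 polynomial features of a noisy *linear* signal, so small lam overfits.
x = rng.uniform(-1, 1, 30)
X = np.vander(x, 8)
y = 2 * x + rng.normal(scale=0.3, size=30)

x_cv = rng.uniform(-1, 1, 30)
X_cv = np.vander(x_cv, 8)
y_cv = 2 * x_cv + rng.normal(scale=0.3, size=30)

# Train at each candidate lam, then compare training and cross-validation cost.
for lam in (0.0, 0.1, 1.0, 10.0):
    theta = ridge_fit(X, y, lam)
    print(lam, round(cost(theta, X, y), 4), round(cost(theta, X_cv, y_cv), 4))
```

As λ grows the training cost can only rise (the fit is pulled away from the unregularized optimum), while the cross-validation cost traces the U-shape in the figure; λ is chosen at its minimum.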

The learning curve
Whether you want to check that your learning algorithm is working correctly or to improve its performance, the learning curve is a very intuitive and effective tool. The horizontal axis of the learning curve is the number of training samples; the vertical axis shows the errors on the training set and the cross-validation set. At the beginning, because the number of samples is small, J_train(θ) is almost zero, while J_cv(θ) is very large. As the number of samples grows, J_train(θ) keeps increasing, while J_cv(θ) decreases because the model fits the data better and better. A typical learning curve therefore looks as shown below:
[Figure: typical learning curve, error versus number of training samples]
In the high-bias case, J_train(θ) and J_cv(θ) stay very close to each other, but both errors are large. Blindly increasing the number of samples at this point will not improve the algorithm's performance.
[Figure: learning curve in the high-bias case]
In the high-variance case, the error J_train(θ) is small while J_cv(θ) is relatively large; in this situation collecting more samples is likely to help.
[Figure: learning curve in the high-variance case]
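The learning-curve computation can be sketched as follows, for plain linear regression on invented 1-D data (half held out as the cross-validation set):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: a noisy line; first 100 samples for training, rest for cv.
x_all = rng.uniform(-1, 1, 200)
y_all = 1.5 * x_all + rng.normal(scale=0.2, size=200)
x_cv, y_cv = x_all[100:], y_all[100:]

def j(theta, x, y):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    return np.mean((np.polyval(theta, x) - y) ** 2) / 2

# Train on the first m samples for growing m, recording J_train and J_cv.
for m in (2, 10, 50, 100):
    theta = np.polyfit(x_all[:m], y_all[:m], 1)
    print(m, round(j(theta, x_all[:m], y_all[:m]), 4), round(j(theta, x_cv, y_cv), 4))
```

With m = 2 the line passes exactly through both points, so J_train is essentially zero while J_cv is at its worst; as m grows, J_train rises toward the noise level and J_cv falls toward it, reproducing the curve above.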

Summary
With these analytical tools, we can determine which scenario each improvement strategy suits:

  • [High variance] Collect more data samples
  • [High variance] Reduce the number of features, removing non-essential ones
  • [High bias] Introduce more relevant features
  • [High bias] Add polynomial features
  • [High bias] Decrease the regularization parameter λ
  • [High variance] Increase the regularization parameter λ

References:
Andrew Ng Machine Learning: bias and variance
Machine learning (Andrew Ng) 09: machine learning diagnostics



Origin blog.csdn.net/linjpg/article/details/104126767