Machine Learning - Evaluating and Improving Learning Algorithms

0 Preface

When we use a trained model to predict unknown data and find that the error is large, what can we do next?

  1. Obtain more training samples - usually effective, but costly; the following methods may also be effective, so consider them first.
  2. Try reducing the number of features
  3. Try obtaining additional features
  4. Try adding polynomial features
  5. Try decreasing the regularization parameter λ
  6. Try increasing the regularization parameter λ

We should not pick one of the above methods at random to improve our algorithm; instead, we should use some machine learning diagnostics to find out which of the above methods will actually help our algorithm.

1. Evaluate a hypothesis

When we determine the parameters of a learning algorithm, we choose the parameters that minimize the training error. However, a small training error does not mean the hypothesis is necessarily a good one. We have already seen the example of an overfitting hypothesis function, which does not generalize to new data.
So, how do we judge whether a hypothesis function is overfitting? For this simple example we could plot the hypothesis function h(x) and inspect the curve, but in the general case with many feature variables, visualizing the hypothesis this way becomes difficult or even impossible.

Therefore, we need another way to evaluate whether our hypothesis function is overfitting.

In order to test whether the algorithm is overfitting, we divide the data into a training set and a test set, usually using 70% of the data as the training set and the remaining 30% as the test set. One very important point is that both the training set and the test set should contain all types of data, so we usually "shuffle" the data before splitting it into the training set and the test set.
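As a minimal sketch (not from the original notes) of this shuffle-and-split step, assuming NumPy arrays `X` (features) and `y` (labels):

```python
import numpy as np

def shuffle_split(X, y, test_ratio=0.3, seed=0):
    """Shuffle the examples, then hold out `test_ratio` of them as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))            # "shuffle" the data
    n_test = int(len(y) * test_ratio)        # 30% of the examples for testing
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# X_train, y_train, X_test, y_test = shuffle_split(X, y)
```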
Test set evaluation: after the model has learned its parameters on the training set, we apply it to the test set. There are two ways to compute the test error:

  1. Calculate the cost function on the test set.
    For the linear regression model, we use the test set data to compute the cost function:
    $$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x^{(i)}_{test}) - y^{(i)}_{test} \right)^2$$
    For the logistic regression model, in addition to computing the cost function on the test data set,
    $$J_{test}(\theta) = -\frac{1}{m_{test}} \sum_{i=1}^{m_{test}} \left[ y^{(i)}_{test} \log h_\theta(x^{(i)}_{test}) + \left(1 - y^{(i)}_{test}\right) \log\left(1 - h_\theta(x^{(i)}_{test})\right) \right]$$
    we can also use the misclassification metric below.

  2. Use the 0/1 misclassification metric to define the test error.
    Compute the misclassification error for each test example:
    $$err\left(h_\theta(x), y\right) = \begin{cases} 1 & \text{if } h_\theta(x) \ge 0.5 \text{ and } y = 0, \text{ or if } h_\theta(x) < 0.5 \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$$
    The calculated results are then averaged to get the test error:
    $$\text{Test error} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} err\left(h_\theta(x^{(i)}_{test}),\, y^{(i)}_{test}\right)$$
    (A code sketch of both metrics follows this list.)
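
The two metrics above might be computed as follows; this is an illustrative sketch in which `theta` is the learned parameter vector and `X_test`, `y_test` are the prepared test data (with a bias column already added to `X_test`):

```python
import numpy as np

def linreg_test_cost(theta, X_test, y_test):
    """Squared-error test cost J_test(theta) for linear regression."""
    e = X_test @ theta - y_test
    return (e @ e) / (2 * len(y_test))

def logreg_test_error(theta, X_test, y_test):
    """0/1 misclassification error for logistic regression on the test set."""
    h = 1.0 / (1.0 + np.exp(-(X_test @ theta)))   # sigmoid hypothesis h_theta(x)
    predictions = (h >= 0.5).astype(int)          # threshold at 0.5
    return np.mean(predictions != y_test)         # average of the 0/1 errors
```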

2. Model selection and cross-validation sets

Suppose we want to choose among 10 polynomial models of different degree (d = 1, 2, ..., 10).
Obviously, a polynomial model of higher degree fits our training data better, but fitting the training data well does not mean it will generalize to new examples. We should choose the model that generalizes best, and we need a cross-validation set to help select it.

That is: use 60% of the data as the training set, use 20% of the data as the cross-validation set, and use 20% of the data as the test set.

The method of model selection is:

  1. Use the training set to train the 10 models.
  2. Use the 10 models to calculate the cross-validation error (the value of the cost function) on the cross-validation set.
  3. Select the model with the smallest cross-validation error.
  4. Use the model selected in step 3 to calculate the generalization error (the value of the cost function) on the test set. (A code sketch of this procedure is given after the error definitions below.)

Train/validation/test error
Training error:
$$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
Cross-validation error:
$$J_{cv}(\theta) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_\theta(x^{(i)}_{cv}) - y^{(i)}_{cv} \right)^2$$
Test error:
$$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x^{(i)}_{test}) - y^{(i)}_{test} \right)^2$$
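
Putting the selection procedure and the three errors together, here is a rough sketch under simple assumptions: a single 1-D input feature, polynomial features built by a helper `poly_features`, and an ordinary least-squares fit standing in for training. None of these names come from the original notes.

```python
import numpy as np

def poly_features(x, d):
    """Build [1, x, x^2, ..., x^d] for a 1-D input array x."""
    return np.column_stack([x ** k for k in range(d + 1)])

def squared_error(theta, X, y):
    """J(theta) = sum of squared errors / (2m)."""
    e = X @ theta - y
    return (e @ e) / (2 * len(y))

def select_degree(x_train, y_train, x_cv, y_cv, x_test, y_test, max_d=10):
    best = (None, None, np.inf)                           # (degree, theta, J_cv)
    for d in range(1, max_d + 1):                         # 1. train the 10 candidate models
        theta, *_ = np.linalg.lstsq(poly_features(x_train, d), y_train, rcond=None)
        j_cv = squared_error(theta, poly_features(x_cv, d), y_cv)   # 2. cross-validation error
        if j_cv < best[2]:                                # 3. keep the smallest J_cv
            best = (d, theta, j_cv)
    d, theta, j_cv = best
    j_test = squared_error(theta, poly_features(x_test, d), y_test)  # 4. generalization error
    return d, j_cv, j_test
```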

3. Diagnosing bias and variance

When you run a learning algorithm and its performance is not ideal, there are usually two possibilities: either the bias is relatively large or the variance is relatively large. In other words, the problem is either underfitting or overfitting: high bias corresponds to underfitting, and high variance corresponds to overfitting.
We plot the training-set error and the cross-validation-set error against the polynomial degree d on the same graph to aid the analysis.
For the training set, when d is small the model underfits and the error is large; as d increases, the fit improves and the error decreases.
For the cross-validation set, when d is small the model underfits and the error is large; as d increases, the error first decreases and then increases. The turning point is where our model begins to overfit the training set.

From the chart above we can conclude:
• When the training-set error and the cross-validation-set error are both high and close to each other: high bias / underfitting.
• When the cross-validation-set error is much larger than the training-set error: high variance / overfitting.
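
As a toy illustration of this rule of thumb, the sketch below labels the two situations; the thresholds are assumptions made for the example, not fixed values from the notes:

```python
def diagnose(j_train, j_cv, acceptable_error=0.1, gap=0.1):
    """Rough bias/variance reading of the training and cross-validation errors."""
    if j_train > acceptable_error and (j_cv - j_train) <= gap:
        return "high bias / underfitting: both errors are high and close together"
    if (j_cv - j_train) > gap:
        return "high variance / overfitting: J_cv is much larger than J_train"
    return "no obvious bias or variance problem"
```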

4. Regularization and bias/variance

In the process of training a model, we generally use some regularization method to prevent overfitting. But the degree of regularization may be too high or too low; that is, when choosing the value of λ we need to think about the same kind of question as when we selected the degree of the polynomial model.
We select a series of λ values to test, usually values between 0 and 10 where each is roughly twice the previous one (for example: 0, 0.01, 0.02, 0.04, 0.08, 0.15, 0.32, 0.64, 1.28, 2.56, 5.12, 10, i.e. 12 values in total). We again divide the data into a training set, a cross-validation set and a test set.
The method for selecting λ is:

  1. Use the training set to train the 12 models with different degrees of regularization.
  2. Use the 12 models to calculate the cross-validation error on the cross-validation set.
  3. Select the model that yields the smallest cross-validation error.
  4. Use the model selected in step 3 to calculate the generalization error on the test set. (A sketch of this loop follows the bullets below.)

We can also plot the training-set error and the cross-validation-set error of the models against the value of λ on one chart:
• When λ is small, the training-set error is small (overfitting) and the cross-validation-set error is large.
• As λ increases, the training-set error keeps increasing (underfitting), while the cross-validation-set error first decreases and then increases.
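
A minimal sketch of this λ-selection loop, under the same illustrative assumptions as the earlier snippets (regularized linear regression solved with the normal equation, and a bias column in position 0 of each design matrix):

```python
import numpy as np

LAMBDAS = [0, 0.01, 0.02, 0.04, 0.08, 0.15, 0.32, 0.64, 1.28, 2.56, 5.12, 10]

def fit_regularized(X, y, lam):
    """Regularized least squares via the normal equation; the intercept is not penalized."""
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def squared_error(theta, X, y):
    """Unregularized cost, used for the train / cv / test errors."""
    e = X @ theta - y
    return (e @ e) / (2 * len(y))

def select_lambda(X_train, y_train, X_cv, y_cv, X_test, y_test):
    best = (None, None, np.inf)                    # (lambda, theta, J_cv)
    for lam in LAMBDAS:                            # 1. train the 12 regularized models
        theta = fit_regularized(X_train, y_train, lam)
        j_cv = squared_error(theta, X_cv, y_cv)    # 2. cross-validation error
        if j_cv < best[2]:                         # 3. keep the smallest J_cv
            best = (lam, theta, j_cv)
    lam, theta, j_cv = best
    return lam, j_cv, squared_error(theta, X_test, y_test)   # 4. generalization error
```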

5. Variance and bias of neural networks

Using a smaller neural network, which is similar to having fewer parameters, can easily lead to high bias and underfitting, but the computational cost is lower. Using a larger neural network, which is similar to having more parameters, can easily lead to high variance and overfitting; although it is more expensive to compute, regularization can be applied so that it fits the data better.

Generally, choosing a larger neural network and applying regularization will perform better than using a smaller neural network.

When choosing the number of hidden layers in a neural network, we usually start with one layer and gradually increase the number of layers. To make a better choice, we can divide the data into a training set, a cross-validation set and a test set, train neural networks with different numbers of hidden layers, and then select the network with the smallest cross-validation cost.
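
As one practical way to run this comparison, the sketch below assumes scikit-learn's `MLPClassifier`, with its `alpha` parameter playing the role of the regularization term and validation accuracy standing in for the cross-validation cost; these choices are illustrative, not from the original notes:

```python
from sklearn.neural_network import MLPClassifier

def select_network(X_train, y_train, X_cv, y_cv,
                   architectures=((25,), (25, 25), (25, 25, 25))):
    """Train one network per hidden-layer configuration and keep the best on the cv set."""
    best_model, best_score = None, -1.0
    for layers in architectures:                  # 1, 2, then 3 hidden layers
        model = MLPClassifier(hidden_layer_sizes=layers, alpha=1.0, max_iter=1000)
        model.fit(X_train, y_train)
        score = model.score(X_cv, y_cv)           # accuracy on the cross-validation set
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```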


Origin blog.csdn.net/Luo_LA/article/details/128139526