Machine Learning Foundations Lecture 15 Notes

Lecture 15: Validation

15-1 Model selection problems

What makes a good model? Answer: one whose out-of-sample error Eout is as small as possible.

But this faces a problem: we can never compute Eout directly, because the underlying data distribution is unknown.

So how do we choose? Not by visual inspection: once the data is high-dimensional, we cannot simply look at the fit and judge.

Choose the model with the smallest Ein? Answer: no. Always picking the lowest in-sample error favors the most complex model, which leads to overfitting and bad generalization.

One answer: pick the model that performs best on held-out data. Set aside a small part of the existing data as a validation set, and use it to evaluate each candidate model.
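As a minimal sketch of this idea (the helper name and the toy data are my own, not from the lecture), holding out K of the N points might look like:

```python
import numpy as np

def split_validation(X, y, K, seed=0):
    """Randomly hold out K of the N points as a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    val, train = idx[:K], idx[K:]
    return X[train], y[train], X[val], y[val]

# toy data: 10 points, 2 features
X = np.arange(20, dtype=float).reshape(10, 2)
y = X.sum(axis=1)
Xtr, ytr, Xval, yval = split_validation(X, y, K=3)
print(len(Xtr), len(Xval))  # 7 3
```

The key point is that the K validation points are never seen during training, so the error measured on them is an honest estimate.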


15-2 Test Set

An illustration of the answer to the model selection question above:

Train each candidate hypothesis set Hm on the data minus the validation set to get gm-, use the validation error Eval(gm-) as an estimate of its Eout, and then compare the estimates to find the best model.

Comparing gm (trained on all N points) with gm- (trained on the N - K points left after removing the validation set):

When the validation set size K is small, gm- is trained on almost all the data, so gm and gm- perform about the same (though Eval is then a noisy estimate);

When K is large, Eval is a reliable estimate, but gm- is trained on far fewer points, so gm performs noticeably better than gm-. This is why, after selecting a model with the validation set, we retrain it on all N points.
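The full procedure can be sketched as follows (the polynomial-degree candidates and the sine-plus-noise data are illustrative choices of mine, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 60)
y = np.sin(np.pi * x) + rng.normal(0, 0.2, 60)

K = 20                      # validation set size
xt, yt = x[:-K], y[:-K]     # training part, used to obtain each g_m^-
xv, yv = x[-K:], y[-K:]     # validation part

def val_err(deg):
    """Fit a degree-`deg` polynomial on the training part, score on validation."""
    coef = np.polyfit(xt, yt, deg)
    return np.mean((np.polyval(coef, xv) - yv) ** 2)

degrees = [1, 3, 5, 9]
best = min(degrees, key=val_err)   # model with the smallest E_val
g_best = np.polyfit(x, y, best)    # retrain the winner on ALL N points -> g_m
print("selected degree:", best)
```

Note the last step: the validation set only picks the model; the final hypothesis is retrained on all the data, because gm beats gm-.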



15-3 Leave-One-Out Cross-Validation


Leave-one-out cross-validation takes the extreme case K = 1: each point in turn serves as a one-point validation set, the model is trained on the remaining N - 1 points, and the N single-point errors are averaged into Eloocv.

A schematic of this approach from the lecture (for a linear model and a constant model, respectively):


Since gm- is now trained on N - 1 points, when the data size N is large, gm and gm- are almost identical, and Eloocv is an almost unbiased estimate of Eout.
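A minimal sketch of the leave-one-out loop (the helper name and the constant model, which echoes the lecture's constant-model schematic, are my own illustration):

```python
import numpy as np

def loocv_error(X, y, fit, predict):
    """Leave-one-out CV: train on N-1 points, test on the single held-out one."""
    errs = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        model = fit(X[mask], y[mask])
        errs.append((predict(model, X[i:i+1])[0] - y[i]) ** 2)
    return np.mean(errs)

# illustrative: a constant model that always predicts the training mean
fit_const = lambda X, y: y.mean()
pred_const = lambda m, X: np.full(len(X), m)

X = np.arange(5, dtype=float).reshape(-1, 1)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(loocv_error(X, y, fit_const, pred_const))  # 3.125
```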


15-4 V-Fold Cross Validation

Disadvantage 1 of leave-one-out: computation. With 1000 points, the model must be trained 1000 times.

One exception where leave-one-out is cheap is linear regression, which has a closed-form solution: with the hat matrix H = X (X^T X)^-1 X^T, we have Eloocv = (1/N) * sum_n ((y_n - yhat_n) / (1 - H_nn))^2, so a single fit on all the data suffices.
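This closed-form hat-matrix identity can be checked numerically against the naive N-fit loop (the synthetic regression data here is my own example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])  # design matrix with bias
y = 2.0 + 3.0 * X[:, 1] + rng.normal(0, 0.5, 30)

# Naive leave-one-out: 30 separate least-squares fits.
errs = []
for i in range(30):
    m = np.ones(30, bool); m[i] = False
    w, *_ = np.linalg.lstsq(X[m], y[m], rcond=None)
    errs.append((X[i] @ w - y[i]) ** 2)
naive = np.mean(errs)

# Closed form: ONE fit, via the hat matrix H = X (X^T X)^-1 X^T.
H = X @ np.linalg.solve(X.T @ X, X.T)
yhat = H @ y
closed = np.mean(((y - yhat) / (1 - np.diag(H))) ** 2)

print(np.isclose(naive, closed))  # True
```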

Disadvantage 2 of leave-one-out: stability. Each validation error comes from a single point, so the estimate has high variance; for binary classification, where each per-point error is 0 or 1, it is especially unstable.

For these reasons, leave-one-out is not often used in practice.

V-fold cross-validation improves on leave-one-out:


Split the data into V folds; for example, in ten-fold cross-validation each fold in turn serves as the validation set while the other nine are used for training, and the ten validation errors are averaged.
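The V-fold procedure can be sketched like this (the helper name and the constant toy model are my own illustration, not from the lecture):

```python
import numpy as np

def vfold_error(X, y, V, fit, predict, seed=0):
    """Average validation error over V folds, each held out once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), V)
    errs = []
    for k in range(V):
        val = folds[k]
        tr = np.concatenate([folds[j] for j in range(V) if j != k])
        model = fit(X[tr], y[tr])
        errs.append(np.mean((predict(model, X[val]) - y[val]) ** 2))
    return np.mean(errs)

# illustrative constant model: predicts the training mean
fit_const = lambda X, y: y.mean()
pred_const = lambda m, X: np.full(len(X), m)

X = np.arange(100, dtype=float).reshape(-1, 1)
y = X.ravel()
err = vfold_error(X, y, 10, fit_const, pred_const)
print("10-fold error:", err)
```

With V = 10, each model is trained only 10 times instead of N times, while each validation score still averages over N/V points, which tames the instability of leave-one-out.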

