In cross-validation, what is the difference between leave-one-out and normal cross-validation?

Motivation for using an evaluation method :

Estimate the learner's generalization error through experiments, and then use that estimate to make a choice.

There are three main evaluation methods :

  • Hold-out method (split the data into two mutually exclusive sets)
  • Cross-validation method (partition the data into k mutually exclusive folds, using each fold in turn as the test set)
  • Bootstrap (sampling with replacement)
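As a rough sketch in plain Python (the dataset and split sizes here are made up for illustration), the three schemes differ only in how they carve up the sample indices:

```python
import random

data = list(range(10))  # toy dataset: 10 sample indices

# Hold-out: one random split into two mutually exclusive sets (e.g. 70/30)
random.seed(0)
shuffled = random.sample(data, len(data))
train, test = shuffled[:7], shuffled[7:]
assert set(train).isdisjoint(test)

# k-fold cross-validation: k mutually exclusive folds; each fold serves as
# the test set once while the remaining k-1 folds form the training set
k = 5
folds = [shuffled[i::k] for i in range(k)]
for i, test_fold in enumerate(folds):
    train_fold = [x for j, f in enumerate(folds) if j != i for x in f]
    assert set(train_fold).isdisjoint(test_fold)

# Bootstrap: draw n samples with replacement; the samples that never get
# drawn (on average about 36.8% of them) can serve as the test set
boot = [random.choice(data) for _ in range(len(data))]
out_of_bag = [x for x in data if x not in boot]
```

Note that in the k-fold scheme every sample lands in exactly one test fold, which is the property the rest of this post relies on.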

After reading the watermelon book (Zhou Zhihua's Machine Learning), my understanding of these methods was still vague, and I did not know their concrete application scenarios.
Main reference: https://dataminingguide.books.yourtion.com/chapter-5/chapter-5-2.html

The hold-out method splits the data only once, so the result depends too much on that single random split and is not very convincing.

In cross-validation, each sample is used as both training data and test data. This avoids the distortions of over-learning and under-learning and yields more convincing results.

So, within cross-validation, what is the difference between the ordinary k-fold method and leave-one-out?

The advantages and disadvantages of the leave-one-out method are as follows :

Advantages :

  1. Almost all the data is used for training, and only one sample is used for testing
  2. Determinism

Determinism :
The experiment involves no random factors, so the entire process is repeatable.

With ten-fold cross-validation, for example, running the experiment twice can give different results, because the folds are drawn randomly.

Leave-one-out, by contrast, gives the same result no matter how many times you run it.
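A minimal sketch of that contrast (the helpers `loo_splits` and `kfold_splits` are hypothetical, written here just to illustrate the point):

```python
import random

def loo_splits(n):
    # Leave-one-out: the i-th split holds out exactly sample i.
    # There is no randomness, so the splits are identical every run.
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]

def kfold_splits(n, k, seed):
    # Shuffled k-fold: the partition depends on the random seed,
    # so two runs with different seeds give different folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

n = 20
assert loo_splits(n) == loo_splits(n)                               # repeatable
assert kfold_splits(n, 10, seed=1) != kfold_splits(n, 10, seed=2)   # run-dependent
```

The first assertion is the "determinism" advantage; the second is exactly why two ten-fold runs can disagree.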

Disadvantages :

  1. The computation time is very long (one model must be trained per sample)
  2. The stratification problem

The stratification problem :

Let's return to the athlete-classification example: judging whether a female athlete competes in basketball, gymnastics, or track and field.

When training a classifier, we want the training set to contain all three categories. If we split completely at random, the training set might contain no basketball players at all, which would hurt the results at test time.

For example, let's build a dataset of 100 athletes: 33 basketball players from the Women's NBA website, 33 gymnasts who competed at the 2012 Olympics (from Wikipedia), and 34 track-and-field athletes.

Now let's do ten-fold cross-validation. If we put the athletes into 10 buckets in order, the first three buckets contain only basketball players, the fourth bucket contains both basketball players and gymnasts, and so on.

This way, no single bucket truly represents the dataset as a whole. A better approach is to distribute the categories of athletes proportionally into the buckets, so that each bucket contains one-third basketball players, one-third gymnasts, and one-third track athletes.

This practice is called stratification. In the leave-one-out method, each test set contains only one sample, so stratification is impossible. Leave-one-out is therefore suited to small datasets, but in most cases we choose ten-fold cross-validation.
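A sketch of stratified bucketing for this example (the labels and the `stratified_buckets` helper are made up here to mirror the 33/33/34 dataset above):

```python
# Hypothetical labels mirroring the athlete example:
# 33 basketball, 33 gymnastics, 34 track-and-field samples.
labels = ["basketball"] * 33 + ["gymnastics"] * 33 + ["track"] * 34

def stratified_buckets(labels, k):
    # Deal each class's samples round-robin into k buckets, so every
    # bucket keeps roughly the overall class proportions.
    buckets = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for members in by_class.values():
        for pos, i in enumerate(members):
            buckets[pos % k].append(i)
    return buckets

buckets = stratified_buckets(labels, 10)

# Unlike in-order bucketing, every bucket now contains all three sports.
for b in buckets:
    assert len({labels[i] for i in b}) == 3
```

Dealing round-robin per class is the simplest way to keep each bucket close to the one-third/one-third/one-third mix described above.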
