Learning PyTorch from Zero, Lesson 1.5: The Roles of the Training Set, Validation Set, and Test Set

Empirical Error and Overfitting

Keywords: error rate, accuracy.

  • The error rate is easy to understand: if a out of m samples are misclassified, the error rate is E = a / m.
  • Accuracy is simply 1 - E. This is basic, but it is worth spelling out, because machine learning has many similar-sounding terms (precision, recall, accuracy, and so on), and it pays to keep them clearly distinguished. A small code sketch follows.
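As a minimal sketch (the tensors below are made up for illustration, not from the original post), computing the error rate and accuracy in PyTorch looks like this:

```python
import torch

# Hypothetical predictions and ground-truth labels for illustration.
preds = torch.tensor([0, 1, 1, 0, 1])
labels = torch.tensor([0, 1, 0, 0, 0])

m = labels.numel()                  # total number of samples
a = (preds != labels).sum().item()  # number of misclassified samples
E = a / m                           # error rate E = a / m
print(f"error rate E = {E:.2f}, accuracy = {1 - E:.2f}")  # 0.40, 0.60
```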

Keywords: empirical error.

  • The difference between the model's predicted output and the samples' true output on the training set is called the training error, or empirical error.

Keywords: generalization error.

  • The error on new samples is called the generalization error.

Keywords: overfitting.

  • When I was new to machine learning, "fitting" sounded very abstract until someone described it as curve fitting, and that is really what it is: the training samples are points, and training a model produces a prediction curve through them. (Knock on the blackboard: fitting is a process.) Overfitting, then, is fitting too far: the predicted curve matches the training samples very well but strays from the true underlying curve, so it does badly on other samples. In classifier terms, an overfitted model classifies the training samples well but classifies the test samples poorly.
    Where there is overfitting there is of course also underfitting: an underfitted model has weak learning ability and cannot even classify the training samples well, let alone the test samples.
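A small illustrative sketch (entirely my own example, using NumPy polynomial fitting rather than anything from the original post): fitting the same noisy points with a low-degree and a high-degree polynomial shows how training error can shrink while error on new samples grows.

```python
import numpy as np

# Illustrative data: 8 noisy samples of a sine curve.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)

# "New samples": dense points on the true (noise-free) curve.
x_new = np.linspace(0, 1, 100)
y_new = np.sin(2 * np.pi * x_new)

for degree in (3, 7):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit a prediction curve
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    new_mse = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    # The degree-7 fit nearly interpolates the 8 training points
    # (tiny training error) but typically generalizes worse: overfitting.
    print(f"degree {degree}: train MSE {train_mse:.4f}, new-sample MSE {new_mse:.4f}")
```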

Evaluation Methods

Keywords: hold-out method.

  • How should a given dataset be divided into a training set and a test set? Several methods are in common use. The first to introduce is the hold-out method, which is the most common in domestic textbooks and papers: the dataset D is split into two mutually exclusive sets, one serving as the training set and the other as the test set. The book's suggested split gives the training set roughly 66.6% to 80% of the data. A minimal split sketch follows.
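A minimal hold-out sketch (the dataset below is random, made up for illustration), assuming torch.utils.data.random_split for the 80/20 division:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical dataset D with m = 1000 samples.
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000,))
D = TensorDataset(X, y)

# Hold-out: two mutually exclusive sets, 80% training / 20% test.
n_train = int(0.8 * len(D))
train_set, test_set = random_split(D, [n_train, len(D) - n_train])
print(len(train_set), len(test_set))  # 800 200
```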

Keywords: cross-validation.

  • Cross-validation appears more often in competitions and formal experiments. What is it? The dataset D is partitioned into k mutually exclusive subsets of equal size; k - 1 subsets are used for training and the remaining one for testing. This requires training k models and yields k results, which can then be averaged. The method is generally called "k-fold cross-validation." The book gives reference values for k of 5, 10, and 20. A sketch follows.
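A hedged sketch of 5-fold cross-validation on made-up data (scikit-learn's KFold and LogisticRegression are my own choices here, not something the post prescribes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Hypothetical data: 100 samples, labels loosely tied to feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + rng.normal(0, 0.5, size=100) > 0).astype(int)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])  # train on k-1 subsets
    scores.append(model.score(X[test_idx], y[test_idx]))          # test on the held-out subset
print(f"mean accuracy over 5 folds: {np.mean(scores):.3f}")       # average of the k results
```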

Keywords: the bootstrap method (bootstrapping).

  • The first time I heard of the bootstrap method I had never come across it in the literature; it is mainly used for small samples. Its drawback is that it easily introduces estimation bias. Given a dataset D with m samples, randomly pick one sample from D and copy it into D', and repeat this m times. The chance that a given sample is never picked in m draws is (1 - 1/m)^m, which tends to 1/e as m grows, so about 36.8% of the samples in D never appear in D'. D' is then used as the training set and D \ D' (where "\" denotes set subtraction) as the test set. The bootstrap is also known as repeated sampling or sampling with replacement. The sketch below checks the 36.8% figure empirically.
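A quick empirical check (my own sketch, not from the post) of the 1/e ≈ 36.8% out-of-sample fraction:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000
indices = rng.integers(0, m, size=m)              # m draws with replacement -> D'
out_of_bag = np.setdiff1d(np.arange(m), indices)  # D \ D', usable as the test set
print(f"fraction never sampled: {len(out_of_bag) / m:.3f}")  # ~0.368, i.e. about 1/e
```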

Keywords: training set, validation set, test set.

  • When I first came into contact with machine learning I only knew about training and test sets; when I later heard the term validation set, I assumed it was used the same way as the test set and kept confusing the two.

First of all, you need to know that in engineering applications, the model ultimately delivered to the customer is trained on all m samples of the dataset D. In other words, the test set is eventually also used to train the final model. Before that point, the dataset D is divided into a training set and a test set: the training set is used to train the model, the test set is used to estimate the model's generalization ability in practical applications, and the validation set is used for model selection and parameter tuning.
So, as I understand it, during study the validation set and the test set play the same role: both are used to observe the trained model, in particular its generalization. In engineering applications, however, the validation set should be subdivided out of the training set and used for model selection and parameter tuning. Once tuning is done, the test set is used to evaluate the model's generalization performance; if the performance is acceptable, the test set is folded back into the training data, and the resulting model is the one delivered to the user. A sketch of this workflow follows.
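A sketch of that workflow (all names and sizes below are illustrative, assuming torch.utils.data.random_split):

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset, random_split

# Hypothetical dataset D with m = 1000 samples.
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000,))
D = TensorDataset(X, y)

# Subdivide the validation set out of the training portion.
train_set, valid_set, test_set = random_split(D, [700, 150, 150])

# 1. Train candidate models on train_set.
# 2. Select the model and tune parameters by performance on valid_set.
# 3. Evaluate generalization once on test_set.
# 4. If performance is acceptable, retrain the chosen configuration on
#    all of D (train + valid + test) and deliver that model to the user.
full_train = ConcatDataset([train_set, valid_set, test_set])
```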
For example, take high school seniors preparing for the college entrance exam:

  • the training set is the everyday homework, exercise books, and so on;
  • the validation set is the first, second, and third mock exams;
  • the test set is the college entrance exam itself.

The training set is for students to learn and to build up their ability; the validation set checks whether the students' progress, direction, and study methods are on track; the test set is the final measure of achievement.
Fortunately we get several "mock entrance exams," but we cannot take the real entrance exam questions for practice and analysis beforehand, because they can only serve as the test set, never as a validation set.
Generally, when the validation set and the test set have the same distribution (i.e., the mock exam questions are close to the real exam questions), a student who scores about 650 on the mocks will also score about 650 on the real exam.

Source: blog.csdn.net/qq_34107425/article/details/104097370