Zhou Zhihua's Machine Learning: Model Evaluation and Selection

(1) Empirical error and overfitting

Error rate: the proportion of misclassified samples among all samples; if a of the m samples are misclassified, the error rate is E = a/m.

Accuracy: the proportion of correctly classified samples among all samples, i.e. 1 - a/m = 1 - E.
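As a tiny worked illustration (the labels are made up, not from the original notes), both quantities computed in Python:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])   # ground-truth labels, m = 5
y_pred = np.array([1, 0, 0, 1, 0])   # predictions, a = 1 misclassified

m = len(y_true)
a = int(np.sum(y_pred != y_true))    # number of misclassified samples
error_rate = a / m                   # E = a/m = 0.2
accuracy = 1 - error_rate            # 1 - E = 0.8
```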

Error: the difference between the learner's predicted output and the sample's true output.

Training error (also called empirical error): the learner's error on the training set.

Generalization error: the learner's error on new samples.

Ideally, the learner should learn from the training samples, as far as possible, the "general law" that applies to all potential samples.

Overfitting: the learner fits the training samples "too well", so that peculiarities of the training samples themselves are mistaken for general properties of all potential samples, which degrades generalization performance. It is usually caused by the learning ability being too strong.

Underfitting: the learner fails to capture even the general properties of the training samples. It is usually caused by the learning ability being too weak.

Underfitting is relatively easy to overcome, while overfitting is much harder to deal with: overfitting is the key obstacle in machine learning, and it cannot be completely avoided.

In a real learning task, choosing a learning algorithm and determining its parameters and configuration is the "model selection" (model selection) problem.

(2) Model evaluation

Since the generalization error cannot be obtained over all potential samples, the "testing error" (testing error) on a test set is generally used as an approximation of the generalization error.

The test set should be mutually exclusive with the training set as far as possible; that is, the test samples should not have been used in training.

Given a data set D = {(x1, y1), (x2, y2), ..., (xm, ym)} containing m samples, how should it be divided into a training set S and a test set T?

① Hold-out method (hold-out): directly split the data set D into two mutually exclusive sets, a training set S and a test set T. To keep the data distribution consistent between the two parts, a classification task should preserve similar class proportions in each; sampling that preserves class proportions is generally called "stratified sampling" (stratified sampling). For example, if D contains 500 positive and 500 negative examples, a stratified split that reserves 30% of the samples for testing gives an S containing 350 positive and 350 negative examples and a T containing 150 positive and 150 negative examples, as in the sketch below.
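As an illustration (not part of the original notes), a minimal scikit-learn sketch of such a stratified 70/30 hold-out split; the synthetic data set stands in for the 500-positive/500-negative example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the balanced 500/500 data set D.
X, y = make_classification(n_samples=1000, weights=[0.5, 0.5], random_state=0)

# stratify=y performs stratified sampling: the 70/30 split preserves the
# class ratio, so S gets ~350 of each class and T gets ~150 of each.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
```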

There are many ways to divide D into S and T, and a single hold-out split gives an unreliable estimate, so the hold-out method is generally used with several random splits, repeating the experimental evaluation and averaging the results. If S is too large, the trained model is closer to the one trained on all of D and the evaluation has better fidelity (fidelity), but the test set T is then small and the evaluation result is unstable and inaccurate; conversely, the fidelity of the evaluation cannot be guaranteed. There is no perfect solution; common practice is to use roughly 2/3 to 4/5 of the samples for training and the remainder for testing, as in the sketch below.
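A sketch of this repeated-split practice (again on synthetic data; the model is a placeholder), averaging the accuracy over ten different random hold-out splits with 2/3 of the samples used for training:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

scores = []
for seed in range(10):                        # 10 different random splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1/3, stratify=y, random_state=seed)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    scores.append(acc)

print(np.mean(scores))                        # averaged evaluation result
```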

② Cross-validation (cross validation): split D into k mutually exclusive subsets, D = D1 ∪ D2 ∪ ... ∪ Dk, with Di ∩ Dj = ∅ (i ≠ j), each subset Di obtained by stratified sampling. Each round uses the union of k-1 subsets as the training set and the remaining subset as the test set; this yields k training/test pairs and hence k rounds of training and testing, and the final result is the mean of the k test results. The stability and fidelity of the evaluation depend largely on the value of k, so the method is called k-fold cross-validation (k-fold cross validation); k is most commonly 10, with 5 and 20 also used. Since there are many ways to partition D into k subsets, k-fold cross-validation is generally repeated p times with different random partitions, and the final result is the mean of the p runs of k-fold cross-validation; see the sketch below.
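A sketch of p = 10 repetitions of 10-fold cross-validation using scikit-learn's RepeatedStratifiedKFold (the model and data are illustrative placeholders, not from the original notes):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# p = 10 repetitions of 10-fold cross-validation, each with a different
# random stratified partition; the final estimate is the overall mean.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(scores.mean())   # mean of the 100 fold results
```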

A special case is leave-one-out (Leave-One-Out, LOO): assuming D contains m samples, set k = m. The leave-one-out estimate is often considered fairly accurate, but when the data set is large the computational overhead becomes unbearable. A sketch follows.
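For completeness, the same kind of evaluation with leave-one-out; the small m keeps the m training rounds affordable (again an illustrative placeholder model):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=50, random_state=0)  # small m

# k = m: each of the m samples serves as the test set exactly once,
# so the model is trained m times on m - 1 samples.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut())
print(scores.mean())
```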

③ Bootstrapping (bootstrapping): both the hold-out method and cross-validation reserve part of the samples for testing, so the evaluated model is trained on a set smaller than D, which inevitably introduces bias due to the differing training-set size (leave-one-out minimizes this effect, but its computational cost is too high). Bootstrapping is a better solution. It is based on bootstrap sampling (bootstrap sampling): given a data set D containing m samples, sample from D with replacement m times to form a data set D'. Some samples will clearly appear repeatedly in D', while about 36.8% of the samples in D never appear in D' (the probability that a given sample is never drawn is (1 - 1/m)^m, which tends to 1/e ≈ 0.368). D' is then used as the training set and D \ D' as the test set, as sketched below.
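A minimal numpy sketch of bootstrap sampling (an illustration, working on sample indices): it draws m indices with replacement to form D', takes D \ D' as the test set, and checks that roughly 36.8% of the samples are left out:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000
D = np.arange(m)                      # indices of the m samples in D

boot = rng.integers(0, m, size=m)     # m draws with replacement -> D'
oob = np.setdiff1d(D, boot)           # D \ D': the out-of-bag test set

# Fraction of D never drawn; approaches (1 - 1/m)^m -> 1/e ≈ 0.368.
print(len(oob) / m)
```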

Bootstrapping is useful when the data set is small and hard to split effectively into training and test sets. However, the data set produced by bootstrap sampling changes the distribution of the initial data set, which introduces estimation bias; therefore, when the initial amount of data is sufficient, the hold-out method and cross-validation are more commonly used.

④ Parameter tuning and the final model: adjusting an algorithm's parameters is called "parameter tuning" (parameter tuning). Since many parameters take values in a range of real numbers, training a model for every candidate value is infeasible; a realistic common practice is to choose a range and a step size for each parameter. Even so, many learning algorithms have a large number of parameters to set, which makes the tuning process very laborious, and the quality of the tuning often has a critical impact on the final model's performance. A grid-style tuning sketch follows.
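One common concrete form of the range-and-step practice is a grid search evaluated by cross-validation; a scikit-learn sketch (the SVM and its parameter ranges are illustrative assumptions, not from the original notes):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# For each real-valued parameter, pick a range and a step instead of
# trying every possible value, e.g. C on a log grid over [0.1, 100].
param_grid = {"C": np.logspace(-1, 2, 4), "gamma": np.logspace(-3, 0, 4)}

# Every combination is evaluated by cross-validation: here 4 x 4 = 16
# candidate configurations, each trained and tested 5 times.
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_)
```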

The data that the learned model encounters in actual use is referred to as test data;

the data set used for evaluation and testing during model evaluation and selection is called the "validation set" (validation set).

That is, performance on the test set is used to estimate the model's generalization ability in actual use, while the training data are further divided into a training set and a validation set, and model selection and parameter tuning are based on performance on the validation set; see the sketch below.
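A sketch of this three-way division (the sizes are illustrative): the test set is held back to estimate generalization ability, and the remaining training data are split again into a training set and a validation set:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Hold back a test set first: it is used only to estimate generalization.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Split the remaining training data into training and validation sets;
# parameter tuning and model selection use validation performance only.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval,
    random_state=0)
```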


Source: https://www.cnblogs.com/Sweepingmonk/p/11037261.html