Common model evaluation methods in machine learning

1. Why do we need evaluation methods?

Usually, we evaluate a learner's generalization error through experimental testing and then make a selection based on the result. To do this, we need a "test set" to measure the learner's ability to discriminate new samples, and we take the "test error" on the test set as an approximation of the generalization error.

2. What evaluation methods are there?

We need to split the sample set into a training set and a test set, and how this split is made affects how well the evaluation works. There are three specific approaches: the hold-out method, cross-validation, and the bootstrap method.

2.1 Hold-out method

The dataset D is directly partitioned into two mutually exclusive sets: S (the training set) and T (the test set), with D = S∪T and S∩T = ∅. After training the model on S, we evaluate its test error on T and take that as an estimate of the generalization error.

For example:

Task: binary classification

Sample set: D contains 1000 samples, 500 positive examples and 500 negative examples

Split of S and T: S contains 700 samples (350 positive, 350 negative); T contains 300 samples (150 positive, 150 negative)

Error rate and accuracy: suppose 90 samples in T are misclassified; the error rate is (90/300) × 100% = 30%, and the accuracy is [(300 − 90)/300] × 100% = 1 − 30% = 70%

NOTE: the estimate obtained from a single hold-out split is often not stable or reliable enough, so in practice the random split is repeated several times, the experiment is rerun each time, and the average of the evaluation results is reported as the hold-out result. With the 1000 samples above, we can repeatedly and randomly split them into S and T, apply the method each time, and average the accuracies (a sketch follows below).
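As a minimal sketch of this procedure (not from the original post), here is how a repeated stratified hold-out split might look in Python with scikit-learn; the synthetic dataset, the choice of classifier, and the 10 repeats are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for D: 1000 samples, roughly 500 positive / 500 negative
X, y = make_classification(n_samples=1000, weights=[0.5, 0.5], random_state=0)

accuracies = []
for seed in range(10):  # repeat the random split 10 times
    # stratify=y keeps the class ratio in both S (70%) and T (30%),
    # matching the 350/350 and 150/150 split described above
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, model.predict(X_test)))

print(f"mean hold-out accuracy: {np.mean(accuracies):.3f}")
```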

 

2.2 Cross-validation

The dataset D is partitioned into k mutually exclusive subsets of similar size, i.e., D = D1 ∪ D2 ∪ … ∪ Dk, with Di ∩ Dj = ∅ (i ≠ j); each subset Di should preserve the data distribution as far as possible (e.g., via stratified sampling). Each round, the union of k − 1 subsets is used as the training set and the remaining subset as the test set; this yields k rounds of training/testing whose results are averaged (commonly called "k-fold cross-validation").

For example:

Task: binary classification

Sample set: D contains 1000 samples, 500 positive examples and 500 negative examples

Division method:

  • The 1000 samples are divided into 10 subsets: D1 has 100 samples (50 positive, 50 negative), D2 has 100 samples (50 positive, 50 negative), …, D10 has 100 samples (50 positive, 50 negative);
  • For each round, train on 9 of the subsets and test on the remaining one; the per-round test-set accuracies are computed as follows:

Training set                      Test set      Test result
D1 D2 D3 D4 D5 D6 D7 D8 D9        D10           Test-set accuracy 1
D1 D2 D3 D4 D5 D6 D7 D8 D10       D9            Test-set accuracy 2
……                                ……            ……
D2 D3 D4 D5 D6 D7 D8 D9 D10       D1            Test-set accuracy 10

  • Average the 10 test-set accuracies and take the mean as the final estimate of the generalization error (a sketch follows below).
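A minimal sketch of 10-fold stratified cross-validation, assuming scikit-learn and a synthetic stand-in for D (the dataset and classifier are illustrative, not from the post):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for D: 1000 samples, balanced classes
X, y = make_classification(n_samples=1000, weights=[0.5, 0.5], random_state=0)

# 10 folds D1..D10, each preserving the 50/50 class distribution
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(scores)         # the 10 per-fold test accuracies
print(scores.mean())  # their average, used as the generalization estimate
```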

NOTE: a special case of cross-validation is the leave-one-out method. If the sample set has 1000 samples, each round trains on 999 samples and tests on the single remaining one. For large datasets, the training overhead is very large.
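A leave-one-out run can be expressed the same way; the sketch below is again an illustrative assumption (a small dataset is used, since leave-one-out requires one training run per sample):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small set on purpose: leave-one-out here means 100 separate training runs
X, y = make_classification(n_samples=100, random_state=0)

# Each run trains on m - 1 samples and tests on the single held-out sample
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print(scores.mean())
```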

2.3 Bootstrap method

Given a dataset D containing m samples, we sample from it to produce a dataset D′: each time, randomly select a sample from D, copy it into D′, and then put it back into D so that it may be picked again in the next draw. After repeating this m times, we obtain a dataset D′ containing m samples.

For example:

Task: binary classification

Sample set: D contains 1000 samples, 500 positive examples and 500 negative examples

Division method:

Randomly draw, with replacement, 700 samples to form the training set; some of these 700 samples will be duplicates. The probability that a given sample is never drawn over the 700 draws is (1 − 1/700)^700 ≈ 0.3676. (In general, for m draws, the probability that a sample is never selected is (1 − 1/m)^m, which tends to 1/e ≈ 0.368 as m → ∞.)

So roughly 36.8% of the samples are expected never to appear in the training set; take the 300 samples of D \ D′ as the test set and compute the model's accuracy on it (a sketch follows below).
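A minimal bootstrap sketch, following the textbook formulation that draws m = |D| samples with replacement (the post's example draws 700 instead, but the idea is identical); the dataset and classifier are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for D
X, y = make_classification(n_samples=1000, weights=[0.5, 0.5], random_state=0)
m = len(X)
rng = np.random.default_rng(0)

# Draw m samples *with replacement* to form D'; duplicates are expected
idx = rng.integers(0, m, size=m)

# D \ D': samples never drawn, expected fraction (1 - 1/m)^m ≈ 1/e ≈ 0.368
oob = np.setdiff1d(np.arange(m), idx)

model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
print(f"out-of-bag fraction: {len(oob) / m:.3f}")
print(f"out-of-bag accuracy: {accuracy_score(y[oob], model.predict(X[oob])):.3f}")
```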

NOTE: the bootstrap method is useful for small datasets, where it is hard to split off effective training and test sets. However, the dataset produced by bootstrapping changes the distribution of the initial dataset, which introduces estimation bias.

2.4 Parameter tuning and the final model

Most learning algorithms have parameters that need to be configured, and models learned under different parameter configurations often differ significantly in performance. In practice, we choose a range and a step size for each parameter and select the best candidate value within that range.

Given a dataset D of m samples, the model evaluation and selection process must hold out part of the data for evaluation, so we actually train the model on only part of the data. Once model selection is complete and the learning algorithm and parameters have been chosen, the model should be retrained on the full dataset D, using all m samples; this retrained model is the final model delivered to the user.

The data used to test the final model's generalization ability is called the "test set", while the data used during model evaluation and selection is called the "validation set".
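One common way to realize this tune-then-retrain workflow, sketched here as an illustration: scikit-learn's GridSearchCV scores each candidate parameter value on validation folds and, with refit=True, retrains the winning configuration on all the data it was given, which matches the "final model" step above. The parameter grid and estimator are assumptions for the example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)

# Candidate values for the regularization parameter C: a range and a step
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

# Each candidate is evaluated on validation folds; refit=True then retrains
# the selected configuration on ALL samples, producing the final model
search = GridSearchCV(SVC(), param_grid, cv=5, refit=True).fit(X, y)

print(search.best_params_)            # the selected parameter configuration
final_model = search.best_estimator_  # retrained on the full dataset D
```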

3. Summary

The above describes three methods for splitting data into training and test sets: the hold-out method, cross-validation, and the bootstrap method. The first two are quite similar; the last is well suited to cases where the initial amount of data is small, but it suffers from biased estimates. In practice, choose whichever splitting method best fits your model validation scenario.



Origin: blog.csdn.net/zfan520/article/details/90213856