Model evaluation and selection (1)

Empirical error and overfitting

(1) Error rate: the proportion of misclassified samples among the total number of samples.

Accuracy: \(1-\) error rate.
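As a quick illustration, here is a minimal Python sketch of these two definitions; the function names and toy labels are my own, not from the original text.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Error rate: proportion of misclassified samples among all samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true != y_pred)

def accuracy(y_true, y_pred):
    """Accuracy = 1 - error rate."""
    return 1.0 - error_rate(y_true, y_pred)

# Toy check: 2 of 5 samples misclassified -> error rate 0.4, accuracy 0.6
print(error_rate([0, 1, 1, 0, 1], [0, 1, 0, 1, 1]))  # 0.4
print(accuracy([0, 1, 1, 0, 1], [0, 1, 0, 1, 1]))    # 0.6
```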

(2) Error: the difference between the learner's actual output and the sample's true value.

There are two kinds of error: training error and generalization error. Training error is the learner's error on the training set, also called empirical error; generalization error is the error on new samples.

(However, even if the classification accuracy on the training samples reaches 100%, it does not necessarily mean the learner is good. What we want is a learner with small generalization error.)

(3) Overfitting: following on from the parenthetical note in point (2), we want a learner that performs well on new samples, i.e., whose generalization error is as small as possible. To this end, the learner should extract from the training set, as far as possible, the general regularities that hold for all potential samples. However, if the learner learns the training samples "too well", it may mistake peculiarities of the training samples themselves for general properties that all potential samples possess. This is the overfitting phenomenon.

The counterpart of overfitting is underfitting, which means the learner has not yet even learned the general properties of the training samples.

For example, suppose a learner is trained to decide whether something is a leaf. If most leaves in the training set have serrated edges, an overfitted learner may reject a genuine leaf whose edge is not serrated, because it has mistakenly concluded that all leaves have serrated edges. Conversely, if the learner has not learned the training set well enough, it may classify a tree as a leaf: leaves are green and trees are green, and it never went on to learn the distinguishing characteristics of leaves. This is underfitting.

(However, it should be pointed out that underfitting can be overcome by improving the learning method, whereas overfitting cannot be avoided, only alleviated.)

Several methods for splitting training and test sets

Introduction: to evaluate a learner's generalization error, we can test it experimentally. Besides the training set, we therefore also need a test set: after training is complete, the learner is tested on the test set to judge its ability on new samples, and the error on the test set is taken as an approximation of the generalization error. But we face a problem: all we have is a single data set \(D=\{(x_1,y_1),(x_2,y_2),\cdots,(x_m,y_m)\}\). We need to split this data set, with some care, into a training set S and a test set T.

1. Hold-out method: directly divide the data set D into two mutually exclusive sets, using one as the training set S and the other as the test set T. D, S, and T should satisfy:

\(\begin{cases} D=S\cup T\\S\cap T=\varnothing\end{cases}\)

After the model is trained on the training set S, its error on T is used as an estimate of the generalization error.
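A minimal hold-out split sketch in NumPy; the function name, the 70/30 ratio, and the assumption that X and y are NumPy arrays are illustrative choices, not from the original text. (In practice, stratified sampling is often used so that S and T keep a consistent class distribution.)

```python
import numpy as np

def holdout_split(X, y, test_ratio=0.3, seed=0):
    """Split D into mutually exclusive S (train) and T (test): D = S ∪ T, S ∩ T = ∅."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # shuffled sample indices
    n_test = int(len(X) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```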

2. Cross-validation: first divide the data set D into k disjoint subsets of similar size, i.e. \(D = D_1 \cup D_2 \cup \cdots \cup D_k\), with \(D_i \cap D_j = \varnothing\) for \(i \ne j\). Each round, the union of \(k-1\) subsets is used as the training set and the remaining subset as the test set. This yields k training/test set pairs, allowing k rounds of training and testing, and finally the mean of the k test results is returned.
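A minimal k-fold cross-validation sketch under the same NumPy-array assumption; train_and_eval is a hypothetical callback that trains a model on the given split and returns its test error.

```python
import numpy as np

def k_fold_cv(X, y, k, train_and_eval, seed=0):
    """Split D into k disjoint subsets of similar size, test on each subset
    in turn while training on the other k-1, and return the mean test result."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)   # k disjoint index subsets
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_eval(X[train_idx], y[train_idx],
                                     X[test_idx], y[test_idx]))
    return np.mean(scores)
```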

3. Bootstrapping, also known as the bootstrap method: based directly on bootstrap sampling. Given a data set D containing m samples, we generate a new data set D' by sampling from it. The specific procedure: each time, randomly pick one sample from D, copy it into D', and then put the sample back into D, so that it may still be picked in the next draw. Repeating this process m times yields a data set D' containing m samples.

Obviously, some samples in D will appear in D' more than once, while others never appear at all. The probability that a given sample is not picked in a single draw is \(1-\frac{1}{m}\), so the probability that it is never picked in any of the m draws is \((1-\frac{1}{m})^m\).

\[\lim_{m\to\infty}\left(1-\frac{1}{m}\right)^m=\frac{1}{e}\approx 0.368\]

That is, when the data set D is large enough, about 36.8% of its samples never appear in D'. We can therefore use D' as the training set and D − D' as the test set. In this way, the model actually evaluated uses m training samples, just like the model we ultimately want to evaluate, while about 1/3 of the total data, never seen during training, remains available as a test set.
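A minimal bootstrap-sampling sketch (function name illustrative, NumPy arrays assumed); the last two lines numerically check the \((1-\frac{1}{m})^m\to\frac{1}{e}\) limit above.

```python
import numpy as np

def bootstrap_split(X, y, seed=0):
    """Draw m samples from D with replacement to form D'; the samples never
    drawn (about 36.8% for large m) form the test set D - D'."""
    rng = np.random.default_rng(seed)
    m = len(X)
    picked = rng.integers(0, m, size=m)         # m draws with replacement
    oob = np.setdiff1d(np.arange(m), picked)    # samples never picked
    return X[picked], y[picked], X[oob], y[oob]

m = 10_000
print((1 - 1/m) ** m)   # ≈ 0.3679, close to 1/e
```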

About parameter tuning

Machine learning typically involves two types of parameters. One type is algorithm parameters (hyperparameters), of which there are usually no more than about ten; the other type is model parameters, which can be very numerous. For both types, several candidate models are generated and then selected by an evaluation method. The difference is that candidates for algorithm parameters are usually produced by manually setting several candidate values, whereas candidate models for model parameters are produced by learning.

Models learned under different parameter configurations often perform differently, so when carrying out model evaluation and selection, besides choosing a suitable learning algorithm, we also need to set its parameters. This is parameter tuning.
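As a sketch of the simplest tuning strategy, here is a grid search over manually specified candidate values; grid search is my illustrative choice (the original text names no specific method), and evaluate is a hypothetical callback returning, say, a cross-validated error estimate.

```python
import numpy as np

def grid_search(candidates, evaluate):
    """Evaluate each candidate parameter setting and keep the best one."""
    best_params, best_err = None, np.inf
    for params in candidates:
        err = evaluate(params)               # e.g. estimated generalization error
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

# Illustrative use with a made-up error function
candidates = [{"lr": v} for v in (0.01, 0.1, 1.0)]
print(grid_search(candidates, evaluate=lambda p: abs(p["lr"] - 0.1)))
# ({'lr': 0.1}, 0.0)
```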

Personal Summary

This post mainly organizes the criteria for judging the quality of the model produced by a learning algorithm; more details will be organized in follow-up posts. It also covers methods for splitting an existing data set into a training set and a test set, whose sampling resembles that of mathematical statistics. For methods such as hold-out and cross-validation, the selected training set must be representative of the samples, so stratified sampling is used to reduce the error introduced by differences in class distribution; the error can be further reduced by averaging over several repeated experiments. All in all, this part is closely connected to mathematical statistics.
