2.1 Empirical error and overfitting
What we really hope for is a learner that performs well on new samples. To achieve this, the learner should extract from the training samples the "general law" that applies to all potential samples, so that it can make correct judgments when faced with new samples.
Factors leading to overfitting: the learner's capacity is too strong, so it learns peculiarities of the training samples that are not general properties.
Factors leading to underfitting: the learner's capacity is too weak.
2.2 Evaluation methods
The test set and the training set should be mutually exclusive as far as possible; that is, test samples should, as far as possible, not appear in the training set.
2.2.1 Hold-out method
It is relatively simple and commonly used; the following splits a dataset into training and test sets with sklearn:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
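A minimal self-contained sketch of the hold-out split; the synthetic dataset and the `stratify` option are my additions, not from the text (stratified sampling keeps the class ratio similar in both splits, which the hold-out method recommends):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary-classification dataset: 100 samples, 4 features.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Hold out 20% of the samples as the test set; stratify=y keeps
# the positive/negative ratio similar in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(X_train.shape, X_test.shape)  # (80, 4) (20, 4)
```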
2.2.2 Cross-validation
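In k-fold cross-validation the dataset is divided into k mutually exclusive subsets, each serving as the test set once while the rest train the model; k = 10 is a common choice. A sketch with sklearn (the dataset and classifier are my illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic dataset and a simple classifier for illustration.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# 10-fold cross-validation: each fold is the test set exactly once;
# the mean of the 10 scores estimates generalization performance.
scores = cross_val_score(LogisticRegression(), X, y, cv=10)
print(len(scores), scores.mean())
```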
2.2.3 Bootstrap
Given a dataset D containing m samples, draw from it m times with random sampling with replacement to obtain a new dataset D'. The probability that a particular sample is never drawn in m draws is (1 − 1/m)^m; taking the limit as m → ∞ gives 1/e ≈ 0.368.
D' can then be used as the training set, and the samples that were never drawn form the test set. This method suits small datasets, where it is otherwise difficult to divide the data into effective training and test sets.
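A small numerical sketch of the bootstrap (the dataset size is arbitrary): sampling m times with replacement and checking how many samples were never drawn reproduces the ≈ 0.368 figure.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000                       # dataset size (hypothetical)
D = np.arange(m)                 # dataset D, one id per sample

# Draw m times with replacement to form D'.
D_prime = rng.choice(D, size=m, replace=True)

# Samples never drawn form the test set; their fraction
# approaches 1/e ≈ 0.368 as m grows.
out_of_bag = np.setdiff1d(D, D_prime)
frac = len(out_of_bag) / m
print(frac)  # roughly 0.368
```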
2.2.4 Parameter tuning and the final model
After model selection and parameter tuning are complete, the training data and test data should be combined and the model retrained on all of it; this model, trained on all the samples, is the final model presented to the user.
2.3.1 Error rate and accuracy
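The error rate is E = a / m, where a is the number of misclassified samples out of m, and accuracy is 1 − E. A tiny sketch with made-up labels:

```python
import numpy as np

# Hypothetical true labels and predictions.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

m = len(y_true)
a = int(np.sum(y_pred != y_true))  # number of misclassified samples
error_rate = a / m                 # E = a / m
accuracy = 1 - error_rate          # accuracy = 1 - E
print(error_rate, accuracy)        # ≈ 0.333 0.667
```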
2.3.2 Precision, recall and F1
Precision P = TP / (TP + FP) is the fraction of predicted positives that are truly positive; recall R = TP / (TP + FN) is the fraction of true positives that are retrieved.
When evaluating a model, the "Break-Even Point" (BEP) is the value at which precision equals recall. BEP is somewhat oversimplified, however; more commonly used is the F1 measure: F1 = 2 × P × R / (P + R).
Not mentioned in the book: every threshold corresponds to a point with its own F value; the maximum F value over all thresholds is taken as the classifier's F_score.
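The metrics above can be computed with sklearn; a sketch with hypothetical labels, where TP = 3, FP = 1, FN = 1, so P = R = F1 = 0.75:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical true labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2*P*R/(P+R)
P = precision_score(y_true, y_pred)
R = recall_score(y_true, y_pred)
F1 = f1_score(y_true, y_pred)
print(P, R, F1)  # 0.75 0.75 0.75
```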
2.3.3 ROC and AUC
True positive rate and false positive rate: TPR = TP / (TP + FN), FPR = FP / (TN + FP).
In classification competitions, AUC (the area under the ROC curve) is also commonly used as a metric.
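A sketch of computing the ROC curve and AUC with sklearn; the four scored samples are hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted scores.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Sweep score thresholds: each threshold yields one (FPR, TPR)
# point; the curve through these points is the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, scores)

# AUC is the area under that curve.
auc = roc_auc_score(y_true, scores)
print(auc)  # 0.75
```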
2.4.1 Hypothesis Testing
Given the measured test error rate ê of a learner on m test samples, the probability of observing ê when the generalization error rate is ε is: P(ê; ε) = C(m, ê×m) × ε^(ê×m) × (1 − ε)^(m − ê×m).
In hypothesis testing, a "hypothesis" is a judgment or conjecture about the distribution of the learner's generalization error rate.
If the test error rate is below the critical value, then at confidence level 1 − α we can conclude that the learner's generalization error rate is no greater than ε₀; otherwise, the hypothesis is rejected.
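The binomial probability P(ê; ε) can be evaluated directly; a sketch with hypothetical numbers (m, ε and ê are arbitrary choices for illustration):

```python
from scipy.stats import binom

m = 10        # number of test samples (hypothetical)
eps = 0.3     # assumed generalization error rate
e_hat = 0.3   # measured test error rate

# P(e_hat; eps) = C(m, e_hat*m) * eps^(e_hat*m) * (1-eps)^(m - e_hat*m),
# i.e. the binomial pmf at k = e_hat * m misclassified samples.
p = binom.pmf(round(e_hat * m), m, eps)
print(p)  # ≈ 0.2668
```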
2.4.2 Cross-validation t-test
Apply a t-test to the error rates obtained from repeated hold-out (or cross-validation) evaluations.
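A sketch of the comparison with a paired t-test via `scipy.stats.ttest_rel`; the per-round error rates of the two learners below are made-up numbers:

```python
import numpy as np
from scipy import stats

# Hypothetical error rates of two learners over 10 evaluation rounds
# (e.g. 10 folds or 10 hold-out repetitions on the same splits).
err_A = np.array([0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.12, 0.11, 0.10])
err_B = np.array([0.12, 0.14, 0.11, 0.13, 0.12, 0.15, 0.10, 0.13, 0.14, 0.12])

# Paired t-test on the per-round differences: if p < alpha, reject
# the hypothesis that the two learners perform the same.
t_stat, p_value = stats.ttest_rel(err_A, err_B)
print(t_stat, p_value)
```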
This part again involves quite a bit of statistics; if you lack the background, I recommend taking a look at the book Vernacular Statistics.