Chapter 2 Model Evaluation and Selection
- 2.4 Comparison test
- 2.4.1 Hypothesis testing
- Hypothesis testing purpose (0223)
- Generalization Problem and Probability Theory Course Recommendation (0224)
- Binomial distribution (0225)
- Implementing the binomial distribution in code (0226)
- Hypothesis test example e=0.3 (0227)
- The null hypothesis ε ≤ ε_0 = 0.3 (0228)
- Hypothesis testing of an algorithm on multiple test sets (0229)
2.4 Comparison test
2.4.1 Hypothesis testing
Hypothesis testing purpose (0223)
Ideally:

| | Training set | Validation set | Test set |
|---|---|---|---|
| True value y | | | |
| Predicted value y' = f(x) | | | |
| Error rate (computed from the two rows above) | | | |
But in the real world there is far more data than in the test set. To what extent can the error rate on the test set guarantee the true performance? Providing this level of guarantee is the purpose of hypothesis testing.
Generalization Problem and Probability Theory Course Recommendation (0224)
Two aspects of generalization
- A model is obtained on the training set; how does it perform on the test set?
- How does this model perform on all real data?
Problems
- The performance on the test set is not necessarily the same as the true generalization performance
- Different test sets yield different performance
- The machine learning algorithm itself has some randomness: run several times on the same test set, it may give different results
The instructor recommends the Probability Theory and Mathematical Statistics course [collection] by teacher Xiao Yuan
Binomial distribution (0225)
Given
- m = 10 samples on the test set; error rate = m'/m
- the error rate on the real data is 0.3
Find the probability of each possible number of errors on the test set
The following is the hypothesis test idea explained in the video
Implementing the binomial distribution in code (0226)
round(m_T_error / m, 4) # round m'/m to 4 decimal places
The formula highlighted in yellow in the video, P(X = m') = C(m, m') · ε^m' · (1 − ε)^(m − m'), is equivalent to the formula above; the fragments below compute it term by term
comb(m_T, m_T_error) # the binomial coefficient, e.g. C~10~^6^
e_all**m_T_error # 0.3^6
m_T_errors=[0,1,2,3,4,5,6,7,8,9,10] is the list of possible error counts (0, 1, …, 10 errors)
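The fragments above can be combined into one runnable script (a sketch; the variable names `m_T`, `e_all`, `m_T_errors` follow the notes, but the video's full code is not shown):

```python
from math import comb

m_T = 10       # number of samples in the test set
e_all = 0.3    # assumed error rate on the real data
m_T_errors = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # possible error counts

probs = []
for m_T_error in m_T_errors:
    # P(X = m') = C(m, m') * e^m' * (1 - e)^(m - m')
    p = comb(m_T, m_T_error) * e_all**m_T_error * (1 - e_all)**(m_T - m_T_error)
    probs.append(p)
    print(round(m_T_error / m_T, 4), round(p, 4))
```

The probabilities sum to 1, and the largest value occurs at 3 errors (p ≈ 0.2668), matching the conclusion below.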
Run result: (screenshot in the video)
Conclusion: assuming the error rate on the real data is 0.3, three errors is the most probable outcome when the experiment is run on the test set. The corresponding test-set error rate, 3/10 = 0.3, matches the assumed real error rate.
Hypothesis test example e=0.3 (0227)
Assume a 90% confidence level.
If you actually observe 5 errors on the test set, that outcome lies inside the 90% region (see the figure in the video), so the null hypothesis is accepted.
If you actually observe 10 errors on the test set, that outcome lies outside the 90% region, so the null hypothesis is rejected.
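The accept/reject decision just described can be sketched as follows (an assumption of mine: "not in 90%" means the upper-tail probability P(X ≥ observed) falls below α = 0.1; the function names are my own, not from the video):

```python
from math import comb

def binom_pmf(m, k, e):
    """Probability of exactly k errors among m samples when the true error rate is e."""
    return comb(m, k) * e**k * (1 - e)**(m - k)

def accept_null(observed_errors, m=10, e0=0.3, alpha=0.10):
    # Accept H0 while the upper-tail probability P(X >= observed) is at least alpha;
    # otherwise the observation falls outside the 90% region and H0 is rejected.
    tail = sum(binom_pmf(m, k, e0) for k in range(observed_errors, m + 1))
    return tail >= alpha

print(accept_null(5))   # 5 errors lie inside the 90% region -> accept H0
print(accept_null(10))  # 10 errors lie outside -> reject H0
```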
The null hypothesis ε ≤ ε_0 = 0.3 (0228)
At 5 errors the cumulative probability on the left already exceeds 90%, so the hypothesis is rejected when more than 5 errors are observed.
Another, more intuitive view:
in the program above, the cumulative probability at x = 5 is about 0.9527 > 90%, so observing more than 5 errors falls in the rejection region and the hypothesis is rejected.
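The cumulative probabilities in this argument can be checked directly (a sketch; `cdf` is my own helper, not code from the video):

```python
from math import comb

m, e0, alpha = 10, 0.3, 0.10

def cdf(x):
    """P(X <= x): probability of observing at most x errors out of m."""
    return sum(comb(m, k) * e0**k * (1 - e0)**(m - k) for k in range(x + 1))

print(round(cdf(4), 4))  # 0.8497: at 4 errors the left area is still below 90%
print(round(cdf(5), 4))  # 0.9527: at 5 errors the left area exceeds 90%

# Smallest error count whose left cumulative probability reaches 1 - alpha:
critical = next(x for x in range(m + 1) if cdf(x) >= 1 - alpha)
print(critical)  # 5 -> more than 5 errors falls in the rejection region
```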
Hypothesis testing of an algorithm on multiple test sets (0229)