4.3 Model Validation
Li Mu
Bilibili: https://space.bilibili.com/1567748478/channel/collectiondetail?sid=28144
Course homepage: https://c.d2l.ai/stanford-cs329p/
1. Approximate generalization error
For a machine learning model, what we care about most is its prediction error on all unseen data, that is, its generalization error. Measuring this exactly would require a very large number of samples, so in practice the generalization error is approximated. The methods are as follows:
- Use the error on the test dataset in place of the true generalization error. Note that the test set can be used only once. Because test data is expensive to obtain, a validation dataset is generally used in its place. This is like a midterm exam: you can take it only once, and you cannot retake it to replace your grade after seeing the result.
- Use a validation dataset (the common approach). A validation dataset can be used multiple times.
- Set aside part of the training dataset as the validation dataset.
- In everyday practice, "test accuracy" usually means accuracy measured on the validation dataset, not on a test set in the strict sense.
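The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the synthetic 1-D data, the k-nearest-neighbor classifier, the split sizes (300/100/200), and the helper names `make_data`, `knn_predict`, and `error_rate` are all assumptions chosen to keep the example self-contained. It shows the validation set being reused once per candidate hyperparameter, while the test set is evaluated a single time at the end.

```python
import random

def make_data(n, seed):
    """Synthetic 1-D binary data (hypothetical): label is 1 when x > 0.5, with label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:  # flip each label with probability 0.1
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Predict by majority vote among the k nearest training points."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in neighbors)
    return int(2 * votes > k)

def error_rate(train, eval_set, k):
    """Fraction of eval_set points misclassified by k-NN using train as its reference points."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in eval_set)
    return wrong / len(eval_set)

data = make_data(600, seed=0)  # samples are i.i.d., so slicing acts as a random split
test_set = data[:200]          # held out; evaluated exactly once at the end
train_full = data[200:]        # everything else is available for model development

# Set aside part of the training data as the validation set.
val_set = train_full[:100]
train_set = train_full[100:]

# The validation set can be reused freely: here, once per candidate k.
candidates = [1, 5, 15, 45]    # odd values avoid tied votes
val_errors = {k: error_rate(train_set, val_set, k) for k in candidates}
best_k = min(val_errors, key=val_errors.get)

# Single final evaluation on the test set, after all choices are fixed.
test_error = error_rate(train_set, test_set, best_k)
print("validation errors:", val_errors)
print("chosen k:", best_k, "test error:", test_error)
```

Because `best_k` was selected to minimize the validation error, that error is an optimistic estimate; the one-time test error is the more trustworthy approximation of the generalization error.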
2. Generate a validation set
2.1 Random split
put the data