4.3 Model Validation

Li Mu

Bilibili: https://space.bilibili.com/1567748478/channel/collectiondetail?sid=28144
Course homepage: https://c.d2l.ai/stanford-cs329p/

1. Approximate generalization error

For a machine learning model, what we care about most is its prediction error on all unseen data, that is, its generalization error. Measuring this exactly would require a very large number of samples, so in practice the generalization error is approximated. Common approaches are as follows:

  • Use the error on a test dataset (test set) in place of the true generalization error. Note that a test set can be used only once; because test data is expensive to obtain, a validation dataset (validation set) is generally used instead.
  • A test set is like a midterm exam: you can take it only once, and after you receive your grade you cannot retake it to replace the original result.
  • Use a validation dataset (the common approach). A validation dataset can be used multiple times.
  • A validation dataset is typically a portion held out from the training dataset.
  • In everyday usage, the reported "test accuracy" usually refers to accuracy measured on the validation dataset, not on a test set in the strict sense.
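
As a minimal sketch of the validation-set approach described above, the following Python snippet holds out part of a training set and measures the error on that held-out part as an approximation of the generalization error. The helper names (holdout_split, error_rate), the toy data, and the threshold "model" are all illustrative assumptions and do not come from the course material.

```python
import random


def holdout_split(samples, val_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and hold out a fraction for validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    # The first n_val shuffled samples form the validation set; the rest train.
    return shuffled[n_val:], shuffled[:n_val]


def error_rate(predict, dataset):
    """Fraction of (x, y) pairs for which predict(x) differs from y."""
    wrong = sum(1 for x, y in dataset if predict(x) != y)
    return wrong / len(dataset)


# Toy labelled data (assumed for illustration): label is 1 when x >= 50.
data = [(x, int(x >= 50)) for x in range(100)]
train_set, val_set = holdout_split(data, val_fraction=0.2, seed=0)

# Hypothetical "model": a threshold halfway between the two class means,
# estimated from the training part only.
pos = [x for x, y in train_set if y == 1]
neg = [x for x, y in train_set if y == 0]
threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(x):
    return int(x >= threshold)


# The validation error stands in for the (unknown) generalization error.
val_error = error_rate(predict, val_set)
print(f"validation error: {val_error:.3f}")
```

Because the validation samples are never used to fit the threshold, the measured error is an estimate of performance on unseen data. Reusing the same validation set many times to pick between models gradually makes that estimate optimistic, which is why a strict test set is meant to be used only once.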

2. Generate a validation set

2.1 Random split

put the data

Origin blog.csdn.net/ch_ccc/article/details/130000392