Machine Learning Foundations Lecture 13 Notes

Lecture 13: Hazard of Overfitting

13-1 What is overfitting

Bad generalization: low Ein, high Eout.

When the VC dimension is too large, the model tends to overfit.

When the VC dimension is too small, the model tends to underfit.

Causes of overfitting: a VC dimension that is too large, noise, and a data size N that is too small.

 

13-2 The role of noise and data size

Overfitting: Ein is small but Eout is very large.


Even when there is no noise, the lower-order fit g2 still does well. (In the original figure, the panel on the right is the target function.)


This is because the complexity of the target function acts like noise.
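The effect can be checked with a small simulation. The sketch below is only an illustration under assumed settings, not the lecture's exact experiment: it draws random polynomial targets (built from Legendre polynomials), samples N = 15 points, and compares a 2nd-order fit (g2) with a 10th-order fit (g10). The function names, the noise level 0.5, and the number of trials are all assumptions made for this example.

```python
import numpy as np
from numpy.polynomial import legendre


def random_target(order, rng):
    """Random polynomial target of the given order, normalized so E[f(x)^2] = 1."""
    coeffs = rng.standard_normal(order + 1)
    # For x uniform on [-1, 1], the Legendre polynomials L_q are orthogonal
    # and E[L_q(x)^2] = 1 / (2q + 1).
    energy = np.sum(coeffs**2 / (2 * np.arange(order + 1) + 1))
    return coeffs / np.sqrt(energy)


def mean_errors(target_order, sigma, n=15, trials=200, seed=0):
    """Return {degree: (mean Ein, mean Eout)} for degree-2 and degree-10 fits."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(-1.0, 1.0, 2001)
    sums = {2: np.zeros(2), 10: np.zeros(2)}
    for _ in range(trials):
        target = random_target(target_order, rng)
        x = rng.uniform(-1.0, 1.0, n)
        y = legendre.legval(x, target) + sigma * rng.standard_normal(n)
        f_test = legendre.legval(x_test, target)
        for degree in sums:
            g = legendre.legfit(x, y, degree)  # least-squares polynomial fit
            e_in = np.mean((legendre.legval(x, g) - y) ** 2)
            # Eout is measured against the noise-free target on a dense grid.
            e_out = np.mean((legendre.legval(x_test, g) - f_test) ** 2)
            sums[degree] += (e_in, e_out)
    return {d: tuple(s / trials) for d, s in sums.items()}


if __name__ == "__main__":
    scenarios = [("10th-order target, noisy", 10, 0.5),
                 ("50th-order target, no noise", 50, 0.0)]
    for name, order, sigma in scenarios:
        for degree, (e_in, e_out) in mean_errors(order, sigma).items():
            print(f"{name}: g{degree}  Ein={e_in:.3f}  Eout={e_out:.3f}")
```

In both scenarios g10 reaches the lower Ein, since its hypothesis set contains that of g2, yet on average its Eout should come out larger. In the second scenario there is no stochastic noise at all, so the gap is caused purely by the part of the target that neither hypothesis set can express.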

 

13-3 Deterministic Noise

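Deterministic noise is the part of the target function that the best hypothesis in the hypothesis set cannot capture. Overfitting occurs when the data size is too small, the stochastic noise is too large, the deterministic noise is too large, or the excessive power (model complexity) is too large.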
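Deterministic noise can be estimated numerically as the gap between the target and the best hypothesis in the set. The sketch below is a rough illustration under assumptions: the two targets are hypothetical, the hypothesis sets are polynomials of a given degree, and "best" is approximated by a least-squares fit on a dense, noise-free grid over [-1, 1].

```python
import numpy as np
from numpy.polynomial import Polynomial


def deterministic_noise_level(target, degree, grid_size=2001):
    """Mean squared gap between the target and its best polynomial of this degree."""
    x = np.linspace(-1.0, 1.0, grid_size)
    f = target(x)
    # Least squares on a dense, noise-free grid approximates the best hypothesis h*.
    h_star = Polynomial.fit(x, f, degree)
    return float(np.mean((f - h_star(x)) ** 2))


def wiggly(x):
    """Hypothetical target that no low-degree polynomial matches exactly."""
    return np.sin(np.pi * x)


def quadratic(x):
    """Hypothetical target that lies inside the degree-2 hypothesis set."""
    return 1.0 - 2.0 * x**2


if __name__ == "__main__":
    for degree in (2, 10):
        level = deterministic_noise_level(wiggly, degree)
        print(f"wiggly target, degree {degree}: {level:.6f}")
    print("quadratic target, degree 2:", deterministic_noise_level(quadratic, 2))
```

Unlike stochastic noise, this quantity depends on the hypothesis set: it shrinks as the set grows and vanishes when the target lies inside the set. For a given x it is also a fixed value rather than a random draw.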

 


13-4 How to solve overfitting

Start with a simpler model; data cleaning/pruning; data hinting; regularization; validation.

Data cleaning: correct the label.

Data pruning: remove the example.

Data hinting: add virtual examples by shifting or rotating the given examples (e.g., in the digit recognition problem).
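The three data-level remedies can be sketched in a few lines. This is a minimal illustration under assumptions: images are 2-D NumPy arrays, the indices of suspicious examples are already known (how to detect them is not covered here), and all function names are hypothetical.

```python
import numpy as np


def clean(labels, suspects, corrected_labels):
    """Data cleaning: overwrite the labels of the suspect examples."""
    cleaned = labels.copy()
    cleaned[suspects] = corrected_labels
    return cleaned


def prune(images, labels, suspects):
    """Data pruning: drop the suspect examples entirely."""
    keep = np.ones(len(labels), dtype=bool)
    keep[suspects] = False
    return images[keep], labels[keep]


def shift_image(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, filling vacated pixels with 0."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted


def add_shifted_examples(images, labels):
    """Data hinting: append one-pixel shifts of every image, keeping its label."""
    all_images, all_labels = [images], [labels]
    for dy, dx in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        all_images.append(np.stack([shift_image(im, dy, dx) for im in images]))
        all_labels.append(labels)
    return np.concatenate(all_images), np.concatenate(all_labels)


if __name__ == "__main__":
    digit = np.zeros((5, 5), dtype=int)
    digit[1:4, 2] = 1                      # a crude vertical stroke
    images, labels = np.stack([digit, digit]), np.array([1, 7])
    labels = clean(labels, [1], [1])       # assume example 1 was mislabeled
    images, labels = add_shifted_examples(images, labels)
    print(images.shape, labels)
```

Virtual examples created this way are not drawn from the original data distribution, so hinting is best applied with care, for instance with small shifts that clearly preserve the digit's identity.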

 

 

 

