Why is deep learning not recommended for small samples?

In machine learning, the more complex and expressive a model is, the more easily it sacrifices the ability to explain future data in favor of explaining the training data. This leads to very good performance on the training data but greatly degraded performance on the test data, a phenomenon called overfitting.

Because of its structure, a deep neural network has much stronger expressive power than traditional models, so it needs more data to avoid overfitting and to ensure that the trained model also achieves acceptable performance on new data.

Author: Zhihu User
Link: https://www.zhihu.com/question/29633459/answer/45049798
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.
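As a concrete illustration (my own sketch, not part of the quoted answer), the Python snippet below fits a simple linear model and a very flexible degree-15 polynomial to only 10 noisy training points; the flexible model drives the training error close to zero but usually does much worse on held-out data, which is exactly the overfitting described above. It assumes numpy and scikit-learn are installed.

```python
# Sketch: overfitting a small sample with an overly expressive model.
# (Illustrative only; not from the quoted Zhihu answer.)
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)

def make_data(n):
    # Noisy samples of a smooth target function on [0, 1].
    x = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(10)    # small training set
x_test,  y_test  = make_data(500)   # large held-out set

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train))
    test_err  = mean_squared_error(y_test,  model.predict(x_test))
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The degree-15 model has far more capacity than 10 samples can constrain, so it memorizes the noise in the training points; the gap between its training and test error is the overfitting the answer refers to.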


For a classification model, the following conclusion holds:

$$E_{\text{test}} \le E_{\text{train}} + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}$$

In the above formula, N is the number of training samples, η is between 0 and 1, and h is the VC dimension of the classification model. For details, see wiki: VC dimension.

Of particular interest is the second term:

$$\sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}$$

This term is also called the model complexity penalty. The bound says that, with probability 1 − η, the test error is less than the training error plus the model complexity penalty. So if the training algorithm can make the training error small and the model complexity penalty is also small, then with probability 1 − η the test error is small as well. To make the model generalize well, therefore, both the training error and the model complexity penalty must be kept small. Looking at the penalty term, the larger h is, the larger the penalty; the larger N is, the smaller the penalty. Roughly speaking, a more complex model has a larger h (VC dimension), so for it to generalize well it needs a larger N to bring the model complexity penalty down. This is why deep learning models require a lot of data to train; otherwise the model generalizes poorly, that is, it overfits.

Author: Zhihu User
Link: https://www.zhihu.com/question/29633459/answer/45138977
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.
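As a quick numerical illustration (my own sketch, not part of the quoted answer), the snippet below evaluates the complexity penalty term above for a few combinations of VC dimension h and sample size N, using η = 0.05. The formula is the standard VC generalization bound stated earlier; the penalty grows with h and shrinks as N grows.

```python
# Sketch: how the VC complexity penalty scales with h and N.
# (Illustrative only; uses the standard VC bound penalty term.)
import math

def vc_penalty(N, h, eta=0.05):
    # sqrt((h * (ln(2N/h) + 1) - ln(eta/4)) / N), valid when N > h.
    return math.sqrt((h * (math.log(2 * N / h) + 1) - math.log(eta / 4)) / N)

for h in (10, 1_000, 100_000):               # simple vs. very expressive model
    for N in (1_000, 100_000, 10_000_000):   # small vs. large training set
        if N > h:
            print(f"h={h:>7,}  N={N:>11,}  penalty={vc_penalty(N, h):.3f}")
```

Running this shows that a model with a large VC dimension needs orders of magnitude more samples before the penalty, and hence the gap between training and test error allowed by the bound, becomes small.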
