[Deep Learning Notes] Bias and Variance

This column contains study notes for the NetEase Cloud Classroom artificial intelligence course "Neural Networks and Deep Learning", produced jointly by NetEase Cloud Classroom and deeplearning.ai and taught by Professor Andrew Ng. Interested readers can watch the videos on NetEase Cloud Classroom for further study. The video link is as follows:

Neural Networks and Deep Learning - NetEase Cloud Classroom

Readers interested in neural networks and deep learning are also welcome to get in touch and exchange ideas~

Table of contents

1 The concept of bias and variance

2 Basic methods of machine learning


1 The concept of bias and variance

For a dataset with two input features, fitting with logistic regression yields a straight decision boundary. When a straight line cannot separate the data well, this is a case of high bias: the model is underfitting.

Conversely, if we fit a very complex classifier, such as a deep neural network with many hidden layers, we can obtain a complex classification curve. But this does not look like a good fit either: this is high variance, and the model is overfitting.

In between, a classifier of moderate complexity that fits the data appropriately produces a smooth curve, which looks more reasonable.
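
These three regimes are easy to reproduce on a toy problem. Below is a minimal sketch (assuming scikit-learn is installed; the `make_moons` data and the model sizes are illustrative stand-ins, not the course's examples) comparing a linear model against small and large neural networks:

```python
# Sketch: the same two-feature dataset fit by models of increasing complexity.
# make_moons and the layer sizes are illustrative choices, not from the course.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

models = {
    "logistic regression (straight line, may underfit)": LogisticRegression(),
    "small MLP (moderate complexity)":
        MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
    "large MLP (may overfit)":
        MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=2000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: train acc {model.score(X_tr, y_tr):.2f}, "
          f"val acc {model.score(X_val, y_val):.2f}")
```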

The two key numbers for understanding bias and variance are the training set error and the validation set error.

For a cat picture classifier, suppose the training set error is 1% and the validation set error is 11%. The model fits the training data very well but generalizes poorly to the validation data; this case is "high variance".

Now suppose the training set error is 15% and the validation set error is 16%. If the human error rate on this dataset is close to 0%, the model has not been trained well even on the training set; since the training and validation errors are close, this situation is "high bias".

If the model is unsatisfactory on both sets, for example a training set error of 15% and an even worse validation set error of 30%, the situation is "high bias and high variance".

Finally, a training set error of 0.5% with a validation set error of 1% is the result one would be happy to see: "low bias and low variance".

The above analysis rests on one assumption: that the human error rate for distinguishing these pictures is close to 0%.
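
To make the rule of thumb concrete, here is a small helper (hypothetical, not from the lecture; the 5% thresholds are illustrative choices) that labels the four cases above:

```python
def diagnose(train_err, val_err, human_err=0.0):
    """Label a model's fit from its error rates (fractions, not percent).
    The 0.05 thresholds are illustrative, not from the lecture."""
    high_bias = (train_err - human_err) > 0.05
    high_variance = (val_err - train_err) > 0.05
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "low bias and low variance"

# The four cases from the text above:
print(diagnose(0.01, 0.11))    # high variance (overfitting)
print(diagnose(0.15, 0.16))    # high bias (underfitting)
print(diagnose(0.15, 0.30))    # high bias and high variance
print(diagnose(0.005, 0.01))   # low bias and low variance
```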

2 Basic methods of machine learning

During neural network training, the next step can be chosen according to the levels of bias and variance.

After the initial model training is complete, first check whether the model's bias is high, that is, whether it fails even to fit the training set. If the bias is indeed high, the next step is to choose a new network, such as one with more hidden layers, or to spend more time training the network, or to try a new optimization algorithm.

These attempts may or may not work; it takes repeated trial and error until the model can fit the training data.

If the bias is within an acceptable range, the next step is to check whether the model's variance is high.

If the model's error on the validation set is high, the best solution is to train on more data. If more data cannot be obtained, regularization can also reduce overfitting. Trying neural networks with different architectures may also help, and can sometimes kill two birds with one stone, reducing both bias and variance.
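
As one concrete knob, scikit-learn's `MLPClassifier` exposes an L2 penalty through its `alpha` parameter. A sketch of dialing it up to shrink the train/validation gap (the data and values are illustrative, not from the lecture):

```python
# Sketch: stronger L2 regularization to shrink the train/validation gap.
# In scikit-learn's MLPClassifier the L2 penalty strength is `alpha`;
# the data and values here are illustrative.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=1)

for alpha in (1e-4, 1e-2, 1.0):
    model = MLPClassifier(hidden_layer_sizes=(256, 256), alpha=alpha,
                          max_iter=2000, random_state=1).fit(X_tr, y_tr)
    print(f"alpha={alpha}: train err {1 - model.score(X_tr, y_tr):.2f}, "
          f"val err {1 - model.score(X_val, y_val):.2f}")
```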

In short, training a neural network requires continual experimentation until a structure with low bias and low variance is found. This is why neural networks are usually large and require a lot of training data.
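
The whole recipe can be sketched as a loop on the same toy data (the error thresholds and network widths are illustrative assumptions, not values from the course):

```python
# Toy version of the recipe: grow the network until the training error
# (bias) is acceptable, then check the train/validation gap (variance).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for width in (2, 8, 32, 128):          # try a bigger network each round
    model = MLPClassifier(hidden_layer_sizes=(width,), max_iter=2000,
                          random_state=0).fit(X_tr, y_tr)
    train_err = 1 - model.score(X_tr, y_tr)
    if train_err > 0.10:               # bias still high: fit the training set first
        continue
    val_err = 1 - model.score(X_val, y_val)
    print(f"width={width}: train err {train_err:.2f}, val err {val_err:.2f}")
    if val_err - train_err < 0.05:     # variance acceptable: done
        break
    # otherwise: more data, regularization, or a different architecture
```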
