Bias and Variance

Link to the original blog post

When we discuss prediction models, prediction errors can be decomposed into two main subcomponents we care about: error due to "bias" and error due to "variance". There is a tradeoff between a model's ability to minimize bias and variance. Understanding these two types of error can help us diagnose model results and avoid the mistake of over- or under-fitting.

1. Bias and Variance

Understanding how different sources of error lead to bias and variance helps us improve the data fitting process, resulting in more accurate models. We define bias and variance in three ways: conceptually, graphically and mathematically.

1.1 Conceptual Definition

Error due to Bias: The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict. Of course you only have one model so talking about expected or average prediction values might seem a little strange. However, imagine you could repeat the whole model building process more than once: each time you gather new data and run a new analysis creating a new model. Due to randomness in the underlying data sets, the resulting models will have a range of predictions. Bias measures how far off in general these models' predictions are from the correct value.

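Bias describes the gap between the expected prediction of models fitted on samples and the true values. Put simply, it reflects how well the model fits the sample.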

To obtain low bias, you need a more complex model with more parameters, but this can lead to overfitting.
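The thought experiment of repeating the whole model building process can be sketched as a small simulation. This is only an illustration under assumptions that are not in the original post: the true function `true_f(x) = x * x`, Gaussian noise, a fixed grid of inputs, and a straight-line model fitted by least squares are all hypothetical choices, as are the names `fit_line` and `estimate_bias`.

```python
import random
import statistics


def true_f(x):
    # Assumed "correct value" for illustration; not from the original post.
    return x * x


def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept (closed form).
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def estimate_bias(x0, n_repeats=2000, n_points=20, noise_sd=0.3, seed=0):
    # Repeat the model building process: new noisy data, new fitted model.
    rng = random.Random(seed)
    xs = [i / (n_points - 1) for i in range(n_points)]
    preds = []
    for _ in range(n_repeats):
        ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
        slope, intercept = fit_line(xs, ys)
        preds.append(slope * x0 + intercept)
    # Bias: average prediction across models minus the correct value.
    return statistics.fmean(preds) - true_f(x0)


bias_at_1 = estimate_bias(1.0)
print(f"estimated bias at x0=1.0: {bias_at_1:.3f}")
```

Because a straight line cannot follow the curve, the average prediction at `x0 = 1.0` stays below the true value no matter how many data sets are drawn. That systematic gap is the bias.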

Error due to Variance: The error due to variance is taken as the variability of a model prediction for a given data point. Again, imagine you can repeat the entire model building process multiple times. The variance is how much the predictions for a given point vary between different realizations of the model.

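Variance reflects how sensitive the trained model is to the particular sample it was fitted on, which shows up in how it performs on the test set. To obtain low variance, you need a simpler model with fewer parameters, but this may lead to underfitting.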
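The variability of predictions at a single point can be estimated the same way, by refitting on many simulated data sets. The sketch below rests on assumptions made only for illustration: the true function, the noise level, and the two toy models (a constant "mean" predictor and a 1-nearest-neighbour predictor) are hypothetical stand-ins for a simple and a flexible model, and none of the names come from the original post.

```python
import random
import statistics


def true_f(x):
    # Assumed ground truth for illustration; not from the original post.
    return x * x


def predict_mean(xs, ys, x0):
    # Very simple model: predict the average target, ignoring x0.
    return statistics.fmean(ys)


def predict_nearest(xs, ys, x0):
    # Very flexible model: copy the target of the closest training point.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]


def prediction_variance(predict, x0, n_repeats=2000, n_points=20,
                        noise_sd=0.3, seed=0):
    # Variance of the prediction at x0 across different realizations
    # of the model, each built from a freshly drawn noisy data set.
    rng = random.Random(seed)
    xs = [i / (n_points - 1) for i in range(n_points)]
    preds = []
    for _ in range(n_repeats):
        ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
        preds.append(predict(xs, ys, x0))
    return statistics.variance(preds)


var_simple = prediction_variance(predict_mean, 0.5)
var_flexible = prediction_variance(predict_nearest, 0.5)
print(f"variance, mean model:    {var_simple:.4f}")
print(f"variance, nearest model: {var_flexible:.4f}")
```

In this setup the mean model averages the noise over all 20 points, so its variance is about `noise_sd**2 / n_points`. The nearest-neighbour model inherits the full noise of a single point, about `noise_sd**2`. The simpler model is the more stable one, at the price of ignoring the shape of the data.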


Reposted from blog.csdn.net/u012706792/article/details/80991150