Understanding the relationship between bias, variance, and generalization error

https://blog.csdn.net/ChenVast/article/details/81385018

 

| Symbol | Meaning |
| --- | --- |
| x | test sample |
| D | data set |
| y_D | label of x in the data set |
| y | true label of x |
| f | model learned from training set D |
| f(x; D) | prediction output on x of the model f learned from training set D |
| \bar{f}(x) | expected prediction output of model f on x |

 

Variance

Given a training set D, the model's prediction output on a test sample x is f(x; D). The learning algorithm's expected prediction for x is:

\bar{f}(x) = \mathbb{E}_D[ f(x; D) ]

The expectation is taken over different data sets D: \bar{f}(x) is the average prediction for x.

The variance produced by using different training sets of the same size is:

var(x) = \mathbb{E}_D[ ( f(x; D) - \bar{f}(x) )^2 ]

 

Bias

The difference between the expected prediction and the true label is called the bias; for convenience, we take its square directly:

bias^2(x) = ( \bar{f}(x) - y )^2

 

Generalization error

Taking the regression task as an example, the expected squared prediction error of the learning algorithm is:

E(f; D) = \mathbb{E}_D[ ( f(x; D) - y_D )^2 ]

This expected generalization error can be decomposed into bias, variance, and noise terms (bias-variance decomposition; the proof is sketched below).

Assume the noise has zero mean, i.e. \mathbb{E}_D[ y_D - y ] = 0; this makes the cross terms (shown in red in the original proof figure) vanish.

What remains is E(f; D) = bias^2(x) + var(x) + \varepsilon^2, where \varepsilon^2 = \mathbb{E}_D[ ( y_D - y )^2 ]. In words: generalization error = bias² + variance + noise.
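The proof appears as an image in the original post; the following is a sketch of the standard textbook derivation using the definitions above (my reconstruction, not the original figure):

```latex
\begin{aligned}
E(f;D) &= \mathbb{E}_D\big[(f(x;D) - y_D)^2\big] \\
       &= \mathbb{E}_D\big[(f(x;D) - \bar{f}(x) + \bar{f}(x) - y_D)^2\big] \\
       &= \underbrace{\mathbb{E}_D\big[(f(x;D) - \bar{f}(x))^2\big]}_{\mathrm{var}(x)}
        + \mathbb{E}_D\big[(\bar{f}(x) - y_D)^2\big]
        \quad\text{(cross term vanishes: } \mathbb{E}_D[f(x;D) - \bar{f}(x)] = 0
        \text{, noise independent of } f\text{)} \\
       &= \mathrm{var}(x) + \mathbb{E}_D\big[(\bar{f}(x) - y + y - y_D)^2\big] \\
       &= \mathrm{var}(x)
        + \underbrace{(\bar{f}(x) - y)^2}_{\mathrm{bias}^2(x)}
        + \underbrace{\mathbb{E}_D\big[(y_D - y)^2\big]}_{\varepsilon^2}
        \quad\text{(cross term vanishes: } \mathbb{E}_D[y_D - y] = 0\text{)} \\
       &= \mathrm{bias}^2(x) + \mathrm{var}(x) + \varepsilon^2
\end{aligned}
```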

 

Bias, variance, and noise

  1. Bias: measures how far the model's expected prediction deviates from the true result; it characterizes the fitting ability of the model itself.
  2. Variance: measures how much the learned model's performance changes when the training set (of a fixed size) changes; it characterizes the effect of perturbations in the data.
  3. Noise: expresses the lower bound on the expected generalization error that any model can achieve on the current task; it characterizes the difficulty of the learning problem itself.
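All three quantities can be estimated numerically. The following is a minimal sketch I am adding for illustration (not code from the original post; the target function, noise level, and sample sizes are arbitrary assumptions): it repeatedly draws training sets D of the same size, fits a simple model, and estimates bias², variance, and noise at one test point, checking that their sum matches the expected squared error:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):                       # assumed underlying target function y = sin(x)
    return np.sin(x)

NOISE_STD = 0.3                       # label noise: y_D = y + N(0, 0.3^2)
N_TRAIN, N_SETS = 30, 5000            # training-set size; number of training sets D
x_test = 1.0                          # a fixed test sample x

preds, sq_errors = [], []
for _ in range(N_SETS):
    # Draw a fresh training set D of the same size each round.
    x = rng.uniform(-3, 3, N_TRAIN)
    y_D = true_fn(x) + rng.normal(0, NOISE_STD, N_TRAIN)
    # Fit a deliberately simple model (a line), so the bias is visible.
    coef = np.polyfit(x, y_D, deg=1)
    f_xD = np.polyval(coef, x_test)                        # f(x; D)
    preds.append(f_xD)
    y_test = true_fn(x_test) + rng.normal(0, NOISE_STD)    # noisy label y_D at x
    sq_errors.append((f_xD - y_test) ** 2)

preds = np.array(preds)
f_bar = preds.mean()                              # \bar{f}(x)
variance = preds.var()                            # E_D[(f(x;D) - \bar{f}(x))^2]
bias_sq = (f_bar - true_fn(x_test)) ** 2          # (\bar{f}(x) - y)^2
noise = NOISE_STD ** 2                            # E[(y_D - y)^2]

print(f"bias^2 + variance + noise = {bias_sq + variance + noise:.4f}")
print(f"empirical E(f;D)          = {np.mean(sq_errors):.4f}")  # should be close
```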


Illustration of bias and variance

In the classic dartboard figure the original post shows here, the bullseye is the target value and each dot is the prediction of a model trained on a different data set:

|  | Low variance | High variance |
| --- | --- | --- |
| Low bias | Points are concentrated and fall on the target (accurate predictions) | Points are scattered; only some fall on the target (accuracy is not high) |
| High bias | Points are concentrated but sit at a distance from the target (inaccurate) | Points are scattered and largely miss the target (inaccurate) |

 

Variance, bias, and fitting

| Fit | Variance | Bias | Cause | Remedies |
| --- | --- | --- | --- | --- |
| Underfitting |  | Too high | Insufficient training; bias dominates the generalization error | Ensemble learning; train longer / more iterations; add features; reduce regularization |
| Overfitting | Too high |  | Excessive training; variance dominates the generalization error | Reduce model complexity; add a regularization penalty; enlarge the training set; select fewer features; strengthen regularization |
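As a quick numerical illustration of the table (a sketch I am adding, not from the original post; the data, degrees, and penalty strength are arbitrary assumptions), the snippet below fits a degree-1 polynomial (which underfits), a high-degree polynomial (which tends to overfit the small noisy training set), and the same high-degree polynomial with an L2 (ridge) penalty, one of the remedies listed above:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.3, n)   # assumed target + noise

def design(x, degree):
    # Polynomial feature matrix [1, x, x^2, ..., x^degree]
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(x, y, degree, lam=0.0):
    # Closed-form ridge regression: w = (X^T X + lam*I)^{-1} X^T y
    X = design(x, degree)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(w, x, y, degree):
    return np.mean((design(x, degree) @ w - y) ** 2)

x_tr, y_tr = make_data(20)      # small training set, easy to overfit
x_te, y_te = make_data(1000)    # large held-out test set

for degree, lam, label in [(1, 0.0, "degree 1 (underfit, high bias)"),
                           (9, 0.0, "degree 9 (overfit, high variance)"),
                           (9, 0.1, "degree 9 + ridge penalty")]:
    w = fit_ridge(x_tr, y_tr, degree, lam)
    print(f"{label:35s} train MSE={mse(w, x_tr, y_tr, degree):.3f}  "
          f"test MSE={mse(w, x_te, y_te, degree):.3f}")
```

The regularized fit typically keeps the training error slightly higher but brings the test error down, which is exactly the variance reduction the table describes.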

 

Bias-variance tradeoff

As model capacity grows, bias tends to fall while variance rises, so the generalization error typically follows a U-shape with a minimum at an intermediate complexity; the original post shows the standard tradeoff curve here.
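To see the tradeoff numerically (again a sketch of my own with assumed data, not code from the original post), sweep the polynomial degree and compare training and test error:

```python
import numpy as np

rng = np.random.default_rng(2)
x_tr = rng.uniform(-1, 1, 30)
y_tr = np.sin(3 * x_tr) + rng.normal(0, 0.3, 30)     # assumed target + noise
x_te = rng.uniform(-1, 1, 1000)
y_te = np.sin(3 * x_te) + rng.normal(0, 0.3, 1000)

for degree in range(0, 13, 2):
    coef = np.polyfit(x_tr, y_tr, degree)            # least-squares polynomial fit
    tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train MSE={tr:.3f}  test MSE={te:.3f}")

# Training error shrinks monotonically with degree, while test error
# typically bottoms out at a moderate degree and rises again as
# variance takes over: the bias-variance tradeoff.
```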

 

Reference:

http://www.cnblogs.com/makefile/p/bias-var.html#fn2

