Variance, standard deviation, mean square error, mean square error

A measure of the degree of dispersion when the variance is a measure of a random variable or a set of data in probability theory and statistics. Variance is a measure of probability theory and its mathematical expectation of a random variable (ie mean) between the degree of deviation. 统计中的方差(样本方差)是每个样本值与全体样本值的平均数之差的平方值的平均数. Variance can be used to describe the degree of fluctuation variables.

Variance in statistics and probability distributions have different definitions, and there are different formulas. In statistics, the variance used to calculate the difference between each variable (observed value) and the overall mean. To avoid deviation from the mean zero sum, sum of squares of deviations from the mean affected sample size, using the average statistical degree of variation from the mean square difference and the variables described. The overall variance calculation formula:

Alt text

σ represents the square of the population variance, X is the variable, μ represents the average of the population, N denotes the total number of samples. In a real project, when the population mean is difficult to obtain, use an alternative sample statistics overall parameters corrected, the sample variance is calculated:

Alt text

σ represents the square of the sample variance, X is the variable, {X_i ... X_n} represents the sample mean, N denotes the number of samples. The reason is divided by N-1 instead of N, because this will enable us to better small sample set of approximate overall standard deviation, ie statistics on so-called "unbiased estimates." 由于方差是数据的平方,与检测值本身相差太大,难以直观的衡量,所以常用方差开根号换算回来,就成了标准差(Standard Deviation)用 σ 表示Using the following formula:

Alt text

For example there is python code:

      
      
1
2
      
      
import numpy as np
data1 = [10, 30, 40, 50, 10]
data2 = [5, 20, 25, 80, 10]
print(np.mean(data1), np.var(data1), np.std(data1))
print(np.mean(data2), np.var(data2), np.std(data2))

Output:

      
      
1
2
      
      
28.0 256.0 16.0
28.0 726.0 26.94438717061496

Two sets of data can be seen that the mean is 28.0 and the standard deviation but the variance is not the same, the variance or standard deviation value of the reaction the greater the greater the fluctuations in the data, otherwise more stable.

标准差在中文坏境中也被称为均方差,但不同于均方误差(mean squared error),均方误差是样本数据值偏离真实样本数据值的平方和的平均数,也即误差平方和的平均数The calculation formula in form close to the variance, its square root is called the root mean square error, standard deviation and root mean square error before formally approaching. For example, denotes a sample value X, x represents a true value, then the mean square error using the following formula:

MSE

Then the average root error can be represented by the following formula:

RMSE

In machine learning can be used as a mean square error model loss function, regression and used to predict the mean square error is smaller, the more accurate the model prediction, the less accurate and vice versa. In general, the relationship between the mean square error is the mean data sample, and the mean square error is a relationship between data samples and the real value, in practice to choose to use the variance or the mean square error are necessary.

Guess you like

Origin www.cnblogs.com/lijianming180/p/12032678.html