First, compare the formulas for the mean, the sample variance, and the sample covariance:
$\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i$
$S^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i-\bar{x})^2$
$\mathrm{cov}(x,y) = \frac{1}{N-1}\sum_{i=1}^N (x_i-\bar{x})(y_i-\bar{y})$
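As a minimal sketch, the three formulas can be written directly in Python using only built-ins. The function names and the data are made up for illustration; the deviations in the variance are squared, and both the variance and the covariance divide by N - 1.

```python
def mean(xs):
    """Arithmetic mean: (1/N) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance: squared deviations from the mean, divided by N - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_covariance(xs, ys):
    """Sample covariance: products of paired deviations, divided by N - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Illustrative data (invented for this example).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]

print(mean(x))                  # 5.0
print(sample_variance(x))       # 20/3
print(sample_covariance(x, y))  # 14/3
```

Note that the covariance of a variable with itself reduces to its variance, which is why the two formulas share the same divisor.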
Why do the sample variance and the sample covariance divide by N-1 rather than N? See http://blog.csdn.net/maoersong/article/details/21819957 for a full explanation. In short, the deviations are measured from the sample mean rather than from the true mean, which makes the sum of squares slightly too small on average; dividing by N-1 corrects for this and gives an unbiased estimate. Dividing by N is appropriate only when the data are the entire space (the whole population) rather than a sample of randomly selected values drawn from it.
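The effect of the divisor can be checked exhaustively on a tiny example. The sketch below assumes a made-up population of four values and enumerates every sample of size 2 drawn with replacement; averaging the N-1 version over all samples recovers the population variance, while the N version falls short by the factor (N-1)/N.

```python
from itertools import product


def variance(xs, divisor_offset):
    """Sum of squared deviations from the mean, divided by len(xs) - divisor_offset."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - divisor_offset)


population = [1.0, 2.0, 3.0, 4.0]  # invented "entire space"
n = 2                              # sample size

# Variance of the whole population: divide by its size, no correction.
population_variance = variance(population, 0)

# Every possible sample of size n drawn with replacement (16 in total).
samples = list(product(population, repeat=n))

avg_unbiased = sum(variance(s, 1) for s in samples) / len(samples)
avg_biased = sum(variance(s, 0) for s in samples) / len(samples)

print(population_variance)  # 1.25
print(avg_unbiased)         # 1.25
print(avg_biased)           # 0.625
```

The enumeration is only feasible because the population is tiny; it is meant as an illustration of the bias, not as a proof.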
Covariance cov(x)
- x is a sample vector
When x is a vector, cov(x) returns its variance normalized by n-1, which is the unbiased estimate of the population variance. cov(x,1) normalizes by n instead, giving $s^2$, the variance of the sample about its own mean. This is the maximum likelihood estimate of the variance for normally distributed data, and on average it underestimates the population variance.
cov(x) = $\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$
cov(x,1) = $s^2$ = $\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}$
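The two normalizations can be sketched in Python as follows. The function `cov_vector` and its `normalize_by_n` flag are hypothetical names chosen to mirror cov(x) and cov(x,1); the standard library's `statistics.variance` and `statistics.pvariance` are used only as a cross-check.

```python
import statistics


def cov_vector(x, normalize_by_n=False):
    """Variance of a vector: divide by n - 1 by default, or by n if requested."""
    n = len(x)
    m = sum(x) / n
    sum_sq = sum((xi - m) ** 2 for xi in x)
    return sum_sq / n if normalize_by_n else sum_sq / (n - 1)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented data

unbiased = cov_vector(x)                   # analogue of cov(x)
by_n = cov_vector(x, normalize_by_n=True)  # analogue of cov(x,1)

print(unbiased)  # 2.5
print(by_n)      # 2.0

# Cross-check against the standard library.
print(statistics.variance(x), statistics.pvariance(x))
```

The two results always differ by the factor (n-1)/n, so the distinction matters most for small samples.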
- x is a sample matrix
If x = $(x_1,x_2,...,x_n)^T$ holds n variables (dimensions), each observed over the same set of samples, cov(x) returns an n×n matrix.
The diagonal elements are the variances of the individual dimensions, and the off-diagonal elements are the covariances between pairs of dimensions. The matrix is symmetric, so $c_{12}=c_{21}$.
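A covariance matrix of this kind can be sketched in plain Python. The function `cov_matrix` is a hypothetical helper, and it assumes each inner list holds all observations of one variable; the data are invented.

```python
def cov_matrix(variables):
    """Covariance matrix of n variables, each given as a list of m observations.

    Returns an n x n list of lists, normalized by m - 1.
    """
    m = len(variables[0])
    if any(len(v) != m for v in variables):
        raise ValueError("all variables need the same number of observations")
    means = [sum(v) / m for v in variables]
    centered = [[value - mu for value in v] for v, mu in zip(variables, means)]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (m - 1) for cj in centered]
        for ci in centered
    ]


data = [
    [1.0, 2.0, 3.0, 4.0],  # x_1
    [2.0, 4.0, 6.0, 8.0],  # x_2, exactly twice x_1
    [4.0, 3.0, 2.0, 1.0],  # x_3, decreasing as x_1 increases
]

c = cov_matrix(data)
for row in c:
    print(row)
```

Here c[0][0] is the variance of x_1, c[0][1] equals c[1][0], and c[0][2] is negative because x_3 falls as x_1 rises.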
Covariance cov(x,y)
x = $[a_1,a_2,...,a_m]$
y = $[b_1,b_2,...,b_m]$
z = $\begin{pmatrix} a_1 & a_2 & ... & a_m \\ b_1 & b_2 & ... & b_m \end{pmatrix}$
cov(x,y) = cov(z)
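In other words, cov(x,y) stacks the two variables x and y into a single matrix z and computes cov(z). The result is a 2×2 matrix with the variances of x and y on the diagonal and their covariance off the diagonal.
Therefore, before calculating a covariance matrix, you must establish whether each row or each column of the matrix holds one variable. In z as written here, each row is one variable and each column is one sample; an implementation that expects variables in columns, as MATLAB's cov does, needs z transposed.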
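The sketch below illustrates why the orientation matters. It assumes a hypothetical helper `cov_rows` that treats each row as one variable; feeding it the transposed matrix silently produces an m×m result in which every sample is treated as a variable.

```python
def cov_rows(matrix):
    """Covariance matrix treating each ROW as one variable (divisor: columns - 1)."""
    m = len(matrix[0])
    centered = [[v - sum(row) / m for v in row] for row in matrix]
    return [
        [sum(a * b for a, b in zip(ri, rj)) / (m - 1) for rj in centered]
        for ri in centered
    ]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]


x = [1.0, 2.0, 3.0, 4.0]  # invented data
y = [2.0, 1.0, 4.0, 5.0]
z = [x, y]                # 2 x m: one variable per row

c = cov_rows(z)                 # intended result
wrong = cov_rows(transpose(z))  # rows and columns confused

print(len(c), len(c[0]))          # 2 2
print(len(wrong), len(wrong[0]))  # 4 4
print(c[0][1])                    # 2.0, the covariance of x and y
```

No error is raised in the second call, which is exactly why the row-versus-column convention has to be settled beforehand.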
Coefficient of variation cv
The coefficient of variation is used to compare the degree of dispersion of two sets of data. If the two sets are measured on very different scales, or in different units, comparing their standard deviations directly is not appropriate; the influence of the measurement scale and units should be eliminated first.
cv = (standard deviation s / mean $\bar{x}$) × 100%
As a rule of thumb in data analysis, if the coefficient of variation is greater than 15%, the data are considered possibly abnormal and should be eliminated.
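A short sketch of the calculation, using Python's `statistics` module. It assumes the sample standard deviation (divisor n-1), which the text does not specify, and the two datasets are invented; they share the same standard deviation but differ in relative dispersion.

```python
import statistics


def coefficient_of_variation(xs):
    """Standard deviation divided by the mean, expressed as a percentage.

    Assumption: uses the sample standard deviation (divisor n - 1).
    """
    m = statistics.mean(xs)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(xs) / m * 100.0


heights_cm = [160.0, 170.0, 180.0]  # invented data
weights_kg = [50.0, 60.0, 70.0]     # invented data

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(round(cv_heights, 2))  # 5.88
print(round(cv_weights, 2))  # 16.67
```

Both sets have a standard deviation of 10 in their own units, yet only the second exceeds the 15% threshold mentioned above, because its mean is much smaller.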
————————————————
Copyright notice: part of this content refers to the article "The Moon is Blue". Original link: https://blog.csdn.net/lyl771857509/article/details/79439184