Covariance cov(x), cov(x,y); coefficient of variation cv

First, compare the formulas for the mean, the sample variance and the sample covariance:

Mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$

Sample variance: $S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i-\bar{x})^2$

Sample covariance: $\operatorname{cov}(x,y) = \frac{1}{N-1}\sum_{i=1}^{N} (x_i-\bar{x})(y_i-\bar{y})$
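As a minimal sketch, the three formulas translate directly into plain Python. Python stands in here for any numeric environment, and `mean`, `sample_cov` and `sample_var` are hypothetical helper names, not a library API. The sample variance is written as the covariance of a variable with itself, because setting $y = x$ in the covariance formula gives the variance formula.

```python
def mean(xs):
    """Arithmetic mean: (1/N) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """Sample covariance: sum of (x_i - x_bar)(y_i - y_bar), divided by N - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length N >= 2")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_var(xs):
    """Sample variance: sum of (x_i - x_bar)^2, divided by N - 1."""
    # Variance is the special case cov(x, x).
    return sample_cov(xs, xs)


x = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
y = [2.0, 4.0, 5.0, 9.0]
print(mean(x), sample_var(x), sample_cov(x, y))
```

For this data the mean of x is 2.5, its sample variance is 5/3 and the sample covariance of x and y is 11/3.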

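Why are the sample variance and the sample covariance divided by N-1 rather than N? See http://blog.csdn.net/maoersong/article/details/21819957. In short, dividing by N is correct when the data are the entire population, and the result is the population variance. When the data are a randomly drawn sample, the deviations are measured from the sample mean $\bar{x}$, which is computed from the same data. As a result, $\sum (x_i-\bar{x})^2$ is never larger than the sum of squared deviations from the true mean. Dividing by N-1 (Bessel's correction) compensates for this and makes the estimate unbiased.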
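The unbiasedness claim can be checked exactly on a toy example. The Python sketch below uses exact `fractions.Fraction` arithmetic. The population `[1, 2, 3, 4]` and the sample size 2 are arbitrary choices for illustration. It enumerates every ordered sample drawn with replacement and averages both estimators over all of them. The N-1 version averages to the population variance, while the N version falls short by the factor (N-1)/N.

```python
from fractions import Fraction
from itertools import product


def mean(xs):
    return Fraction(sum(xs), len(xs))


def var(xs, ddof):
    """Sum of squared deviations from the mean, divided by (len(xs) - ddof)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)


population = [1, 2, 3, 4]          # hypothetical toy population
sigma2 = var(population, ddof=0)   # population variance: divide by N

n = 2
samples = list(product(population, repeat=n))  # every ordered i.i.d. sample of size n
avg_unbiased = sum(var(s, 1) for s in samples) / len(samples)  # divide by n - 1
avg_biased = sum(var(s, 0) for s in samples) / len(samples)    # divide by n

print(sigma2, avg_unbiased, avg_biased)  # 5/4 5/4 5/8
```

Because every possible sample is enumerated, the averages are exact expectations rather than simulation estimates.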

Covariance cov(x)

- x is a sample vector

cov(x) returns the unbiased estimate of the variance of x, normalized by n-1. cov(x,1) normalizes by n instead, which gives the maximum likelihood estimate of the variance (for normally distributed data), written $s^2$ here. This estimate is biased for a sample. It equals the true variance only when x contains the entire population.

$\operatorname{cov}(x) = \frac{\sum_{i=1}^{n} (x_i-\bar{x})^2}{n-1}$

$\operatorname{cov}(x,1) = s^2 = \frac{\sum_{i=1}^{n} (x_i-\bar{x})^2}{n}$
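To see the two normalizations side by side, here is a Python sketch. `cov_vec` is a hypothetical helper that mimics the `cov(x)` / `cov(x,1)` flag for a vector only. It is cross-checked against the standard library's `statistics.variance` (divides by n-1) and `statistics.pvariance` (divides by n).

```python
import statistics


def cov_vec(x, flag=0):
    """Variance of a 1-D sample.

    flag=0 divides by n - 1 (unbiased, like cov(x));
    flag=1 divides by n (like cov(x, 1)).
    """
    n = len(x)
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)  # sum of squared deviations
    return ss / (n - 1) if flag == 0 else ss / n


x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
print(cov_vec(x), statistics.variance(x))       # n - 1 normalization
print(cov_vec(x, 1), statistics.pvariance(x))   # n normalization
```

For this data the sum of squared deviations is 32, so the two results are 32/7 and 32/8 = 4. In general their ratio is always (n-1)/n.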

- x is a sample matrix

If $x = (x_1, x_2, \dots, x_n)$ is a matrix whose n columns $x_1, \dots, x_n$ are n variables (each row is one observation), cov(x) returns an n×n matrix C with elements $c_{ij} = \operatorname{cov}(x_i, x_j)$.
The diagonal elements are the variances of the individual dimensions, and the off-diagonal elements are the covariances between different dimensions. The matrix is therefore symmetric: $c_{12} = c_{21}$, and in general $c_{ij} = c_{ji}$.
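Here is a sketch of the matrix case in plain Python. It assumes each row of `data` is one observation and each column one variable. `cov_matrix` is a hypothetical helper that fills element (i, j) with the sample covariance of columns i and j. The diagonal therefore holds each column's variance, and the result is symmetric by construction.

```python
def cov_matrix(data):
    """Sample covariance matrix (N - 1 normalization).

    data: list of rows; each row is one observation, each column one variable.
    Returns an n x n list of lists, where n is the number of columns.
    """
    m, n = len(data), len(data[0])
    means = [sum(row[j] for row in data) / m for j in range(n)]
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # compute the upper triangle, then mirror it
            s = sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
            c[i][j] = c[j][i] = s / (m - 1)
    return c


data = [  # hypothetical: 4 observations of 3 variables
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 0.0],
    [3.0, 5.0, 1.5],
    [4.0, 9.0, 1.0],
]
for row in cov_matrix(data):
    print(row)
```

For this data the diagonal is 5/3, 26/3 and 5/12, and $c_{12} = c_{21} = 11/3$.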

Covariance cov(x,y)

Let $x = [a_1, a_2, \dots, a_m]$ and $y = [b_1, b_2, \dots, b_m]$ be two variables observed m times each. Let z be the m×2 matrix that holds them as columns:

$z = \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \\ \vdots & \vdots \\ a_m & b_m \end{pmatrix}$

Then cov(x,y) = cov(z). In other words, cov(x,y) splices the two variables together side by side as the columns of z and computes cov(z). The result is a 2×2 matrix with the variances of x and y on the diagonal and their covariance off the diagonal.

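Therefore, when computing a covariance matrix, first make clear whether each row of the matrix is one set of samples (one observation) or each column is. cov treats each row as an observation and each column as a variable, so transposing the input changes the result completely.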
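The orientation issue is easy to demonstrate. In this Python sketch, `cov_columns` is a hypothetical helper that, like cov, treats rows as observations and columns as variables. Stacking x and y as the two columns of an m×2 matrix gives the 2×2 result. Passing the 2×m matrix with x and y as rows instead turns each of the m columns into a "variable" with only two observations. That yields an m×m matrix unrelated to cov(x,y).

```python
def cov_columns(data):
    """Covariance matrix treating rows as observations, columns as variables."""
    m, n = len(data), len(data[0])
    means = [sum(row[j] for row in data) / m for j in range(n)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (m - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]


x = [1.0, 2.0, 3.0, 4.0]  # hypothetical a_1..a_m
y = [2.0, 4.0, 5.0, 9.0]  # hypothetical b_1..b_m

z = [list(pair) for pair in zip(x, y)]  # m x 2: x and y as columns
print(cov_columns(z))                   # 2 x 2: [[var x, cov], [cov, var y]]

z_rows = [x, y]                         # 2 x m: x and y as rows (wrong layout here)
print(len(cov_columns(z_rows)))         # 4, i.e. an m x m matrix
```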

Coefficient of variation cv

The coefficient of variation compares the degree of dispersion of two sets of data. If their measurement scales differ greatly, or the data have different units (dimensions), comparing standard deviations directly is not appropriate. The influence of scale and units must be removed first, which is what dividing the standard deviation by the mean does.
$cv = \frac{s}{\bar{x}} \times 100\%$, where s is the standard deviation and $\bar{x}$ is the mean.

In data analysis, a coefficient of variation greater than 15% is commonly taken as a sign that the data may contain abnormal values, which should then be checked and possibly eliminated.
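Here is a minimal Python sketch of the coefficient of variation. It assumes that s means the sample standard deviation (`statistics.stdev`, which divides by n-1) and that the mean is non-zero. `cv_percent`, `exceeds_cv_threshold` and both data sets are hypothetical. Heights in cm and weights in kg cannot be compared by their standard deviations, but their CVs are unitless percentages. The 15% rule of thumb from the text is exposed as a parameter.

```python
import statistics


def cv_percent(data):
    """Coefficient of variation: standard deviation / mean * 100 (percent)."""
    m = statistics.mean(data)
    if m == 0:
        raise ValueError("mean is zero, so the CV is undefined")
    return statistics.stdev(data) / m * 100  # stdev divides by n - 1


def exceeds_cv_threshold(data, threshold=15.0):
    """True if the CV is above `threshold` percent (a rule of thumb, not a test)."""
    return cv_percent(data) > threshold


heights_cm = [170.0, 172.0, 168.0, 171.0, 169.0]  # hypothetical
weights_kg = [60.0, 75.0, 58.0, 82.0, 65.0]       # hypothetical

for name, data in [("height", heights_cm), ("weight", weights_kg)]:
    print(name, round(cv_percent(data), 2), exceeds_cv_threshold(data))
```

Here the heights give a CV of about 0.93% and the weights about 15.03%, so only the weight data crosses the 15% line.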

————————————————
Copyright notice: part of the content refers to the article "The Moon is Blue", original link: https://blog.csdn.net/lyl771857509/article/details/79439184

Origin blog.csdn.net/weixin_43217427/article/details/101055389