Statistics odds and ends (added to over time)

Knowledge point 1: Covariance and correlation coefficient

[Definition 1]
The covariance of random variables X and Y is denoted cov(X,Y) and defined as:
$$cov(X,Y) = E[(X-EX)(Y-EY)]$$
Let X^* and Y^* be the standardized variables, where DX and DY are the variances of X and Y:
$$X^* = \frac{X-EX}{\sqrt{DX}}, \quad Y^* = \frac{Y-EY}{\sqrt{DY}}$$
Then
$$r_{XY} = EX^*Y^* = \frac{cov(X,Y)}{\sqrt{DX}\sqrt{DY}}$$
is called the correlation coefficient of X and Y. The correlation coefficient takes values in [-1,1] and measures the strength of the linear relationship between X and Y. If $r_{XY}=0$, X and Y are said to be uncorrelated.
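As a rough illustration of the definition, the sketch below computes the covariance and the correlation coefficient of two small samples in plain Python. It assumes that each expectation E is estimated by the sample mean (dividing by n, not n-1), and the helper names `mean`, `covariance`, `variance`, and `correlation` as well as the data are made up for this example.

```python
import math

def mean(values):
    """Estimate E[X] by the arithmetic mean of the sample."""
    return sum(values) / len(values)

def covariance(x, y):
    """cov(X, Y) = E[(X - EX)(Y - EY)], with E replaced by the sample mean."""
    ex, ey = mean(x), mean(y)
    return mean([(a - ex) * (b - ey) for a, b in zip(x, y)])

def variance(x):
    """DX = cov(X, X)."""
    return covariance(x, x)

def correlation(x, y):
    """r_XY = cov(X, Y) / (sqrt(DX) * sqrt(DY))."""
    return covariance(x, y) / (math.sqrt(variance(x)) * math.sqrt(variance(y)))

# Illustrative data only (an assumption, not from the original post).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

print("cov(X, Y) =", covariance(x, y))
print("r_XY      =", correlation(x, y))
```

The correlation always lands in [-1,1], while the covariance can be any real number.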

Correlation is a standardized form of covariance. Covariance on its own is hard to compare across pairs of variables. For example, if we calculate the covariance of salary ($) and age (years), the result depends on the units of both variables: expressing salary in cents or age in months changes the covariance, even though the relationship between the two is the same. To solve this, we calculate the correlation, which divides out the scale of each variable and gives a unit-free value between -1 and 1.
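The salary and age example can be checked numerically. The sketch below uses invented numbers (the ages and salaries are assumptions for illustration only) and hypothetical helper names. It rescales both variables and compares the covariance and the correlation before and after.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    # cov(X, Y) = E[(X - EX)(Y - EY)], estimated with sample means
    ex, ey = mean(x), mean(y)
    return mean([(a - ex) * (b - ey) for a, b in zip(x, y)])

def correlation(x, y):
    # r_XY = cov(X, Y) / (sqrt(DX) * sqrt(DY)), where DX = cov(X, X)
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Invented sample: age in years, salary in dollars.
age_years = [25, 32, 41, 50, 58]
salary_usd = [40000, 52000, 61000, 75000, 72000]

# Same data in different units: age in months, salary in thousands of dollars.
age_months = [a * 12 for a in age_years]
salary_k = [s / 1000 for s in salary_usd]

# Covariance is multiplied by the product of the two scale factors (12 * 1/1000).
print("cov (years, $):    ", covariance(age_years, salary_usd))
print("cov (months, k$):  ", covariance(age_months, salary_k))

# Correlation is the same in both unit systems, up to floating-point rounding.
print("corr (years, $):   ", correlation(age_years, salary_usd))
print("corr (months, k$): ", correlation(age_months, salary_k))
```

The two covariances differ by the factor 12/1000, which says nothing about how strongly salary and age are related, while the two correlations agree.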

Origin blog.csdn.net/weixin_41332009/article/details/113834080