Meaning of covariance

Covariance measures whether two variables deviate from their respective means in the same direction at the same time.

[Image: covariance formula]

When X and Y are positively correlated, most of the summation terms in this formula, one per sample pair (Xi, Yi), are positive: the two values deviate from their respective means in the same direction. Some pairs deviate in opposite directions, but they are fewer, so when there are many samples the sum comes out positive. The picture below makes this intuitive. The following is reproduced from: http://blog.csdn.net/wuhzossibility/article/details/8087863
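As a small numeric illustration of the paragraph above, the sketch below computes each summation term for six made-up sample pairs. The data and variable names are hypothetical, and averaging over n (rather than n - 1) is an assumption, since the formula image is not reproduced here.

```python
# Hypothetical sample pairs (Xi, Yi), chosen so that Y tends to grow with X.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# One summation term per sample pair: (Xi - mean of X) * (Yi - mean of Y).
terms = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]

positive_terms = sum(1 for t in terms if t > 0)
negative_terms = sum(1 for t in terms if t < 0)
total = sum(terms)

# Assumption: average over n; some definitions divide by n - 1 instead.
sample_cov = total / n

print(terms)
print(positive_terms, negative_terms, total, sample_cov)
```

In this data four of the six terms are positive and two are negative, so the sum, and with it the covariance, is positive.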

 

In probability theory, the relationship between two random variables X and Y generally falls into one of the following three cases:

 

When the joint distribution of X and Y is as shown in the figure above, we can see that, roughly, the larger X is, the larger Y is, and the smaller X is, the smaller Y is. We call this case "positive correlation".

 

When the joint distribution of X and Y is as shown in the figure above, we can see that, roughly, the larger X is, the smaller Y is, and the smaller X is, the larger Y is. We call this case "negative correlation".

When the joint distribution of X and Y is as shown in the figure above, neither pattern holds: a larger X does not tend to come with a larger Y, nor with a smaller Y. We call this case "uncorrelated".

How can these three cases be expressed with a single number?

In region (1) of the figure, X-EX>0 and Y-EY>0, so (X-EX)(Y-EY)>0;

In region (2) of the figure, X-EX<0 and Y-EY>0, so (X-EX)(Y-EY)<0;

In region (3) of the figure, X-EX<0 and Y-EY<0, so (X-EX)(Y-EY)>0;

In region (4) of the figure, X-EX>0 and Y-EY<0, so (X-EX)(Y-EY)<0.
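The four cases above can be sketched as a small lookup. The function names, the means, and the sample points are hypothetical; the region numbering follows the text, and points lying exactly on X = EX or Y = EY (where the product is zero) are reported as region 0 by assumption.

```python
def region(x, y, ex, ey):
    """Return the region number (1-4) of the point (x, y) relative to (ex, ey).

    Returns 0 when the point lies on one of the lines X = EX or Y = EY,
    where the product of deviations is exactly zero.
    """
    dx, dy = x - ex, y - ey
    if dx > 0 and dy > 0:
        return 1
    if dx < 0 and dy > 0:
        return 2
    if dx < 0 and dy < 0:
        return 3
    if dx > 0 and dy < 0:
        return 4
    return 0


def product_sign(x, y, ex, ey):
    """Return +1, -1 or 0: the sign of (x - ex) * (y - ey)."""
    p = (x - ex) * (y - ey)
    return (p > 0) - (p < 0)


# Hypothetical means and one point per region.
ex, ey = 10.0, 5.0
for point in [(12, 7), (8, 7), (8, 3), (12, 3)]:
    print(point, region(*point, ex, ey), product_sign(*point, ex, ey))
```

Regions (1) and (3) give a product sign of +1, and regions (2) and (4) give -1.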

When X and Y are positively correlated, their distribution lies mostly in regions (1) and (3), with a small part in regions (2) and (4), so on average E[(X-EX)(Y-EY)]>0.

When X and Y are negatively correlated, their distribution lies mostly in regions (2) and (4), with a small part in regions (1) and (3), so on average E[(X-EX)(Y-EY)]<0.

When X and Y are uncorrelated, they are distributed almost equally between regions (1) and (3) and regions (2) and (4), so on average E[(X-EX)(Y-EY)]=0.
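To see the three cases numerically, the sketch below simulates each one and reports what fraction of points falls in regions (1) and (3), together with the sample average of (X-EX)(Y-EY). The data-generating choices (Gaussian X, Y built as X plus noise, -X plus noise, or independent noise), the seed, and the helper names are all assumptions made for illustration.

```python
import random


def same_direction_fraction(pairs):
    """Fraction of pairs in regions (1) or (3): positive product of deviations."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    hits = sum(1 for x, y in pairs if (x - mean_x) * (y - mean_y) > 0)
    return hits / n


def mean_product(pairs):
    """Sample average of (X - mean of X) * (Y - mean of Y)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000
xs = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical data for the three cases described in the text.
positive = [(x, x + rng.gauss(0, 1)) for x in xs]
negative = [(x, -x + rng.gauss(0, 1)) for x in xs]
unrelated = [(x, rng.gauss(0, 1)) for x in xs]

for name, pairs in [("positive", positive),
                    ("negative", negative),
                    ("unrelated", unrelated)]:
    print(name,
          round(same_direction_fraction(pairs), 3),
          round(mean_product(pairs), 3))
```

With samples, the "unrelated" case gives a fraction close to one half and an average close to zero rather than exactly zero.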

So we can define a numerical feature that represents the relationship between X and Y, namely the covariance:
cov(X, Y) = E[(X-EX)(Y-EY)].
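For a discrete joint distribution, this definition can be evaluated directly as a probability-weighted sum. The following is a minimal sketch; the dictionary representation of the joint distribution, the function names, and the example probabilities are hypothetical.

```python
def expectation(pmf, f):
    """E[f(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())


def cov(pmf):
    """cov(X, Y) = E[(X - EX)(Y - EY)] under the given joint pmf."""
    ex = expectation(pmf, lambda x, y: x)
    ey = expectation(pmf, lambda x, y: y)
    return expectation(pmf, lambda x, y: (x - ex) * (y - ey))


# Hypothetical joint distribution: X and Y usually take the same value.
joint = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}

print(cov(joint))
```

Here EX = EY = 0.5, so each product of deviations is +0.25 or -0.25, and the covariance works out to about 0.15, up to floating-point rounding.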

When cov(X, Y)>0, it means that X and Y are positively correlated;

When cov(X, Y)<0, it indicates that X and Y are negatively correlated;

When cov(X, Y)=0, it means that X and Y are uncorrelated.
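The three rules above can be checked on tiny hand-made data sets. This is only a sketch: the data, the function names, averaging over n, and the small tolerance `tol` used to treat near-zero values as zero are assumptions, not part of the definition.

```python
def sample_cov(xs, ys):
    """Average of (x - mean_x) * (y - mean_y) over the sample (divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def describe(c, tol=1e-12):
    """Map the sign of a covariance value to the wording used in the text."""
    if c > tol:
        return "positively correlated"
    if c < -tol:
        return "negatively correlated"
    return "uncorrelated"


xs = [1, 2, 3, 4]
# Hypothetical Y values for each of the three cases.
examples = {
    "rising": [1, 2, 3, 4],
    "falling": [4, 3, 2, 1],
    "no trend": [1, -1, -1, 1],
}

for name, ys in examples.items():
    c = sample_cov(xs, ys)
    print(name, c, describe(c))
```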

That's what covariance means.

 
