Pearson correlation coefficient

The Pearson correlation coefficient ρX,Y of two continuous variables X and Y is defined as their covariance cov(X, Y) divided by the product of their standard deviations σX and σY:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

The coefficient always lies between -1.0 and 1.0. Values close to 0 indicate little or no linear correlation, while values close to 1 or -1 indicate a strong positive or negative linear correlation, respectively.
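To connect this definition to the sum-based `pearson` function below, expand it using the population (1/n) forms of covariance and standard deviation, writing x̄ and ȳ for the means of the n observations. The 1/n factors cancel between numerator and denominator, leaving an expression in plain sums:

$$
\rho_{X,Y}
= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
= \frac{\sum x_i y_i - \frac{1}{n}\sum x_i \sum y_i}{\sqrt{\left(\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2\right)\left(\sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2\right)}}
$$

The second form follows from the identities Σ(xi − x̄)(yi − ȳ) = Σxi·yi − (1/n)·Σxi·Σyi and Σ(xi − x̄)² = Σxi² − (1/n)·(Σxi)². Because the same normalization appears above and below the fraction bar, using the sample (1/(n − 1)) versions instead yields the same ρ.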

The Pearson correlation coefficient measures the degree of linear correlation between two variables. Geometrically, ρ is the cosine of the angle between the two vectors formed by the variables' values after mean-centering, that is, after subtracting each variable's mean from its values.
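A minimal sketch of the geometric interpretation, assuming plain Python lists of equal length: subtract each variable's mean, then take the cosine of the angle between the centered vectors. The function name `pearson_cosine` is hypothetical, chosen only for illustration.

```python
import math


def pearson_cosine(xs, ys):
    """Pearson r computed as the cosine between mean-centered vectors (illustrative sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Center each vector on its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # cos(theta) = (dx . dy) / (|dx| * |dy|)
    dot = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return dot / (norm_x * norm_y)


print(pearson_cosine([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))     # about 0.7746
print(pearson_cosine([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```

The second call shows why the coefficient is described as a measure of linear correlation: there y = x² is completely determined by x, yet r is exactly 0 because the dependence is not linear.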

Following the formula above, the coefficient can be implemented in Python 3 as follows:

import math

def pearson(vector1, vector2):
    n = len(vector1)
    if n != len(vector2) or n == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    # simple sums
    sum1 = sum(float(v) for v in vector1)
    sum2 = sum(float(v) for v in vector2)
    # sums of squares
    sum1_pow = sum(float(v) ** 2 for v in vector1)
    sum2_pow = sum(float(v) ** 2 for v in vector2)
    # sum of products
    p_sum = sum(float(a) * float(b) for a, b in zip(vector1, vector2))
    # numerator num, denominator den
    num = p_sum - (sum1 * sum2 / n)
    den = math.sqrt((sum1_pow - sum1 ** 2 / n) * (sum2_pow - sum2 ** 2 / n))
    if den == 0:
        # at least one vector is constant, so r is undefined; return 0.0 by convention
        return 0.0
    return num / den

Now, test it with two vectors:

vector1 = [2,7,18,88,157,90,177,570]

vector2 = [3,5,15,90,180,88,160,580]

The result is approximately 0.998, which indicates a strong positive linear correlation between the two groups of numbers.

