By definition, the Pearson correlation coefficient $\rho_{X,Y}$ of two continuous variables $X$ and $Y$ equals their covariance $\operatorname{cov}(X,Y)$ divided by the product of their standard deviations $\sigma_X$ and $\sigma_Y$:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

The coefficient always lies in the closed interval from -1.0 to 1.0. Values close to 0 indicate little or no linear correlation, while values close to 1 or -1 indicate a strong positive or negative linear correlation, respectively.
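The definition can be translated almost word for word into Python. The sketch below is only an illustration: the helper names `covariance`, `std_dev` and `pearson_by_definition` are hypothetical, and it assumes population normalisation (dividing by n) for both the covariance and the standard deviations. The choice does not affect the result, because the normalisation factors cancel in the ratio.

```python
import math


def covariance(xs, ys):
    """Population covariance: mean of the products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def std_dev(xs):
    """Population standard deviation, i.e. sqrt(cov(X, X)) with the same 1/n factor."""
    return math.sqrt(covariance(xs, xs))


def pearson_by_definition(xs, ys):
    """rho = cov(X, Y) / (sigma_X * sigma_Y).

    Raises ZeroDivisionError if either input is constant (zero standard deviation).
    """
    if len(xs) == 0 or len(xs) != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


print(pearson_by_definition([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, positive slope
print(pearson_by_definition([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, negative slope
```

The first call yields 1.0 and the second -1.0 (up to floating-point rounding), the two extremes of the range described above.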
The Pearson correlation coefficient measures the degree of linear correlation between two variables. Geometrically, $\rho$ is the cosine of the angle between the two vectors of observed values after each has been mean-centered (that is, after subtracting each variable's mean from its values).
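The geometric interpretation can be checked numerically: center both vectors, then take the ordinary cosine of the angle between them. This is a minimal sketch; the function names `centered`, `cosine` and `pearson_as_cosine` are hypothetical helpers chosen for illustration.

```python
import math


def centered(xs):
    """Subtract the mean from every element (mean-centering)."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def cosine(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_as_cosine(xs, ys):
    """Pearson's r viewed geometrically: cosine of the mean-centered vectors."""
    return cosine(centered(xs), centered(ys))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_as_cosine(x, y)
print(r)  # the correlation coefficient
# Clamp to [-1, 1] before acos to guard against tiny floating-point overshoot.
print(math.degrees(math.acos(max(-1.0, min(1.0, r)))))  # the angle in degrees
```

Here the centered vectors are [-2, -1, 0, 1, 2] and [-1, -2, 1, 0, 2]; their dot product is 8 and both have squared length 10, so r = 8/10 = 0.8.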
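Expanding the covariance and the standard deviations into sums over the $n$ paired samples gives the computational form of the sample coefficient $r$:

$$r = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)\left(\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right)}}$$

The following Python 3 code implements this formula: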
import math

def pearson(vector1, vector2):
    n = len(vector1)
    if n == 0 or n != len(vector2):
        raise ValueError("vectors must be non-empty and of equal length")
    # simple sums
    sum1 = sum(float(v) for v in vector1)
    sum2 = sum(float(v) for v in vector2)
    # sums of the squares
    sum1_pow = sum(pow(v, 2.0) for v in vector1)
    sum2_pow = sum(pow(v, 2.0) for v in vector2)
    # sum of the products
    p_sum = sum(a * b for a, b in zip(vector1, vector2))
    # numerator (num) and denominator (den) of the computational formula
    num = p_sum - (sum1 * sum2 / n)
    den = math.sqrt((sum1_pow - pow(sum1, 2) / n) * (sum2_pow - pow(sum2, 2) / n))
    # den is 0 when either vector is constant; r is undefined, so return 0.0 by convention
    if den == 0:
        return 0.0
    return num / den

Now, test it with two vectors:
vector1 = [2,7,18,88,157,90,177,570]
vector2 = [3,5,15,90,180, 88,160,580]
Calling pearson(vector1, vector2) returns approximately 0.998, which shows that the two sequences are strongly positively correlated.