Day 14 of regression: the correlation coefficient and the coefficient of determination, with a Python implementation

  Two parameters are commonly used to evaluate a regression model: the Pearson correlation coefficient and R-squared.

1. The Pearson correlation coefficient

  In statistics, the Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient (PPMCC or PCC), measures the linear correlation between two variables X and Y; its value lies between -1 and 1.

  It is calculated with the following formula:

  r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}

  A value greater than 0 indicates a positive correlation, a value less than 0 indicates a negative correlation, and a value equal to 0 indicates no linear correlation.
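
  As a quick sanity check on this formula, NumPy's built-in np.corrcoef computes the same quantity; the snippet below is a minimal sketch, using the same sample data as the code in section 3:

import numpy as np

x = np.array([1, 3, 8, 7, 9])
y = np.array([10, 12, 24, 21, 34])

# np.corrcoef returns the 2x2 correlation matrix of x and y;
# the off-diagonal entry [0, 1] is the Pearson r between them
print(np.corrcoef(x, y)[0, 1])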

2. The coefficient of determination: the R-squared value

         Definition: the proportion of the total variation in the dependent variable that can be explained by the independent variable through the regression relationship:

         R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}

         Specifically, for the simple linear regression model:

         SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2, \quad SST = \sum_{i=1}^{n}(y_i - \bar{y})^2, \quad SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2

  where SSR (sum of squares due to regression) is the sum of squared deviations of the predicted values from the mean, SST (total sum of squares) is the sum of squared deviations of the true values from the mean, and SSE (sum of squared errors) is the sum of squared deviations of the true values from the predicted values.

         For example, when R^2 is 0.8, it means that 80% of the variation in the dependent variable can be explained by the model.
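
  The two forms of the formula agree because, for a least-squares fit that includes an intercept, the total variation decomposes as SST = SSR + SSE. A minimal sketch that checks this decomposition numerically with np.polyfit (same sample data as section 3):

import numpy as np

x = np.array([1, 3, 8, 7, 9])
y = np.array([10, 12, 24, 21, 34])

p = np.poly1d(np.polyfit(x, y, 1))  # least-squares line
yhat = p(x)                         # predicted values
ybar = y.mean()

ssr = np.sum((yhat - ybar) ** 2)  # regression sum of squares
sse = np.sum((y - yhat) ** 2)     # error sum of squares
sst = np.sum((y - ybar) ** 2)     # total sum of squares

print(ssr / sst)      # R^2 as SSR / SST
print(1 - sse / sst)  # R^2 as 1 - SSE / SST; the two match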

         In practical applications, in order to offset the influence of the sample size on the evaluation parameter, the R-squared expression is modified to the adjusted R-squared:

         R_{adjusted}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}

  where R^2 is the sample R-squared value, n is the sample size, and p is the number of predictor (independent) variables.
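
  The code below is a minimal sketch of this correction as a Python function (the name adjusted_r_squared is my own, not from the original article):

def adjusted_r_squared(r2, n, p):
    # r2: sample R-squared, n: sample size, p: number of predictors
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With the data from section 3 (R^2 ≈ 0.8842, n = 5 samples, p = 1 predictor)
# this gives approximately 0.8456.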

3. Python code implementation

  For simple linear regression, we compute the correlation coefficient and the coefficient of determination separately and verify the relationship R^2 = r^2:

import numpy as np
import math

x = np.array([1, 3, 8, 7, 9])
y = np.array([10, 12, 24, 21, 34])

#Calculate the Pearson correlation coefficient
def computeCorrelation(x, y):
    xBar = np.mean(x)
    yBar = np.mean(y)
    covXY = 0.0  # sum of (x_i - xBar)(y_i - yBar), the numerator of r
    varX = 0.0   # sum of squared deviations of x
    varY = 0.0   # sum of squared deviations of y
    for i in range(0, len(x)):
        diffXXbar = x[i] - xBar
        diffYYbar = y[i] - yBar
        covXY += diffXXbar * diffYYbar
        varX += diffXXbar ** 2
        varY += diffYYbar ** 2
    # the denominator of r: the square root of the product of the two variation sums
    return covXY / math.sqrt(varX * varY)

#Calculate R squared for a polynomial fit of the given degree
def polyfit(x, y, degree):
    results = {}
    coeffs = np.polyfit(x, y, degree)        # least-squares fit
    results['polynomial'] = coeffs.tolist()
    p = np.poly1d(coeffs)
    yhat = p(x)                              # predicted values
    ybar = np.sum(y) / len(y)                # mean of the true values
    ssreg = np.sum((yhat - ybar) ** 2)       # SSR
    sstot = np.sum((y - ybar) ** 2)          # SST
    results['determination'] = ssreg / sstot # R^2 = SSR / SST
    return results

r = computeCorrelation(x, y)
print("r:", r)
print("r^2:", r ** 2)
print("R^2:", polyfit(x, y, 1)['determination'])

  Running the code confirms the relationship: r ≈ 0.9403, so r^2 ≈ 0.8842, and polyfit returns the same value ≈ 0.8842 for R^2. In the simple linear regression model, R^2 = r^2 holds.
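
  This identity can also be derived from the formulas above (a standard derivation, not part of the original article). For the least-squares line \hat{y}_i = a + b x_i with slope b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} and intercept a = \bar{y} - b\bar{x}, we have \hat{y}_i - \bar{y} = b(x_i - \bar{x}), so:

  SSR = \sum_i (\hat{y}_i - \bar{y})^2 = b^2 \sum_i (x_i - \bar{x})^2 = \frac{\left(\sum_i (x_i - \bar{x})(y_i - \bar{y})\right)^2}{\sum_i (x_i - \bar{x})^2}

  Dividing by SST = \sum_i (y_i - \bar{y})^2 gives exactly the square of the Pearson formula from section 1, i.e. R^2 = r^2.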

 
