Two metrics commonly used to evaluate a regression model: the Pearson correlation coefficient and R-squared
First, the Pearson correlation coefficient
In statistics, the Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient (PPMCC or PCC), measures the linear correlation between two variables X and Y. Its value lies between -1 and 1.
It can be calculated using the following formula:

r = Σ(x_i − x̄)(y_i − ȳ) / √( Σ(x_i − x̄)² · Σ(y_i − ȳ)² )
A value greater than 0 indicates a positive correlation, a value less than 0 indicates a negative correlation, and a value equal to 0 indicates no linear correlation.
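As a quick sanity check, the same coefficient can be obtained with NumPy's built-in np.corrcoef; this is a minimal sketch using the sample data from the code section below:

```python
import numpy as np

x = np.array([1, 3, 8, 7, 9])
y = np.array([10, 12, 24, 21, 34])

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the Pearson correlation of x and y
r = np.corrcoef(x, y)[0, 1]
print(r)  # positive and close to 1: strong positive linear correlation
```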
Second, the coefficient of determination: the R-squared value
Definition: R-squared reflects the proportion of the total variation in the dependent variable that can be explained by the independent variable through the regression relationship.
Specifically, for a simple linear regression model:

R² = SSR / SST = 1 − SSE / SST

where SSR (sum of squares due to regression) = Σ(ŷ_i − ȳ)² measures the deviation of the predicted values from the mean of the true values, SST (total sum of squares) = Σ(y_i − ȳ)² measures the deviation of the true values from their mean, and SSE (sum of squared errors) = Σ(y_i − ŷ_i)² measures the deviation of the true values from the predicted values, with SST = SSR + SSE.
For example, when R² is 0.8, it means that 80% of the variation in the dependent variable can be explained by the model.
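The decomposition above can be verified numerically; here is a small sketch using the same sample data, with the regression line fitted by least squares via np.polyfit:

```python
import numpy as np

x = np.array([1, 3, 8, 7, 9])
y = np.array([10, 12, 24, 21, 34])

# Fit a simple linear regression line with least squares
slope, intercept = np.polyfit(x, y, 1)
yhat = slope * x + intercept          # predicted values
ybar = np.mean(y)                     # mean of the true values

SSR = np.sum((yhat - ybar) ** 2)      # explained variation
SSE = np.sum((y - yhat) ** 2)         # residual variation
SST = np.sum((y - ybar) ** 2)         # total variation

print(SSR / SST)      # R-squared
print(1 - SSE / SST)  # same value, since SST = SSR + SSE
```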
In practical applications, in order to offset the influence of the sample size on this evaluation metric, we adjust the R-squared expression as:

R²_adjusted = 1 − (1 − R²)(n − 1) / (n − p − 1)

where R² is the sample R-squared value, n is the sample size, and p is the number of independent variables.
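The adjustment formula can be written directly as a small helper; the numeric inputs below are illustrative, taken from the example data (R² ≈ 0.8842 with n = 5 samples and p = 1 independent variable):

```python
def adjusted_r2(r2, n, p):
    # r2: sample R-squared, n: sample size, p: number of independent variables
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With few samples, the adjusted value is noticeably lower than R^2
print(adjusted_r2(0.8842, 5, 1))
```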
Third, Python code implementation
For simple linear regression, calculate the correlation coefficient and the coefficient of determination separately, and verify the relationship between them:
import numpy as np
import math

x = np.array([1, 3, 8, 7, 9])
y = np.array([10, 12, 24, 21, 34])

# Calculate the Pearson correlation coefficient
def computeCorrelation(x, y):
    xBar = np.mean(x)
    yBar = np.mean(y)
    SSR = 0.0
    varX = 0.0
    varY = 0.0
    for i in range(0, len(x)):
        diffXXbar = x[i] - xBar
        diffYYbar = y[i] - yBar
        SSR += diffXXbar * diffYYbar
        varX += diffXXbar ** 2
        varY += diffYYbar ** 2
    SST = math.sqrt(varX * varY)
    return SSR / SST

# Calculate R-squared from a polynomial fit
def polyfit(x, y, degree):
    results = {}
    coeffs = np.polyfit(x, y, degree)
    results['polynomial'] = coeffs.tolist()
    p = np.poly1d(coeffs)
    yhat = p(x)                            # predicted values
    ybar = np.sum(y) / len(y)              # mean of the true values
    ssreg = np.sum((yhat - ybar) ** 2)     # sum of squares due to regression
    sstot = np.sum((y - ybar) ** 2)        # total sum of squares
    results['determination'] = ssreg / sstot
    return results

r = computeCorrelation(x, y)
print("r:", r)
print("r^2:", r ** 2)
print(polyfit(x, y, 1)['determination'])
The results verify that, for the simple linear regression model, R² = r² holds.