Multiple linear regression implemented in Python

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

In [2]:

# Create a matrix
A = np.array([[1, 2], [3, 4]])
m = np.mat(A)
m

Out[2]:

matrix([[1, 2],
        [3, 4]])
In [4]:
# Review of matrix operations
# matrix transpose
m.T
# matrix multiplication (for np.mat); note that A * A on a plain array is element-wise
m * m
A * A
# matrix determinant
np.linalg.det(m)
# inverse matrix
m.I
# convert back to an array
m.A
# flatten to one dimension
m.flatten()
Out[4]:
matrix([[-2. ,  1. ],
        [ 1.5, -0.5]])
 

Assume the input data is a DataFrame whose last column holds the label values; on that basis, write a custom regression function based on least squares.

In [ ]:
# Matrix formula for the least-squares weights
# w = (X.T * X).I * X.T * y
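As a brief reminder of where this formula comes from (a standard sketch of the least-squares derivation, using the same X, y, and w as in the formula above):

J(w) = (y - Xw)^\top (y - Xw)

\frac{\partial J}{\partial w} = -2\, X^\top (y - Xw) = 0 \;\Rightarrow\; X^\top X\, w = X^\top y \;\Rightarrow\; w = (X^\top X)^{-1} X^\top y \quad \text{(provided } X^\top X \text{ is invertible)}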
In [53]:
# The least-squares derivation gives w = (X.T * X).I * X.T * y.
# NOTE: if (X.T * X) is not invertible, this closed-form least-squares solution does not exist.
# Multicollinearity among the features makes the matrix non-invertible, so multicollinearity
# should be eliminated before running the regression.
def standRegres(dataSet):
    # Convert the DataFrame into a matrix: DataFrame columns may hold different dtypes and
    # cannot be multiplied directly, and converting to a matrix also unifies the data format.
    xMat = np.mat(dataSet.iloc[:, :-1].values)
    yMat = np.mat(dataSet.iloc[:, -1].values).T
    xTx = xMat.T * xMat
    if np.linalg.det(xTx) == 0:  # check whether xTx is full rank; if not, its inverse cannot be computed
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * yMat)
    return ws
# Note: when solving multiple linear regression with this matrix formula, a column of all 1s
# must be added to the data so the equation can represent the intercept b.
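As a minimal illustration of that last note (assuming a hypothetical DataFrame named `data` whose columns are the raw features followed by the label, with no constant column yet), the column of 1s could be added before calling standRegres like this:

data_ols = data.copy()                 # hypothetical raw data: features ..., label
data_ols.insert(0, 'intercept', 1.0)   # prepend a column of all 1s for the intercept b
ws = standRegres(data_ols)             # the first returned weight is then the intercept

In the ex0 data set used below, this column of 1s is already present as column 0.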

In [54]:

ex0 = pd.read_table('ex0.txt', header=None)
ex0.head()
Out[54]:
     0         1         2
0  1.0  0.067732  3.176513
1  1.0  0.427810  3.816464
2  1.0  0.995731  4.550095
3  1.0  0.738336  4.256571
4  1.0  0.981083  4.560815
In [55]:
ws = standRegres(ex0)
ws
# The result is one weight per feature column; since the first column of this data set is all 1s,
# the first component of the returned result is the intercept.

Out[55]:

matrix([[3.00774324],
        [1.69532264]])
In [56]:
# Visualization
yhat = ex0.iloc[:, :-1].values * ws
plt.plot(ex0.iloc[:, 1], ex0.iloc[:, 2], 'or')
plt.plot(ex0.iloc[:, 1], yhat)

Out[56]:

[<matplotlib.lines.Line2D at 0x215fd3146a0>]
[Figure: scatter of the data points with the fitted regression line]

Model evaluation: residual sum of squares (SSE)
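For reference, the quantity computed in the next cell is the residual sum of squares over the n samples:

SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2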

In [23]:
y = ex0.iloc[:, -1].values
yhat = yhat.flatten()
rss = np.power(yhat - y, 2).sum()
rss
Out[23]:
1.3552490816814904
In [26]:
# Wrap the SSE calculation in a function
def sseCal(dataSet, regres):  # parameters: the data set and the regression method
    n = dataSet.shape[0]
    y = dataSet.iloc[:, -1].values
    ws = regres(dataSet)
    yhat = dataSet.iloc[:, :-1].values * ws
    yhat = yhat.reshape([n,])
    rss = np.power(yhat - y, 2).sum()
    return rss
 

In [29]:

sseCal(ex0, standRegres)
Out[29]:
1.3552490816814904
 

Model evaluation: coefficient of determination R-squared. R-squared lies in [0, 1]; the closer it is to 1, the better the fit.
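In symbols, with SSE as above and SST the total sum of squares, this is:

R^2 = 1 - \frac{SSE}{SST}, \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2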

In [21]:
sse = sseCal(ex0, standRegres)
y = ex0.iloc[:, -1].values
sst = np.power(y - y.mean(), 2).sum()
1 - sse / sst

 

Out[21]:
0.9731300889856916
In [31]:
 
# Wrap the R-squared calculation in a function
def rSquare(dataSet, regres):  # parameters: the data set and the regression method
    sse = sseCal(dataSet, regres)
    y = dataSet.iloc[:, -1].values
    sst = np.power(y - y.mean(), 2).sum()
    return 1 - sse / sst

 

In [32]:
rSquare(ex0, standRegres)
Out[32]:
0.9731300889856916
 

Linear regression with Scikit-Learn

In [60]:
from sklearn import linear_model
reg = linear_model.LinearRegression(fit_intercept=True)
reg.fit(ex0.iloc[:, :-1].values, ex0.iloc[:, -1].values)

Out[60]:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [61]:
reg.coef_  # returns the coefficients
Out[61]:
array([0.        , 1.69532264])
In [62]:
reg.intercept_  # returns the intercept
Out[62]:
 3.0077432426975905

Then compute the model's MSE and coefficient of determination.
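Note that the mean squared error is just the SSE divided by the number of samples, which is why multiplying it by ex0.shape[0] below recovers the SSE computed earlier:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{SSE}{n}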

In [63]:
from sklearn.metrics import mean_squared_error, r2_score
yhat = reg.predict(ex0.iloc[:, :-1])
mean_squared_error(y, yhat)
Out[63]:
0.006776245408407454
In [64]:
mean_squared_error(y, yhat)*ex0.shape[0]
Out[64]:
1.3552490816814908
In [65]:
r2_score(y, yhat)
Out[65]:
0.9731300889856916

Origin www.cnblogs.com/Koi504330/p/11909381.html