import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
In [2]:
# Create a matrix
A = np.array([[1, 2], [3, 4]])
m = np.mat(A)
m
Out[2]:
In [4]:
# Review of matrix operations
m.T                # matrix transpose
m * m              # matrix multiplication (for np.matrix, * is matmul)
np.linalg.det(m)   # determinant
m.I                # inverse matrix
m.A                # convert back to an ndarray
m.flatten()        # flatten to one dimension
Out[4]:
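Side note: np.matrix is no longer recommended by the NumPy docs; the same review can be done with the plain ndarray A and the @ operator (a minimal sketch reusing A from above):
In [ ]:
# Equivalent operations on the plain ndarray A
A.T                 # transpose
A @ A               # matrix multiplication
np.linalg.det(A)    # determinant
np.linalg.inv(A)    # inverse
A.flatten()         # flatten to one dimension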
Assume the input data is a DataFrame whose last column holds the label values; on that basis, write a custom regression function using least squares.
In [ ]:
# Normal equation: w = (X.T * X).I * X.T * y
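Why this formula holds: expanding the squared error and setting its gradient with respect to w to zero gives the normal equation (a standard derivation, stated here for reference):

$$ SSE(w) = (y - Xw)^T(y - Xw), \qquad \frac{\partial SSE}{\partial w} = -2X^T(y - Xw) = 0 \;\Rightarrow\; \hat{w} = (X^T X)^{-1} X^T y $$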
In [53]:
# The least-squares derivation gives w = (X.T * X).I * X.T * y
# NOTE: if (X.T * X) is not invertible, the normal equation has no unique
# least-squares solution. Multicollinearity in the feature matrix breaks
# invertibility, so eliminate multicollinearity before running the regression.
def standRegres(dataSet):
    # Convert the DataFrame to a matrix: DataFrame columns may have different
    # dtypes and cannot be multiplied directly, so converting to a matrix also
    # unifies the data format
    xMat = np.mat(dataSet.iloc[:, :-1].values)
    yMat = np.mat(dataSet.iloc[:, -1].values).T
    xTx = xMat.T * xMat
    if np.linalg.det(xTx) == 0:
        # xTx is not full rank, so its inverse cannot be computed
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T * yMat)
    return ws
# Note: when solving multiple linear regression with the matrix formula, an
# all-ones column must be added to the feature matrix so that the equation
# can represent the intercept b.
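To see the singular branch in action, a tiny hypothetical dataset with a duplicated feature column (perfect multicollinearity) makes det(xTx) exactly 0, so standRegres prints the warning and returns nothing (a minimal sketch):
In [ ]:
# x1 is exactly 2 * x0, so xMat.T * xMat is singular
demo = pd.DataFrame({'x0': [1., 2., 3.], 'x1': [2., 4., 6.], 'y': [1., 2., 3.]})
standRegres(demo)   # prints the singular-matrix warning, returns None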
In [54]:
ex0 = pd.read_table('ex0.txt', header=None)
ex0.head()
Out[54]:
In [55]:
ws = standRegres(ex0)
ws
# The result holds one weight per feature column; the first column of this
# data set is all 1s, so the first component of the result is the intercept
Out[55]:
In [56]:
# Visualize the fit
yhat = ex0.iloc[:, :-1].values * ws
plt.plot(ex0.iloc[:, 1], ex0.iloc[:, 2], 'or')   # scatter of the raw data
plt.plot(ex0.iloc[:, 1], yhat)                   # fitted line
Out[56]:
Model evaluation: SSE (residual sum of squares)
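In symbols, with $\hat{y}_i$ the fitted values:

$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$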
In [23]:
y = ex0.iloc[:, -1].values
yhat = yhat.flatten()
rss = np.power(yhat - y, 2).sum()
rss
Out[23]:
In [26]:
# Wrap the SSE computation in a function
def sseCal(dataSet, regres):   # parameters: data set and regression method
    n = dataSet.shape[0]
    y = dataSet.iloc[:, -1].values
    ws = regres(dataSet)
    yhat = dataSet.iloc[:, :-1].values * ws
    yhat = yhat.reshape([n,])
    rss = np.power(yhat - y, 2).sum()
    return rss
In [29]:
sseCal(ex0, standRegres)
Out[29]:
Model evaluation: the coefficient of determination R², which lies in [0, 1]; the closer it is to 1, the better the fit.
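In symbols:

$$ R^2 = 1 - \frac{SSE}{SST}, \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 $$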
In [21]:
sse = sseCal(ex0, standRegres)
y = ex0.iloc[:, -1].values
sst = np.power(y - y.mean(), 2).sum()
1 - sse / sst
Out[21]:
In [31]:
# Wrap the R**2 computation in a function
def rSquare(dataSet, regres):   # parameters: data set and regression method
    sse = sseCal(dataSet, regres)
    y = dataSet.iloc[:, -1].values
    sst = np.power(y - y.mean(), 2).sum()
    return 1 - sse / sst
In [32]:
rSquare(ex0, standRegres)
Out[32]:
Linear regression with Scikit-Learn
In [60]:
from sklearn import linear_model
reg = linear_model.LinearRegression(fit_intercept=True)
reg.fit(ex0.iloc[:, :-1].values, ex0.iloc[:, -1].values)
Out[60]:
In [61]:
reg.coef_   # return the coefficients
Out[61]: array([0. , 1.69532264])
In [62]:
reg.intercept_   # return the intercept
Out[62]: 3.0077432426975905
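Because the first column of ex0 is all 1s, its fitted coefficient is 0 and the intercept is reported separately, so (intercept_, coef_[1]) should reproduce ws from the normal equation. A quick sanity check (a sketch, assuming ws from In [55] is still in scope):
In [ ]:
# Compare the normal-equation weights with sklearn's solution
manual = np.asarray(ws).flatten()                 # [intercept, slope] from standRegres
sklearn_w = np.array([reg.intercept_, reg.coef_[1]])
print(np.allclose(manual, sklearn_w))             # expect True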
Next, compute the model's MSE and coefficient of determination.
In [63]:
from sklearn.metrics import mean_squared_error, r2_score
yhat = reg.predict(ex0.iloc[:, :-1].values)
mean_squared_error(y, yhat)
Out[63]:
In [64]:
mean_squared_error(y, yhat)*ex0.shape[0]
Out[64]:
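Multiplying the MSE by the sample count recovers the SSE, so this should match sseCal(ex0, standRegres) from earlier (a quick check, assuming those objects are still in scope):
In [ ]:
# MSE * n should equal the SSE from sseCal
print(np.isclose(mean_squared_error(y, yhat) * ex0.shape[0],
                 sseCal(ex0, standRegres)))       # expect True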
In [65]:
r2_score(y, yhat)
Out[65]:
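As a final consistency check, r2_score should agree with the custom rSquare implementation above (a sketch, assuming rSquare is still in scope):
In [ ]:
print(np.isclose(r2_score(y, yhat), rSquare(ex0, standRegres)))   # expect True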