Handwritten algorithms: implementing Ridge regression in Python
Introduction to Ridge
In the previous two articles, we introduced over-fitting and regularization and covered the principles and characteristics of L1 and L2 regularization in some detail;
Link: Principle Analysis-Over-fitting and Regularization
we then implemented Lasso regression in Python code;
Link: handwritten algorithm-python code to achieve Lasso regression
Building on that, today we turn to Ridge regression, which is comparatively simple.
This article implements Ridge regression (linear regression with an L2 regularization term) in Python and uses examples to illustrate the principle.
Ridge regression analysis and python code implementation
We reuse the data set generated in the previous article:
import numpy as np
from matplotlib import pyplot as plt
import sklearn.datasets
# Generate a univariate regression data set with 100 samples
x,y = sklearn.datasets.make_regression(n_features=1,noise=5,random_state=2020)
plt.scatter(x,y)
plt.show()
# Add 5 outliers; to see why these values were chosen, inspect the generated x and y yourself
a = np.linspace(1,2,5).reshape(-1,1)
b = np.array([350,380,410,430,480])
# Build the new data set with the outliers appended
x_1 = np.r_[x,a]
y_1 = np.r_[y,b]
plt.scatter(x_1,y_1)
plt.show()
The plots above show the original data set and the data set with the 5 outliers added. If we fit both directly with ordinary linear regression:
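class normal():
    def __init__(self):
        pass
    def fit(self,x,y):
        m=x.shape[0]
        # Prepend a column of ones for the bias term
        X = np.concatenate((np.ones((m,1)),x),axis=1)
        # np.asmatrix is used because the np.mat alias was removed in NumPy 2.0
        xMat=np.asmatrix(X)
        yMat =np.asmatrix(y.reshape(-1,1))
        xTx=xMat.T*xMat
        # xTx.I is the inverse of xTx
        ws=xTx.I*xMat.T*yMat
        # Return the fitted parameters [bias, slope]
        return ws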
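plt.rcParams['font.sans-serif']=['SimHei'] # font with CJK glyphs, only needed if labels contain Chinese
plt.rcParams['axes.unicode_minus']=False # render the minus sign correctly with that font
clf1 =normal()
# Fit the original data
w1 = clf1.fit(x,y)
# Predictions on the original data
y_pred = x * w1[1] + w1[0]
# Fit the new data (with outliers)
w2 = clf1.fit(x_1,y_1)
# Predictions on the new data
y_1_pred = x_1 * w2[1] + w2[0]
print('Parameters fitted on the original samples:\n',w1)
print('\n')
print('Parameters fitted on the new samples:\n',w2)
ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='samples')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this argument sets the legend font size
plt.show()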
Because of the few outliers, the slope of the newly fitted regression line grows considerably, from a little over 19 to a little over 47. The line deviates from the distribution of the actual data, and the performance of the model drops.
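The sensitivity of ordinary least squares to a few extreme points can be reproduced with NumPy alone. The sketch below is only an illustration and does not use the article's data: the synthetic data set, the seed, and the helper name `ols_fit` are assumptions made for this example, so the printed slopes will differ from the 19 and 47 quoted above.

```python
import numpy as np

def ols_fit(x, y):
    """Least-squares fit of y ~ w0 + w1 * x; returns the array [w0, w1]."""
    X = np.column_stack([np.ones(len(x)), x])  # prepend the bias column
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Synthetic stand-in for the make_regression data (assumed true slope of 20)
rng = np.random.default_rng(0)
x_clean = rng.normal(size=100)
y_clean = 20.0 * x_clean + rng.normal(scale=5.0, size=100)

# The same five outliers as in the article
x_out = np.linspace(1, 2, 5)
y_out = np.array([350.0, 380.0, 410.0, 430.0, 480.0])
x_all = np.concatenate([x_clean, x_out])
y_all = np.concatenate([y_clean, y_out])

w_clean = ols_fit(x_clean, y_clean)
w_all = ols_fit(x_all, y_all)
print("slope without outliers:", w_clean[1])
print("slope with outliers:   ", w_all[1])
```

Five points out of 105 are enough to pull the slope far away from the value that describes the bulk of the data, which is the behaviour the regularization term is meant to dampen.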
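We add an L2 regularization term to tune the model. The loss function with L2 regularization, written in the form that is consistent with the gradient used in the code below (X includes the bias column and m is the number of samples), is:
$$J(W) = \frac{1}{2m}\lVert XW - y\rVert_2^2 + \frac{\lambda}{2}\lVert W\rVert_2^2$$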
Method 1: Gradient descent method to solve Ridge regression parameters
In the previous articles we derived the gradient of the linear regression loss and the gradient of the L2 regularization term. The gradient of the regularized loss J(W) is simply the sum of the two; for completeness, in the form used by the code below:
$$\nabla_W J(W) = \frac{1}{m}X^{T}(XW - y) + \lambda W$$
The Python code is as follows (that is, the gradient of the L2 term is added to the original linear regression gradient):
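class ridge():
    def __init__(self):
        pass
    # Train the parameters by gradient descent: x is the feature data, y the labels,
    # a the learning rate, epochs the number of iterations, Lambda the regularization strength
    def fit(self,x,y,a,epochs,Lambda):
        # Number of samples
        m=x.shape[0]
        # Prepend the bias column to x
        X = np.concatenate((np.ones((m,1)),x),axis=1)
        # Number of parameters (features plus bias)
        n = X.shape[1]
        # Initialize W as an n x 1 matrix (np.mat was removed in NumPy 2.0, so np.asmatrix is used)
        W=np.asmatrix(np.ones((n,1)))
        # Convert X to matrix form
        xMat = np.asmatrix(X)
        # Convert y to matrix form; this step matters, y must have shape m x 1
        yMat =np.asmatrix(y.reshape(-1,1))
        # Iterate epochs times
        for i in range(epochs):
            # Note that the bias W[0] is penalized together with the slope
            gradient = xMat.T*(xMat*W-yMat)/m + Lambda * W
            W=W-a * gradient
        return W
    def predict(self,x,w): # x must include the bias column here too, with the same layout as during training
        return np.dot(x,w)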
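The gradient expression used in `fit` can be checked numerically against finite differences. The sketch below assumes the loss has the form (1/2m)||XW - y||² + (Lambda/2)||W||², which is the form whose gradient matches the line in `fit`; the function names `ridge_loss` and `ridge_grad` and the random data are hypothetical, introduced only for this check.

```python
import numpy as np

def ridge_loss(W, X, y, lam):
    """Assumed loss: (1/2m) * ||XW - y||^2 + (lam/2) * ||W||^2."""
    m = X.shape[0]
    r = X @ W - y
    return (r @ r) / (2 * m) + lam / 2 * (W @ W)

def ridge_grad(W, X, y, lam):
    """Analytic gradient, same expression as in ridge.fit."""
    m = X.shape[0]
    return X.T @ (X @ W - y) / m + lam * W

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # bias column plus one feature
y = rng.normal(size=50)
W = np.array([1.0, 1.0])
lam = 0.5

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(W)
for j in range(W.size):
    step = np.zeros_like(W)
    step[j] = eps
    numeric[j] = (ridge_loss(W + step, X, y, lam)
                  - ridge_loss(W - step, X, y, lam)) / (2 * eps)

analytic = ridge_grad(W, X, y, lam)
max_diff = float(np.max(np.abs(numeric - analytic)))
print("largest difference between numeric and analytic gradient:", max_diff)
```

If the two gradients agree to within rounding error, the update rule in `fit` is descending the loss it is supposed to descend.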
The ridge class above implements our Ridge regression. Examples follow (all of the parameters below were chosen after debugging; I confirmed that the model converges, and increasing the number of iterations or changing the learning rate no longer changes the final coefficients):
When Lambda is 0, that is, when no L2 regularization term is added, this is ordinary linear regression and the fitted slope is the same as before, a little over 47.
# Lambda = 0
clf = ridge()
w = clf.fit(x_1,y_1,a = 0.001,epochs = 10000,Lambda=0)
print(w)
# Compute the new fitted values
y_1_pred = x_1 * w[1] + w[0]
ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='samples')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this argument sets the legend font size
plt.show()
When Lambda = 0.5, the slope drops to a little over 31;
# Lambda = 0.5
clf = ridge()
w = clf.fit(x_1,y_1,a = 0.001,epochs = 10000,Lambda=0.5)
print(w)
# Compute the new fitted values
y_1_pred = x_1 * w[1] + w[0]
ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='samples')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this argument sets the legend font size
plt.show()
When Lambda = 1.5, the slope drops to a little over 18, which is essentially the same as the slope fitted without the outliers;
# Lambda = 1.5
clf = ridge()
w = clf.fit(x_1,y_1,a = 0.001,epochs = 10000,Lambda=1.5)
print(w)
# Compute the new fitted values
y_1_pred = x_1 * w[1] + w[0]
ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='samples')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this argument sets the legend font size
plt.show()
When Lambda = 20, the slope is only a little over 2 and the fitted line is almost horizontal. The model is now severely under-fitted: the loss stays very large and the line no longer fits the data at all;
# Lambda = 20
clf = ridge()
w = clf.fit(x_1,y_1,a = 0.001,epochs = 10000,Lambda=20)
print(w)
# Compute the new fitted values
y_1_pred = x_1 * w[1] + w[0]
ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='samples')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this argument sets the legend font size
plt.show()
It can be seen that a suitable L2 regularization parameter can prevent over-fitting;
as Lambda grows larger and larger, the model parameters become smaller and smaller and slowly approach zero.
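The shrinking of the slope as Lambda grows can be seen in one sweep. The following is a minimal sketch with plain NumPy arrays instead of `np.matrix`; the function name `ridge_gd` and the synthetic data are assumptions for illustration, so the exact slopes will not match the values reported above, only the trend.

```python
import numpy as np

def ridge_gd(x, y, a, epochs, lam):
    """Gradient descent on the L2-regularized loss; returns [bias, slope]."""
    m = x.shape[0]
    X = np.column_stack([np.ones(m), x])  # prepend the bias column
    W = np.ones(2)                        # same all-ones start as the article
    for _ in range(epochs):
        W = W - a * (X.T @ (X @ W - y) / m + lam * W)
    return W

# Synthetic stand-in data plus the article's five outliers
rng = np.random.default_rng(0)
x_clean = rng.normal(size=100)
y_clean = 20.0 * x_clean + rng.normal(scale=5.0, size=100)
x_all = np.concatenate([x_clean, np.linspace(1, 2, 5)])
y_all = np.concatenate([y_clean, [350.0, 380.0, 410.0, 430.0, 480.0]])

lambdas = [0.0, 0.5, 1.5, 20.0]
slopes = [ridge_gd(x_all, y_all, a=0.001, epochs=10000, lam=lam)[1]
          for lam in lambdas]
for lam, s in zip(lambdas, slopes):
    print(f"Lambda={lam:>4}: slope={s:.3f}")
```

The slope decreases with every increase of Lambda and stays positive, which is the "smaller and smaller, approaching zero" behaviour described above.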
Method 2: Solving Ridge regression with the normal equation
Next, we implement Ridge regression with the normal equation. Setting the gradient X^T(XW - y)/m + Lambda * W used in Method 1 to zero and solving for W gives:
$$W = (X^{T}X + m\lambda I)^{-1}X^{T}y$$
The Python code is as follows:
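class standard_ridge():
    def __init__(self):
        pass
    def fit(self,x,y,Lambda):
        m = x.shape[0]
        X = np.concatenate((np.ones((m,1)),x),axis=1)
        # np.asmatrix is used because the np.mat alias was removed in NumPy 2.0
        xMat= np.asmatrix(X)
        yMat = np.asmatrix(y.reshape(-1,1))
        xTx = xMat.T * xMat
        # Build an identity matrix; two matrices can only be added if their shapes match
        # The gradient descent code above kept the 1/m factor, so we keep m here as well: the regularization term ends up multiplied by m, which makes no essential difference
        rxTx = xTx + np.eye(xMat.shape[1]) * Lambda * m
        # rxTx.I is the inverse of rxTx
        w = rxTx.I * xMat.T * yMat
        return w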
The results of running it are essentially the same as those of gradient descent, but this form is more concise and convenient.
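One way to see why the factor m appears in the closed form is to check that the Method 1 gradient vanishes at the closed-form solution. The sketch below is an illustration under assumptions: the helper name `ridge_normal_equation` and the synthetic data are hypothetical, and `np.linalg.solve` is swapped in for the explicit matrix inverse used in the class above.

```python
import numpy as np

def ridge_normal_equation(x, y, lam):
    """Solve (X^T X + m * lam * I) W = X^T y; returns the design matrix and W."""
    m = x.shape[0]
    X = np.column_stack([np.ones(m), x])  # prepend the bias column
    A = X.T @ X + m * lam * np.eye(X.shape[1])
    return X, np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(2)
x = rng.normal(size=105)
y = 20.0 * x + rng.normal(scale=5.0, size=105)
lam = 0.5

X, W = ridge_normal_equation(x, y, lam)

# Gradient of the Method 1 loss evaluated at the closed-form solution
grad = X.T @ (X @ W - y) / x.shape[0] + lam * W
grad_norm = float(np.linalg.norm(grad))
print("parameters [bias, slope]:", W)
print("gradient norm at the solution:", grad_norm)
```

A gradient norm at the level of rounding error confirms that both methods target the same minimizer, so gradient descent can only differ from the closed form by how far it has converged.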
Comparison with sklearn
from sklearn.linear_model import Ridge
lr=Ridge(alpha=0)
lr.fit(x_1,y_1)
print('alpha=0:',lr.coef_,'\n')
lr=Ridge(alpha=40)
lr.fit(x_1,y_1)
print('alpha=40:',lr.coef_,'\n')
lr=Ridge(alpha=150)
lr.fit(x_1,y_1)
print('alpha=150:',lr.coef_,'\n')
lr=Ridge(alpha=2000)
lr.fit(x_1,y_1)
print('alpha=2000:',lr.coef_)
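The alpha values passed to sklearn are on a different scale from the Lambda values of the handwritten versions. sklearn's Ridge minimizes ||y - Xw||² + alpha * ||w||² without the 1/m factor and, with the default fit_intercept=True, does not penalize the intercept, whereas the handwritten code divides the data term by m and penalizes the bias as well. The sketch below illustrates this with NumPy only; the two helper functions and the synthetic data are assumptions for the example, and `ridge_free_intercept` is a hand-derived one-feature formula, not a call into sklearn.

```python
import numpy as np

def ridge_penalized_bias(x, y, lam):
    """Article-style closed form: bias column penalized, penalty scaled by m."""
    m = x.shape[0]
    X = np.column_stack([np.ones(m), x])
    return np.linalg.solve(X.T @ X + m * lam * np.eye(2), X.T @ y)

def ridge_free_intercept(x, y, alpha):
    """One-feature ridge with an unpenalized intercept (sklearn-style objective)."""
    xc, yc = x - x.mean(), y - y.mean()
    slope = (xc @ yc) / (xc @ xc + alpha)
    intercept = y.mean() - slope * x.mean()
    return np.array([intercept, slope])

rng = np.random.default_rng(3)
x = rng.normal(loc=0.5, size=105)
y = 20.0 * x + 10.0 + rng.normal(scale=5.0, size=105)

lam = 0.5
alpha = lam * x.shape[0]  # 52.5 for 105 samples

# On raw data the two conventions disagree because of the penalized bias
w_article = ridge_penalized_bias(x, y, lam)
w_sklearn_style = ridge_free_intercept(x, y, alpha)
print("penalized bias:     ", w_article)
print("unpenalized bias:   ", w_sklearn_style)

# On centered data the bias is zero, so both conventions coincide when alpha = m * Lambda
xc, yc = x - x.mean(), y - y.mean()
w_article_centered = ridge_penalized_bias(xc, yc, lam)
w_style_centered = ridge_free_intercept(xc, yc, alpha)
print("centered, both forms:", w_article_centered, w_style_centered)
```

Under these assumptions, Lambda = 0.5, 1.5 and 20 on 105 samples correspond to alpha = 52.5, 157.5 and 2100, so the sklearn coefficients above should be close to, but not identical with, the handwritten ones.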
Ridge behaviour shown with sklearn:
1. As alpha increases, that is, as the coefficient of the regularization term increases, the model coefficients get closer and closer to 0 but never become exactly 0.
# Demonstrate with the Boston house-price regression data set
# Note: load_boston was removed in scikit-learn 1.2, so this block needs an older version
data = sklearn.datasets.load_boston()
x =data['data']
y= data['target']
lr=Ridge(alpha=0)
lr.fit(x,y)
print('alpha=0:',lr.coef_,'\n')
lr=Ridge(alpha=10)
lr.fit(x,y)
print('alpha=10:',lr.coef_,'\n')
lr=Ridge(alpha=100)
lr.fit(x,y)
print('alpha=100:',lr.coef_,'\n')
lr=Ridge(alpha=1000)
lr.fit(x,y)
print('alpha=1000:',lr.coef_)
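The same behaviour with several features can be reproduced without any external data set. This is a minimal sketch on synthetic data: the five "true" coefficients, the seed, and the helper name `ridge_closed_form` are assumptions, and the intercept is left out for brevity because the features and noise are generated with zero mean.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimizer of ||y - Xw||^2 + alpha * ||w||^2 (no intercept)."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ y)

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
true_w = np.array([3.0, -2.0, 0.5, 1.5, -4.0])  # hypothetical coefficients
y = X @ true_w + rng.normal(scale=1.0, size=200)

alphas = [0.0, 10.0, 100.0, 1000.0]
coefs = [ridge_closed_form(X, y, a) for a in alphas]
norms = [float(np.linalg.norm(w)) for w in coefs]

for a, w, nrm in zip(alphas, coefs, norms):
    print(f"alpha={a:>6}: coef={np.round(w, 3)}  norm={nrm:.3f}")
```

The norm of the coefficient vector shrinks at every step while each individual coefficient stays non-zero, which is the contrast with Lasso that point 1 describes.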
Summary: This concludes the linear regression series. Because many concepts were introduced here for the first time, the articles were written in detail and supported by data examples so that readers can follow them and reproduce these basic concepts by hand. With these foundations clear, it will be easier to explain more complex algorithms later.
Next, we will introduce logistic regression.