Handwritten algorithm: Python code to implement Ridge regression (L2 regular term)

Introduction to Ridge

In the previous two articles, we introduced overfitting and regularization and covered the principles and characteristics of L1 and L2 regularization in detail;
Link: Principle Analysis - Overfitting and Regularization

and we implemented Lasso regression in Python;
Link: Handwritten algorithm - Python code to implement Lasso regression

Today, building on that, we will talk about Ridge regression, which is relatively simple.
This article implements Ridge regression (linear regression with an L2 regular term) in Python and uses examples to illustrate the principle.

Ridge regression: analysis and Python implementation

We reuse the data set generated in the previous article:

import numpy as np
from matplotlib import pyplot as plt
import sklearn.datasets

# generate a univariate regression data set with 100 samples
x,y = sklearn.datasets.make_regression(n_features=1,noise=5,random_state=2020)
plt.scatter(x,y)
plt.show()

# add 5 outliers; to see why these particular values, look at the generated x and y
a = np.linspace(1,2,5).reshape(-1,1)
b = np.array([350,380,410,430,480])

# build the new data set with the outliers appended
x_1 = np.r_[x,a]
y_1 = np.r_[y,b]

plt.scatter(x_1,y_1)
plt.show()

[Figure: scatter plot of the original data set]
[Figure: scatter plot after adding the 5 outliers]
The figures above show the normal data set and the data set with the 5 outliers added. If we fit it directly with linear regression:

class normal():
    def __init__(self):
        pass

    def fit(self,x,y):
        # number of samples
        m=x.shape[0]
        # prepend a bias column of ones to x
        X = np.concatenate((np.ones((m,1)),x),axis=1)
        xMat=np.mat(X)
        yMat =np.mat(y.reshape(-1,1))

        xTx=xMat.T*xMat
        # xTx.I is the inverse of xTx; this is the normal-equation solution
        ws=xTx.I*xMat.T*yMat

        # return the fitted parameters
        return ws
         


plt.rcParams['font.sans-serif']=['SimHei'] # font setting so CJK labels display correctly
plt.rcParams['axes.unicode_minus']=False # display minus signs correctly
clf1 =normal()
# fit the original data
w1 = clf1.fit(x,y)
# predicted values
y_pred = x * w1[1] + w1[0]

# fit the new data (with outliers)
w2 = clf1.fit(x_1,y_1)
# predicted values
y_1_pred = x_1 * w2[1] + w2[0]

print('parameters fitted on the original samples:\n',w1)
print('\n')
print('parameters fitted on the new samples:\n',w2)

ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='sample distribution')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this sets the legend font size
plt.show()

[Figure: regression lines fitted on the original data and on the data with outliers]
Because of the few outlier points, the parameters of the newly fitted regression line become much larger: the slope rises from about 19 to about 47. The line deviates from the distribution of the actual data, and the model's performance degrades.

We now add an L2 regular term to regularize the model. The loss function with the L2 penalty is:

J(W) = \frac{1}{2m}\,(XW - y)^{T}(XW - y) + \frac{\lambda}{2}\,W^{T}W
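
As a reference, here is a minimal sketch of this loss written out in code (my own helper function, not part of the original post; it uses the same 1/(2m) and Lambda/2 scaling as the gradient below):

# Sketch: the regularized loss above, written out explicitly.
# X is the feature matrix with a bias column, y has shape (m, 1), W has shape (n, 1).
def ridge_loss(X, y, W, Lambda):
    m = X.shape[0]
    residual = X @ W - y
    data_term = (residual.T @ residual)[0, 0] / (2 * m)
    reg_term = Lambda / 2 * (W.T @ W)[0, 0]
    return data_term + reg_term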

Method 1: solving for the Ridge regression parameters with gradient descent

In the previous article we already derived the gradient of linear regression and the gradient of the L2 regular term; the gradient here is simply the sum of the two. Let's write it out anyway:

\nabla_{W} J(W) = \frac{1}{m} X^{T}(XW - y) + \lambda W
The Python code is as follows (i.e., we add the L2 gradient to the original linear regression gradient):

class ridge():
    def __init__(self):
        pass

    # train the parameters by gradient descent: x is the feature data, y the labels,
    # a the learning rate, epochs the number of iterations, Lambda the regularization parameter
    def fit(self,x,y,a,epochs,Lambda):
        # number of samples
        m=x.shape[0]
        # prepend a bias column of ones to x
        X = np.concatenate((np.ones((m,1)),x),axis=1)
        # total number of features (including the bias)
        n = X.shape[1]
        # initialize W; it must be in matrix form
        W=np.mat(np.ones((n,1)))
        # convert X to matrix form
        xMat = np.mat(X)
        # convert y to matrix form; this step matters, y must have shape m x 1
        yMat =np.mat(y.reshape(-1,1))
        # iterate epochs times
        for i in range(epochs):
            gradient = xMat.T*(xMat*W-yMat)/m + Lambda * W
            W=W-a * gradient
        return W
    def predict(self,x,w):  # x must include the bias column: predict with the same feature layout used in training
        return np.dot(x,w)
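
Note that predict expects x to already include the bias column, just like fit. A minimal usage sketch (variable names are my own; clf and w come from a fit call like the ones below):

# Sketch: predicting with fitted parameters; add the bias column exactly as in fit()
X_bias = np.concatenate((np.ones((x_1.shape[0], 1)), x_1), axis=1)
y_hat = clf.predict(X_bias, w)   # clf = ridge(), w = clf.fit(...)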

The ridge() class implements our Ridge regression. Here are some examples (the parameters below were chosen after some tuning; I verified that the model converges, and further increasing the number of iterations or changing the learning rate does not change the final coefficients):

When Lambda is 0, i.e. when no L2 regular term is added, this is ordinary linear regression, and the output parameters are the same as before: a slope of about 47.

# Lambda = 0
clf = ridge()
w = clf.fit(x_1,y_1,a = 0.001,epochs = 10000,Lambda=0)
print(w)

# compute the new fitted values
y_1_pred = x_1 * w[1] + w[0]

ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='sample distribution')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this sets the legend font size
plt.show()

[Figure: fit with Lambda = 0, identical to plain linear regression]

When Lambda = 0.5, the slope drops to about 31;

# Lambda = 0.5
clf = ridge()
w = clf.fit(x_1,y_1,a = 0.001,epochs = 10000,Lambda=0.5)
print(w)

# compute the new fitted values
y_1_pred = x_1 * w[1] + w[0]

ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='sample distribution')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this sets the legend font size
plt.show()

[Figure: fit with Lambda = 0.5]

When Lambda = 1.5, the slope drops to about 18, essentially the same as the parameter fitted without the outliers;

# Lambda = 1.5
clf = ridge()
w = clf.fit(x_1,y_1,a = 0.001,epochs = 10000,Lambda=1.5)
print(w)

# compute the new fitted values
y_1_pred = x_1 * w[1] + w[0]

ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='sample distribution')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this sets the legend font size
plt.show()

[Figure: fit with Lambda = 1.5]

When Lambda = 20, the slope is only about 2 and the fitted line is nearly horizontal. The model is now severely under-fitting: the loss is very large and the fit no longer captures the data at all;

# Lambda = 20
clf = ridge()
w = clf.fit(x_1,y_1,a = 0.001,epochs = 10000,Lambda=20)
print(w)

# compute the new fitted values
y_1_pred = x_1 * w[1] + w[0]

ax1= plt.subplot()
ax1.scatter(x_1,y_1,label='sample distribution')
ax1.plot(x,y_pred,c='y',label='fit on original samples')
ax1.plot(x_1,y_1_pred,c='r',label='fit on new samples')
ax1.legend(prop = {'size':15}) # this sets the legend font size
plt.show()

[Figure: fit with Lambda = 20, an almost horizontal line]
We can see that an appropriately chosen L2 regularization parameter prevents over-fitting;
as Lambda grows larger and larger, the model parameters become smaller and smaller, slowly approaching zero.
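
To see the shrinkage directly, here is a small sweep (my own addition, reusing the ridge class and the x_1, y_1 data above) that prints the fitted slope for several Lambda values:

# Sketch: as Lambda grows, the fitted slope shrinks toward zero
for lam in [0, 0.5, 1.5, 5, 20]:
    w = ridge().fit(x_1, y_1, a=0.001, epochs=10000, Lambda=lam)
    print('Lambda =', lam, '-> intercept =', round(w[0, 0], 2), ', slope =', round(w[1, 0], 2))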

Method 2: solving Ridge regression with the normal equation

Next, we solve Ridge regression with the normal (closed-form) equation. The derivation gives:

W = (X^{T}X + m\lambda I)^{-1} X^{T} y

The Python code is implemented as follows:

class standard_ridge():
    def __init__(self):
        pass

    def fit(self,x,y,Lambda):
        m = x.shape[0]
        # prepend a bias column of ones to x
        X = np.concatenate((np.ones((m,1)),x),axis=1)
        xMat= np.mat(X)
        yMat = np.mat(y.reshape(-1,1))

        xTx = xMat.T * xMat
        # build an identity matrix; two matrices must have the same shape to be added
        # in the gradient-descent code above we kept the 1/m factor, so we keep it here as well:
        # the regular term ends up multiplied by m, which has no essential effect
        rxTx = xTx + np.eye(xMat.shape[1]) * Lambda * m

        # rxTx.I is the inverse of rxTx
        w = rxTx.I * xMat.T * yMat

        return w
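
A minimal usage sketch (variable names are my own; the data and Lambda = 1.5 follow the earlier example):

# Sketch: fit the closed-form solver and compare with the gradient-descent result above
clf2 = standard_ridge()
w_closed = clf2.fit(x_1, y_1, Lambda=1.5)
print('closed-form parameters:\n', w_closed)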

The output is as follows:
[Figure: parameters from the closed-form solver]
The result is basically the same as with gradient descent, but this form is more concise and convenient.

Calling sklearn for comparison

sklearn's Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2, without the 1/(2m) scaling we used above, so its alpha roughly corresponds to m * Lambda in our code; with about 105 samples here, alpha = 40, 150 and 2000 play roughly the same role as Lambda = 0.5, 1.5 and 20.

from sklearn.linear_model import Ridge
lr=Ridge(alpha=0)
lr.fit(x_1,y_1)
print('alpha=0:',lr.coef_,'\n')

lr=Ridge(alpha=40)
lr.fit(x_1,y_1)
print('alpha=40:',lr.coef_,'\n')

lr=Ridge(alpha=150)
lr.fit(x_1,y_1)
print('alpha=150:',lr.coef_,'\n')

lr=Ridge(alpha=2000)
lr.fit(x_1,y_1)
print('alpha=2000:',lr.coef_)

[Figure: sklearn Ridge coefficients for alpha = 0, 40, 150, 2000]

What sklearn's Ridge shows:

1. As alpha increases, i.e. as the weight of the regular term grows, the coefficients get closer and closer to 0, but they never become exactly 0 (unlike Lasso).

# demonstrate with the Boston house-price regression data set
data =  sklearn.datasets.load_boston()
x =data['data']
y= data['target']

lr=Ridge(alpha=0)
lr.fit(x,y)
print('alpha=0:',lr.coef_,'\n')

lr=Ridge(alpha=10)
lr.fit(x,y)
print('alpha=10:',lr.coef_,'\n')

lr=Ridge(alpha=100)
lr.fit(x,y)
print('alpha=100:',lr.coef_,'\n')

lr=Ridge(alpha=1000)
lr.fit(x,y)
print('alpha=1000:',lr.coef_)

[Figure: sklearn Ridge coefficients on the Boston data for alpha = 0, 10, 100, 1000]
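
Note: load_boston was removed in scikit-learn 1.2, so the block above will not run on recent versions. A near-equivalent sketch (my substitution, not part of the original post) uses the California housing data instead:

# Sketch: the same alpha sweep on a data set still shipped with scikit-learn
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge

data = fetch_california_housing()
x, y = data['data'], data['target']

for alpha in [0, 10, 100, 1000]:
    lr = Ridge(alpha=alpha)
    lr.fit(x, y)
    print('alpha =', alpha, ':', lr.coef_, '\n')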
Summary: this concludes our linear regression series. Because many concepts appeared here for the first time, the articles were written in great detail and supplemented with data examples, so that readers can both understand and manually reproduce these basics; having them clear also makes it easier to explain more complex algorithms later.

Next, we will introduce logistic regression.

Origin blog.csdn.net/weixin_44700798/article/details/110738525