Gradient descent in Python: code implementations (minimizing the loss function of multiple linear regression)

1. Gradient descent is a common optimization method, used mainly to minimize a loss function. The two variants covered here are batch gradient descent, which computes the gradient over all training samples and steps in the direction of steepest descent (the negative gradient), and stochastic gradient descent, which estimates the gradient from one randomly chosen sample per step and approaches a minimum through repeated iteration:

(1) Batch gradient descent

(2) Stochastic gradient descent (the learning rate eta is no longer a fixed value: it decreases steadily as the number of iterations grows, borrowing the idea of simulated annealing)
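The decaying learning rate mentioned in (2) can be sketched as below. The constants t0 = 5 and t1 = 50 and the name learning_rate are taken from the SGD code later in this post; exposing them as default arguments is an assumption made here for illustration.

```python
def learning_rate(t, t0=5, t1=50):
    """Step size for iteration t: it shrinks as t grows (annealing-style schedule)."""
    return t0 / (t + t1)

# The step size starts at t0 / t1 and decays toward zero as t increases.
rates = [learning_rate(t) for t in (0, 50, 950)]
print(rates)
```

A larger t1 makes the early steps smaller, while a larger t0 scales every step up.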

 

 

2. Mathematical principle of the gradient descent calculation for multiple linear regression:
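As a sketch of the quantities that the code in section 3 computes, the loss, its gradient, and the update step can be written as follows. This is reconstructed from the J1, DJ1 and gradient_descent1 functions there; the symbol m for the number of samples and the notation X_b for the data matrix with a leading column of ones are assumptions.

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - X_b^{(i)} \theta \right)^2

\nabla J(\theta) = \frac{2}{m} X_b^{T} \left( X_b \theta - y \right)

\theta \leftarrow \theta - \eta \, \nabla J(\theta)
```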

 

 

3. Python implementations of the two methods:
(1) Batch gradient descent:
# multiple linear regression: use gradient descent to minimize the loss function
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(666)
x = np.random.random(size=100)
y = x * 3.0 + 4 + np.random.normal(size=100)
X = x.reshape(-1, 1)
print(X)
print(x.shape)
print(y.shape)
plt.scatter(x, y)
plt.show()
print(X)
print(len(X))

# 1. Train with gradient descent
def J1(theta, x_b, y):
    # loss: mean squared error
    return np.sum((y - x_b.dot(theta)) ** 2) / len(x_b)
def DJ2(theta, x_b, y):
    # gradient computed one component at a time (loop version)
    res = np.empty(len(theta))
    res[0] = np.sum(x_b.dot(theta) - y)
    for i in range(1, len(theta)):
        res[i] = (x_b.dot(theta) - y).dot(x_b[:, i])
    return res * 2 / len(x_b)
def DJ1(theta, x_b, y):
    # vectorized gradient, equivalent to DJ2
    return x_b.T.dot(x_b.dot(theta) - y) * 2 / len(y)
def gradient_descent1(x_b, y, eta, theta_initial, erro=1e-8, n=1e4):
    theta = theta_initial
    i = 0
    while i < n:
        gradient = DJ1(theta, x_b, y)
        last_theta = theta
        theta = theta - gradient * eta
        # stop once the loss changes by less than the tolerance erro
        if abs(J1(theta, x_b, y) - J1(last_theta, x_b, y)) < erro:
            break
        i += 1
    return theta
x_b = np.hstack([np.ones((len(X), 1)), X])  # prepend a column of ones for the intercept
print(x_b)
theta0 = np.zeros(x_b.shape[1])
eta = 0.1
theta1 = gradient_descent1(x_b, y, eta, theta0)
print(theta1)

from sklearn.linear_model import LinearRegression
l=LinearRegression()
l.fit(X,y)
print(l.coef_)
print(l.intercept_)
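Besides comparing against scikit-learn, one way to gain confidence in the vectorized gradient is to check it against a finite-difference approximation of the loss. The sketch below is self-contained and repeats J1 and DJ1 from above; the helper numerical_gradient, the step size h = 1e-6, and the test point theta = (1, 2) are assumptions for illustration, not part of the original code.

```python
import numpy as np

def J1(theta, x_b, y):
    # loss: mean squared error
    return np.sum((y - x_b.dot(theta)) ** 2) / len(x_b)

def DJ1(theta, x_b, y):
    # vectorized gradient of J1
    return x_b.T.dot(x_b.dot(theta) - y) * 2 / len(y)

def numerical_gradient(theta, x_b, y, h=1e-6):
    """Central-difference approximation of the gradient of J1 (hypothetical helper)."""
    grad = np.empty(len(theta))
    for i in range(len(theta)):
        step = np.zeros(len(theta))
        step[i] = h
        grad[i] = (J1(theta + step, x_b, y) - J1(theta - step, x_b, y)) / (2 * h)
    return grad

# Same kind of synthetic data as in the post: y = 3x + 4 plus Gaussian noise.
np.random.seed(666)
x = np.random.random(size=100)
y = x * 3.0 + 4 + np.random.normal(size=100)
x_b = np.hstack([np.ones((100, 1)), x.reshape(-1, 1)])
theta = np.array([1.0, 2.0])

analytic = DJ1(theta, x_b, y)
numeric = numerical_gradient(theta, x_b, y)
print(analytic, numeric)
print(np.allclose(analytic, numeric, atol=1e-5))
```

If the two gradients disagree, the usual suspects are a wrong sign or a missing factor of 2/m in the analytic version.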

# 2. Stochastic gradient descent: function code (multiple linear regression as the example)
# 1-1 Write the loss function
def J_SGD(theta, x_b, y):
    return np.sum((y - x_b.dot(theta)) ** 2) / len(x_b)
# 1-2 Write the gradient for a single sample
def DJ_SGD(theta, x_b_i, y_i):
    return x_b_i.T.dot(x_b_i.dot(theta) - y_i) * 2
# 1-3 Write the SGD function
def SGD(x_b, y, theta_initial, n):
    t0 = 5
    t1 = 50
    def learning_rate(t):
        return t0 / (t + t1)  # learning rate eta; it must keep decreasing as the iteration count grows
    theta = theta_initial  # starting point (column vector)
    for i1 in range(n):  # iterate n times
        rand_i = np.random.randint(len(x_b))  # random index used to compute the stochastic gradient
        gradient = DJ_SGD(theta, x_b[rand_i], y[rand_i])
        theta = theta - gradient * learning_rate(i1)
    return theta
np.random.seed(666)
x=np.random.random(size=100)
y=x*3.0+4+np.random.normal(size=100)
X=x.reshape(-1,1)
print(X)
print(x.shape)
print(y.shape)
plt.scatter(x,y)
plt.show()
print(X)
print(len(X))
# 1-4 Initialize the data x, y and define the initial value theta0 and the iteration count n
x_b=np.hstack([np.ones((len(X),1)),X])
print(x_b)
theta0=np.zeros(x_b.shape[1])
theta1=SGD(x_b,y,theta0,100000)
print(theta1)
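Because each SGD step uses only one random sample, the result is noisy and only approximates the least-squares solution. A rough way to see how close it gets is to compare it with a direct least-squares solve. The sketch below is self-contained; the use of np.linalg.lstsq as the reference, the shorter run of 20000 iterations, and the t0/t1 default arguments are assumptions made here, not part of the original code.

```python
import numpy as np

def DJ_SGD(theta, x_b_i, y_i):
    # gradient of the squared error on a single sample
    return x_b_i * (x_b_i.dot(theta) - y_i) * 2

def SGD(x_b, y, theta_initial, n, t0=5, t1=50):
    theta = theta_initial
    for t in range(n):
        i = np.random.randint(len(x_b))  # pick one sample at random
        theta = theta - DJ_SGD(theta, x_b[i], y[i]) * (t0 / (t + t1))
    return theta

np.random.seed(666)
x = np.random.random(size=100)
y = x * 3.0 + 4 + np.random.normal(size=100)
x_b = np.hstack([np.ones((100, 1)), x.reshape(-1, 1)])

theta_sgd = SGD(x_b, y, np.zeros(2), 20000)
# Reference: least-squares solution computed directly.
theta_ls, *_ = np.linalg.lstsq(x_b, y, rcond=None)
print(theta_sgd)
print(theta_ls)
```

The two vectors should be close but not identical; running SGD for more iterations generally narrows the gap.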

 


Origin www.cnblogs.com/Yanjy-OnlyOne/p/11311747.html