Machine Learning---Gradient Descent Code

1. Normalization

import pandas as pd
import matplotlib.pyplot as plt

# Read data from csv
pga = pd.read_csv("pga.csv")
print(type(pga))

print(pga.head())

# Normalize the data: (x - mean) / std
pga.distance = (pga.distance - pga.distance.mean()) / pga.distance.std()
pga.accuracy = (pga.accuracy - pga.accuracy.mean()) / pga.accuracy.std()
print(pga.head())

plt.scatter(pga.distance, pga.accuracy)
plt.xlabel('normalized distance')
plt.ylabel('normalized accuracy')
plt.show()

2. Linear regression 

from sklearn.linear_model import LinearRegression
import numpy as np

# We can add a dimension to an array by using np.newaxis
print("Shape of the series:", pga.distance.shape)
print("Shape with newaxis:", pga.distance[:, np.newaxis].shape)

# The X variable in LinearRegression.fit() must have 2 dimensions
lm = LinearRegression()
lm.fit(pga.distance[:, np.newaxis], pga.accuracy)
theta1 = lm.coef_[0]
print(theta1)

       This code is an example of using np.newaxis and LinearRegression to perform linear regression.

       First, np.newaxis adds a new dimension to the one-dimensional pga.distance array, turning it into a two-dimensional array. Printing the shapes shows that before adding np.newaxis, pga.distance is a one-dimensional array with shape (n,); after adding np.newaxis, the shape becomes (n, 1).

       Then a LinearRegression instance lm is created. The lm.fit() method trains the linear regression model, taking the reshaped feature data pga.distance[:, np.newaxis] and the target data pga.accuracy as arguments.

       Finally, lm.coef_ gives the coefficients (weights) of the trained model, and the coefficient of the first feature is assigned to the variable theta1. Here pga.distance and pga.accuracy are sample data; replace them with your own data as appropriate.
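       As a minimal sketch (not part of the original code, and assuming the pga DataFrame from section 1 is loaded and normalized), the same (n, 1) shape can also be obtained with reshape(-1, 1), and the fitted model can be used to predict accuracy for new normalized distances:

# Sketch -- assumes `pga` from section 1 is loaded and normalized
import numpy as np
from sklearn.linear_model import LinearRegression

# reshape(-1, 1) is an equivalent way to get the (n, 1) shape that fit() expects
X = pga.distance.values.reshape(-1, 1)
lm2 = LinearRegression()
lm2.fit(X, pga.accuracy)

# Predict accuracy for a few normalized distance values
new_distances = np.array([[-1.0], [0.0], [1.0]])
print(lm2.predict(new_distances))      # predicted normalized accuracy
print(lm2.intercept_, lm2.coef_[0])    # intercept (theta0) and slope (theta1)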

3. Cost function

# The cost function of a single-variable linear model
def cost(theta0, theta1, x, y):
    # Initialize cost
    J = 0
    # The number of observations
    m = len(x)
    # Loop through each observation
    for i in range(m):
        # Compute the hypothesis
        h = theta1 * x[i] + theta0
        # Add the squared error to the cost
        J += (h - y[i])**2
    # Average and normalize cost
    J /= (2*m)
    return J

# The cost for theta0=0 and theta1=1
print(cost(0, 1, pga.distance, pga.accuracy))

theta0 = 100
theta1s = np.linspace(-3,2,100)
costs = []
for theta1 in theta1s:
    costs.append(cost(theta0, theta1, pga.distance, pga.accuracy))

plt.plot(theta1s, costs)
plt.show()

       This implements the cost function of a simple univariate linear regression model and computes the cost for a given set of parameters theta0 and theta1. In this code, the cost() function accepts four parameters: theta0 and theta1 are the parameters of the linear model, x is the input feature, and y is the target variable. The goal of the function is to calculate the cost of the model.

       First, the cost J is initialized to 0. Then, looping through each observation, the model's predicted value h is computed, and the squared error of each observation is accumulated into J. Finally, J is divided by twice the number of observations to average and normalize the cost.

       In the second half of this code, a fixed theta0 value and a range of theta1 values are used to compute the cost corresponding to each theta1, and the results are stored in the list costs. Then plt.plot() is used with theta1s and costs to show how the cost function changes as theta1 changes.
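       For reference, here is a vectorized version of the same cost (a sketch, not part of the original; it assumes x and y are NumPy arrays or pandas Series of equal length) that avoids the Python loop:

import numpy as np

def cost_vectorized(theta0, theta1, x, y):
    # Hypothesis for all observations at once
    h = theta0 + theta1 * x
    # Mean squared error divided by 2, matching the looped cost() above
    return np.sum((h - y) ** 2) / (2 * len(x))

# Should print (approximately) the same value as cost(0, 1, ...)
print(cost_vectorized(0, 1, pga.distance, pga.accuracy))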

4. Draw three-dimensional diagrams

import numpy as np
from mpl_toolkits.mplot3d import Axes3D

# Example of a surface plot using Matplotlib
# Create x and y variables
x = np.linspace(-10,10,100)
y = np.linspace(-10,10,100)

# We must create variables to represent each possible pair of points in x and y
# ie. (-10, -10), (-10, -9.8), ... (0, 0), ..., (10, 9.6), (10, 9.8)
# x and y need to be transformed to 100x100 matrices to represent these coordinates
# np.meshgrid will build coordinate matrices of x and y
X, Y = np.meshgrid(x,y)
#print(X[:5,:5],"\n",Y[:5,:5])

# Compute a 3D paraboloid
Z = X**2 + Y**2

# Open a figure to place the plot on
fig = plt.figure()
# Initialize 3D axes (fig.gca(projection='3d') is removed in recent Matplotlib)
ax = fig.add_subplot(projection='3d')
# Plot the surface
ax.plot_surface(X=X, Y=Y, Z=Z)

plt.show()

# Use these for your exercise
theta0s = np.linspace(-2,2,100)
theta1s = np.linspace(-2,2,100)
COST = np.empty(shape=(100,100))
# Meshgrid for parameters
T0S, T1S = np.meshgrid(theta0s, theta1s)
# For each parameter combination compute the cost
# (index the meshgrids consistently so COST lines up with T0S and T1S)
for i in range(100):
    for j in range(100):
        COST[i,j] = cost(T0S[i,j], T1S[i,j], pga.distance, pga.accuracy)

# Make the 3D plot
fig2 = plt.figure()
ax = fig2.add_subplot(projection='3d')
ax.plot_surface(X=T0S, Y=T1S, Z=COST)
plt.show()

 

       Matplotlib is used to draw three-dimensional graphics: a quadratic surface plot and a plot of the cost function surface.

       First, np.linspace() creates 100 evenly spaced points from -10 to 10, assigned to the variables x and y.

       Next, np.meshgrid() converts x and y into 100x100 grid matrices X and Y, so that each pair of corresponding elements in X and Y represents an (x, y) coordinate pair.

       Then the matrix Z is computed from the quadratic surface equation Z = X**2 + Y**2, where each element of Z is the height of the surface at the corresponding coordinate.

       plt.figure() creates a new figure, and fig.add_subplot(projection='3d') initializes a three-dimensional coordinate system on it. ax.plot_surface() draws the surface plot, with X, Y, and Z as the coordinate matrices.

       Finally, plt.show() displays the figure.

       In the second half of the code, two arrays of 100 evenly spaced values, theta0s and theta1s, are first created to represent the value ranges of theta0 and theta1.

       Next, np.empty() creates an empty 100x100 array COST to store the values of the cost function, and np.meshgrid() converts theta0s and theta1s into the grid matrices T0S and T1S.

       Then two nested loops iterate over all possible parameter combinations, calling the cost() function for each combination and storing the result in the COST array.

       Finally, a new figure is created with plt.figure(), a three-dimensional coordinate system is initialized, and ax.plot_surface() draws the surface plot of the cost function, with T0S, T1S, and COST as the X, Y, and Z matrices. plt.show() displays the figure.
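       If the 3D view is hard to read, a 2D contour plot of the same cost surface can also be drawn. This is a sketch (not part of the original code) that reuses T0S, T1S, and COST computed above:

# Sketch reusing T0S, T1S, and COST from the code above
fig3, ax3 = plt.subplots()
cs = ax3.contour(T0S, T1S, COST, levels=20)
ax3.clabel(cs, inline=True, fontsize=8)
ax3.set_xlabel('theta0')
ax3.set_ylabel('theta1')
ax3.set_title('Cost function contours')
plt.show()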

5. Derivative function

To find the update rule for gradient descent, we need the partial derivatives of the cost function of the linear regression model with respect to its parameters. The derivation is as follows:

The linear regression model assumes that the function is: h(x) = theta0 + theta1 * x

The cost function is the mean squared error function: J(theta0, theta1) = (1/2m) * Σ(h(x) - y)^2

where m is the sample size, h(x) is the predicted value of the model, and y is the observed value.

In order to solve for the optimal model parameters theta0 and theta1, we need to calculate the partial derivatives of the cost function with respect to these two parameters.

First, calculate the partial derivative of the cost function with respect to theta0:

∂J/∂theta0 = (1/m) * Σ(h(x) - y)

Then, calculate the partial derivative of the cost function with respect to theta1:

∂J/∂theta1 = (1/m) * Σ(h(x) - y) * x
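Written out with the chain rule (here in LaTeX, with h(x_i) = theta0 + theta1 * x_i), the factor of 2 produced by differentiating the square cancels the 1/2 in the cost function:

\frac{\partial J}{\partial \theta_1}
  = \frac{1}{2m}\sum_{i=1}^{m} 2\,\bigl(h(x_i) - y_i\bigr)\,\frac{\partial h(x_i)}{\partial \theta_1}
  = \frac{1}{m}\sum_{i=1}^{m} \bigl(h(x_i) - y_i\bigr)\,x_i,
\qquad
\frac{\partial J}{\partial \theta_0}
  = \frac{1}{m}\sum_{i=1}^{m} \bigl(h(x_i) - y_i\bigr).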


# Partial derivative of cost in terms of theta1
def partial_cost_theta1(theta0, theta1, x, y):
    # Hypothesis
    h = theta0 + theta1*x
    # Hypothesis minus observed times x
    diff = (h - y) * x
    # Average to compute partial derivative
    partial = diff.sum() / (x.shape[0])
    return partial

partial1 = partial_cost_theta1(0, 5, pga.distance, pga.accuracy)
print("partial1 =", partial1)

# Partial derivative of cost in terms of theta0
def partial_cost_theta0(theta0, theta1, x, y):
    # Hypothesis
    h = theta0 + theta1*x
    # Difference between hypothesis and observation
    diff = (h - y)
    # Compute partial derivative
    partial = diff.sum() / (x.shape[0])
    return partial

partial0 = partial_cost_theta0(1, 1, pga.distance, pga.accuracy)
print("partial0 =", partial0)

       This computes the partial derivatives of the cost function with respect to the parameters theta1 and theta0.

       First, a function called partial_cost_theta1() is defined. It accepts four parameters: theta0 and theta1 are the parameters of the linear model, x is the input feature, and y is the target variable. The function computes the partial derivative of the cost function with respect to theta1. Inside the function, the hypothesis h is computed first, then (h - y) * x, the difference between the hypothesis and the observed value multiplied by the input feature x. Finally, the sum of these products is divided by the number of observations to obtain the partial derivative with respect to theta1. Calling partial_cost_theta1() with the parameters 0 and 5 computes the corresponding partial derivative partial1.

       Next, a function called partial_cost_theta0() is defined, accepting the same four parameters: theta0 and theta1 are the parameters of the linear model, x is the input feature, and y is the target variable. It computes the partial derivative of the cost function with respect to theta0. Inside the function, the hypothesis h is computed first, then the difference between the hypothesis and the observed value. Finally, the sum of these differences is divided by the number of observations to obtain the partial derivative with respect to theta0. Calling partial_cost_theta0() with the parameters 1 and 1 computes the corresponding partial derivative partial0.
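       A quick numerical sanity check can confirm the analytic derivatives. This is a sketch, not part of the original code; the step size eps and the test point (theta0 = 1, theta1 = 1) are arbitrary choices:

# Sketch: finite-difference check of the analytic partial derivatives
eps = 1e-6
t0, t1 = 1.0, 1.0

numeric0 = (cost(t0 + eps, t1, pga.distance, pga.accuracy) -
            cost(t0 - eps, t1, pga.distance, pga.accuracy)) / (2 * eps)
numeric1 = (cost(t0, t1 + eps, pga.distance, pga.accuracy) -
            cost(t0, t1 - eps, pga.distance, pga.accuracy)) / (2 * eps)

# Each pair should agree closely
print(numeric0, partial_cost_theta0(t0, t1, pga.distance, pga.accuracy))
print(numeric1, partial_cost_theta1(t0, t1, pga.distance, pga.accuracy))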

6. Gradient Descent

# x is our feature vector -- distance
# y is our target variable -- accuracy
# alpha is the learning rate
# theta0 is the initial theta0
# theta1 is the initial theta1
def gradient_descent(x, y, alpha=0.1, theta0=0, theta1=0):
    max_epochs = 1000 # Maximum number of iterations
    counter = 0       # Iteration counter
    c = cost(theta0, theta1, x, y)  # Initial cost
    costs = [c]       # Store the cost of each update
    # Set a convergence threshold to find where the cost function is minimized
    # When the difference between the previous cost and current cost
    #        is less than this value we will say the parameters converged
    convergence_thres = 0.000001
    cprev = c + 10
    theta0s = [theta0]
    theta1s = [theta1]

    # When the costs converge or we hit a large number of iterations we stop updating
    while (np.abs(cprev - c) > convergence_thres) and (counter < max_epochs):
        cprev = c
        # Alpha times the partial derivative is our update (step size)
        update0 = alpha * partial_cost_theta0(theta0, theta1, x, y)
        update1 = alpha * partial_cost_theta1(theta0, theta1, x, y)

        # Update theta0 and theta1 at the same time
        # We want to compute the slopes at the same set of hypothesised parameters
        #             so we update after finding the partial derivatives
        # -= gives gradient descent; += would give gradient ascent
        theta0 -= update0
        theta1 -= update1

        # Store thetas
        theta0s.append(theta0)
        theta1s.append(theta1)

        # Compute the new cost with the updated parameters
        c = cost(theta0, theta1, x, y)

        # Store the updated cost
        costs.append(c)
        counter += 1   # Count

    # Return the final theta0, theta1, and the list of costs
    return {'theta0': theta0, 'theta1': theta1, "costs": costs}

result = gradient_descent(pga.distance, pga.accuracy)
print("Theta0 =", result['theta0'])
print("Theta1 =", result['theta1'])
print("costs =", result['costs'])

descend = gradient_descent(pga.distance, pga.accuracy, alpha=.01)
plt.scatter(range(len(descend["costs"])), descend["costs"])
plt.show()

 

       This is the process of using gradient descent to compute the partial derivatives and update the parameters of the linear regression model. The gradient_descent function accepts the input feature x and the observations y, as well as the learning rate alpha and the initial parameters theta0 and theta1. Inside the function, the maximum number of iterations max_epochs and the convergence threshold convergence_thres are set to control when the algorithm stops. Initially, the cost function value c is computed and stored in the costs list.

       During each iteration, the parameters are updated using the partial derivatives, that is, theta0 -= update0 and theta1 -= update1. The new cost function value c is then computed and appended to the costs list. Finally, the function returns the updated parameter values theta0 and theta1, along with the history of cost values costs.

       Finally, gradient_descent is called and the final parameter values and cost function values are printed. Then the change in the cost function value over the iterations is plotted.
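       As a closing sketch (not part of the original code), the parameters found by gradient descent can be compared with the closed-form fit from section 2 (this assumes lm from that section is still defined) and used to draw the fitted line over the scatter plot:

# Sketch: compare gradient descent with sklearn's fit and plot the fitted line
gd = gradient_descent(pga.distance, pga.accuracy, alpha=0.1)
print("gradient descent: theta0 =", gd['theta0'], "theta1 =", gd['theta1'])
print("sklearn:          theta0 =", lm.intercept_, "theta1 =", lm.coef_[0])

xs = np.linspace(pga.distance.min(), pga.distance.max(), 100)
plt.scatter(pga.distance, pga.accuracy, alpha=0.3)
plt.plot(xs, gd['theta0'] + gd['theta1'] * xs, color='red')
plt.xlabel('normalized distance')
plt.ylabel('normalized accuracy')
plt.show()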


Origin blog.csdn.net/weixin_43961909/article/details/132224771