1. Normalization
import pandas as pd
import matplotlib.pyplot as plt

# Read the data from csv
pga = pd.read_csv("pga.csv")
print(type(pga))
print(pga.head())
# Normalize the data: (x - mean) / std
pga.distance = (pga.distance - pga.distance.mean()) / pga.distance.std()
pga.accuracy = (pga.accuracy - pga.accuracy.mean()) / pga.accuracy.std()
print(pga.head())
plt.scatter(pga.distance, pga.accuracy)
plt.xlabel('normalized distance')
plt.ylabel('normalized accuracy')
plt.show()
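As a quick sanity check (a minimal sketch reusing the pga DataFrame from above), the standardized columns should now have a mean of roughly 0 and a standard deviation of roughly 1:

# Verify the normalization: mean should be ~0 and std ~1 for both columns
print(pga.distance.mean(), pga.distance.std())
print(pga.accuracy.mean(), pga.accuracy.std())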
2. Linear regression
from sklearn.linear_model import LinearRegression
import numpy as np
# We can add a dimension to an array by using np.newaxis
print("Shape of the series:", pga.distance.shape)
print("Shape with newaxis:", pga.distance[:, np.newaxis].shape)
# The X variable in LinearRegression.fit() must have 2 dimensions
lm = LinearRegression()
lm.fit(pga.distance[:, np.newaxis], pga.accuracy)
theta1 = lm.coef_[0]
print(theta1)
This code is an example of using np.newaxis together with LinearRegression to perform linear regression.
First, np.newaxis converts the one-dimensional pga.distance array into a two-dimensional array by adding a new dimension. Printing the shapes shows that before adding np.newaxis, pga.distance is a one-dimensional array with shape (n,), and after adding np.newaxis the shape becomes (n, 1).
Then a LinearRegression instance lm is created, and the lm.fit() method trains the linear regression model with the converted feature data pga.distance[:, np.newaxis] and the target data pga.accuracy as arguments.
Finally, lm.coef_ gives the trained model coefficients (weights), and the coefficient of the first (and only) feature is assigned to the variable theta1. Note that pga.distance and pga.accuracy are sample data; replace them with your own data as appropriate.
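One hedged cross-check, assuming both columns were standardized as in section 1: for standardized data, the least-squares slope equals the Pearson correlation between the two variables, so theta1 can be verified directly:

import numpy as np

# For standardized x and y, the fitted slope equals corr(x, y)
r = np.corrcoef(pga.distance, pga.accuracy)[0, 1]
print("correlation:", r)
print("theta1 from sklearn:", theta1)  # the two values should agree closely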
3. Cost function
# The cost function of a single-variable linear model
def cost(theta0, theta1, x, y):
    # Initialize cost
    J = 0
    # The number of observations
    m = len(x)
    # Loop through each observation
    for i in range(m):
        # Compute the hypothesis
        h = theta1 * x[i] + theta0
        # Add the squared error to the cost
        J += (h - y[i])**2
    # Average and normalize the cost
    J /= (2*m)
    return J
# The cost for theta0=0 and theta1=1
print(cost(0, 1, pga.distance, pga.accuracy))
theta0 = 100
theta1s = np.linspace(-3,2,100)
costs = []
for theta1 in theta1s:
    costs.append(cost(theta0, theta1, pga.distance, pga.accuracy))
plt.plot(theta1s, costs)
plt.show()
This implements the cost function of a simple univariate linear regression model and evaluates the cost for a given set of parameters theta0 and theta1. The cost() function accepts four arguments: theta0 and theta1 are the parameters of the linear model, x is the input feature, and y is the target variable. The goal of the function is to compute the cost of the model.
First, the cost J is initialized to 0. Then, by looping through each observation, the model's prediction h is computed, and the cost J is accumulated by adding the squared error of each observation. Finally, the cost J is divided by twice the number of observations to average and normalize it.
The second half of the code fixes a theta0 value and sweeps a set of theta1 values, computing the cost corresponding to each theta1 and storing the results in the costs list. Then plt.plot() is used with theta1s and costs to show how the cost function changes as theta1 varies.
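The looped version above is easy to follow but slow on large datasets. Below is a minimal vectorized sketch of the same cost function using NumPy broadcasting; it is an alternative formulation, not part of the original tutorial code, and cost_vectorized is a name introduced here for illustration:

import numpy as np

def cost_vectorized(theta0, theta1, x, y):
    # Predictions for all observations at once
    h = theta0 + theta1 * np.asarray(x)
    # Half the mean squared error, matching the looped cost()
    return np.sum((h - np.asarray(y))**2) / (2 * len(x))

print(cost_vectorized(0, 1, pga.distance, pga.accuracy))  # should match cost(0, 1, ...)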
4. Draw three-dimensional diagrams
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
# Example of a Surface Plot using Matplotlib
# Create x and y variables
x = np.linspace(-10,10,100)
y = np.linspace(-10,10,100)
# We must create variables to represent each possible pair of points in x and y
# i.e. (-10, -10), (-10, -9.8), ..., (0, 0), ..., (10, 9.8), (10, 10)
# x and y need to be transformed into 100x100 matrices to represent these coordinates
# np.meshgrid builds the coordinate matrices from x and y
X, Y = np.meshgrid(x,y)
#print(X[:5,:5],"\n",Y[:5,:5])
# Compute a 3D parabola
Z = X**2 + Y**2
# Open a figure to place the plot on
fig = plt.figure()
# Initialize 3D axes
ax = fig.add_subplot(projection='3d')
# Plot the surface
ax.plot_surface(X, Y, Z)
plt.show()
# Use these for your exercise
theta0s = np.linspace(-2,2,100)
theta1s = np.linspace(-2,2, 100)
COST = np.empty(shape=(100,100))
# Meshgrid for parameters
T0S, T1S = np.meshgrid(theta0s, theta1s)
# For each parameter combination, compute the cost
for i in range(100):
    for j in range(100):
        COST[i, j] = cost(T0S[i, j], T1S[i, j], pga.distance, pga.accuracy)
# Make the 3D plot
fig2 = plt.figure()
ax = fig2.add_subplot(projection='3d')
ax.plot_surface(T0S, T1S, COST)
plt.show()
This uses Matplotlib to draw three-dimensional graphics, including a quadratic surface plot and a plot of the cost function.
First, the np.linspace() function creates 100 evenly spaced points from -10 to 10, assigned to the variables x and y. Next, np.meshgrid() converts x and y into 100x100 grid matrices, assigned to X and Y respectively, so that each pair of corresponding elements in X and Y represents one (x, y) coordinate pair. Then Z = X**2 + Y**2 computes a matrix Z from the quadratic surface equation, where each element of Z is the height of the surface at the corresponding coordinate. plt.figure() creates a new figure, a three-dimensional coordinate system is initialized on it, and ax.plot_surface() draws the surface from the X, Y, and Z matrices. Finally, plt.show() displays the figure.
In the second half of the code, two arrays of 100 uniformly spaced values, theta0s and theta1s, are created first; they define the value ranges of theta0 and theta1. Next, np.empty() creates an empty 100x100 array COST to store the computed values of the cost function, and np.meshgrid() converts theta0s and theta1s into the grid matrices T0S and T1S. Then two nested loops iterate over all possible parameter combinations, calling the cost() function to compute the cost corresponding to each combination and storing the result in the COST array. Finally, plt.figure() creates a new figure with a three-dimensional coordinate system, ax.plot_surface() draws the surface plot of the cost function with T0S, T1S, and COST as the X, Y, and Z matrices, and plt.show() displays the figure.
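If the 3D view is hard to read, a 2D contour plot of the same cost surface is often clearer. A minimal sketch, reusing T0S, T1S, and COST from above (the level count of 20 is an arbitrary choice):

# Contour view of the cost surface; each ring is a level set of the cost
fig3, ax3 = plt.subplots()
cs = ax3.contour(T0S, T1S, COST, levels=20)
ax3.set_xlabel('theta0')
ax3.set_ylabel('theta1')
fig3.colorbar(cs)
plt.show()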
5. Derivative function
The partial derivative formulas of the linear regression model are obtained by differentiating the cost function we want to minimize. The derivation is as follows:
The linear regression model's hypothesis function is: h(x) = theta0 + theta1 * x
The cost function is the mean squared error function: J(theta0, theta1) = (1/(2m)) * Σ(h(x) - y)^2
where m is the sample size, h(x) is the predicted value of the model, and y is the observed value.
In order to solve for the optimal model parameters theta0 and theta1, we need to calculate the partial derivatives of the cost function for these two parameters.
First, calculate the partial derivative of the cost function with respect to theta0:
∂J/∂theta0 = (1/m) * Σ(h(x) - y)
Then, calculate the partial derivative of the cost function with respect to theta1:
∂J/∂theta1 = (1/m) * Σ(h(x) - y) * x
# Partial derivative of cost with respect to theta1
def partial_cost_theta1(theta0, theta1, x, y):
    # Hypothesis
    h = theta0 + theta1*x
    # (Hypothesis minus observation) times x
    diff = (h - y) * x
    # Average to compute the partial derivative
    partial = diff.sum() / (x.shape[0])
    return partial
partial1 = partial_cost_theta1(0, 5, pga.distance, pga.accuracy)
print("partial1 =", partial1)
# Partial derivative of cost with respect to theta0
def partial_cost_theta0(theta0, theta1, x, y):
    # Hypothesis
    h = theta0 + theta1*x
    # Difference between hypothesis and observation
    diff = (h - y)
    # Average to compute the partial derivative
    partial = diff.sum() / (x.shape[0])
    return partial
partial0 = partial_cost_theta0(1, 1, pga.distance, pga.accuracy)
print("partial0 =", partial0)
This computes the partial derivatives of the cost function with respect to the parameters theta1 and theta0.
First, a function called partial_cost_theta1() is defined, which accepts four parameters: theta0 and theta1 are the parameters of the linear model, x is the input feature, and y is the target variable. This function computes the partial derivative of the cost function with respect to theta1. Inside the function, the hypothesis value h is computed first, and then (h - y) * x, the difference between the hypothesis and the observation multiplied by the input feature x. Finally, the sum of these terms is divided by the number of observations to obtain the partial derivative with respect to theta1. Then partial_cost_theta1() is called with the parameters 0 and 5 to compute the corresponding partial derivative partial1.
Next, a function called partial_cost_theta0() is defined, which accepts the same four parameters: theta0 and theta1 are the parameters of the linear model, x is the input feature, and y is the target variable. This function computes the partial derivative of the cost function with respect to theta0. Inside the function, the hypothesis value h is computed first, and then the difference between the hypothesis and the observation. Finally, the sum of these differences is divided by the number of observations to obtain the partial derivative with respect to theta0. Then partial_cost_theta0() is called with the parameters 1 and 1 to compute the corresponding partial derivative partial0.
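A quick way to validate these analytic derivatives is a finite-difference check: perturb a parameter slightly and compare the resulting slope of the cost to the partial derivative. A minimal sketch, assuming the cost() function from section 3 is in scope (eps is an arbitrary small step size):

eps = 1e-6

# Numerical derivative of the cost with respect to theta0 at (theta0=1, theta1=1)
num0 = (cost(1 + eps, 1, pga.distance, pga.accuracy)
        - cost(1 - eps, 1, pga.distance, pga.accuracy)) / (2 * eps)
print("analytic: ", partial_cost_theta0(1, 1, pga.distance, pga.accuracy))
print("numerical:", num0)  # the two values should agree closely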
6. Gradient Descent
# x is our feature vector -- distance
# y is our target variable -- accuracy
# alpha is the learning rate
# theta0 is the initial theta0
# theta1 is the initial theta1
def gradient_descent(x, y, alpha=0.1, theta0=0, theta1=0):
    max_epochs = 1000  # Maximum number of iterations
    counter = 0        # Initialize an iteration counter
    c = cost(theta0, theta1, x, y)  # Initial cost
    costs = [c]  # Store the cost after each update
    # Set a convergence threshold to find where the cost function is minimized.
    # When the difference between the previous cost and the current cost
    # is less than this value, we will say the parameters have converged.
    convergence_thres = 0.000001
    cprev = c + 10
    theta0s = [theta0]
    theta1s = [theta1]
    # Stop updating when the costs converge or we hit the maximum number of iterations
    while (np.abs(cprev - c) > convergence_thres) and (counter < max_epochs):
        cprev = c
        # Alpha times the partial derivative is our update step
        update0 = alpha * partial_cost_theta0(theta0, theta1, x, y)
        update1 = alpha * partial_cost_theta1(theta0, theta1, x, y)
        # Update theta0 and theta1 at the same time:
        # we want to compute the slopes at the same set of hypothesised parameters,
        # so we apply the updates only after finding both partial derivatives
        # (-= descends the gradient; += would ascend it)
        theta0 -= update0
        theta1 -= update1
        # Store the thetas
        theta0s.append(theta0)
        theta1s.append(theta1)
        # Compute the new cost with the updated parameters
        c = cost(theta0, theta1, x, y)
        # Store the updated cost
        costs.append(c)
        counter += 1
    # Return the final theta0 and theta1 along with the cost history
    return {'theta0': theta0, 'theta1': theta1, "costs": costs}
print("Theta0 =", gradient_descent(pga.distance, pga.accuracy)['theta0'])
print("Theta1 =", gradient_descent(pga.distance, pga.accuracy)['theta1'])
print("costs =", gradient_descent(pga.distance, pga.accuracy)['costs'])
descend = gradient_descent(pga.distance, pga.accuracy, alpha=.01)
plt.scatter(range(len(descend["costs"])), descend["costs"])
plt.show()
This shows the process of using gradient descent to evaluate the partial derivatives and update the parameters of a linear regression model. The gradient_descent function accepts the input feature x and the observations y, as well as the learning rate alpha and the initial parameters theta0 and theta1. Inside the function, a maximum number of iterations max_epochs and a convergence threshold convergence_thres are set to control the stopping conditions of the algorithm. Initially, the cost function value c is computed and stored in the costs list.
During each iteration, the parameters are updated using the partial derivative formulas, that is, theta0 -= update0 and theta1 -= update1. The new cost function value c is then computed and appended to the costs list. Finally, the function returns the updated parameter values theta0 and theta1, together with the history of cost function values in costs.
Finally, gradient_descent is called, the final parameter values and cost function values are printed, and the change in the cost function value over the iterations is plotted.
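As a closing sanity check (a minimal sketch, assuming the lm model fitted in section 2 is still in scope), the parameters found by gradient descent should be close to the closed-form least-squares solution from scikit-learn:

gd_result = gradient_descent(pga.distance, pga.accuracy, alpha=0.1)
print("gradient descent:", gd_result['theta0'], gd_result['theta1'])
print("sklearn:         ", lm.intercept_, lm.coef_[0])  # should be approximately equal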