Application of Gradient Descent Algorithm in Linear Regression
The data processing is similar to the univariate case, so the underlying principles are not repeated here.
Multivariate
Topic: Predicting House Prices
(The link to Andrew Ng's machine learning course exercises is given at the end.)
Input: House size, number of bedrooms
Output: Price
Training set
The first column is the house size and the second column is the number of bedrooms; these two columns are the input. The third column is the house price, which is the target output.
Data standardization
In the input X, the house size and the number of bedrooms differ by orders of magnitude, and the house price in the output y is on a much larger scale again. Such differences in scale slow the convergence of gradient descent, so the data needs to be standardized: each column is transformed to have a mean of 0 and a standard deviation of 1. The code is as follows:
import numpy as np
import pandas as pd
file = pd.read_csv('E:/吴恩达机器学习/machine-learning-ex1/ex1/ex1data2.txt', header=None, names=['size', 'numofbedroom', 'price'])
file = (file - file.mean()) / file.std()
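Before standardizing, record the mean and standard deviation of each column. They are used later to normalize new inputs and to denormalize predictions, so these lines must run on the raw data, before the standardization line:
file_size_mean = file.mean()['size']
file_size_std = file.std()['size']
file_bedroom_mean = file.mean()['numofbedroom']
file_bedroom_std = file.std()['numofbedroom']
file_price_mean = file.mean()['price']
file_price_std = file.std()['price']
file = (file - file.mean()) / file.std()  # same standard deviation as recorded above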
Note: the mean subtracted from each column is that column's own mean, not the mean of the whole table.
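The column-wise standardization and its inverse can be sketched on a small made-up table. The values below are illustrative and are not rows from ex1data2.txt, and the helper names `standardize` and `destandardize` are assumptions made for this example:

```python
import pandas as pd

# Illustrative values only; these are not rows from ex1data2.txt.
demo = pd.DataFrame({
    'size': [1000.0, 1500.0, 2000.0, 2500.0],
    'numofbedroom': [2.0, 3.0, 3.0, 4.0],
    'price': [200000.0, 260000.0, 330000.0, 410000.0],
})

def standardize(df):
    """Return (z-scored frame, column means, column standard deviations)."""
    mean = df.mean()   # one mean per column
    std = df.std()     # one sample standard deviation (ddof=1) per column
    return (df - mean) / std, mean, std

def destandardize(z, mean, std):
    """Invert standardize() using the same recorded statistics."""
    return z * std + mean

z, col_mean, col_std = standardize(demo)
restored = destandardize(z, col_mean, col_std)

print(z.mean().round(6))   # each column is centred at 0
print(z.std().round(6))    # each column has unit standard deviation
```

The same statistics have to be used in both directions. Mixing a population standard deviation (ddof=0) with a sample one (ddof=1) leaves the denormalized values slightly off scale.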
Processing the training set
Insert a bias column of ones as the first column:
file.insert(0, 'bias', 1)
Since there is now more than one input feature, the training set is not visualized here.
Input and output data extraction and conversion into matrix form
# Extract the required data and convert it to matrices
X = file.iloc[:, 0:3]  # input features: bias, size, numofbedroom
y = file.iloc[:, 3:]  # target output y: price
X = np.matrix(X, dtype='float64')
y = np.matrix(y, dtype='float64')
m = len(y)
ex1data2.txt contains 47 samples, so X has dimension (47, 3) and y has dimension (47, 1); X includes the inserted column of ones. Part of the X and y data is shown below:
Input X (including the column of all ones)
Output y
Computing the loss function
First define the weights theta, the learning rate alpha, and the number of iterations num_iters:
alpha = 0.31
num_iters = 50
theta = np.array([[0.01],[0.02],[0.03]])
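theta is defined as a (3, 1) array. Its entries are given different small initial values so that the weights do not all start out identical. Since the cost function of linear regression is convex, the choice of starting point does not change the minimum that gradient descent converges to.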
The following is the vectorized code for the loss function:
def computeCostMulti(X, y, theta):
    # (X*theta - y)^T * (X*theta - y) is the sum of squared errors, a (1, 1) matrix
    cost_temp = (X*theta-y).transpose()*(X*theta-y)
    return cost_temp/(2*m)
The vectorized form makes the cost function straightforward to compute, so it is not explained further here. For the underlying principle, see the univariate post.
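As a small check of the vectorized form, the sketch below uses plain NumPy arrays with the `@` operator instead of np.matrix and compares the result with an explicit loop over the samples. The function name `compute_cost`, the use of plain arrays, and the data are all assumptions made for this example:

```python
import numpy as np

def compute_cost(X, y, theta):
    """Half mean squared error, J = (X@theta - y)^T (X@theta - y) / (2m)."""
    m = len(y)
    residual = X @ theta - y            # shape (m, 1)
    return (residual.T @ residual).item() / (2 * m)

# Made-up data: a bias column plus two features, three samples.
X_demo = np.array([[1.0, 0.5, -1.0],
                   [1.0, -0.5, 0.0],
                   [1.0, 0.0, 1.0]])
y_demo = np.array([[1.0], [0.0], [2.0]])
theta_demo = np.zeros((3, 1))

# Same quantity computed sample by sample.
loop_cost = sum((X_demo[i] @ theta_demo - y_demo[i]).item() ** 2
                for i in range(len(y_demo))) / (2 * len(y_demo))

vector_cost = compute_cost(X_demo, y_demo, theta_demo)
print(vector_cost, loop_cost)
```

With all weights at zero the residuals are simply the negated targets, so both values equal (1 + 0 + 4) / 6.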
With the initial weights theta, call the loss function to see the result:
J_theta = computeCostMulti(X, y, theta)  # compute the cost function
Result:
Gradient descent algorithm
Note that gradient descent repeatedly updates the weights theta, not X or y. The weights determine how well the model predicts; the input and output data themselves are never changed.
Each weight theta_j is updated only after the contributions of all m samples have been summed. The sum is then multiplied by the learning rate alpha and divided by the number of samples m to give the update.
# Gradient descent
def gradientDescentMulti(X, y, theta, alpha, num_iters):
    J_history = np.zeros((num_iters, 1))
    for i in range(num_iters):
        diff = X*theta - y  # prediction error for every sample, (m, 1)
        gradient_cost = (1/m) * (X.transpose()*diff)  # gradient for all weights, (3, 1)
        theta = theta - alpha*gradient_cost
        J_history[i] = computeCostMulti(X, y, theta)  # cost after this update
    return theta, J_history
theta, J_history = gradientDescentMulti(X, y, theta, alpha, num_iters)
The variable J_history records the value of the loss function after each update of the weights: once theta has been updated, the cost is recomputed with the new theta and stored. After the iterations finish, J_history can be plotted to visualize the effect of gradient descent.
diff is (h_theta(x) - y), with dimension (m, 1). Each row is the difference between the predicted output and the actual output for one training sample.
In the update formula, the error of each sample is multiplied by the feature X_j that belongs to the weight theta_j. X has dimension (m, 3): each row is one training sample, and each column holds the values of one feature X_j across all samples. Transposing X and multiplying it by diff therefore gives a (3, 1) matrix whose j-th entry is the summation term for theta_j. This result is then multiplied by alpha and divided by m to obtain the update for theta.
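The statement that X transposed times diff yields the summation term for every weight at once can be checked numerically. The sketch below uses random made-up data and plain NumPy arrays (both are assumptions for this example) and compares the vectorized gradient with the per-weight sums written out explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
m_demo, n_demo = 5, 3

# Made-up inputs: a bias column of ones plus two random features.
X_demo = np.hstack([np.ones((m_demo, 1)), rng.normal(size=(m_demo, n_demo - 1))])
y_demo = rng.normal(size=(m_demo, 1))
theta_demo = np.array([[0.01], [0.02], [0.03]])

diff = X_demo @ theta_demo - y_demo            # (m, 1): h_theta(x) - y per sample
vector_grad = (X_demo.T @ diff) / m_demo       # (3, 1): all weights at once

# Per-weight form: for each j, sum over samples of diff_i * x_ij, divided by m.
loop_grad = np.zeros((n_demo, 1))
for j in range(n_demo):
    total = 0.0
    for i in range(m_demo):
        total += diff[i, 0] * X_demo[i, j]
    loop_grad[j, 0] = total / m_demo

print(np.allclose(vector_grad, loop_grad))
```

The two forms agree up to floating-point rounding; the vectorized one simply lets the matrix product do the summation over samples.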
Visualization
This section plots the cost function against the number of iterations. The code is as follows:
import matplotlib.pyplot as plt
plt.plot(np.arange(num_iters), J_history)
plt.show()
In the resulting plot, the horizontal axis is the number of iterations and the vertical axis is the value of the cost function.
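Besides reading the plot, convergence can be checked numerically: with a small enough learning rate, the recorded cost should not increase from one iteration to the next. A minimal sketch on synthetic standardized data, using plain NumPy arrays and a hypothetical helper `gradient_descent` (assumptions for this example, not the code from this post):

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression; returns theta and the cost history."""
    m = len(y)
    history = np.zeros(num_iters)
    for i in range(num_iters):
        diff = X @ theta - y
        theta = theta - alpha * (X.T @ diff) / m
        residual = X @ theta - y
        history[i] = (residual.T @ residual).item() / (2 * m)  # cost after the update
    return theta, history

# Synthetic data: two standardized features plus a bias column.
rng = np.random.default_rng(1)
features = rng.normal(size=(40, 2))
features = (features - features.mean(axis=0)) / features.std(axis=0)
X_demo = np.hstack([np.ones((40, 1)), features])
y_demo = X_demo @ np.array([[0.0], [0.8], [-0.1]]) + 0.05 * rng.normal(size=(40, 1))

theta_demo, history = gradient_descent(X_demo, y_demo, np.zeros((3, 1)),
                                       alpha=0.31, num_iters=50)

# The cost should be non-increasing (up to rounding) if alpha suits this data.
print(np.all(np.diff(history) <= 1e-12))
print(history[0], history[-1])
```

If the first printed value were False, the learning rate would be too large for the data and the curve would oscillate or diverge instead of flattening out.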
Prediction
def pridicit(size, numofbedroom):
    # Normalize the inputs with the statistics recorded from the raw training data
    X = np.matrix([1, (size-file_size_mean)/file_size_std, (numofbedroom-file_bedroom_mean)/file_bedroom_std], dtype='float64')
    # Denormalize the prediction to get a price in the original units
    return ((X * theta)*file_price_std + file_price_mean)
Input parameters: house size and number of bedrooms.
Output: house price.
Note: the inputs must be normalized with the same means and standard deviations as the training data, and the model output must be denormalized to obtain the actual house price.
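The normalize-then-denormalize step described in the note can be followed by hand with made-up numbers. In the sketch below, the statistics, the weights, and the helper name `predict_price` are all hypothetical and chosen only so that the arithmetic is easy to trace:

```python
import numpy as np

# Hypothetical statistics and weights, chosen so the arithmetic is easy to follow.
stats = {
    'size': (2000.0, 500.0),          # (mean, std)
    'numofbedroom': (3.0, 1.0),
    'price': (300000.0, 100000.0),
}
theta_demo = np.array([[0.0], [0.5], [0.1]])

def predict_price(size, numofbedroom, theta, stats):
    """Normalize the inputs, apply the linear model, then denormalize the output."""
    size_z = (size - stats['size'][0]) / stats['size'][1]
    bed_z = (numofbedroom - stats['numofbedroom'][0]) / stats['numofbedroom'][1]
    x = np.array([[1.0, size_z, bed_z]])          # leading 1 is the bias term
    price_z = (x @ theta).item()                  # prediction in standardized units
    return price_z * stats['price'][1] + stats['price'][0]

print(predict_price(2000, 3, theta_demo, stats))  # inputs at the mean give the mean price
print(predict_price(2500, 4, theta_demo, stats))
```

For the second call both inputs sit one standard deviation above their means, so the standardized prediction is 0.5 + 0.1 = 0.6, which corresponds to 0.6 price standard deviations above the mean price.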
price = pridicit(2000, 3)
print(price)
Output result:
The blog link for the univariate part is here.
Andrew Ng's machine learning course materials
link: https://pan.baidu.com/s/1FoAQNRdevsqYzW4a5QDsBw
Extraction code: 0wdr