Gradient Descent, Multivariate (Andrew Ng Machine Learning: Application of Gradient Descent to Linear Models)

Application of the Gradient Descent Algorithm to Linear Regression


The data processing is similar to the univariate case, so the underlying theory is not repeated here.


Multivariate

Topic: Predicting House Prices

(The link to the Andrew Ng machine learning exercise materials is at the end of the post.)

Input: House size, number of bedrooms
Output: Price

Training set

The first column is the house size and the second column is the number of bedrooms; these two columns are the input. The third column is the house price, which is the target output.

Data standardization

In the input X, the house size and the number of bedrooms are on very different scales, and the output y (the house price) takes much larger values still. These scale differences slow down convergence in gradient descent, so the data is standardized: each column is transformed to have a mean of 0 and a standard deviation of 1. First, read the data:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

file = pd.read_csv('E:/吴恩达机器学习/machine-learning-ex1/ex1/ex1data2.txt', header=None, names=['size', 'numofbedroom', 'price'])

Before standardizing, record the per-column means and standard deviations; they will be needed later to normalize new inputs and to de-standardize the predicted price.

file_size_mean = file.mean()['size']
file_size_std = file.std()['size']
file_bedroom_mean = file.mean()['numofbedroom']
file_bedroom_std = file.std()['numofbedroom']
file_price_mean = file.mean()['price']
file_price_std = file.std()['price']
file = (file-file.mean())/file.std()  # standardize each column to mean 0, standard deviation 1

Note: each column has its own mean subtracted and its own standard deviation divided out, not the overall mean and standard deviation of the whole table.
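As a quick check (just a sketch, assuming the standardized file DataFrame from the block above is in memory), the per-column statistics can be printed; every mean should now be close to 0 and every standard deviation close to 1:

print(file.mean())  # each column mean should be approximately 0
print(file.std())   # each column standard deviation should be approximately 1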

Processing the training set

file.insert(0, 'bias', 1)  # add a column of 1s so the bias/intercept term can be handled as a weight

Since there is now more than one input feature, the training set is not visualized here.

Extracting the input and output data and converting them to matrix form

""" 获取所需数据并转化为矩阵 """
X = file.iloc[:, 0:3]  # 参数权重theta
y = file.iloc[:, 3:]  # 理想输出y
X = np.matrix(X, dtype='float64')
y = np.matrix(y, dtype='float64')
m = len(y)

X has shape (m, 3) and y has shape (m, 1), where m is the number of training examples, and X contains the inserted column of all 1s. Part of the X and y data is shown below:
Input X (including the inserted column of all 1s)
Output y
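A quick way to confirm the shapes (a sketch, assuming X, y and m are defined as above):

print(X.shape)  # (m, 3): bias column, size, number of bedrooms
print(y.shape)  # (m, 1): price
print(m)        # number of training examples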

Computing the cost function

First define the weights theta, the learning rate alpha, and the number of iterations:

alpha = 0.31                                # learning rate
num_iters = 50                              # number of gradient descent iterations
theta = np.array([[0.01],[0.02],[0.03]])    # initial weights, shape (3, 1)

Theta is defined as a (3, 1) array, and its initial entries are given different values so that the weights do not all remain identical during gradient descent.
The vectorized code for the cost function computeCostMulti is as follows:

def computeCostMulti(X, y, theta):
    # Vectorized cost: J = (X*theta - y)^T (X*theta - y) / (2m)
    cost_temp = (X*theta-y).transpose()*(X*theta-y)
    return cost_temp/(2*m)

The cost function is more convenient to compute in vectorized form, so it is not explained in detail here; the underlying idea is the same as in the univariate blog post.
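As a sanity check, the same cost can also be computed sample by sample with an explicit loop. This is only a sketch for comparison (it assumes X, y and m are defined as above) and should return the same value as computeCostMulti:

def computeCostLoop(X, y, theta):
    # Accumulate the squared error of every training example, then divide by 2m
    total = 0.0
    for i in range(m):
        h_i = (X[i, :] * theta).item()   # prediction for sample i
        total += (h_i - y[i, 0]) ** 2
    return total / (2 * m)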

With the initial weights theta, call the cost function to see the result:

J_theta = computeCostMulti(X, y, theta)  # compute the cost function at the initial theta

Running this prints the value of the cost function at the initial theta.

Gradient descent algorithm

Note that gradient descent continuously updates the weights theta, not X or y; the weights then determine how well the model predicts, while the inputs and outputs stay fixed.
Each weight theta_j is updated only after the gradient term has been accumulated over all m training examples; the accumulated value is combined with the learning rate alpha and the sample count m, and then all weights are updated simultaneously from the current theta.

# Gradient descent
def gradientDescentMulti(X, y, theta, alpha, num_iters):
    J_history = np.zeros((num_iters, 1))              # cost after each update of theta
    for iter in range(num_iters):
        diff = X*theta - y                            # (m, 1) residuals h_theta(x) - y
        gradient_cost = (1/m) * (X.transpose()*diff)  # (3, 1) gradient
        theta = theta - alpha*gradient_cost           # simultaneous update of all weights
        J_history[iter] = computeCostMulti(X, y, theta)
    return theta, J_history

theta, J_history = gradientDescentMulti(X, y, theta, alpha, num_iters)

A variable J_history records the value of the cost function computed with the new theta after each update, so the behaviour of gradient descent can be visualized once the iterations are complete.

diff is h_theta(x) - y, an (m, 1) vector; each row is the difference between the predicted output and the actual output for one training-set example.
In the update formula, each weight theta_j is multiplied by the corresponding feature X_j and summed over the samples. X is (m, 3): each column is the vector of values of one feature X_j across the training examples, and each row is one training example. Transposing X and multiplying it by diff therefore gives a (3, 1) matrix whose entries are exactly the summation terms for the corresponding theta_j; combining that result with alpha and m gives the amount by which theta is updated.
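To make the matrix form concrete, the same single update step can be written as a loop over the individual weights theta_j. This is only an illustrative sketch (assuming X, y and m are defined as above); all weights are updated simultaneously from the old theta, and one call gives the same result as one iteration of the matrix update in gradientDescentMulti:

def gradientStepLoop(X, y, theta, alpha):
    diff = X*theta - y                               # (m, 1) residuals h_theta(x) - y
    new_theta = np.copy(theta)
    for j in range(theta.shape[0]):
        # sum over all m samples of residual_i * x_ij, i.e. one entry of X^T * diff
        grad_j = (X[:, j].transpose() * diff).item() / m
        new_theta[j, 0] = theta[j, 0] - alpha * grad_j
    return new_theta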

Visualization

This section plots the cost function as a function of the number of iterations. The code is as follows:

plt.plot(np.arange(num_iters), J_history)
plt.show()

The result is a curve in which the abscissa is the number of iterations and the ordinate is the value of the cost function.
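Since the learning rate controls how quickly the cost decreases, it can also be instructive to rerun gradient descent with a few different alpha values and overlay the curves. The following is only a sketch (it assumes gradientDescentMulti, X, y and num_iters are defined as above, and the alpha values are arbitrary examples):

for a in [0.01, 0.1, 0.3]:
    theta_init = np.array([[0.01], [0.02], [0.03]])   # same starting weights for each run
    _, hist = gradientDescentMulti(X, y, theta_init, a, num_iters)
    plt.plot(np.arange(num_iters), hist, label='alpha = ' + str(a))
plt.xlabel('number of iterations')
plt.ylabel('cost J')
plt.legend()
plt.show()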

Prediction

def predict(size, numofbedroom):
    # Standardize the inputs with the recorded means and standard deviations
    x = np.matrix([1, (size-file_size_mean)/file_size_std, (numofbedroom-file_bedroom_mean)/file_bedroom_std], dtype='float64')
    # De-standardize the model output to obtain the price in the original units
    return (x * theta)*file_price_std + file_price_mean

Inputs: house size and number of bedrooms.
Output: predicted house price.

Note: the incoming inputs must be standardized with the recorded statistics, and the model output must be de-standardized to obtain the actual house price.

price = predict(2000, 3)
print(price)

The printed result is the predicted price for a house of size 2000 with 3 bedrooms.
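As a rough sanity check (a sketch only, reusing the CSV path, the recorded statistics, the learned theta and the predict function from above), the prediction for the first training example can be compared with its actual price; the two values will not match exactly, but they should be of the same order of magnitude:

raw = pd.read_csv('E:/吴恩达机器学习/machine-learning-ex1/ex1/ex1data2.txt', header=None, names=['size', 'numofbedroom', 'price'])
first = raw.iloc[0]
print(predict(first['size'], first['numofbedroom']))  # model prediction for the first sample
print(first['price'])                                 # actual price of the first sample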


The blog link for the univariate part is here.


Andrew Ng machine learning course materials
Link: https://pan.baidu.com/s/1FoAQNRdevsqYzW4a5QDsBw
Extraction code: 0wdr


Original article: blog.csdn.net/qq_48691686/article/details/121573545