24- Vectorization and data standardization of gradient descent


  This blog mainly introduces how to apply the gradient descent method to train a linear regression model on real data.

  Before that, we first deal with one more problem: how to vectorize our gradient descent process. The vectorization mainly concerns the step of computing the gradient:
$$\nabla J(\theta) = \begin{pmatrix} \partial J/\partial\theta_0 \\ \partial J/\partial\theta_1 \\ \vdots \\ \partial J/\partial\theta_n \end{pmatrix} = \frac{2}{m}\begin{pmatrix} \sum_{i=1}^{m}\left(X_b^{(i)}\theta - y^{(i)}\right) \\ \sum_{i=1}^{m}\left(X_b^{(i)}\theta - y^{(i)}\right)X_1^{(i)} \\ \vdots \\ \sum_{i=1}^{m}\left(X_b^{(i)}\theta - y^{(i)}\right)X_n^{(i)} \end{pmatrix} \qquad (1)$$
  In the previous blog, the way we computed this was to find the elements of the gradient one by one with a for loop. For this formula, can we go further and turn it into a single matrix operation? Look closely at the formula: every entry has essentially the same form, so, generally speaking, it should be possible. Let's try it below.

  Here we first look at the 0th entry. It differs from the other entries, so we make it consistent with them. The way to do this is very simple: just multiply the 0th entry by X_0^{(i)}, where we let X_0^{(i)} always equal 1.
$$\nabla J(\theta) = \frac{2}{m}\begin{pmatrix} \sum_{i=1}^{m}\left(X_b^{(i)}\theta - y^{(i)}\right)X_0^{(i)} \\ \sum_{i=1}^{m}\left(X_b^{(i)}\theta - y^{(i)}\right)X_1^{(i)} \\ \vdots \\ \sum_{i=1}^{m}\left(X_b^{(i)}\theta - y^{(i)}\right)X_n^{(i)} \end{pmatrix} \qquad (2)$$
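  In code, adding this constant X_0 column simply means stacking a column of ones in front of the feature matrix; that is exactly the X_b used in the code later in this post. A tiny sketch (the X_train here is made-up data, just for illustration):

import numpy as np

X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # m = 3 samples, n = 2 features
# prepend the X_0 column (all ones), giving the m x (n+1) matrix X_b
X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
print(X_b.shape)   # (3, 3)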

  Then our task is to vectorize equation (2). Recall that in the previous blog's implementation, the for loop had already, to some extent, vectorized the formula corresponding to each individual element of equation (2):
res[i] = (X_b.dot(theta) - y).dot(X_b[:, i])

  We treated the formula for each element as the dot product of two vectors (computed with dot in NumPy). The two vectors are:

$$\left(X_b^{(1)}\theta - y^{(1)},\; X_b^{(2)}\theta - y^{(2)},\; \dots,\; X_b^{(m)}\theta - y^{(m)}\right) \quad\text{and}\quad \left(X_j^{(1)},\; X_j^{(2)},\; \dots,\; X_j^{(m)}\right)$$

In this way, the formulas in the gradient can be understood as a matrix operation.
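  As a quick check of this "each element is a dot product" view, here is a small NumPy sketch (all the data is made up, and j is just some column index):

import numpy as np

np.random.seed(0)
m, n = 5, 3
X_b = np.hstack([np.ones((m, 1)), np.random.rand(m, n)])   # m x (n+1)
y = np.random.rand(m)
theta = np.random.rand(n + 1)

j = 2
# element j of the gradient (without the 2/m factor), written as an explicit sum ...
loop_sum = sum((X_b[i].dot(theta) - y[i]) * X_b[i, j] for i in range(m))
# ... and as the dot product of the two vectors above
dot_form = (X_b.dot(theta) - y).dot(X_b[:, j])
print(np.isclose(loop_sum, dot_form))   # True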

For the formula of a single component,

$$\sum_{i=1}^{m}\left(X_b^{(i)}\theta - y^{(i)}\right)X_j^{(i)},$$

we expand it and get:

$$\left(X_b^{(1)}\theta - y^{(1)}\right)X_j^{(1)} + \left(X_b^{(2)}\theta - y^{(2)}\right)X_j^{(2)} + \dots + \left(X_b^{(m)}\theta - y^{(m)}\right)X_j^{(m)}$$

So we can think of this formula as the product of two vectors:

$$\left(X_b^{(1)}\theta - y^{(1)},\; \dots,\; X_b^{(m)}\theta - y^{(m)}\right)\cdot\begin{pmatrix} X_j^{(1)} \\ X_j^{(2)} \\ \vdots \\ X_j^{(m)} \end{pmatrix}$$
Extending this to the whole gradient ∇J(θ), it can be regarded as:

$$\nabla J(\theta) = \frac{2}{m}\cdot\left(X_b^{(1)}\theta - y^{(1)},\; \dots,\; X_b^{(m)}\theta - y^{(m)}\right)\cdot\begin{pmatrix} X_0^{(1)} & X_1^{(1)} & \cdots & X_n^{(1)} \\ X_0^{(2)} & X_1^{(2)} & \cdots & X_n^{(2)} \\ \vdots & & & \vdots \\ X_0^{(m)} & X_1^{(m)} & \cdots & X_n^{(m)} \end{pmatrix}$$

  In this way, the original gradient computation is transformed into the multiplication of two matrices: call the row vector on the left A, a 1 × m matrix, and the matrix on the right B, an m × (n+1) matrix. Their product is a 1 × (n+1) matrix, and the gradient we want also has n+1 elements; the n+1 elements of the two correspond one to one.
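  A small NumPy sketch of these shapes (again with made-up data; NumPy simply stores the 1 x m part as a length-m array):

import numpy as np

np.random.seed(1)
m, n = 6, 4
X_b = np.hstack([np.ones((m, 1)), np.random.rand(m, n)])   # matrix B: m x (n+1)
y = np.random.rand(m)
theta = np.random.rand(n + 1)

A = X_b.dot(theta) - y            # the "1 x m" row vector A
grad_row = A.dot(X_b) * 2 / m     # A . B, a vector with n+1 elements
print(A.shape, X_b.shape, grad_row.shape)   # (6,) (6, 5) (5,)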

  Let's look at matrix B again:

$$B = \begin{pmatrix} X_0^{(1)} & X_1^{(1)} & \cdots & X_n^{(1)} \\ X_0^{(2)} & X_1^{(2)} & \cdots & X_n^{(2)} \\ \vdots & & & \vdots \\ X_0^{(m)} & X_1^{(m)} & \cdots & X_n^{(m)} \end{pmatrix}$$

  In fact, it is just the X_b we used before: its first column is X_0, which is always equal to 1, and the rest is the X of our original samples, an m × n matrix. So we can write the gradient in this very compact form:

$$\nabla J(\theta) = \frac{2}{m}\cdot\left(X_b\theta - y\right)^T\cdot X_b$$
  Why is there a transpose operation (T) here? Because we agreed earlier that all vectors are represented as column vectors by default, X_bθ − y is a column vector, so we need a transpose to turn it into a row vector.

  You may then notice a problem: according to the formula above we get a 1 × (n+1) row vector, while our original gradient expression is an (n+1) × 1 column vector. We said before that NumPy does not distinguish between row vectors and column vectors when representing a 1-D vector, so computing with this formula is actually fine. Still, to be rigorous, we convert the formula into a column vector. The conversion is simple: just transpose the whole result, which gives the following formula:
$$\nabla J(\theta) = \frac{2}{m}\cdot X_b^T\cdot\left(X_b\theta - y\right)$$
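  As a sanity check, this vectorized formula gives exactly the same numbers as the element-by-element loop from the previous blog. A small sketch with random data:

import numpy as np

np.random.seed(2)
m, n = 8, 3
X_b = np.hstack([np.ones((m, 1)), np.random.rand(m, n)])
y = np.random.rand(m)
theta = np.random.rand(n + 1)

# loop version: one gradient component at a time
res = np.empty(n + 1)
res[0] = np.sum(X_b.dot(theta) - y)
for j in range(1, n + 1):
    res[j] = (X_b.dot(theta) - y).dot(X_b[:, j])
grad_loop = res * 2 / m

# vectorized version
grad_vec = X_b.T.dot(X_b.dot(theta) - y) * 2 / m

print(np.allclose(grad_loop, grad_vec))   # True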
  Then we only need to modify the dJ function:

import numpy as np
from metrics import r2_score   # local helper module providing the R^2 metric

class LinearRegression:
    def __init__(self):
        """初始化 Linear Regression"""
        self.coef_ = None           # 系数
        self.interception_ = None   # 截距
        self._theta = None          # θ
    def fit_normal(self, X_train, y_train):
        """根据训练数据集X_train,y_train训练Linear Regression模型"""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train]) # prepend a column of ones to X_train
        self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)

        self.interception_ = self._theta[0]     # intercept
        self.coef_ = self._theta[1:]            # coefficients

        return self
    def fit_gd(self, X_train, y_train, eta=0.01, n_iters=1e4):
        """根据训练数据集X_train,y_train,使用梯度下降法训练Linear Regression模型"""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train, y_train must be equal to the size of y_train"
        def J(theta, X_b, y):
            # MSE loss; return +inf if the computation overflows (e.g. eta too large)
            try:
                return np.sum((y - X_b.dot(theta)) ** 2) / len(X_b)
            except:
                return float('inf')
        def dJ(theta, X_b, y):
            # previous element-by-element implementation:
            # res = np.empty(len(theta))
            # res[0] = np.sum(X_b.dot(theta) - y)
            # for i in range(1, len(theta)):
            #     res[i] = np.sum((X_b.dot(theta) - y).dot(X_b[:, i]))
            # return res * 2 / len(X_b)
            # vectorized gradient: (2/m) * X_b^T . (X_b . theta - y)
            return X_b.T.dot(X_b.dot(theta) - y) * 2 / len(X_b)
        def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
            theta = initial_theta
            i_iter = 0
            while i_iter < n_iters:
                gradient = dJ(theta, X_b, y)
                last_theta = theta
                theta = theta - eta * gradient

                if(abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):
                    break
                i_iter += 1
            return theta
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        initial_theta = np.zeros(X_b.shape[1])
        self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)

        self.interception_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self
    def predict(self, X_predict):
        """给定待预测数据集X_predict,返回表示X_predict的结果向量"""
        assert self.interception_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert X_predict.shape[1] == len(self.coef_), \
            "the feature number of X_predict must be equal to X_train"
        X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
        return X_b.dot(self._theta)
    def score(self, X_test, y_test):
        """根据测试数据集X_test和y_test确定当前模型的准确度"""
        y_predict = self.predict(X_test)
        return r2_score(y_test, y_predict)
    def __repr__(self):
        return "LinearRegression()"

Test

Let's start testing:
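  The original post shows this test as a series of notebook screenshots, which are not reproduced here. Below is a minimal sketch of the same kind of experiment: the data set is synthetic (made up here as a stand-in, with two features on very different scales), and I use scikit-learn's train_test_split and StandardScaler for the data standardization mentioned in the title:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# LinearRegression is the class defined above

# synthetic stand-in for a real data set: two features on very different scales
np.random.seed(666)
m = 500
X = np.empty((m, 2))
X[:, 0] = np.random.uniform(0.0, 1.0, m)        # small-scale feature
X[:, 1] = np.random.uniform(0.0, 1000.0, m)     # large-scale feature
y = 3.0 * X[:, 0] + 0.05 * X[:, 1] + 4.0 + np.random.normal(0.0, 0.1, m)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# standardize the features so every column has zero mean and unit variance
scaler = StandardScaler()
scaler.fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

lin_reg = LinearRegression()
lin_reg.fit_gd(X_train_std, y_train)
print(lin_reg.score(X_test_std, y_test))   # R^2 on the standardized test set

  The key point is the standardization step: with the raw, badly-scaled features, gradient descent with the default eta=0.01 tends to diverge (the loss overflows), while an eta small enough to be safe makes convergence painfully slow. After standardization, one moderate eta works for all features.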


In the next blog, I will introduce the stochastic gradient descent method~~


Origin blog.csdn.net/qq_41033011/article/details/109076627