The principle and implementation of polynomial regression, and the principle of multiple regression

1. The principle and implementation of polynomial regression

These notes are based on the book "Mathematics of Vernacular Machine Learning".

1.1 The principle of polynomial regression

Polynomial regression predicts the relationship between one variable x and one variable y.
For example: advertising cost x and number of clicks y.
The data is fitted with a curve.
The derivation is analogous to that in my earlier blog post; related notes: The principle and implementation of the least squares method.

The polynomial model of degree n is

f_{\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_n x^n

The higher the degree, the more closely the training data can be fitted (overfitting). However, our purpose is to use the fitted curve to predict data other than the training data, so the curve (model) needs generalization ability rather than merely representing the training data; an overfitted model is no longer representative and cannot predict the general case.
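In the implementation in the next section the degree is fixed at 2, so the concrete model being fitted (one instance of the general formula above) is

f_{\theta}(x) = \theta_0 + \theta_1 x + \theta_2 x^2

with the three parameters \theta_0, \theta_1, \theta_2 stored in the vector theta.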

1.2 Implementation of polynomial regression

The data: advertising cost x and number of clicks y.

import numpy as np
import matplotlib.pyplot as plt

# Load the training data
train = np.loadtxt('~/Downloads/sourcecode-cn/click.csv', delimiter=',', dtype='int', skiprows=1)
train_x = train[:,0] # first column: advertising cost
train_y = train[:,1] # second column: clicks

One of the data preprocessing steps is to standardize (normalize) the training data so that the parameters converge faster.
Compute the mean \mu and standard deviation \sigma of all x values in the data, then standardize each value x as z = (x - \mu) / \sigma, as implemented below.

# Standardization
mu = train_x.mean()
sigma = train_x.std()
def standardize(x):
    return (x - mu) / sigma

train_z = standardize(train_x)
# Plot the standardized data
plt.plot(train_z, train_y, 'o')
plt.show()

# Initialize the parameters with random values
theta = np.random.rand(3)

# Build the design matrix of the training data: [1, x, x^2]
def to_matrix(x):
    return np.vstack([np.ones(x.size), x, x ** 2]).T

X = to_matrix(train_z)

Since there is a lot of training data, we treat each row as one training example; it is more convenient to handle the data in matrix form, as sketched below.
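A sketch of what to_matrix builds: assuming m standardized inputs z^{(1)}, \dots, z^{(m)}, each row of X holds one training example made of a bias term, the input, and its square,

X = \left[\begin{matrix} 1 & z^{(1)} & (z^{(1)})^2 \\ 1 & z^{(2)} & (z^{(2)})^2 \\ \vdots & \vdots & \vdots \\ 1 & z^{(m)} & (z^{(m)})^2 \end{matrix}\right]

so the predictions for all examples can be computed at once as f_{\theta}(X) = X\boldsymbol{\theta}.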

# Prediction function
def f(x):
    return np.dot(x, theta)

# Objective function (sum of squared errors)
def E(x, y):
    return 0.5 * np.sum((y - f(x)) ** 2)

# Learning rate
ETA = 1e-3
# Initialize the change in error, used later to decide when to stop the loop
diff = 1
# Initialize the update counter
count = 0
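For reference, the objective function E implemented above is the sum of squared errors

E(\boldsymbol{\theta}) = \frac{1}{2}\sum_{i=1}^{n}\bigl(y^{(i)} - f_{\theta}(x^{(i)})\bigr)^2

and minimizing it is what the parameter updates below aim to do.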

The parameter update rule (note that all parameters must be updated simultaneously in each step so that the gradient direction stays consistent):

\theta_j := \theta_j - \eta\sum_{i=1}^{n}\bigl(f_{\theta}(x^{(i)}) - y^{(i)}\bigr)x_j^{(i)}

Because this example has three parameters \theta_0, \theta_1, \theta_2, there are two ways to apply it:
Method 1: update the three parameters with three separate formulas inside the loop body.
Method 2: write the data-dependent part of the update formula as a matrix product and update all three parameters with one expression (used below; see the sketch after this paragraph).

We first optimize with the (batch) gradient descent method.
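As a sketch of the Method 2 matrix form used in the loop below (with X the design matrix and \boldsymbol{y} the vector of clicks), the whole update can be written as

\boldsymbol{\theta} := \boldsymbol{\theta} - \eta X^{T}\bigl(f_{\theta}(X) - \boldsymbol{y}\bigr)

which is what theta = theta - ETA * np.dot(f(X) - train_y, X) computes.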

# Repeat the parameter updates until the change in error is less than 0.01
error = E(X, train_y)
while diff > 1e-2:
    # Update all parameters at once using the matrix form,
    # instead of updating each parameter in a separate formula
    theta = theta - ETA * np.dot(f(X) - train_y, X)

    # Compute the change from the previous error
    current_error = E(X, train_y)
    diff = error - current_error
    error = current_error

    # Print a log line
    count += 1
    log = 'Iteration {}: theta = {}, diff = {:.4f}'
    print(log.format(count, theta, diff))

# Plot to check the fit
x = np.linspace(-3, 3, 100)
plt.plot(train_z, train_y, 'o')
plt.plot(x, f(to_matrix(x)))
plt.show()


Next, take the number of iterations as the horizontal axis and the mean squared error as the vertical axis and plot them: as the number of iterations increases, the mean squared error gradually decreases.
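For reference, the mean squared error computed by the MSE function below is

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y^{(i)} - f_{\theta}(x^{(i)})\bigr)^2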

# Mean squared error
def MSE(x, y):
    return (1 / x.shape[0]) * np.sum((y-f(x))**2)

# Re-initialize the parameters with random values
theta = np.random.rand(3)

# History of the MSE values
errors = []

# Change in error
diff = 1
# Repeat the training
errors.append(MSE(X, train_y))
while diff > 1e-2:
    theta = theta - ETA * np.dot(f(X) - train_y, X)
    errors.append(MSE(X, train_y))
    diff = errors[-2] - errors[-1]
# Plot how the error changes over the iterations
x = np.arange(len(errors))
plt.plot(x, errors)
plt.show()

The process above optimizes the objective function with (batch) gradient descent, which uses all of the training data in every update. Next we optimize it with stochastic gradient descent, which uses only one training example per update; n stochastic updates (each relatively cheap) cost roughly as much as one batch gradient descent update (relatively expensive).

In the stochastic update rule, the index k of the training example is chosen at random:

\theta_j := \theta_j - \eta\bigl(f_{\theta}(x^{(k)}) - y^{(k)}\bigr)x_j^{(k)}

import numpy as np
import matplotlib.pyplot as plt

# Load the training data
train = np.loadtxt('~/Downloads/sourcecode-cn/click.csv', delimiter=',', dtype='int', skiprows=1)
train_x = train[:,0]
train_y = train[:,1]

# Standardization
mu = train_x.mean()
sigma = train_x.std()
def standardize(x):
    return (x - mu) / sigma

train_z = standardize(train_x)

# Initialize the parameters with random values
theta = np.random.rand(3)

# Build the design matrix of the training data: [1, x, x^2]
def to_matrix(x):
    return np.vstack([np.ones(x.size), x, x ** 2]).T

X = to_matrix(train_z)

# Prediction function
def f(x):
    return np.dot(x, theta)

# Mean squared error
def MSE(x, y):
    return (1 / x.shape[0]) * np.sum((y - f(x)) ** 2)

# Learning rate
ETA = 1e-3

# Change in error
diff = 1

# Number of updates
count = 0

# Repeat the training
error = MSE(X, train_y)
while diff > 1e-2:
    # Update the parameters with stochastic gradient descent
    p = np.random.permutation(X.shape[0]) # random permutation of the example indices
    for x, y in zip(X[p,:], train_y[p]): # take one training example at a time (one row x and one y) and update the parameters
        theta = theta - ETA * (f(x) - y) * x

    # Compute the change from the previous error
    current_error = MSE(X, train_y)
    diff = error - current_error
    error = current_error

    # Print a log line
    count += 1
    log = 'Iteration {}: theta = {}, diff = {:.4f}'
    print(log.format(count, theta, diff))

# Plot to check the fit
x = np.linspace(-3, 3, 100)
plt.plot(train_z, train_y, 'o')
plt.plot(x, f(to_matrix(x)))
plt.show()

2. The principle of multiple regression

2.1 The principle of multiple regression

Multiple regression predicts the relationship between multiple variables x and one variable y.
For example: advertising cost x_1, ad placement x_2, ad size x_3, and number of clicks y.

f_{\theta}(x_1,\cdots,x_n) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n

Collecting the parameters and the variables into vectors (with x_0 = 1),

\boldsymbol{\theta} = \left[\begin{matrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{matrix}\right], \quad \boldsymbol{x} = \left[\begin{matrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{matrix}\right] \quad (x_0 = 1)

the prediction function can be written as

f_{\boldsymbol{\theta}}(\boldsymbol{x}) = \boldsymbol{\theta}^T\boldsymbol{x} = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n

The parameter update rule is

\theta_j := \theta_j - \eta\sum_{i=1}^{n}\bigl(f_{\boldsymbol{\theta}}(\boldsymbol{x}^{(i)}) - y^{(i)}\bigr)x_j^{(i)}

The expression above uses all of the training data.

Note that all of the variables x need to be standardized during data preprocessing:
compute the mean \mu_1 and standard deviation \sigma_1 of all values of x_1 and standardize them with z_1^{(i)} = (x_1^{(i)} - \mu_1) / \sigma_1; the other variables are handled in the same way.

The training process is the same as for polynomial regression; the only difference is the prediction function (and the design matrix it operates on). A minimal sketch is given below.
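As a minimal sketch only (it is not part of the original post and assumes a hypothetical data file click2.csv with columns x1, x2, x3, y), the polynomial-regression code above could be adapted roughly like this: only the standardization and the design matrix change, while the prediction function and update rule keep the same form.

import numpy as np

# Hypothetical data file (assumption, not from the original post): columns x1, x2, x3, y
train = np.loadtxt('click2.csv', delimiter=',', skiprows=1)
train_x = train[:, :3]  # the three explanatory variables
train_y = train[:, 3]   # clicks

# Standardize each variable separately (column-wise mean and standard deviation)
mu = train_x.mean(axis=0)
sigma = train_x.std(axis=0)
train_z = (train_x - mu) / sigma

# Design matrix: prepend a column of ones so that x_0 = 1 multiplies theta_0
X = np.hstack([np.ones((train_z.shape[0], 1)), train_z])

# Parameter vector theta_0 ... theta_3
theta = np.random.rand(X.shape[1])

# Prediction function f_theta(x) = theta^T x, applied to all rows at once
def f(x):
    return np.dot(x, theta)

# Gradient descent update, identical in form to the polynomial-regression case
ETA = 1e-3
for _ in range(1000):
    theta = theta - ETA * np.dot(f(X) - train_y, X)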


Original post: blog.csdn.net/weixin_48524215/article/details/131362902