# GD和线性拟合举例
对离散数据进行线性拟合。设待拟合的直线为 $y=\omega x$[更一般地,$f_\omega(x)=\sum_j\omega_j x_j$],实际数据点为 $(x_1,y_1),(x_2, y_2), \cdots, (x_n, y_n)$,拟合值为 $\widehat{y_i}=\omega x_i$,则第 $i$ 个点与直线的偏差记为 $e_i$:
$$e_i = (y_i-\widehat{y_i})^2$$
均方误差为
$$
e = \frac{1}{n}\sum_{i=1}^n{(y_i-\widehat{y_i})^2}
=\frac{1}{n}\sum_{i=1}^n{(y_i-\omega x_i)^2}
= \frac{1}{n}[{(y_1^2+\cdots+y_n^2)
-2\omega(x_1y_1+\cdots+x_ny_n)
+\omega^2(x_1^2+\cdots+x_n^2)}]
$$
$$
\color{blue}{记\ \sum_{i=1}^n{x_i^2}=a,\quad \sum_{i=1}^n{x_iy_i}=b,\quad \sum_{i=1}^n{y_i^2}=c}
$$
则代价函数可化简为关于 $\omega$ 的二次函数:
$$
e = \frac{1}{n}(a\omega^2 - 2b\omega + c)
$$
该式中只包含一个参数,即$\omega$。
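对这个抛物线形的代价函数,令 $\frac{\mathrm{d}e}{\mathrm{d}\omega}=0$ 可直接解出最优斜率 $\omega^\ast = b/a = \sum x_iy_i / \sum x_i^2$。下面是一个最小示例,其中样例数据为假设值(大致满足 $y\approx 2x$),仅作演示:

```python
# 假设的样例数据,仅作演示
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

n = len(xs)
a = sum(x * x for x in xs)               # a = Σ x_i²
b = sum(x * y for x, y in zip(xs, ys))   # b = Σ x_i·y_i
c = sum(y * y for y in ys)               # c = Σ y_i²

# e(ω) = (aω² - 2bω + c)/n,令导数 (2aω - 2b)/n = 0,解出 ω* = b/a
omega_star = b / a
e_min = (a * omega_star**2 - 2 * b * omega_star + c) / n
print("最优斜率 ω* =", omega_star)
print("最小均方误差 e =", e_min)
```

后文的梯度下降迭代得到的结果,应当收敛到这里的解析解 $\omega^\ast$ 附近。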
在代价函数中,需要找到合适的参数 $\omega$,使代价函数取最小值,即采用梯度下降的方式——就像下山一样,沿最陡的方向走下降最快,走到最低点,就是代价最小的地方。也就是说,要沿着梯度的反方向(即下降最快的方向)去寻找。
**为什么要叫梯度而不叫导数?**
本例中参数只有一个 $\omega$,但在一般情况下有多个参数 $\omega_1, \omega_2,\cdots,\omega_j$。把代价函数对每个参数的偏导数组成一个向量,这个向量就是梯度,它指向函数上升最快的方向;沿其反方向走,下降最快。
接下来,标定一个初始点,从初始点开始下滑,寻找最低点。此时,自变量是 $\omega$,因变量是代价(损失)函数 $e$。沿着横坐标方向找到使 $e$ 最小的 $\omega$,迭代过程如下:
$$ \omega_{n+1} = \omega_{n}-\alpha \frac{\partial e(\omega)}{\partial\omega}
$$
其中 $\alpha$ 代表 learning rate,也就是步长(每一步下山走多远)。如果学习率取一个很小的固定值,可能在最低点附近来回震荡、收敛很慢;如果取值太大,又会一步跨过最低点,甚至发散。所以聪明的做法是让步长与斜率成正比(单参数情形下即乘上斜率):斜率大时步子迈得大,斜率小时(接近最低点)就小心翼翼地走。
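学习率的影响可以在前面的二次代价函数上直观验证。下面的小实验中,统计量 $a、b$ 和样本数 $n$ 均为假设值,仅作演示:

```python
def grad_step(omega, alpha, a, b, n):
    """对 e(ω) = (aω² - 2bω + c)/n 执行一步梯度下降,梯度为 (2aω - 2b)/n"""
    return omega - alpha * (2 * a * omega - 2 * b) / n

# 假设的统计量:a = Σx_i² = 30,b = Σx_iy_i = 59.7,样本数 n = 4
a, b, n = 30.0, 59.7, 4
results = {}
for alpha in (0.01, 0.2):     # 较小的学习率 vs 过大的学习率
    omega = 0.0
    for _ in range(50):
        omega = grad_step(omega, alpha, a, b, n)
    results[alpha] = omega
print(results)  # 小学习率收敛到 b/a ≈ 1.99;大学习率每步越过最低点,发散
```

对二次函数,每步迭代把偏差乘以因子 $1 - 2\alpha a/n$:$\alpha=0.01$ 时因子为 $0.85$,稳步收敛;$\alpha=0.2$ 时因子为 $-2$,偏差不断放大。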
公式简写:
$$ e(\omega) = \frac{1}{n}\sum_{i=1}^n{(y_i-\widehat{y_i})^2}
= \frac{1}{n}\sum_{i=1}^n({y_i - f_\omega(x_i)})^2
$$
由链式法则(注意 $f_\omega$ 前的负号):
$$
\frac{\partial e(\omega)}{\partial\omega} = -\frac{1}{n}\sum_{i=1}^n 2[y_i - f_\omega(x_i)]\cdot \frac{\partial f_\omega(x_i)}{\partial\omega}
$$
由 $f_\omega(x_i)=\omega x_i$ 不难得出 $\frac{\partial f_\omega(x_i)}{\partial\omega}= x_i$
则
$$
\frac{\partial e(\omega)}{\partial\omega} = -\frac{1}{n}\sum_{i=1}^n 2[y_i - f_\omega(x_i)]\cdot x_i
$$
则
$$
\omega_{n+1} = \omega_{n}+\alpha\frac{2}{n}\sum_{i=1}^n [y_i - f_\omega(x_i)]\cdot x_i
$$
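按上式可以写出单参数线性拟合 $y=\omega x$ 的梯度下降最小实现。下面的样例数据为假设值(大致满足 $y\approx 2x$),仅作演示:

```python
def fit_linear(xs, ys, alpha=0.01, iterations=1000):
    """用梯度下降拟合 y = ωx:每步沿负梯度方向更新 ω"""
    n = len(xs)
    omega = 0.0
    for _ in range(iterations):
        # ∂e/∂ω = -(2/n)·Σ(y_i - ωx_i)·x_i
        grad = -2.0 / n * sum((y - omega * x) * x for x, y in zip(xs, ys))
        omega -= alpha * grad
    return omega

# 假设的样例数据
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w = fit_linear(xs, ys)
print(w)  # 应接近解析解 Σx_iy_i / Σx_i² = 1.99
```

迭代结果与对二次代价函数直接求导得到的解析解一致,可作为实现正确性的检验。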
### 应用举例
用梯度下降计算 $f(x,y) = x\sin(x)+x\cos(y)+y^2-5y$ 的(局部)最小值
#### 代数法
```python
import math

def compute_partial_derivatives(x, y):
    # f(x, y) = x·sin(x) + x·cos(y) + y² - 5y
    df_dx = math.sin(x) + x * math.cos(x) + math.cos(y)
    df_dy = -x * math.sin(y) + 2 * y - 5
    return df_dx, df_dy

def gradient_descent():
    x = 0.0
    y = 0.0
    alpha = 0.01       # Learning rate
    iterations = 1000  # Maximum number of iterations
    for i in range(iterations):
        df_dx, df_dy = compute_partial_derivatives(x, y)
        x -= alpha * df_dx
        y -= alpha * df_dy
    f_value = x * math.sin(x) + x * math.cos(y) + y**2 - 5 * y
    return f_value, x, y

result, x_min, y_min = gradient_descent()
print("Smallest value of f(x, y):", result)
print("x value at minimum:", x_min)
print("y value at minimum:", y_min)
```
#### 符号法
```python
import sympy as sp

x, y = sp.symbols('x y')
f = x * sp.sin(x) + x * sp.cos(y) + y**2 - 5 * y

def compute_partial_derivatives(f):
    df_dx = sp.diff(f, x)
    df_dy = sp.diff(f, y)
    return df_dx, df_dy

def gradient_descent():
    x_val = 0.0
    y_val = 0.0
    alpha = 0.01       # Learning rate
    iterations = 1000  # Maximum number of iterations
    df_dx, df_dy = compute_partial_derivatives(f)  # 符号偏导只需求一次
    for i in range(iterations):
        # 用x_val和y_val替代符号x和y,得到当前点的偏导数值
        df_dx_val = df_dx.subs([(x, x_val), (y, y_val)])
        df_dy_val = df_dy.subs([(x, x_val), (y, y_val)])
        x_val -= alpha * df_dx_val
        y_val -= alpha * df_dy_val
    return x_val, y_val

x_min, y_min = gradient_descent()
result = x_min * sp.sin(x_min) + x_min * sp.cos(y_min) + y_min**2 - 5 * y_min
print("Smallest value of f(x, y):", result)
print("x value at minimum:", x_min)
print("y value at minimum:", y_min)
```
#### 画图
```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return x * np.sin(x) + x * np.cos(y) + y**2 - 5 * y

# Create a grid of x and y values
x = np.linspace(-10, 10, 100)
y = np.linspace(-10, 10, 100)
X, Y = np.meshgrid(x, y)

# Evaluate the function f(x, y) for each (x, y) pair
Z = f(X, Y)

# Plot the surface
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')

# Set labels and title
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('f(x, y)')
ax.set_title('f(x, y) = x*sin(x) + x*cos(y) + y^2 - 5y')

# Show the plot
plt.show()
```