Gradient descent (GD): understanding, derivation, code, and examples (work in progress)

# GD and linear fitting example
Consider fitting a line to discrete data. Assume the curve to be fitted is $y=\omega x$ (in general, $f_\omega(x)=\sum_i\omega_i x_i$), and the observed points are $(x_1,y_1),(x_2, y_2), \cdots, (x_n, y_n)$. The model's prediction at $x_i$ is $\widehat{y_i}=\omega x_i$, and the squared error between the observed point and the curve is
$$ e_i = (y_i-\widehat{y_i})^2 $$
The mean square error is
$$
e = \frac{1}{n}\sum_{i=1}^n{(y_i-\widehat{y_i})^2}
= \frac{1}{n}\sum_{i=1}^n{(y_i-\omega x_i)^2}
= \frac{1}{n}\left[(y_1^2+\cdots+y_n^2)
- 2\omega(x_1y_1+\cdots+x_ny_n)
+ \omega^2(x_1^2+\cdots+x_n^2)\right]
$$
Abbreviate the sums as
$$
\color{blue}{a = \frac{1}{n}\sum_{i=1}^n{x_i^2}\,,\qquad
b = -\frac{2}{n}\sum_{i=1}^n{x_iy_i}\,,\qquad
c = \frac{1}{n}\sum_{i=1}^n{y_i^2}\,.}
$$
With these abbreviations, the cost function simplifies to:
$$
e = a\omega^2 + b\omega + c
$$
This expression contains only one parameter, $\omega$.
To minimize the cost we need to find the value of $\omega$ at which the cost function is smallest. Gradient descent finds it the way you would descend a mountain: at every point, take the steepest way down, and keep going until you reach the lowest point, which is where the cost is least. In other words, search along the direction of the negative gradient, the direction in which the cost decreases fastest.
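For this single-parameter quadratic (with $a=\frac{1}{n}\sum x_i^2 > 0$), the point that gradient descent converges toward can also be written in closed form by setting the derivative to zero:
$$
\frac{\partial e}{\partial \omega} = 2a\omega + b = 0
\quad\Longrightarrow\quad
\omega^{*} = -\frac{b}{2a}
$$
The iterative method below matters precisely because such a closed form is rarely available for more complicated cost functions.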

**Why is it called a gradient rather than a derivative?**
In this example there is only one parameter $\omega$, but in general there are many, $\omega_1, \omega_2,\cdots,\omega_j$. Stacking the partial derivatives with respect to each parameter gives a vector, the gradient; moving against the gradient is the direction in which the cost falls fastest.

Next, choose an initial point and slide downhill from it to the lowest point. Here the independent variable is $\omega$ and the dependent variable is the cost (loss) function $e$: we look for the $\omega$ along the horizontal axis that makes $e$ smallest. The update step is:
$$ \omega_{n+1} = \omega_{n}-\alpha \frac{\partial e(\omega)}{\partial \omega} $$
where $\alpha$ is the learning rate, i.e. the step size (how far each step down the mountain goes). If the learning rate is a small fixed value, the descent is slow and the iterate may keep hovering around the lowest point without settling exactly on it; if it is too large, each update overshoots and misses the way down. The sensible trick is to scale the step by the slope (when there is only one parameter): the steeper the slope, the larger the step, and the flatter the slope, the more carefully you walk downhill.
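
As a minimal sketch of this update rule, the following loop minimizes a quadratic cost $e(\omega)=a\omega^2+b\omega+c$; the coefficients, starting point, and learning rate are made-up values purely for illustration:

# Gradient descent on e(w) = a*w^2 + b*w + c (illustrative, made-up coefficients)
a, b, c = 2.0, -8.0, 3.0
w = 0.0          # initial guess
alpha = 0.1      # learning rate (step size)

for _ in range(100):
    grad = 2 * a * w + b     # de/dw
    w = w - alpha * grad     # w_{n+1} = w_n - alpha * de/dw

print("w after descent:", w)                  # approaches the minimizer -b/(2a) = 2.0
print("closed-form minimizer:", -b / (2 * a))

With this $\alpha$ the iterate converges steadily; a much larger $\alpha$ makes each update overshoot to the other side of the minimum (and beyond $\alpha = 1/a$ this iteration diverges), matching the discussion above.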

Written as a function of $\omega$:
$$
e(\omega) = \frac{1}{n}\sum_{i=1}^n{(y_i-\widehat{y_i})^2}
= \frac{1}{n}\sum_{i=1}^n\left({y_i - f_\omega(x_i)}\right)^2
$$
Differentiating with the chain rule,
$$
\frac{\partial e(\omega)}{\partial \omega} = -\frac{1}{n}\sum_{i=1}^n 2\left[y_i - f_\omega(x_i)\right]\cdot \frac{\partial f_\omega(x_i)}{\partial \omega}
$$
where $\frac{\partial f_\omega(x_i)}{\partial \omega}= x_i$, so

$$
\frac{\partial e(\omega)}{\partial \omega} = -\frac{2}{n}\sum_{i=1}^n \left[y_i - f_\omega(x_i)\right] x_i
$$

and the update rule becomes

$$
\omega_{n+1} = \omega_{n} + \alpha\,\frac{2}{n}\sum_{i=1}^n \left[y_i - f_\omega(x_i)\right] x_i
$$
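
Putting the derivation together, here is a small sketch that fits $y=\omega x$ to synthetic data with the update above; the data (true slope 3 plus a little noise) and every setting are invented purely for illustration:

import numpy as np

# Synthetic data around y = 3x (values are illustrative)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
y = 3.0 * x + rng.normal(0.0, 0.1, size=x.shape)

w = 0.0        # initial slope
alpha = 0.01   # learning rate
n = len(x)

for _ in range(1000):
    grad = -(2.0 / n) * np.sum((y - w * x) * x)  # de/dw from the derivation above
    w -= alpha * grad                            # w_{n+1} = w_n - alpha * de/dw

print("fitted slope:", w)  # should end up close to 3.0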

### Application example
Calculate the minimum of $f(x,y) = x\sin(x)+x\cos(y)+y^2-5y$ (the code below searches for a local minimum by gradient descent).

#### Algebra

import math

def compute_partial_derivatives(x, y):
    # Analytic partial derivatives of f(x, y) = x*sin(x) + x*cos(y) + y^2 - 5y
    df_dx = math.sin(x) + x * math.cos(x) + math.cos(y)
    df_dy = -x * math.sin(y) + 2 * y - 5
    return df_dx, df_dy

def gradient_descent():
    x = 0.0
    y = 0.0
    alpha = 0.01  # Learning rate
    iterations = 1000  # Maximum number of iterations

    for i in range(iterations):
        df_dx, df_dy = compute_partial_derivatives(x, y)

        x -= alpha * df_dx
        y -= alpha * df_dy

    f_value = x * math.sin(x) + x * math.cos(y) + y**2 - 5 * y

    return f_value, x, y

result, x_min, y_min = gradient_descent()

print("Smallest value of f(x, y):", result)
print("x value at minimum:", x_min)
print("y value at minimum:", y_min)

#### Symbolic (SymPy)

import sympy as sp

x, y = sp.symbols('x y')

def compute_partial_derivatives(f):
    df_dx = sp.diff(f, x)
    df_dy = sp.diff(f, y)
    return df_dx, df_dy

def gradient_descent():
    x_val = 0.0
    y_val = 0.0
    alpha = 0.01  # Learning rate
    iterations = 1000  # Maximum number of iterations

    # Compute the symbolic derivatives once, outside the loop
    df_dx, df_dy = compute_partial_derivatives(f)

    for i in range(iterations):
        # Substitute the current x_val and y_val into the symbolic derivatives
        df_dx_val = df_dx.subs([(x, x_val), (y, y_val)])
        df_dy_val = df_dy.subs([(x, x_val), (y, y_val)])

        x_val -= alpha * df_dx_val
        y_val -= alpha * df_dy_val
        
    return x_val, y_val

f = x * sp.sin(x) + x * sp.cos(y) + y**2 - 5 * y

x_min, y_min = gradient_descent()
result = x_min * sp.sin(x_min) + x_min * sp.cos(y_min) + y_min**2 - 5 * y_min
print("Smallest value of f(x, y):", result)
print("x value at minimum:", x_min)
print("y value at minimum:", y_min)

#### Drawing

import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return x * np.sin(x) + x * np.cos(y) + y**2 - 5*y

# Create a grid of x and y values
x = np.linspace(-10, 10, 100)
y = np.linspace(-10, 10, 100)
X, Y = np.meshgrid(x, y)

# Evaluate the function f(x, y) for each (x, y) pair
Z = f(X, Y)

# Plot the surface
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')

# Set labels and title
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('f(x, y)')
ax.set_title('f(x, y) = x*sin(x) + x*cos(y) + y^2 - 5y')

# Show the plot
plt.show()

Origin blog.csdn.net/m0_60461719/article/details/131353891