PyTorch Deep Learning Practice | Lecture 3 | Gradient Descent

I am currently working through the PyTorch practice course by "Master Liu" on Bilibili.

Website: https://www.bilibili.com/video/BV1Y7411d7Ys?p=3&vd_source=32b3ab6f83a7264145dc021d4ff722f6

1. Gradient Descent

Notes:

1. The GD (gradient descent) algorithm iterates in the direction opposite to the gradient.

2. The final result may be a local optimum (as shown in the figure).
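
Note 2 can be made concrete with a small sketch. The function f(x) = x^4 - 3x^2 + x below is not from the lecture; it is an assumed example, chosen only because it has two minima. The helper name `descend`, the step count, and the starting points are likewise illustrative.

```python
# Illustrative non-convex function (an assumption, not from the lecture).
def f(x):
    return x ** 4 - 3 * x ** 2 + x


# Its derivative, worked out by hand.
def df(x):
    return 4 * x ** 3 - 6 * x + 1


# Plain gradient descent: repeatedly step against the gradient.
def descend(x, lr=0.01, steps=1000):
    for _ in range(steps):
        x -= lr * df(x)
    return x


x_right = descend(2.0)   # starts to the right of both minima
x_left = descend(-2.0)   # starts to the left of both minima

print("start  2.0 -> x =", round(x_right, 4), "f(x) =", round(f(x_right), 4))
print("start -2.0 -> x =", round(x_left, 4), "f(x) =", round(f(x_left), 4))
```

Both runs stop where the derivative is close to zero, but the run started at 2.0 settles near x = 1.13, which is only a local minimum, while the run started at -2.0 reaches the lower minimum near x = -1.30.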

# Gradient descent
import matplotlib.pyplot as plt

# Training data
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# Initial guess for the weight
w = 1.0

# Prediction function (linear model without bias)
def forward(x):
    return x * w


# Cost function: mean squared error over all training samples
def cost(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        y_pred = forward(x)  # prediction for this sample
        total += (y_pred - y) ** 2
    return total / len(xs)


# Gradient of the cost with respect to w, averaged over all samples
def gradient(xs, ys):
    grad = 0.0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

print("Predict (before training)", 4, forward(4))
epoch_list = []
cost_list = []

# Gradient descent iterations
for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val  # 0.01 is the hand-picked learning rate
    print('Epoch=', epoch, 'w=', w, 'loss=', cost_val)
    epoch_list.append(epoch)
    cost_list.append(cost_val)

print('Predict(after training)', 4, forward(4))

# Plot the cost per epoch
plt.plot(epoch_list, cost_list)
plt.xlabel('epoch')
plt.ylabel('cost')
plt.show()
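
The expression used in `gradient` above follows from differentiating the cost with respect to w. As a short derivation, with N for the number of samples and \alpha for the learning rate (0.01 in the script); both symbols are notation introduced here, not taken from the original notes:

```latex
\begin{aligned}
\mathrm{cost}(w) &= \frac{1}{N}\sum_{n=1}^{N}\left(x_n w - y_n\right)^2 \\
\frac{\partial\,\mathrm{cost}}{\partial w} &= \frac{1}{N}\sum_{n=1}^{N} 2\,x_n\left(x_n w - y_n\right) \\
w &\leftarrow w - \alpha\,\frac{\partial\,\mathrm{cost}}{\partial w}
\end{aligned}
```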

2. Stochastic Gradient Descent

The difference between the two:

1. GD computes the gradient for every training sample at the same w and then averages them. The per-sample gradients do not depend on one another, so they can be computed in parallel.

2. SGD computes the gradient for a single training sample and updates w immediately, so each gradient depends on the result of the previous update.
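
The difference shows up after a single pass over the data. The following is a minimal sketch using the same three samples and learning rate as the scripts in this post; the names `sample_gradient`, `w_gd`, and `w_sgd` are introduced here for illustration only.

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
lr = 0.01  # same learning rate as in the scripts


def sample_gradient(x, y, w):
    # Gradient of (x * w - y) ** 2 with respect to w for one sample.
    return 2 * x * (x * w - y)


# GD: average the per-sample gradients at a fixed w, then update once.
w_gd = 1.0
grads = [sample_gradient(x, y, w_gd) for x, y in zip(x_data, y_data)]
w_gd -= lr * sum(grads) / len(grads)

# SGD: update after every sample, so each gradient sees the latest w.
w_sgd = 1.0
for x, y in zip(x_data, y_data):
    w_sgd -= lr * sample_gradient(x, y, w_sgd)

print("w after one epoch of GD: ", w_gd)
print("w after one epoch of SGD:", w_sgd)
```

With these three samples, GD makes one update per epoch (w moves from 1.0 to about 1.0933), while SGD makes three (w ends near 1.2607).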

# Stochastic gradient descent
import matplotlib.pyplot as plt

# Training data
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# Initial guess for the weight
w = 1.0

# Prediction function (linear model without bias)
def forward(x):
    return x * w

# Loss function: squared error of a single training sample
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

# Gradient of the loss with respect to w for a single training sample
def gradient(x, y):
    grad = 2 * x * (x * w - y)
    return grad

print("Predict (before training)", 4, forward(4))
epoch_list = []
loss_list = []

# SGD iterations: update w after every single sample
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        grad_val = gradient(x, y)
        w -= 0.01 * grad_val  # 0.01 is the hand-picked learning rate
        print("\tgrad:", x, y, grad_val)
        loss_val = loss(x, y)  # loss of this sample after the update

        epoch_list.append(epoch)
        loss_list.append(loss_val)
    print("progress:", epoch, 'w=', w, "loss=", loss_val)

print('Predict(after training)', 4, forward(4))

# Plot the per-sample loss against the epoch
plt.plot(epoch_list, loss_list)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()
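
The loop above visits the samples in the same order every epoch. A common variant, shown here as a sketch and not as the lecture's code, shuffles the visiting order each epoch with `random.shuffle` from the standard library. The seed value and the `samples` list are assumptions made for this example.

```python
import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

random.seed(0)  # arbitrary seed, only to make the run repeatable
w = 1.0
samples = list(zip(x_data, y_data))

for epoch in range(100):
    random.shuffle(samples)  # new visiting order every epoch
    for x, y in samples:
        # Same per-sample update as in the script above.
        w -= 0.01 * 2 * x * (x * w - y)

print("w =", w, "prediction for x = 4:", 4 * w)
```

On this noise-free data set the weight still approaches 2.0. The order matters more on noisy data, where a fixed order can bias the sequence of updates.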