Building Neural Networks with PyTorch (series)
Chapter 3, Section 1: Gradient Descent
1. Activation Functions
Sigmoid / Logistic
Derivative of the sigmoid:
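The sigmoid is defined as σ(x) = 1 / (1 + e^(-x)). Working through the standard derivation:

σ'(x) = e^(-x) / (1 + e^(-x))² = σ(x) · (1 - σ(x))

The derivative is expressible from the forward output alone, which makes it cheap to backpropagate through.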
torch.sigmoid
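A minimal sketch of torch.sigmoid in use (the sample values are assumed, not from the original):

import torch

a = torch.linspace(-100, 100, 10)   # extreme values to show saturation
torch.sigmoid(a)                    # outputs squashed into (0, 1); the ends saturate to 0 and 1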
Tanh
Derivative of tanh:
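Filling in the standard identities: tanh can be written in terms of the sigmoid, and its derivative in terms of itself:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)) = 2σ(2x) - 1

d/dx tanh(x) = 1 - tanh²(x)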
torch.tanh
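A matching sketch for torch.tanh (sample values assumed):

a = torch.linspace(-1, 1, 10)
torch.tanh(a)                       # outputs squashed into (-1, 1), zero-centered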
Rectified Linear Unit (ReLU)
Derivative of ReLU:
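The standard piecewise form:

ReLU(x) = max(0, x)

d/dx ReLU(x) = 0 for x < 0, and 1 for x > 0 (undefined at x = 0; implementations conventionally use 0 there). The constant gradient of 1 on the positive side avoids the vanishing gradients that sigmoid and tanh suffer from.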
F.relu
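A minimal sketch (sample values assumed); both the functional and the tensor-level forms exist:

import torch.nn.functional as F

a = torch.linspace(-1, 1, 10)
F.relu(a)                           # negative entries clamped to 0
torch.relu(a)                       # equivalent result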
2. Loss Functions and Their Gradients
Typical losses:
- Mean Squared Error
- Cross Entropy Loss
MSE
Gradient of the MSE loss:
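With a linear model ŷ = xw + b, the (unreduced) squared-error loss and its gradient with respect to w are:

loss = Σ (y - ŷ)²
∂loss/∂w = 2 Σ (y - ŷ) · (-∂ŷ/∂w) = -2 Σ (y - xw - b) · x

Note that torch.norm(y - ŷ, 2) is the L2 norm (the square root of this sum), not the MSE itself.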
autograd.grad
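A minimal sketch of computing this gradient with torch.autograd.grad (the scalar example is assumed, not from the original):

x = torch.ones(1)
w = torch.full([1], 2.0, requires_grad=True)  # w must be marked as differentiable
mse = F.mse_loss(torch.ones(1), x * w)        # target 1, prediction x*w = 2
torch.autograd.grad(mse, [w])                 # (tensor([2.]),), since d(w-1)²/dw = 2(w-1) = 2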
Note: before differentiating, the tensor must carry gradient information: call w.requires_grad_() in place, or create the tensor with requires_grad=True. If the flag is set after the graph was built, re-run the forward pass before calling grad.
loss.backward
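The same gradient via loss.backward(), which writes into each parameter's .grad attribute instead of returning the gradients (continuing the sketch above):

mse = F.mse_loss(torch.ones(1), x * w)        # rebuild the graph (the previous one was freed)
mse.backward()                                # accumulates into w.grad
w.grad                                        # tensor([2.])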
Gradient API:
- torch.autograd.grad(loss, [w1, w2, …]) returns [w1 grad, w2 grad, …]
- loss.backward() writes the gradients into w1.grad, w2.grad, …
Softmax
A soft version of max: it maps a score vector to a probability distribution, amplifying the larger entries while keeping every output positive.
Differentiating softmax:
- when i = j, the derivative ∂p_i/∂a_j is positive
- when i ≠ j, the derivative ∂p_i/∂a_j is negative
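Filling in the standard formulas: with p_i = e^(a_i) / Σ_k e^(a_k), the Jacobian is

∂p_i/∂a_j = p_i (1 - p_j)    for i = j    (positive)
∂p_i/∂a_j = -p_i p_j         for i ≠ j    (negative)

often written compactly as ∂p_i/∂a_j = p_i (δ_ij - p_j).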
F.softmax
Here retain_graph=True is passed (note: retain_graph, not retain_grad) so the computation graph is kept alive; without it, the second differentiation raises an error because the graph is freed after the first call.
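A minimal sketch of the kind of demo this note refers to (shapes and indices assumed):

a = torch.rand(3, requires_grad=True)
p = F.softmax(a, dim=0)                            # probabilities summing to 1
torch.autograd.grad(p[1], [a], retain_graph=True)  # keep the graph for the next call
torch.autograd.grad(p[2], [a])                     # would fail without retain_graph above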
3. The Perceptron
Single-output perceptron
Derivation of the gradient:
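Filling in the standard derivation for a sigmoid unit with squared-error loss: with output O = σ(Σ_j x_j w_j) and target t,

E = (1/2)(O - t)²
∂E/∂w_j = (O - t) · O(1 - O) · x_j

each weight's gradient is the output error scaled by the sigmoid derivative and by that weight's input.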
import torch
import torch.nn.functional as F

x = torch.randn(1, 10)                        # one sample with 10 features
w = torch.randn(1, 10, requires_grad=True)    # a single output unit
o = torch.sigmoid(x @ w.t())                  # shape [1, 1]
loss = F.mse_loss(torch.ones(1, 1), o)        # scalar loss against target 1
loss.backward()
w.grad
tensor([[ 0.0169, -0.0018, -0.0036, 0.0080, -0.0022, -0.0006, -0.0151, -0.0008,
0.0030, 0.0019]])
Multi-output Perceptron
Derivation:
A multi-output perceptron is essentially the output layer of a fully connected network:
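The standard extension of the single-output case: with O_k = σ(Σ_j x_j w_kj) and targets t_k,

E = (1/2) Σ_k (O_k - t_k)²
∂E/∂w_kj = (O_k - t_k) · O_k(1 - O_k) · x_j

each weight's gradient depends only on its own output node's error and its own input.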
x = torch.randn(1, 10)
w = torch.randn(3, 10, requires_grad=True)    # three output units
o = torch.sigmoid(x @ w.t())                  # shape [1, 3]
loss = F.mse_loss(torch.ones(1, 3), o)        # mean squared error over all 3 outputs
loss.backward()
w.grad
tensor([[-0.0660, -0.0477, -0.0214, -0.0425, 0.0230, -0.0326, -0.0285, -0.0139,
0.0227, -0.0181],
[-0.1109, -0.0802, -0.0359, -0.0715, 0.0387, -0.0547, -0.0479, -0.0234,
0.0381, -0.0304],
[-0.0386, -0.0279, -0.0125, -0.0249, 0.0135, -0.0191, -0.0167, -0.0082,
0.0133, -0.0106]])
4. The Chain Rule and Backpropagation
Chain rule
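The chain rule lets a network's gradient be assembled from per-layer local derivatives:

∂y/∂x = (∂y/∂u) · (∂u/∂x)

A minimal sketch verifying it numerically with autograd (the two-layer linear example is assumed, not from the original):

x = torch.tensor(1.)
w1 = torch.tensor(2., requires_grad=True)
b1 = torch.tensor(1.)
w2 = torch.tensor(2., requires_grad=True)
b2 = torch.tensor(1.)
y1 = x * w1 + b1                                               # first layer
y2 = y1 * w2 + b2                                              # second layer
dy2_dy1 = torch.autograd.grad(y2, [y1], retain_graph=True)[0]  # ∂y2/∂y1 = w2
dy1_dw1 = torch.autograd.grad(y1, [w1], retain_graph=True)[0]  # ∂y1/∂w1 = x
dy2_dw1 = torch.autograd.grad(y2, [w1])[0]                     # direct ∂y2/∂w1
dy2_dy1 * dy1_dw1 == dy2_dw1                                   # tensor(True): the products agree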
Backpropagation:
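Backpropagation is the chain rule applied layer by layer: each node's error signal δ is computed from the layer above, so one backward sweep yields every gradient. For the sigmoid/MSE setup used above, the standard recursion is

output layer:  δ_k = (O_k - t_k) · O_k(1 - O_k)
hidden layer:  δ_j = O_j(1 - O_j) · Σ_k w_jk δ_k
weight grad:   ∂E/∂w_ij = O_i · δ_j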
5. A Function-Optimization Example
Himmelblau function: f(x, y) = (x² + y - 11)² + (x + y² - 7)², a standard optimization benchmark with four global minima, all with f = 0.
Plot the function:
import numpy as np
from matplotlib import pyplot as plt

def himmelblau(x):
    return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2

x = np.arange(-6, 6, 0.1)
y = np.arange(-6, 6, 0.1)
print('x, y range:', x.shape, y.shape)
X, Y = np.meshgrid(x, y)                 # build the x, y coordinate grids
print('X, Y maps:', X.shape, Y.shape)
Z = himmelblau([X, Y])                   # evaluate on the whole grid at once
fig = plt.figure('himmelblau')
ax = fig.add_subplot(projection='3d')    # fig.gca(projection='3d') was removed in matplotlib 3.6
ax.plot_surface(X, Y, Z)
ax.view_init(30, 30)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()
Optimize the function:
x = torch.tensor([4., 0.], requires_grad=True)   # starting point determines which minimum is reached
optimizer = torch.optim.Adam([x], lr=1e-3)
for epoch in range(20000):
    pred = himmelblau(x)
    optimizer.zero_grad()                        # clear the gradients from the previous step
    pred.backward()
    optimizer.step()                             # x <- x - lr * (Adam-adapted) gradient
    if epoch % 2000 == 0:
        print('step {}: x = {}, f(x) = {}'.format(epoch, x.tolist(), pred.item()))
step 0: x = [3.999000072479248, -0.0009999999310821295], f(x) = 34.0
step 2000: x = [3.5741987228393555, -1.764183521270752], f(x) = 0.09904692322015762
step 4000: x = [3.5844225883483887, -1.8481197357177734], f(x) = 2.1100277081131935e-09
step 6000: x = [3.5844264030456543, -1.8481241464614868], f(x) = 2.41016095969826e-10
step 8000: x = [3.58442759513855, -1.848125696182251], f(x) = 2.9103830456733704e-11
step 10000: x = [3.584428310394287, -1.8481262922286987], f(x) = 9.094947017729282e-13
step 12000: x = [3.584428310394287, -1.8481265306472778], f(x) = 0.0
step 14000: x = [3.584428310394287, -1.8481265306472778], f(x) = 0.0
step 16000: x = [3.584428310394287, -1.8481265306472778], f(x) = 0.0
step 18000: x = [3.584428310394287, -1.8481265306472778], f(x) = 0.0
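From the initial point [4., 0.] the optimizer converges to (3.584428, -1.848126), one of the four minima; a different starting point can land in a different one.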
Reference: NetEase Cloud Classroom (网易云课堂) course