Contents

NEURAL NETWORKS

Define the network
loss function
Backprop
Update the weights
REF

NEURAL NETWORKS

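Neural networks can be built with the torch.nn package.

The previous article introduced autograd. The nn package relies on autograd to define models and differentiate them. An nn.Module contains layers and a method forward(input) that returns the output.

For example, consider this network for classifying digit images:
[figure]

This is a simple feed-forward network. It takes the input, passes it through several layers one after another, and finally produces the output.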

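A typical training procedure for a neural network is as follows:

Define a neural network that has learnable parameters (weights)
Iterate over a dataset of inputs
Process each input through the network
Compute the loss (how far the output is from the correct answer)
Propagate gradients back to the network's parameters
Update the network's weights, typically with a simple rule: weight = weight - learning_rate * gradient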
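
As a minimal sketch of the last three steps, the update rule can be applied by hand to a single weight. The one-parameter loss w ** 2 and the learning rate 0.1 are illustrative assumptions, not part of the network used in this article.

```python
import torch

# Hypothetical one-parameter "network": loss(w) = w ** 2, so d(loss)/dw = 2 * w.
w = torch.tensor(2.0, requires_grad=True)
learning_rate = 0.1  # illustrative value

loss = w ** 2        # forward pass and loss
loss.backward()      # backpropagation stores d(loss)/dw = 4.0 in w.grad

with torch.no_grad():            # the update itself should not be tracked by autograd
    w -= learning_rate * w.grad  # weight = weight - learning_rate * gradient

print(w.item())  # approximately 1.6, i.e. 2.0 - 0.1 * 4.0
```

The weight moves from 2.0 toward the minimum at 0, which is all that a gradient descent step does.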

Define the network

import torch 
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

output

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

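The class above only defines the forward function; the backward function, where gradients are computed, is defined automatically by autograd. Any tensor operation can be used inside forward.

The learnable parameters of the model are returned by net.parameters().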

params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

# output:
# 10
# torch.Size([6, 1, 3, 3])
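
To see where the 10 entries come from, the parameters can also be listed by name. The sketch below uses a stand-in nn.Sequential with the same layer shapes as Net (an assumption made to keep the block self-contained); it is only inspected, never called.

```python
import torch.nn as nn

# Stand-in container with the same layer shapes as Net. It has no pooling or
# flattening step, so it is meant for inspecting parameters only.
model = nn.Sequential(
    nn.Conv2d(1, 6, 3),
    nn.Conv2d(6, 16, 3),
    nn.Linear(16 * 6 * 6, 120),
    nn.Linear(120, 84),
    nn.Linear(84, 10),
)

# named_parameters() yields (name, tensor) pairs. Each of the 5 layers
# contributes a weight and a bias, which gives 10 parameter tensors.
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# Total number of learnable scalars across all parameter tensors.
total = sum(p.numel() for p in model.parameters())
print(len(shapes), total)
```

With the real Net class the names would be conv1.weight, conv1.bias, and so on, instead of the numeric prefixes that nn.Sequential assigns.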

Now try a random 32x32 input. This network (LeNet) expects a 32x32 input, so to train it on the MNIST dataset, resize the images to 32x32 first.

input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

# output:
# tensor([[ 0.0399, -0.0856,  0.0668,  0.0915,  0.0453, -0.0680, -0.1024,
#           0.0493, -0.1043, -0.1267]], grad_fn=<AddmmBackward>)
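
The link between the 32x32 input and the 16 * 6 * 6 = 576 input features of fc1 can be checked by following the tensor shape through the convolution and pooling steps. This is a sketch that recreates only the two convolution layers with the same arguments as in Net.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)
shapes = [tuple(x.shape)]

x = F.relu(conv1(x))       # 3x3 convolution without padding: 32 -> 30
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)     # 2x2 max pooling halves the size: 30 -> 15
shapes.append(tuple(x.shape))
x = F.relu(conv2(x))       # 15 -> 13
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)     # 13 -> 6 (rounded down)
shapes.append(tuple(x.shape))

flat = x.view(x.size(0), -1)   # flatten everything except the batch dimension
for s in shapes:
    print(s)
print(flat.shape)              # 16 channels * 6 * 6 = 576 features per sample
```

A different input size would produce a different flattened length, and fc1 would then reject it with a shape mismatch.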

Zero the gradient buffers of all parameters, then backpropagate with random gradients:

net.zero_grad()
out.backward(torch.randn(1, 10))
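
The argument passed to backward is needed because out is not a scalar: it supplies one weight per output element, and autograd propagates that weighted sum back to the inputs. A small sketch with a toy function y = 2 * x (an assumption chosen so the result is easy to check by hand):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = 2 * x   # non-scalar output; each y[i] depends only on x[i], with slope 2

# Arbitrary weights, one per element of y (illustrative values).
v = torch.tensor([1.0, 0.1, 0.01])
y.backward(v)   # stores the vector-Jacobian product in x.grad

print(x.grad)   # equals 2 * v for this function
```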

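Tips: torch.nn only supports mini-batches. The whole torch.nn package expects inputs that are a mini-batch of samples, not a single sample.

For example, nn.Conv2d takes a 4D tensor of shape nSamples x nChannels x Height x Width.

For a single sample, use input.unsqueeze(0) to add a fake batch dimension.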
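
A short sketch of that step, assuming a single one-channel 32x32 image and a convolution layer with the same arguments as conv1:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)

sample = torch.randn(1, 32, 32)   # one image: nChannels x Height x Width
batch = sample.unsqueeze(0)       # insert a batch dimension of size 1 at position 0
out = conv(batch)

print(sample.shape)  # 3 dimensions
print(batch.shape)   # 4 dimensions: nSamples x nChannels x Height x Width
print(out.shape)
```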

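Recap:

torch.Tensor - a multi-dimensional array with support for autograd operations such as backward(). It also holds the gradient with respect to the tensor.
nn.Module - a neural network module. A convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, and so on.
nn.Parameter - a kind of Tensor that is automatically registered as a parameter when it is assigned as an attribute to a Module.
autograd.Function - implements the forward and backward definitions of an autograd operation. Every Tensor operation creates at least one Function node, which connects to the functions that created the Tensor and encodes its history.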
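
The automatic registration of nn.Parameter can be seen in a small sketch. The Scale module and its attribute names are hypothetical and exist only for this illustration.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module, not part of the article's network
    def __init__(self):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(3))  # registered as a parameter
        self.offset = torch.zeros(3)             # plain tensor attribute: not registered

    def forward(self, x):
        return x * self.gain + self.offset

m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)                 # only 'gain' is listed
print(m.gain.requires_grad)  # parameters require gradients by default
```

Only gain would be seen by an optimizer built from m.parameters(); the plain tensor offset would stay fixed.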

So far, we have covered:

Defining a neural network
Processing inputs and calling backward

Still left:

Computing the loss
Updating the weights of the network

loss function

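A loss function takes an (output, target) pair as input and computes a value that estimates how far the output is from the target.

The nn package provides many different loss functions. A simple one is nn.MSELoss, which computes the mean-squared error between the output and the target.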

output = net(input)
target = torch.randn(10)  # a dummy target, for this example
target = target.view(1, -1)  # give the target the same shape as the output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

# tensor(1.0263, grad_fn=<MseLossBackward>)
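
To make the definition concrete, the same value can be computed by hand as the mean of the squared differences. The sketch uses random stand-in tensors rather than the network output, and relies on the default reduction of nn.MSELoss, which averages over all elements.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(1, 10)   # stand-in for net(input)
target = torch.randn(1, 10)

loss = nn.MSELoss()(output, target)
manual = ((output - target) ** 2).mean()   # mean of the squared differences

print(loss.item(), manual.item())          # the two values agree
```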

Now, if you follow loss backward through its .grad_fn attribute, you will see a graph of computations like this:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

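When loss.backward() is called, autograd differentiates the loss through the whole graph, and every tensor in the graph with requires_grad=True has the resulting gradient accumulated into its .grad attribute.

For illustration, follow a few steps backward: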

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of the Linear node (an AccumulateGrad leaf in the output below, not ReLU)

# output:
# <MseLossBackward object at 0x7f8dac1b4550>
# <AddmmBackward object at 0x7f8dac1b4a90>
# <AccumulateGrad object at 0x7f8dac1b4a90>
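
Instead of chaining next_functions by hand, the graph can be walked recursively. The sketch below uses a single nn.Linear layer rather than the full network, and the helper name walk is hypothetical. The exact node class names vary between PyTorch versions (for example, a trailing 0 may be appended), so no fixed output is shown.

```python
import torch
import torch.nn as nn

def walk(fn, depth=0, names=None):
    """Print the autograd graph below fn depth-first and collect node class names."""
    if names is None:
        names = []
    if fn is None:            # inputs that do not require gradients appear as None
        return names
    name = type(fn).__name__
    names.append(name)
    print("  " * depth + name)
    for next_fn, _ in fn.next_functions:   # each entry is a (node, input index) pair
        walk(next_fn, depth + 1, names)
    return names

layer = nn.Linear(4, 2)
loss = nn.MSELoss()(layer(torch.randn(1, 4)), torch.randn(1, 2))
names = walk(loss.grad_fn)
```

The leaves of the printed tree are AccumulateGrad nodes, one per parameter that receives a gradient.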

Backprop

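To backpropagate the error, all we need to do is call loss.backward(). Clear the existing gradients first, though; otherwise the new gradients are added to the ones already stored.

Demo time: call loss.backward() and look at the gradient of conv1's bias before and after the backward pass.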

net.zero_grad()     # zero the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

# output:
# conv1.bias.grad before backward
# tensor([0., 0., 0., 0., 0., 0.])
# conv1.bias.grad after backward
# tensor([ 0.0084,  0.0019, -0.0179, -0.0212,  0.0067, -0.0096])
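
The accumulation that zero_grad() guards against can be shown with a small sketch. It uses a single nn.Linear layer and a sum as a stand-in loss, both assumptions made to keep the numbers simple.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
x = torch.randn(2, 3)

layer(x).sum().backward()
first = layer.bias.grad.clone()        # gradient after one backward pass

layer(x).sum().backward()              # no zero_grad(): the new gradient is added
accumulated = layer.bias.grad.clone()  # twice the first value

layer.zero_grad()                      # clear the buffers before the next pass
layer(x).sum().backward()
fresh = layer.bias.grad.clone()        # same as the first value again

print(first, accumulated, fresh)
```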

Update the weights

The simplest update rule is stochastic gradient descent (SGD):

weight = weight - learning_rate * gradient

A simple implementation:

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
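
As a sanity check, the same manual update can be written with torch.no_grad() and compared against one step of optim.SGD on an identical copy of the model. The small linear model and the random data are assumptions for this sketch only.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(4, 1)
model_b = copy.deepcopy(model_a)   # identical starting weights
x, y = torch.randn(8, 4), torch.randn(8, 1)
criterion = nn.MSELoss()
learning_rate = 0.01

# Manual update: weight = weight - learning_rate * gradient
criterion(model_a(x), y).backward()
with torch.no_grad():              # keep the update out of the autograd graph
    for p in model_a.parameters():
        p -= learning_rate * p.grad

# The same step with plain SGD (no momentum, no weight decay)
optimizer = optim.SGD(model_b.parameters(), lr=learning_rate)
criterion(model_b(x), y).backward()
optimizer.step()

same = all(torch.allclose(a, b)
           for a, b in zip(model_a.parameters(), model_b.parameters()))
print(same)
```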

In practice you may also want to use other update rules, such as SGD, Nesterov-SGD, Adam, or RMSProp. The torch.optim package implements all of these, and it is simple to use:

import torch.optim as optim

# create the optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in the training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # update the parameters
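
Repeating these five lines gives a complete training loop. The sketch below runs 100 iterations on a small linear model with random data (assumptions chosen so the block runs quickly on its own) and records the loss at every step.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 1)            # stand-in for the network
inputs = torch.randn(16, 4)        # dummy data, as in the rest of the article
targets = torch.randn(16, 1)

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(inputs), targets)  # forward pass and loss
    loss.backward()                           # backpropagation
    optimizer.step()                          # parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])   # the loss at the end is lower than at the start
```

Swapping optim.SGD for another optimizer such as optim.Adam changes only the line that creates the optimizer; the loop body stays the same.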

REF

https://pytorch.apachecn.org/docs/1.4/blitz/neural_networks_tutorial.html
https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html
