dejahu's Deep Learning Study Notes 05: Deep Learning Computation in PyTorch

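Before we start, a quick plug for an earlier post of mine on building an object classification model with tensorflow2. It walks through dataset collection, model construction, and model usage in detail, and together with the accompanying video you can quickly build your own object classifier. Go give it a try!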

Train your own classification dataset with tensorflow2.3, step by step (dejahu's blog, CSDN)

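The previous chapters focused on basic model construction and some of the tricks involved in training. In today's lecture, the leaf classification task showed quite a few competition techniques: it is not just transfer learning, and the choice of optimizer, data augmentation, and so on all matter. I need to set aside time later to go back over the house price prediction and leaf classification competitions, to build up some experience for entering competitions in my second year of graduate school.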

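This section is about how code is put together in PyTorch. Building a deep learning model is a lot like playing with LEGO: you need to know what each component is, and then connect the bricks one step at a time.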

Model Construction

Simple model construction with nn.Sequential

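PyTorch model construction uses the concepts of layers and blocks: several layers make up a block, and several blocks make up the final model. The main point of defining blocks is code reuse, and blocks appear again later in the VGG and ResNet networks. Layers and blocks are chained together with the nn.Sequential container. For example, here is a multilayer perceptron: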

import torch
from torch import nn
from torch.nn import functional as F

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

X = torch.rand(2, 20)
net(X)

In this code, nn.Sequential defines a special kind of Module: a model whose layers are executed in order.
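To make "executed in order" concrete, here is a small sketch (the names `manual` and `out` are my own) that pushes the input through the layers one at a time and compares the result with calling the container directly.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)

# Apply each layer by hand, in the order it was passed to nn.Sequential.
manual = X
for layer in net:
    manual = layer(manual)

out = net(X)
print(out.shape)                    # torch.Size([2, 10])
print(torch.allclose(out, manual))  # True
```

Iterating over the container yields the layers in insertion order, which is all that calling `net(X)` relies on.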

Complex model construction with nn.Module

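Of course, not every model is a stack of layers run from start to finish. Some more complex networks, such as Siamese networks, involve shared layers, so the definition of the model's layers has to be separated from the computation it performs. The way to do this is to define a new model class that inherits from nn.Module, declare the model's layers in the __init__ method, and define the forward pass in the forward method, as in the code below: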

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

To use the model we first instantiate it; each call then runs the forward function, which invokes these layers.

net = MLP()
net(X)

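The sequential block Sequential is simple to use, and under the hood it is also built by inheriting from nn.Module. A simplified version looks like this: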

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, block in enumerate(args):
            # _modules is an ordered dict of name -> submodule; names must be strings
            self._modules[str(idx)] = block

    def forward(self, X):
        for block in self._modules.values():
            X = block(X)
        return X

net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
net(X)

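We can also define more complex computation in the forward function. There is no need to worry about gradients, since the framework's automatic differentiation computes them for us: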

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

net = FixedHiddenMLP()
net(X)

Blocks of different kinds can also be mixed and matched:

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        return self.linear(self.net(X))

chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixedHiddenMLP())
chimera(X)

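Note: so far, the parameters being learned are those of the layers defined in __init__, so every layer that carries weights needs to be declared in __init__. forward can define computations too, but those are simple, fixed operations with no learnable state of their own.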
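A quick way to see this distinction is to list what the module has registered as parameters. The sketch below uses a trimmed version of FixedHiddenMLP (I dropped the while loop to keep it short): the plain tensor rand_weight does not show up, only the nn.Linear declared in __init__ does.

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain tensor attribute: used in forward, but not a learnable parameter.
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        # A layer declared in __init__: its weight and bias are registered.
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        return self.linear(X).sum()

net = FixedHiddenMLP()
names = [name for name, _ in net.named_parameters()]
print(names)  # ['linear.weight', 'linear.bias']
```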

Parameter Management

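The whole point of training is to optimize the model's parameters, so knowing how to manage them matters a lot. This section covers how to inspect a model's parameters and how to save them.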

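Summary: the goal is always to locate the layer first. Once you have the layer, its state_dict is a dictionary that pins down the weight w and the bias b, and from each parameter you can get its values (data) and its gradient (grad).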

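We start with a multilayer perceptron with a single hidden layer. It takes an input of size 4, has a hidden layer of 8 neurons followed by a ReLU activation, and produces an output of size 1, which would suit a binary classification task, for example. The code is as follows: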

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)
print(net)
print(net[0].state_dict())
print(net[1].state_dict())  # this layer is the ReLU activation; it has no parameters to store, but it still counts as a layer
print(net[2].state_dict())
# Output
Sequential(
  (0): Linear(in_features=4, out_features=8, bias=True)
  (1): ReLU()
  (2): Linear(in_features=8, out_features=1, bias=True)
)
OrderedDict([('weight', tensor([[-0.2621, -0.4443, -0.0113,  0.1603],
        [ 0.0941,  0.3402, -0.2206, -0.0210],
        [ 0.3453,  0.1936, -0.4182, -0.4882],
        [-0.2145,  0.1083,  0.3432,  0.2874],
        [ 0.4649,  0.4542, -0.1082,  0.3344],
        [-0.3602,  0.2645,  0.3924, -0.4864],
        [ 0.0332, -0.4944,  0.0177,  0.4426],
        [-0.1140, -0.2459, -0.3820,  0.2194]])), ('bias', tensor([-0.1371,  0.3342, -0.1522, -0.2517, -0.4342,  0.3827, -0.2339,  0.4428]))])
OrderedDict()
OrderedDict([('weight', tensor([[ 0.3125, -0.2123, -0.2490, -0.0613,  0.1593, -0.1922,  0.0933, -0.2504]])), ('bias', tensor([-0.0281]))])


Parameter access

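This network is a Sequential model, that is, a layer-by-layer structure in which every layer has an index, so a specific layer can be reached by indexing. Each fully connected layer consists of a weight w and a bias b, stored as a dictionary. The parameters of the second fully connected layer (index 2 in the model) can therefore be accessed as follows.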

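.state_dict() gives access to all parameters of the layer but contains no information about the network structure. This will come up again when saving models.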

print(net[2].state_dict())
# Output:
OrderedDict([('weight', tensor([[ 0.1635, -0.1104,  0.0472, -0.0757,  0.2884, -0.3521,  0.2227,  0.0880]])), ('bias', tensor([-0.0959]))])

.weight and .bias access the layer's weight and bias respectively. Each of them in turn holds its values (data) and its gradient (grad).

print(type(net[2].bias))
print(net[2].bias)
print(net[2].bias.data)
# Output:
<class 'torch.nn.parameter.Parameter'>
Parameter containing:
tensor([-0.0959], requires_grad=True)
tensor([-0.0959])
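The output above only shows data. As a small sketch of the grad side (variable names are my own), the gradient is empty until a backward pass has run, after which it is filled in:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))

grad_before = net[2].bias.grad  # no backward pass has run yet
net(X).sum().backward()         # scalar: sum of the two outputs
grad_after = net[2].bias.grad   # d(sum of outputs) / d(bias)

print(grad_before)  # None
print(grad_after)   # tensor([2.])
```

The value is 2 because the bias is added once to each of the two samples in the batch, and the scalar being differentiated is their sum.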

.named_parameters() can be used to access all parameters at once:

print(*[(name, param.shape) for name, param in net[0].named_parameters()])
print(*[(name, param.shape) for name, param in net.named_parameters()])
# Output:
('weight', torch.Size([8, 4])) ('bias', torch.Size([8]))
('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8])) ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))
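One common use of iterating over the parameters is counting them. A minimal sketch for the same network (the name `total` is my own):

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Sum the number of elements in every parameter tensor.
total = sum(p.numel() for p in net.parameters())
print(total)  # 49 = (4*8 + 8) + (8*1 + 1)
```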

Building on the previous section, we can also build nested blocks, that is, blocks wrapped inside a network:

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4),
                         nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        # nest the blocks here
        net.add_module(f'block {i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))
rgnet(X)

The network structure looks like this:

Sequential(
  (0): Sequential(
    (block 0): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 1): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 2): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 3): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
  )
  (1): Linear(in_features=4, out_features=1, bias=True)
)

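As before, the goal is to locate the layer first. Once you have the layer, its state_dict pins down the weight w and the bias b, along with their values (data) and gradients (grad). Here we read the bias of the first linear layer in the second block: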

rgnet[0][1][0].bias.data
# Output
tensor([-0.4900, -0.4889,  0.3830, -0.0363,  0.2097, -0.0641, -0.1053, -0.3248])
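The same tensor can also be reached by name through the state_dict of the whole network, whose keys are the nested module names joined by dots. A sketch, rebuilding the nested network so it runs on its own (`by_index` and `by_name` are my own names):

```python
import torch
from torch import nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block {i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))

by_index = rgnet[0][1][0].bias.data
# Key layout: <outer index>.<block name>.<layer index>.<parameter name>
by_name = rgnet.state_dict()['0.block 1.0.bias']
print(torch.equal(by_index, by_name))  # True
```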

Parameter initialization

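Earlier chapters discussed how sensible weight initialization makes a model more stable, for example the Xavier initialization introduced before. Parameter initialization lives mainly in the nn.init module. By default, PyTorch initializes weight and bias matrices uniformly from a range computed from the input and output dimensions.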
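As a rough check of the default behavior, the sketch below inspects a freshly created nn.Linear. The bound of 1/sqrt(in_features) is my understanding of the default in the PyTorch versions I am familiar with, not something stated in these notes, so treat it as an assumption.

```python
import math
import torch
from torch import nn

layer = nn.Linear(4, 8)

# Assumed default bound for nn.Linear: 1 / sqrt(fan_in), i.e. 0.5 for 4 inputs.
bound = 1 / math.sqrt(layer.in_features)
max_w = layer.weight.abs().max().item()
max_b = layer.bias.abs().max().item()
print(bound, max_w, max_b)
```

Both maxima should come out no larger than the bound, and they change every run because the initialization is random.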

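Other built-in initializers can be used as well. Here, for example, all weights are initialized as Gaussian random variables with standard deviation 0.01 and all biases are set to 0. The parameter to be initialized is passed to the initializer.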

def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]
# Output
(tensor([ 0.0104,  0.0022, -0.0064, -0.0095]), tensor(0.))

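Here is Xavier initialization. We first check the type of the layer and then apply the initializer, and different initializers can be applied to different layers: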

def xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)

net[0].apply(xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)
# Output:
tensor([ 0.1484,  0.2210,  0.3737, -0.3520])
tensor([[42., 42., 42., 42., 42., 42., 42., 42.]])

Custom initialization is not covered here. If you are interested, see: 5.2. Parameter Management, Dive into Deep Learning 2.0.0-alpha1 documentation (d2l.ai)

Parameter tying

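Parameter tying is used to share parameters between layers. In the Siamese network mentioned earlier, for example, one of the layers has shared parameters. In the code below, shared is the shared layer.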

# We need to give the shared layer a name so that we can refer to its parameters.
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), shared, nn.ReLU(), shared,
                    nn.ReLU(), nn.Linear(8, 1))
net(X)
# check whether the parameters are the same
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# make sure they are actually the same object, not just equal in value
print(net[2].weight.data[0] == net[4].weight.data[0])

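This example shows that the parameters of the second and third linear layers (net[2] and net[4]) are tied. They are not merely equal in value; they are represented by the same tensor, so changing one changes the other. You may wonder what happens to the gradients when parameters are tied. Since the model parameters carry their gradients, the gradients from the two uses of the shared layer are added together during backpropagation.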
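The claim about gradients adding up can be checked on a toy case. This is a sketch with a deliberately tiny network of my own choosing: one 1x1 linear layer without bias, applied twice, so the output is w * (w * x) = w^2 * x and the gradient with respect to w should be 2 * w * x, the sum of the contributions from both uses.

```python
import torch
from torch import nn

shared = nn.Linear(1, 1, bias=False)
net = nn.Sequential(shared, shared)  # y = w * (w * x)

x = torch.tensor([[3.0]])
net(x).sum().backward()

w = shared.weight.item()
grad = shared.weight.grad.item()
expected = 2 * w * x.item()          # derivative of w^2 * x with respect to w

print(net[0].weight is net[1].weight)  # True: one tensor, used twice
print(abs(grad - expected) < 1e-5)     # True
```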

Custom Layers

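Just like building with bricks, one of the charms of deep learning is that you can design your own layers. You can design layers for your task and add them to your model, which also helps when writing papers. A custom layer still relies on the nn.Module class, and you need to implement the __init__ and forward methods.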

Layers without parameters

A layer without parameters is simple to design: you only need to define how it computes, as follows:

import torch
import torch.nn.functional as F
from torch import nn

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

Once designed, the layer can be used as a standalone brick or added to a larger model:

# use it on its own
layer = CenteredLayer()
layer(torch.FloatTensor([1, 2, 3, 4, 5]))
# build a more complex model
net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())

Note: in practice you will often see a parameter called eps, typically set to 1e-5. Its main purpose is to prevent division by zero.
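To illustrate where eps comes in, here is a hypothetical parameter-free layer, StandardizeLayer, which is not part of the original notes. It extends the centering idea by also dividing by the standard deviation, with eps added to the denominator so that a constant input does not produce a division by zero.

```python
import torch
from torch import nn

class StandardizeLayer(nn.Module):
    """Hypothetical example layer: centers the input and scales it by its std."""
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, X):
        # eps keeps the denominator away from zero when X is constant.
        return (X - X.mean()) / (X.std() + self.eps)

layer = StandardizeLayer()
out = layer(torch.ones(5))  # constant input, so the std is exactly 0
print(out)                  # tensor([0., 0., 0., 0., 0.])
```

Without eps the same input would give 0 / 0, which is nan.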

Layers with parameters

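A layer with parameters needs both the parameters and the computation on them to be defined, and it also needs to specify which arguments are passed in at instantiation. The design is somewhat similar to that of the blocks that come later, so you can compare when we get to VGG and ResNet.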

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # use the parameters directly (not .data) so autograd can track them
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)

Usage:

linear = MyLinear(5, 3)
linear.weight
# Output
Parameter containing:
tensor([[ 0.1476, -1.2550,  0.0803],
        [-0.8999, -1.3699,  0.4572],
        [-0.1212,  0.2888, -0.7945],
        [ 0.7072,  0.4077,  0.6760],
        [-0.3221,  1.4658,  0.8462]], requires_grad=True)

A custom layer can also be added to a full network model:

net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))
# Output
tensor([[0.],
        [0.]])

Reading and Writing Files

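Once training is done, the model needs to be saved so it can be used again. If you develop in jupyter, the model usually lives in a session and stays available as long as the session is open, but to use it next time it still has to be written to disk. Broadly there are two ways: save the model together with its parameters, or save only the parameters. Generally speaking, the first is cruder and produces a larger file, while the second takes a little more work but produces a smaller one.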

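Note: in PyTorch, use the eval() and train() methods to switch the model between evaluation and training mode. This changes the behavior of layers such as dropout and batch normalization. It does not turn gradient computation on or off; that is what torch.no_grad() is for.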
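A short sketch of what each switch actually does, using a small network with a dropout layer that I made up for the purpose: eval() makes dropout deterministic but still tracks gradients, while torch.no_grad() is what disables gradient tracking.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
X = torch.ones(2, 4)

net.eval()                            # dropout becomes the identity
same = torch.equal(net(X), net(X))    # repeated calls now agree
tracks_grad = net(X).requires_grad    # eval() alone does not disable autograd

with torch.no_grad():                 # this is what turns gradient tracking off
    no_grad_out = net(X)

net.train()                           # back to training mode, dropout active again
print(same, tracks_grad, no_grad_out.requires_grad)  # True True False
```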

Saving everything

# save the model
torch.save(model, 'model.pt')
# load the model
model = torch.load('model.pt')

Saving only the weights

When only the weights are saved, the model has to be instantiated first before loading:

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)
torch.save(net.state_dict(), 'mlp.params')
clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))
clone.eval()
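To confirm that the reloaded model really carries the same weights, its output can be compared with the original on the same input. The sketch below repeats the save and load round trip in a temporary directory (my own choice, so it leaves no file behind) and then checks the outputs.

```python
import os
import tempfile

import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mlp.params')
    torch.save(net.state_dict(), path)        # parameters only
    clone = MLP()                             # fresh instance, random weights
    clone.load_state_dict(torch.load(path))   # overwrite with the saved ones
clone.eval()

match = torch.allclose(clone(X), Y)
print(match)  # True
```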

Using GPUs

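CPUs are very general-purpose and can handle all kinds of tasks, but they are slow for deep learning workloads, so we use GPUs to speed up training. This section covers how to use GPUs in torch.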

I have also written a separate post on setting up the GPU versions of PyTorch and tensorflow on Windows. If you are interested, take a look:

Installing the GPU versions of Tensorflow and Pytorch on Windows in 2021 (dejahu's blog, CSDN)

To view GPU information, run nvidia-smi in cmd. Note that the latest 30-series cards require cuda11 or newer.

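In PyTorch, the CPU and GPU are represented by torch.device('cpu') and torch.device('cuda'). Note that the cpu device stands for all physical CPUs and memory, which means PyTorch computations will try to use all CPU cores. A gpu device, however, represents only one card and its memory. With multiple GPUs, torch.device(f'cuda:{i}') denotes the i-th GPU (i starts from 0). Also, cuda:0 and cuda are equivalent by default.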

import torch
from torch import nn

torch.device('cpu'), torch.device('cuda'), torch.device('cuda:1')
# Output
(device(type='cpu'), device(type='cuda'), device(type='cuda', index=1))

The number of available GPUs can be queried with the following function:

torch.cuda.device_count()

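A practical tip: in everyday work it is best to define one global variable that decides your device, and then move variables onto it at runtime with the to(device) method.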

For example, the functions below pick the device flexibly depending on what hardware is available:

def try_gpu(i=0):  #@save
    """Return gpu(i) if it exists, otherwise return cpu()."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

def try_all_gpus():  #@save
    """Return all available GPUs, or [cpu(),] if there is no GPU."""
    devices = [
        torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

try_gpu(), try_gpu(10), try_all_gpus()
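The "one global device plus to(device)" tip can be sketched as follows. The network and tensor shapes are just placeholders of my own; the point is that the model and its inputs must end up on the same device, and that the code still runs on a machine without a GPU.

```python
import torch
from torch import nn

# One global device: the first GPU if there is one, otherwise the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1)).to(device)
X = torch.rand(2, 4).to(device)  # inputs have to live on the same device as the model

Y = net(X)
print(Y.device)
```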

The .device attribute shows which device a parameter lives on, as follows:

net[0].weight.data.device

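On a shared server it is better to specify the GPU explicitly: it makes things easier to manage, and it avoids causing trouble for other users.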