PyTorch Notes (3)

Learning PyTorch with Examples, Continued

This note continues from PyTorch Notes (2).

PyTorch's nn (neural network) module

From what we learned above, a neural network consists of a forward pass and a backward pass. We defined the nodes as Tensors and the edges as Functions; this combination of data (Tensors) and operations on that data (Functions) is what makes up our neural network. I find this very similar to a system block diagram in Signals and Systems or control theory.

Although we already have a way to build a neural network, those operations are too low-level. In the common case, when we just want to use well-established building blocks, we do not need to write out the computation ourselves; instead we want to focus on the structure, such as a Layer or a Block.

# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

Walking through the code:

  • Here we build the layers with torch.nn.Sequential. Sequential means a consecutive, top-to-bottom stack: it contains a linear layer, a ReLU activation layer, and a final linear output layer. This structure is the same as the network we built before.

  • When updating the parameters we use model.parameters() to get the model's learnable parameters, and we use each parameter's own grad to update its value. In other words, each param is an object that holds both its own value and the gradient saved from the backward pass (see the short sketch after this list).
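To make that concrete, here is a minimal sketch (assuming the model, loss_fn, x and y from the code above are in scope) that runs one backward pass and then inspects each parameter together with its gradient:

# A minimal sketch (assumes `model`, `loss_fn`, `x`, `y` from the code above).
# Each parameter Tensor holds its values; after backward(), param.grad holds
# the gradient of the loss with respect to that parameter.
loss = loss_fn(model(x), y)
model.zero_grad()
loss.backward()
for param in model.parameters():
    # value and gradient always have the same shape
    print(param.shape, param.grad.shape)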

Optimizers (the torch.optim package)

Remember how we updated the weights above? Inside with torch.no_grad(): we updated each parameter's value one by one. That is tedious, and it is only the most basic kind of update. In practice there are already many mature optimization algorithms, such as Adam, RMSProp, AdaGrad, and so on, and they are all available in the optim package.
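As a quick, illustrative sketch (separate from the full example below, assuming a model like the one above is in scope, with hyperparameter values chosen only for demonstration), all of these optimizers are constructed the same way: you hand them the model's parameters plus the algorithm's hyperparameters:

# Illustrative only: each optimizer takes the model's parameters plus
# algorithm-specific hyperparameters (the values here are just examples).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
optimizer = torch.optim.Adagrad(model.parameters(), lr=1e-2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

The full training loop using Adam then looks like this: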

# -*- coding: utf-8 -*-
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

Code analysis:

  • The most obvious change is that the with torch.no_grad() block at the bottom is gone, replaced by a call to optimizer.step(). We now have a new object, the optimizer, created with torch.optim.Adam, which takes model.parameters() and a learning rate. This is consistent with what we did before: we were already operating on the model's parameters, only now we hand them to the optimizer and let it perform the parameter update for us after the backward pass (a rough sketch of this equivalence follows below).
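For plain SGD (no momentum or weight decay), calling step() performs essentially the same update we wrote by hand earlier; the following is only a rough sketch of that correspondence, not a description of Adam's internals:

# Rough sketch: with plain SGD (no momentum, no weight decay), these two
# updates are essentially the same.
sgd = torch.optim.SGD(model.parameters(), lr=learning_rate)

# ... after loss.backward(), calling
sgd.step()
# is approximately equivalent to:
# with torch.no_grad():
#     for param in model.parameters():
#         param -= learning_rate * param.grad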

Defining your own Module

Above we used a Sequential module. Its defining characteristic is that it is top-down, as if there were only a single line, similar to the LinearLayout we met when learning UI layouts: it only cares about arranging things along one (vertical) direction.

So what if we want a richer module? For that we can subclass torch.nn.Module and write our own custom class. Below we implement a two-layer module.

# -*- coding: utf-8 -*-
import torch


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

  • Here we can see that we defined our own two-layer module, TwoLayerNet.

This module can also be written like this:

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.relu = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        h = self.linear1(x)
        h_relu = self.relu(h)
        y_pred = self.linear2(h_relu)
        return y_pred

  • The ReLU activation is special: during backpropagation its gradient is 0 for negative inputs and 1 for positive inputs, and it has no learnable parameters of its own. So we can implement it either by truncating with clamp(min=0) or by using nn.ReLU; the two have exactly the same effect (a small check of this equivalence is sketched below).
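Here is a small illustrative sketch (standalone, not part of the model above) checking that the forward outputs and the gradients of the two versions agree:

# Illustrative check: clamp(min=0) and nn.ReLU() give the same output
# and the same gradient.
import torch

x = torch.randn(5, requires_grad=True)
relu = torch.nn.ReLU()

print(torch.allclose(x.clamp(min=0), relu(x)))  # True

x.clamp(min=0).sum().backward()
grad_clamp = x.grad.clone()
x.grad.zero_()
relu(x).sum().backward()
print(torch.allclose(grad_clamp, x.grad))  # True: gradient is 1 where x > 0, else 0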

Control Flow and Weight Sharing

Below is an example of control flow inside a network. This kind of structure is mostly used in RNNs or networks containing LSTM modules, whose structure changes from step to step; for ordinary fully connected networks or CNNs it is used much less.

# -*- coding: utf-8 -*-
import random
import torch


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

  • As you can see, this is not very different from before, except that the number of hidden layers is not fixed: it changes on every forward pass, while the middle_linear module (and hence its weights) is reused across all of those applications (a small sketch of this sharing follows below).
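To illustrate the weight-sharing point on its own (a standalone sketch, independent of DynamicNet): applying the same Linear module twice means there is only one weight tensor, and the gradient contributions from both applications accumulate into the same .grad buffer.

# Standalone sketch of weight sharing: the same Linear is applied twice,
# so there is only one set of weights, and gradients from both uses
# accumulate into the same .grad.
import torch

shared = torch.nn.Linear(4, 4)
x = torch.randn(1, 4)

out = shared(shared(x)).sum()  # the module is reused; its weights are shared
out.backward()

print(sum(p.numel() for p in shared.parameters()))  # 20 (4*4 + 4), not 40
print(shared.weight.grad.shape)                     # torch.Size([4, 4]): one gradient tensor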

Here is the official documentation's explanation of MSELoss (link):

[Screenshot of the torch.nn.MSELoss documentation]

As it shows, with reduction='none' the elementwise (unreduced) losses are returned, with reduction='mean' their mean is taken, and with reduction='sum' they are summed.
Recall from our previous note that our loss function was

$Loss = \sum (y_{pred} - y_{true})^2$

which is why we set reduction='sum' here.
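A small illustrative sketch of the three reduction modes, checking that reduction='sum' matches the loss above:

# Illustrative: what the reduction argument changes in MSELoss.
import torch

y_pred = torch.randn(3, 2)
y_true = torch.randn(3, 2)

per_element = torch.nn.MSELoss(reduction='none')(y_pred, y_true)  # elementwise squared errors
mean_loss = torch.nn.MSELoss(reduction='mean')(y_pred, y_true)    # their mean
sum_loss = torch.nn.MSELoss(reduction='sum')(y_pred, y_true)      # their sum

print(torch.allclose(sum_loss, ((y_pred - y_true) ** 2).sum()))  # True
print(torch.allclose(mean_loss, per_element.mean()))             # True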

Reposted from blog.csdn.net/weixin_43869493/article/details/105486048