Introduction to Deep Learning (58) Recurrent Neural Network - Simple Implementation of Recurrent Neural Network

Foreword

The core content comes from blog link 1 and blog link 2. Please give the original authors plenty of support.
This article is kept as a record to prevent forgetting.

Recurrent Neural Networks - Concise Implementation of Recurrent Neural Networks

Textbook

While the previous section was instructive for understanding how recurrent neural networks are implemented, it is not convenient. This section will show how to implement the same language model more efficiently using the functions provided by the high-level API of the deep learning framework. We still start by reading the Time Machine dataset.

import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

batch_size, num_steps = 32, 35
train_iter, vocab = d2l.load_data_time_machine(batch_size, num_steps)
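
As a quick sanity check, the following sketch (assuming load_data_time_machine yields pairs of index tensors, as in d2l) peeks at one minibatch; each X and Y should have shape (batch size, number of time steps):

# Sketch: inspect the shape of one minibatch of token indices
for X, Y in train_iter:
    print(X.shape, Y.shape)  # expected: torch.Size([32, 35]) torch.Size([32, 35])
    break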

1 Define the model

The high-level API provides implementations of recurrent neural networks. We construct a recurrent neural network layer rnn_layer with a single hidden layer of 256 hidden units. In fact, we have not yet discussed the significance of multilayer recurrent neural networks; for now it is enough to understand a multilayer network as the output of one recurrent layer being used as the input of the next recurrent layer.

num_hiddens = 256
rnn_layer = nn.RNN(len(vocab), num_hiddens)
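
To make the stacking intuition concrete, here is a minimal sketch of a two-layer recurrent layer, built by passing num_layers to nn.RNN; it is not used in the rest of this section:

# Sketch: a two-layer RNN; the hidden states of layer 1 serve as inputs to layer 2
rnn_layer2 = nn.RNN(len(vocab), num_hiddens, num_layers=2)
# its initial hidden state needs one slice per layer: (2, batch_size, num_hiddens)
state2 = torch.zeros((2, batch_size, num_hiddens))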

We use a tensor to initialize the hidden state; its shape is (number of hidden layers, batch size, number of hidden units).

The hidden state returned by an nn.RNN instance in the forward computation refers to the hidden state of the hidden layer at the last time step: when the hidden layer has multiple layers, the hidden state of each layer is recorded in this variable; for long short-term memory networks (LSTM), the hidden state is a tuple (h, c), i.e., the hidden state and the cell state. We will introduce LSTM and deep recurrent neural networks later in this chapter.

state = torch.zeros((1, batch_size, num_hiddens))
state.shape

output:

torch.Size([1, 32, 256])
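
For comparison, a minimal sketch of the LSTM case mentioned above (LSTM itself is introduced later in this chapter): its initial state is the tuple (h, c), each element with the same shape as state:

# Sketch: an LSTM's state is a tuple of hidden state h and cell state c,
# each of shape (number of hidden layers, batch size, number of hidden units)
lstm_layer = nn.LSTM(len(vocab), num_hiddens)
h0 = torch.zeros((1, batch_size, num_hiddens))
c0 = torch.zeros((1, batch_size, num_hiddens))
state_lstm = (h0, c0)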

With a hidden state and an input, we can compute the output with the updated hidden state. It is important to emphasize that the "output" (Y) of rnn_layer does not involve the computation of an output layer: it refers to the hidden state at each time step, which can be used as input to a subsequent output layer.

X = torch.rand(size=(num_steps, batch_size, len(vocab)))
Y, state_new = rnn_layer(X, state)
Y.shape, state_new.shape

output:

(torch.Size([35, 32, 256]), torch.Size([1, 32, 256]))
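
Since Y only holds the hidden states at each time step, a separate linear layer is needed to map them to vocabulary scores. A minimal sketch of that mapping (the RNNModel class below does exactly this inside forward):

# Sketch: map the per-time-step hidden states to vocabulary scores
output_layer = nn.Linear(num_hiddens, len(vocab))
logits = output_layer(Y.reshape((-1, Y.shape[-1])))
logits.shape  # (num_steps * batch_size, vocab_size)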

Similar to the previous section, we define an RNNModel class for a complete recurrent neural network model. Note that rnn_layer only contains the hidden recurrent layers; we also need to create a separate output layer.

class RNNModel(nn.Module):
    """循环神经网络模型"""
    def __init__(self, rnn_layer, vocab_size, **kwargs):
        super(RNNModel, self).__init__(**kwargs)
        self.rnn = rnn_layer
        self.vocab_size = vocab_size
        self.num_hiddens = self.rnn.hidden_size
        # If the RNN is bidirectional (introduced later), num_directions should be 2; otherwise it should be 1
        if not self.rnn.bidirectional:
            self.num_directions = 1
            self.linear = nn.Linear(self.num_hiddens, self.vocab_size)
        else:
            self.num_directions = 2
            self.linear = nn.Linear(self.num_hiddens * 2, self.vocab_size)

    def forward(self, inputs, state):
        X = F.one_hot(inputs.T.long(), self.vocab_size)
        X = X.to(torch.float32)
        Y, state = self.rnn(X, state)
        # The fully connected layer first reshapes Y to (num_steps * batch_size, num_hiddens)
        # Its output shape is (num_steps * batch_size, vocab_size).
        output = self.linear(Y.reshape((-1, Y.shape[-1])))
        return output, state

    def begin_state(self, device, batch_size=1):
        if not isinstance(self.rnn, nn.LSTM):
            # nn.GRU takes a tensor as the hidden state
            return  torch.zeros((self.num_directions * self.rnn.num_layers,
                                 batch_size, self.num_hiddens),
                                device=device)
        else:
            # nn.LSTM takes a tuple as the hidden state
            return (torch.zeros((
                self.num_directions * self.rnn.num_layers,
                batch_size, self.num_hiddens), device=device),
                    torch.zeros((
                        self.num_directions * self.rnn.num_layers,
                        batch_size, self.num_hiddens), device=device))

2 Training and Prediction

Before training the model, let's make predictions based on a model with random weights.

device = d2l.try_gpu()
net = RNNModel(rnn_layer, vocab_size=len(vocab))
net = net.to(device)
d2l.predict_ch8('time traveller', 10, net, vocab, device)

Obviously, this kind of model cannot produce good results at all. Next, we call train_ch8 with the hyperparameters defined in the previous section and train the model with the high-level API.

num_epochs, lr = 500, 1
d2l.train_ch8(net, train_iter, vocab, lr, num_epochs, device)

output:

perplexity 1.3, 286908.2 tokens/sec on cuda:0
time traveller came the time traveller but now you begin to spen
traveller pork acong wa canome precable thig thit lepanchat

Compared with the previous section, this model achieves lower perplexity in less time because the high-level API of the deep learning framework has more code optimizations.

3 Summary

  • The high-level API of the deep learning framework provides the implementation of recurrent neural network layers.

  • The recurrent neural network layer of the high-level API returns an output and an updated hidden state; we still need to compute the output layer of the whole model ourselves.

  • Using a high-level API implementation speeds up training compared to implementing recurrent neural networks from scratch.

Origin blog.csdn.net/qq_52358603/article/details/128275418