02 - Implementing an RNN in PyTorch

1. Import packages

import torch
from torch import nn
from torch.nn import functional as F
import dltools

1.1 Import training data

batch_size, num_steps = 32, 35
# The default file download method was changed; the article file needs to be placed in this folder
train_iter, vocab = dltools.load_data_time_machine(batch_size, num_steps)

1.2 Construct the neural network

num_hiddens = 256
# Construct an RNN with a single hidden layer of 256 hidden units
rnn_layer = nn.RNN(len(vocab), num_hiddens)

A recurrent neural network (RNN) layer is constructed with the following characteristics:

  • num_hiddens = 256: This line defines the number of hidden units in the RNN layer, i.e. the number of neurons inside the layer. Here it is set to 256, so the RNN layer will have 256 hidden units.

  • nn.RNN(len(vocab), num_hiddens): This line creates an instance of the RNN layer. Its parameters are as follows:

    • len(vocab): The feature dimension of the input data. In a recurrent neural network the input is usually a sequence, and the input at each time step is a vector. len(vocab) is the size of the vocabulary, i.e. the number of possible inputs at each time step. In natural language processing tasks this usually corresponds to the number of distinct tokens in the vocabulary.

    • num_hiddens: The number of hidden units inside the RNN layer, which is 256 as defined above.

In summary, this code creates an RNN with a single hidden layer of 256 hidden units. It can be used to process sequence data such as text, where each time step corresponds to a token in the vocabulary or an embedding of one.
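A quick way to confirm these hyperparameters is to inspect the attributes PyTorch stores on the layer. A minimal sketch, assuming vocab has been loaded as in section 1.1:

# Inspect the constructed layer
print(rnn_layer.input_size)     # len(vocab), the per-step feature dimension
print(rnn_layer.hidden_size)    # 256
print(rnn_layer.num_layers)     # 1, a single hidden layer
print(rnn_layer.bidirectional)  # False, a unidirectional RNN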

1.3 Initialize hidden state

# Initialize the hidden state
state = torch.zeros((1, batch_size, num_hiddens))

A tensor of all zeros is created as the hidden state. The shape of the tensor is (1, batch_size, num_hiddens), where:

  • 1 is the number of layers times the number of directions; with a single unidirectional layer this is 1. (It is not the number of time steps: the hidden state is carried across time steps, not indexed by them.)
  • batch_size is the batch size, i.e. the number of samples processed at once.
  • num_hiddens is the number of hidden units, i.e. the dimension of the hidden state.
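A short sanity check, sketched under the assumption that the variables defined above are in scope, shows how this state flows through the layer:

# Pass a dummy input through rnn_layer together with the initial state.
# By default, nn.RNN expects input of shape (num_steps, batch_size, input_size).
X = torch.rand((num_steps, batch_size, len(vocab)))
Y, state_new = rnn_layer(X, state)
print(Y.shape)          # torch.Size([35, 32, 256]), per-step hidden outputs
print(state_new.shape)  # torch.Size([1, 32, 256]), the final hidden state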

2. Build a complete recurrent neural network

# Build a complete recurrent neural network
class RNNModel(nn.Module):
    def __init__(self, rnn_layer, vocab_size, **kwargs):
        super().__init__(**kwargs)
        self.rnn = rnn_layer
        self.vocab_size = vocab_size
        self.num_hiddens = self.rnn.hidden_size
        
        if not self.rnn.bidirectional:
            self.num_directions = 1
            self.linear = nn.Linear(self.num_hiddens, self.vocab_size)
        else:
            self.num_directions = 2
            self.linear = nn.Linear(self.num_hiddens * 2, self.vocab_size)
            
    # Forward pass
    def forward(self, inputs, state):
        X = F.one_hot(inputs.T.long(), self.vocab_size)
        X = X.to(torch.float32)
        Y, state = self.rnn(X, state)
        
        output = self.linear(Y.reshape(-1, Y.shape[-1]))
        return output, state
    
    # Initialize the hidden state
    def begin_state(self, device, batch_size=1):
        return torch.zeros((self.num_directions * self.rnn.num_layers, batch_size, self.num_hiddens), device=device)

This section defines a PyTorch model class named RNNModel, a recurrent neural network (RNN) model for processing sequence data.

  1. __init__ method: This is the constructor of the class and initializes the components of the model. It does the following:

    • super().__init__(**kwargs) calls the constructor of the parent class to ensure the model is initialized correctly.
    • self.rnn = rnn_layer stores the RNN layer passed in.
    • self.vocab_size = vocab_size stores the size of the vocabulary.
    • self.num_hiddens = self.rnn.hidden_size reads the hidden state size off the RNN layer.
    • Depending on whether the RNN is bidirectional, a linear layer is created that maps the RNN output to a vocabulary-sized space. For a bidirectional RNN the input to this layer is twice the hidden size, because the forward and backward hidden states are concatenated.
  2. forward method: This method defines the forward pass. It accepts the input inputs and the current hidden state state, then does the following (see the sketch after this list):

    • Uses F.one_hot to convert inputs to a one-hot encoding matching the vocabulary size, then casts it to a float tensor.
    • Passes the input data and hidden state to the RNN layer to get the output Y and the new hidden state state.
    • Reshapes the RNN output Y into a two-dimensional tensor, maps it to a vocabulary-sized space through the linear layer self.linear, and returns the result.
  3. begin_state method: This method initializes the hidden state, returning an all-zero tensor whose shape depends on the number of RNN layers, the number of directions, the batch size, and the number of hidden units.
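To make the shape bookkeeping in forward concrete, here is a minimal sketch of the one-hot step (the vocabulary size 28 is only an illustrative assumption):

# inputs arrive as (batch_size, num_steps); transposing puts the time
# axis first, which is the default layout nn.RNN expects
inputs = torch.tensor([[0, 2, 5], [3, 1, 4]])   # (batch_size=2, num_steps=3)
X = F.one_hot(inputs.T.long(), 28).to(torch.float32)
print(X.shape)  # torch.Size([3, 2, 28]) -> (num_steps, batch_size, vocab_size)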

2.1 Instantiate the model

# Run the model once before training
device = dltools.try_gpu()
net = RNNModel(rnn_layer, vocab_size=len(vocab))
net = net.to(device)

An RNNModel object is created, taking the rnn_layer and the vocabulary size as parameters. Finally, the model is moved to the previously selected device.
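Before training, a quick untrained forward pass confirms the output shape. A sketch; the random token batch is purely illustrative:

# Run one random batch through the untrained model
state = net.begin_state(device, batch_size)
X = torch.randint(0, len(vocab), (batch_size, num_steps), device=device)
output, state = net(X, state)
print(output.shape)  # (num_steps * batch_size, len(vocab))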

3. Run training

# Train
num_epochs, lr = 200, 0.1
dltools.train_ch8(net, train_iter, vocab, lr, num_epochs, device)

3.1 Run prediction

dltools.predict_ch8('time traveller', 10, net, vocab, device)
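Assuming dltools mirrors the d2l utilities it appears to be based on, this call uses the trained model to generate the 10 characters that follow the prefix 'time traveller' and returns the combined string.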
