[The most detailed on the whole web] Implementing a recurrent neural network with PyTorch

Table of contents

1. What is a Recurrent Neural Network

2. Recurrent Neural Networks in PyTorch

3. Create a recurrent neural network model

Summary

4. Training the recurrent neural network model

5. Evaluate Recurrent Neural Network Models


Welcome to this tutorial on implementing recurrent neural networks with PyTorch! Here, I will show you how to use PyTorch to create, train and evaluate a recurrent neural network (RNN) and apply it to text generation tasks. This tutorial will cover the following topics:

1. What is a Recurrent Neural Network

2. Recurrent Neural Networks in PyTorch

3. Create a recurrent neural network model

4. Training the recurrent neural network model

5. Evaluate Recurrent Neural Network Models

6. Applying the Recurrent Neural Network Model to Text Generation

Let's get started!

1. What is a Recurrent Neural Network

A recurrent neural network (RNN) is a type of neural network designed for sequential data, often used for tasks such as language modeling, translation, and music generation. At each time step, an RNN predicts an output based on the current input and the state carried over from previous time steps. This makes RNNs very effective for sequential data, because they can use information from earlier time steps to update their state and generate new outputs.

The core of the RNN model is its "recurrent" structure. In traditional neural networks, each input is processed independently, whereas in an RNN each input is combined with information from the previous time step. At each time step, the RNN merges the current input with the previous time step's hidden state to produce a new vector called the "hidden state". This hidden state captures information about current and past inputs and is used to predict the next output.
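Concretely, a standard formulation of the vanilla RNN update (using $\tanh$ as the activation) is:

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = W_{hy} h_t + b_y$$

where $x_t$ is the input at time step $t$, $h_t$ is the hidden state, and $y_t$ is the output.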

2. Recurrent Neural Networks in PyTorch

PyTorch is a very popular deep learning framework that provides many tools for building and training neural networks. In PyTorch, we can use the RNN class and LSTM class in the torch.nn module to build RNN models.

The RNN class is the most basic RNN type. It takes an input tensor and an initial hidden state tensor, and outputs an output tensor and a final hidden state tensor. The LSTM class is a variant of the RNN that uses a structure called Long Short-Term Memory (LSTM) to remember and manage state. LSTMs are better suited to long sequences than plain RNNs, because they handle the problems of vanishing and exploding gradients more gracefully.
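As a quick shape check, here is a minimal sketch of how nn.RNN and nn.LSTM are called (the dimensions are chosen purely for illustration):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=1, batch_first=True)
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1, batch_first=True)

x = torch.randn(3, 5, 10)    # (batch, seq_len, input_size)
h0 = torch.zeros(1, 3, 20)   # (num_layers, batch, hidden_size)

out, hn = rnn(x, h0)         # out: (3, 5, 20), hn: (1, 3, 20)
out2, (hn2, cn2) = lstm(x)   # the LSTM also returns a cell state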

In the next sections, we will use the RNN class in PyTorch to build a simple recurrent neural network model and use it for text generation tasks.

3. Create a recurrent neural network model

In this example, we will create a character-level text generation model that takes one or more characters as input and generates the next character as output. We'll use some of Shakespeare's works as our training data to generate new Shakespeare-style text.

First, we need to preprocess the text. We'll convert all characters to lowercase and represent them as numbers. We'll also create a dictionary that maps each character to a unique number, and use those numbers to train the model.

import string

def preprocess(text):
    # Convert the text to lowercase
    text = text.lower()

    # Remove all punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))

    # Create a character-to-number mapping (sorted so the mapping is reproducible across runs)
    char_to_index = {char: i for i, char in enumerate(sorted(set(text)))}
    index_to_char = {i: char for char, i in char_to_index.items()}

    # Represent the text as numbers
    text = [char_to_index[char] for char in text]

    return text, char_to_index, index_to_char
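As a quick sanity check on a toy string (the exact indices depend on the character set, so the numbers below may differ):

toy_text, toy_c2i, toy_i2c = preprocess("To be, or not to be")
print(''.join(toy_i2c[i] for i in toy_text))   # 'to be or not to be' (lowercased, punctuation removed)
print(toy_text[:5])                            # e.g. [6, 4, 0, 1, 2]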

Next, we'll create a TextDataset class for loading and processing the training data. This class takes the preprocessed text (a list of integers) and converts it into a PyTorch tensor.

import torch 
from torch.utils.data import Dataset 
class TextDataset(Dataset): 
    def __init__(self, text): 
        self.text = torch.tensor(text) 

    def __len__(self): 
        return len(self.text) - 1 

    def __getitem__(self, idx): 
        x = self.text[idx] 
        y = self.text[idx+1] 
        return x, y
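Each item of this dataset is simply a (current character, next character) pair of indices. For example, with a toy list of indices:

toy_dataset = TextDataset([5, 3, 8, 1])
print(len(toy_dataset))   # 3
print(toy_dataset[0])     # (tensor(5), tensor(3))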

Next, we'll create an RNN class to implement our recurrent neural network model. The model will contain an embedding layer, an RNN layer and a fully connected layer. The embedding layer converts the numeric representation of the input character into a vector representation, the RNN layer will process the sequence data and generate a hidden state, and the fully connected layer will generate an output based on the hidden state.

import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(RNN, self).__init__()

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, h):
        x = self.embedding(x)
        out, h = self.rnn(x, h)
        out = self.fc(out)
        return out, h

    def init_hidden(self, batch_size=1):
        # All-zero hidden state of shape (num_layers, batch_size, hidden_size)
        return torch.zeros(self.rnn.num_layers, batch_size, self.rnn.hidden_size)

We also need to define some hyperparameters such as sequence length, batch size, hidden layer size and number of training iterations.

# Define hyperparameters
sequence_length = 100
batch_size = 128
hidden_size = 256
num_layers = 2
num_epochs = 50
learning_rate = 0.001

We also need to define a function to generate the input and target sequences. This function will split the text data into multiple sequences according to the specified sequence length and convert each sequence into a PyTorch tensor.

def generate_sequence_data(text, sequence_length, batch_size):
    # Convert to a tensor and keep only whole sequences
    text = torch.tensor(text)
    num_sequences = len(text) // sequence_length
    text = text[:num_sequences * sequence_length]

    # Inputs are each sequence without its last character; targets are the sequence shifted by one
    sequences = text.view(num_sequences, sequence_length)
    x_data = sequences[:, :-1]
    y_data = sequences[:, 1:]

    # Group the sequences into batches
    sequence_data = []
    for i in range(num_sequences // batch_size):
        x = x_data[i * batch_size:(i + 1) * batch_size]
        y = y_data[i * batch_size:(i + 1) * batch_size]
        sequence_data.append((x, y))

    return sequence_data
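A quick shape check on a toy input (purely illustrative values):

batches = generate_sequence_data(list(range(1000)), sequence_length=10, batch_size=4)
x, y = batches[0]
print(len(batches), x.shape, y.shape)   # 25 torch.Size([4, 9]) torch.Size([4, 9])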

Next, we will load and preprocess our training data and convert it into a PyTorch dataset.

with open('shakespeare.txt', 'r', encoding='utf-8') as f: 
    text = f.read() 

text, char_to_index, index_to_char = preprocess(text) 
dataset = TextDataset(text)

Now, we can create a DataLoader object for loading data and splitting it into batches.

dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True)  # drop_last keeps every batch the same size as the hidden state

Next, we'll create an RNN object and define our loss function and optimizer.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = RNN(len(char_to_index), hidden_size, len(char_to_index), num_layers).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

Now we are ready to train our model. In each training iteration, we will feed the model an input sequence and compute the output of the model. We will use a loss function to calculate the difference between the model output and the target sequence and update the model parameters using the backpropagation algorithm.

for epoch in range(num_epochs):
    total_loss = 0
    h = torch.zeros(num_layers, batch_size, hidden_size).to(device)
    for i, (x, y) in enumerate(dataloader):
        x = x.unsqueeze(1).to(device)  # treat each character as a length-1 sequence
        y = y.to(device)

        optimizer.zero_grad()
        h = h.detach()  # detach the hidden state so gradients do not flow across batches
        out, h = model(x, h)
        loss = criterion(out.view(-1, len(char_to_index)), y.view(-1))
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                .format(epoch+1, num_epochs, i+1, len(dataset)//batch_size, total_loss / (i+1)))

After training is complete, we can use the trained model to generate text. We do this by providing a starting string and looping many times: at each step we feed the current character to the model and sample a prediction for the next character.

import numpy as np

def generate_text(model, char_to_index, index_to_char, sequence_length, seed_text, num_chars):
    model.eval()
    with torch.no_grad():
        h = torch.zeros(num_layers, 1, hidden_size).to(device)
        seed = [char_to_index[char] for char in seed_text]
        input = torch.tensor(seed).unsqueeze(0).to(device)  # shape (1, seq_len)
        output = []
        for i in range(num_chars):
            out, h = model(input, h)
            out = out[0, -1]  # logits for the last time step
            prob = nn.functional.softmax(out, dim=0).cpu().numpy()
            prob = prob / prob.sum()  # guard against float32 rounding
            index = np.random.choice(len(char_to_index), p=prob)
            output.append(index_to_char[index])
            input = torch.tensor([[index]]).to(device)
        return ''.join(output)

We can now train our model and generate text using the following code snippet.

# Define hyperparameters
sequence_length = 100
batch_size = 128
hidden_size = 256
num_layers = 2
num_epochs = 50
learning_rate = 0.001
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the dataset
with open('shakespeare.txt', 'r', encoding='utf-8') as f:
    text = f.read()

text, char_to_index, index_to_char = preprocess(text)
dataset = TextDataset(text)

# Create the data loader (drop_last keeps every batch the same size as the hidden state)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True)

# Create the model, loss function and optimizer
model = RNN(len(char_to_index), hidden_size, len(char_to_index), num_layers).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
for epoch in range(num_epochs):
    total_loss = 0
    h = torch.zeros(num_layers, batch_size, hidden_size).to(device)
    for i, (x, y) in enumerate(dataloader):
        x = x.unsqueeze(1).to(device)  # treat each character as a length-1 sequence
        y = y.to(device)

        optimizer.zero_grad()
        h = h.detach()  # detach the hidden state so gradients do not flow across batches
        out, h = model(x, h)
        loss = criterion(out.view(-1, len(char_to_index)), y.view(-1))
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                .format(epoch+1, num_epochs, i+1, len(dataset)//batch_size, total_loss / (i+1)))

# Generate text (the seed may only contain characters that survive preprocessing: lowercase, no punctuation)
generated_text = generate_text(model, char_to_index, index_to_char, sequence_length, 'shall i compare thee to a summers day\n', 1000)
print(generated_text)

This will train our recurrent neural network and use it to generate a piece of text that is 1000 characters long. You can experiment with different hyperparameters and training data to create different models and generate text.

Summary

In this tutorial, we introduced the recurrent neural network (RNN) and its applications, and implemented a basic character-level RNN model in PyTorch, using text generation as the example task. We first preprocessed the text dataset, then defined an RNN model, trained it with the cross-entropy loss function and the Adam optimizer, and verified that it works by generating some sample text.

In general, RNNs are very useful for sequence data, for example in natural language processing and time series forecasting. PyTorch gives developers convenient tools and interfaces for building and training deep learning models, and both training and inference can be accelerated on a GPU. We hope this tutorial has given you a basic working knowledge of PyTorch and RNNs, and inspires you to try more complex deep learning tasks.

Next, we describe how to train a recurrent neural network model, evaluate a recurrent neural network model, and apply a recurrent neural network model to text generation.

4. Training the recurrent neural network model

Now that our model has been defined, the next step is to train it. We need to feed the model training data and update its parameters by computing the loss and backpropagating. Before that, we need to set some training hyperparameters, such as the learning rate and the number of iterations.

import time

learning_rate = 0.005
n_epochs = 2000
print_every = 100
plot_every = 10
hidden_size = 100

# Create RNN model and set loss and optimizer
# n_characters is assumed to be the size of the character vocabulary, e.g. len(all_characters)
rnn = RNN(n_characters, hidden_size, n_characters)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)

# Initialize loss and time variables
print_loss_total = 0   # reset every print_every epochs
plot_loss_total = 0    # reset every plot_every epochs
all_losses = []
start_time = time.time()

Then we start training the recurrent neural network model:

for epoch in range(1, n_epochs + 1):
    # Get input and target tensors from a random chunk of the training data
    input_tensor, target_tensor = random_training_example(all_characters, n_characters, data, chunk_len)

    # Initialize hidden state
    hidden = rnn.init_hidden()

    # Zero the gradients
    optimizer.zero_grad()

    # Forward pass
    output, hidden = rnn(input_tensor, hidden)

    # Calculate loss
    loss = loss_fn(output.view(-1, n_characters), target_tensor.view(-1))

    # Backward pass
    loss.backward()

    # Update parameters
    optimizer.step()

    # Update loss
    print_loss_total += loss.item()
    plot_loss_total += loss.item()

    # Print progress
    if epoch % print_every == 0:
        print(f'Epoch [{epoch}/{n_epochs}], Loss: {print_loss_total/print_every:.4f}')
        print_loss_total = 0

    # Save the model and the optimizer every 500 epochs (assumes the ./models directory exists)
    if epoch % 500 == 0:
        torch.save(rnn.state_dict(), f'./models/rnn_{epoch}.pth')
        torch.save(optimizer.state_dict(), f'./models/optimizer_{epoch}.pth')

    # Record the average loss every `plot_every` epochs
    if epoch % plot_every == 0:
        all_losses.append(plot_loss_total / plot_every)
        plot_loss_total = 0

During the training process, we first obtain a random chunk of training samples from the dataset. Then we initialize the hidden state, zero the gradients, compute the loss through a forward pass, and update the model parameters through backpropagation. We also record the average loss at regular intervals so we can visualize it after training is complete, and we periodically save the state of the model and optimizer so the model can be restored later.
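The helper random_training_example is used above but never defined in the post. Here is a minimal sketch, assuming data is the full training corpus as a single string, all_characters is the character table, and chunk_len is the length of each training chunk:

import random

def random_training_example(all_characters, n_characters, data, chunk_len):
    # Pick a random chunk of chunk_len + 1 characters from the corpus
    start = random.randint(0, len(data) - chunk_len - 1)
    chunk = data[start:start + chunk_len + 1]

    # Input is the chunk without its last character, target is the chunk shifted by one
    input_tensor = torch.tensor([all_characters.index(c) for c in chunk[:-1]]).unsqueeze(0)
    target_tensor = torch.tensor([all_characters.index(c) for c in chunk[1:]]).unsqueeze(0)
    return input_tensor, target_tensor

The n_characters argument (the vocabulary size) is kept in the signature only to match the call above.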

5. Evaluate Recurrent Neural Network Models

Once we have trained the model, we need to evaluate its performance. We can use metrics such as accuracy, precision, recall, and F1 score. For text generation tasks, we usually use perplexity as the evaluation metric. Perplexity measures how well the model predicts a given sequence and is calculated as:

$$\text{perplexity} = e^{-\frac{1}{N} \sum_{i=1}^{N} \log p(x_i)}$$

where $N$ is the length of the sequence and $p(x_i)$ is the probability the model assigns to the $i$-th character.
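In practice, because nn.CrossEntropyLoss returns the mean negative log-likelihood per character, perplexity can be obtained by simply exponentiating the loss. A small self-contained illustration (the shapes are arbitrary):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(50, 65)          # 50 characters, vocabulary of size 65
targets = torch.randint(0, 65, (50,))
loss = criterion(logits, targets)     # mean negative log-likelihood per character
perplexity = torch.exp(loss)          # matches the formula above
print(perplexity.item())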

Here is a sample code to evaluate the model:

def evaluate(rnn, data, prime_str='A', predict_len=100, temperature=0.8):
    # Initialize hidden state
    hidden = rnn.init_hidden()

    # Initialize input sequence
    prime_input = char_tensor(prime_str)

    # Feed the priming string into the network (all but the last character)
    for p in range(len(prime_str) - 1):
        _, hidden = rnn(prime_input[p].view(1, 1), hidden)

    # Initialize predicted sequence
    predicted = prime_str

    # Generate predicted sequence
    for p in range(predict_len):
        # Forward pass
        output, hidden = rnn(prime_input[-1].view(1, 1), hidden)

        # Sample from the network output as a multinomial distribution
        output_dist = output.data.view(-1).div(temperature).exp()
        top_i = torch.multinomial(output_dist, 1).item()

        # Add predicted character to string and use as next input
        predicted_char = all_characters[top_i]
        predicted += predicted_char
        prime_input = char_tensor(predicted_char)

    # Calculate perplexity on the evaluation data (once, after generation)
    perplexity = calc_perplexity(rnn, data)

    return predicted, perplexity

def calc_perplexity(rnn, data):
    # Set loss function
    loss_fn = nn.CrossEntropyLoss()

    # Initialize variables
    loss = 0
    n_chars = 0

    # Iterate over all chunks in data
    for chunk in data:
        # Get input and target tensors, shaped (1, chunk_len - 1) for the batch-first RNN
        input_tensor = char_tensor(chunk[:-1]).view(1, -1)
        target_tensor = char_tensor(chunk[1:]).view(1, -1)

        # Initialize hidden state
        hidden = rnn.init_hidden()

        # Forward pass
        output, hidden = rnn(input_tensor, hidden)

        # Accumulate the total (not mean) negative log-likelihood
        loss += loss_fn(output.view(-1, n_characters), target_tensor.view(-1)).item() * (len(chunk) - 1)

        # Update number of characters
        n_chars += len(chunk) - 1

    # Calculate perplexity
    perplexity = np.exp(loss / n_chars)

    return perplexity

We first define an evaluate function that takes a trained model, an initial string, a generated sequence length, and a temperature parameter. We use the initial string to prime the network and then generate a sequence of length predict_len. We then compute the model's perplexity on the evaluation data and return the generated sequence together with the perplexity.

The process of calculating perplexity is in the calc_perplexity function. We compute the loss for each sequence using the cross-entropy loss function, and then add up the loss and the number of characters. Finally, we divide the loss by the number of characters and use the exponential function to calculate the perplexity value.

Finally, we look at how recurrent neural networks can be applied to text generation tasks. For text generation tasks, our goal is to generate the next character or word based on a previous piece of text. We can use the trained recurrent neural network model to predict the probability distribution of the next character or word, and sample according to that distribution. Specifically, we can perform the following steps:

1. Start with an initial string and a trained recurrent neural network model.

2. Use the initial string to generate a starting sequence, which is fed into the recurrent neural network model.

3. From the model's output, sample the next character from the predicted probability distribution (or simply pick the most probable character) and append it to the end of the sequence.

4. Taking the predicted character as input, repeat step 3 until a sequence of the desired length is generated.

When generating text, we usually add a temperature parameter to control the diversity of the generated text. The higher the temperature parameter, the more random the generated text; the lower the temperature parameter, the more conservative the generated text. Here is a sample code that generates text:

def generate_text(rnn, prime_str='A', predict_len=100, temperature=0.8):
    # Initialize hidden state
    hidden = rnn.init_hidden()

    # Initialize input sequence
    prime_input = char_tensor(prime_str)

    # Feed the priming string into the network (all but the last character)
    for p in range(len(prime_str) - 1):
        _, hidden = rnn(prime_input[p].view(1, 1), hidden)

    # Initialize predicted sequence
    predicted = prime_str

    # Generate predicted sequence
    for p in range(predict_len):
        # Forward pass
        output, hidden = rnn(prime_input[-1].view(1, 1), hidden)

        # Sample from the network output as a multinomial distribution
        output_dist = output.data.view(-1).div(temperature).exp()
        top_i = torch.multinomial(output_dist, 1).item()

        # Add predicted character to string and use as next input
        predicted_char = all_characters[top_i]
        predicted += predicted_char
        prime_input = char_tensor(predicted_char)

    return predicted

We can use this function to generate text of a specified length. We first use the initial string to prime the network, then generate predict_len characters one at a time, sampling each character from the model's output distribution scaled by the temperature parameter, and finally return the resulting text.
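For example, assuming rnn has been trained and all_characters is defined, we might compare two temperature settings (the prime string and lengths are arbitrary choices):

print(generate_text(rnn, prime_str='T', predict_len=200, temperature=0.5))   # more conservative
print(generate_text(rnn, prime_str='T', predict_len=200, temperature=1.2))   # more random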

Along the way, we also need some helper functions to convert characters and words into numeric representations. Here are helper functions that convert characters to numeric representations:

def char_tensor(string): 
    tensor = torch.zeros(len(string)).long() 
    for c in range(len(string)): 
        tensor[c] = all_characters.index(string[c]) 
    return tensor

This function takes a string as input and converts it to a tensor of integers whose length is the length of the string. Each element is the index of the corresponding character in the character table.
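For illustration, if all_characters were defined as string.printable (an assumption; the post never shows its actual definition), we would get:

import string

all_characters = string.printable   # hypothetical definition of the character table
print(char_tensor('abc'))           # tensor([10, 11, 12]) for this particular table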

For word-level recurrent neural networks, we need a different helper function to convert words into numeric representations. Here are helper functions that convert words to numeric representations:

def word_tensor(words, word_to_ix):
    # `words` is a sequence of word strings, e.g. a list produced by tokenization
    tensor = torch.zeros(len(words)).long()
    for w in range(len(words)):
        tensor[w] = word_to_ix[words[w]]
    return tensor

The function takes a sequence of words and the vocabulary word_to_ix as input and converts it into a tensor of integers whose length equals the number of words. Each element is the index of the corresponding word in the vocabulary.
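The word_to_ix vocabulary itself is not shown in the post; a minimal sketch of how it could be built from a whitespace-tokenized corpus (the tokenization is an assumption) is:

def build_vocab(text):
    # Map each unique word to a unique index, in order of first appearance
    word_to_ix = {}
    for word in text.split():
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
    return word_to_ix

word_to_ix = build_vocab("to be or not to be")
print(word_to_ix)   # {'to': 0, 'be': 1, 'or': 2, 'not': 3}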

Next, we need some functions for training and evaluating recurrent neural network models. Here is the code for the training function:

import random

def train(rnn, criterion, optimizer, train_data, n_epochs=200, print_every=100, plot_every=10):
    # Initialize losses and perplexities
    current_loss = 0
    all_losses = []
    current_perplexity = 0
    all_perplexities = []

    # Train loop
    for epoch in range(1, n_epochs + 1):
        # Initialize hidden state
        hidden = rnn.init_hidden()

        # Track the running loss and perplexity for this epoch
        epoch_loss = 0
        epoch_perplexity = 0

        # Shuffle training data
        random.shuffle(train_data)

        # Train on each example
        for i, (input_seq, target_seq) in enumerate(train_data):
            # Clear gradients
            optimizer.zero_grad()

            # Convert input and target sequences to batched tensors of shape (1, seq_len)
            input_tensor = input_seq.unsqueeze(0)
            target_tensor = target_seq.unsqueeze(0)

            # Detach the hidden state so gradients do not flow across examples
            hidden = hidden.detach()

            # Forward pass
            output, hidden = rnn(input_tensor, hidden)

            # Calculate loss and perplexity
            loss = criterion(output.view(-1, n_characters), target_tensor.view(-1))
            perplexity = torch.exp(loss)

            # Backward pass and optimization step
            loss.backward()
            optimizer.step()

            # Update current loss and perplexity
            current_loss += loss.item()
            current_perplexity += perplexity.item()
            epoch_loss += loss.item()
            epoch_perplexity += perplexity.item()

            # Print progress
            if (i + 1) % print_every == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Perplexity: {:.4f}'
                    .format(epoch, n_epochs, i + 1, len(train_data),
                        current_loss / print_every, current_perplexity / print_every))
                current_loss = 0
                current_perplexity = 0

        # Record the average loss and perplexity for this epoch
        all_losses.append(epoch_loss / len(train_data))
        all_perplexities.append(epoch_perplexity / len(train_data))

        # Plot losses and perplexities
        if epoch % plot_every == 0:
            plot_loss(all_losses, epoch)
            plot_perplexity(all_perplexities, epoch)

This function accepts the recurrent neural network model rnn, the loss function criterion, the optimizer optimizer, the training data train_data, the number of training epochs n_epochs, the number of steps between progress printouts print_every, and the number of epochs between loss and perplexity plots plot_every.

During training, we first initialize the running loss and perplexity variables current_loss and current_perplexity, and then run the training loop. In each training epoch, we first initialize the hidden state hidden and randomly shuffle the training data. We then train on each sample with the following steps:

1. Clear the gradient.

2. Convert the input sequence and target sequence into tensors.

3. Forward propagation, computing the output sequence and new hidden state.

4. Calculate loss and perplexity.

5. Back propagation and optimization.

6. Update the current loss and perplexity.

7. If the number of steps to print the progress is reached, print the progress and reset the current loss and perplexity.

After each training epoch, we append the average loss and perplexity for that epoch to a list, and every plot_every epochs we plot the loss and perplexity curves.
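The plotting helpers plot_loss and plot_perplexity are not defined in the post; a minimal matplotlib sketch might look like this:

import matplotlib.pyplot as plt

def plot_loss(all_losses, epoch):
    # Plot the recorded average loss per epoch and save the figure
    plt.figure()
    plt.plot(all_losses)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title(f'Training loss up to epoch {epoch}')
    plt.savefig(f'loss_{epoch}.png')
    plt.close()

def plot_perplexity(all_perplexities, epoch):
    # Plot the recorded average perplexity per epoch and save the figure
    plt.figure()
    plt.plot(all_perplexities)
    plt.xlabel('Epoch')
    plt.ylabel('Perplexity')
    plt.title(f'Training perplexity up to epoch {epoch}')
    plt.savefig(f'perplexity_{epoch}.png')
    plt.close()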

Here is the code for the evaluation function:

def evaluate(rnn, criterion, data):
    # Initialize loss and perplexity
    total_loss = 0
    total_perplexity = 0

    # Loop through data
    with torch.no_grad():
        for input_seq, target_seq in data:
            # Initialize hidden state
            hidden = rnn.init_hidden()

            # Convert input and target sequences to batched tensors of shape (1, seq_len)
            input_tensor = torch.tensor(input_seq, dtype=torch.long).view(1, -1)
            target_tensor = torch.tensor(target_seq, dtype=torch.long).view(1, -1)

            # Forward pass
            output, hidden = rnn(input_tensor, hidden)
            loss = criterion(output.view(-1, n_characters), target_tensor.view(-1))

            # Update total loss and perplexity
            total_loss += loss.item()
            total_perplexity += np.exp(loss.item())

    # Calculate average loss and perplexity
    avg_loss = total_loss / len(data)
    avg_perplexity = total_perplexity / len(data)

    return avg_loss, avg_perplexity

This function accepts the recurrent neural network model `rnn`, the loss function `criterion` and the test data `data` as input. During evaluation, we initialize the loss and perplexity totals `total_loss` and `total_perplexity`, and then loop over the test data. For each test sample, we first initialize the hidden state `hidden` and convert the input and target sequences into tensors. Next, we do a forward pass to compute the output sequence and the new hidden state, and from them the loss and perplexity. Finally, we calculate the average loss and perplexity by dividing the totals by the number of test samples. Next we would look at how to use the trained recurrent neural network model for text generation; the code of the text generation function follows in the next part. To be continued.

Origin blog.csdn.net/m0_61789994/article/details/129189854