Welcome to this tutorial on implementing recurrent neural networks with PyTorch! Here, I will show you how to use PyTorch to create, train and evaluate a recurrent neural network (RNN) and apply it to text generation tasks. This tutorial will cover the following topics:
1. What is a Recurrent Neural Network
2. Recurrent Neural Networks in PyTorch
3. Create a recurrent neural network model
4. Training the recurrent neural network model
5. Evaluate Recurrent Neural Network Models
6. Applying the Recurrent Neural Network Model to Text Generation
Let's start!
1. What is a Recurrent Neural Network
A recurrent neural network (RNN) is a type of neural network designed for sequential data, often used for tasks such as language modeling, machine translation, and music generation. At each time step, an RNN computes its output from the current input and a hidden state carried over from the previous time step. This makes RNNs effective for sequential data, because information from earlier time steps can influence how the state is updated and which outputs are generated.
The core of the RNN model is its "recurrent" structure. A traditional feedforward network processes each input independently, whereas an RNN feeds information from the previous time step back into the current one. At each time step, the RNN combines the current input with the previous hidden state to compute a new vector called the "hidden state". This hidden state summarizes the current and past inputs and is used to predict the next output.
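The recurrence can be written as $h_t = \tanh(W_{ih} x_t + W_{hh} h_{t-1} + b)$, which is the default (tanh) form used by `nn.RNN` (PyTorch keeps two bias vectors, $b_{ih}$ and $b_{hh}$, merged into a single $b$ here). Below is a minimal pure-Python sketch of that recurrence; the weights `W_ih`, `W_hh` and `b` are made-up illustrative values, not trained parameters. It shows that the same weights are reused at every step and that the final hidden state still depends on the very first input.

```python
import math


def rnn_step(x, h_prev, W_ih, W_hh, b):
    """One Elman RNN step: h_t = tanh(W_ih x_t + W_hh h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_ih[j], x))
            + sum(w * hi for w, hi in zip(W_hh[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]


def run_rnn(inputs, W_ih, W_hh, b):
    """Apply rnn_step over a sequence, starting from a zero hidden state."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_ih, W_hh, b)  # the same weights at every time step
        states.append(h)
    return states


# Illustrative weights: 2-dimensional inputs, 3-dimensional hidden state
W_ih = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
W_hh = [[0.2, 0.0, 0.1], [0.0, 0.3, -0.1], [0.1, 0.1, 0.2]]
b = [0.0, 0.1, -0.1]

seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # differs from seq_a only at the first step
print(run_rnn(seq_a, W_ih, W_hh, b)[-1])
print(run_rnn(seq_b, W_ih, W_hh, b)[-1])
```

The two sequences share their last two inputs, yet their final hidden states differ, because the effect of the first input is carried forward through $h_{t-1}$.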
2. Recurrent Neural Networks in PyTorch
PyTorch is a popular deep learning framework that provides many tools for building and training neural networks. To build RNN models, we can use the `RNN` and `LSTM` classes in the `torch.nn` module.
The `RNN` class implements the most basic type of RNN. It takes an input tensor and an initial hidden state tensor, and returns an output tensor (the last layer's hidden state at every time step) and the final hidden state tensor. The `LSTM` class implements Long Short-Term Memory, a variant of the RNN that adds a separate cell state and gates that control what is remembered and what is forgotten. LSTMs are better suited than plain RNNs to long sequences because the gating mitigates the vanishing-gradient problem; exploding gradients are usually handled separately, for example with gradient clipping.
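A short sketch of the two interfaces, with arbitrary sizes chosen only for illustration. It shows the tensor shapes when `batch_first=True`; note that `batch_first` affects only the input and output tensors, while hidden (and cell) states are always shaped `(num_layers, batch, hidden_size)`.

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size, num_layers = 4, 10, 8, 16, 2

x = torch.randn(batch_size, seq_len, input_size)  # (batch, seq_len, features)
h0 = torch.zeros(num_layers, batch_size, hidden_size)

# Plain RNN: the recurrent state is a single tensor h
rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers, batch_first=True)
out_rnn, h_rnn = rnn(x, h0)
print(out_rnn.shape)  # torch.Size([4, 10, 16]): last layer's output at every step
print(h_rnn.shape)    # torch.Size([2, 4, 16]): final hidden state of every layer

# LSTM: the recurrent state is a tuple (h, c), where c is the cell state
lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, batch_first=True)
c0 = torch.zeros(num_layers, batch_size, hidden_size)
out_lstm, (h_lstm, c_lstm) = lstm(x, (h0, c0))
print(out_lstm.shape, h_lstm.shape, c_lstm.shape)
```

Apart from the extra cell state, the two classes are used the same way, so swapping `nn.RNN` for `nn.LSTM` in a model mainly means passing and returning a tuple instead of a single tensor.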
In the next sections, we will use the RNN class in PyTorch to build a simple recurrent neural network model and use it for text generation tasks.
3. Create a recurrent neural network model
In this example, we will create a character-level text generative model that takes one or more characters as input and generates the next character as output. We'll use some of Shakespeare's works as our training data to generate new Shakespeare-style texts.
First, we need to preprocess the text. We'll convert all characters to lowercase, remove punctuation, and create a dictionary that maps each character to a unique integer. The model is then trained on these integer indices.
```python
import string


def preprocess(text):
    # Convert the text to lowercase
    text = text.lower()
    # Remove all punctuation characters
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Build the character-to-index mapping; sorting makes the indices
    # reproducible across runs (set iteration order is not)
    char_to_index = {char: i for i, char in enumerate(sorted(set(text)))}
    index_to_char = {i: char for char, i in char_to_index.items()}
    # Represent the text as a list of integer indices
    text = [char_to_index[char] for char in text]
    return text, char_to_index, index_to_char
```
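Next, we'll create a TextDataset class for loading the training data. It stores the encoded text (a list of integer indices) as a PyTorch tensor and returns pairs of input and target sequences of length sequence_length, where the target is the input shifted by one character.
```python
import torch
from torch.utils.data import Dataset


class TextDataset(Dataset):
    def __init__(self, text, sequence_length=100):
        self.text = torch.tensor(text, dtype=torch.long)
        self.sequence_length = sequence_length

    def __len__(self):
        # Number of windows that still have a complete target sequence
        return len(self.text) - self.sequence_length

    def __getitem__(self, idx):
        # Input: sequence_length characters; target: the same window shifted by one
        x = self.text[idx:idx + self.sequence_length]
        y = self.text[idx + 1:idx + self.sequence_length + 1]
        return x, y
```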
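The input/target pairing can be checked by hand. Here is a small pure-Python sketch (`make_pairs` is a hypothetical helper, not part of the tutorial's code) that takes every window of `seq_len` characters and pairs it with the same window shifted one character to the right. With `seq_len=1` it reduces to (current character, next character) pairs.

```python
def make_pairs(encoded, seq_len):
    """Return (input, target) windows where the target is the input shifted by one."""
    pairs = []
    for i in range(len(encoded) - seq_len):
        x = encoded[i:i + seq_len]
        y = encoded[i + 1:i + seq_len + 1]
        pairs.append((x, y))
    return pairs


for x, y in make_pairs(list("hello"), 3):
    print("".join(x), "->", "".join(y))
# hel -> ell
# ell -> llo
```

Every position in the target is the character that follows the corresponding input position, which is exactly what the model learns to predict.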
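Next, we'll create an RNN class to implement our recurrent neural network model. The model contains an embedding layer, an RNN (optionally with several stacked layers) and a fully connected layer. The embedding layer converts the index of each input character into a vector, the RNN processes the sequence and produces hidden states, and the fully connected layer maps each hidden state to a score (logit) for every character in the vocabulary.
```python
import torch
import torch.nn as nn


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super().__init__()
        # Map character indices to dense vectors of size hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        # (Stacked) RNN; batch_first=True means inputs are (batch, seq_len, features)
        self.rnn = nn.RNN(hidden_size, hidden_size, num_layers=num_layers, batch_first=True)
        # Map each hidden state to one logit per character in the vocabulary
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, h):
        # x: (batch, seq_len) indices; h: (num_layers, batch, hidden_size)
        x = self.embedding(x)
        out, h = self.rnn(x, h)
        out = self.fc(out)  # (batch, seq_len, output_size)
        return out, h

    def init_hidden(self, batch_size=1):
        # Zero initial hidden state on the same device as the model's parameters
        return torch.zeros(self.rnn.num_layers, batch_size, self.rnn.hidden_size,
                           device=self.fc.weight.device)
```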
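Shape bookkeeping is the most common source of errors in this kind of model. The sketch below traces one batch through the same three layer types, assuming (arbitrarily) a 30-character vocabulary, a hidden size of 64, two stacked RNN layers and `batch_first=True`. It also shows why the logits and targets are flattened before `nn.CrossEntropyLoss`, which expects `(N, C)` scores and `(N,)` class indices.

```python
import torch
import torch.nn as nn

vocab_size, hidden_size, num_layers = 30, 64, 2
batch_size, seq_len = 8, 20

embedding = nn.Embedding(vocab_size, hidden_size)
rnn = nn.RNN(hidden_size, hidden_size, num_layers=num_layers, batch_first=True)
fc = nn.Linear(hidden_size, vocab_size)

x = torch.randint(0, vocab_size, (batch_size, seq_len))  # character indices
h0 = torch.zeros(num_layers, batch_size, hidden_size)    # (layers, batch, hidden)

emb = embedding(x)        # (batch, seq_len, hidden_size)
out, h_n = rnn(emb, h0)   # out: (batch, seq_len, hidden_size); h_n: (layers, batch, hidden)
logits = fc(out)          # (batch, seq_len, vocab_size): one score per character, per step

# Flatten batch and time so every position becomes one classification example
targets = torch.randint(0, vocab_size, (batch_size, seq_len))
loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(logits.shape, h_n.shape, loss.item())
```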
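We also need to define some hyperparameters: the sequence length, batch size, hidden size, number of stacked RNN layers, number of training epochs and learning rate. We also select the device to train on.
```python
import torch

# Define hyperparameters
sequence_length = 100
batch_size = 128
hidden_size = 256
num_layers = 2
num_epochs = 50
learning_rate = 0.001
# Train on the GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```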
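We can also define a function that generates the input and target sequences directly. It splits the encoded text into non-overlapping sequences of the specified length, pairs each with its target (the same sequence shifted by one character), and groups them into batches of tensors. The training loop below loads batches with a DataLoader instead, so this function is an optional alternative.
```python
import torch


def generate_sequence_data(text, sequence_length, batch_size):
    data = torch.tensor(text, dtype=torch.long)
    # Number of complete sequences that still leave room for a shifted target
    num_sequences = (len(data) - 1) // sequence_length
    n = num_sequences * sequence_length
    # Targets are the inputs shifted by one character
    x_data = data[:n].view(num_sequences, sequence_length)
    y_data = data[1:n + 1].view(num_sequences, sequence_length)
    # Group the sequences into full batches of shape (batch_size, sequence_length)
    sequence_data = []
    for i in range(num_sequences // batch_size):
        x = x_data[i * batch_size:(i + 1) * batch_size]
        y = y_data[i * batch_size:(i + 1) * batch_size]
        sequence_data.append((x, y))
    return sequence_data
```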
Next, we will load and preprocess our training data and convert it into a PyTorch dataset.
```python
with open('shakespeare.txt', 'r', encoding='utf-8') as f:
    text = f.read()
text, char_to_index, index_to_char = preprocess(text)
dataset = TextDataset(text, sequence_length)
```
Now, we can create a DataLoader object for loading data and splitting it into batches.
```python
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
```
Next, we'll create an RNN object and define our loss function and optimizer.
```python
# The vocabulary size is the number of distinct characters
model = RNN(len(char_to_index), hidden_size, len(char_to_index), num_layers).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```
Now we are ready to train our model. In each training iteration, we will feed the model an input sequence and compute the output of the model. We will use a loss function to calculate the difference between the model output and the target sequence and update the model parameters using the backpropagation algorithm.
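```python
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for i, (x, y) in enumerate(dataloader):
        # x, y: (batch, sequence_length) character indices
        x = x.to(device)
        y = y.to(device)
        # Batches are shuffled, so consecutive batches are not contiguous text:
        # start each batch from a fresh zero hidden state (this also avoids
        # backpropagating into the previous batch's graph)
        h = torch.zeros(num_layers, x.size(0), hidden_size, device=device)
        optimizer.zero_grad()
        out, h = model(x, h)
        # Flatten batch and time: (batch * seq_len, vocab) logits vs (batch * seq_len,) targets
        loss = criterion(out.reshape(-1, len(char_to_index)), y.reshape(-1))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch + 1, num_epochs, i + 1, len(dataloader), total_loss / (i + 1)))
```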
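After training is complete, we can use the trained model to generate text. We provide a seed string and run it through the model to build up the hidden state. Then, in a loop, we sample the next character from the model's predicted distribution and feed that character back in as the next input.
```python
import torch
import torch.nn as nn


def generate_text(model, char_to_index, index_to_char, sequence_length, seed_text, num_chars):
    # sequence_length is not used during generation; it is kept so the call below still matches
    model.eval()
    with torch.no_grad():
        h = torch.zeros(num_layers, 1, hidden_size, device=device)
        # Encode the seed like the training text: lowercase, and skip characters that are
        # not in the vocabulary (such as the punctuation that preprocess removes)
        seed = [char_to_index[c] for c in seed_text.lower() if c in char_to_index]
        # Shape (1, len(seed)): a batch containing a single sequence
        inp = torch.tensor(seed, device=device).unsqueeze(0)
        output = []
        for _ in range(num_chars):
            out, h = model(inp, h)
            # Probabilities for the character that follows the last input position
            prob = nn.functional.softmax(out[0, -1], dim=0)
            index = torch.multinomial(prob, 1).item()
            output.append(index_to_char[index])
            # Feed only the new character; h carries the earlier context
            inp = torch.tensor([[index]], device=device)
    return ''.join(output)
```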
We can now train our model and generate text using the following code snippet.
```python
import torch
import torch.nn as nn

# Assumes preprocess, TextDataset, RNN and generate_text from the sections above are defined

# Define hyperparameters
sequence_length = 100
batch_size = 128
hidden_size = 256
num_layers = 2
num_epochs = 50
learning_rate = 0.001
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the dataset
with open('shakespeare.txt', 'r', encoding='utf-8') as f:
    text = f.read()
text, char_to_index, index_to_char = preprocess(text)
dataset = TextDataset(text, sequence_length)

# Create the data loader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Create the model, loss function and optimizer
vocab_size = len(char_to_index)
model = RNN(vocab_size, hidden_size, vocab_size, num_layers).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for i, (x, y) in enumerate(dataloader):
        x, y = x.to(device), y.to(device)
        # Fresh hidden state for every (shuffled) batch
        h = torch.zeros(num_layers, x.size(0), hidden_size, device=device)
        optimizer.zero_grad()
        out, h = model(x, h)
        loss = criterion(out.reshape(-1, vocab_size), y.reshape(-1))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch + 1, num_epochs, i + 1, len(dataloader), total_loss / (i + 1)))

# Generate text
generated_text = generate_text(model, char_to_index, index_to_char, sequence_length,
                               'shall i compare thee to a summer\'s day?\n', 1000)
print(generated_text)
```
This will train our recurrent neural network and use it to generate a piece of text that is 1000 characters long. You can experiment with different hyperparameters and training data to create different models and generate text.
Summary
In this tutorial, we introduced the recurrent neural network (RNN) and its applications, and implemented a basic character-level RNN model in PyTorch, using text generation as the example task. We first preprocessed the text dataset and then defined an RNN model. We trained the model with the cross-entropy loss function and the Adam optimizer, and checked that it works by generating sample text.
RNNs are useful for sequence data in areas such as natural language processing and time series forecasting. PyTorch provides convenient tools and interfaces for building and training deep learning models, and both training and inference can be accelerated on a GPU. We hope this tutorial has given you a basic understanding of PyTorch and RNNs, and inspires you to try more complex deep learning tasks.
Next, we describe how to train a recurrent neural network model, evaluate a recurrent neural network model, and apply a recurrent neural network model to text generation.
4. Training the recurrent neural network model
Now that our model is defined, the next step is to train it. We provide the model with training data and update its parameters by computing the loss and backpropagating. Before that, we need to set some training hyperparameters, such as the learning rate and the number of iterations.
```python
import os
import time

import torch
import torch.nn as nn

# Training hyperparameters
learning_rate = 0.005
n_epochs = 2000
print_every = 100
plot_every = 10
hidden_size = 100

# Create the RNN model and set the loss and optimizer.
# n_characters is the vocabulary size (e.g. len(char_to_index)), assumed defined earlier.
rnn = RNN(n_characters, hidden_size, n_characters)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)

# Initialize loss and time variables (separate accumulators for printing and plotting)
current_loss = 0
plot_loss = 0
all_losses = []
start_time = time.time()
os.makedirs('./models', exist_ok=True)
```
Then we start training the recurrent neural network model:
```python
for epoch in range(1, n_epochs + 1):
    # Get input and target tensors for a random training chunk.
    # random_training_example, all_characters, data and chunk_len are assumed to be
    # defined elsewhere (not shown); the tensors are assumed to have shape (1, chunk_len)
    input_tensor, target_tensor = random_training_example(all_characters, n_characters,
                                                          data, chunk_len)
    # Initialize hidden state
    hidden = rnn.init_hidden()
    # Zero the gradients
    optimizer.zero_grad()
    # Forward pass
    output, hidden = rnn(input_tensor, hidden)
    loss = loss_fn(output.reshape(-1, n_characters), target_tensor.reshape(-1))
    # Backward pass
    loss.backward()
    # Update parameters
    optimizer.step()
    # Update the running losses
    current_loss += loss.item()
    plot_loss += loss.item()
    # Print progress
    if epoch % print_every == 0:
        elapsed = time.time() - start_time
        print(f'Epoch [{epoch}/{n_epochs}], Loss: {current_loss / print_every:.4f}, '
              f'Time: {elapsed:.0f}s')
        current_loss = 0
    # Save the model and the optimizer every 500 epochs
    if epoch % 500 == 0:
        torch.save(rnn.state_dict(), f'./models/rnn_{epoch}.pth')
        torch.save(optimizer.state_dict(), f'./models/optimizer_{epoch}.pth')
    # Record the average loss every `plot_every` epochs
    if epoch % plot_every == 0:
        all_losses.append(plot_loss / plot_every)
        plot_loss = 0
```
In each iteration (called an "epoch" in the code, although it processes a single random chunk rather than the whole dataset), we draw a random training example, initialize the hidden state, zero the gradients, compute the loss with a forward pass, and update the model parameters through backpropagation. We print the average loss every print_every iterations and record it every plot_every iterations in all_losses, which can be plotted after training is complete. Every 500 iterations we also save the state of the model and the optimizer so that the model can be restored later.
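Restoring a checkpoint means rebuilding the model and optimizer with the same hyperparameters and loading the saved state dictionaries into them. A minimal sketch, assuming a small stand-in `nn.RNN` instead of the tutorial's trained model, and writing to a temporary directory instead of `./models` so it can run anywhere:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in model and optimizer; in the tutorial these would be the trained rnn and its optimizer
model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)

with tempfile.TemporaryDirectory() as ckpt_dir:
    model_path = os.path.join(ckpt_dir, 'rnn_500.pth')
    optim_path = os.path.join(ckpt_dir, 'optimizer_500.pth')
    torch.save(model.state_dict(), model_path)
    torch.save(optimizer.state_dict(), optim_path)

    # Later: rebuild the same architecture, then load the saved states into it
    restored = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
    restored_opt = torch.optim.Adam(restored.parameters(), lr=0.005)
    restored.load_state_dict(torch.load(model_path))
    restored_opt.load_state_dict(torch.load(optim_path))
    restored.eval()  # switch to evaluation mode before inference
```

Saving the optimizer state as well matters when training is resumed, because Adam keeps running moment estimates for every parameter.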
5. Evaluate Recurrent Neural Network Models
Once we have trained the model, we need to evaluate its performance. For classification tasks, common metrics include accuracy, precision, recall and F1 score; for text generation (language modeling), we usually use perplexity. Perplexity measures how well the model predicts a given sequence (lower is better) and is computed as:
$$\text{perplexity} = e^{-\frac{1}{N} \sum_{i=1}^{N} \log p(x_i)}$$
Here, $N$ is the length of the sequence and $p(x_i)$ is the probability the model assigns to the $i$-th character, given the characters before it.
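The formula is the exponential of the average negative log-likelihood, which equals the inverse geometric mean of the probabilities. A small pure-Python worked example (the probability values are made up): a model that spreads its probability uniformly over $V$ characters has perplexity exactly $V$, so perplexity can be read as the effective number of characters the model is choosing between.

```python
import math


def perplexity(probs):
    """Perplexity from the probabilities assigned to each actual next character."""
    n = len(probs)
    avg_nll = -sum(math.log(p) for p in probs) / n  # average negative log-likelihood
    return math.exp(avg_nll)


# Uniform over 4 characters: perplexity is 4 (up to floating-point rounding)
print(perplexity([0.25, 0.25, 0.25, 0.25]))
# Higher probabilities on the true characters give a lower perplexity
print(perplexity([0.9, 0.5, 0.8]))
```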
Here is sample code to evaluate the model:
```python
import math

import torch
import torch.nn as nn


def evaluate(rnn, data, prime_str='A', predict_len=100, temperature=0.8):
    # Assumes rnn has an init_hidden() method and takes (batch, seq_len) index tensors,
    # and that all_characters / char_tensor map between characters and indices.
    rnn.eval()
    with torch.no_grad():
        # Initialize hidden state
        hidden = rnn.init_hidden()
        # Initialize input sequence
        prime_input = char_tensor(prime_str)
        # Feed all but the last prime character to build up the hidden state
        for p in range(len(prime_str) - 1):
            _, hidden = rnn(prime_input[p].view(1, 1), hidden)
        # Initialize predicted sequence
        predicted = prime_str
        # Generate predicted sequence
        for p in range(predict_len):
            # Forward pass on the most recent character
            output, hidden = rnn(prime_input[-1].view(1, 1), hidden)
            # Sample from the temperature-scaled softmax distribution
            probs = torch.softmax(output.view(-1) / temperature, dim=0)
            top_i = torch.multinomial(probs, 1).item()
            # Add predicted character to string and use as next input
            predicted_char = all_characters[top_i]
            predicted += predicted_char
            prime_input = char_tensor(predicted_char)
    # Perplexity of the model on `data` (not on the generated text)
    perplexity = calc_perplexity(rnn, data)
    return predicted, perplexity


def calc_perplexity(rnn, data):
    # Sum (not average) the loss so that chunks of different lengths are weighted correctly
    loss_fn = nn.CrossEntropyLoss(reduction='sum')
    total_loss = 0.0
    n_chars = 0
    rnn.eval()
    with torch.no_grad():
        # Iterate over all chunks (strings) in data
        for chunk in data:
            # Predict chunk[1:] from chunk[:-1]; add a batch dimension to the input
            input_tensor = char_tensor(chunk[:-1]).unsqueeze(0)
            target_tensor = char_tensor(chunk[1:])
            hidden = rnn.init_hidden()
            output, hidden = rnn(input_tensor, hidden)
            total_loss += loss_fn(output.view(-1, output.size(-1)), target_tensor).item()
            # Number of predicted characters in this chunk
            n_chars += len(chunk) - 1
    # Perplexity = exp(average negative log-likelihood per character)
    return math.exp(total_loss / n_chars)
```
We first define an evaluate function that takes a trained model, evaluation data, an initial (prime) string, the length of the sequence to generate, and a temperature parameter. The prime string is fed through the model to build up the hidden state, and the model then generates predict_len further characters by sampling. Finally, the function computes the model's perplexity on data (not on the generated sequence) and returns the generated text together with the perplexity.
Perplexity is computed in the calc_perplexity function. For each chunk, we compute the summed cross-entropy loss of predicting every next character and count how many characters were predicted. Finally, we divide the total loss by the number of characters and apply the exponential function to obtain the perplexity.
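Why exponentiating the cross-entropy gives perplexity: `nn.CrossEntropyLoss` applies log-softmax to the logits and averages the negative natural-log probability of the target classes, which is exactly the term inside the exponential of the perplexity formula. A quick numerical check with random logits (the values are arbitrary):

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # 5 predictions over a 10-character vocabulary
targets = torch.tensor([1, 3, 0, 9, 4])   # the characters that actually came next

# Mean cross-entropy as computed by PyTorch
ce = nn.CrossEntropyLoss()(logits, targets).item()

# The same quantity written out as in the perplexity formula: -(1/N) * sum(log p(x_i))
log_probs = F.log_softmax(logits, dim=1)
nll = -log_probs[torch.arange(5), targets].mean().item()

print(math.exp(ce), math.exp(nll))  # both are the perplexity of these 5 predictions
```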
Finally, we look at how recurrent neural networks can be applied to text generation tasks. For text generation tasks, our goal is to generate the next character or word based on a previous piece of text. We can use the trained recurrent neural network model to predict the probability distribution of the next character or word, and sample according to that distribution. Specifically, we can perform the following steps:
1. Start from an initial string and a trained recurrent neural network model.
2. Feed the initial string into the model to build up the hidden state.
3. From the model's output distribution, choose the next character, either by sampling from the distribution or by taking the most probable character, and append it to the end of the sequence.
4. Feed the predicted character back in as the next input and repeat step 3 until the sequence reaches the desired length.
When generating text, we usually add a temperature parameter to control the diversity of the generated text. The higher the temperature, the more random the generated text; the lower the temperature, the more conservative it is. Here is sample code that generates text:
```python
import torch


def generate_text(rnn, prime_str='A', predict_len=100, temperature=0.8):
    # Assumes rnn.init_hidden() and the char_tensor / all_characters helpers described below
    rnn.eval()
    with torch.no_grad():
        # Initialize hidden state
        hidden = rnn.init_hidden()
        # Initialize input sequence
        prime_input = char_tensor(prime_str)
        # Feed all but the last prime character to build up the hidden state
        for p in range(len(prime_str) - 1):
            _, hidden = rnn(prime_input[p].view(1, 1), hidden)
        # Initialize predicted sequence
        predicted = prime_str
        # Generate predicted sequence
        for p in range(predict_len):
            # Forward pass on the most recent character, shaped (batch=1, seq_len=1)
            output, hidden = rnn(prime_input[-1].view(1, 1), hidden)
            # Divide the logits by the temperature and sample from the softmax distribution
            probs = torch.softmax(output.view(-1) / temperature, dim=0)
            top_i = torch.multinomial(probs, 1).item()
            # Add predicted character to string and use as next input
            predicted_char = all_characters[top_i]
            predicted += predicted_char
            prime_input = char_tensor(predicted_char)
    return predicted
```
We can use this function to generate text of a specified length. The prime string is first fed through the model to build up the hidden state, and the model then generates predict_len characters one at a time. At each step, the output logits are divided by the temperature and passed through a softmax, and the next character is sampled from the resulting distribution.
We also need some helper functions to convert characters and words into numeric representations. Here is a helper function that converts characters to numeric representations:
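To see what the temperature does, here is a small pure-Python sketch (the helper names `softmax_with_temperature` and `sample` and the logit values are illustrative) that applies a temperature-scaled softmax to a fixed set of logits. Lower temperatures concentrate probability on the largest logit; higher temperatures flatten the distribution toward uniform.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

rng = random.Random(0)
print(sample(softmax_with_temperature(logits, 0.8), rng))
```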
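```python
import torch


def char_tensor(string):
    # all_characters is assumed to be a string (or list) of every character in the
    # vocabulary, where the character at position i has index i
    tensor = torch.zeros(len(string)).long()
    for c in range(len(string)):
        tensor[c] = all_characters.index(string[c])
    return tensor
```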
This function takes a string as input and converts it to a tensor of integers whose length is the length of the string. Each element is the index of the corresponding character in the character table.
For word-level recurrent neural networks, we need a different helper function to convert words into numeric representations. Here is a helper function that converts words to numeric representations:
```python
import torch


def word_tensor(words, word_to_ix):
    # words: a sequence of words (e.g. a tokenized sentence); word_to_ix: word -> index
    tensor = torch.zeros(len(words)).long()
    for w in range(len(words)):
        tensor[w] = word_to_ix[words[w]]
    return tensor
```
The function takes a sequence of words (for example, a tokenized sentence) and the vocabulary dictionary word_to_ix, and converts the words into a tensor of integers whose length is the number of words. Each element is the index of the corresponding word in the vocabulary.
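The tutorial does not show how `word_to_ix` is built. One simple option is sketched below, assuming plain whitespace tokenization (real projects usually use a proper tokenizer and reserve an index for unknown words). `build_vocab` and `encode` are hypothetical helper names; `encode` does the same job as `word_tensor` but returns a plain list.

```python
def build_vocab(tokens):
    """Map each distinct word to an index, in order of first appearance."""
    word_to_ix = {}
    for w in tokens:
        if w not in word_to_ix:
            word_to_ix[w] = len(word_to_ix)
    return word_to_ix


def encode(words, word_to_ix):
    """Convert a sequence of words into a list of indices."""
    return [word_to_ix[w] for w in words]


corpus = "to be or not to be".split()
word_to_ix = build_vocab(corpus)
print(word_to_ix)                               # {'to': 0, 'be': 1, 'or': 2, 'not': 3}
print(encode(["not", "to", "be"], word_to_ix))  # [3, 0, 1]
```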
Next, we need some functions for training and evaluating recurrent neural network models. Here is the code for the training function:
```python
import math
import random

import torch


def train(rnn, criterion, optimizer, train_data, n_epochs=200, print_every=100, plot_every=10):
    # train_data: list of (input_seq, target_seq) pairs of 1-D LongTensors.
    # plot_loss and plot_perplexity are plotting helpers that are not shown here.
    # Initialize running losses and perplexities
    current_loss = 0
    all_losses = []
    current_perplexity = 0
    all_perplexities = []
    rnn.train()
    # Train loop
    for epoch in range(1, n_epochs + 1):
        # Shuffle training data
        random.shuffle(train_data)
        epoch_loss = 0
        # Train on each example
        for i, (input_seq, target_seq) in enumerate(train_data):
            # Fresh hidden state: shuffled examples are not consecutive text
            hidden = rnn.init_hidden()
            # Clear gradients
            optimizer.zero_grad()
            # Add a batch dimension: (seq_len,) -> (1, seq_len)
            input_tensor = input_seq.unsqueeze(0)
            target_tensor = target_seq.unsqueeze(0)
            # Forward pass
            output, hidden = rnn(input_tensor, hidden)
            # Calculate loss and perplexity on (seq_len, vocab_size) logits vs (seq_len,) targets
            loss = criterion(output.view(-1, output.size(-1)), target_tensor.view(-1))
            perplexity = torch.exp(loss)
            # Backward pass and optimization step
            loss.backward()
            optimizer.step()
            # Update running loss and perplexity
            current_loss += loss.item()
            current_perplexity += perplexity.item()
            epoch_loss += loss.item()
            # Print progress
            if (i + 1) % print_every == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}, Perplexity: {:.4f}'
                      .format(epoch, n_epochs, i + 1, len(train_data),
                              current_loss / print_every, current_perplexity / print_every))
                current_loss = 0
                current_perplexity = 0
        # Record this epoch's average loss and the matching perplexity
        avg_loss = epoch_loss / len(train_data)
        all_losses.append(avg_loss)
        all_perplexities.append(math.exp(avg_loss))
        # Plot losses and perplexities
        if epoch % plot_every == 0:
            plot_loss(all_losses, epoch)
            plot_perplexity(all_perplexities, epoch)
    return all_losses, all_perplexities
```
This function takes the recurrent neural network model rnn, the loss function criterion, the optimizer optimizer, the training data train_data, the number of training epochs n_epochs, the number of steps between progress printouts print_every, and the number of epochs between loss and perplexity plots plot_every.
During training, we first initialize the running loss and perplexity variables current_loss and current_perplexity, and then run the training loop. At the start of each epoch, we shuffle the training data. Then we train on each training example as follows:
1. Initialize the hidden state and clear the gradients.
2. Add a batch dimension to the input and target sequences.
3. Run the forward pass to compute the output sequence and the new hidden state.
4. Calculate the loss and perplexity.
5. Backpropagate and take an optimization step.
6. Update the running loss and perplexity.
7. Every print_every steps, print the progress and reset the running loss and perplexity.
After each epoch, we append the epoch's average loss and the corresponding perplexity to lists, and every plot_every epochs we plot them.
Here is the code for the evaluation function:
```python
import math

import torch


def evaluate(rnn, criterion, data):
    # Initialize loss and perplexity
    total_loss = 0
    total_perplexity = 0
    rnn.eval()
    # Loop through data
    with torch.no_grad():
        for input_seq, target_seq in data:
            # Initialize hidden state
            hidden = rnn.init_hidden()
            # Input shape (1, seq_len): a batch containing one sequence (batch_first=True)
            input_tensor = torch.as_tensor(input_seq, dtype=torch.long).view(1, -1)
            target_tensor = torch.as_tensor(target_seq, dtype=torch.long).view(-1)
            # Forward pass
            output, hidden = rnn(input_tensor, hidden)
            loss = criterion(output.view(-1, output.size(-1)), target_tensor)
            # Update total loss and perplexity
            total_loss += loss.item()
            total_perplexity += math.exp(loss.item())
    # Calculate average loss and perplexity
    avg_loss = total_loss / len(data)
    avg_perplexity = total_perplexity / len(data)
    return avg_loss, avg_perplexity
```
This function accepts the recurrent neural network model `rnn`, the loss function `criterion` and the test data `data` as input (it replaces the earlier `evaluate` function of the same name). During evaluation, we initialize the accumulators `total_loss` and `total_perplexity` and then loop over the test data. For each test sample, we initialize the hidden state `hidden` and convert the input and target sequences into tensors. Next, we run a forward pass to compute the output sequence and the new hidden state, and compute the loss and perplexity. Finally, we divide the total loss and total perplexity by the number of test samples to obtain the averages. Because the exponential function is convex, the average of the per-sample perplexities is never smaller than `exp(avg_loss)`, so the two numbers should not be compared directly.

Finally, let's look at how to use the trained recurrent neural network model for text generation. The following is the code of the text generation function:

To be continued.