[Introduction to Artificial Intelligence] Simple introduction and application examples of RNN, LSTM, and GRU

1. Introduction to RNN

1.1 Concept introduction

  • Recurrent Neural Network
  • Conceptually similar to CNN: both share weights. A CNN kernel scans over space, while an RNN "kernel" (the recurrent cell) scans over time.
  • Concretely, an RNN is a bit like reusing the same linear layer at every time step.
  • RNN structure diagram (figure omitted).
  • Each time step produces a hidden variable h_i, which is passed to the next time step as part of its input; h_i carries the information from the previous time steps.

1.2 Introduction to method use

  • You can build an RNN with the following calls:
rnn = torch.nn.RNN(input_size, hidden_size, num_layers)
outputs, hidden_n = rnn(inputs, hidden_0)
  • where:
  • input_size can be understood as the dimension of the word encoding, hidden_size is the dimension of the hidden state, and num_layers is the number of stacked RNN layers;
  • so that the data for one time step across the whole batch is fed in at a time, inputs has shape (seqlen, batch_size, input_size);
  • hidden_0 is the initial hidden state h_0, which acts as a prior; if you have nothing better, initializing it to all zeros is fine. Its shape is (num_layers, batch_size, hidden_size);
  • outputs holds the hidden states produced at all time steps (for the top layer), with shape (seqlen, batch_size, hidden_size);
  • hidden_n is the hidden state h_n at the last time step, with shape (num_layers, batch_size, hidden_size). A runnable sketch of these shapes follows below.
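  • A minimal runnable sketch of these shapes (the concrete sizes below are made up purely for illustration):
import torch

seq_len, batch_size, input_size, hidden_size, num_layers = 5, 3, 4, 8, 2
rnn = torch.nn.RNN(input_size, hidden_size, num_layers)

inputs = torch.randn(seq_len, batch_size, input_size)        # (seqlen, batch_size, input_size)
hidden_0 = torch.zeros(num_layers, batch_size, hidden_size)  # all-zero prior for h_0

outputs, hidden_n = rnn(inputs, hidden_0)
print(outputs.shape)   # torch.Size([5, 3, 8])  -> (seqlen, batch_size, hidden_size)
print(hidden_n.shape)  # torch.Size([2, 3, 8])  -> (num_layers, batch_size, hidden_size)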

2. Coding layer embedding

2.1 Parameters of embedding

nn.Embedding(num_embeddings, embedding_dim)
  • num_embeddings is the size of the dictionary. For example, if a total of 5,000 distinct words can appear during training, then num_embeddings=5000;
  • embedding_dim represents the dimension of the embedding vector, that is, how many dimensions of a vector are used to represent a symbol.

2.2 Understanding of embedding

  • Embedding sounds fancy, but it is really just a lookup table that stores the embedding vectors of a fixed-size dictionary.
  • Given an index, the embedding layer returns the embedding vector stored at that index; after training, these vectors reflect the semantic relationships between the symbols the indices represent.
  • The input is a list of indices, and the output is the list of corresponding embedding vectors.
  • The input of embedding must be integer indices, represented as a torch.LongTensor (see the small sketch below).
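  • A small sketch of the lookup behaviour (the vocabulary size 10 and embedding_dim 3 are made-up numbers):
import torch

embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=3)

indices = torch.LongTensor([[1, 2, 4, 5],
                            [4, 3, 2, 9]])   # shape (2, 4): two "sentences" of four word indices each
vectors = embedding(indices)
print(vectors.shape)   # torch.Size([2, 4, 3]) -> one 3-dimensional vector looked up per index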

3. Several explanations on the input and output of the Linear layer and CrossEntropyLoss layer

3.1 Linear layer

  1. For the Linear layer, the input shape is (N, *, in_features) and the output shape is (N, *, out_features), where * stands for any number of extra dimensions.
  2. Because only the last dimension is transformed, a Linear layer can be applied directly to the three-dimensional output of an RNN, as the sketch below shows.
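  • A sketch of this (the sizes are made up): applying a Linear layer to a 3-D RNN-style tensor only changes the last dimension.
import torch

linear = torch.nn.Linear(in_features=8, out_features=5)
x = torch.randn(7, 3, 8)   # (seqlen, batch_size, hidden_size)
y = linear(x)
print(y.shape)             # torch.Size([7, 3, 5]) -> only the last dimension changes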

3.2 CrossEntropyLoss layer

  1. For the CrossEntropyLoss layer, the first input (input) must be two-dimensional with shape (N, C). Here N can be understood as the total number of word positions across all sentences, and C is class_num, the number of categories; each row holds the class scores for one position.
  2. The second input (target) is one-dimensional with shape (N); each element is the index of the class (the word) that should appear at the corresponding position. A small sketch of these shapes follows.
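  • A sketch of the shapes CrossEntropyLoss expects (the numbers are made up):
import torch

criterion = torch.nn.CrossEntropyLoss()
N, C = 6, 4                                      # 6 word positions, 4 classes
scores = torch.randn(N, C)                       # first input: one row of class scores per position
target = torch.LongTensor([0, 2, 1, 3, 3, 0])    # second input: correct class index for each position
loss = criterion(scores, target)
print(loss.item())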

3.3 The significance of mentioning them

  • Understanding these input and output shape relationships helps make sense of the classification code (see the full example in Section 6).

  • The output of the RNN can be fed directly into the Linear layer because the Linear layer only transforms the last dimension;

  • The output of the final Linear layer is reshaped into two dimensions because that is the shape CrossEntropyLoss requires, as the sketch below shows.
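  • Putting the two points together, a minimal sketch of the reshape step (sizes made up): flatten (seqlen, batch_size, C) into (N, C) before handing the scores to the loss.
import torch

seq_len, batch_size, C = 7, 3, 5
linear_out = torch.randn(seq_len, batch_size, C)         # output of the final Linear layer
target = torch.randint(0, C, (seq_len * batch_size,))    # one class index per word position
loss = torch.nn.CrossEntropyLoss()(linear_out.view(-1, C), target)
print(loss.item())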

4. LSTM, GRU

  • As upgraded versions of the plain RNN, LSTM and GRU perform much better. Their ideas and results are broadly similar, but GRU requires less computation, so GRU is a reasonable default choice.

4.1 LSTM

  • LSTM follows the calculation formulas below.
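  • For reference, these are the standard per-time-step updates, written in the same notation as the torch.nn.LSTM documentation (σ is the sigmoid function, ⊙ is element-wise multiplication):
i_t = σ(W_ii x_t + b_ii + W_hi h_{t-1} + b_hi)      (input gate)
f_t = σ(W_if x_t + b_if + W_hf h_{t-1} + b_hf)      (forget gate)
g_t = tanh(W_ig x_t + b_ig + W_hg h_{t-1} + b_hg)   (candidate cell state)
o_t = σ(W_io x_t + b_io + W_ho h_{t-1} + b_ho)      (output gate)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)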

  • Network structure diagram (figure omitted).

  • The inputs are (inputs, (h_0, c_0)) and the outputs are (outputs, (h_n, c_n)). The shape of each part follows the same pattern as for the RNN above, and h and c have identical shapes, as in the sketch below.
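  • A minimal torch.nn.LSTM sketch (made-up sizes) showing these inputs and outputs:
import torch

seq_len, batch_size, input_size, hidden_size, num_layers = 5, 3, 4, 8, 2
lstm = torch.nn.LSTM(input_size, hidden_size, num_layers)

inputs = torch.randn(seq_len, batch_size, input_size)
h_0 = torch.zeros(num_layers, batch_size, hidden_size)
c_0 = torch.zeros(num_layers, batch_size, hidden_size)   # the cell state has the same shape as h

outputs, (h_n, c_n) = lstm(inputs, (h_0, c_0))
print(outputs.shape, h_n.shape, c_n.shape)
# torch.Size([5, 3, 8]) torch.Size([2, 3, 8]) torch.Size([2, 3, 8])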

4.2 GRU

  • GRU follows the calculation formulas below.
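  • For reference, these are the standard per-time-step updates, written in the same notation as the torch.nn.GRU documentation:
r_t = σ(W_ir x_t + b_ir + W_hr h_{t-1} + b_hr)                (reset gate)
z_t = σ(W_iz x_t + b_iz + W_hz h_{t-1} + b_hz)                (update gate)
n_t = tanh(W_in x_t + b_in + r_t ⊙ (W_hn h_{t-1} + b_hn))     (candidate hidden state)
h_t = (1 - z_t) ⊙ n_t + z_t ⊙ h_{t-1}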

  • Network structure diagram (figure omitted).

  • In PyTorch, a GRU is implemented with torch.nn.GRU; its usage is almost identical to torch.nn.RNN, so the RNN description above applies directly (a tiny sketch follows).
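  • A tiny sketch (made-up sizes); note that when the initial hidden state is omitted, it defaults to all zeros:
import torch

gru = torch.nn.GRU(4, 8, 2)                     # input_size, hidden_size, num_layers
outputs, hidden_n = gru(torch.randn(5, 3, 4))   # hidden_0 omitted -> defaults to zeros
print(outputs.shape, hidden_n.shape)            # torch.Size([5, 3, 8]) torch.Size([2, 3, 8])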

4.3 Bidirectional RNN, LSTM, GRU

  • All three work the same way here. Taking a bidirectional RNN as the example, "bidirectional" means running the sequence once forward and once in reverse, then splicing the two results together.
  • The hidden state h_n then has shape (num_layers * num_direction, batch_size, hidden_size); if you want to use it, remember to concatenate the two directions yourself. outputs has shape (seqlen, batch_size, hidden_size * num_direction), i.e. the outputs of the two directions are already concatenated at each time step. The bidirectional argument controls whether the layer is bidirectional; if it is, num_direction = 2.
  • The effective hidden dimension thus grows from hidden_size to hidden_size * num_direction, as the sketch below shows.
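  • A bidirectional GRU sketch (made-up sizes) showing the extra direction dimension and the concatenation of the final forward and backward hidden states:
import torch

seq_len, batch_size, input_size, hidden_size, num_layers = 5, 3, 4, 8, 2
gru = torch.nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)

outputs, h_n = gru(torch.randn(seq_len, batch_size, input_size))
print(outputs.shape)  # torch.Size([5, 3, 16]) -> hidden_size * num_direction
print(h_n.shape)      # torch.Size([4, 3, 8])  -> num_layers * num_direction

# to use the final hidden state, concatenate the last forward and last backward layers
h_cat = torch.cat([h_n[-2], h_n[-1]], dim=1)
print(h_cat.shape)    # torch.Size([3, 16])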

5. pack_padded_sequence

5.1 Introduction

  • When training an RNN in batches, sequences have to be truncated and padded.
  • Because sentences have different lengths, long sentences are truncated and short ones are padded so that everything lines up for computation.
  • What pack_padded_sequence does is compress away the padding so that the RNN does not waste computation on it.

5.2 How to compress

  • Suppose there are six sentences in a batch, all padded to the same length (figure omitted).
  • In the illustration, the sentences are sorted in descending order of their true lengths, "pad" marks the padding tokens, and each column of words corresponds to one time step.

  • pack_padded_sequence flattens the sorted sentences along the time dimension and records, for each time step, how many sequences in the batch still have valid data at that step. For example, if only 3 of the sentences still have a real (non-padding) token at some time step, the batch value recorded for that step is 3.
  • The call returns a PackedSequence object, which contains the compressed data (data) and the number of valid entries at each time step (batch_sizes), e.g. tensor([4, 3, 3, 2, 1, 1]) in the illustrated example.
  • When packing, you also pass in the true length of each sentence so that the operation can later be reversed. Note that recent versions of PyTorch no longer require the batch to be pre-sorted by length (pass enforce_sorted=False).
  • PyTorch's RNN, LSTM, and GRU can all accept a PackedSequence as input and will return a new PackedSequence as output, as sketched below.
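  • A small packing sketch (made-up data: three sequences of true lengths 4, 3 and 1):
import torch
from torch.nn.utils.rnn import pack_padded_sequence

padded = torch.randn(4, 3, 2)       # (max_seq_len, batch_size, input_size), already padded
lengths = torch.tensor([4, 3, 1])   # true length of each sequence (must live on the CPU)

packed = pack_padded_sequence(padded, lengths)
print(packed.batch_sizes)           # tensor([3, 2, 2, 1]) -> valid entries per time step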

5.3 Decompression

  • Use pad_packed_sequence to decompress the PackedSequence returned by the network back into an ordinary padded tensor, as sketched below.
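  • A minimal round-trip sketch (made-up sizes): pack the padded batch, run it through a GRU, then unpack the result with pad_packed_sequence.
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

gru = torch.nn.GRU(input_size=2, hidden_size=8)
padded = torch.randn(4, 3, 2)                    # (max_seq_len, batch_size, input_size)
lengths = torch.tensor([4, 3, 1])

packed = pack_padded_sequence(padded, lengths)
packed_out, h_n = gru(packed)                    # the GRU returns another PackedSequence
outputs, out_lengths = pad_packed_sequence(packed_out)
print(outputs.shape)                             # torch.Size([4, 3, 8])
print(out_lengths)                               # tensor([4, 3, 1])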

6. Examples

  • Task: determine the country of origin from a person's name.
  • Pipeline: character -> number (ASCII code) -> embedding -> GRU -> classify from the last hidden state h_n.
  • Model structure (figure omitted): embedding layer -> GRU -> fully connected classifier.

  • Full code:
import csv
import math
import gzip
import time
import torch
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pack_padded_sequence


BATCH_SIZE = 256
N_EPOCHS = 100
N_CHARS = 128
HIDDEN_SIZE = 100
N_LAYER = 2
USE_GPU = True


# Prepare the data
class NameDataset(Dataset):
    def __init__(self, is_train_set=True):

        # Read the source data file; how to read it depends on the file format
        filename = 'names_train.csv.gz' if is_train_set else 'names_test.csv.gz'
        with gzip.open(filename, 'rt') as f:
            reader = csv.reader(f)
            rows = list(reader)
            # e.g. rows[1] = ['Ajdrna', 'Czech'], where Ajdrna is a name and Czech is its country

        # Store the names and countries in lists
        self.names = [row[0] for row in rows]
        self.len = len(self.names)
        self.countries = [row[1] for row in rows]

        # Store the countries and their class indices
        self.country_list = list(sorted(set(self.countries)))  # unique country names, sorted by ASCII order
        self.country_dict = self.getCountryDict()  # dictionary mapping country name (key) to class index (value)
        self.country_num = len(self.country_list)  # number of countries

    def __getitem__(self, index):
        return self.names[index], self.country_dict[self.countries[index]]
        # look up the name by index, then map its country to the country's class index

    def __len__(self):
        return self.len  # size of the dataset

    def getCountryDict(self):
        country_dict = dict()  # build an empty dictionary
        for idx, country_name in enumerate(self.country_list, 0):  # numbering starts at 0
            country_dict[country_name] = idx  # pair each country name with its class index
        return country_dict

    def idx2country(self, index):
        return self.country_list[index]  # look up the country name from its class index

    def getCountriesNum(self):
        return self.country_num  # number of countries

    
trainset = NameDataset(is_train_set = True)
trainloader = DataLoader(trainset,batch_size=BATCH_SIZE,shuffle=True)

testset = NameDataset(is_train_set=False)
testloader = DataLoader(testset,batch_size=BATCH_SIZE,shuffle=False)

N_COUNTRY = trainset.getCountriesNum()   # number of country classes


# Build the model

def create_tensor(tensor):
    # simply moves the tensor to the chosen device
    if USE_GPU:
        device = torch.device("cuda:0")
        tensor = tensor.to(device)
    return tensor

class RNNClassifier(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1, bidirectional=True):
        super(RNNClassifier, self).__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.n_directions = 2 if bidirectional else 1

        self.embedding = torch.nn.Embedding(input_size, hidden_size)
        self.gru = torch.nn.GRU(hidden_size, hidden_size, n_layers,
                                bidirectional=bidirectional)
        self.fc = torch.nn.Linear(hidden_size * self.n_directions, output_size)

    def _init_hidden(self, batch_size):
        hidden = torch.zeros(self.n_layers * self.n_directions,
                             batch_size, self.hidden_size)
        return create_tensor(hidden)

    def forward(self, input, seq_lengths):
        input = input.t()
        # batch_size * seq_len -> seq_len * batch_size
        batch_size = input.size(1)  # keep batch_size to build the initial hidden state

        hidden = self._init_hidden(batch_size)  # create the initial hidden state
        embedding = self.embedding(input)  # embed the character indices

        gru_input = pack_padded_sequence(embedding, seq_lengths)  # pack away the padding
        # seq_lengths holds each name's true length; this returns a PackedSequence object

        output, hidden = self.gru(gru_input, hidden)
        if self.n_directions == 2:  # if bidirectional, concatenate the hidden states of both directions
            hidden_cat = torch.cat([hidden[-1], hidden[-2]], dim=1)
        else:
            hidden_cat = hidden[-1]
        fc_output = self.fc(hidden_cat)  # classification
        return fc_output

    
# Loss function; the optimizer is created in the main block below, after the
# classifier has been constructed, because it needs classifier.parameters()
criterion = torch.nn.CrossEntropyLoss()
    

# Training and testing code
def name2list(name):
    # convert a name into a list of ASCII codes
    arr = [ord(c) for c in name]
    # ord() returns the ASCII code of a character
    return arr, len(arr)  # return a tuple of (codes, length)
    
def make_tensors(names,countries):
    sequences_and_lengths = [name2list(name) for name in names]
    name_sequences = [s1[0] for s1 in sequences_and_lengths]
    seq_lengths = torch.LongTensor([s1[1] for s1 in sequences_and_lengths])
    countries = countries.long()  # .long() converts to long integers
    
    # padding
    seq_tensor = torch.zeros(len(name_sequences),seq_lengths.max()).long()
    for idx,(seq,seq_len) in enumerate(zip(name_sequences,seq_lengths),0):
        seq_tensor[idx,:seq_len] = torch.LongTensor(seq)
        
    seq_lengths,perm_idx = seq_lengths.sort(dim=0,descending =True)
    seq_tensor = seq_tensor[perm_idx]
    countries = countries[perm_idx]
    
    return create_tensor(seq_tensor),create_tensor(seq_lengths),create_tensor(countries)

def trainModel():
    total_loss = 0
    for i,(name,countries) in enumerate(trainloader, 1):
        inputs, seq_lengths, target = make_tensors(name, countries)  # name and countries are plain Python data; convert them to tensors
        
        output = classifier(inputs,seq_lengths.to("cpu"))
        
        loss = criterion(output,target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        if i % 10 == 0:
            print(f'[{time_since(start)}] Epoch {epoch} ', end='')
            print(f'[{i * len(inputs)}/{len(trainset)}] ', end='')
            print(f'loss={total_loss / (i * len(inputs))}')
    return total_loss


def testModel():
    correct = 0
    total = len(testset)
    print("evaluating trained model ...")
    with torch.no_grad():
        for i, (names,countries) in enumerate(testloader, 1):
            inputs, seq_lengths,target = make_tensors(names,countries)
            output = classifier(inputs,seq_lengths.to("cpu"))
            pred = output.max(dim=1,keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
            
        percent = '%.2f' %(100*correct/total)
        print(f'Test set: Accuracy {correct}/{total} {percent}%')
    return correct/total


# Main section
# convert elapsed time into minutes and seconds
def time_since(since):
    s = time.time() - since
    m = math.floor(s/60)
    s -= m * 60
    return '%dm %ds' %(m,s)


if __name__ == "__main__":
    classifier = RNNClassifier(N_CHARS, HIDDEN_SIZE, N_COUNTRY, N_LAYER)
    # number of characters, hidden dimension, number of countries, number of GRU layers
    if USE_GPU:  # whether to run on the GPU
        device = torch.device("cuda:0")
        classifier.to(device)
    optimizer = torch.optim.Adam(classifier.parameters(), lr=0.001)  # needs classifier.parameters(), so created here
    start = time.time()  # used to measure elapsed time
    print("Training for %d epochs..." % N_EPOCHS)
    acc_list = []  # record test-set accuracy after each epoch
    for epoch in range(1,N_EPOCHS+1):
        trainModel()
        acc = testModel()
        acc_list.append(acc)
    
    print(time_since(start))   # print the total training time
    
    # visualize the accuracy on the test set over epochs
    epoch = np.arange(1, len(acc_list) + 1 , 1)
    acc_list = np.array(acc_list)
    plt.plot(epoch, acc_list)
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.grid()
    plt.show()

Origin blog.csdn.net/qq_44928822/article/details/130276427