Introduction to RNN
A Recurrent Neural Network (RNN) is a type of recursive neural network that takes sequence data as input, performs recursion along the direction in which the sequence evolves, and connects all of its nodes (recurrent units) in a chain.
RNN network structure
1. The unrolled (expanded) RNN structure below is the one generally used:
An RNN Cell is essentially a linear layer (followed by an activation, tanh by default in PyTorch), and the weights of this linear layer are shared across all time steps. Why call it a linear layer? If we input 3-dimensional data, the RNN Cell can output 5-dimensional data, just as a linear layer maps one dimension to another.
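To make the "linear layer" view concrete, here is a minimal sketch, assuming PyTorch is installed. The sizes 3 and 5 come from the example above; the batch size of 1 and the random input are placeholders chosen only for illustration.

```python
import torch

# Sizes from the example above: 3-dimensional input, 5-dimensional output.
cell = torch.nn.RNNCell(input_size=3, hidden_size=5)

x = torch.randn(1, 3)  # one sample with 3 features (placeholder values)
h = torch.zeros(1, 5)  # previous hidden state, all zeros

h_next = cell(x, h)

print(h_next.shape)          # torch.Size([1, 5])
print(cell.weight_ih.shape)  # torch.Size([5, 3]), applied to the input
print(cell.weight_hh.shape)  # torch.Size([5, 5]), applied to the previous hidden state
```

The two weight matrices are what make the cell look like a pair of linear layers whose results are added together.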
Now suppose we have weather records for 5 days, with the attributes temperature, air pressure and weather. If we want to predict the data for day 6, we can use an RNN. Other time-series data, such as stock prices, can be predicted with an RNN in the same way.
| date | temperature | air pressure | weather |
| --- | --- | --- | --- |
| 2022.2.15 | 15 | 22 | sunny |
| 2022.2.16 | 12 | 24 | cloudy day |
| 2022.2.17 | 11 | 22 | sunny |
| 2022.2.18 | 12 | 23 | sunny |
| 2022.2.19 | 12 | 21 | rain |
Let's go through the meaning of each quantity in the RNN in detail:
x1, x2, x3, and x4 are the inputs. In the example above, x1 is the entire row of data for February 15, 2022, including temperature, air pressure, and weather.
h0 is called the hidden state (hidden), and it is generally initialized to all zeros.
After h0 and x1 pass through the RNN Cell, h1 is output. h1 is also a hidden state, and it is fed into the RNN Cell together with x2. The purpose is to fuse the earlier data with the later data.
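As a rough illustration of what x1 to x5 could look like in code, the sketch below turns the rows of the weather table into numeric vectors. The integer codes for the weather column (`WEATHER_CODES`) and the helper `encode` are assumptions made for this example, not part of the original data; a one-hot encoding of the weather column would work as well.

```python
# Rows of the weather table above: (date, temperature, air pressure, weather).
rows = [
    ("2022.2.15", 15, 22, "sunny"),
    ("2022.2.16", 12, 24, "cloudy day"),
    ("2022.2.17", 11, 22, "sunny"),
    ("2022.2.18", 12, 23, "sunny"),
    ("2022.2.19", 12, 21, "rain"),
]

# Hypothetical integer codes for the categorical weather column.
WEATHER_CODES = {"sunny": 0, "cloudy day": 1, "rain": 2}

def encode(row):
    """Turn one table row into [temperature, air pressure, weather code]."""
    _date, temperature, pressure, weather = row
    return [float(temperature), float(pressure), float(WEATHER_CODES[weather])]

# xs[0] plays the role of x1, xs[1] of x2, and so on.
xs = [encode(row) for row in rows]

print(xs[0])    # [15.0, 22.0, 0.0]
print(len(xs))  # 5
```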
Let's look at how the RNN Cell works and how it combines x and h:
The picture below shows how ht is obtained, where Wih and Whh are weights and bih and bhh are biases, which should be familiar to anyone who has studied machine learning.
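Written as a formula, the update takes the following form. This is the form documented for PyTorch's `torch.nn.RNNCell` with its default tanh activation; that the picture uses exactly this form is an assumption.

```latex
h_t = \tanh\left(W_{ih}\,x_t + b_{ih} + W_{hh}\,h_{t-1} + b_{hh}\right)
```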
2. Sometimes the following folded structure is used instead:
Because every RNN Cell in the network is the same cell, the unrolled chain can be drawn as a single cell with a loop.
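The weight sharing described above can be sketched without any library: one set of weights is reused at every step of the loop. The tiny sizes, the hand-picked weight values and the helper name `rnn_cell` are all assumptions made for this illustration; the update rule is the tanh form that PyTorch's `RNNCell` uses by default.

```python
import math

def rnn_cell(x, h, w_ih, w_hh, b_ih, b_hh):
    """One RNN step: h_new = tanh(W_ih x + b_ih + W_hh h + b_hh)."""
    h_new = []
    for i in range(len(b_ih)):
        total = b_ih[i] + b_hh[i]
        total += sum(w * xj for w, xj in zip(w_ih[i], x))
        total += sum(w * hj for w, hj in zip(w_hh[i], h))
        h_new.append(math.tanh(total))
    return h_new

# Hypothetical weights: input size 2, hidden size 2.
w_ih = [[0.5, -0.5], [0.1, 0.2]]
w_hh = [[0.0, 0.3], [-0.3, 0.0]]
b_ih = [0.0, 0.0]
b_hh = [0.0, 0.0]

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # x1, x2, x3
h = [0.0, 0.0]                              # h0, all zeros

hidden_states = []
for x in xs:
    # The same weights are used at every step.
    h = rnn_cell(x, h, w_ih, w_hh, b_ih, b_hh)
    hidden_states.append(h)

print(len(hidden_states))  # 3, one hidden state per input
```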
How to write RNN in PyTorch
If we use RNN Cell to implement an RNN, we need to write the loop ourselves. The cell is created once, and the call inside the loop can be written like this:
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)
hidden1 = cell(input, hidden)
Here input_size is the dimension of the input and hidden_size is the dimension of the hidden state. If input=x1 and hidden=h0, then the result of calling cell() is h1.
The size of input is (batch, input_size)
The size of hidden is (batch, hidden_size)
The size of hidden1 is (batch, hidden_size)
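Putting these pieces together, a minimal sketch of the full loop might look like this, assuming PyTorch is installed. The concrete sizes and the random `dataset` are placeholders chosen for illustration.

```python
import torch

batch_size, seq_len = 2, 3
input_size, hidden_size = 4, 2

# Create the cell once so the same weights are reused at every time step.
cell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

# Placeholder data of shape (seq_len, batch, input_size).
dataset = torch.randn(seq_len, batch_size, input_size)
hidden = torch.zeros(batch_size, hidden_size)  # h0

for step, x_t in enumerate(dataset):
    # x_t: (batch, input_size), hidden: (batch, hidden_size)
    hidden = cell(x_t, hidden)
    print(step, x_t.shape, hidden.shape)
```

Iterating over `dataset` walks along its first dimension, so each `x_t` is the batch of inputs for one time step.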
If we use torch.nn.RNN directly, we don't have to write the loop ourselves.
The parameters of RNN are mainly as follows:
Let’s briefly introduce a few parameters:
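input_size: the number of attributes in each sample, i.e. the dimension of the vector that represents one time step. In the weather data above, each day has three attributes (temperature, air pressure and weather), so input_size=3 when each attribute is encoded as a single number; the date only orders the rows.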
batchSize: the batch size, which you can choose freely
seq_len: the number of time steps in the sequence. The weather data above has 5 rows, so seq_len=5.
hidden_size: the dimension of the hidden state; you can think of it as the number of nodes in the hidden layer
Generally, the data must have the following shapes, where input and output refer to a single time step and dataset is the whole sequence:
input.shape=(batchSize,input_size)
output.shape=(batchSize,hidden_size)
dataset.shape=(seq_len,batchSize,input_size)
What does each return value of RNN represent?
The input inputs holds x1, x2, x3, ...
The input hidden is h0
The RNN then returns two values: out holds the hidden states of all time steps h1, h2, h3, ..., and the returned hidden is the last hidden state hN
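What shapes must the data have? With the default settings of torch.nn.RNN:
inputs.shape=(seq_len,batchSize,input_size)
hidden.shape=(num_layers,batchSize,hidden_size), for both h0 and hN
out.shape=(seq_len,batchSize,hidden_size)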
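The shapes can be checked with a short sketch, assuming PyTorch is installed and the default `batch_first=False`. The sizes and the random input are placeholders chosen for illustration.

```python
import torch

seq_len, batch_size = 5, 1
input_size, hidden_size, num_layers = 4, 2, 1

rnn = torch.nn.RNN(input_size=input_size,
                   hidden_size=hidden_size,
                   num_layers=num_layers)

inputs = torch.randn(seq_len, batch_size, input_size)      # x1 ... x5
hidden = torch.zeros(num_layers, batch_size, hidden_size)  # h0

out, hidden = rnn(inputs, hidden)

print(out.shape)     # torch.Size([5, 1, 2]): (seq_len, batch, hidden_size)
print(hidden.shape)  # torch.Size([1, 1, 2]): (num_layers, batch, hidden_size)

# With one layer, the last time step of out is the returned hidden state hN.
print(torch.allclose(out[-1], hidden[-1]))  # True
```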
Task
Finally, we implement a string conversion task: train an RNN to convert the string hello into ohlol, as shown in the following figure:
First, we convert each character of the string into a one-hot vector
Then we calculate the cross-entropy loss between the output and the target
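For intuition, the cross-entropy loss for a single time step can be computed by hand as below. The score values are made up for illustration, and `cross_entropy` is a hypothetical helper, not a PyTorch function; `torch.nn.CrossEntropyLoss` applies the same softmax-then-negative-log idea to raw scores and averages over the samples by default.

```python
import math

def cross_entropy(scores, target):
    """Softmax over raw scores, then the negative log-probability of the target class."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[target])

# Made-up raw scores for the four classes 'e', 'h', 'l', 'o'.
scores = [0.1, 0.2, 0.3, 2.0]

loss_right = cross_entropy(scores, target=3)  # target 'o' has the highest score
loss_wrong = cross_entropy(scores, target=0)  # target 'e' has a low score

print(loss_right < loss_wrong)  # True
```

The loss is small when the score of the target character dominates and large otherwise, which is what pushes the network toward the right output during training.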
Specific code implementation:
import torch

input_size = 4
hidden_size = 4  # output dimension: one score per character class
batch_size = 1
num_layers = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]  # indices that encode "hello"
y_data = [3, 1, 2, 3, 2]  # indices that encode "ohlol"

# One-hot lookup table
one_hot_lookup = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
]
# Look up each index of x_data in one_hot_lookup to get a list of 0/1 vectors
x_one_hot = [one_hot_lookup[x] for x in x_data]
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)
labels = torch.LongTensor(y_data)


class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnn = torch.nn.RNN(input_size=self.input_size,
                                hidden_size=self.hidden_size,
                                num_layers=self.num_layers)

    def forward(self, input):
        # h0: all zeros, shape (num_layers, batch_size, hidden_size)
        hidden = torch.zeros(self.num_layers,
                             self.batch_size,
                             self.hidden_size)
        out, _ = self.rnn(input, hidden)
        # Flatten to (seq_len * batch_size, hidden_size) for CrossEntropyLoss
        return out.view(-1, self.hidden_size)


net = Model(input_size, hidden_size, batch_size, num_layers)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)  # Adam is an improved stochastic gradient optimizer

# 15 epochs, matching the "[%d/15]" counter printed below
for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    # Pick the highest-scoring character at each time step
    _, idx = outputs.max(dim=1)
    idx = idx.tolist()
    print('Predicted:', ''.join(idx2char[x] for x in idx), end='')
    print(', Epoch [%d/15] loss=%.4f' % (epoch + 1, loss.item()))