How to understand the input and output format of an LSTM (PyTorch)

1. Define the LSTM structure

import torch.nn as nn
bilstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

This defines a two-layer bidirectional LSTM with an input size of 10 and a hidden size of 20.
Note: once the LSTM is defined this way, the input_size, hidden_size, and num_layers used elsewhere in the same program must match these values.

2. Input format

Official documentation (the Inputs section of nn.LSTM): the input tensor has shape (seq_len, batch, input_size).

input = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

(1) If each input sample is one-dimensional data (a single number):
seq_len: how many time steps (samples) each sequence contains
batch: how many sequences are grouped into one batch
input_size: 1 in this case
For example:
Suppose the original data is data = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, a total of 10 samples, and we want to feed it into the LSTM. Before doing so, the data has to be reshaped. First set seq_len to 3; the data then becomes the sequences:
1-2-3, 2-3-4, 3-4-5, 4-5-6, 5-6-7, 6-7-8, 7-8-9, 8-9-10, 9-10-0, 10-0-0 (the last two sequences are incomplete, so they are zero-padded)

Then set batch_size to 2.
The first batch is then 1-2-3, 2-3-4. Arranged as an array this batch has size (2, 3, 1); to match the (seq_len, batch, input_size) layout the LSTM expects, it is rearranged to (3, 2, 1) before being fed to the model (see the sketch below).
The next batch is 3-4-5, 4-5-6.
The third batch is 5-6-7, 6-7-8.
The fourth batch is 7-8-9, 8-9-10.
The fifth batch is 9-10-0, 10-0-0. In total, the data yields 5 batches.
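
The reshaping and batching described above can be sketched in code. This is only an illustration: the helper variables and the small unidirectional LSTM (hidden_size=4) are my own choices, not part of the original example; only seq_len=3, batch_size=2 and the zero padding follow the text.

import torch
import torch.nn as nn

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]      # the 10 original samples
seq_len, batch_size = 3, 2

# Sliding windows 1-2-3, 2-3-4, ..., zero-padding the last incomplete ones.
windows = []
for i in range(len(data)):
    w = data[i:i + seq_len]
    windows.append(w + [0] * (seq_len - len(w)))

# A toy unidirectional LSTM just to show the shapes (input_size=1 because each
# element of a sequence is a single number).
lstm = nn.LSTM(input_size=1, hidden_size=4)

for b in range(0, len(windows), batch_size):                             # 5 batches in total
    batch = torch.tensor(windows[b:b + batch_size], dtype=torch.float)   # (2, 3)
    x = batch.t().unsqueeze(-1)                                          # (seq_len, batch, input_size) = (3, 2, 1)
    output, (h_n, c_n) = lstm(x)
    print(x.shape, output.shape)   # torch.Size([3, 2, 1]) torch.Size([3, 2, 4])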

(2) If each input sample is two-dimensional data (a feature vector):
seq_len: how many time steps (samples) each sequence contains
batch: how many sequences are grouped into one batch
input_size: the length of the feature vector describing each sample

For example:

data_ = [[1, 10, 11, 15, 9, 100],
         [2, 11, 12, 16, 9, 100],
         [3, 12, 13, 17, 9, 100],
         [4, 13, 14, 18, 9, 100],
         [5, 14, 15, 19, 9, 100],
         [6, 15, 16, 10, 9, 100],
         [7, 15, 16, 10, 9, 100],
         [8, 15, 16, 10, 9, 100],
         [9, 15, 16, 10, 9, 100],
         [10, 15, 16, 10, 9, 100]]

With seq_len = 3, batch = 2, and input_size = 6, the first batch is:

tensor([[[  1.,  10.,  11.,  15.,   9., 100.],
         [  2.,  11.,  12.,  16.,   9., 100.],
         [  3.,  12.,  13.,  17.,   9., 100.]],
 
        [[  2.,  11.,  12.,  16.,   9., 100.],
         [  3.,  12.,  13.,  17.,   9., 100.],
         [  4.,  13.,  14.,  18.,   9., 100.]]])

The last batch is:

tensor([[[  9.,  15.,  16.,  10.,   9., 100.],
         [ 10.,  15.,  16.,  10.,   9., 100.],
         [  0.,   0.,   0.,   0.,   0.,   0.]],
 
        [[ 10.,  15.,  16.,  10.,   9., 100.],
         [  0.,   0.,   0.,   0.,   0.,   0.],
         [  0.,   0.,   0.,   0.,   0.,   0.]]])
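
The same construction for the two-dimensional case, as a sketch. The window building and zero padding mirror the tensors shown above; the LSTM here uses input_size=6 to match this data (not the bilstm from section 1, which expects input_size=10) and an illustrative hidden_size of 20.

import torch
import torch.nn as nn

data_ = [[1, 10, 11, 15, 9, 100], [2, 11, 12, 16, 9, 100], [3, 12, 13, 17, 9, 100],
         [4, 13, 14, 18, 9, 100], [5, 14, 15, 19, 9, 100], [6, 15, 16, 10, 9, 100],
         [7, 15, 16, 10, 9, 100], [8, 15, 16, 10, 9, 100], [9, 15, 16, 10, 9, 100],
         [10, 15, 16, 10, 9, 100]]

seq_len, batch_size, input_size = 3, 2, 6
pad = [0] * input_size

# Sliding windows over the rows, zero-padding the incomplete ones.
windows = []
for i in range(len(data_)):
    w = data_[i:i + seq_len]
    windows.append(w + [pad] * (seq_len - len(w)))

first_batch = torch.tensor(windows[:batch_size], dtype=torch.float)   # (2, 3, 6), as printed above
last_batch = torch.tensor(windows[-batch_size:], dtype=torch.float)   # (2, 3, 6), with zero-padded rows

# Rearrange to (seq_len, batch, input_size) = (3, 2, 6) before the forward pass.
lstm = nn.LSTM(input_size=6, hidden_size=20)
output, (h_n, c_n) = lstm(first_batch.permute(1, 0, 2))
print(output.shape)   # torch.Size([3, 2, 20])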

3. Output format

Official documentation (the Outputs section of nn.LSTM): the outputs are output, (h_n, c_n).

Notes:
• output: its shape is (seq_len, batch, num_directions * hidden_size). This tensor contains the output features (h_t) of the last LSTM layer for every time step. For a bidirectional LSTM, the output at each time step is h = [h_forward, h_backward] (the forward and backward hidden states of the same time step are concatenated).
• h_n: stores, for every layer, the hidden state h of the last time step. For a bidirectional LSTM, the last-time-step hidden states of the forward and backward directions are stored separately.
• c_n: same as h_n, except that it stores the cell state c.

Analysis:
• output is a three-dimensional tensor. The first dimension is the sequence length, the second is the batch, and the third is hidden_size * num_directions, where num_directions is 1 or 2 depending on whether the LSTM is bidirectional. So the size of the third dimension of output depends on directionality: for a unidirectional LSTM it equals the hidden size we defined; for a bidirectional LSTM it is twice the hidden size.

• h_n is a three-dimensional tensor. Its first dimension has size num_layers * num_directions, where num_layers is the number of LSTM layers we defined and num_directions is 1 or 2 as described above. The second dimension is the batch size, and the third dimension is the hidden size. The first dimension is the part of h_n that is hardest to understand. For a unidirectional LSTM, the first dimension has size num_layers, and its n-th entry is the hidden state at the last time step of the n-th layer. For a bidirectional LSTM, the first dimension has size 2 * num_layers; it still holds the last-time-step hidden state of each layer, but the forward and backward passes each occupy one entry of this dimension.
For example: for a bidirectional LSTM with num_layers = 3, the first dimension of h_n has size 6 (2 * 3): h_n[0] is the last-time-step output of the forward pass of the first layer, h_n[1] that of the backward pass of the first layer, h_n[2] and h_n[3] are the forward and backward passes of the second layer, and h_n[4] and h_n[5] are the forward and backward passes of the third layer.

• c_n has the same structure as h_n, so it is not described again here (the sketch below checks these shapes numerically).
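
These notes can be checked in code. The following sketch uses a 3-layer bidirectional LSTM to match the num_layers = 3 example above; all sizes are illustrative assumptions.

import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 3
lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 40])  -> (seq_len, batch, 2 * hidden_size)
print(h_n.shape)     # torch.Size([6, 3, 20])  -> (num_layers * 2, batch, hidden_size)
print(c_n.shape)     # torch.Size([6, 3, 20])

# h_n[4] / h_n[5] are the forward / backward states of the last (3rd) layer.
# The forward half of output at the last time step equals h_n[4] ...
print(torch.allclose(output[-1, :, :hidden_size], h_n[4]))  # True
# ... and the backward half of output at the first time step equals h_n[5].
print(torch.allclose(output[0, :, hidden_size:], h_n[5]))   # True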

4. Understanding some of the parameters

• input_size: used when describing a single word or data point as a feature vector, so that the word or data point is in a form the model can more easily work with.
• batch: batch processing; here it means the parameters are updated once per batch of training data. If the data were not split into batches and the parameters were instead updated one sample at a time, the amount of computation would be too large, training would take too long, and the final error would also tend to be larger.

5. Putting it all together

input: (seq_len, batch, input_size)
rnn = torch.nn.LSTM(input_size, hidden_size, num_layers)
h0: (num_layers * num_directions, batch, hidden_size)
c0: (num_layers * num_directions, batch, hidden_size)
output: (seq_len, batch, num_directions * hidden_size)
hn: (num_layers * num_directions, batch, hidden_size)
cn: (num_layers * num_directions, batch, hidden_size)
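
A runnable sketch that puts these shapes together, reusing the sizes of the bilstm from section 1 (input_size=10, hidden_size=20, num_layers=2, bidirectional, so num_directions=2); the zero-initialised h0 and c0 are optional and only shown here to make their shapes explicit.

import torch
import torch.nn as nn

input_size, hidden_size, num_layers, num_directions = 10, 20, 2, 2
seq_len, batch = 5, 3

rnn = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)

inp = torch.randn(seq_len, batch, input_size)                        # (5, 3, 10)
h0 = torch.zeros(num_layers * num_directions, batch, hidden_size)    # (4, 3, 20)
c0 = torch.zeros(num_layers * num_directions, batch, hidden_size)    # (4, 3, 20)

output, (hn, cn) = rnn(inp, (h0, c0))
print(output.shape)  # torch.Size([5, 3, 40])  -> (seq_len, batch, num_directions * hidden_size)
print(hn.shape)      # torch.Size([4, 3, 20])  -> (num_layers * num_directions, batch, hidden_size)
print(cn.shape)      # torch.Size([4, 3, 20])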

Reference: nn.LSTM in the official PyTorch documentation.

Source: blog.csdn.net/comli_cn/article/details/105275827