How to Build an LSTM (PyTorch Version)

1. Unidirectional LSTM

0. Import package

import torch

1. rnn = torch.nn.LSTM(input_size, hidden_size, num_layers)

rnn = torch.nn.LSTM(10, 20, 2)  # (input_size, hidden_size, num_layers)

The first parameter, input_size, is the dimension of the input vector; the second, hidden_size, is the dimension of the hidden-state vector; and the third, num_layers, is the number of stacked recurrent layers.

The third parameter, num_layers, is the least intuitive of the three; the following figure helps:

[Figure: a two-layer LSTM, with the first layer's hidden states feeding the second layer]

num_layers generally defaults to 1. When num_layers is 2, as in the figure above, the h_t output by the first layer is used as the input of the second layer. That is the network expanded in space; what we see more often is the expansion in time, as shown below (num_layers = 1, unrolled along the time axis):

[Figure: a single-layer LSTM unrolled along the time axis]

Therefore, when we apply the backpropagation algorithm to an LSTM, we must compute both the propagation of error along time and the propagation of error across layers, whereas an ordinary fully connected neural network only needs the propagation of error across layers when doing backpropagation, as shown in the following figure:

[Figure: error propagation in an ordinary fully connected network]

You can draw an analogy between the LSTM and a fully connected neural network:

[Figure: LSTM drawn side by side with a fully connected network]

If we strip away the specific connections, we can see that the LSTM is very similar to a fully connected neural network; the LSTM merely extends the functionality of an ordinary hidden layer.
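To make the num_layers point concrete, here is a minimal sketch (not from the original post; the variable names are mine) showing that a 2-layer LSTM is structurally the same as two stacked 1-layer LSTMs, with the first layer's per-step hidden states serving as the second layer's input sequence:

import torch

stacked = torch.nn.LSTM(10, 20, 2)   # num_layers = 2
layer1 = torch.nn.LSTM(10, 20, 1)
layer2 = torch.nn.LSTM(20, 20, 1)    # its input_size equals layer1's hidden_size

x = torch.randn(5, 3, 10)
out1, _ = layer1(x)      # (5, 3, 20): layer 1's hidden state at every time step
out2, _ = layer2(out1)   # (5, 3, 20): layer 2 consumes layer 1's outputs
out, _ = stacked(x)      # (5, 3, 20): same structure inside a single module

The numeric outputs differ because each module gets its own random initial weights, but the computation performed is the same stacking.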

2. input = torch.randn(seq_len, batch, input_size)

input = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

This generates a tensor of shape 5×3×10 whose elements are drawn from a standard normal distribution.
The first parameter, seq_len, is the number of time steps in each input sequence; the second, batch, means the input is processed 3 sequences at a time (that is, in mini-batches); the third, input_size, is the dimension of the input vector at each time step.
Why batch the input at all? Because it is more convenient for gradient descent: we neither update the parameters after every single input is processed nor wait for all inputs to be processed before updating. See mini-batch gradient descent.
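As a rough illustration of that point (the readout layer, loss, and data below are fabricated for the sketch and are not part of the original post), a training loop performs one parameter update per mini-batch:

import torch

rnn = torch.nn.LSTM(10, 20, 2)
head = torch.nn.Linear(20, 1)   # illustrative readout layer
params = list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.SGD(params, lr=0.01)

batches = [torch.randn(5, 3, 10) for _ in range(4)]   # 4 mini-batches of 3 sequences
targets = [torch.randn(3, 1) for _ in range(4)]       # made-up targets

for x, y in zip(batches, targets):
    opt.zero_grad()
    output, (hn, cn) = rnn(x)
    loss = torch.nn.functional.mse_loss(head(output[-1]), y)  # predict from the last time step
    loss.backward()
    opt.step()   # one update per mini-batch, not per sample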

3. Initialize the two states

h0 = torch.randn(2, 3, 20)  # (num_layers * num_directions, batch, hidden_size); num_directions = 1 here
c0 = torch.randn(2, 3, 20)  # same shape as h0
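A detail worth knowing: if you do not pass (h0, c0) at all, PyTorch defaults both states to zeros. A minimal self-contained check:

import torch

rnn = torch.nn.LSTM(10, 20, 2)
x = torch.randn(5, 3, 10)
zeros = torch.zeros(2, 3, 20)
out_a, _ = rnn(x, (zeros, zeros))
out_b, _ = rnn(x)                    # no initial states passed
print(torch.allclose(out_a, out_b))  # True: the default is all-zero states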

4. Forward calculation

output, (hn, cn) = rnn(input, (h0, c0))

Feed the input, together with the initial states h0 and c0, into rnn to perform the forward computation, producing output as well as hn and cn. The rnn here is, of course, the LSTM already instantiated in step 1.
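A quick shape check (a self-contained sketch that repeats the setup from the steps above) makes the output format explicit:

import torch

rnn = torch.nn.LSTM(10, 20, 2)
input = torch.randn(5, 3, 10)
h0 = torch.randn(2, 3, 20)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(input, (h0, c0))

print(output.shape)  # torch.Size([5, 3, 20]): the top layer's hidden state at every time step
print(hn.shape)      # torch.Size([2, 3, 20]): the final hidden state of each layer
print(cn.shape)      # torch.Size([2, 3, 20]): the final cell state of each layer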


2. Bidirectional LSTM

import torch
rnn = torch.nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)
input = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
h0 = torch.randn(4, 3, 20)     # (num_layers * num_directions, batch, hidden_size) = (2*2, 3, 20)
c0 = torch.randn(4, 3, 20)     # same shape as h0
output, (hn, cn) = rnn(input, (h0, c0))
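A shape check on this bidirectional version (continuing the snippet above; the print lines are added for illustration) shows where the two directions end up:

print(output.shape)  # torch.Size([5, 3, 40]): forward and backward outputs concatenated (2 * hidden_size)
print(hn.shape)      # torch.Size([4, 3, 20]): (num_layers * num_directions, batch, hidden_size)
print(cn.shape)      # torch.Size([4, 3, 20]): same shape as hn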

Simply add bidirectional=True inside the parentheses of torch.nn.LSTM() to turn on the bidirectional LSTM. The difference between the two is that a unidirectional LSTM can only use values from earlier time steps when computing a given time step, while a bidirectional LSTM can use values from both the earlier and the later time steps.
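Because the last dimension of output concatenates the two directions, they can be split apart when needed (continuing the snippet above; the variable names are mine):

forward_out = output[:, :, :20]   # hidden states from the forward pass
backward_out = output[:, :, 20:]  # hidden states from the backward pass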

For example, when completing a sentence:
"I am going to __."
a unidirectional LSTM can infer that a place name should fill the blank. But if the incomplete sentence becomes:
"I want __ school."
it is hard to infer the blank's contents from "I want" alone; combined with "school", however, the guess becomes easy. This is the kind of scenario where a bidirectional LSTM shines.

[Figure: LSTM input and output format]


Source: blog.csdn.net/comli_cn/article/details/104523867