How to train an LSTM

0. Background

After the first two articles, on how to build an LSTM and how to understand the input and output formats of an LSTM, we can now build on them and learn how to train an LSTM.

1. Define the structure of the LSTM

rnn = torch.nn.LSTM(30, 1, 2, batch_first=True)  # (input_size, hidden_size, num_layers)
input = train_y  # (batch, seq_len, input_size) because batch_first=True
h0 = torch.randn(2, 193, 1)  # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 193, 1)  # (num_layers, batch, hidden_size)

• rnn = torch.nn.LSTM(30, 1, 2, batch_first=True)
Several parameters of this LSTM are defined here:
the size of the input data is 30, i.e. input_size=30; that is, a vector of size 30 is fed in at each time step;
the size of the hidden layer is 1, i.e. hidden_size=1;
the number of LSTM layers is 2, i.e. num_layers=2;
the input shape is originally (seq_len, batch, input_size); when batch_first=True is set, the input shape becomes (batch, seq_len, input_size), and the output shape likewise changes from (seq_len, batch, num_directions * hidden_size) to (batch, seq_len, num_directions * hidden_size).
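
As a quick sanity check of this shape convention, here is a minimal sketch with made-up sizes (not from the original post):

import torch

# Compare batch_first=True with the default (seq_len, batch, input_size) layout.
x_seq_first = torch.randn(7, 3, 30)          # (seq_len=7, batch=3, input_size=30)
x_batch_first = x_seq_first.transpose(0, 1)  # (batch=3, seq_len=7, input_size=30)

rnn_default = torch.nn.LSTM(30, 1, 2)               # expects (seq_len, batch, input_size)
rnn_bf = torch.nn.LSTM(30, 1, 2, batch_first=True)  # expects (batch, seq_len, input_size)

out_default, _ = rnn_default(x_seq_first)
out_bf, _ = rnn_bf(x_batch_first)
print(out_default.shape)  # torch.Size([7, 3, 1])  (seq_len, batch, num_directions * hidden_size)
print(out_bf.shape)       # torch.Size([3, 7, 1])  (batch, seq_len, num_directions * hidden_size)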

• input = train_y  # (batch, seq_len, input_size) because batch_first=True
Reshape the input tensor to the shape (seq_len, batch, input_size) (or (batch, seq_len, input_size) when batch_first=True) and assign it to input;
here seq_len * batch * input_size = total number of input elements.
Next, we analyze the two input situations you will encounter in practice:
(1) The original input is one-dimensional data. For example, when predicting stocks, the original input is the daily maximum stock price as it changes from day to day, so the original input is one-dimensional.
In this case, the first step is usually to turn this one-dimensional data into two-dimensional data. For example, take the
most original one-dimensional data:
[10.52 14.62 5.48 9.35 3.91 9.35 3.91 14.62 14.62 3.91
7.05 5.48 3.91 7.05 10.52 10.52 14.62 10.52 7.05 14.62 ]

Group every 10 consecutive values into a row of a two-dimensional tensor:
[figure: the resulting 11 × 10 matrix, obtained by sliding a window of width 10 one step at a time over the 20 values]
After this transformation, each row of data corresponds to one target value; in other words, the LSTM learns one target value from 10 consecutive sequence values.
However, the LSTM input must be a three-dimensional tensor of shape (seq_len, batch, input_size), so we still need to reconstruct the two-dimensional tensor above into a three-dimensional one. (For reconstruction methods, see tensor operations.)
Here we take seq_len=11 and input_size=10. Batching means that we perform several forward calculations and then backward calculations to update the weights. For the data above: the 20 values become 11 rows of two-dimensional data, so if we batched the rows with, say, batch=3, the weights would be updated after every three rows of the two-dimensional tensor, and the last two rows would be processed as a final, smaller update. But because we already set seq_len=11 and input_size=10, the 11 rows form a single sequence, so batch=1. (For a deeper understanding, see how to understand the input and output format of LSTM.)
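
A minimal sketch of this construction in PyTorch (assuming the sliding-window grouping described above; the variable names are illustrative):

import torch

prices = torch.tensor([10.52, 14.62, 5.48, 9.35, 3.91, 9.35, 3.91, 14.62, 14.62, 3.91,
                       7.05, 5.48, 3.91, 7.05, 10.52, 10.52, 14.62, 10.52, 7.05, 14.62])

window = 10
# Stack every run of 10 consecutive values into a row: 20 values -> 11 rows of 10.
rows = torch.stack([prices[i:i + window] for i in range(len(prices) - window + 1)])
print(rows.shape)     # torch.Size([11, 10])

# Reshape into the 3-D tensor the LSTM expects: seq_len=11, batch=1, input_size=10.
input3d = rows.view(11, 1, 10)
print(input3d.shape)  # torch.Size([11, 1, 10])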

(2) The original input is two-dimensional data. For example, when performing part-of-speech recognition on the words of a sentence, we describe each word with an attribute vector, so the most original input sentence becomes a two-dimensional matrix.
For example, the
most original data:

data_ = [[1, 10, 11, 15, 9, 100],
         [2, 11, 12, 16, 9, 100],
         [3, 12, 13, 17, 9, 100],
         [4, 13, 14, 18, 9, 100],
         [5, 14, 15, 19, 9, 100],
         [6, 15, 16, 10, 9, 100],
         [7, 15, 16, 10, 9, 100],
         [8, 15, 16, 10, 9, 100],
         [9, 15, 16, 10, 9, 100],
         [10, 15, 16, 10, 9, 100]]

Each row here represents a word. The data likewise needs to be reconstructed into a three-dimensional tensor; this time we can set the input structure values as, say, seq_len=2, batch=5, input_size=6. Then
our first batch is:

tensor([[  1.,  10.,  11.,  15.,   9., 100.],
        [  2.,  11.,  12.,  16.,   9., 100.]])

The last batch is:

tensor([[  9.,  15.,  16.,  10.,   9., 100.],
        [ 10.,  15.,  16.,  10.,   9., 100.]])

Each of the slices shown is one sequence of length seq_len=2, and with batch=5 there are five such slices in the input tensor. Batching means that we perform several forward calculations and then backward calculations to update the weights.
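
A minimal sketch of this reconstruction, building on the data_ list above (the variable names are illustrative):

import torch

words = torch.tensor(data_, dtype=torch.float)  # shape (10, 6): 10 words, 6 attributes each

# Split the 10 words into 5 sequences of 2 words each, then swap the first two
# dimensions so the layout is (seq_len=2, batch=5, input_size=6).
input3d = words.view(5, 2, 6).transpose(0, 1)
print(input3d.shape)      # torch.Size([2, 5, 6])

print(input3d[:, 0, :])   # first batch element: words 1 and 2
print(input3d[:, -1, :])  # last batch element: words 9 and 10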

• h0 = torch.randn(2, 193, 1)  # (num_layers, batch, hidden_size)
Here hidden_size is chosen according to the output shape you want. For example, if you want the output to be a one-dimensional vector, then hidden_size=1; if you want the output to be a two-dimensional matrix, then hidden_size is the size of that matrix's second dimension: for a 3 × 5 output, hidden_size=5.
• c0 = torch.randn(2, 193, 1)  # (num_layers, batch, hidden_size)

c0 is handled exactly the same way as h0.
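
A small sketch of the hidden_size-to-output-shape relationship described above (made-up sizes, not from the original post):

import torch

rnn5 = torch.nn.LSTM(input_size=6, hidden_size=5, num_layers=2)
x = torch.randn(3, 1, 6)      # (seq_len=3, batch=1, input_size=6)
out, (hn, cn) = rnn5(x)       # h0 and c0 default to zeros when omitted
print(out.view(3, -1).shape)  # torch.Size([3, 5]): a 3 x 5 output, so hidden_size=5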

2. Forward calculation

output, (hn, cn) = rnn(input, (h0, c0))
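
With the definitions from section 1, a shape check with dummy data might look like this (a sketch; seq_len=11 is an arbitrary choice):

dummy_input = torch.randn(193, 11, 30)        # (batch, seq_len, input_size), batch_first=True
output, (hn, cn) = rnn(dummy_input, (h0, c0))
print(output.shape)  # torch.Size([193, 11, 1])  (batch, seq_len, num_directions * hidden_size)
print(hn.shape)      # torch.Size([2, 193, 1])   (num_layers, batch, hidden_size)
print(cn.shape)      # torch.Size([2, 193, 1])   (num_layers, batch, hidden_size)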

3. Choose the optimizer and loss function

optimizer = torch.optim.Adam(rnn.parameters(), lr=LR)  # optimize all rnn parameters
loss_func = nn.MSELoss()  # mean-squared-error loss
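
The code above assumes import torch.nn as nn. LR and EPOCH are hyperparameters the author defines elsewhere; purely illustrative values (assumptions, not from the original post) might be:

import torch.nn as nn  # nn.MSELoss lives here

LR = 0.02      # learning rate -- illustrative value
EPOCH = 1000   # number of training iterations -- illustrative value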

4. Multiple forward and backward calculations to update the parameters

for step in range(EPOCH):
    output, (hn, cn) = rnn(input, (h0, c0))
    loss = loss_func(output, train_target_nom)
    optimizer.zero_grad()  # clear gradients for this training step
    loss.backward()        # back propagation, compute gradients
    optimizer.step()       # apply the gradients to update the weights

5. Convert the output to the desired form

Because output is a three-dimensional tensor, and we don't necessarily need the result in that form, we have to transform it.
For example, to turn it into a two-dimensional output:

output = output.view(193, -1)
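
With the shapes used above (seq_len=11 assumed for illustration), this collapses the trailing hidden_size dimension:

print(output.shape)  # torch.Size([193, 11]) after the view; it was torch.Size([193, 11, 1])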

