0. Recap
In the first two articles we learned how to build an LSTM and how to understand the input and output formats of LSTM. Building on that, this article covers how to train an LSTM.
1. Define the structure of LSTM
rnn = torch.nn.LSTM(30, 1, 2, batch_first=True) #(input_size,hidden_size,num_layers)
input = train_y #(batch, seq_len, input_size) because batch_first=True
h0 = torch.randn(2, 193, 1) #(num_layers,batch,hidden_size)
c0 = torch.randn(2, 193, 1) #(num_layers,batch,hidden_size)
rnn = torch.nn.LSTM(30, 1, 2, batch_first=True)
Several parameters of this LSTM are defined here:
the size of the input data is 30, i.e. input_size=30, meaning a vector of size 30 is fed in at every time step;
the size of the hidden layer is 1, i.e. hidden_size=1;
the number of LSTM layers is 2, i.e. num_layers=2.
The input shape is (seq_len, batch, input_size) by default; when batch_first=True is set, the input shape becomes (batch, seq_len, input_size), and the output shape correspondingly changes from (seq_len, batch, num_directions * hidden_size) to (batch, seq_len, num_directions * hidden_size).
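As a quick sanity check, here is a minimal sketch (with made-up sizes) of how batch_first changes the expected shapes:

import torch

rnn_seq_first = torch.nn.LSTM(30, 1, 2)                      # expects (seq_len, batch, input_size)
rnn_batch_first = torch.nn.LSTM(30, 1, 2, batch_first=True)  # expects (batch, seq_len, input_size)

x_seq_first = torch.randn(11, 193, 30)    # (seq_len=11, batch=193, input_size=30)
x_batch_first = torch.randn(193, 11, 30)  # (batch=193, seq_len=11, input_size=30)

out1, _ = rnn_seq_first(x_seq_first)      # h0/c0 default to zeros when omitted
out2, _ = rnn_batch_first(x_batch_first)
print(out1.shape)  # torch.Size([11, 193, 1]) -> (seq_len, batch, num_directions * hidden_size)
print(out2.shape)  # torch.Size([193, 11, 1]) -> (batch, seq_len, num_directions * hidden_size)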
input = train_y #(batch, seq_len, input_size) because batch_first=True
Reshape the input tensor into the three-dimensional shape (seq_len, batch, input_size) (or (batch, seq_len, input_size) when batch_first=True) and assign it to input. Here seq_len * batch * input_size must equal the total number of input elements.
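For example, a hypothetical tensor of 110 elements can only be viewed as shapes whose product is 110:

import torch

x = torch.arange(110.)   # 110 elements in total
x = x.view(11, 1, 10)    # works: 11 * 1 * 10 = 110
# x.view(11, 2, 10) would raise an error, since 11 * 2 * 10 = 220 != 110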
Next, let's analyze the two input situations you are likely to encounter in practice:
(1) If the original input is one-dimensional data: for example, when predicting stocks, the raw input is the daily maximum stock price as it changes from day to day, so the original input is one-dimensional.
In this case, the first step is usually to turn the one-dimensional data into two-dimensional data. For example, starting from the
original one-dimensional data:
[10.52 14.62 5.48 9.35 3.91 9.35 3.91 14.62 14.62 3.91
7.05 5.48 3.91 7.05 10.52 10.52 14.62 10.52 7.05 14.62 ]
Group every 10 consecutive values into a row (a sliding window of width 10), which yields a two-dimensional tensor.
After this transformation, each row of data corresponds to one target value; in other words, the LSTM learns to predict a target value from 10 consecutive sequence values.
However, the input to the LSTM must be a three-dimensional tensor (seq_len, batch, input_size), so we still need to reshape the two-dimensional tensor above into a three-dimensional one. (For reshaping methods, see tensor operations.)
Here we take seq_len=11 and input_size=10. Batching means that we run several forward passes and then a backward pass to update the weights. With the data above, the 20 values become 11 rows once made two-dimensional. If we were batching over rows and chose, say, batch=3, the weights would be updated after every three rows, with the remaining two rows computed and then triggering a final update. But since we already fixed seq_len=11 and input_size=10, the 11 × 10 rows account for all the elements, so batch=1. (For a deeper understanding, see how to understand the input and output format of LSTM.)
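As a minimal sketch (my own construction, not necessarily the original article's code), the 20 values above can be turned into the 11 × 10 two-dimensional tensor with a sliding window and then into a three-dimensional input:

import torch

prices = torch.tensor([10.52, 14.62, 5.48, 9.35, 3.91, 9.35, 3.91, 14.62, 14.62, 3.91,
                       7.05, 5.48, 3.91, 7.05, 10.52, 10.52, 14.62, 10.52, 7.05, 14.62])

windows = prices.unfold(0, 10, 1)  # sliding window of width 10, stride 1 -> shape (11, 10)
lstm_input = windows.unsqueeze(1)  # (seq_len=11, batch=1, input_size=10)
# note: with batch_first=True the same data would instead be arranged as
# (batch, seq_len, input_size), e.g. windows.unsqueeze(0)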
(2) If the original input is two-dimensional data: for example, when performing part-of-speech tagging on the words of a sentence, we describe each word with an attribute vector, so the raw input sentence becomes a two-dimensional matrix.
For example, the
original data:
data_ = [[1, 10, 11, 15, 9, 100],
[2, 11, 12, 16, 9, 100],
[3, 12, 13, 17, 9, 100],
[4, 13, 14, 18, 9, 100],
[5, 14, 15, 19, 9, 100],
[6, 15, 16, 10, 9, 100],
[7, 15, 16, 10, 9, 100],
[8, 15, 16, 10, 9, 100],
[9, 15, 16, 10, 9, 100],
[10, 15, 16, 10, 9, 100]]
Each row here represents one word. As before, the matrix needs to be reshaped into a three-dimensional tensor. This time we can choose the input structure ourselves, for example seq_len=2, batch=5, input_size=6. Then
the first sequence in the batch is:
tensor([[  1.,  10.,  11.,  15.,   9., 100.],
        [  2.,  11.,  12.,  16.,   9., 100.]])
and the last sequence is:
tensor([[  9.,  15.,  16.,  10.,   9., 100.],
        [ 10.,  15.,  16.,  10.,   9., 100.]])
Each of the items shown is one sequence of seq_len=2 rows, and with batch=5 the batch holds five such sequences. As before, batching means that several forward passes are computed before a backward pass updates the weights.
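A minimal sketch of this reshape (assuming data_ is the Python list above and the LSTM was built with batch_first=True):

import torch

data = torch.tensor(data_, dtype=torch.float)  # shape (10, 6)
batched = data.view(5, 2, 6)                   # (batch=5, seq_len=2, input_size=6)
print(batched[0])   # the first sequence: rows 1 and 2 of the original matrix
print(batched[-1])  # the last sequence: rows 9 and 10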
h0 = torch.randn(2, 193, 1) #(num_layers,batch,hidden_size)
Here hidden_size is chosen according to the output shape you want. For example, if I want the output at each step to be a single value (so the output is effectively a one-dimensional vector), then hidden_size=1; if you want the output to be a two-dimensional matrix, then hidden_size is the size of the second dimension of that matrix, e.g. for a 3 × 5 output, hidden_size=5.
c0 = torch.randn(2, 193, 1) #(num_layers,batch,hidden_size)
c0 has exactly the same shape and meaning as h0.
2. Forward calculation
output, (hn, cn) = rnn(input, (h0, c0))
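For the shapes defined above to be consistent, input must have batch=193 to match h0 and c0; a minimal sketch with an assumed seq_len of 11:

import torch

rnn = torch.nn.LSTM(30, 1, 2, batch_first=True)
h0 = torch.randn(2, 193, 1)
c0 = torch.randn(2, 193, 1)
input = torch.randn(193, 11, 30)  # (batch=193, seq_len=11, input_size=30)

output, (hn, cn) = rnn(input, (h0, c0))
print(output.shape)  # torch.Size([193, 11, 1]) -> last dimension equals hidden_size
print(hn.shape)      # torch.Size([2, 193, 1]) -> (num_layers, batch, hidden_size)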
3. Choose the optimizer and loss function
optimizer = torch.optim.Adam(rnn.parameters(), lr=LR) # optimize all LSTM parameters
loss_func = torch.nn.MSELoss()
4. Multiple forward and backward calculations to update the parameters
for step in range(EPOCH):
    output, (hn, cn) = rnn(input, (h0, c0))
    loss = loss_func(output, train_target_nom)
    optimizer.zero_grad()  # clear gradients for this training step
    loss.backward()        # back propagation, compute gradients
    optimizer.step()       # apply the gradients to update the parameters
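Putting the pieces together, here is a self-contained runnable sketch; the data is synthetic and EPOCH and LR are assumed values, since the article's train_y and train_target_nom are not shown:

import torch
import torch.nn as nn

EPOCH, LR = 100, 0.01             # assumed values, not from the article

rnn = nn.LSTM(30, 1, 2, batch_first=True)
input = torch.randn(193, 11, 30)  # synthetic stand-in for train_y
target = torch.randn(193, 11, 1)  # synthetic stand-in for train_target_nom
h0 = torch.randn(2, 193, 1)
c0 = torch.randn(2, 193, 1)

optimizer = torch.optim.Adam(rnn.parameters(), lr=LR)
loss_func = nn.MSELoss()

for step in range(EPOCH):
    output, (hn, cn) = rnn(input, (h0, c0))
    loss = loss_func(output, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()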
5. Convert the output to the desired form
Because output is a three-dimensional tensor, and we don't necessarily want the final result in that form, we may have to transform it.
For example, to turn it into a two-dimensional output:
output = output.view(193, -1)
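Since hidden_size=1 here, output has shape (193, seq_len, 1), and view(193, -1) merges the last two dimensions into one; squeezing the trailing size-1 dimension is an equivalent option:

output_2d = output.view(193, -1)  # (193, seq_len * 1) == (193, seq_len)
# equivalently: output.squeeze(-1) drops the trailing hidden_size=1 dimension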