PyTorch nn.LSTM() Parameter Description

Input Data Format:
input (seq_len, batch, input_size)
h0 (num_layers * num_directions, batch, hidden_size)
c0 (num_layers * num_directions, batch, hidden_size)

Output Data Format:
output(seq_len, batch, hidden_size * num_directions)
hn(num_layers * num_directions, batch, hidden_size)
cn(num_layers * num_directions, batch, hidden_size)

import torch
import torch.nn as nn

# Build the network model:
# input_size  - number of features in each input vector
# hidden_size - number of features in the hidden state (output)
# num_layers  - number of stacked LSTM layers
inputs = torch.randn(5, 3, 10)   # (seq_len, batch_size, input_size)
rnn = nn.LSTM(10, 20, 2)         # (input_size, hidden_size, num_layers)
h0 = torch.randn(2, 3, 20)       # (num_layers * 1, batch_size, hidden_size)
c0 = torch.randn(2, 3, 20)       # (num_layers * 1, batch_size, hidden_size)
# num_directions = 1 because the LSTM is unidirectional

# Outputs: output, (h_n, c_n)
output, (hn, cn) = rnn(inputs, (h0, c0))
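As a quick check of the shapes listed above, here is a minimal sketch (note that, per the PyTorch docs, h0 and c0 default to zero tensors when the initial state is omitted):

import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
inputs = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)

# If (h0, c0) is not passed, both default to zeros.
output, (hn, cn) = rnn(inputs)

print(output.shape)  # torch.Size([5, 3, 20]) -> (seq_len, batch, hidden_size * num_directions)
print(hn.shape)      # torch.Size([2, 3, 20]) -> (num_layers * num_directions, batch, hidden_size)
print(cn.shape)      # torch.Size([2, 3, 20])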
batch_first: whether the first dimension of the input and output is batch_size; the default is False. In PyTorch, data is usually fed to a model through a Dataset and DataLoader, and the DataLoader's batch_size parameter says how many samples are passed in at once, with the batch as the first dimension. The LSTM's input must also be a batch of data, but torch.nn.LSTM puts batch_size in the second dimension by default. The batch_first flag is how you reconcile the two conventions: if your data is laid out batch-first (as a DataLoader typically produces it), set batch_first=True; otherwise leave it as False. For example, an input that is (4, 1, 5) by default (batch_size = 1 in the middle) becomes (1, 4, 5) after setting batch_first=True.
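A minimal sketch of the batch_first behaviour (the sizes here are just illustrative):

import torch
import torch.nn as nn

# With batch_first=True the input and output are (batch, seq_len, feature),
# matching the batch-first layout a DataLoader usually yields.
rnn = nn.LSTM(input_size=5, hidden_size=8, num_layers=1, batch_first=True)
x = torch.randn(1, 4, 5)     # (batch_size=1, seq_len=4, input_size=5)
output, (hn, cn) = rnn(x)

print(output.shape)  # torch.Size([1, 4, 8]) -> (batch, seq_len, hidden_size)
print(hn.shape)      # torch.Size([1, 1, 8]) -> hn/cn keep (num_layers * num_directions, batch, hidden_size)

Note that batch_first only changes the layout of the input and output tensors; hn and cn keep the batch in the second dimension.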

inputs = torch.randn(5, 3, 10): seq_len = 5, batch_size = 3, input_size = 10
My understanding: there are 3 sentences, each with 5 words, and each word is represented by a 10-dimensional vector. Sentences are not all the same length, so seq_len can be longer or shorter, and this is what makes the LSTM well suited to variable-length sequences: only the seq_len dimension is allowed to vary (see the sketch below).
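A small sketch of this point: the same module accepts different sequence lengths as long as the other dimensions stay fixed.

import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

short = torch.randn(5, 3, 10)    # seq_len = 5
longer = torch.randn(12, 3, 10)  # seq_len = 12
out_short, _ = rnn(short)
out_longer, _ = rnn(longer)

print(out_short.shape, out_longer.shape)  # torch.Size([5, 3, 20]) torch.Size([12, 3, 20])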
For a more detailed explanation of hn, cn and some of the other parameters, see this reference.
When the texts in a batch have inconsistent lengths, the input data is padded so that every sequence in the batch reaches the same length before being fed to the model. But with a unidirectional LSTM (and even more so with a bidirectional LSTM) this padded data is a problem: the LSTM processes many meaningless padding characters, so the model picks up a certain bias. In that case you need the functions torch.nn.utils.rnn.pack_padded_sequence() and torch.nn.utils.rnn.pad_packed_sequence().
For more details, see the explanation here.
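A minimal sketch of how the two functions fit together, assuming a hypothetical batch of three padded sequences with true lengths 5, 3 and 2:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

padded = torch.randn(5, 3, 10)       # (max_seq_len, batch, input_size), already padded
lengths = torch.tensor([5, 3, 2])    # true lengths, sorted in descending order

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

# Pack so the LSTM skips the padding positions.
packed = pack_padded_sequence(padded, lengths)
packed_out, (hn, cn) = rnn(packed)

# Unpack back to a padded tensor for downstream layers.
output, out_lengths = pad_packed_sequence(packed_out)

print(output.shape)   # torch.Size([5, 3, 20])
print(out_lengths)    # tensor([5, 3, 2])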

BiLSTM
BiLSTM is a bidirectional LSTM: it combines a forward LSTM and a backward LSTM into one model. An example is shown below:
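A minimal sketch with bidirectional=True shows how the shapes change:

import torch
import torch.nn as nn

# Bidirectional LSTM: num_directions becomes 2.
birnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)
inputs = torch.randn(5, 3, 10)
output, (hn, cn) = birnn(inputs)

# Forward and backward hidden states are concatenated along the feature dimension.
print(output.shape)  # torch.Size([5, 3, 40]) -> (seq_len, batch, hidden_size * 2)
print(hn.shape)      # torch.Size([4, 3, 20]) -> (num_layers * 2, batch, hidden_size)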


LSTM structure derivation:
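For reference, the standard per-step LSTM cell equations (in the form used by nn.LSTM) are:

\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}

where \sigma is the sigmoid function and \odot denotes element-wise multiplication; i_t, f_t and o_t are the input, forget and output gates.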


A more detailed formula derivation: https://blog.csdn.net/songhk0209/article/details/71134698

GRU formula derivation (the diagrams found online were a bit hard to follow, so I drew my own data-flow diagram):
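The standard GRU update equations (in the form used by nn.GRU) are:

\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) \\
n_t &= \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}

where r_t is the reset gate, z_t the update gate and n_t the candidate hidden state.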


---------------------
Author: sunny fight crossing
Source: CSDN
Original: https://blog.csdn.net/yangyang_yangqi/article/details/84585998
Copyright notice: this is the blogger's original article; please include a link to the original when reposting.
