LSTM & BiLSTM input and output format (code attached)

LSTM

self.lstm = nn.LSTM(input_size=self.input_size,
                    hidden_size=self.hidden_size,
                    num_layers=self.num_layers,
                    batch_first=True,
                    bidirectional=False,
                    dropout=self.dropout
                    ) 
self.fc1 = nn.Linear(self.hidden_size, int(self.hidden_size/2))
self.fc2 = nn.Linear(int(self.hidden_size/2), 2)

The parameters of the unidirectional LSTM model in the code above are:

  • input_size: the dimension of the word vector

  • hidden_size: the dimension of the hidden state (also referred to as the output layer, because the output of the previous time step is fed back in as input to the next time step)

  • num_layers: the number of stacked LSTM layers

  • batch_first: when True, the input and output tensors are laid out as batch_size * len_sen * input_size instead of the default len_sen * batch_size * input_size

  • bidirectional: whether the network is bidirectional

  • dropout: helps prevent overfitting and is commonly set to 0.5. Dropout should only be active while training on the training set; when running the model on the validation or test set it must be disabled, which is what calling model.eval() does in PyTorch (see the sketch after this list)
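A minimal sketch of these settings with made-up hyperparameter values. Note that PyTorch only applies LSTM dropout between stacked layers (so it has no effect when num_layers is 1), and that switching between model.train() and model.eval() is what turns it on and off:

import torch
from torch import nn

# Made-up hyperparameters, just for illustration
input_size = 100   # word-vector dimension
hidden_size = 64   # hidden-state dimension
num_layers = 2     # dropout is only applied between stacked layers

lstm = nn.LSTM(input_size=input_size,
               hidden_size=hidden_size,
               num_layers=num_layers,
               batch_first=True,
               bidirectional=False,
               dropout=0.5)

lstm.train()  # training mode: dropout between layers is active
lstm.eval()   # evaluation mode: dropout is disabled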

If the shape of our input (input_embeded) is as follows:

input_embeded:
batch_size * len_sen (sentence length, i.e. the number of words in the sentence) * input_size (the word-vector dimension)

The LSTM produces three outputs, namely output, h_n and c_n:

output, (h_n, c_n) = self.lstm(input_embeded)

Their respective shapes are:

  • output: batch_size * len_sen * hidden_size
  • h_n: (1*self.num_layers (number of LSTM layers)) * batch_size * hidden_size
  • c_n: (1*self.num_layers) * batch_size * hidden_size

output is easy to understand: it contains the outputs of all time steps. Assuming there is only one LSTM layer, h_n[0] is equal to output[:, -1, :] (remember batch_first=True), i.e. h_n keeps the output of the last time step, while c_n keeps the memory cells of the last time step. h_n is obtained from c_n by applying a tanh activation and multiplying by the output gate.
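A quick way to verify these shapes and the h_n claim, using made-up sizes:

import torch
from torch import nn

batch_size, len_sen, input_size, hidden_size = 4, 10, 100, 64  # made-up sizes

lstm = nn.LSTM(input_size, hidden_size, num_layers=1, batch_first=True)
input_embeded = torch.randn(batch_size, len_sen, input_size)

output, (h_n, c_n) = lstm(input_embeded)

print(output.shape)  # torch.Size([4, 10, 64]) -> batch_size * len_sen * hidden_size
print(h_n.shape)     # torch.Size([1, 4, 64])  -> num_layers * batch_size * hidden_size
print(c_n.shape)     # torch.Size([1, 4, 64])

# With a single layer, h_n holds the output of the last time step:
print(torch.allclose(h_n[0], output[:, -1, :]))  # True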

In sentiment classification we only need h_n; we then pass it through a few fully connected layers to produce the final output, as shown in the following code:

output, (h_n, c_n) = self.lstm(input_embeded)  # h_n: output of the last time step, shape ((2 if bidirectional else 1) * num_layers) * bs * hidden_size; c_n: memory cells of the last time step
out = h_n.squeeze(0)  # remove the first dimension (size 1 here, since num_layers == 1 and the LSTM is unidirectional)
out = F.relu(out)
out_fc1 = self.fc1(out)
out = F.relu(out_fc1)
out_fc2 = self.fc2(out)
return F.log_softmax(out_fc2,dim=-1)

The final output shape is batch_size * 2, which can then be compared with the labels.
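Since the model returns log-probabilities, one natural way to compare them with the labels is nn.NLLLoss (log_softmax followed by NLLLoss is equivalent to cross entropy). A minimal sketch, with a random tensor standing in for the model output:

import torch
from torch import nn
import torch.nn.functional as F

batch_size = 4
log_probs = F.log_softmax(torch.randn(batch_size, 2), dim=-1)  # stand-in for the model output
labels = torch.randint(0, 2, (batch_size,))                    # e.g. 0 = negative, 1 = positive

loss = nn.NLLLoss()(log_probs, labels)       # log_softmax + NLLLoss == cross entropy
preds = log_probs.argmax(dim=-1)             # predicted class for each sentence
accuracy = (preds == labels).float().mean()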

BiLSTM

The difference between a BiLSTM and an LSTM is that the BiLSTM runs over the sequence both forward and backward, so h_n ends up holding two final hidden states (one per direction), which are then concatenated. The remaining steps are the same as before.

The BiLSTM code is as follows:

self.lstm = nn.LSTM(input_size=self.input_size,
                    hidden_size=self.hidden_size,
                    num_layers=self.num_layers,
                    batch_first=True,
                    bidirectional=self.bidirectional,
                    dropout=self.dropout
                    ) 
self.fc1 = nn.Linear(self.hidden_size * 2, self.hidden_size)
self.fc2 = nn.Linear(self.hidden_size, 2)                 

Assuming the input is the same as for the LSTM above, let's look at the output format. The input first passes through the LSTM layer, which again produces three outputs:

output, (h_n, c_n) = self.lstm(input_embeded)

The shape of h_n is now:
(2*self.num_layers (number of LSTM layers)) * batch_size * hidden_size

We need to concatenate the two directions of h_n:

out = torch.cat([h_n[-1, :, :], h_n[-2, :, :]], dim=-1)

The shape of out after this step is:
batch_size * (2*hidden_size)
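A quick check with made-up sizes confirms these shapes and shows how the two directions are laid out in h_n (for the last layer, index -2 is the forward direction and index -1 the backward direction):

import torch
from torch import nn

batch_size, len_sen, input_size, hidden_size, num_layers = 4, 10, 100, 64, 2  # made-up sizes

bilstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                 batch_first=True, bidirectional=True)
input_embeded = torch.randn(batch_size, len_sen, input_size)

output, (h_n, c_n) = bilstm(input_embeded)

print(output.shape)  # torch.Size([4, 10, 128]) -> batch_size * len_sen * (2*hidden_size)
print(h_n.shape)     # torch.Size([4, 4, 64])   -> (2*num_layers) * batch_size * hidden_size

# Forward direction ends at the last time step, backward direction at the first:
print(torch.allclose(h_n[-2], output[:, -1, :hidden_size]))  # True
print(torch.allclose(h_n[-1], output[:, 0, hidden_size:]))   # True

out = torch.cat([h_n[-1, :, :], h_n[-2, :, :]], dim=-1)
print(out.shape)     # torch.Size([4, 128])     -> batch_size * (2*hidden_size)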

With out in hand, we pass it through the fully connected layers and produce the output:

out = F.relu(out)
out_fc1 = self.fc1(out)
out = F.relu(out_fc1)
out_fc2 = self.fc2(out)
return F.log_softmax(out_fc2, dim=-1)
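Putting the fragments above together, here is one possible way to wrap them in a module (the class name, vocabulary size and hyperparameters below are made up for illustration; the original post only shows the fragments):

import torch
from torch import nn
import torch.nn.functional as F

class BiLSTMClassifier(nn.Module):
    # Hypothetical wrapper around the fragments above; hyperparameters are illustrative
    def __init__(self, vocab_size=10000, input_size=100, hidden_size=64,
                 num_layers=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, input_size)
        self.lstm = nn.LSTM(input_size=input_size,
                            hidden_size=hidden_size,
                            num_layers=num_layers,
                            batch_first=True,
                            bidirectional=True,
                            dropout=dropout)
        self.fc1 = nn.Linear(hidden_size * 2, hidden_size)
        self.fc2 = nn.Linear(hidden_size, 2)

    def forward(self, x):                     # x: batch_size * len_sen (word ids)
        input_embeded = self.embedding(x)     # batch_size * len_sen * input_size
        output, (h_n, c_n) = self.lstm(input_embeded)
        out = torch.cat([h_n[-1, :, :], h_n[-2, :, :]], dim=-1)  # batch_size * (2*hidden_size)
        out = F.relu(out)
        out = F.relu(self.fc1(out))
        return F.log_softmax(self.fc2(out), dim=-1)  # batch_size * 2

model = BiLSTMClassifier()
dummy_ids = torch.randint(0, 10000, (4, 10))  # 4 sentences of 10 word ids each
print(model(dummy_ids).shape)                 # torch.Size([4, 2])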

Origin blog.csdn.net/jokerxsy/article/details/106673603