Detailed notes on the Seq2Seq PyTorch translation model (Part 1)

Disclaimer: This is an original article by the CSDN blogger "cricket cricket", released under the CC 4.0 BY-SA license. Please attach the original source link and this statement when reproducing.
Original link: https://blog.csdn.net/qysh123/article/details/91245246
Seq2Seq is the mainstream deep-learning model for translation. It has achieved good results in natural language translation, knowledge mapping, and even cross-modal mapping. In recent years it has also been widely used in software engineering, for example:

Jiang, Siyuan, Ameer Armaly, and Collin McMillan. "Automatically generating commit messages from diffs using neural machine translation." In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, pp. 135-146. IEEE Press, 2017.

Hu, Xing, Ge Li, Xin Xia, David Lo, and Zhi Jin. "Deep code comment generation." In Proceedings of the 26th Conference on Program Comprehension, pp. 200-210. ACM, 2018.

Here I use the Seq2Seq sample code provided by PyTorch to briefly summarize the implementation details of this model and the corresponding PyTorch APIs. PyTorch has a tutorial on its website: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html, and the corresponding GitHub link is: https://github.com/pytorch/tutorials/blob/master/intermediate_source/seq2seq_translation_tutorial.py. I will summarize based on this code example:

The link above also gives the official download link for the corresponding data: https://download.pytorch.org/tutorial/data.zip. In addition, many online tutorials are in fact translations of the official tutorial above. I consulted some of them, including:

https://www.cnblogs.com/HolyShine/p/9850822.html

https://www.cnblogs.com/www-caiyin-com/p/10123346.html

http://www.pianshen.com/article/5376154542/

So we can build on these tutorials; I will just add some explanations of my own rather than give a complete walkthrough like the tutorials above, but I think some important elements are worth summarizing. I will not cover data initialization and encoding here; please refer to the existing tutorials for that. Let's start the summary with the Encoder:

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()  # initialize the attributes inherited from the parent class.
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)  # initial embedding of the input.
        self.gru = nn.GRU(hidden_size, hidden_size)  # applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)  # view reshapes an existing tensor.
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)  # generates an all-zero tensor of size (1, 1, 256).
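
To make the shapes concrete, here is a minimal sketch (my own addition, not part of the tutorial) that feeds a single token index through the encoder. It assumes the EncoderRNN class above; the vocabulary size of 1000, the CPU device, and the token index 3 are arbitrary choices for illustration, while the hidden size of 256 follows the tutorial:

import torch
import torch.nn as nn

device = torch.device('cpu')  # assumption: run on CPU for this illustration
input_size = 1000             # assumed vocabulary size
hidden_size = 256             # same hidden size as in the tutorial

encoder = EncoderRNN(input_size, hidden_size)  # the class defined above
hidden = encoder.initHidden()                  # (1, 1, 256) all-zero initial hidden state
token = torch.tensor([3], device=device)       # a single (arbitrary) word index
output, hidden = encoder(token, hidden)
print(output.size())  # torch.Size([1, 1, 256])
print(hidden.size())  # torch.Size([1, 1, 256])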
Although the encoder is only a few lines, some points may still need discussion. nn.Embedding is the initial embedding; of course, this embedding is completely random and is not obtained by training, nor does it carry any practical meaning yet. I think even some articles online do not make this clear (for example, the explanation here is wrong: https://my.oschina.net/earnp/blog/1113896); refer instead to the discussion here: https://blog.csdn.net/qq_36097393/article/details/88567942. The meaning of its parameters can be understood as follows: in nn.Embedding(2, 5), the 2 means there are 2 words and the 5 means each word has 5 dimensions, so it is in fact a 2x5 matrix. Therefore, if you have 1000 words and want each word to have 100 dimensions, you can build such a word embedding with nn.Embedding(1000, 100). You can also run the sample code I summarize below:

import torch
import torch.nn as nn

word_to_ix = {'hello': 0, 'world': 1}
embeds = nn.Embedding(2, 5)
hello_idx = torch.LongTensor([word_to_ix['hello']])
world_idx = torch.LongTensor([word_to_ix['world']])
hello_embed = embeds(hello_idx)
print(hello_embed)
world_embed = embeds(world_idx)
print(world_embed)
The specific meaning should be clear at a glance; you can try running it (the printed results are different every time, and they have no practical meaning yet).

The other point is the meaning of .view(1, 1, -1). To be honest, I did not understand it before; in fact, this question has already been discussed on Stack Overflow:

https://stackoverflow.com/questions/42479902/how-does-the-view-method-work-in-pytorch

It becomes clear once you read it. Here I give an example of my own in addition to the ones provided there:

import torch
a = torch.range(1, 16)  # 16 values from 1 to 16 inclusive (torch.range is deprecated in favor of torch.arange)
print(a)
a = a.view(4, 4)        # reshape the 16 values into a 4x4 matrix
print(a)
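
To connect this back to the encoder, here is a small sketch (my own addition) showing what .view(1, 1, -1) does to the output of an embedding layer; the vocabulary size of 1000 and the word index 7 are arbitrary, while the 256 dimensions follow the tutorial:

import torch
import torch.nn as nn

embedding = nn.Embedding(1000, 256)  # assumed vocabulary of 1000 words, 256 dimensions
token = torch.LongTensor([7])        # a single word index
embedded = embedding(token)          # shape: (1, 256)
print(embedded.size())
reshaped = embedded.view(1, 1, -1)   # -1 lets PyTorch infer the last dimension: (1, 1, 256)
print(reshaped.size())               # this (seq_len=1, batch=1, features) shape is what nn.GRU expects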
That is all for the Encoder. Next, I directly summarize the decoder with the attention mechanism (to aid understanding, I added comments below giving the dimensions of the tensor at every step, which I personally think makes it easier to follow):

class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):  # MAX_LENGTH is defined as 10 in the translation task
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size  # output_size here is output_lang.n_words
        self.dropout_p = dropout_p  # dropout probability.
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)  # linear transformation according to the required dimensions.
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):

        print(input)
        print('size of input: '+str(input.size()))
        print('size of self.embedding(input): '+str(self.embedding(input).size()))

        embedded = self.embedding(input).view(1, 1, -1)
        print('size of embedded: '+str(embedded.size()))

        embedded = self.dropout(embedded)
        print('size of embedded[0]: '+str(embedded[0].size()))
        print('size of torch.cat((embedded[0], hidden[0]), 1): '+str(torch.cat((embedded[0], hidden[0]), 1).size()))
        print('size of self.attn(torch.cat((embedded[0], hidden[0]), 1)): '+str(self.attn(torch.cat((embedded[0], hidden[0]), 1)).size()))

        # size of embedded: [1, 1, 256]
        # size of embedded[0]: [1, 256]
        # size of torch.cat((embedded[0], hidden[0]), 1): [1, 512]

        # The attention weights are learned here.
        # Note that the concatenate function in torch is torch.cat: it joins tensors along an
        # existing dimension, which in this code means joining along the second dimension.
        # torch.stack, by contrast, creates a new dimension and then joins along that dimension.
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)  # F.softmax here refers to torch.nn.functional.softmax

        # size of attn_weights: [1, 10]
        # size of attn_weights.unsqueeze(0): [1, 1, 10]
        # size of encoder_outputs: [10, 256]
        # size of encoder_outputs.unsqueeze(0): [1, 10, 256]

        # unsqueeze: returns a new tensor with a dimension of size one inserted at the specified position.
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))  # bmm is essentially a batched matrix multiplication.

        # size of attn_applied: [1, 1, 256]
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        # size of output here is: [1, 512]
        print('size of output (at this location): '+str(output.size()))
        output = self.attn_combine(output).unsqueeze(0)
        # size of output here is: [1, 1, 256]
        #print(output)
        output = F.relu(output)  # rectified linear unit function, applied element-wise.
        #print(output)
        output, hidden = self.gru(output, hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)
        print('')
        print('------------')
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)
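
To check the shapes noted in the comments, here is a minimal sketch (my own addition, not from the tutorial) that runs a single decoder step with dummy tensors. It assumes the AttnDecoderRNN class above; the hidden size of 256, MAX_LENGTH of 10, and the SOS token index 0 follow the tutorial's translation task, while the output vocabulary size of 3000 is an assumption for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device('cpu')  # assumption: CPU for illustration
MAX_LENGTH = 10
hidden_size = 256
output_size = 3000  # assumed size of the output vocabulary (output_lang.n_words)

decoder = AttnDecoderRNN(hidden_size, output_size)      # the class defined above
decoder_input = torch.tensor([[0]], device=device)      # SOS token index (0 in the tutorial), shape (1, 1)
decoder_hidden = decoder.initHidden()                   # (1, 1, 256)
encoder_outputs = torch.zeros(MAX_LENGTH, hidden_size)  # (10, 256), as produced by the encoder loop

output, hidden, attn_weights = decoder(decoder_input, decoder_hidden, encoder_outputs)
print(output.size())        # torch.Size([1, 3000]): log-probabilities over the output vocabulary
print(hidden.size())        # torch.Size([1, 1, 256])
print(attn_weights.size())  # torch.Size([1, 10])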
First, dropout. You can start with the official PyTorch explanation:

https://pytorch.org/docs/stable/nn.html?highlight=nn%20dropout#torch.nn.Dropout

Simply put, during training it randomly zeroes some of the elements of the input tensor with probability p, using samples from a Bernoulli distribution. A blogger gives a very detailed discussion and explanation here:

https://blog.csdn.net/stdcoutzyx/article/details/49022443
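
A minimal sketch of the behavior (my own addition, with p=0.5 chosen for illustration): during training roughly half of the elements are zeroed and the surviving elements are scaled by 1/(1-p); in eval mode dropout does nothing:

import torch
import torch.nn as nn

m = nn.Dropout(p=0.5)
x = torch.ones(2, 8)

m.train()   # training mode: elements are zeroed at random, the rest scaled by 1/(1-p) = 2
print(m(x))

m.eval()    # evaluation mode: dropout is a no-op
print(m(x))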

Secondly, it is worth noting the meaning and role of nn.Linear. Again quoting the official website: it applies a linear transformation to the incoming data. Similarly, you can refer to the sample code I give below:

import torch
import torch.nn as nn
m = nn.Linear(2, 3)
input = torch.randn(2, 2)
print(input)
output = m(input)
print(output)
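
To make the shape rule explicit (my own addition): nn.Linear(in_features, out_features) stores a weight of shape (out_features, in_features) and a bias of shape (out_features,), so the (2, 2) input above is mapped to a (2, 3) output:

import torch
import torch.nn as nn

m = nn.Linear(2, 3)
print(m.weight.size())              # torch.Size([3, 2])
print(m.bias.size())                # torch.Size([3])
print(m(torch.randn(2, 2)).size())  # torch.Size([2, 3])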
Next, an explanation of torch.bmm, based on the explanation on the official PyTorch website: https://pytorch.org/docs/stable/torch.html?highlight=torch%20bmm#torch.bmm

The role of torch.bmm is: performs a batch matrix-matrix product of matrices stored in batch1 and batch2. This explanation is rather abstract; in fact, a simple example makes it clear that it is just a batched matrix multiplication:

import torch
batch1 = torch.randn(2, 3, 4)
print(batch1)
batch2 = torch.randn(2, 4, 5)
print(batch2)
res = torch.bmm(batch1, batch2)
print(res)
The specific multiplication rule is: if batch1 is a (b × n × m) tensor and batch2 is a (b × m × p) tensor, the output will be a (b × n × p) tensor. In the example above, res therefore has shape (2, 3, 5).

Regarding torch.cat, again a simple explanation using the example given on the official PyTorch website:

Concatenates the given sequence of tensors in the given dimension. An example is as follows:

import torch
x = torch.randn(2, 3)
print(x)
print(torch.cat((x, x, x), 0))
print(torch.cat((x, x, x), 1))
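
Since the decoder comments above contrast torch.cat with torch.stack, here is a short sketch (my own addition) that makes the difference in the resulting shapes explicit:

import torch

x = torch.randn(2, 3)
print(torch.cat((x, x, x), 0).size())    # torch.Size([6, 3]): joined along an existing dimension
print(torch.cat((x, x, x), 1).size())    # torch.Size([2, 9])
print(torch.stack((x, x, x), 0).size())  # torch.Size([3, 2, 3]): a new dimension is created first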
That is all for this part; I will continue the summary in the next blog post.