Implementation and Explanation of the Transformer's Positional Encoding ("Attention Is All You Need")

First, some background on the paper: since the Transformer uses neither an RNN nor a CNN to extract features, the model by itself cannot make use of the positional information in a sentence. We therefore need to add a Positional Encoding to the input embedding. The paper proposes a sinusoidal Positional Encoding scheme; its implementation code and an explanation are posted below.

First, look at the formulas as presented in the paper, where pos is the position of the word in the sequence and d_model is the dimension of the word-embedding vector:

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

Every component of the resulting position vector lies between -1 and 1.

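As a quick sanity check of the formulas above, the components for a single position can be computed directly; this is a minimal sketch (the values of d_model and pos are illustrative, not from the paper):

```python
import numpy as np

d_model = 8  # illustrative embedding dimension
pos = 3      # illustrative word position

# PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
pe = np.array([
    np.sin(pos / 10000 ** (2 * (i // 2) / d_model)) if i % 2 == 0
    else np.cos(pos / 10000 ** (2 * (i // 2) / d_model))
    for i in range(d_model)
])

# Every component is a sine or cosine value, hence bounded by [-1, 1].
assert np.all(np.abs(pe) <= 1.0)
```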
The code is as follows.

import numpy as np
import torch

# n_position: the length of the sentence in words or characters; d_hid: the word-vector dimension.
def get_sinusoid_encoding_table(n_position, d_hid, padding_idx=None):
    ''' Sinusoid position encoding table '''

    def cal_angle(position, hid_idx):
        return position / np.power(10000, 2 * (hid_idx // 2) / d_hid)

    def get_posi_angle_vec(position):
        return [cal_angle(position, hid_j) for hid_j in range(d_hid)]

    sinusoid_table = np.array([get_posi_angle_vec(pos_i) for pos_i in range(n_position)])

    sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2])  # dim 2i: even dimensions use sine
    sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2])  # dim 2i+1: odd dimensions use cosine

    if padding_idx is not None:
        # zero vector for the padding position
        sinusoid_table[padding_idx] = 0.

    return torch.FloatTensor(sinusoid_table)  # n_position × d_hid: one position vector per word
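In practice, a table like the one returned above is typically wrapped in a frozen nn.Embedding so that position vectors can be looked up by index. The sketch below is self-contained: it rebuilds a small table with vectorized NumPy (equivalent to the loop version above) rather than calling the function, and the sizes are illustrative assumptions:

```python
import numpy as np
import torch
import torch.nn as nn

n_position, d_hid = 10, 6  # illustrative sizes

# Build the sinusoid table vectorized: angle = pos / 10000^(2*(i//2)/d_hid).
pos = np.arange(n_position)[:, None]                        # shape (n_position, 1)
div = np.power(10000, 2 * (np.arange(d_hid) // 2) / d_hid)  # shape (d_hid,)
table = pos / div                                           # broadcast to (n_position, d_hid)
table[:, 0::2] = np.sin(table[:, 0::2])                     # even dims: sine
table[:, 1::2] = np.cos(table[:, 1::2])                     # odd dims: cosine

# Wrap in a frozen embedding so the encoding is not updated during training.
pos_emb = nn.Embedding.from_pretrained(torch.FloatTensor(table), freeze=True)

positions = torch.arange(5)      # position indices of a 5-token sentence
out = pos_emb(positions)
print(out.shape)                 # torch.Size([5, 6])
```

Freezing the embedding matches the fixed (non-learned) sinusoidal variant described in the paper.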

Explanation link: https://blog.csdn.net/qq_33278884/article/details/88868808

Origin: www.cnblogs.com/wisir/p/12461641.html