Transformer basic code implementation (input part)

    The input part consists of two components: the text embedding layer (Embeddings) and the positional encoding (PositionalEncoding).

    1. The purpose of the text embedding layer is to convert the numeric (token-index) representation of the vocabulary into a vector representation.

import math

import torch
import torch.nn as nn
from torch.autograd import Variable

class Embeddings(nn.Module):
    def __init__(self, d_model, vocab):
        super(Embeddings, self).__init__()
        # lookup table mapping each token index to a d_model-dimensional vector
        self.lut = nn.Embedding(vocab, d_model)
        self.d_model = d_model

    def forward(self, x):
        # scale the embeddings by sqrt(d_model)
        return self.lut(x) * math.sqrt(self.d_model)

    d_model is the embedding dimension, i.e. how many dimensions are used to represent each word.
    vocab is the total number of words in the vocabulary.
    The built-in nn.Embedding module does the lookup, and the result is scaled by sqrt(d_model).
    Test code:

d_model = 512
vocab = 1000
# a batch of 2 sentences, each given as 4 token indices
x = Variable(torch.LongTensor([[100, 2, 29, 165], [1, 6, 8, 7]]))

emb = Embeddings(d_model, vocab)
embr = emb(x)

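    Each token index is mapped to a d_model-dimensional vector, so embr has shape (batch size, sentence length, d_model); a minimal check:

print(embr.shape)   # torch.Size([2, 4, 512])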

    2. Positional encoder. The embedded text vectors carry no information about word order; the positional encoder adds position information so that the vectors reflect where each word sits in the sentence.
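    The code below implements the standard sinusoidal encoding from the original Transformer paper, where pos is the word position and i indexes a pair of dimensions:

PE(pos, 2i)   = sin( pos / 10000^(2i / d_model) )
PE(pos, 2i+1) = cos( pos / 10000^(2i / d_model) )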

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout, max_len=5000):
        super(PositionalEncoding, self).__init__()

        self.dropout = nn.Dropout(p=dropout)

        # positional encoding matrix of shape (max_len, d_model)
        pe = torch.zeros(max_len, d_model)

        # absolute position column vector of shape (max_len, 1)
        position = torch.arange(0, max_len).unsqueeze(1)

        # frequency terms for the even dimensions: 10000^(-2i/d_model)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))

        # sine on even columns, cosine on odd columns
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)

        # add a batch dimension: (1, max_len, d_model)
        pe = pe.unsqueeze(0)

        # register as a buffer: saved with the model, but not a trainable parameter
        self.register_buffer('pe', pe)

    def forward(self, x):
        # slice the encoding to the input's sentence length and add it to the embeddings
        x = x + Variable(self.pe[:, :x.size(1)], requires_grad=False)
        return self.dropout(x)

    Building the position information requires two matrices: a positional encoding matrix and an absolute position matrix.
    The positional encoding matrix pe is initialized to zeros, with size (max sentence length, word dimension). The absolute position matrix is a sequence of consecutive natural numbers of size (max sentence length,); unsqueeze is then used to add a dimension so that it becomes a (max sentence length, 1) column vector.
    To expand the absolute positions to size (max sentence length, word dimension) we need the transformation vector div_term. It is built with a stride of 2 over the dimensions, so that the even-indexed columns can be filled with sin and the odd-indexed columns with cos, which lets the encoding distinguish positions.
    After this processing we have a two-dimensional matrix; to add it to the embedding output, which has a batch dimension, we unsqueeze once more, and finally register pe as a model buffer, because it has no trainable parameters and never needs to be updated; it is fixed from the start.
    In forward, the second dimension of this three-dimensional tensor (the maximum sentence length) is sliced down to match the second dimension of the input x, so the encoding adapts to the actual sentence length; see the shape check below.
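    A minimal shape check of the construction above, using the same d_model = 512 and max_len = 5000 as in the code:

pe = torch.zeros(5000, 512)                    # (max_len, d_model)
position = torch.arange(0, 5000).unsqueeze(1)  # (5000, 1)
div_term = torch.exp(torch.arange(0, 512, 2) * (-math.log(10000.0) / 512))  # (256,)

# broadcasting (5000, 1) * (256,) gives (5000, 256), which fills the 256 even
# columns (sin) and the 256 odd columns (cos) of pe
print((position * div_term).shape)  # torch.Size([5000, 256])
print(pe.unsqueeze(0).shape)        # torch.Size([1, 5000, 512])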
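    A sketch of how the two parts are chained, continuing from the embedding test above; the dropout value 0.1 is an assumed setting, not taken from the original text:

dropout = 0.1  # assumed value for this sketch
pos_enc = PositionalEncoding(d_model, dropout)
pe_result = pos_enc(embr)

# pe[:, :4] is broadcast over the batch, so the output keeps the input shape
print(pe_result.shape)  # torch.Size([2, 4, 512])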
