Implementing a Transformer model in PyTorch

The Transformer is a powerful neural network architecture for processing sequence data, for example in natural language processing tasks. In PyTorch, Transformer models can be implemented easily using the torch.nn.Transformer class.
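For example, a minimal sketch of the built-in class (the tensor sizes below are illustrative assumptions; nn.Transformer defaults to the (seq_len, batch, d_model) layout):

import torch
import torch.nn as nn

# nn.Transformer bundles the full encoder-decoder stack.
model = nn.Transformer(d_model=512, nhead=8)
src = torch.rand(10, 32, 512)  # (src_len, batch, d_model)
tgt = torch.rand(20, 32, 512)  # (tgt_len, batch, d_model)
out = model(src, tgt)          # shape: (20, 32, 512)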
The following sample code implements a simple Transformer model that maps an input token sequence to an output sequence of the same length. Note that it is an encoder-only model with a linear output head, which makes it a building block for sequence-to-sequence tasks such as translation rather than a complete translation system. The sample code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
import math

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)

        # Precompute the sinusoidal table: even columns use sin, odd columns use cos.
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # shape (max_len, 1, d_model) so it broadcasts over the batch dimension
        pe = pe.unsqueeze(0).transpose(0, 1)
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (seq_len, batch, d_model); add the encoding for each position
        x = x + self.pe[:x.size(0), :]
        return self.dropout(x)


class TransformerModel(nn.Module):
    def __init__(self, input_vocab_size, output_vocab_size, d_model, nhead, num_layers, dim_feedforward, dropout=0.1):
        super().__init__()

        self.d_model = d_model
        self.nhead = nhead
        self.num_layers = num_layers
        self.dim_feedforward = dim_feedforward

        self.embedding = nn.Embedding(input_vocab_size, d_model)
        self.pos_encoder = PositionalEncoding(d_model, dropout)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.decoder = nn.Linear(d_model, output_vocab_size)

        self.init_weights()

    def init_weights(self):
        initrange = 0.1
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, src, src_mask=None):
        # src: (seq_len, batch) of token ids
        src = self.embedding(src) * math.sqrt(self.d_model)  # scale as in the original paper
        src = self.pos_encoder(src)
        output = self.transformer_encoder(src, src_mask)
        output = self.decoder(output)  # (seq_len, batch, output_vocab_size)
        return output
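As a quick sanity check, the model can be run on random token ids (the vocabulary and model sizes below are toy values chosen for illustration only):

# Assumed toy sizes: vocab of 1000, d_model 128.
model = TransformerModel(input_vocab_size=1000, output_vocab_size=1000,
                         d_model=128, nhead=4, num_layers=2, dim_feedforward=256)
src = torch.randint(0, 1000, (20, 4))  # (seq_len, batch)
out = model(src)
print(out.shape)  # torch.Size([20, 4, 1000])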

In the TransformerModel code above, we define a model class that inherits from nn.Module. The model includes the following components:

nn.Embedding: converts each token in the input sequence to its vector representation.
PositionalEncoding: encodes the position of each token in the sequence as a vector and adds it to the embedding.
nn.TransformerEncoder: applies a stack of self-attention and feed-forward layers to the embedded input sequence.
nn.Linear: projects each Transformer output vector onto the output vocabulary to produce the final output sequence.
You can modify the hyperparameters of the TransformerModel class to suit your needs, such as the input and output vocabulary sizes, the embedding dimension d_model, the number of Transformer layers, the feed-forward (hidden) dimension, and so on. To train the model, you need to define a loss function and an optimizer and run PyTorch's standard training loop, as sketched below.
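A minimal sketch of such a loop, assuming toy vocabulary sizes and random data in place of a real dataset (token-level cross-entropy loss, Adam optimizer):

# Assumed toy setup: vocab of 1000, random (seq_len, batch) token ids.
model = TransformerModel(input_vocab_size=1000, output_vocab_size=1000,
                         d_model=128, nhead=4, num_layers=2, dim_feedforward=256)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

src = torch.randint(0, 1000, (20, 4))  # input token ids
tgt = torch.randint(0, 1000, (20, 4))  # target token ids, same shape

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    output = model(src)  # (seq_len, batch, output_vocab_size)
    loss = criterion(output.reshape(-1, 1000), tgt.reshape(-1))
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")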

In the Transformer, the role of positional encoding is to embed position information from the input sequence into the vector space, so that each position receives a unique vector. In this implementation, the positional encoding uses the formulas:

$$\text{PE}_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/d_{\text{model}}}\right)$$

$$\text{PE}_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/d_{\text{model}}}\right)$$

where pos is the position in the input sequence and i indexes the embedding dimensions (each pair of dimensions 2i and 2i+1 shares one frequency). The resulting positional encoding matrix is added to the embedding vectors of the input sequence.
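As an illustrative check (reusing the PositionalEncoding class and the math import from the code above, with assumed small sizes), the precomputed buffer matches the formula:

# Compare the module's precomputed buffer to the formula directly.
pe_module = PositionalEncoding(d_model=16, dropout=0.0, max_len=100)
pos, i = 5, 3
expected = math.sin(pos / 10000 ** (2 * i / 16))
actual = pe_module.pe[pos, 0, 2 * i].item()  # buffer shape: (max_len, 1, d_model)
print(abs(expected - actual) < 1e-5)  # True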

Origin: blog.csdn.net/qq_23345187/article/details/129357428