Attention mechanism - Self-Attention Networks (SANet)

The self-attention mechanism (self-attention) is an attention mechanism in which a sequence attends to itself. It is used mainly in natural language processing and is a core component of the Transformer model: it computes how strongly each element of the input sequence relates to every other element, and uses those relationships to build a better representation of the sequence.

In the self-attention mechanism, each element is a vector. In language processing, for example, the embedding vector of each word serves as one element of the input sequence. To compute the relationship between each element and the others, the mechanism introduces three learned matrices: a query matrix, a key matrix, and a value matrix. Applying each of them to an input element as a linear transformation yields that element's query, key, and value vectors.
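The three projections and the resulting attention computation can be sketched with plain tensors for a single sequence. This is only an illustration: the sizes and the names W_q, W_k, and W_v are assumptions made for the example, and the matrices are random here, whereas in a real model they are learned.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumed for this sketch, not taken from the article)
seq_len, d_model, d_k = 4, 8, 6

x = torch.randn(seq_len, d_model)        # one sequence of 4 element vectors

# Hypothetical projection matrices; random here, learned in a real model
W_q = torch.randn(d_model, d_k)
W_k = torch.randn(d_model, d_k)
W_v = torch.randn(d_model, d_k)

q, k, v = x @ W_q, x @ W_k, x @ W_v      # each has shape (seq_len, d_k)

scores = q @ k.T                         # (seq_len, seq_len): element i scored against element j
weights = torch.softmax(scores, dim=-1)  # normalize each row so it sums to 1
output = weights @ v                     # (seq_len, d_k): weighted average of value vectors

print(q.shape, scores.shape, output.shape)
```

Row i of weights describes how much element i draws on every element of the sequence, including itself.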

Implementing self-attention with PyTorch:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(SelfAttention, self).__init__()
        
        self.query = nn.Linear(input_size, hidden_size)
        self.key = nn.Linear(input_size, hidden_size)
        self.value = nn.Linear(input_size, hidden_size)
        
    def forward(self, x):
        # Compute Q, K, and V
        q = self.query(x)
        k = self.key(x)
        v = self.value(x)
        
        # Compute the self-attention matrix (softmax over the key dimension)
        attn_weights = torch.bmm(q, k.transpose(1, 2))
        attn_weights = F.softmax(attn_weights, dim=-1)
        
        # Use the self-attention matrix to take a weighted average of V
        attn_output = torch.bmm(attn_weights, v)
        
        return attn_output

In the code above, we define a SelfAttention class that inherits from nn.Module. In __init__(), we define three linear layers, query, key, and value, which compute the query, key, and value vectors respectively. In forward(), we first compute q, k, and v, then use torch.bmm() to compute the self-attention matrix as the batched product of q with the transpose of k, and normalize each row of that matrix with F.softmax(). Finally, we use torch.bmm() again to multiply the self-attention matrix by v, which gives a weighted average of the value vectors for each position, and return the result.
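The SelfAttention class uses raw dot products as scores. The Transformer formulation divides the scores by the square root of the key dimension before the softmax, which keeps them from growing with hidden_size. A minimal sketch of that variant follows; the class name ScaledSelfAttention and the choice to also return the weights are assumptions made for this example, not part of the code above.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledSelfAttention(nn.Module):
    """Hypothetical variant of SelfAttention that scales scores by sqrt(hidden_size)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.query = nn.Linear(input_size, hidden_size)
        self.key = nn.Linear(input_size, hidden_size)
        self.value = nn.Linear(input_size, hidden_size)
        self.scale = math.sqrt(hidden_size)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Scaled dot-product scores: (batch, seq_len, seq_len)
        scores = torch.bmm(q, k.transpose(1, 2)) / self.scale
        attn_weights = F.softmax(scores, dim=-1)
        # Return the output and the weights so the weights can be inspected
        return torch.bmm(attn_weights, v), attn_weights


x = torch.randn(2, 5, 16)
out, w = ScaledSelfAttention(16, 8)(x)
print(out.shape, w.shape)  # torch.Size([2, 5, 8]) torch.Size([2, 5, 5])
```

Returning the weights alongside the output makes it easy to check that every row sums to 1.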

A SelfAttention instance can be created and tested using the following code:

input_size = 128
hidden_size = 64
batch_size = 32
seq_len = 10

sa = SelfAttention(input_size, hidden_size)

x = torch.randn(batch_size, seq_len, input_size)
output = sa(x)

print(output.size())  # Output: torch.Size([32, 10, 64])

In the code above, we create a SelfAttention instance and use torch.randn() to generate a random input tensor x of size (32, 10, 128), where 32 is the batch size, 10 is the sequence length, and 128 is the feature dimension. Finally, we pass x to the sa instance and store the result in the output tensor. We print the size of output to confirm that it is as expected.

Using self-attention inside a network:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(MyModel, self).__init__()
        
        # Define the self-attention module. Its embed_dim must equal the
        # feature dimension of x (input_size) and be divisible by num_heads.
        self.self_attn = nn.MultiheadAttention(input_size, num_heads=8)
        
        # Define the feed-forward network
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
        # Extract features with the self-attention module
        # x: (seq_len, batch_size, input_size)
        x, _ = self.self_attn(x, x, x)
        
        # Classify with the feed-forward network
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        
        return x

In the code above, we define a neural network named MyModel, which consists of a self-attention module and a feed-forward network. In __init__(), we first create an nn.MultiheadAttention instance with 8 heads and store it in self.self_attn. Next, we define a feed-forward network consisting of an input layer fc1, a ReLU activation function, and an output layer fc2. In forward(), we pass the input tensor x to the self-attention module as query, key, and value to extract features from the input. We then classify those features with the feed-forward network and return the output tensor x.
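nn.MultiheadAttention places a few constraints on its arguments that are easy to miss: embed_dim must equal the size of the last dimension of the input, embed_dim must be divisible by num_heads, and by default the input is laid out as (seq_len, batch, feature). The following is a minimal sketch of these points. The sizes are illustrative, and batch_first=True is an optional constructor argument that assumes a reasonably recent PyTorch release.

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 128, 8
assert embed_dim % num_heads == 0  # required by nn.MultiheadAttention

# Default layout: (seq_len, batch, embed_dim)
attn = nn.MultiheadAttention(embed_dim, num_heads)
x = torch.randn(10, 32, embed_dim)
out, weights = attn(x, x, x)
print(out.shape)      # torch.Size([10, 32, 128])
print(weights.shape)  # torch.Size([32, 10, 10]), averaged over heads

# Optional batch-first layout: (batch, seq_len, embed_dim)
attn_bf = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
x_bf = torch.randn(32, 10, embed_dim)
out_bf, _ = attn_bf(x_bf, x_bf, x_bf)
print(out_bf.shape)   # torch.Size([32, 10, 128])
```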

A MyModel instance can be created and tested using the following code:

input_size = 128
hidden_size = 64
num_classes = 10
batch_size = 32
seq_len = 10

model = MyModel(input_size, hidden_size, num_classes)

x = torch.randn(seq_len, batch_size, input_size)
output = model(x)

print(output.size())  # Output: torch.Size([10, 32, 10])

In the code above, we create a MyModel instance and use torch.randn() to generate a random input tensor x of size (10, 32, 128), where 10 is the sequence length, 32 is the batch size, and 128 is the feature dimension. Finally, we pass x to the model instance and store the result in the output tensor. We print the size of output to confirm that it is as expected.
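An output of size (10, 32, 10) holds one score vector per sequence position. If a single prediction per sequence is wanted instead, one common option is to average the attention output over the sequence dimension before the feed-forward layers. The sketch below shows that idea; PooledClassifier is a hypothetical name, and mean pooling is an assumption made for this example rather than something the article prescribes.

```python
import torch
import torch.nn as nn


class PooledClassifier(nn.Module):
    """Hypothetical variant: self-attention, mean-pool over positions, then classify."""

    def __init__(self, input_size, hidden_size, num_classes, num_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(input_size, num_heads)
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        x, _ = self.self_attn(x, x, x)
        x = x.mean(dim=0)  # average over positions -> (batch, input_size)
        return self.fc2(self.relu(self.fc1(x)))


logits = PooledClassifier(128, 64, 10)(torch.randn(10, 32, 128))
print(logits.shape)  # torch.Size([32, 10])
```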
