[Complex Network Modeling] - Building a Graph Attention Network Model Based on PyTorch

Table of contents

1. Steps to build the graph attention network GAT model

1.1 Graph representation

1.2 Self-attention mechanism

1.3 Attention Aggregation

1.4 Multi-head attention

1.5 Output and Prediction

2. Using the attention mechanism to weight and aggregate a node's neighbors

3. Building a graph attention network (GAT) model based on PyTorch

3.1 GraphAttentionLayer class

3.2 GAT model

3.3 Code usage


"Graph Attention Networks" - Petar Veličković et al. (2018)

Code link: dgl/examples/tensorflow/gat (dmlc/dgl repository on GitHub)

This paper introduces a model called Graph Attention Networks (GAT) for node classification tasks. GAT uses the attention mechanism to carry out weighted aggregation of the neighbors of nodes, and can adaptively learn the importance of each neighbor node to the target node.


1. Steps to build the graph attention network GAT model

1.1 Graph representation

First, represent the graph as a set V of nodes and a set E of edges. Each node has a feature vector, and all node features together can be written as a feature matrix X ∈ R^(N×F), where N is the number of nodes and F is the feature dimension.
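As a minimal sketch of this representation in PyTorch (a toy graph with made-up sizes), the feature matrix and the adjacency matrix used later in the code look like this:

import torch

# Toy graph with N = 4 nodes and F = 3 features per node (hypothetical values)
X = torch.randn(4, 3)                        # feature matrix X ∈ R^(N×F)

# Adjacency matrix (N×N): entry [i, j] = 1 if nodes i and j are connected
A = torch.tensor([[1., 1., 0., 1.],
                  [1., 1., 1., 0.],
                  [0., 1., 1., 1.],
                  [1., 0., 1., 1.]])         # self-loops included so each node also attends to itself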

1.2 Self-attention mechanism

The GAT model uses a self-attention mechanism to learn the weights between nodes. For each node i and each of its neighbors j, the GAT model first computes an unnormalized attention score e_ij, which represents the correlation between node i and node j:

e_ij = LeakyReLU(α^T [W h_i || W h_j])

Here h_i and h_j are the feature vectors of node i and node j, W is a learnable weight matrix, α is the attention parameter vector, || denotes vector concatenation, and LeakyReLU is the activation function. The scores are then normalized over the neighborhood N_i of node i with a softmax to obtain the attention coefficients:

a_ij = exp(e_ij) / ∑(k∈N_i) exp(e_ik)

The coefficient a_ij expresses the importance of node j to node i.
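As an illustrative sketch (separate from the full implementation in Section 3), the raw score e_ij for a single pair of nodes can be computed like this, with small random stand-ins for W, α, h_i and h_j:

import torch
import torch.nn.functional as F

F_in, F_out = 3, 8                                 # hypothetical dimensions
W = torch.randn(F_in, F_out)                       # learnable weight matrix (random stand-in)
a = torch.randn(2 * F_out)                         # attention parameter vector α
h_i, h_j = torch.randn(F_in), torch.randn(F_in)    # features of nodes i and j

# e_ij = LeakyReLU(α^T [W h_i || W h_j])
e_ij = F.leaky_relu(a @ torch.cat([h_i @ W, h_j @ W]), negative_slope=0.2)

Normalizing these scores with a softmax over node i's neighbors then gives the coefficients a_ij used for the aggregation in 1.3.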

1.3 Attention Aggregation

After computing the attention coefficients, the GAT model uses them to take a weighted sum of the transformed feature vectors of the neighboring nodes. For node i, its aggregated feature representation is calculated as follows:

h_i' = σ(∑(j∈N_i) a_ij · (W h_j))

where N_i is the set of nodes adjacent to node i, ∑ denotes summation, W is the learnable weight matrix, and σ is an activation function (ELU in the code below). Through this weighted aggregation, the feature vector of node i fuses the information of its neighbors, and the attention coefficients determine how much each neighbor contributes to node i.
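A minimal vectorized sketch of the whole score, softmax and aggregation pipeline on a toy graph (all names and sizes here are illustrative, not the implementation from Section 3):

import torch
import torch.nn.functional as F

N, F_in, F_out = 4, 3, 8                    # hypothetical sizes
X = torch.randn(N, F_in)                    # node feature matrix
A = torch.eye(N)                            # adjacency matrix with self-loops
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.  # a few toy edges

W = torch.randn(F_in, F_out)                # weight matrix (random stand-in)
a = torch.randn(2 * F_out)                  # attention parameter vector α

H = X @ W                                                               # W h_j for all nodes, shape (N, F_out)
pairs = torch.cat([H.repeat_interleave(N, 0), H.repeat(N, 1)], dim=1)   # all pairs [W h_i || W h_j]
e = F.leaky_relu(pairs @ a, negative_slope=0.2).view(N, N)              # raw scores e_ij

e = e.masked_fill(A == 0, float('-inf'))    # keep only scores between neighbors
attn = F.softmax(e, dim=1)                  # attention coefficients a_ij (each row sums to 1)
H_prime = F.elu(attn @ H)                   # h_i' = σ(Σ_j a_ij · W h_j), shape (N, F_out)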

[Figure from the GAT paper] Left: the attention mechanism a(W h_i, W h_j) employed by the model, parametrized by a weight vector a and applying a LeakyReLU activation. Right: an illustration of multi-head attention (with K = 3 heads) by node 1 on its neighborhood; different arrow styles and colors denote independent attention computations, and the aggregated features from each head are concatenated or averaged to obtain h_1'.

1.4 Multi-head attention

To improve the expressiveness and stability of the model, GAT introduces a multi-head attention mechanism: K independent attention heads learn their own node weights in parallel, and their outputs are either concatenated (in hidden layers) or averaged (typically in the final layer) to obtain the final node representation.
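A small sketch of the two combination strategies, using random stand-ins for the K head outputs (names and sizes are illustrative):

import torch

K, N, F_out = 3, 4, 8                                     # hypothetical sizes (K heads)
head_outputs = [torch.randn(N, F_out) for _ in range(K)]  # stand-ins for the per-head results from 1.3

h_concat = torch.cat(head_outputs, dim=1)                 # hidden layers: concatenation, shape (N, K*F_out)
h_avg = torch.stack(head_outputs).mean(dim=0)             # final layer: averaging, shape (N, F_out)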

1.5 Output and Prediction

After aggregation through multi-head attention, the resulting node representations can be used for node classification or graph-level tasks. The representations are fed into a fully connected layer (or a final attention layer, as in the code below), which projects them to the required output dimension and produces the prediction for the task.

2. Using the attention mechanism to weight and aggregate a node's neighbors

GAT (Graph Attention Networks) uses the attention mechanism to weight and aggregate the neighbors of nodes as follows:

In a graph structure, each node is connected to some neighbor nodes. The GAT model measures the correlation and importance between a node and its neighbors by calculating the attention coefficient between a node and its neighbors. These attention coefficients determine how much each neighbor node contributes to the target node.

Specifically, for each target node i, the GAT model calculates a raw attention score between it and each neighbor node j:

e_ij = LeakyReLU(α^T [W h_i || W h_j])

where h_i and h_j are the feature vectors of target node i and neighbor node j, W is the learnable weight matrix, α is the attention parameter vector, and LeakyReLU is the activation function. [W h_i || W h_j] denotes concatenating the two transformed feature vectors as the input to the attention mechanism. Normalizing these scores over node i's neighbors with a softmax yields the attention coefficient a_ij for every connected pair of nodes i and j.

The attention coefficient a_ij reflects the correlation and importance between target node i and neighbor node j. It is obtained through learning and can adaptively adjust the importance of each neighbor node to the target node. A larger attention coefficient means that the neighbor node contributes more to the target node, and a smaller attention coefficient means that the contribution is smaller.

Then, the features of the neighbor nodes are weighted and aggregated according to the attention coefficients a_ij. For a target node i, the GAT model multiplies each neighbor's transformed feature vector by the corresponding attention coefficient and sums the weighted vectors over all neighbors. In this way, node i adaptively aggregates the information of its neighbors, focusing on those most relevant to the task.
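As a tiny worked example with made-up numbers: suppose node i has three neighbors with attention coefficients 0.5, 0.3 and 0.2; its new representation is the correspondingly weighted sum of their transformed features.

import torch

attn = torch.tensor([0.5, 0.3, 0.2])        # a_ij for the three neighbors (sums to 1)
Wh = torch.tensor([[1.0, 0.0],              # W h_j for each neighbor (hypothetical values)
                   [0.0, 2.0],
                   [4.0, 4.0]])

h_i_new = attn @ Wh                         # 0.5*[1,0] + 0.3*[0,2] + 0.2*[4,4] = [1.3, 1.4]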

By utilizing the attention mechanism to perform weighted aggregation of a node's neighbors, the GAT model can adaptively learn the importance of each neighbor node to the target node. This enables the model to better capture the relationship and structural information between nodes in the graph, and achieve better performance in tasks such as node classification and graph classification.

3. Building a graph attention network (GAT) model based on PyTorch

3.1 GraphAttentionLayer class

The GraphAttentionLayer class is a key component in building a graph attention network (GAT) model. It defines the operation of the graph attention layer, which is used to calculate the attention coefficient and perform weighted aggregation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_features, out_features, dropout=0.6, alpha=0.2):
        super(GraphAttentionLayer, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.dropout = dropout
        self.alpha = alpha

        self.W = nn.Parameter(torch.zeros(size=(in_features, out_features)))
        nn.init.xavier_uniform_(self.W.data, gain=1.414)
        self.a = nn.Parameter(torch.zeros(size=(2*out_features, 1)))
        nn.init.xavier_uniform_(self.a.data, gain=1.414)

        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, input, adj):
        h = torch.mm(input, self.W)  # linear transformation: node features times weight matrix, shape (N, out_features)
        N = h.size()[0]              # number of nodes

        # Build all pairs [W h_i || W h_j] and compute the raw attention scores e_ij
        a_input = torch.cat([h.repeat(1, N).view(N*N, -1), h.repeat(N, 1)], dim=1).view(N, -1, 2*self.out_features)
        e = self.leakyrelu(torch.matmul(a_input, self.a).squeeze(2))

        # Mask out non-neighbors using the adjacency matrix, then normalize with softmax
        zero_vec = -9e15 * torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)
        attention = F.softmax(attention, dim=1)
        attention = F.dropout(attention, self.dropout, training=self.training)

        # Weighted aggregation of neighbor node features
        h_prime = torch.matmul(attention, h)
        return F.elu(h_prime)

The main properties and methods of the GraphAttentionLayer class are as follows:

  • __init__(self, in_features, out_features, dropout=0.6, alpha=0.2): Constructor, initialize the parameters of the graph attention layer.

    • in_features: Dimensions of the input features.
    • out_features: Dimensions of the output features.
    • dropout: Dropout probability, used to prevent overfitting, the default is 0.6.
    • alpha: The negative slope parameter of the LeakyReLU activation function, the default is 0.2.
  • forward(self, input, adj): The forward propagation function, which performs the operation of the graph attention layer.

    • input: The input node feature vector, the shape is (N, in_features), where N is the number of nodes.
    • adj: Adjacency matrix, which represents the connection relationship between nodes, with a shape of (N, N).
    • Returns the node feature vector after weighted aggregation, the shape is (N, out_features).

In the forward method, GraphAttentionLayer does the following:

  1. Multiply the input node features by the weight matrix to obtain a linear transformation of the node features.
  2. Compute the raw attention score between each pair of nodes according to the attention mechanism, applying the LeakyReLU activation to introduce non-linearity.
  3. Mask the scores of non-adjacent nodes with the adjacency matrix and normalize the remaining scores with a softmax to obtain the attention coefficients.
  4. Use the attention coefficients to aggregate the neighbors' transformed features by weighted summation, giving the aggregated features of the target node.
  5. Apply a non-linear activation (ELU) and return the aggregated features as the output of the graph attention layer.

By defining multiple GraphAttentionLayer instances and combining them together, a graph attention network model (GAT) with a multi-head attention mechanism can be constructed to process graph data.
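For instance, a single layer defined above can be sanity-checked on random data (hypothetical sizes) to confirm the output shape:

N, in_features, out_features = 5, 16, 8
x = torch.randn(N, in_features)            # random node features
adj = torch.ones(N, N)                     # fully connected toy graph (every node is a neighbor)

layer = GraphAttentionLayer(in_features, out_features)
out = layer(x, adj)
print(out.shape)                           # torch.Size([5, 8])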

3.2 GAT model

The GAT class is one of the main components for building a Graph Attention Network (GAT) model. It defines the structure of the entire GAT model, including multiple attention heads and the final output layer.

class GAT(nn.Module):
    def __init__(self, in_features, hidden_features, out_features, dropout=0.6, alpha=0.2, num_heads=8):
        super(GAT, self).__init__()
        self.dropout = dropout

        self.attentions = [GraphAttentionLayer(in_features, hidden_features, dropout=dropout, alpha=alpha) for _ in range(num_heads)]
        for i, attention in enumerate(self.attentions):
            self.add_module('attention_{}'.format(i), attention)

        self.out_att = GraphAttentionLayer(hidden_features * num_heads, out_features, dropout=dropout, alpha=alpha)

    def forward(self, input, adj):
        x = F.dropout(input, self.dropout, training=self.training)
        # Run all attention heads in parallel and concatenate their outputs
        x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
        x = F.dropout(x, self.dropout, training=self.training)
        # Final attention layer maps the concatenated features to the number of classes
        x = F.elu(self.out_att(x, adj))
        return F.log_softmax(x, dim=1)

The main attributes and methods of the GAT class are as follows:

  • __init__(self, in_features, hidden_features, out_features, dropout=0.6, alpha=0.2, num_heads=8): Constructor, initialize the parameters of the GAT model.

    • in_features: Dimensions of the input features.
    • hidden_features: The dimension of the hidden feature, which is the output feature dimension of the attention head.
    • out_features: The number of output categories.
    • dropout: Dropout probability, used to prevent overfitting, the default is 0.6.
    • alpha: The negative slope parameter of the LeakyReLU activation function, the default is 0.2.
    • num_heads: Number of attention heads, default is 8.
  • forward(self, input, adj): The forward propagation function, which performs the operation of the GAT model.

    • input: The input node feature vector, the shape is (N, in_features), where N is the number of nodes.
    • adj: Adjacency matrix, which represents the connection relationship between nodes, with a shape of (N, N).
    • Returns the predicted log-probabilities of the classes for each node, with shape (N, out_features).

In the forward method, the GAT model does the following:

  1. Perform a Dropout operation on the input node features.
  2. Feature transformation and attention computation are performed through multiple attention heads respectively. Each attention head produces an output feature vector.
  3. Concatenate the output feature vectors of all attention heads together.
  4. Apply the Dropout operation again to the concatenated feature vectors.
  5. Perform a non-linear transformation through the output layer and produce the predicted log-probabilities of the classes with the LogSoftmax function.

By creating an instance of the GAT class and configuring parameters such as the input feature dimension, hidden feature dimension, and number of output classes for the application at hand, a graph attention network (GAT) model can be built and trained to process graph data and perform tasks such as node classification or graph classification.

[Figure] A t-SNE plot of the feature representations computed by the first hidden layer of a pre-trained GAT model on the Cora dataset.

3.3 Code usage

This code defines a GraphAttentionLayer class and a GAT class. The GraphAttentionLayer class implements the operation of the graph attention layer, which is used to calculate the attention coefficient and perform weighted aggregation. The GAT class defines the structure of the entire graph attention network model, including multiple attention heads and the final output layer.

To use this model, follow these steps:

1. Define the input feature dimension, hidden feature dimension, and number of output classes:

in_features = ...
hidden_features = ...
out_features = ...
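The hyperparameters dropout, alpha and num_heads used in the next step also need values. As an illustration only, for a Cora-like citation dataset one might use (hypothetical values, to be adapted to your data):

in_features = 1433      # input feature dimension (e.g. bag-of-words vocabulary size)
hidden_features = 8     # hidden feature dimension per attention head
out_features = 7        # number of output classes
dropout = 0.6
alpha = 0.2
num_heads = 8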

2. Create an instance of the GAT model:

model = GAT(in_features, hidden_features, out_features, dropout=dropout, alpha=alpha, num_heads=num_heads)

This code creates an instance of the GAT model and passes in the necessary parameters:

  • in_features: Dimensions of the input features.
  • hidden_features: The dimension of the hidden feature, which is the output feature dimension of the attention head.
  • out_features: The number of output categories.
  • dropout: Dropout probability, used to prevent overfitting.
  • alpha: The negative slope parameter of the LeakyReLU activation function.
  • num_heads: Number of attention heads.

In this way, you have created an instance of the GAT model, ready for training and inference. Remember to define a loss function and an optimizer for training, and to evaluate the model or run inference as needed.

After creating the GAT model instance, proceed with the following steps:

3. Loss function definition and optimizer selection:

Choose an appropriate loss function and optimizer to train the model.

criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

Here criterion defines the loss function; because the GAT model already outputs log-probabilities through log_softmax, the negative log-likelihood loss (nn.NLLLoss) is used (equivalently, the final log_softmax could be removed and nn.CrossEntropyLoss used instead). optimizer selects the Adam optimizer, to which the model parameters and the learning rate learning_rate are passed.

4. Training cycle:

Train the model using the training data.

model.train()

for epoch in range(num_epochs):
    optimizer.zero_grad()
    output = model(input, adj)
    loss = criterion(output, labels)
    loss.backward()
    optimizer.step()
    # other operations during training, such as computing accuracy

In the training loop, the model is first set to training mode, and then a number of training iterations are performed. In each iteration, the input data and adjacency matrix are passed to the model, and the output of the model is obtained. Then calculate the loss function value, perform backpropagation and parameter update. Other operations in the training process, such as calculating accuracy, can be added as needed.
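For example, training accuracy can be tracked inside the loop right after the loss is computed (a sketch that assumes labels holds the integer class index of every node):

preds = output.argmax(dim=1)                            # predicted class index per node
train_acc = (preds == labels).float().mean().item()     # fraction of correctly classified nodes
print(f'epoch {epoch}: loss {loss.item():.4f}, train acc {train_acc:.4f}')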

5. Model evaluation:

Evaluate the performance of the trained model using validation or test set data.

model.eval()

with torch.no_grad():
    output = model(input, adj)
    # evaluation operations, such as computing accuracy, precision, recall, etc.

Set the model to evaluation mode and use validation or test data for inference. Calculate the performance of the model on evaluation indicators, such as accuracy, precision, recall, etc., as needed.
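A minimal sketch of computing test accuracy from the evaluation output, assuming ground-truth labels in labels and a boolean mask test_mask that marks the test nodes (both hypothetical names):

preds = output.argmax(dim=1)
test_acc = (preds[test_mask] == labels[test_mask]).float().mean().item()
print(f'test accuracy: {test_acc:.4f}')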

6. Model inference:

Use the trained model to make predictions on new unlabeled data.

model.eval()

with torch.no_grad():
    output = model(input, adj)
    # inference: make predictions based on the model output

Again set the model to evaluation mode and run it on new, unlabeled data. Predictions are then made from the model's output.
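Because the model returns log-probabilities (via log_softmax), the predicted class of each node is simply the index of the largest output value; probabilities can be recovered with exp if needed:

log_probs = model(input, adj)             # shape (N, out_features), log-probabilities
pred_labels = log_probs.argmax(dim=1)     # predicted class index for each node
probs = log_probs.exp()                   # convert back to probabilities if required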
