0. Preface
Why learn PyTorch Geometric (hereinafter PyG)? Put simply, it is useful for my current projects. Compared with NYU's Deep Graph Library (DGL), I find DGL's API harder to use, so for now I see no need to migrate.
A graph convolution framework can do a lot: PyG provides many convenient datasets and implementations of various SOTA GNNs. What attracts me most is that its API is relatively friendly and that more people use PyG in their projects. Its ecosystem is also friendly to people like me who work with 3D meshes.
Note that this tutorial is based entirely on the latest official tutorial (as of 2020.04.14). On that basis, it builds a simplified version of GCN. Readers who are interested in the official GCN implementation can refer to [1].
Below, I follow the steps in [1] exactly. The difference is that I analyze a simplified GCN implementation based on the latest version of PyG (1.4.3), so that everyone can better understand how GCN is implemented. The topics are covered in the following order:
- ① Data handling of graph data
- ② Common benchmark datasets
- ③ Mini-batches
- ④ Data transforms
- ⑤ Learning methods on graphs
Additionally, the environment I'm using is:
- Ubuntu 18.04
- CUDA 10.0
- pytorch 1.4.0
conda install pytorch=1.4.0 cudatoolkit=10.0
- pytorch geometric 1.4.3
- torch-scatter
pip install torch-scatter==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
- torch-spline-conv
pip install torch-spline-conv==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
- torch-cluster
pip install torch-cluster==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
- torch-sparse
pip install torch-sparse==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
1. Data processing of graph structure
First, what is a graph? A graph is a set of vertices together with the edges that connect them. In PyG, a single graph is described by an instance of torch_geometric.data.Data [2], which has several important attributes that need explaining. In addition, if your graph needs extra attributes, you can simply extend the torch_geometric.data.Data class.
Figure 1.1 Common member variables of torch_geometric.data.Data
Generally speaking, for common tasks a Data object only needs a few attributes such as x, edge_index, edge_attr, and y, and all of them are optional. Conversely, the Data class is not limited to these attributes.
For example, it can be extended with data.face (torch.LongTensor, [3, num_faces]) to store the triangle connectivity of a 3D mesh.
Figure 1.2 Official description of torch_geometric.data.Data
Figure 1.3 Data example (3 nodes, 4 edges (bidirectional), each node has 2 features [-1, 2], [0, 3], [1, 1].)
Note that although the graph only has 2 edges, we still need to define 4 index tuples, one for each direction of each edge.
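As a minimal, library-free sketch of this bookkeeping: edge_index stores one column per direction, so 2 undirected edges become 4 columns. The helper name `to_bidirectional` and the concrete connectivity (0-1 and 1-2) are assumptions made for illustration; only the node features are taken from Figure 1.3.

```python
def to_bidirectional(undirected_edges):
    """Expand undirected edges (u, v) into COO format: one column per direction."""
    sources, targets = [], []
    for u, v in undirected_edges:
        sources += [u, v]
        targets += [v, u]
    return [sources, targets]

# Node features from Figure 1.3: 3 nodes, 2 features each.
x = [[-1, 2], [0, 3], [1, 1]]
# Assumed connectivity: node 0 - node 1 and node 1 - node 2.
edge_index = to_bidirectional([(0, 1), (1, 2)])

print(edge_index)          # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(len(edge_index[0]))  # 4
```

With PyG installed, these two nested lists could be turned into tensors and passed to Data as x and edge_index.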
The schematic diagram of the graph built in Figure 1.3 is as follows:
2. Common Benchmark data sets
Although the Bengio team recently released 6 benchmark datasets built on DGL, the same can be done in PyG without any problem, so there is no need to switch to DGL just for that.
PyTorch Geometric contains a large number of common datasets: all Planetoid datasets (Cora, Citeseer, Pubmed), the cleaned graph classification datasets from TU Dortmund, and a series of 3D point cloud and mesh datasets such as FAUST and ShapeNet.
PyG downloads these datasets automatically and processes them into the Data form described above. Take the ENZYMES dataset as an example (600 graphs in 6 classes):
Figure 2.1 Analysis of the ENZYMES data set
As Figure 2.1 shows, each sample is an instance of Data with three attributes: the vertex features x, the connectivity edge_index, and the class label y. Each sample in ENZYMES is therefore a whole graph.
Note: A dataset can be shuffled by using dataset=dataset.shuffle() .
In addition, the tutorial also explains Planetoid's Cora dataset (used for semi-supervised node classification). The data in Cora has three additional attributes: train_mask, test_mask, and val_mask. These three attributes mark the nodes used for training, testing, and validation.
The difference between Cora and ENZYMES is that each sample in Cora is a node of one single graph, whereas each sample in ENZYMES is an independent graph.
Figure 2.2 Cora data set description
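To make the role of the masks concrete, here is a small plain-Python sketch. The 6-node graph, its labels, and the helper `select` are hypothetical; the point is only that a mask is a per-node boolean list that picks out the nodes belonging to one split.

```python
def select(values, mask):
    """Keep the entries of `values` whose mask entry is True."""
    return [v for v, keep in zip(values, mask) if keep]

# Hypothetical toy graph with 6 nodes.
y          = [0, 1, 1, 2, 0, 2]   # class label of every node
train_mask = [True, True, False, False, False, False]
val_mask   = [False, False, True, True, False, False]
test_mask  = [False, False, False, False, True, True]

train_labels = select(y, train_mask)
test_labels = select(y, test_mask)
print(train_labels)  # [0, 1]
print(test_labels)   # [0, 2]
```

In PyG the same selection is written with tensor indexing, for example data.y[data.train_mask], as in the training code of section 5.2.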
3. Mini-Batches
Neural networks are usually trained in batches, and PyG parallelizes over a mini-batch by building a sparse block-diagonal adjacency matrix.
Figure 3.1 PyG mini-batch batch processing of graphs with different numbers of nodes and edges
The node features x and the labels y are concatenated along the node dimension. In this way, PyG can pack samples with different numbers of nodes and edges into a single batch.
Figure 3.2 ENZYMES data set loading instructions (not shuffled)
(Note that the DataLoader here is PyG's own, not PyTorch's. In addition, when use_node_attr=False, x has shape [num_nodes, 3]; when use_node_attr=True, x has shape [num_nodes, 21].)
Here, torch_geometric.data.Batch inherits from torch_geometric.data.Data , and has an additional attribute called batch, whose function is to indicate which graph (ENZYMES)/sample each node belongs to.
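The batching idea can be sketched without any library. The function `collate` and the dict layout below are hypothetical stand-ins, not PyG's actual collate code: node indices of each graph are shifted by the number of nodes seen so far (which is what makes the adjacency block-diagonal), and a batch vector records the graph each node came from.

```python
def collate(graphs):
    """Merge graphs into one big disconnected graph (block-diagonal adjacency).

    Each graph is a dict with 'num_nodes' and 'edge_index' ([sources, targets]).
    """
    sources, targets, batch = [], [], []
    offset = 0
    for graph_id, g in enumerate(graphs):
        src, dst = g["edge_index"]
        # Shift node ids so that they stay unique in the merged graph.
        sources += [s + offset for s in src]
        targets += [t + offset for t in dst]
        # Remember which graph every node came from.
        batch += [graph_id] * g["num_nodes"]
        offset += g["num_nodes"]
    return {"edge_index": [sources, targets], "batch": batch}

g0 = {"num_nodes": 2, "edge_index": [[0, 1], [1, 0]]}
g1 = {"num_nodes": 3, "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]]}
merged = collate([g0, g1])
print(merged["edge_index"])  # [[0, 1, 2, 3, 3, 4], [1, 0, 3, 2, 4, 3]]
print(merged["batch"])       # [0, 0, 1, 1, 1]
```

No edge connects nodes 0-1 with nodes 2-4, so message passing on the merged graph never mixes the two samples.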
In addition, torch_geometric.data.DataLoader is just PyTorch's DataLoader with an overridden collate function.
Parameters normally passed to PyTorch's DataLoader, such as pin_memory and num_workers, can also be passed to torch_geometric.data.DataLoader.
Of course, users can process the node features x in custom ways with torch-scatter [3], and use a custom Dataset and DataLoader to handle their own special forms of data [4].
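One typical use of scatter operations is to reduce node features per graph with the help of the batch vector. The following is a plain-Python stand-in that only illustrates the idea; it is not the torch-scatter implementation, and the toy values are made up.

```python
def scatter_mean(values, index, num_groups):
    """Average the rows of `values` that share the same entry in `index`."""
    dim = len(values[0])
    sums = [[0.0] * dim for _ in range(num_groups)]
    counts = [0] * num_groups
    for row, group in zip(values, index):
        counts[group] += 1
        for d in range(dim):
            sums[group][d] += row[d]
    # Guard against empty groups to avoid dividing by zero.
    return [[s / max(c, 1) for s in group_sums]
            for group_sums, c in zip(sums, counts)]

# 5 nodes with 2 features each; the first 2 nodes belong to graph 0, the rest to graph 1.
x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [20.0, 30.0], [30.0, 40.0]]
batch = [0, 0, 1, 1, 1]
pooled = scatter_mean(x, batch, num_groups=2)
print(pooled)  # [[2.0, 3.0], [20.0, 30.0]]
```

The result has one row per graph, which is the shape a graph-level classifier needs.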
4. Data Transforms
Similar to the use of torchvision in PyTorch, graph data also needs to be processed and transformed. PyG provides its own transforms and utilities. A transform takes a Data object as input and returns a transformed Data object.
Likewise, several transforms can be chained with torch_geometric.transforms.Compose.
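The chaining idea is simple enough to sketch in plain Python. The `Compose` class below is a hypothetical re-implementation for illustration, and the two transforms `center` and `scale_to_unit` are made up; each one maps a data object (here a dict with 2D positions) to a new data object.

```python
class Compose:
    """Chain several transforms; each one maps a data object to a data object."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
        return data

# Two hypothetical transforms working on a dict with a 'pos' entry (2D points).
def center(data):
    n = len(data["pos"])
    mean = [sum(p[d] for p in data["pos"]) / n for d in range(2)]
    return {**data, "pos": [[p[d] - mean[d] for d in range(2)] for p in data["pos"]]}

def scale_to_unit(data):
    largest = max(abs(c) for p in data["pos"] for c in p)
    return {**data, "pos": [[c / largest for c in p] for p in data["pos"]]}

transform = Compose([center, scale_to_unit])
out = transform({"pos": [[0.0, 0.0], [4.0, 0.0], [2.0, 6.0]]})
print(out["pos"])  # [[-0.5, -0.5], [0.5, -0.5], [0.0, 1.0]]
```

The order matters: the points are centered first and scaled afterwards, exactly in the order given to Compose.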
The example the author gives is the Airplane category of the ShapeNet dataset (containing 17,000 3D shape point clouds and per-point labels from 16 shape categories). With pre_transform = T.KNNGraph(k=6), the author turns the point cloud data into a graph dataset.
Figure 4.1 ShapeNet data set processing (turning point cloud data into graph data)
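What a k-nearest-neighbour transform does to a point cloud can be sketched with a brute-force search. The function `knn_graph` below is a hypothetical plain-Python illustration, not PyG's KNNGraph; the edge direction (neighbour to point) and the four sample points are assumptions.

```python
import math

def knn_graph(points, k):
    """Connect every point to its k nearest neighbours (brute force, O(n^2)).

    Returns edge_index as [sources, targets], with edges neighbour -> point.
    """
    sources, targets = [], []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to p.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            sources.append(j)
            targets.append(i)
    return [sources, targets]

points = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]]
edge_index = knn_graph(points, k=1)
print(edge_index)  # [[1, 0, 1, 2], [0, 1, 2, 3]]
```

Every point receives exactly k incoming edges, so a cloud of n points yields n * k edges.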
If you have other needs, check torch_geometric.transforms for a transform that fits your purpose. If there is none, write your own.
5. Learning Methods on Graphs
After completing the first 4 steps, let us build our first GNN. Here we use the most basic GCN layer to reproduce the experiment on the Cora citation dataset. To understand GCN, we need to start from the Fourier transform: by analogy with the mapping from the time domain to the frequency domain, and via the Helmholtz equation, the vertex domain is mapped to the spectral domain for analysis. Convolution in the vertex domain then becomes a pointwise product in the spectral domain, which reduces the amount of computation.
In addition, the transform involves the Laplacian matrix $L$ (the divergence at each vertex: it can be understood as the net gain of information at the vertex, with outgoing flow counted as positive and incoming flow as negative). Because of the properties of $L$ (positive semi-definite, all eigenvalues greater than or equal to 0, etc.), and writing its eigenvalues as $\lambda$ and its eigenvectors as $U$, we get the following analogy with the classical spectrum:
- $U$ is analogous to the basis functions of the Fourier transform;
- $\lambda$ is analogous to the frequency $\omega$.
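The positive semi-definiteness mentioned above can be checked numerically on a tiny graph. This sketch assumes the common combinatorial definition $L = D - A$ for an unweighted, undirected graph (the text does not spell out which Laplacian it means); the path graph and the vector x are made-up examples.

```python
def laplacian(num_nodes, undirected_edges):
    """Return L = D - A for an unweighted, undirected graph as nested lists."""
    L = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in undirected_edges:
        L[u][v] -= 1
        L[v][u] -= 1
        L[u][u] += 1
        L[v][v] += 1
    return L

def quadratic_form(L, x):
    """Compute x^T L x."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

edges = [(0, 1), (1, 2)]          # assumed path graph 0 - 1 - 2
L = laplacian(3, edges)
print(L)  # [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]

x = [3, -1, 2]
# x^T L x equals the sum of squared differences over the edges, so it is never negative.
print(quadratic_form(L, x))                       # 25
print(sum((x[u] - x[v]) ** 2 for u, v in edges))  # 25
```

Since $x^T L x$ is a sum of squares for every x, all eigenvalues of this $L$ are non-negative, which is why they can play the role of frequencies.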
GCN [5] builds on this and is obtained through two steps of optimization. It takes into account both self-loops and k-localization (locality), and also renormalizes by the degree so that the Matthew effect does not become too pronounced, which would make the model unstable and prone to falling into local minima.
That is all I will say about the theory. If you are interested in the derivation and origins of GCN, you can study [6-7] (first get a general idea of the Laplacian matrix and the transform in graph theory, then watch the GNN tutorial on YouTube by Jiang Chenghan of National Taiwan University). Now let's look at the code.
5.1 Implementation of GCN in PyG
PyG provides the base class torch_geometric.nn.MessagePassing. By inheriting from it, we can implement all kinds of GNNs based on message passing. With MessagePassing, users no longer need to care about the internal details of message propagation. MessagePassing revolves around three member functions: MESSAGE, AGGREGATION, and UPDATE.
When users implement their own GNN, they generally only override the two member functions MESSAGE and UPDATE; AGGREGATION and propagate come with MessagePassing. (The official GCN does the same.)
Our goal is to implement a simplified version of GCN that behaves consistently with the official one, and through it to learn how to define a graph convolution in PyG.
- First, we define a graph, data (directed graph, 4 nodes, 3 edges; each node has a 1-dimensional feature whose value is 1):
import torch
from torch_geometric.data import Data
edge_index = torch.tensor([[1, 2, 3], [0, 0, 0]], dtype=torch.long)
x = torch.tensor([[1], [1], [1], [1]], dtype=torch.float)
data = Data(edge_index=edge_index, x=x)
print(edge_index)
print(data)
- The MessagePassing mechanism
From the figure above [9], it is easy to see which block of a message-passing GCN corresponds to what:
message = $\phi$; aggregation = $\square$; update = $\gamma$. Substituting these into the message passing formula above gives the following new form:
This figure shows how the data flows in this example. Note that GCN's default scatter method is add. (As for why, see the figure below: with mean or max, as the number of GNN layers grows, the features of different nodes within each subgraph become less and less distinguishable, which does not meet our goals.)
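A tiny plain-Python experiment illustrates why add is the more discriminative aggregator. The helper `aggregate` and the two neighbourhoods are hypothetical; they only show one case in which mean and max cannot tell two nodes apart while add can.

```python
def aggregate(messages, how):
    """Reduce the messages arriving at one node with the given aggregator."""
    if how == "add":
        return sum(messages)
    if how == "mean":
        return sum(messages) / len(messages)
    if how == "max":
        return max(messages)
    raise ValueError(f"unknown aggregator: {how}")

# Two hypothetical nodes whose neighbours all carry the feature 1.0,
# but node A has 2 neighbours and node B has 4.
node_a = [1.0, 1.0]
node_b = [1.0, 1.0, 1.0, 1.0]

results = {how: (aggregate(node_a, how), aggregate(node_b, how))
           for how in ("add", "mean", "max")}
for how, (a, b) in results.items():
    print(f"{how:>4}: A={a}, B={b}, distinguishable={a != b}")
```

Only add yields different values for A and B here (2.0 versus 4.0); mean and max both return 1.0 for the two nodes.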
- The GCN implementation can likewise be divided into 5 steps:
- Step 1: Add self-loops to the adjacency matrix. $\hat A = I_N + A$ (implemented by modifying edge_index, see add_remaining_self_loops [10]).
Original edge_index:
edge_index after adding self-loops:
- Step 2: Linearly transform the node feature matrix. x = $\Theta W$ (torch.matmul(x, self.weight) in the code).
Original input x:
x after the weight transform:
- Step 3: Normalize node features. norm = $\hat D^{-0.5} \hat A \hat D^{-0.5}$ (in the source code, $\hat A$ holds the edge weights, which are 1 by default).
norm has the same length as edge_index:
- Step 4: Sum up neighboring node features. $\sum_{i \in N(p)} (\hat D^{-0.5} \hat A \hat D^{-0.5} \Theta W)$
(Step 4 is implemented inside MessagePassing, i.e. the scatter_add/sum/mean in the figure above, so the user does not need to worry about it.) What is x_j in def message(self, x_j, norm)? It is the x from step 2, expanded according to the edge_index with self-loops.
The value of x_j:
The output of message(self, x_j, norm):
- Step 5: Return new node embeddings. Return the result $X_{new}$.
Because the aggregation is scatter_add, the messages along the connections [1, 0], [2, 0], [3, 0] and the self-loop [0, 0] are summed to give the final output:
Obviously,
$-0.0330 = -0.0075 - 0.0075 - 0.0075 - 0.0106$
$2.3680 = 0.5364 + 0.5364 + 0.5364 + 0.7586$
Similarly, if the aggregation is changed to scatter_max, the result is as follows, because $-0.0075 = \max(-0.0075, -0.0106)$ and $0.7586 = \max(0.5364, 0.7586)$:
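The numbers above can be reproduced by hand. The following plain-Python sketch walks through steps 3 to 5 for node 0 of the toy graph. It is an illustration, not the PyG code path, and it assumes a single output channel whose weight is 0.7586 (the value quoted above, rounded to 4 digits, so the sum matches 2.3680 only up to rounding).

```python
import math

# edge_index after adding self-loops (row = source, col = target).
row = [1, 2, 3, 0, 1, 2, 3]
col = [0, 0, 0, 0, 1, 2, 3]
num_nodes = 4

# Step 3: degree per node (counted over `row`, as in the code of this section).
deg = [0.0] * num_nodes
for r in row:
    deg[r] += 1.0
deg_inv_sqrt = [d ** -0.5 if d > 0 else 0.0 for d in deg]
norm = [deg_inv_sqrt[r] * deg_inv_sqrt[c] for r, c in zip(row, col)]

# Step 2 result: every node has feature 1, so x * weight is the weight itself.
# 0.7586 is one of the weight values quoted in the text above (assumed here).
xw = [0.7586] * num_nodes

# Step 4: message = norm * x_j, where x_j is the feature of the source node.
messages = [n * xw[r] for n, r in zip(norm, row)]

# Step 5: aggregate the messages at their target node.
out_add = [0.0] * num_nodes
out_max = [-math.inf] * num_nodes
for m, c in zip(messages, col):
    out_add[c] += m
    out_max[c] = max(out_max[c], m)

print([round(d) for d in deg])      # [1, 2, 2, 2]
print([round(n, 4) for n in norm])  # [0.7071, 0.7071, 0.7071, 1.0, 0.5, 0.5, 0.5]
print(round(out_add[0], 3))         # 2.368
print(round(out_max[0], 4))         # 0.7586
```

Node 0 receives three messages of about 0.5364 from nodes 1 to 3 plus its own self-loop message 0.7586, which is exactly the decomposition shown above.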
All five steps are implemented by the following code:
import math

import torch
from torch_scatter import scatter_add
from torch_geometric.nn import MessagePassing


def glorot(tensor):
    # Glorot (Xavier) uniform initialization.
    if tensor is not None:
        stdv = math.sqrt(6.0 / (tensor.size(-2) + tensor.size(-1)))
        tensor.data.uniform_(-stdv, stdv)


def zeros(tensor):
    if tensor is not None:
        tensor.data.fill_(0)


def add_self_loops(edge_index, num_nodes=None):
    print("entering add_self_loops")
    loop_index = torch.arange(0, num_nodes, dtype=torch.long,
                              device=edge_index.device)
    print(loop_index)
    loop_index = loop_index.unsqueeze(0).repeat(2, 1)
    print(loop_index)
    edge_index = torch.cat([edge_index, loop_index], dim=1)
    print(edge_index)
    print("leaving add_self_loops")
    # The original edge_index is [[1, 2, 3],
    #                             [0, 0, 0]]
    # This appends the self-loop connections to the original edge_index.
    # torch.cat([edge_index, loop_index], dim=1)
    # tensor([[1, 2, 3, 0, 1, 2, 3],
    #         [0, 0, 0, 0, 1, 2, 3]])
    return edge_index


def degree(index, num_nodes=None, dtype=None):
    # Count how often every node id occurs in `index`.
    out = torch.zeros((num_nodes,), dtype=dtype, device=index.device)
    # scatter_add_ works in place, so it is called only once;
    # calling it again for printing would double the counts.
    out.scatter_add_(0, index, out.new_ones((index.size(0),)))
    print(out)
    return out
class GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels, bias=True):
        super(GCNConv, self).__init__(aggr='add')  # "Add" aggregation.
        # super(GCNConv, self).__init__(aggr='max')  # "Max" aggregation.
        self.weight = torch.nn.Parameter(torch.Tensor(in_channels, out_channels))
        if bias:
            self.bias = torch.nn.Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        glorot(self.weight)
        zeros(self.bias)

    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]

        # Step 1: add self-loops to the adjacency matrix (by appending edges
        # that point to themselves, e.g. [1, 1], [2, 2], to edge_index).
        # Original edge_index = tensor([[1, 2, 3],
        #                               [0, 0, 0]])
        # With self-loops     = tensor([[1, 2, 3, 0, 1, 2, 3],
        #                               [0, 0, 0, 0, 1, 2, 3]])
        edge_index = add_self_loops(edge_index, x.size(0))

        # Step 2: apply the weight transform to the input node feature matrix.
        x = torch.matmul(x, self.weight)

        # Steps 3-5: start message passing.
        edge_weight = torch.ones((edge_index.size(1),),
                                 dtype=x.dtype,
                                 device=edge_index.device)
        row, col = edge_index
        print("row", row)  # row tensor([1, 2, 3, 0, 1, 2, 3])
        print("col", col)  # col tensor([0, 0, 0, 0, 1, 2, 3])
        deg = scatter_add(edge_weight, row, dim=0, dim_size=x.size(0))
        print("deg", deg)
        # deg is [1, 2, 2, 2]. Why?
        # Because
        # row = [1, 2, 3, 0, 1, 2, 3]
        # edge_weight = [1, 1, 1, 1, 1, 1, 1]
        # so on the main diagonal node 0 gets 1, node 1 gets 2, and so on,
        # which gives the degree matrix. Only the diagonal entries are
        # returned here, which avoids a sparse matrix multiplication.
        deg_inv_sqrt = deg.pow(-0.5)
        deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
        # When no edge weights are given (all ones, as here),
        # deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col] == deg_inv_sqrt[row] * deg_inv_sqrt[col]
        norm = deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col]
        print(norm)
        # norm = tensor([0.7071, 0.7071, 0.7071, 1.0000, 0.5000, 0.5000, 0.5000])
        return self.propagate(edge_index, x=x, norm=norm)

    def message(self, x_j, norm):
        # x_j has shape [E, out_channels]
        # norm: the normalized edge weights.
        return norm.view(-1, 1) * x_j if norm is not None else x_j

    def update(self, aggr_out):
        # aggr_out has shape [N, out_channels]
        # Step 5: return the new node embeddings.
        if self.bias is not None:
            return aggr_out + self.bias
        else:
            return aggr_out
Running the experiment gives the same result as the official implementation:
5.2 Training on the Cora Citation dataset
Here we use the official GCNConv to build a 2-layer GNN and train it on the Cora citation dataset. If your environment is set up correctly, you can copy the following code to your local machine and run it directly.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid

# 5.1) Load the Cora dataset (it is downloaded automatically).
dataset = Planetoid(root='/home/pyG/Cora', name='Cora')


# 5.2) Define the 2-layer GCN network.
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)


# 5.3) Train & test.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    # Only the training nodes contribute to the loss.
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

model.eval()
_, pred = model(data).max(dim=1)
correct = float(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
acc = correct / data.test_mask.sum().item()
print('Accuracy: {:.4f}'.format(acc))
# >>> Accuracy: 0.8150
At this point, a complete GCN-based GNN is ready. For the training data processing and many other details, you will need to dig into the source code. Happy learning!
References
[1] PyG official Tutorial
[2] torch_geometric.data.Data
[3] torch-scatter
[4] advanced mini-batching of PyG
[5] GCN: Semi-supervised Classification with Graph Convolutional Networks
[6] [Actually very simple] Laplacian operator and Laplacian matrix
[7] Introduction to GNN Jiang Chenghan of National Taiwan University
[8] Torch geometric GCNConv source code analysis
[9] MessagePassing
[10] add_remaining_self_loops