[PyG] 1. How to run a basic training process with GCN (including a GCN implementation)

0. Preface

Why learn PyTorch Geometric (hereinafter PyG)? Simply put, it is useful for my current projects. Compared with NYU's Deep Graph Library (DGL), I find DGL's API harder to work with, and there is currently no need to migrate.

A graph convolution framework can do a lot: PyG ships many convenient datasets and plenty of SOTA GNN implementations. What attracts me most is that its API is quite friendly, many projects already use PyG, and the ecosystem is friendly to people like me who work with 3D meshes.

Note that this tutorial follows the latest official tutorial (as of 2020.04.14) and adds a simplified implementation of GCN on top of it. Readers interested in the official GCN implementation can check [1].

Below, I will follow the steps in [1] exactly. The difference is that I will also analyze a simplified implementation of GCN based on the latest version of PyG (1.4.3), so that the implementation principle of GCN becomes clearer. The explanation proceeds in the following order:

  • ①Data Handling of graph data

  • ②Common Benchmark Datasets

  • ③Mini-batches

  • ④Data Transforms

  • ⑤Learning Methods on Graphs

Additionally, the environment I'm using is:

  • Ubuntu 18.04
  • CUDA 10.0
  • pytorch 1.4.0 conda install pytorch=1.4.0 cudatoolkit=10.0
  • pytorch geometric 1.4.3
  • torch-scatter pip install torch-scatter==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
  • torch-spline-conv pip install torch-spline-conv==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
  • torch-cluster pip install torch-cluster==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
  • torch-sparse pip install torch-sparse==latest+cu100 -f https://pytorch-geometric.com/whl/torch-1.4.0.html

1. Data processing of graph structure

First of all, what is a graph? A graph is a set of vertices and edges. In PyG, a simple graph is described by an instance of torch_geometric.data.Data [2], which has several important attributes explained below. In addition, if your graph needs extra attributes, you can simply extend the torch_geometric.data.Data class.

Figure 1.1 Common member variables of torch_geometric.data.Data

Generally speaking, for common tasks the Data object only needs a few attributes such as x, edge_index, edge_attr and y, and all of them are optional. In other words, the Data class is not limited to these attributes.

For example, data.face (torch.LongTensor of shape [3, num_faces]) can be added to store the triangle connectivity of a 3D mesh.
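For instance, a minimal sketch of storing a single triangle this way (pos and face are standard PyG attribute names; the coordinates here are made up for illustration):

import torch
from torch_geometric.data import Data

pos = torch.tensor([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])                       # 3 vertex positions
face = torch.tensor([[0], [1], [2]], dtype=torch.long)      # [3, num_faces], one triangle
mesh = Data(pos=pos, face=face)
print(mesh)  # Data(face=[3, 1], pos=[3, 3])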

Figure 1.2 Official description of torch_geometric.data.Data

Figure 1.3 Data example (3 nodes, 2 undirected edges stored as 4 directed index pairs; each node has 2 features: [-1, 2], [0, 3], [1, 1].)

It should be noted that although the graph only has 2 undirected edges, we still need 4 index pairs in edge_index to encode both directions of each edge.
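The graph in Figure 1.3 can be built roughly as follows (a sketch assuming the chain connectivity 0–1–2 used in the official tutorial, with the 2-dimensional features from the caption):

import torch
from torch_geometric.data import Data

# 4 directed index pairs encode the 2 undirected edges 0-1 and 1-2
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1, 2], [0, 3], [1, 1]], dtype=torch.float)  # 2 features per node

data = Data(x=x, edge_index=edge_index)
print(data)  # Data(edge_index=[2, 4], x=[3, 2])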

2. Common Benchmark Datasets

Although the Bengio team recently released 6 benchmark datasets based on DGL, the same tasks can be done in PyG, so there is no need to switch to DGL just for that.

PyTorch Geometric contains a large number of standard datasets: all Planetoid datasets (Cora, Citeseer, Pubmed), the cleaned graph classification datasets from TU Dortmund, and a series of 3D point cloud and mesh datasets such as FAUST and ShapeNet.

PyG downloads these datasets automatically and processes them into the Data form described above. Take the ENZYMES dataset as an example (600 graphs in 6 classes):
Figure 2.1 Analysis of the ENZYMES data set

As can be seen from Figure 2.1, each sample is a Data instance with three attributes: the vertex features x, the connectivity edge_index, and the class label y. Each sample of ENZYMES is therefore a complete graph.

Note: A dataset can be shuffled by using dataset=dataset.shuffle() .
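A minimal sketch of loading ENZYMES (the root path is arbitrary; the dataset is downloaded on first use, and the sample repr is indicative):

from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES')
print(len(dataset))          # 600 graphs
print(dataset.num_classes)   # 6
print(dataset[0])            # e.g. Data(edge_index=[2, 168], x=[37, 3], y=[1])

dataset = dataset.shuffle()  # shuffle the order of the graphs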

In addition, the tutorial also covers Planetoid's Cora dataset (used for semi-supervised graph node classification). The Data object in Cora has three extra attributes, train_mask, test_mask and val_mask, which indicate the nodes used for training, testing and validation.

The difference between Cora and ENZYMES is that Cora consists of a single graph whose individual nodes are the samples, whereas in ENZYMES each sample is an independent graph.
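A sketch of loading Cora and inspecting the masks (the root path is arbitrary; the mask sizes are the ones reported in the official tutorial):

from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/tmp/Cora', name='Cora')
data = dataset[0]                      # the single citation graph
print(data.num_nodes)                  # 2708
print(data.train_mask.sum().item())    # 140 nodes used for training
print(data.val_mask.sum().item())      # 500 nodes used for validation
print(data.test_mask.sum().item())     # 1000 nodes used for testing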

Figure 2.2 Cora data set description

3. Mini-Batches

We know that neural networks are usually trained in batches; PyG parallelizes over mini-batches by building sparse block-diagonal adjacency matrices.

Figure 3.1 PyG mini-batch batch processing of graphs with different numbers of nodes and edges

The node features x and labels y are concatenated along the node dimension. In this way, PyG can pack samples with different numbers of nodes and edges into one batch.
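Concretely (as in the official documentation), a mini-batch of n graphs is treated as one large graph with a block-diagonal adjacency matrix and the feature/label matrices stacked along the node dimension:

$\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 & & \\ & \ddots & \\ & & \mathbf{A}_n \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_n \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} \mathbf{Y}_1 \\ \vdots \\ \mathbf{Y}_n \end{bmatrix}$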
Figure 3.2 ENZYMES data set loading instructions (not shuffled)

(Note that the DataLoader here is PyG’s own, not pytorch’s. In addition, when use_node_attr=False, x is [nodes_num, 3]; when use_node_attr=True, x is [nodes_num, 21])

Here, torch_geometric.data.Batch inherits from torch_geometric.data.Data , and has an additional attribute called batch, whose function is to indicate which graph (ENZYMES)/sample each node belongs to.

In addition, torch_geometric.data.DataLoader is just a version of pytorch's Dataloader that rewrites the collate function.

Arguments normally passed to PyTorch's DataLoader, such as pin_memory, num_workers etc., can also be passed to torch_geometric.data.DataLoader.

Of course, users can customize how the node features x are aggregated by using torch-scatter [3], and use custom Dataset/DataLoader classes to handle their own special data formats [4].
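A sketch of the batch attribute in action, following the official tutorial (the root path is arbitrary; scatter_mean averages the node features of each graph, giving one row per graph in the mini-batch):

from torch_geometric.datasets import TUDataset
from torch_geometric.data import DataLoader
from torch_scatter import scatter_mean

dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES', use_node_attr=True)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in loader:
    # batch.batch assigns every node to the index of its graph within the mini-batch
    x_mean = scatter_mean(batch.x, batch.batch, dim=0)
    print(batch.num_graphs, x_mean.size())   # e.g. 32, torch.Size([32, 21])
    break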

4. Data Transforms

Similar to the use of torchvision in PyTorch, graph data often needs to be processed and transformed. PyG provides its own transform toolkit: a transform takes a Data object as input and returns a transformed Data object.

Similarly, several transforms can be chained together with torch_geometric.transforms.Compose.

The example in the tutorial is the Airplane category of the ShapeNet dataset (containing 17,000 3D shape point clouds with per-point labels from 16 shape categories). With pre_transform = T.KNNGraph(k=6), the author turns the point cloud data into graph data.

Figure 4.1 ShapeNet data set processing (turning point cloud data into graph data)
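A sketch of that ShapeNet processing, following the official tutorial (the root path is arbitrary; pre_transform runs once before the processed data is saved to disk, while transform runs every time a sample is accessed):

import torch_geometric.transforms as T
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'],
                   pre_transform=T.KNNGraph(k=6),       # build a 6-nearest-neighbour graph over the points
                   transform=T.RandomTranslate(0.01))   # jitter each point slightly on the fly
print(dataset[0])  # the point cloud sample now also carries an edge_index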

If you have other needs, you can go to torch_geometric.transforms to check whether there is a transform that meets your purpose. If not, write it yourself~

5. Learning Methods on Graphs

After completing the first 4 steps, let us now build our first GNN. Here we use the most basic GCN layer to reproduce the experiment on the Cora citation dataset. To understand GCN, we have to start from the Fourier transform: by analogy with the time domain --> frequency domain transformation, and via the Helmholtz equation, the analysis is moved from the vertex domain to the spectral domain. In this way, convolution in the vertex domain becomes a point-wise product in the spectral domain, which saves computation.

The transformation also involves the meaning of the graph Laplacian L (the divergence at each vertex, which can be understood as the net gain of information at each vertex: outgoing is positive, incoming is negative). Because of the properties of L (positive semi-definite, eigenvalues greater than or equal to 0, etc.), denote its eigenvalues by λ and its eigenvectors by U. By comparison with the spectral picture (a brief sketch follows the list below):

  • U can be viewed as the basis of the Fourier transform on the graph;
  • λ is analogous to the frequency ω.
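A brief sketch of this spectral view (standard graph signal processing, nothing PyG-specific), using the combinatorial Laplacian and its eigendecomposition:

$L = D - A = U \Lambda U^{\top}, \qquad \hat{x} = U^{\top} x \ \ (\text{graph Fourier transform}), \qquad g_{\theta} \star x = U\, g_{\theta}(\Lambda)\, U^{\top} x \ \ (\text{spectral filtering}).$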

GCN [5] builds on this and is obtained through two simplification steps: it accounts for both self-loops and k-localization (locality), and it renormalizes by the degree to keep the Matthew effect from becoming too pronounced, which would make the model unstable and prone to falling into local minima.
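For reference, the layer-wise propagation rule from the GCN paper [5], where $\tilde{A} = A + I_N$ adds the self-loops and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the renormalized degree:

$H^{(l+1)} = \sigma\left( \tilde{D}^{-\frac{1}{2}} \tilde{A}\, \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right)$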

Okay, I'll stop here. If you are interested in the derivation of GCN, read [6][7] (first get a general sense of the Laplacian matrix and transforms in graph theory, then watch the GNN lectures by Jiang Chenghan of National Taiwan University on YouTube). Now let's look at the code.

5.1 Implementation of GCN in PyG

PyG provides the base class torch_geometric.nn.MessagePassing; by inheriting from it we can implement all kinds of message-passing GNNs. With MessagePassing, users no longer need to worry about the internal details of message propagation and only have to deal with its three building blocks: MESSAGE, AGGREGATE and UPDATE.

When implementing their own GNN, users generally only override the message and update member functions, while aggregation and propagate are provided by MessagePassing. (The official GCN does exactly this.)

Our goal is: to implement a simplified version of GCN that is consistent with the official one, and to master how to define graph convolution in PyG by implementing it.

  • First, we define a graph data object data (a directed graph with 4 nodes and 3 edges; each node has a 1-dimensional feature with value 1):
import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[1, 2, 3], [0, 0, 0]], dtype=torch.long)
x = torch.tensor([[1], [1], [1], [1]], dtype=torch.float)

data = Data(edge_index=edge_index, x=x)
print(edge_index)
print(data)

The printed output is:
tensor([[1, 2, 3],
        [0, 0, 0]])
Data(edge_index=[2, 3], x=[4, 1])

  • MessagePassing message passing mechanism

The figure from the MessagePassing documentation [9] makes it easy to see what each building block of a message-passing GCN corresponds to: message = $\phi$, aggregation = $\square$, update = $\gamma$. Substituting these into the general message-passing formula gives the GCN form sketched below.
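Written out (as in the official MessagePassing documentation [9]), the general scheme and its GCN instance are:

$x_i^{(k)} = \gamma^{(k)}\left( x_i^{(k-1)},\ \square_{j \in \mathcal{N}(i)}\ \phi^{(k)}\left( x_i^{(k-1)}, x_j^{(k-1)}, e_{i,j} \right) \right)$

$x_i^{(k)} = \sum_{j \in \mathcal{N}(i) \cup \{ i \}} \frac{1}{\sqrt{\deg(i)} \cdot \sqrt{\deg(j)}} \cdot \left( \Theta \cdot x_j^{(k-1)} \right)$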
In this example, note that GCN's default scatter method is add. (Why not mean or max? Because with mean or max, as the number of GNN layers increases, the node features within each subgraph become less and less distinguishable, which is not what we want.)

  • The implementation of GCN can also be divided into 5 steps:
  1. Add self-loops to the adjacency matrix: $\hat A = I_{N} + A$ (implemented by modifying edge_index; the official version uses add_remaining_self_loops [10]).

The original edge_index is tensor([[1, 2, 3], [0, 0, 0]]); after adding the self-loops it becomes tensor([[1, 2, 3, 0, 1, 2, 3], [0, 0, 0, 0, 1, 2, 3]]).

  2. Linearly transform the node feature matrix: $X' = X\Theta$ (in the code: torch.matmul(x, self.weight)).
    The original input x is tensor([[1.], [1.], [1.], [1.]]); after the weight transform, each row becomes an out_channels-dimensional vector whose exact values depend on the random initialization.

  3. Normalize node features: norm $= \hat D^{-1/2}\hat A \hat D^{-1/2}$ (in the source code $\hat A$ is represented by the edge weights, which default to 1).

norm has the same length as edge_index: tensor([0.7071, 0.7071, 0.7071, 1.0000, 0.5000, 0.5000, 0.5000]).

  4. Sum up the neighboring node features: $\sum_{i \in \mathcal{N}(p)} \hat D^{-1/2}\hat A \hat D^{-1/2} X\Theta$.
    (Step 4 is implemented inside MessagePassing, i.e. the scatter_add/mean/max mentioned above; the user does not need to worry about it.) In message(self, x_j, norm), x_j is the x from Step 2 expanded along the self-loop-augmented edge_index: it holds the transformed feature of the source node of every edge (7 rows in this example), and message() multiplies each row by the corresponding entry of norm.

  5. Return the new node embeddings $X_{new}$.

Because the aggregation is scatter_add, the messages along the connections [1, 0], [2, 0], [3, 0], together with the self-loop [0, 0], are summed to get the final output:

Obviously, $-0.0330 = -0.0075 - 0.0075 - 0.0075 - 0.0106$ and $2.3680 = 0.5364 + 0.5364 + 0.5364 + 0.7586$.
Similarly, if the aggregation is changed to scatter_max, the result follows because $-0.0075 = \max(-0.0075, -0.0106)$ and $0.7586 = \max(0.5364, 0.7586)$.

These five steps are implemented completely by the following code:

import torch
from torch_scatter import scatter_add
from torch_geometric.nn import MessagePassing
import math

def glorot(tensor):
    if tensor is not None:
        stdv = math.sqrt(6.0 / (tensor.size(-2) + tensor.size(-1)))
        tensor.data.uniform_(-stdv, stdv)


def zeros(tensor):
    if tensor is not None:
        tensor.data.fill_(0)

        
def add_self_loops(edge_index, num_nodes=None):
    print("entering add_self_loops")
    loop_index = torch.arange(0, num_nodes, dtype=torch.long,
                              device=edge_index.device)
    print(loop_index)
    loop_index = loop_index.unsqueeze(0).repeat(2, 1)
    print(loop_index)

    edge_index = torch.cat([edge_index, loop_index], dim=1)
    print(edge_index)
    print("leaving add_self_loops")
    # The original edge_index is [[1, 2, 3],
    #                             [0, 0, 0]]
    # Appending the self-loop edges to the original connectivity gives:
    # torch.cat([edge_index, loop_index], dim=1)
    #     tensor([[1, 2, 3, 0, 1, 2, 3],
    #             [0, 0, 0, 0, 1, 2, 3]])

    return edge_index


def degree(index, num_nodes=None, dtype=None):
    # scatter_add_ works in place, so it must only be called once,
    # otherwise every degree would be counted twice.
    out = torch.zeros((num_nodes), dtype=dtype, device=index.device)
    out.scatter_add_(0, index, out.new_ones((index.size(0))))
    print(out)
    return out
        

class GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels, bias=True):
    
        super(GCNConv, self).__init__(aggr='add')  # "Add" aggregation.
        # super(GCNConv, self).__init__(aggr='max')  # "Max" aggregation.
        
        self.weight = torch.nn.Parameter(torch.Tensor(in_channels, out_channels))

        if bias:
            self.bias = torch.nn.Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        
        self.reset_parameters()
        
    def reset_parameters(self):
        glorot(self.weight)
        zeros(self.bias)

    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]
        
        # Step 1: add self-loops to the adjacency matrix
        #         (by appending the self-pointing edges, e.g. [1, 1], [2, 2], to edge_index).
        # original edge_index = tensor([[1, 2, 3],
        #                               [0, 0, 0]])
        # with self-loops     = tensor([[1, 2, 3, 0, 1, 2, 3],
        #                               [0, 0, 0, 0, 1, 2, 3]])
        edge_index = add_self_loops(edge_index, x.size(0))

        # Step 2: apply the weight transform to the input node feature matrix.
        x = torch.matmul(x, self.weight)

        # Steps 3-5: start message passing.
        edge_weight = torch.ones((edge_index.size(1),), 
                                  dtype=x.dtype,
                                  device=edge_index.device)
        row, col = edge_index
        print("row", row)  # row tensor([1, 2, 3, 0, 1, 2, 3])
        print("col", col)  # col tensor([0, 0, 0, 0, 1, 2, 3])
        deg = scatter_add(edge_weight, row, dim=0, dim_size=x.size(0))
        print("deg", deg)  
        # deg is [1, 2, 2, 2] -- why?
        # Because
        # row         = [1, 2, 3, 0, 1, 2, 3]
        # edge_weight = [1, 1, 1, 1, 1, 1, 1]
        # so on the main diagonal node 0 accumulates 1 and nodes 1-3 accumulate 2 each,
        # which is exactly the degree matrix. Only the diagonal entries are returned,
        # which avoids a sparse multiplication.
        
        deg_inv_sqrt = deg.pow(-0.5)
        deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
        
        # When edge_weight is None,
        # deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col] == deg_inv_sqrt[row] * deg_inv_sqrt[col]
        norm = deg_inv_sqrt[row] * edge_weight *  deg_inv_sqrt[col]
        print(norm)
        # norm = tensor([0.7071, 0.7071, 0.7071, 1.0000, 0.5000, 0.5000, 0.5000])
        
        return self.propagate(edge_index, x=x, norm=norm)           


    def message(self, x_j, norm):
        # x_j has shape [E, out_channels]
        # norm: the normalized weights.
        return norm.view(-1, 1) * x_j if norm is not None else x_j                  
        


    def update(self, aggr_out):
        # aggr_out has shape [N, out_channels]

        # Step 5: return the new node embeddings.
        if self.bias is not None:
            return aggr_out + self.bias
        else:
            return aggr_out

Running the experiment gives the same result as the official implementation.
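As a quick sanity check, here is a minimal usage sketch of the GCNConv defined above applied to the toy data object from the beginning of Section 5.1 (the output dimension 2 is arbitrary; the exact numbers depend on the random weight initialization):

conv = GCNConv(in_channels=1, out_channels=2)
out = conv(data.x, data.edge_index)   # shape [4, 2]
print(out)                            # row 0 aggregates nodes 1, 2, 3 plus its own self-loop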

5.2 Training on the Cora Citation dataset

Here we use the official GCNConv to build a 2-layer GNN and train it on the Cora citation dataset. If everything is set up correctly, you can copy the following code to your machine and run it directly~

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.datasets import Planetoid

# 5.1) Load the Cora dataset (downloaded automatically if not present).
dataset = Planetoid(root='/home/pyG/Cora', name='Cora')

# 5.2) Define a 2-layer GCN network.
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        return F.log_softmax(x, dim=1)
        
# 5.3) Train & test.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
model.eval()
_, pred = model(data).max(dim=1)
correct = float(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
acc = correct / data.test_mask.sum().item()
print('Accuracy: {:.4f}'.format(acc))
# >>> Accuracy: 0.8150

At this point, a complete GCN-based GNN pipeline is done. For the details of the training data processing and many other internals, you will need to dig into the source code. Happy learning~

References

[1] PyG official tutorial
[2] torch_geometric.data.Data
[3] torch-scatter
[4] Advanced mini-batching in PyG
[5] GCN: Semi-Supervised Classification with Graph Convolutional Networks
[6] [Actually very simple] The Laplacian operator and the Laplacian matrix
[7] Introduction to GNN, Jiang Chenghan, National Taiwan University
[8] torch_geometric GCNConv source code analysis
[9] MessagePassing
[10] add_remaining_self_loops

Origin blog.csdn.net/g11d111/article/details/105505642