Implementing Heterogeneous Graph Neural Networks with PyG: RGCN, RGAT, HAN, GNN-FiLM

Background

The ICDM 2022 competition on detecting risky products in a large-scale e-commerce graph requires node classification on a heterogeneous graph. Because this is an anomaly detection task, the ratio of positive to negative samples is 1:10. This post records my preliminary attempts.

Data

Process

The competition organizers open-sourced a baseline implemented with PyG, and I used it directly for data preprocessing. Preprocessing the graph structure produces a .pt file, which all later steps work from:

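import torch
from torch_geometric.loader import NeighborLoader

graph = torch.load(dataset)  # dataset = "xxx.pt"; loads a PyG heterogeneous graph
# Per node type `type`:
#   graph[type].x          shape [num_nodes, 256]: one 256-dim feature vector per node
#   graph[type].y          shape [num_nodes]: labels
#   graph[type].num_nodes  number of nodes of this type
#   graph[type].maps       id remapping: ids are renumbered from 0 within each type
# A heterogeneous graph stores edges per relation, keyed by the source node type,
# the edge type, and the destination node type:
#   graph[(source_type, edge_type, dest_type)].edge_index  shape [2, num_edges], rows (source, dest)

# Neighbor sampling in the style of GraphSAGE: train on mini-batches, not the whole graph.
# num_neighbors=[a] * b samples b hops outward with a neighbors per edge type per hop;
# if memory allows, a can be -1 (keep all neighbors). a, b, train_idx are placeholders.
train_loader = NeighborLoader(graph, input_nodes=('item', train_idx),  # 'item': node type to classify
                              num_neighbors=[a] * b,
                              shuffle=True, batch_size=128)

for batch in train_loader:
    # batch['item'].batch_size == 128
    # batch['item'].x  shape [num, 256]: the first batch_size rows are the nodes to predict,
    #                  the rest are sampled neighbors
    # batch['item'].y  shape [num]: only the first batch_size labels are used

    batch = batch.to_homogeneous()  # convert to a homogeneous graph
    # batch.x           shape [total number of nodes, 256]
    # batch.edge_index  shape [2, total number of edges]: all edges
    # batch.edge_type   shape [total number of edges]: relation id of each edge

    out = model(batch.x, batch.edge_index, batch.edge_type)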
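
To make the index bookkeeping above more concrete, here is a minimal plain-Python sketch of the two transformations involved: renumbering ids from 0 within each node type, and flattening typed edges into a single edge list plus a relation-id vector. The helper names, the toy ids, and the node and edge types are all made up for illustration; PyG's own implementation differs in detail.

```python
def remap_ids(raw_ids):
    """Map the raw ids of one node type to 0..n-1 in first-seen order."""
    mapping = {}
    for rid in raw_ids:
        if rid not in mapping:
            mapping[rid] = len(mapping)
    return mapping


def flatten_hetero(num_nodes, edges):
    """Flatten per-type node ids and per-relation edges into one homogeneous graph.

    num_nodes: {node_type: count}
    edges: {(src_type, edge_type, dst_type): [(src, dst), ...]} using per-type ids
    Returns (edge_index, edge_type): edge_index is [sources, targets] over global ids,
    edge_type holds one relation id per edge.
    """
    offset, total = {}, 0
    for ntype, n in num_nodes.items():
        offset[ntype] = total          # global id = per-type id + offset of that type
        total += n
    sources, targets, edge_type = [], [], []
    for rel_id, ((src_t, _, dst_t), pairs) in enumerate(edges.items()):
        for s, d in pairs:
            sources.append(offset[src_t] + s)
            targets.append(offset[dst_t] + d)
            edge_type.append(rel_id)
    return [sources, targets], edge_type


item_map = remap_ids([9001, 9007, 9003])   # hypothetical raw item ids
user_map = remap_ids([42, 17])             # hypothetical raw user ids
edges = {
    ("user", "buys", "item"): [(user_map[42], item_map[9007]),
                               (user_map[17], item_map[9001])],
    ("item", "similar", "item"): [(item_map[9001], item_map[9003])],
}
edge_index, edge_type = flatten_hetero({"item": 3, "user": 2}, edges)
print(edge_index)  # [[3, 4, 0], [1, 0, 2]]
print(edge_type)   # [0, 0, 1]
```

Items occupy global ids 0 to 2 and users 3 to 4, so a single model call can consume every relation at once while `edge_type` still tells the layers which relation each edge belongs to.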

RGCN

RGCN is relatively simple: it takes the idea GCN uses for homogeneous graphs and applies it to heterogeneous graphs.

The basic idea of GCN is this: to compute the embedding of node i at the next layer, take the previous-layer embeddings of i's neighbors and of i itself (the identity matrix is added to the adjacency matrix so that each node includes itself), multiply them by a learnable weight matrix W, and combine them using the normalized adjacency matrix. Within a layer the same W is shared by all nodes, analogous to a convolution kernel.
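
As a rough illustration of that update, here is a dense NumPy sketch of one GCN layer using the usual symmetric normalization, D^{-1/2} (A + I) D^{-1/2} H W followed by ReLU. The function name, the toy graph, and the weights are assumptions made up for this example; real implementations use sparse operations.

```python
import numpy as np


def gcn_layer(adj, h, w):
    """One GCN layer: normalize (A + I), aggregate neighbors, project with W, apply ReLU."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops so node i keeps its own embedding
    deg = a_hat.sum(axis=1)                     # degree including the self-loop
    d_inv_sqrt = np.diag(deg ** -0.5)
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return np.maximum(norm_adj @ h @ w, 0.0)


# Toy path graph 0 - 1 - 2 with 2-dimensional features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
w = np.eye(2)  # identity weights, so the output is just the normalized aggregation
out = gcn_layer(adj, h, w)
print(out.shape)  # (3, 2)
```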

RGCN's idea is simple. A heterogeneous graph has many kinds of edges, so split the edges by kind and treat each relation as its own graph, in which all edges are of the same kind. Within each relation, apply GCN with a weight matrix shared across that relation to get the embedding of node i under that relation. Finally, combine the embeddings from all relations, add a self-connection term (node i's own embedding times a separate weight matrix), and apply a ReLU.
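
The per-relation aggregation can be sketched in dense NumPy as follows. This assumes the formulation from the RGCN paper (mean normalization within each relation plus a self-connection weight, named `self_weight` here); the function name, shapes, and toy edges are hypothetical.

```python
import numpy as np


def rgcn_layer(adjs, h, rel_weights, self_weight):
    """One RGCN layer over dense per-relation adjacency matrices.

    adjs[r][i, j] = 1 if there is an edge j -> i of relation r.
    """
    out = h @ self_weight                       # self-connection term for every node
    for adj, w_r in zip(adjs, rel_weights):
        deg = adj.sum(axis=1, keepdims=True)    # number of neighbors of i under relation r
        norm_adj = adj / np.maximum(deg, 1.0)   # mean over neighbors; rows without edges stay 0
        out = out + norm_adj @ h @ w_r          # GCN-style aggregation inside relation r
    return np.maximum(out, 0.0)                 # ReLU


rng = np.random.default_rng(0)
num_nodes, in_dim, out_dim = 4, 3, 2
h = rng.normal(size=(num_nodes, in_dim))
adjs = [np.zeros((num_nodes, num_nodes)) for _ in range(2)]
adjs[0][0, 1] = adjs[0][0, 2] = 1.0   # relation 0: nodes 1 and 2 point to node 0
adjs[1][3, 0] = 1.0                   # relation 1: node 0 points to node 3
rel_weights = [rng.normal(size=(in_dim, out_dim)) for _ in adjs]
self_weight = rng.normal(size=(in_dim, out_dim))
out = rgcn_layer(adjs, h, rel_weights, self_weight)
print(out.shape)  # (4, 2)
```

Node 1 has no incoming edges in either relation, so its output comes from the self-connection term alone.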

import torch
import torch.nn.functional as F
from torch_geometric.nn import RGCNConv

class RGCN(torch.nn.Module):
    # num_relations: number of edge types in the graph
    def __init__(self, in_channels, hidden_channels, out_channels, num_relations,
                 n_layers=2, dropout=0.5):
        super().__init__()
        self.dropout = dropout
        self.convs = torch.nn.ModuleList()
        self.convs.append(RGCNConv(in_channels, hidden_channels, num_relations))
        for _ in range(n_layers - 2):
            self.convs.append(RGCNConv(hidden_channels, hidden_channels, num_relations))
        self.convs.append(RGCNConv(hidden_channels, out_channels, num_relations))

    def forward(self, x, edge_index, edge_type):
        # ReLU and dropout after every layer except the last, which returns raw logits
        for conv in self.convs[:-1]:
            x = conv(x, edge_index, edge_type)
            x = F.relu(x)
            x = F.dropout(x, p=self.dropout, training=self.training)
        return self.convs[-1](x, edge_index, edge_type)

RGAT

In RGCN the weights used within a layer are fixed regardless of which neighbor is being aggregated, which is not flexible enough, so RGAT adds an attention mechanism. After all, attention can be added to anything.

First, how GAT changes GCN. When computing the embedding of node i, it still gathers the embeddings of i's neighbors and of i itself. For each such node j, the embeddings of i and j are concatenated into a vector of twice the length and passed through an attention function, in effect a single-layer feed-forward network, which yields the weight of node j with respect to node i.
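
A minimal NumPy sketch of that attention computation, assuming the standard GAT formulation (shared projection, LeakyReLU on the score, softmax over the neighborhood). The function name, shapes, and random values are made up for illustration.

```python
import numpy as np


def gat_attention(h, w, a, i, neighbors, negative_slope=0.2):
    """Attention weights of node i over `neighbors` (which should include i itself)."""
    z = h @ w                                                             # project all embeddings
    pairs = np.stack([np.concatenate([z[i], z[j]]) for j in neighbors])   # [z_i || z_j], double length
    scores = pairs @ a                                                    # single-layer feed-forward scoring
    scores = np.where(scores > 0, scores, negative_slope * scores)        # LeakyReLU
    exp = np.exp(scores - scores.max())                                   # softmax over the neighborhood
    return exp / exp.sum()


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))      # 4 nodes, 3 input features
w = rng.normal(size=(3, 2))      # projection to 2 dimensions
a = rng.normal(size=(4,))        # attention vector over the concatenated pair (2 + 2)
neighbors = [0, 1, 3]            # node 0's neighborhood, including itself
alpha = gat_attention(h, w, a, 0, neighbors)
new_h0 = alpha @ (h @ w)[neighbors]   # attention-weighted sum gives node 0's new embedding
print(alpha.shape, new_h0.shape)  # (3,) (2,)
```

If every entry of `alpha` were equal, the weighted sum would become a plain neighborhood average, which is the GCN-like behavior attention is meant to improve on.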

RGAT applies the same idea to relations: relation-specific features are used to compute the attention. Finally, the two are fused.
RGAT can be seen as a generalization of RGCN: when the attention contributes nothing, it reduces to RGCN.

In practice, however, RGAT was only on par with RGCN, and in this competition's setting it was even slightly worse. The paper gives the reasons:

  1. On tasks where RGCN already does well, RGAT would have to learn to set its attention to the normalization constant, and the feedback from the loss makes that point hard to find.
  2. On some tasks RGCN improves its results by memorizing samples. RGAT is a more complex model, so this is less likely to happen.

import torch
import torch.nn.functional as F
from torch_geometric.nn import RGATConv

class RGAT(torch.nn.Module):
    # num_relations: number of edge types in the graph
    def __init__(self, in_channels, hidden_channels, out_channels, num_relations,
                 n_layers=2, n_heads=3):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        # concat=False: head outputs are averaged, so each layer outputs hidden_channels
        self.convs.append(RGATConv(in_channels, hidden_channels, num_relations, heads=n_heads,
                                   concat=False))
        for _ in range(n_layers - 2):
            self.convs.append(RGATConv(hidden_channels, hidden_channels, num_relations,
                                       heads=n_heads, concat=False))
        self.convs.append(RGATConv(hidden_channels, hidden_channels, num_relations,
                                   heads=n_heads, concat=False))
        self.lin1 = torch.nn.Linear(hidden_channels, out_channels)

    def forward(self, x, edge_index, edge_type):
        for conv in self.convs:
            x = conv(x, edge_index, edge_type)
            x = x.relu_()
            x = F.dropout(x, p=0.2, training=self.training)
        x = self.lin1(x)  # final linear layer maps to class logits
        return x

Heterogeneous Graph Attention Network (HAN HGAT)

Define several metapaths based on expert knowledge. A metapath is a sequence of the form node type, edge type, node type, edge type, node type, and so on.

For each metapath, node i collects all of its neighbors j along that path.

1. Node-level attention: compute the attention between node i and each neighbor j, then take the weighted sum. A multi-head attention mechanism is used.

2. Semantic-level attention: to aggregate across all metapaths, compute one attention weight per metapath, where q, W, and b are shared across metapaths.

In my experiments the results were very poor. My metapaths may have been badly chosen, and multi-head attention takes too long to train: an RGCN epoch takes 5 minutes, while a HAN epoch takes 480 minutes.
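
The second step, aggregation across metapaths, can be sketched in NumPy as below. This assumes the semantic-level attention from the HAN paper (score each metapath by averaging q^T tanh(W z + b) over nodes, then softmax over metapaths); the function name, shapes, and random inputs are hypothetical.

```python
import numpy as np


def semantic_attention(z_per_path, w, b, q):
    """Fuse per-metapath embeddings with one attention weight per metapath.

    z_per_path: array [num_paths, num_nodes, dim]; w, b, q are shared by all metapaths.
    """
    # Score each metapath: average over nodes of q^T tanh(W z + b).
    scores = np.array([(np.tanh(z @ w + b) @ q).mean() for z in z_per_path])
    exp = np.exp(scores - scores.max())
    beta = exp / exp.sum()                            # softmax over metapaths
    fused = np.tensordot(beta, z_per_path, axes=1)    # weighted sum of per-path embeddings
    return beta, fused


rng = np.random.default_rng(0)
z_per_path = rng.normal(size=(3, 5, 4))   # 3 metapaths, 5 nodes, 4-dim embeddings
w = rng.normal(size=(4, 8))
b = rng.normal(size=(8,))
q = rng.normal(size=(8,))
beta, fused = semantic_attention(z_per_path, w, b, q)
print(beta.shape, fused.shape)  # (3,) (5, 4)
```

Because `beta` holds a single weight per metapath, a poorly chosen metapath can only be down-weighted as a whole, which may be one reason the results depend so much on the metapath definitions.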

from typing import Dict, Union

import torch
from torch_geometric.nn import HANConv

labeled_class = 'item'  # node type to classify

class HAN(torch.nn.Module):
    # metadata: (node_types, edge_types) of the graph, e.g. graph.metadata()
    def __init__(self, in_channels: Union[int, Dict[str, int]],
                 out_channels: int, metadata, hidden_channels=16, heads=4, n_layers=2):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        self.convs.append(HANConv(in_channels, hidden_channels, heads=heads, dropout=0.6,
                                  metadata=metadata))
        for _ in range(n_layers - 1):
            self.convs.append(HANConv(hidden_channels, hidden_channels, heads=heads, dropout=0.6,
                                      metadata=metadata))
        self.lin = torch.nn.Linear(hidden_channels, out_channels)

    def forward(self, x_dict, edge_index_dict):
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict)
        # classify only the labeled node type
        return self.lin(x_dict[labeled_class])

GNN-FiLM (feature-wise linear modulation)

Compared with RGCN, the change is similar in spirit to RGAT: the weighting of messages is what changes, this time through a simple feed-forward network. The advantage is that the weights come from an affine transformation whose parameters are themselves computed by a neural network. The resulting β and γ are then used as weights to modulate the embedding.
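
A simplified NumPy sketch of the modulation described above, assuming the formulation from the GNN-FiLM paper: for each relation, a linear map of the target node's embedding produces a shift β and a scale γ, and each incoming message is transformed as ReLU(γ * W_r h_j + β). The function name, shapes, and toy edges are hypothetical; messages are summed here, and the self-loop handling of a real layer is omitted.

```python
import numpy as np


def film_layer(adjs, h, rel_weights, film_weights):
    """One FiLM layer: each message is scaled and shifted by gamma, beta of its target node."""
    num_nodes = h.shape[0]
    out_dim = rel_weights[0].shape[1]
    out = np.zeros((num_nodes, out_dim))
    for adj, w_r, g_r in zip(adjs, rel_weights, film_weights):
        film = h @ g_r                                    # affine map of the target node: [N, 2 * out_dim]
        beta, gamma = film[:, :out_dim], film[:, out_dim:]
        msg = h @ w_r                                     # W_r h_j for every source node j
        for i in range(num_nodes):
            for j in np.nonzero(adj[i])[0]:               # edges j -> i of this relation
                out[i] += np.maximum(gamma[i] * msg[j] + beta[i], 0.0)
    return out


rng = np.random.default_rng(0)
num_nodes, in_dim, out_dim = 4, 3, 2
h = rng.normal(size=(num_nodes, in_dim))
adjs = [np.zeros((num_nodes, num_nodes)) for _ in range(2)]
adjs[0][0, 1] = adjs[0][0, 2] = 1.0   # relation 0: nodes 1 and 2 point to node 0
adjs[1][3, 0] = 1.0                   # relation 1: node 0 points to node 3
rel_weights = [rng.normal(size=(in_dim, out_dim)) for _ in adjs]
film_weights = [rng.normal(size=(in_dim, 2 * out_dim)) for _ in adjs]
out = film_layer(adjs, h, rel_weights, film_weights)
print(out.shape)  # (4, 2)
```

Unlike attention, which yields one scalar weight per edge, the modulation here acts on each feature dimension separately.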

In my experiments it worked surprisingly well: training is fast and the results are better than RGCN's.

from collections import OrderedDict

import torch
import torch.nn.functional as F
from torch.nn import BatchNorm1d, Linear
from torch_geometric.nn import FiLMConv

class GNNFilm(torch.nn.Module):
    # num_relations: number of edge types in the graph
    def __init__(self, in_channels, hidden_channels, out_channels, num_relations,
                 n_layers, dropout=0.5):
        super().__init__()
        self.dropout = dropout
        self.convs = torch.nn.ModuleList()
        self.convs.append(FiLMConv(in_channels, hidden_channels, num_relations))
        for _ in range(n_layers - 1):
            self.convs.append(FiLMConv(hidden_channels, hidden_channels, num_relations))
        self.norms = torch.nn.ModuleList()
        for _ in range(n_layers):
            self.norms.append(BatchNorm1d(hidden_channels))
        # two-layer MLP head that maps hidden features to class logits
        self.lin_l = torch.nn.Sequential(OrderedDict([
            ('lin1', Linear(hidden_channels, hidden_channels // 4, bias=True)),
            ('lrelu', torch.nn.LeakyReLU(0.2)),
            ('lin2', Linear(hidden_channels // 4, out_channels, bias=True))]))

    def forward(self, x, edge_index, edge_type):
        for conv, norm in zip(self.convs, self.norms):
            x = norm(conv(x, edge_index, edge_type))
            x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.lin_l(x)
        return x

Summary

Swapping between RGCN, RGAT, and GNN-FiLM is very simple: the training code does not change at all, only the model code, so it is easy to try all three. Use HAN with caution: its results depend too heavily on the metapath definitions, and training takes so long that it is not worth it.

Origin blog.csdn.net/yzsjwd/article/details/126288084