【Graph Neural Network】A Quick Start with OpenHGNN

1. Evaluate a new dataset

You can specify your own dataset if desired. In this section, we use HGBn-ACM, a node classification dataset, as an example.

1.1 How to build a new dataset

Step 1: Preprocess the dataset
A demonstration of processing HGBn-ACM, a node classification dataset, is given here.

First, download the HGBn-ACM dataset from the HGB dataset page. Once the download is complete, it needs to be processed into a dgl.heterograph.

The following code snippet is an example of creating a heterogeneous graph in DGL.

import dgl
import torch as th

# edges for each relation: (source node IDs, destination node IDs)
graph_data = {
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
    ('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
}
g = dgl.heterograph(graph_data)
Each key of graph_data is one of the graph's canonical edge types (canonical_etypes), i.e., a (source node type, edge type, destination node type) triple; dgl.heterograph() infers the node types and node counts from them.
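
As a quick sanity check (illustrative only), you can inspect the constructed graph:

print(g.canonical_etypes)   # the three (src_type, edge_type, dst_type) triples defined above
print(g.num_nodes('gene'))  # 4, inferred from the largest gene node ID (3)
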
It is recommended to set the feature name to h:

g.nodes['drug'].data['h'] = th.ones(3, 1)

DGL provides dgl.save_graphs() and dgl.load_graphs() for saving and loading heterogeneous graphs in binary form, respectively. Use dgl.save_graphs() to save the graph to disk:

dgl.save_graphs('demo_graph.bin', g)
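
The file can be loaded back with dgl.load_graphs(), which returns a list of graphs and a dictionary of labels (a quick round-trip check):

glist, label_dict = dgl.load_graphs('demo_graph.bin')
print(glist[0])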

Step 2: Add additional information
After the first step, we have a binary file demo_graph.bin; move it into the openhgnn/dataset/ directory. The specific information for the next step is in NodeClassificationDataset.py.

For example, we set category, num_classes, and multi_label (if necessary) to paper, 3, and True, which represent the node type whose class we predict, the number of classes, and whether the task is multi-label classification, respectively. See Basic Node Classification Dataset for details.

Load the DGL graph and add the additional information:

# in NodeClassificationDataset.py (requires: from dgl.data.utils import load_graphs)
if name_dataset == 'demo_graph':
    data_path = './openhgnn/dataset/demo_graph.bin'
    g, _ = load_graphs(data_path)
    g = g[0].long()
    self.category = 'author'  # the node type to predict
    self.num_classes = 4      # the number of classes
    self.multi_label = False  # not a multi-label task

Step 3 (optional)
Using demo_graph as the dataset, evaluate an existing model:

python main.py -m GTN -d demo_graph -t node_classification -g 0 --use_best_config

If you use a different dataset name, you will need to modify build_dataset accordingly.

2. Use a new model

In this part, we create a model called RGAT, which is not yet in our model package.

2.1 How to build a new model

Step 1: Register the model
We create a class RGAT that inherits from the base model BaseModel and register it with @register_model(str).

from openhgnn.models import BaseModel, register_model
@register_model('RGAT')
class RGAT(BaseModel):
    ...

Step 2: Implement the methods
You must implement the class method build_model_from_args, along with other functions such as __init__, forward, etc.

...
import torch.nn as nn
import torch.nn.functional as F

class RGAT(BaseModel):
    @classmethod
    def build_model_from_args(cls, args, hg):
        return cls(in_dim=args.hidden_dim,
                   out_dim=args.hidden_dim,
                   h_dim=args.out_dim,
                   etypes=hg.etypes,
                   num_heads=args.num_heads,
                   dropout=args.dropout)

    def __init__(self, in_dim, out_dim, h_dim, etypes, num_heads, dropout):
        super(RGAT, self).__init__()
        self.rel_names = list(set(etypes))
        self.layers = nn.ModuleList()
        self.layers.append(RGATLayer(
            in_dim, h_dim, num_heads, self.rel_names, activation=F.relu, dropout=dropout))
        self.layers.append(RGATLayer(
            h_dim, out_dim, num_heads, self.rel_names, activation=None))

    def forward(self, hg, h_dict=None):
        if hasattr(hg, 'ntypes'):
            # full-graph training: hg is the whole heterogeneous graph
            for layer in self.layers:
                h_dict = layer(hg, h_dict)
        else:
            # mini-batch training: hg is a list of sampled blocks
            for layer, block in zip(self.layers, hg):
                h_dict = layer(block, h_dict)
        return h_dict

Here we do not give the implementation details of RGATLayer. For more reading, check out RGATLayer.
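
For orientation, here is a minimal sketch of what such a layer could look like, assuming it wraps one dgl.nn.GATConv per relation inside a dgl.nn.HeteroGraphConv and that out_feat is divisible by num_heads (OpenHGNN's actual RGATLayer may differ):

import torch.nn as nn
import dgl.nn as dglnn

class RGATLayer(nn.Module):
    # sketch only: one GATConv per relation, combined by HeteroGraphConv
    def __init__(self, in_feat, out_feat, num_heads, rel_names, activation=None, dropout=0.0):
        super().__init__()
        self.activation = activation
        self.dropout = nn.Dropout(dropout)
        self.conv = dglnn.HeteroGraphConv({
            rel: dglnn.GATConv(in_feat, out_feat // num_heads, num_heads)
            for rel in rel_names
        })

    def forward(self, g, h_dict):
        h_dict = self.conv(g, h_dict)  # {ntype: (N, num_heads, out_feat // num_heads)}
        out = {}
        for ntype, h in h_dict.items():
            h = h.flatten(1)  # concatenate the attention heads back to out_feat
            if self.activation is not None:
                h = self.activation(h)
            out[ntype] = self.dropout(h)
        return out
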
In OpenHGNN, we preprocess the features of the dataset outside the model. Specifically, a linear layer with a bias for each node type is used to map all node features into a shared feature space. Therefore, the h_dict argument of forward does not contain the raw features, and your model does not need to do any feature preprocessing.
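
Conceptually, that preprocessing amounts to something like the following (a hypothetical illustration, not OpenHGNN's actual class):

import torch.nn as nn

class HeteroLinearProjection(nn.Module):
    # hypothetical: one Linear (with bias) per node type maps raw features
    # of different sizes into a shared hidden space
    def __init__(self, in_dims, hidden_dim):
        super().__init__()
        self.linears = nn.ModuleDict({
            ntype: nn.Linear(dim, hidden_dim) for ntype, dim in in_dims.items()
        })

    def forward(self, raw_h_dict):
        return {ntype: self.linears[ntype](h) for ntype, h in raw_h_dict.items()}
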
Step 3: Add to the supported models dictionary
We should add a new entry to SUPPORTED_MODELS in models/__init__.py.
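
The new entry might look like this (a sketch; the exact layout of SUPPORTED_MODELS in your version may differ):

SUPPORTED_MODELS = {
    # ... existing models ...
    'RGAT': 'openhgnn.models.RGAT',
}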

3. Apply to a new scenario

In this section, we apply OpenHGNN to a recommendation scenario, which involves constructing a new task and a new trainerflow.

3.1 How to build a new task

Step 1: Register the task
Create a class Recommendation that inherits from the built-in BaseTask and register it with @register_task(str).

from openhgnn.tasks import BaseTask, register_task
@register_task('recommendation')
class Recommendation(BaseTask):
    ...

Step 2: Implement methods
We should implement methods related to evaluation metrics and loss functions.

class Recommendation(BaseTask):
    """Recommendation task: builds the dataset splits and the evaluator."""
    def __init__(self, args):
        super(Recommendation, self).__init__()
        self.n_dataset = args.dataset
        # build_dataset returns the dataset registered for this task
        self.dataset = build_dataset(args.dataset, 'recommendation')
        self.train_hg, self.train_neg_hg, self.val_hg, self.test_hg = self.dataset.get_split()
        self.evaluator = Evaluator(args.seed)

    def get_loss_fn(self):
        # binary cross-entropy on raw scores (logits)
        return F.binary_cross_entropy_with_logits

    def evaluate(self, y_true, y_score, name):
        if name == 'ndcg':
            return self.evaluator.ndcg(y_true, y_score)

Finally, add a new entry to SUPPORTED_TASKS in tasks/__init__.py.
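
As with models, this is just a new key in the dictionary (again a sketch; the exact layout may differ):

SUPPORTED_TASKS = {
    # ... existing tasks ...
    'recommendation': 'openhgnn.tasks.recommendation',
}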

3.2 How to build a new trainerflow

Step 1: Register the trainerflow
Create a class that inherits from BaseFlow, and register the trainerflow with @register_flow(str).

from openhgnn.trainerflow import BaseFlow, register_flow
@register_flow('Recommendation')
class Recommendation(BaseFlow):
    ...

Step 2: Implement the methods
We declare the function train() as an abstract method, so train() must be overridden; otherwise, the trainerflow cannot be instantiated. An example of a training loop is given below.

...
class Recommendation(BaseFlow):
    def __init__(self, args=None):
        super(Recommendation, self).__init__(args)
        self.target_link = self.task.dataset.target_link
        # self.model is the model name (a string) set by BaseFlow; build_model
        # turns it into a model instance via build_model_from_args
        self.model = build_model(self.model).build_model_from_args(self.args, self.hg)
        self.evaluator = self.task.get_evaluator(self.metric)

    def train(self):
        # epoch_iter and early stopping are managed by BaseFlow (abridged here)
        for epoch in epoch_iter:
            self._full_train_step()
            self._full_test_step()

    def _full_train_step(self):
        self.model.train()
        logits = self.model(self.hg)[self.category]
        loss = self.loss_fn(logits[self.train_idx], self.labels[self.train_idx])
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()

    def _full_test_step(self, modes=None, logits=None):
        # abridged: mask and pred are derived from `modes` and `logits`
        self.model.eval()
        with torch.no_grad():
            loss = self.loss_fn(logits[mask], self.labels[mask]).item()
            metric = self.task.evaluate(pred, name=self.metric, mask=mask)
            return metric, loss

Finally, add a new entry to SUPPORT_FLOWS in trainerflow/__init__.py.
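
Once the model, task, and trainerflow are all registered, they can be combined on the command line just like the demo in Section 1 (the dataset name my_rec_dataset below is hypothetical):

python main.py -m RGAT -d my_rec_dataset -t recommendation -g 0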

Content source

  1. Developer_Guide

Origin: blog.csdn.net/ARPOSPF/article/details/130889563