Get started quickly with OpenHGNN
1. Evaluate new datasets
You can specify your own dataset if desired. In this section, we use HGBn-ACM
as an example of a node classification dataset.
1.1 How to build a new dataset
Step 1: Preprocess the dataset
Here we demonstrate how to process HGBn-ACM, which is a node classification dataset.
First, download the HGBn-ACM dataset: HGB dataset. Once the download is complete, it needs to be processed into a dgl.heterograph.
The following code snippet is an example of creating a heterogeneous graph in DGL.
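import dgl
import torch as th

# Each key is a (source type, edge type, destination type) triple;
# each value is a pair of source and destination node ID tensors.
graph_data = {
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
    ('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
}
g = dgl.heterograph(graph_data)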
It is recommended to name the node feature h:
g.nodes['drug'].data['h'] = th.ones(3, 1)
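The feature tensor above has three rows because the example graph contains three drug nodes. When no explicit node counts are given, DGL takes the number of nodes of each type to be the largest node ID of that type plus one. The following plain-Python sketch (no DGL required) mirrors that rule on the same edges; the helper name infer_num_nodes is hypothetical and not part of DGL.

```python
def infer_num_nodes(edge_dict):
    """Return {node_type: count}, where count is the largest node ID seen plus one."""
    counts = {}
    for (src_type, _etype, dst_type), (src_ids, dst_ids) in edge_dict.items():
        for ntype, ids in ((src_type, src_ids), (dst_type, dst_ids)):
            if ids:
                counts[ntype] = max(counts.get(ntype, 0), max(ids) + 1)
    return counts


# Same edges as the example graph, with plain lists standing in for tensors.
edges = {
    ('drug', 'interacts', 'drug'): ([0, 1], [1, 2]),
    ('drug', 'interacts', 'gene'): ([0, 1], [2, 3]),
    ('drug', 'treats', 'disease'): ([1], [2]),
}
num_nodes = infer_num_nodes(edges)
print(num_nodes)
```

A feature tensor assigned to a node type must have exactly as many rows as that type has nodes, which is why the drug feature is created with three rows.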
DGL provides dgl.save_graphs() and dgl.load_graphs() for saving and loading heterogeneous graphs in binary form, respectively. So we use dgl.save_graphs to save the graph to disk:
dgl.save_graphs('demo_graph.bin',g)
Step 2: Add additional information
After the first step, we have the binary file demo_graph.bin, which we then move to the directory openhgnn/dataset/. The next step is to specify information about the dataset in NodeClassificationDataset.py.
For example, we set category, num_classes, and multi_label (if necessary) to paper, 3, and True, which represent the node type whose classes are predicted, the number of classes, and whether the task is multi-label classification, respectively. See Basic Node Classification Dataset for details.
Add additional information:
if name_dataset == 'demo_graph':
    data_path = './openhgnn/dataset/demo_graph.bin'
    g, _ = load_graphs(data_path)
    g = g[0].long()
    self.category = 'author'  # add additional information
    self.num_classes = 4
    self.multi_label = False
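To make the three attributes more concrete, here is a small plain-Python sketch of how they might shape the training targets. It assumes the common convention that single-label targets are class indices and multi-label targets are multi-hot vectors. DatasetInfo and encode_label are hypothetical names used only for illustration; they are not OpenHGNN classes or functions.

```python
from dataclasses import dataclass


@dataclass
class DatasetInfo:
    """Hypothetical container for the three attributes set above."""
    category: str       # node type whose labels are predicted
    num_classes: int    # number of distinct classes
    multi_label: bool   # True if a node may carry several classes at once


def encode_label(info, classes):
    """Encode a node's class(es) in the form a loss function typically expects."""
    if info.multi_label:
        # Multi-label: a multi-hot vector with one slot per class.
        return [1 if c in classes else 0 for c in range(info.num_classes)]
    # Single-label: exactly one class index.
    if len(classes) != 1:
        raise ValueError("single-label data needs exactly one class per node")
    return classes[0]


single = DatasetInfo(category='author', num_classes=4, multi_label=False)
multi = DatasetInfo(category='paper', num_classes=3, multi_label=True)

single_target = encode_label(single, [2])
multi_target = encode_label(multi, [0, 2])
```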
Step 3 (optional): Evaluate an existing model
Use demo_graph as the dataset name to evaluate an existing model:
python main.py -m GTN -d demo_graph -t node_classification -g 0 --use_best_config
If you use a different dataset name, you also need to modify build_dataset accordingly.
2. Use a new model
In this part, we create a model called RGAT, which is not in our model package.
2.1 How to build a new model
Step 1: Register the model
We create a class RGAT that inherits from BaseModel and register the model with @register_model(str).
from openhgnn.models import BaseModel, register_model

@register_model('RGAT')
class RGAT(BaseModel):
    ...
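For readers unfamiliar with decorator-based registration, the following is a minimal, self-contained sketch of the general pattern. It is not OpenHGNN's implementation: MODEL_REGISTRY and the stand-in BaseModel are hypothetical, and the real register_model may perform different checks.

```python
MODEL_REGISTRY = {}  # hypothetical name: maps a model name to its class


class BaseModel:
    """Stand-in for the real base class."""


def register_model(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"model {name!r} is already registered")
        if not issubclass(cls, BaseModel):
            raise TypeError(f"{cls.__name__} must inherit from BaseModel")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model('RGAT')
class RGAT(BaseModel):
    pass


# The class can now be looked up by the string it was registered under.
looked_up = MODEL_REGISTRY['RGAT']
```

The benefit of this pattern is that a model can be selected by a string, such as the value passed with -m on the command line, without the caller importing the class directly.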
Step 2: Implement the functions
We must implement the class method build_model_from_args, along with other functions such as __init__ and forward.
import torch.nn as nn
import torch.nn.functional as F
...
class RGAT(BaseModel):
    @classmethod
    def build_model_from_args(cls, args, hg):
        # Build the model from the parsed arguments and the heterogeneous graph.
        return cls(in_dim=args.hidden_dim,
                   out_dim=args.hidden_dim,
                   h_dim=args.out_dim,
                   etypes=hg.etypes,
                   num_heads=args.num_heads,
                   dropout=args.dropout)

    def __init__(self, in_dim, out_dim, h_dim, etypes, num_heads, dropout):
        super(RGAT, self).__init__()
        self.rel_names = list(set(etypes))
        self.layers = nn.ModuleList()
        # First layer: in_dim -> h_dim, with ReLU activation and dropout.
        self.layers.append(RGATLayer(
            in_dim, h_dim, num_heads, self.rel_names, activation=F.relu, dropout=dropout))
        # Second layer: h_dim -> out_dim, without activation.
        self.layers.append(RGATLayer(
            h_dim, out_dim, num_heads, self.rel_names, activation=None))

    def forward(self, hg, h_dict=None):
        if hasattr(hg, 'ntypes'):
            # Full-graph training: hg is a single heterogeneous graph.
            for layer in self.layers:
                h_dict = layer(hg, h_dict)
        else:
            # Mini-batch training: hg is a list of blocks, one per layer.
            for layer, block in zip(self.layers, hg):
                h_dict = layer(block, h_dict)
        return h_dict
Here we do not give the implementation details of RGATLayer. For more information, see: RGATLayer.
In OpenHGNN, we preprocess the features of the dataset outside the model. Specifically, a linear layer with a bias for each node type is used to map all node features into a shared feature space. Therefore, the h_dict parameter of the model's forward method does not contain raw features, and your model does not need to perform feature preprocessing.
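As a rough illustration of this preprocessing idea, the sketch below maps features of different dimensionality to one shared dimension, using one linear layer with a bias per node type. It is written in plain Python so that it runs without PyTorch; make_linear, apply_linear, and the toy features are hypothetical and do not reflect OpenHGNN's actual code.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Create a weight matrix (out_dim x in_dim) and a bias vector (out_dim)."""
    weight = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(out_dim)]
    return weight, bias


def apply_linear(layer, features):
    """Apply y = W x + b to every feature row."""
    weight, bias = layer
    return [
        [sum(w * x for w, x in zip(row, feat)) + b for row, b in zip(weight, bias)]
        for feat in features
    ]


rng = random.Random(0)
hidden_dim = 4

# Raw features: each node type has its own dimensionality.
raw = {
    'drug': [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],  # 3 nodes, 2 features each
    'gene': [[1.0, 2.0, 3.0]],                      # 1 node, 3 features
}

# One linear layer (with bias) per node type, all mapping to hidden_dim.
layers = {ntype: make_linear(len(rows[0]), hidden_dim, rng) for ntype, rows in raw.items()}
h_dict = {ntype: apply_linear(layers[ntype], rows) for ntype, rows in raw.items()}
```

After the projection, every node type has features of the same width, so the model's layers can treat all entries of h_dict uniformly.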
Step 3: Adding to the supported models dictionary
We should add a new entry to SUPPORTED_MODELS in models/__init__.py.
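A dictionary of supported models is typically used to import a model's module only when that model is requested. The sketch below shows this lookup-then-import pattern under the assumption that the dictionary maps a name to a module path. The names SUPPORTED and load_supported are hypothetical, and standard-library modules stand in for model modules so the example runs anywhere.

```python
import importlib

# Hypothetical stand-in for a "supported models" dictionary: name -> module path.
# Standard-library modules are used here so the sketch runs anywhere.
SUPPORTED = {
    'json_codec': 'json',
    'math_ops': 'math',
}


def load_supported(name):
    """Import and return the module registered under `name`."""
    try:
        module_path = SUPPORTED[name]
    except KeyError:
        raise ValueError(f"{name!r} is not a supported entry") from None
    return importlib.import_module(module_path)


math_module = load_supported('math_ops')
```

With this design, a new model becomes available by name once its entry is added to the dictionary, and unused modules are never imported.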
3. Apply to a new scenario
In this section, we apply OpenHGNN to a recommendation scenario, which involves building a new task and a new trainerflow.
3.1 How to build a new task
Step 1: Register the task
Create a class Recommendation that inherits from the built-in BaseTask and register it with @register_task(str).
from openhgnn.tasks import BaseTask, register_task

@register_task('recommendation')
class Recommendation(BaseTask):
    ...
Step 2: Implement methods
We should implement methods related to evaluation metrics and loss functions.
class Recommendation(BaseTask):
    """Recommendation tasks."""
    def __init__(self, args):
        super(Recommendation, self).__init__()
        self.n_dataset = args.dataset
        self.dataset = build_dataset(args.dataset, 'recommendation')
        self.train_hg, self.train_neg_hg, self.val_hg, self.test_hg = self.dataset.get_split()
        self.evaluator = Evaluator(args.seed)

    def get_loss_fn(self):
        return F.binary_cross_entropy_with_logits

    def evaluate(self, y_true, y_score, name):
        if name == 'ndcg':
            return self.evaluator.ndcg(y_true, y_score)
Finally, add a new entry to SUPPORTED_TASKS in tasks/__init__.py.
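The task above pairs a binary cross-entropy loss on logits with an NDCG metric. The plain-Python sketch below shows what these two quantities compute. It is only an illustration: bce_with_logits and ndcg_at_k are hypothetical helpers, the NDCG variant shown uses linear gain, and the actual Evaluator.ndcg in OpenHGNN may be defined differently.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw scores (logits)."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s(x)) + (1-y)*log(1-s(x))],
        # where s is the sigmoid function.
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def ndcg_at_k(y_true, y_score, k):
    """NDCG@k for one ranked list; y_true holds relevance, y_score the predictions."""
    def dcg(relevances):
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

    # Rank items by predicted score, highest first.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    ranked = [y_true[i] for i in order]
    ideal = dcg(sorted(y_true, reverse=True))
    return dcg(ranked) / ideal if ideal > 0 else 0.0


loss_at_zero = bce_with_logits([0.0, 0.0], [1.0, 0.0])
perfect = ndcg_at_k([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2], k=4)
worst = ndcg_at_k([1, 0], [0.1, 0.9], k=2)
```

A logit of zero corresponds to a predicted probability of 0.5, so loss_at_zero equals ln 2. The first ranking places both relevant items on top and scores 1.0, while the second places the only relevant item last and scores lower.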
3.2 How to build a new trainerflow
Step 1: Register trainerflow
Create a class that inherits from BaseFlow and register the trainerflow with @register_flow(str).
from openhgnn.trainerflow import BaseFlow, register_flow

@register_flow('Recommendation')
class Recommendation(BaseFlow):
    ...
Step 2: Implement the method
The function train() is declared as an abstract method in BaseFlow. Therefore, train() must be overridden; otherwise the trainerflow cannot be instantiated. An example of a training loop is given below.
...
class Recommendation(BaseFlow):
    def __init__(self, args=None):
        super(Recommendation, self).__init__(args)
        self.target_link = self.task.dataset.target_link
        self.model = build_model(self.model).build_model_from_args(self.args, self.hg)
        self.evaluator = self.task.get_evaluator(self.metric)

    def train(self):
        # epoch_iter is not defined in this excerpt; it stands for an iterable over epochs.
        for epoch in epoch_iter:
            self._full_train_step()
            self._full_test_step()

    def _full_train_step(self):
        self.model.train()
        logits = self.model(self.hg)[self.category]
        loss = self.loss_fn(logits[self.train_idx], self.labels[self.train_idx])
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()

    def _full_test_step(self, modes=None, logits=None):
        # mask and pred are not defined in this excerpt.
        self.model.eval()
        with torch.no_grad():
            loss = self.loss_fn(logits[mask], self.labels[mask]).item()
            metric = self.task.evaluate(pred, name=self.metric, mask=mask)
            return metric, loss
Finally, add a new entry to SUPPORT_FLOWS in trainerflow/__init__.py.
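The requirement in Step 2 that train() be overridden can be illustrated with Python's abc module. The sketch below assumes the base class relies on abc.abstractmethod, which this guide does not state explicitly; the BaseFlow here is a stand-in, and IncompleteFlow and CompleteFlow are hypothetical.

```python
from abc import ABC, abstractmethod


class BaseFlow(ABC):
    """Stand-in base class with an abstract train()."""

    @abstractmethod
    def train(self):
        """Run the training loop."""


class IncompleteFlow(BaseFlow):
    """Does not override train(), so it stays abstract."""


class CompleteFlow(BaseFlow):
    def train(self):
        return 'trained'


# Instantiating a subclass that lacks train() raises TypeError.
try:
    IncompleteFlow()
    instantiated = True
except TypeError:
    instantiated = False

result = CompleteFlow().train()
```

Failing at instantiation time, rather than when train() is first called, surfaces a missing training loop before any data is loaded.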