[PyG] Graph conversion with networkx

When working with graph neural networks (GNNs), you usually rely on a GNN library. One of the more efficient and popular choices is PyG (PyTorch Geometric), built on PyTorch, which provides many classic GNN models and graph datasets. To build and train a graph model in PyG, you first choose a suitable graph data structure; PyG offers Data, HeteroData, and TemporalData. During experiments you may also want functions from networkx for graph-related operations, in which case the graph has to be converted between the structures of the two frameworks. This article collects and summarizes those conversion operations.

1. Data preparation

This article uses a simple undirected graph as the running example, in both a homogeneous and a heterogeneous version:

[Figure: the example homogeneous graph and heterogeneous graph]

1. Build a PyG homogeneous graph

import torch
from torch_geometric.data import Data

data = Data()

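# Initialize node features: 3 nodes, 1 feature each
data.x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

# Initialize the edge index; each undirected edge is stored in both directions
data.edge_index = torch.tensor([[0, 1, 1, 2],
                                [1, 0, 2, 1]], dtype=torch.long)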

2. Construct PyG heterogeneous graph

import torch
from torch_geometric.data import HeteroData

data = HeteroData()  
  
# Initialize node features
# [num_papers, num_features_paper]
data['paper'].x = torch.tensor([[0, 1, 2]], dtype=torch.float)
# [num_authors, num_features_author]
data['author'].x = torch.tensor([[-1], [1]], dtype=torch.float)

# Initialize edge indices
# [2, num_edges_writes]
data['author', 'writes', 'paper'].edge_index = torch.tensor([[0, 1],  
                                                             [0, 0]], dtype=torch.long)

data['paper', 'belongs', 'author'].edge_index = torch.tensor([[0, 0],  
                                                              [0, 1]], dtype=torch.long)

3. Build a networkx homogeneous graph

import networkx as nx

# Create an undirected graph
G = nx.Graph()

# Nodes can be added with add_node or add_nodes_from
G.add_nodes_from([0, 1, 2])

# Edges can be added with add_edge or add_edges_from
G.add_edges_from([[0, 1], [1, 2]])

4. Build networkx heterogeneous graph

import networkx as nx  
  
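# Create an undirected graph
G = nx.Graph()

# Give nodes a type attribute (the attribute name is arbitrary) to distinguish node types
G.add_nodes_from([0, 2], type='author')
G.add_nodes_from([1], type='paper')

# Give edges a type attribute (the attribute name is arbitrary) to distinguish edge types
G.add_edges_from([[0, 1], [1, 2]], type='writes')

# Get the node and edge types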
node_labels = nx.get_node_attributes(G, 'type')  
edge_labels = nx.get_edge_attributes(G, 'type')

2. Homogeneous graph conversion

1. PyG to networkx

(1) Use the to_networkx method to directly convert

from torch_geometric.utils.convert import to_networkx

G = to_networkx(data)
  • Advantages: simple, efficient
  • Disadvantages: may run out of memory on large graphs

(2) Convert by adding nodes and edges

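import networkx as nx

G = nx.Graph()

# add_nodes_from adds nodes in one batch, which is faster than repeated add_node calls
G.add_nodes_from(range(data.x.shape[0]))

# add_edges_from adds edges in one batch, which is faster than repeated add_edge calls
# edge_index has shape [2, num_edges]; transpose it into a list of (source, target) pairs
edges = data.edge_index.t().tolist()
G.add_edges_from(edges)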

  • Advantages: Suitable for larger graphs
  • Disadvantages: more complicated

2. Networkx to PyG

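import networkx as nx
import numpy as np
import torch
from torch_geometric.data import Data

# Create the node feature matrix (a constant placeholder feature per node)
x = torch.ones((G.number_of_nodes(), 1), dtype=torch.float)

# Get the adjacency matrix of G as a sparse COO matrix
adj = nx.to_scipy_sparse_array(G).tocoo()

# Row indices of the nonzero entries
row = torch.from_numpy(adj.row.astype(np.int64))
# Column indices of the nonzero entries
col = torch.from_numpy(adj.col.astype(np.int64))

# Stack rows and columns into a tensor of shape [2, num_edges]:
# the first row is `row`, the second is `col`
edge_index = torch.stack([row, col], dim=0)

data = Data(x=x, edge_index=edge_index)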

3. Heterogeneous graph conversion

1. PyG to networkx

(1) Use the to_networkx method to directly convert

from torch_geometric.utils.convert import to_networkx

data = data.to_homogeneous()   
G = to_networkx(data)
  • Advantages: simple, efficient
  • Disadvantages: may run out of memory on large graphs

(2) Convert by adding nodes and edges

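import networkx as nx

G = nx.Graph()

# Nodes are indexed per type in HeteroData, so assign each type an offset
# to give every node a unique global index
node_num = 0
nt_start = {}
for nt in data.node_types:
    nt_start[nt] = node_num
    node_num += data[nt].x.shape[0]

# add_nodes_from adds nodes in one batch, which is faster than repeated add_node calls
for nt in data.node_types:
    G.add_nodes_from([nt_start[nt] + i for i in range(data[nt].x.shape[0])], node_type=nt)

# add_edges_from adds edges in one batch, which is faster than repeated add_edge calls
for et in data.edge_types:
    edges = data[et].edge_index.t().tolist()
    # et is (source_type, relation, target_type); e is (source_index, target_index)
    G.add_edges_from([(nt_start[et[0]] + e[0], nt_start[et[2]] + e[1]) for e in edges],
                     edge_type=et[1])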
  • Advantages: Suitable for larger graphs
  • Disadvantages: more complicated
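The core of the manual conversion is the re-indexing step: each node type gets a starting offset, and a local index becomes a global one by adding the offset of its own type (the source type for the first endpoint, the target type for the second). The sketch below isolates that logic in plain Python, without PyG or networkx; the function name global_edges and its input format are hypothetical and exist only for illustration.

```python
def global_edges(num_nodes, edges):
    """Map per-type local node indices to global node ids.

    num_nodes: dict of node_type -> node count (insertion order fixes the offsets)
    edges: dict of (src_type, relation, dst_type) -> list of (src_local, dst_local)
    """
    # Assign each node type the running total of nodes seen so far
    offsets = {}
    total = 0
    for node_type, count in num_nodes.items():
        offsets[node_type] = total
        total += count

    # Shift the source by its type's offset and the target by its own type's offset
    result = []
    for (src_type, relation, dst_type), pairs in edges.items():
        for src, dst in pairs:
            result.append((offsets[src_type] + src, offsets[dst_type] + dst, relation))
    return offsets, result


offsets, edges = global_edges(
    {"paper": 1, "author": 2},
    {("author", "writes", "paper"): [(0, 0), (1, 0)]},
)
print(offsets)  # {'paper': 0, 'author': 1}
print(edges)    # [(1, 0, 'writes'), (2, 0, 'writes')]
```

Here both authors (global ids 1 and 2) point to the single paper (global id 0), matching the example graph.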

2. Networkx to PyG

Converting a heterogeneous networkx graph into a PyG graph structure is uncommon. Usually the graph is created in PyG and converted to networkx only to draw it with the plotting interface that networkx provides.
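For drawing, the node_type attribute set during conversion can be turned into one color per node. The following is a rough sketch under two assumptions: the palette is a hypothetical choice, and the final draw call is left as a comment because it needs matplotlib installed.

```python
import networkx as nx

# A small heterogeneous graph with the attribute names used in the manual conversion
G = nx.Graph()
G.add_nodes_from([0, 2], node_type="author")
G.add_nodes_from([1], node_type="paper")
G.add_edges_from([(0, 1), (1, 2)], edge_type="writes")

# Hypothetical palette; any matplotlib color names would do
palette = {"author": "tab:blue", "paper": "tab:orange"}

# nx.draw expects node_color in the same order as G.nodes
node_colors = [palette[G.nodes[n]["node_type"]] for n in G.nodes]
print(node_colors)

# With matplotlib installed, the graph could then be drawn with:
# nx.draw(G, node_color=node_colors, with_labels=True)
```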

Origin blog.csdn.net/zzy_NIC/article/details/127996911