Basic Concepts of Graph Theory and Setting Up the PyG Environment

I. Graph Representation

Most of this section is based on Chapter 2 of the book Deep Learning on Graphs: https://cse.msu.edu/~mayao4/dlg_book/chapters/chapter2.pdf
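That chapter describes a graph through matrices built from its edge list. Below is a minimal numpy sketch of three of them for an undirected, unweighted graph: the adjacency matrix A, the degree matrix D and the (unnormalized) Laplacian L = D - A. The 4-node path graph and the function name graph_matrices are our own illustrative choices, not taken from the book.

```python
import numpy as np


def graph_matrices(num_nodes, edges):
    """Build adjacency A, degree D and Laplacian L = D - A for an
    undirected, unweighted graph given as a list of (u, v) pairs."""
    A = np.zeros((num_nodes, num_nodes), dtype=int)
    for u, v in edges:
        A[u, v] = 1
        A[v, u] = 1  # undirected: A is symmetric
    D = np.diag(A.sum(axis=1))  # degree d_i = sum_j A_ij on the diagonal
    L = D - A                   # unnormalized graph Laplacian
    return A, D, L


# Hypothetical toy graph: the path 0 - 1 - 2 - 3
A, D, L = graph_matrices(4, [(0, 1), (1, 2), (2, 3)])
print(A)
print(np.diag(D))  # [1 2 2 1]
print(L)
# Every row of L sums to 0, so its smallest eigenvalue is 0 (up to rounding)
print(np.linalg.eigvalsh(L))
```

The Laplacian reappears in spectral methods, including the spectral clustering question at the end of this post.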

II. PyTorch Environment Setup

1. Install PyTorch 1.8.1 together with cudatoolkit 11.1.

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c nvidia

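2. Verify the installation. The first command prints the PyTorch version, the second the CUDA version PyTorch was built with.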

python -c "import torch; print(torch.__version__)"
python -c "import torch; print(torch.version.cuda)"

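3. Install the PyG version that matches your PyTorch and CUDA versions. The torch-1.8.0 wheel index below also serves PyTorch 1.8.1.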

pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cu111.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.8.0+cu111.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.8.0+cu111.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.8.0+cu111.html
pip install torch-geometric
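A mismatch between the PyTorch/CUDA build and the PyG extension wheels usually only shows up when a package is imported. Here is a minimal diagnostic sketch. The package list and the function check_packages are our own. The import names are the pip package names above with hyphens replaced by underscores. Mismatched wheels may fail with OSError (for example "undefined symbol") or RuntimeError (CUDA version mismatch) rather than ImportError, so all three are caught.

```python
import importlib

# Import names of the packages installed in steps 1-3
PACKAGES = ["torch", "torch_scatter", "torch_sparse", "torch_cluster",
            "torch_spline_conv", "torch_geometric"]


def check_packages(names):
    """Try to import each package; return {name: version string, or None
    if the import failed}."""
    result = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            result[name] = getattr(module, "__version__", "unknown")
        except (ImportError, OSError, RuntimeError):
            result[name] = None
    return result


if __name__ == "__main__":
    versions = check_packages(PACKAGES)
    for name, version in versions.items():
        print(f"{name}: {version if version is not None else 'NOT importable'}")
    if versions["torch"] is not None:
        import torch
        print("CUDA version PyTorch was built with:", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
```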

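The Data class: representing graphs in PyG and how to use it

Taking the assignment as an example: represent a paper-institution-author network.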

import torch

# Paper-institution-author network.
# A standalone container modeled on PyG's Data class (it does not subclass it):
# each node type and each edge type is stored in its own attribute.
class Data(object):
    def __init__(self, x_org=None, x_author=None, x_thesis=None,
                 edge_author_org_index=None, edge_author_thesis_index=None,
                 edge_author_org_attr=None, edge_author_thesis_attr=None,
                 y=None, **kwargs):
        '''
        :param x_org: institution nodes, feature matrix of shape [num_nodes, num_node_features]
        :param x_author: author nodes, feature matrix of shape [num_nodes, num_node_features]
        :param x_thesis: paper nodes, feature matrix of shape [num_nodes, num_node_features]
        :param edge_author_org_index: author-institution edges, index matrix of shape `[2, num_edges]`;
            row 0 holds the source (author) indices, row 1 the target (institution) indices
        :param edge_author_thesis_index: author-paper edges, index matrix of shape `[2, num_edges]`;
            row 0 holds the source (author) indices, row 1 the target (paper) indices
        :param edge_author_org_attr: edge feature matrix of shape `[num_edges, num_edge_features]`, None (no features) by default
        :param edge_author_thesis_attr: edge feature matrix of shape `[num_edges, num_edge_features]`, None (no features) by default
        :param y: (Tensor, optional) node-, edge- or graph-level labels of arbitrary shape
        :param kwargs: any other attributes
        '''
        # Store each node type in a separate attribute
        self.x_org = x_org          # institution nodes
        self.x_author = x_author    # author nodes
        self.x_thesis = x_thesis    # paper nodes
        self.edge_author_org_index = edge_author_org_index        # author-institution edge indices
        self.edge_author_thesis_index = edge_author_thesis_index  # author-paper edge indices
        self.edge_author_org_attr = edge_author_org_attr
        self.edge_author_thesis_attr = edge_author_thesis_attr
        self.y = y  # labels

        for key, item in kwargs.items():
            if key == 'num_nodes':
                self.__num_nodes__ = item
            else:
                self[key] = item

    # Dictionary-style access; required by `self[key] = item` above and by from_dict
    def __getitem__(self, key):
        return getattr(self, key, None)

    def __setitem__(self, key, value):
        setattr(self, key, value)

    @property
    def num_nodes_org(self):
        return self.x_org.shape[0]

    @property
    def num_nodes_author(self):
        return self.x_author.shape[0]

    @property
    def num_nodes_thesis(self):
        return self.x_thesis.shape[0]

    @classmethod
    def from_dict(cls, dictionary):
        r"""Creates a data object from a python dictionary."""
        data = cls()
        for key, item in dictionary.items():
            data[key] = item
        return data


if __name__ == "__main__":
    # Assume 4 institutions, 5 authors and 6 papers
    x_org = torch.randn(4, 10)
    x_author = torch.randn(5, 7)
    x_thesis = torch.randn(6, 9)
    # Edges; each node type is numbered from 0 (authors 0-4, institutions 0-3, papers 0-5)
    edge_author_org = torch.tensor([
        [0, 1, 2, 3, 4],  # author
        [0, 0, 0, 1, 2],  # institution
    ])
    edge_author_thesis = torch.tensor([
        [0, 1, 2, 0],  # author
        [0, 1, 2, 3],  # paper
    ])
    data = Data(x_org, x_author, x_thesis,
                edge_author_org_index=edge_author_org,
                edge_author_thesis_index=edge_author_thesis)
    print("Number of institution nodes: " + str(data.num_nodes_org))
    print("Number of author nodes: " + str(data.num_nodes_author))
    print("Number of paper nodes: " + str(data.num_nodes_thesis))
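Because x_author and x_org are separate matrices, one natural convention is to number each node type from 0. This is an assumption for this hand-written class, not something PyG enforces. Under it, every value in row 0 of an author-institution edge index must be below the number of authors, and every value in row 1 below the number of institutions. Here is a minimal numpy sketch that checks this and turns the edge index into a [num_authors, num_orgs] adjacency matrix. The function name bipartite_adjacency and the toy data are hypothetical.

```python
import numpy as np


def bipartite_adjacency(edge_index, num_src, num_dst):
    """Convert a [2, num_edges] edge index between two node types into a
    [num_src, num_dst] 0/1 matrix. Row 0 holds source-type indices
    (e.g. authors), row 1 target-type indices (e.g. institutions);
    each type is numbered separately from 0."""
    edge_index = np.asarray(edge_index, dtype=np.int64)
    if edge_index.ndim != 2 or edge_index.shape[0] != 2:
        raise ValueError("edge_index must have shape [2, num_edges]")
    src, dst = edge_index
    # Every index must be smaller than the node count of its own type
    if edge_index.size and (src.min() < 0 or dst.min() < 0
                            or src.max() >= num_src or dst.max() >= num_dst):
        raise ValueError("edge index out of range for the given node counts")
    adj = np.zeros((num_src, num_dst), dtype=int)
    adj[src, dst] = 1
    return adj


# Hypothetical toy data: 5 authors, 4 institutions
author_org = [[0, 1, 2, 3, 4],   # author index
              [0, 0, 0, 1, 2]]   # institution index
print(bipartite_adjacency(author_org, num_src=5, num_dst=4))

# Institution indices 5-7 do not exist when there are only 4 institutions
try:
    bipartite_adjacency([[0, 1, 2, 3, 4], [5, 5, 5, 6, 7]], num_src=5, num_dst=4)
except ValueError as err:
    print("rejected:", err)
```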

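The Dataset class: representing graph datasets in PyG and how to use them

Creating a dataset object and analyzing the dataset

As the code below shows, creating a dataset in PyG is simple and direct. The first time one of PyG's built-in datasets is created, PyG downloads the raw files, processes them into a Dataset object containing Data objects, and saves the result to disk.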

from torch_geometric.datasets import Planetoid
dataset = Planetoid(root='/dataset/Cora', name='Cora')
# Cora()
len(dataset)
# 1
dataset.num_classes
# 7
dataset.num_node_features
# 1433
data = dataset[0]
# Data(edge_index=[2, 10556], test_mask=[2708],
#         train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])
data.is_undirected()
# True
data.train_mask.sum().item()
# 140
data.val_mask.sum().item()
# 500
data.test_mask.sum().item()
# 1000

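The dataset contains a single graph with 2,708 nodes and 1,433-dimensional node features. Its edge_index has 10,556 entries. Because the graph is undirected and every edge is stored in both directions, this corresponds to 5,278 undirected edges. 140 nodes are used for training, 500 for validation and 1,000 for testing.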
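The 10556 in edge_index=[2, 10556] counts stored directed pairs, not undirected edges. data.is_undirected() returning True means every pair (u, v) also appears as (v, u). Below is a minimal numpy sketch of that property, written as a plain set comparison for illustration rather than PyG's own implementation. Halving the column count to get undirected edges assumes the graph has no self-loops. The triangle graph is a hypothetical example.

```python
import numpy as np


def is_undirected(edge_index):
    """True if every pair (u, v) in a [2, num_edges] edge index also
    appears as (v, u), i.e. each undirected edge is stored in both directions."""
    edge_index = np.asarray(edge_index)
    forward = set(zip(edge_index[0].tolist(), edge_index[1].tolist()))
    backward = set(zip(edge_index[1].tolist(), edge_index[0].tolist()))
    return forward == backward


# Hypothetical triangle 0-1-2, each undirected edge stored twice
edge_index = np.array([[0, 1, 1, 2, 2, 0],
                       [1, 0, 2, 1, 0, 2]])
print(is_undirected(edge_index))  # True
print(edge_index.shape[1] // 2)   # 3 undirected edges (no self-loops)
```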

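Once a graph neural network model has been defined (here a class Net that takes a Data object and returns per-node log-probabilities), it can be trained as follows:

import torch
import torch.nn.functional as F

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)  # Net: the GNN model, assumed to be defined beforehand
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    # The loss is computed only on the 140 training nodes
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()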

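The assignment itself is covered by the Data class built above.
It is also worth thinking about how graph theory and spectral clustering differ and how they relate: why does spectral clustering need the relationships encoded in a graph?
Reference: https://www.cnblogs.com/pinard/p/6221564.html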
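One way to see the connection: spectral clustering first turns the data into a similarity graph with weight matrix W, then partitions that graph using eigenvectors of its Laplacian L = D - W. Below is a minimal two-cluster sketch that simplifies the general method. Instead of running k-means on k eigenvectors, it splits the nodes by the sign of the Fiedler vector, the eigenvector for the second-smallest eigenvalue of L. The function name spectral_bipartition and the toy graph are our own.

```python
import numpy as np


def spectral_bipartition(W):
    """Split a graph into two groups by the sign of the Fiedler vector of
    the unnormalized Laplacian L = D - W.
    W: symmetric, non-negative similarity (weighted adjacency) matrix."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # second-smallest eigenvalue
    return (fiedler > 0).astype(int)


# Hypothetical graph: two triangles {0,1,2} and {3,4,5}, joined by a weak edge 2-3
W = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[u, v] = W[v, u] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels = spectral_bipartition(W)
# Each triangle gets one label; which label is 0 or 1 depends on the
# arbitrary sign of the computed eigenvector
print(labels)
```

Without the graph (the matrix W), there is no Laplacian and hence nothing to take eigenvectors of. That is why spectral clustering depends on the relationships between data points.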