[PyTorch Neural Network Hands-On Case] 46: Paper Classification on the Cora Dataset with a Graph Attention Network (GAT)

The attention mechanism takes a variable-length input and makes decisions by focusing attention on the most relevant parts. It is commonly combined with RNN- or CNN-based methods.

1 Task Description

[Main goal: apply the attention mechanism to a graph neural network and build a graph attention network]

1.1 Objective

There is a dataset that records information about papers. It contains each paper's keywords and category, as well as the citation relations between papers. Build an AI model that analyzes the paper information in the dataset, learns the classification characteristics of the papers with known categories, and predicts the category of the papers whose class is unknown.

1.2 Graph Attention Network (GAT)

Graph Attention Network (GAT) adds a hidden self-attention layer on top of GCN. By stacking self-attention layers, different weights can be assigned to different vertices within a neighborhood during the convolution step, while handling neighborhoods of varying sizes.

In practice, the self-attention mechanism can compute several sets of weights (heads) at the same time, without sharing weights between them, so that each vertex can better judge which of its neighbors' information is relevant and which is negligible.
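For reference, the attention coefficients in the original GAT formulation (Veličković et al., 2018) are computed as shown below; the layer implemented later in this example follows the same pattern, except that it uses the Mish activation in place of LeakyReLU:

$$
e_{ij} = \mathrm{LeakyReLU}\!\left(a^{\top}\,[W h_i \,\|\, W h_j]\right),\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},\qquad
h_i' = \sigma\!\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, W h_j\Big)
$$

Here $W$ is the shared linear transform, $a$ is the attention weight vector, $\|$ denotes concatenation, and $\mathcal{N}_i$ is the neighborhood of vertex $i$; with multiple heads, each head has its own $W$ and $a$, and the head outputs are concatenated.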

2 Writing the Code

This section walks through the code for the graph attention network built in this example.

2.1 Code Implementation: Import Basic Modules and Set Up the Runtime Environment ---- Cora_GAT.py (Part 1)

from pathlib import Path  # Path objects improve path compatibility
# Matrix-computation libraries
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix, csr_matrix, diags, eye
# Deep-learning framework
import torch
from torch import nn
import torch.nn.functional as F
# Plotting library
import matplotlib.pyplot as plt
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# 1.1 Import basic modules and set up the runtime environment
# Report the available computing device
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)  # prints cuda (or cpu if no GPU is available)

# Print the sample path
path = Path('./data/cora')
print(path)  # prints data/cora

Output:

2.2 Code Implementation: Read and Parse the Paper Data ---- Cora_GAT.py (Part 2)

# 1.2 Read and parse the paper data
# Read the paper content file and convert it into an array
paper_features_label = np.genfromtxt(path/'cora.content', dtype=np.str_)  # path/'cora.content' uses the Path object to build the path 'data/cora/cora.content'
print(paper_features_label, np.shape(paper_features_label))  # print the dataset content and its shape

# Take the first column of the dataset: the paper IDs
papers = paper_features_label[:, 0].astype(np.int32)
print("Paper ID sequence:", papers)  # print all paper IDs
# Re-number the papers and map the new indices to the paper IDs for unified management
paper2idx = {k: v for v, k in enumerate(papers)}

# Take the word labels in the middle of each row and convert them into a sparse matrix
features = csr_matrix(paper_features_label[:, 1:-1], dtype=np.float32)
print("Shape of the word-label matrix:", np.shape(features))  # shape of the word-label matrix

# Take the paper category from the last column of each row and convert it into a class index
labels = paper_features_label[:, -1]
lbl2idx = {k: v for v, k in enumerate(sorted(np.unique(labels)))}
labels = [lbl2idx[e] for e in labels]
print("Paper category indices:", lbl2idx, labels[:5])

Output:
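As a minimal sketch of the re-indexing step (using a few illustrative IDs rather than the full dataset), the paper2idx dictionary simply maps each original paper ID to a consecutive index:

# Hypothetical example: three paper IDs taken from the first column of a cora.content-like file
toy_papers = np.array([31336, 1061127, 1106406], dtype=np.int32)
toy_paper2idx = {k: v for v, k in enumerate(toy_papers)}
print(toy_paper2idx)  # maps each original ID to a consecutive index: 31336 -> 0, 1061127 -> 1, 1106406 -> 2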

2.3 Read and Parse the Paper Citation Data

Load the citation data and convert the relations expressed with the original paper IDs into relations over the new indices. Treat each paper as a vertex and each citation between papers as an edge, so that the citation data can be represented as a graph structure.

Compute the adjacency matrix of this graph structure and convert it into the adjacency matrix of an undirected graph.

2.3.1 Code Implementation: Build the Adjacency Matrix ---- Cora_GAT.py (Part 3)

# 1.3 Read and parse the paper citation data
# Read the citation file and convert it into an array
edges = np.genfromtxt(path/'cora.cites', dtype=np.int32)  # read the citation relations between papers
print(edges, np.shape(edges))
# Convert to relations between re-numbered nodes: map the paper IDs used in the dataset to the new indices
edges = np.asarray([paper2idx[e] for e in edges.flatten()], np.int32).reshape(edges.shape)
print("Relations between re-numbered nodes:", edges, edges.shape)
# Build the adjacency matrix; its rows and columns both equal the number of papers. The graph defined by the citation relations generates the adjacency matrix.
adj = coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])), shape=(len(labels), len(labels)), dtype=np.float32)
# Build a symmetric matrix for the undirected graph: convert the directed adjacency matrix into an undirected one.
# Tip: the task is paper classification, and a citation mainly indicates that two papers are related; what matters is whether a relation exists, not its direction, so an undirected graph is sufficient.
adj_long = adj.multiply(adj.T < adj)
adj = adj_long + adj_long.T

Output:
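To see what the last two lines of Part 3 do, they can be applied to a tiny hand-made adjacency matrix (a hypothetical 3-node graph, independent of the Cora data):

# Hypothetical 3-node directed graph with edges 0->1 and 2->1
demo_adj = coo_matrix((np.ones(2), ([0, 2], [1, 1])), shape=(3, 3), dtype=np.float32)
demo_long = demo_adj.multiply(demo_adj.T < demo_adj)  # keep the one-directional edges
demo_sym = demo_long + demo_long.T                    # mirror them to obtain a symmetric (undirected) matrix
print(demo_sym.todense())  # [[0. 1. 0.], [1. 0. 1.], [0. 1. 0.]]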

2.4 Code Implementation: Process the Graph-Structure Matrix Data ---- Cora_GAT.py (Part 4)

# 1.4 Process the graph-structure matrix data
def normalize_adj(mx):
    rowsum = np.array(mx.sum(1))              # degree of each node
    r_inv = np.power(rowsum, -0.5).flatten()  # D^(-1/2)
    r_inv[np.isinf(r_inv)] = 0.0              # replace inf (isolated nodes) with 0
    r_mat_inv = diags(r_inv)
    return mx.dot(r_mat_inv).transpose().dot(r_mat_inv)  # symmetric (Laplacian-style) normalization of the adjacency matrix

adj = normalize_adj(adj + eye(adj.shape[0]))  # add self-loops, then apply the symmetric normalization
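normalize_adj() performs the symmetric normalization D^(-1/2)·A·D^(-1/2). A minimal sanity check on a hypothetical 2-node graph (unrelated to the Cora matrices) confirms this:

# Hypothetical check: two nodes joined by one edge, plus self-loops added by eye()
demo = coo_matrix(np.array([[0., 1.], [1., 0.]], dtype=np.float32))
demo_norm = normalize_adj(demo + eye(demo.shape[0]))
print(demo_norm.todense())  # every entry is 0.5, since both node degrees equal 2 and 1/sqrt(2) * 1 * 1/sqrt(2) = 0.5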

2.5 Convert the Data to Tensors and Allocate Computing Resources

Convert the processed graph-structure matrix data into PyTorch tensors and split it into three parts for training, validation, and testing.

2.5.1 Code Implementation: Convert the Data to Tensors and Allocate Computing Resources ---- Cora_GAT.py (Part 5)

# 1.5 Convert the data to tensors and allocate computing resources
adj = torch.FloatTensor(adj.todense())            # relations between nodes; todense() converts back to a dense matrix
features = torch.FloatTensor(features.todense())  # each node's own features
labels = torch.LongTensor(labels)                 # the class label of each node

# Split the dataset
n_train = 200                             # size of the training set
n_val = 300                               # size of the validation set
n_test = len(features) - n_train - n_val  # size of the test set
np.random.seed(34)
idxs = np.random.permutation(len(features))  # shuffle the original indices

# Compute the indices of each split
idx_train = torch.LongTensor(idxs[:n_train])               # training-set indices, according to the specified training-set size
idx_val = torch.LongTensor(idxs[n_train:n_train + n_val])  # validation-set indices, according to the specified validation-set size
idx_test = torch.LongTensor(idxs[n_train + n_val:])        # test-set indices, according to the specified test-set size

# Move the data to the computing device
adj = adj.to(device)
features = features.to(device)
labels = labels.to(device)
idx_train = idx_train.to(device)
idx_val = idx_val.to(device)
idx_test = idx_test.to(device)

2.6 Code Implementation: Define the Mish Activation Function and the Graph Attention Layer Class ---- Cora_GAT.py (Part 6)

# 1.6 Define the Mish activation function and the graph attention layer class
def mish(x):  # often performs better than ReLU
    return x * (torch.tanh(F.softplus(x)))

# Graph attention layer class
class GraphAttentionLayer(nn.Module):  # graph attention layer
    # Initialization
    def __init__(self, in_features, out_features, dropout=0.6):
        super(GraphAttentionLayer, self).__init__()
        self.dropout = dropout
        self.in_features = in_features    # input feature dimension
        self.out_features = out_features  # output feature dimension
        self.W = nn.Parameter(torch.zeros(size=(in_features, out_features)))
        nn.init.xavier_uniform_(self.W)   # initialize the fully connected weights
        self.a = nn.Parameter(torch.zeros(size=(2 * out_features, 1)))
        nn.init.xavier_uniform_(self.a)   # initialize the attention weights

    def forward(self, input, adj):
        h = torch.mm(input, self.W)  # fully connected transform
        N = h.size()[0]
        # Repeat the transformed features along the batch and feature dimensions and concatenate the copies.
        # This pairs up the vertex features exhaustively, so every row of the result contains the features of two vertices.
        # The attention below is then computed for each pair of vertex features.
        a_input = torch.cat([h.repeat(1, N).view(N * N, -1), h.repeat(N, 1)], dim=1).view(N, -1, 2 * self.out_features)  # pair the vertex features two by two; resulting shape [N, N, 2 * self.out_features]
        e = mish(torch.matmul(a_input, self.a).squeeze(2))  # compute the attention scores

        zero_vec = -9e15 * torch.ones_like(e)  # a very negative value used to fill the filtered-out attention entries; if they were simply set to 0, the model would fail to converge
        attention = torch.where(adj > 0, e, zero_vec)  # filter the attention: keep only entries whose edge in the adjacency matrix is greater than 0, so attention is computed only over vertex pairs connected in the graph
        attention = F.softmax(attention, dim=1)  # normalize the attention with F.softmax() so the scores in each row sum to 1
        attention = F.dropout(attention, self.dropout, training=self.training)
        h_prime = torch.matmul(attention, h)  # apply the attention to the fully connected result to finish the computation
        return mish(h_prime)
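A quick shape check on random data (a hypothetical 4-node toy graph, not part of the original script) shows how the layer is used: it takes an [N, in_features] feature matrix and an [N, N] adjacency matrix and returns an [N, out_features] result:

# Hypothetical shape check on a random 4-node graph
toy_feat = torch.rand(4, 8)                 # 4 nodes with 8 input features each
toy_adj = torch.tensor([[1., 1., 0., 0.],
                        [1., 1., 1., 0.],
                        [0., 1., 1., 1.],
                        [0., 0., 1., 1.]])  # adjacency matrix with self-loops
layer = GraphAttentionLayer(in_features=8, out_features=3)
print(layer(toy_feat, toy_adj).shape)       # torch.Size([4, 3])

Note that a_input has shape [N, N, 2 * out_features], so memory grows quadratically with the number of vertices; this is acceptable for Cora (about 2,700 nodes) but becomes expensive on much larger graphs.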

2.7 Code Implementation: Build the Graph Attention Model ---- Cora_GAT.py (Part 7)

# 1.7 Build the graph attention model
class GAT(nn.Module):  # graph attention model class
    def __init__(self, nfeat, nclasses, nhid, dropout, nheads):  # the model supports several attention heads computed in parallel; nheads specifies the number of heads
        super(GAT, self).__init__()
        # Attention layers
        self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout) for _ in range(nheads)]  # build one attention layer per head
        for i, attention in enumerate(self.attentions):  # register the attention layers as sub-modules
            self.add_module('attention_{}'.format(i), attention)
        # Output layer
        self.out_att = GraphAttentionLayer(nhid * nheads, nclasses, dropout)

    def forward(self, x, adj):  # forward pass
        x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
        return self.out_att(x, adj)


n_labels = labels.max().item() + 1  # number of classes: 7
n_features = features.shape[1]      # node feature dimension: 1433
print(n_labels, n_features)         # prints 7 and 1433

def accuracy(output, y):  # compute the accuracy
    return (output.argmax(1) == y).type(torch.float32).mean().item()

### Define the training step. Unlike ordinary deep-learning tasks, graph convolution needs the relation data between samples during training.
# Because the relation data is a square matrix whose side equals the number of nodes, the number of samples passed in must also equal the number of nodes;
# when computing the loss, the training-set results are picked out of the full output by index.
def step():  # train the model for one step. Tip: in graph convolution tasks, the full graph-structure matrix must be fed in, both for training and for prediction
    model.train()
    optimizer.zero_grad()
    output = model(features, adj)  # feed all the data into the model, but compute the loss only on the training set
    loss = F.cross_entropy(output[idx_train], labels[idx_train])
    acc = accuracy(output[idx_train], labels[idx_train])  # compute the accuracy
    loss.backward()
    optimizer.step()
    return loss.item(), acc

def evaluate(idx):  # evaluate the model. Tip: in graph convolution tasks, the full graph-structure matrix must be fed in, both for training and for prediction
    model.eval()
    output = model(features, adj)  # feed all the data into the model and evaluate the results at the given indices
    loss = F.cross_entropy(output[idx], labels[idx]).item()
    return loss, accuracy(output[idx], labels[idx])
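The same kind of hypothetical shape check can be applied to the full model before training; the nheads head outputs of size nhid are concatenated (giving nhid * nheads features) before the output layer maps them to nclasses scores:

# Hypothetical shape check: 4 nodes, 8 input features, 3 classes, nhid=2, 2 heads
toy_model = GAT(nfeat=8, nclasses=3, nhid=2, dropout=0.1, nheads=2)
toy_out = toy_model(toy_feat, toy_adj)  # reuses the toy tensors from the previous sketch
print(toy_out.shape)                    # torch.Size([4, 3]) -- one score per class for each node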

2.8 The Ranger Optimizer

A graph neural network should not have too many layers; about 3 is typical. This example implements a 3-layer graph network, and the dimensional changes of each layer are shown in Figure 9-15.

Train the model with a loop and visualize the results.

2.8.1 Code Implementation: Train the Model with the Ranger Optimizer and Visualize the Results ---- Cora_GAT.py (Part 8)

# 1.8 Train the model with the Ranger optimizer and visualize the results
model = GAT(n_features, n_labels, 16, 0.1, 8).to(device)  # the last 3 arguments of GAT are the hidden dimension per head (16), the dropout rate (0.1), and the number of attention heads (8)

from tqdm import tqdm
from Cora_ranger import *  # import the Ranger optimizer
optimizer = Ranger(model.parameters())  # use the Ranger optimizer

# Train the model
epochs = 1000
print_steps = 50
train_loss, train_acc = [], []
val_loss, val_acc = [], []
for i in tqdm(range(epochs)):
    tl, ta = step()
    train_loss = train_loss + [tl]
    train_acc = train_acc + [ta]
    if (i + 1) % print_steps == 0 or i == 0:
        tl, ta = evaluate(idx_train)
        vl, va = evaluate(idx_val)
        val_loss = val_loss + [vl]
        val_acc = val_acc + [va]
        print(f'{i + 1:6d}/{epochs}: train_loss={tl:.4f}, train_acc={ta:.4f}' + f', val_loss={vl:.4f}, val_acc={va:.4f}')

# Print the final results
final_train, final_val, final_test = evaluate(idx_train), evaluate(idx_val), evaluate(idx_test)
print(f'Train     : loss={final_train[0]:.4f}, accuracy={final_train[1]:.4f}')
print(f'Validation: loss={final_val[0]:.4f}, accuracy={final_val[1]:.4f}')
print(f'Test      : loss={final_test[0]:.4f}, accuracy={final_test[1]:.4f}')

# Visualize the training process
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
ax = axes[0]
axes[0].plot(train_loss[::print_steps] + [train_loss[-1]], label='Train')
axes[0].plot(val_loss, label='Validation')
axes[1].plot(train_acc[::print_steps] + [train_acc[-1]], label='Train')
axes[1].plot(val_acc, label='Validation')
for ax, t in zip(axes, ['Loss', 'Accuracy']): ax.legend(), ax.set_title(t, size=15)

# Print the model's predictions
output = model(features, adj)
samples = 10
idx_sample = idx_test[torch.randperm(len(idx_test))[:samples]]
# Compare the sample labels with the predictions
idx2lbl = {v: k for k, v in lbl2idx.items()}
df = pd.DataFrame({'Real': [idx2lbl[e] for e in labels[idx_sample].tolist()], 'Pred': [idx2lbl[e] for e in output[idx_sample].argmax(1).tolist()]})
print(df)
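Depending on how the script is run (an assumption, since the original post only shows screenshots of the charts), matplotlib may not open a window automatically; adding a final call displays the two charts:

plt.show()  # display the loss and accuracy curves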

2.9 Summary of Program Results

2.9.1 Training Process

2.9.2 Training Results

3 Code Summary

3.1 Cora_GAT.py

from pathlib import Path  # Path objects improve path compatibility
# Matrix-computation libraries
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix, csr_matrix, diags, eye
# Deep-learning framework
import torch
from torch import nn
import torch.nn.functional as F
# Plotting library
import matplotlib.pyplot as plt
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# 1.1 Import basic modules and set up the runtime environment
# Report the available computing device
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)  # prints cuda (or cpu if no GPU is available)

# Print the sample path
path = Path('./data/cora')
print(path)  # prints data/cora

# 1.2 Read and parse the paper data
# Read the paper content file and convert it into an array
paper_features_label = np.genfromtxt(path/'cora.content', dtype=np.str_)  # path/'cora.content' uses the Path object to build the path 'data/cora/cora.content'
print(paper_features_label, np.shape(paper_features_label))  # print the dataset content and its shape

# Take the first column of the dataset: the paper IDs
papers = paper_features_label[:, 0].astype(np.int32)
print("Paper ID sequence:", papers)  # print all paper IDs
# Re-number the papers and map the new indices to the paper IDs for unified management
paper2idx = {k: v for v, k in enumerate(papers)}

# Take the word labels in the middle of each row and convert them into a sparse matrix
features = csr_matrix(paper_features_label[:, 1:-1], dtype=np.float32)
print("Shape of the word-label matrix:", np.shape(features))  # shape of the word-label matrix

# Take the paper category from the last column of each row and convert it into a class index
labels = paper_features_label[:, -1]
lbl2idx = {k: v for v, k in enumerate(sorted(np.unique(labels)))}
labels = [lbl2idx[e] for e in labels]
print("Paper category indices:", lbl2idx, labels[:5])

# 1.3 Read and parse the paper citation data
# Read the citation file and convert it into an array
edges = np.genfromtxt(path/'cora.cites', dtype=np.int32)  # read the citation relations between papers
print(edges, np.shape(edges))
# Convert to relations between re-numbered nodes: map the paper IDs used in the dataset to the new indices
edges = np.asarray([paper2idx[e] for e in edges.flatten()], np.int32).reshape(edges.shape)
print("Relations between re-numbered nodes:", edges, edges.shape)
# Build the adjacency matrix; its rows and columns both equal the number of papers. The graph defined by the citation relations generates the adjacency matrix.
adj = coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])), shape=(len(labels), len(labels)), dtype=np.float32)
# Build a symmetric matrix for the undirected graph: convert the directed adjacency matrix into an undirected one.
# Tip: the task is paper classification, and a citation mainly indicates that two papers are related; what matters is whether a relation exists, not its direction, so an undirected graph is sufficient.
adj_long = adj.multiply(adj.T < adj)
adj = adj_long + adj_long.T

# 1.4 Process the graph-structure matrix data
def normalize_adj(mx):
    rowsum = np.array(mx.sum(1))              # degree of each node
    r_inv = np.power(rowsum, -0.5).flatten()  # D^(-1/2)
    r_inv[np.isinf(r_inv)] = 0.0              # replace inf (isolated nodes) with 0
    r_mat_inv = diags(r_inv)
    return mx.dot(r_mat_inv).transpose().dot(r_mat_inv)  # symmetric (Laplacian-style) normalization of the adjacency matrix

adj = normalize_adj(adj + eye(adj.shape[0]))  # add self-loops, then apply the symmetric normalization


# 1.5 Convert the data to tensors and allocate computing resources
adj = torch.FloatTensor(adj.todense())            # relations between nodes; todense() converts back to a dense matrix
features = torch.FloatTensor(features.todense())  # each node's own features
labels = torch.LongTensor(labels)                 # the class label of each node

# Split the dataset
n_train = 200                             # size of the training set
n_val = 300                               # size of the validation set
n_test = len(features) - n_train - n_val  # size of the test set
np.random.seed(34)
idxs = np.random.permutation(len(features))  # shuffle the original indices

# Compute the indices of each split
idx_train = torch.LongTensor(idxs[:n_train])               # training-set indices, according to the specified training-set size
idx_val = torch.LongTensor(idxs[n_train:n_train + n_val])  # validation-set indices, according to the specified validation-set size
idx_test = torch.LongTensor(idxs[n_train + n_val:])        # test-set indices, according to the specified test-set size

# Move the data to the computing device
adj = adj.to(device)
features = features.to(device)
labels = labels.to(device)
idx_train = idx_train.to(device)
idx_val = idx_val.to(device)
idx_test = idx_test.to(device)

# 1.6 Define the Mish activation function and the graph attention layer class
def mish(x):  # often performs better than ReLU
    return x * (torch.tanh(F.softplus(x)))

# Graph attention layer class
class GraphAttentionLayer(nn.Module):  # graph attention layer
    # Initialization
    def __init__(self, in_features, out_features, dropout=0.6):
        super(GraphAttentionLayer, self).__init__()
        self.dropout = dropout
        self.in_features = in_features    # input feature dimension
        self.out_features = out_features  # output feature dimension
        self.W = nn.Parameter(torch.zeros(size=(in_features, out_features)))
        nn.init.xavier_uniform_(self.W)   # initialize the fully connected weights
        self.a = nn.Parameter(torch.zeros(size=(2 * out_features, 1)))
        nn.init.xavier_uniform_(self.a)   # initialize the attention weights

    def forward(self, input, adj):
        h = torch.mm(input, self.W)  # fully connected transform
        N = h.size()[0]
        # Repeat the transformed features along the batch and feature dimensions and concatenate the copies.
        # This pairs up the vertex features exhaustively, so every row of the result contains the features of two vertices.
        # The attention below is then computed for each pair of vertex features.
        a_input = torch.cat([h.repeat(1, N).view(N * N, -1), h.repeat(N, 1)], dim=1).view(N, -1, 2 * self.out_features)  # pair the vertex features two by two; resulting shape [N, N, 2 * self.out_features]
        e = mish(torch.matmul(a_input, self.a).squeeze(2))  # compute the attention scores

        zero_vec = -9e15 * torch.ones_like(e)  # a very negative value used to fill the filtered-out attention entries; if they were simply set to 0, the model would fail to converge
        attention = torch.where(adj > 0, e, zero_vec)  # filter the attention: keep only entries whose edge in the adjacency matrix is greater than 0, so attention is computed only over vertex pairs connected in the graph
        attention = F.softmax(attention, dim=1)  # normalize the attention with F.softmax() so the scores in each row sum to 1
        attention = F.dropout(attention, self.dropout, training=self.training)
        h_prime = torch.matmul(attention, h)  # apply the attention to the fully connected result to finish the computation
        return mish(h_prime)

# 1.7 Build the graph attention model
class GAT(nn.Module):  # graph attention model class
    def __init__(self, nfeat, nclasses, nhid, dropout, nheads):  # the model supports several attention heads computed in parallel; nheads specifies the number of heads
        super(GAT, self).__init__()
        # Attention layers
        self.attentions = [GraphAttentionLayer(nfeat, nhid, dropout) for _ in range(nheads)]  # build one attention layer per head
        for i, attention in enumerate(self.attentions):  # register the attention layers as sub-modules
            self.add_module('attention_{}'.format(i), attention)
        # Output layer
        self.out_att = GraphAttentionLayer(nhid * nheads, nclasses, dropout)

    def forward(self, x, adj):  # forward pass
        x = torch.cat([att(x, adj) for att in self.attentions], dim=1)
        return self.out_att(x, adj)


n_labels = labels.max().item() + 1  # number of classes: 7
n_features = features.shape[1]      # node feature dimension: 1433
print(n_labels, n_features)         # prints 7 and 1433

def accuracy(output, y):  # compute the accuracy
    return (output.argmax(1) == y).type(torch.float32).mean().item()

### Define the training step. Unlike ordinary deep-learning tasks, graph convolution needs the relation data between samples during training.
# Because the relation data is a square matrix whose side equals the number of nodes, the number of samples passed in must also equal the number of nodes;
# when computing the loss, the training-set results are picked out of the full output by index.
def step():  # train the model for one step. Tip: in graph convolution tasks, the full graph-structure matrix must be fed in, both for training and for prediction
    model.train()
    optimizer.zero_grad()
    output = model(features, adj)  # feed all the data into the model, but compute the loss only on the training set
    loss = F.cross_entropy(output[idx_train], labels[idx_train])
    acc = accuracy(output[idx_train], labels[idx_train])  # compute the accuracy
    loss.backward()
    optimizer.step()
    return loss.item(), acc

def evaluate(idx):  # evaluate the model. Tip: in graph convolution tasks, the full graph-structure matrix must be fed in, both for training and for prediction
    model.eval()
    output = model(features, adj)  # feed all the data into the model and evaluate the results at the given indices
    loss = F.cross_entropy(output[idx], labels[idx]).item()
    return loss, accuracy(output[idx], labels[idx])

# 1.8 Train the model with the Ranger optimizer and visualize the results
model = GAT(n_features, n_labels, 16, 0.1, 8).to(device)  # the last 3 arguments of GAT are the hidden dimension per head (16), the dropout rate (0.1), and the number of attention heads (8)

from tqdm import tqdm
from Cora_ranger import *  # import the Ranger optimizer
optimizer = Ranger(model.parameters())  # use the Ranger optimizer

# Train the model
epochs = 1000
print_steps = 50
train_loss, train_acc = [], []
val_loss, val_acc = [], []
for i in tqdm(range(epochs)):
    tl, ta = step()
    train_loss = train_loss + [tl]
    train_acc = train_acc + [ta]
    if (i + 1) % print_steps == 0 or i == 0:
        tl, ta = evaluate(idx_train)
        vl, va = evaluate(idx_val)
        val_loss = val_loss + [vl]
        val_acc = val_acc + [va]
        print(f'{i + 1:6d}/{epochs}: train_loss={tl:.4f}, train_acc={ta:.4f}' + f', val_loss={vl:.4f}, val_acc={va:.4f}')

# Print the final results
final_train, final_val, final_test = evaluate(idx_train), evaluate(idx_val), evaluate(idx_test)
print(f'Train     : loss={final_train[0]:.4f}, accuracy={final_train[1]:.4f}')
print(f'Validation: loss={final_val[0]:.4f}, accuracy={final_val[1]:.4f}')
print(f'Test      : loss={final_test[0]:.4f}, accuracy={final_test[1]:.4f}')

# Visualize the training process
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
ax = axes[0]
axes[0].plot(train_loss[::print_steps] + [train_loss[-1]], label='Train')
axes[0].plot(val_loss, label='Validation')
axes[1].plot(train_acc[::print_steps] + [train_acc[-1]], label='Train')
axes[1].plot(val_acc, label='Validation')
for ax, t in zip(axes, ['Loss', 'Accuracy']): ax.legend(), ax.set_title(t, size=15)

# Print the model's predictions
output = model(features, adj)
samples = 10
idx_sample = idx_test[torch.randperm(len(idx_test))[:samples]]
# Compare the sample labels with the predictions
idx2lbl = {v: k for k, v in lbl2idx.items()}
df = pd.DataFrame({'Real': [idx2lbl[e] for e in labels[idx_sample].tolist()], 'Pred': [idx2lbl[e] for e in output[idx_sample].argmax(1).tolist()]})
print(df)

3.2 Cora_ranger.py

#Ranger deep learning optimizer - RAdam + Lookahead combined.
#https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

#Ranger has now been used to capture 12 records on the FastAI leaderboard.

#This version = 9.3.19  

#Credits:
#RAdam -->  https://github.com/LiyuanLucasLiu/RAdam
#Lookahead --> rewritten by lessw2020, but big thanks to Github @LonePatient and @RWightman for ideas from their code.
#Lookahead paper --> MZhang,G Hinton  https://arxiv.org/abs/1907.08610

#summary of changes: 
#full code integration with all updates at param level instead of group, moves slow weights into state dict (from generic weights), 
#supports group learning rates (thanks @SHolderbach), fixes sporadic load from saved model issues.
#changes 8/31/19 - fix references to *self*.N_sma_threshold; 
                #changed eps to 1e-5 as better default than 1e-8.

import math
import torch
from torch.optim.optimizer import Optimizer, required
import itertools as it



class Ranger(Optimizer):

    def __init__(self, params, lr=1e-3, alpha=0.5, k=6, N_sma_threshhold=5, betas=(.95,0.999), eps=1e-5, weight_decay=0):
        #parameter checks
        if not 0.0 <= alpha <= 1.0:
            raise ValueError(f'Invalid slow update rate: {alpha}')
        if not 1 <= k:
            raise ValueError(f'Invalid lookahead steps: {k}')
        if not lr > 0:
            raise ValueError(f'Invalid Learning Rate: {lr}')
        if not eps > 0:
            raise ValueError(f'Invalid eps: {eps}')

        #parameter comments:
        # beta1 (momentum) of .95 seems to work better than .90...
        #N_sma_threshold of 5 seems better in testing than 4.
        #In both cases, worth testing on your dataset (.90 vs .95, 4 vs 5) to make sure which works best for you.

        #prep defaults and init torch.optim base
        defaults = dict(lr=lr, alpha=alpha, k=k, step_counter=0, betas=betas, N_sma_threshhold=N_sma_threshhold, eps=eps, weight_decay=weight_decay)
        super().__init__(params,defaults)

        #adjustable threshold
        self.N_sma_threshhold = N_sma_threshhold

        #now we can get to work...
        #removed as we now use step from RAdam...no need for duplicate step counting
        #for group in self.param_groups:
        #    group["step_counter"] = 0
            #print("group step counter init")

        #look ahead params
        self.alpha = alpha
        self.k = k 

        #radam buffer for state
        self.radam_buffer = [[None,None,None] for ind in range(10)]

        #self.first_run_check=0

        #lookahead weights
        #9/2/19 - lookahead param tensors have been moved to state storage.  
        #This should resolve issues with load/save where weights were left in GPU memory from first load, slowing down future runs.

        #self.slow_weights = [[p.clone().detach() for p in group['params']]
        #                     for group in self.param_groups]

        #don't use grad for lookahead weights
        #for w in it.chain(*self.slow_weights):
        #    w.requires_grad = False

    def __setstate__(self, state):
        print("set state called")
        super(Ranger, self).__setstate__(state)


    def step(self, closure=None):
        loss = None
        #note - below is commented out b/c I have other work that passes back the loss as a float, and thus not a callable closure.  
        #Uncomment if you need to use the actual closure...

        #if closure is not None:
            #loss = closure()

        #Evaluate averages and grad, update param tensors
        for group in self.param_groups:

            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data.float()
                if grad.is_sparse:
                    raise RuntimeError('Ranger optimizer does not support sparse gradients')

                p_data_fp32 = p.data.float()

                state = self.state[p]  #get state dict for this param

                if len(state) == 0:   #if first time to run...init dictionary with our desired entries
                    #if self.first_run_check==0:
                        #self.first_run_check=1
                        #print("Initializing slow buffer...should not see this at load from saved model!")
                    state['step'] = 0
                    state['exp_avg'] = torch.zeros_like(p_data_fp32)
                    state['exp_avg_sq'] = torch.zeros_like(p_data_fp32)

                    #look ahead weight storage now in state dict 
                    state['slow_buffer'] = torch.empty_like(p.data)
                    state['slow_buffer'].copy_(p.data)

                else:
                    state['exp_avg'] = state['exp_avg'].type_as(p_data_fp32)
                    state['exp_avg_sq'] = state['exp_avg_sq'].type_as(p_data_fp32)

                #begin computations 
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']

                #compute variance mov avg
                exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
                #compute mean moving avg
                exp_avg.mul_(beta1).add_(1 - beta1, grad)

                state['step'] += 1


                buffered = self.radam_buffer[int(state['step'] % 10)]
                if state['step'] == buffered[0]:
                    N_sma, step_size = buffered[1], buffered[2]
                else:
                    buffered[0] = state['step']
                    beta2_t = beta2 ** state['step']
                    N_sma_max = 2 / (1 - beta2) - 1
                    N_sma = N_sma_max - 2 * state['step'] * beta2_t / (1 - beta2_t)
                    buffered[1] = N_sma
                    if N_sma > self.N_sma_threshhold:
                        step_size = math.sqrt((1 - beta2_t) * (N_sma - 4) / (N_sma_max - 4) * (N_sma - 2) / N_sma * N_sma_max / (N_sma_max - 2)) / (1 - beta1 ** state['step'])
                    else:
                        step_size = 1.0 / (1 - beta1 ** state['step'])
                    buffered[2] = step_size

                if group['weight_decay'] != 0:
                    p_data_fp32.add_(-group['weight_decay'] * group['lr'], p_data_fp32)

                if N_sma > self.N_sma_threshhold:
                    denom = exp_avg_sq.sqrt().add_(group['eps'])
                    p_data_fp32.addcdiv_(-step_size * group['lr'], exp_avg, denom)
                else:
                    p_data_fp32.add_(-step_size * group['lr'], exp_avg)

                p.data.copy_(p_data_fp32)

                #integrated look ahead...
                #we do it at the param level instead of group level
                if state['step'] % group['k'] == 0:
                    slow_p = state['slow_buffer'] #get access to slow param tensor
                    slow_p.add_(self.alpha, p.data - slow_p)  #(fast weights - slow weights) * alpha
                    p.data.copy_(slow_p)  #copy interpolated weights to RAdam param tensor

        return loss

Source: blog.csdn.net/qq_39237205/article/details/123908293