Interpretation of Kuaishou KDD 2023 Selected Papers: Graph Contrastive Learning with Generative Adversarial Network

Recently I have become very interested in the company Kuaishou: its short-video community is full of problems related to network science and graphs. Since I am also very interested in graph structure, I plan to write a series of blog posts interpreting Kuaishou's papers at the KDD '23 top conference. I hope to have the opportunity to practice on the Kuaishou Community Science Line in the future QAQ.
This article is the first in the series; its main topic is graph contrastive learning.


  • "Graph Contrastive Learning with Generative Adversarial Network" GAN-based graph comparison learning
  • Authors: Wu Cheng (Tsinghua University), Wang Chaokun (Tsinghua University), Xu Jingcao (Tsinghua University), Liu Ziyang (Tsinghua University), Zheng Kai (Kuaishou), Wang Xiaowei (Kuaishou), Song Yang (Kuaishou); the main authors are from the Model and Application Department of the Kuaishou Community Science Line and the School of Software, Tsinghua University.
  • Paper related resources: pdf | code
  • Introduction to the paper:

A typical contrastive learning paradigm generates multiple views through an augmentation strategy and constructs positive and negative sample pairs from these views to train a model with a contrastive loss. General augmentation strategies construct contrastive views by randomly adding and deleting nodes/edges. These methods either do not consider introducing new edges at all, or may introduce random noisy edges, making it difficult to achieve optimal results.
In our work, we propose that graph distribution and graph evolution should be considered when generating contrastive views, so as to mine new edges that may exist in the future and thereby generate richer contrastive views. Specifically, we propose a Generative Adversarial Contrastive Network model named GACN, which leverages a graph generative adversarial network to adaptively learn the distribution of contrastive views and controls the generation of new edges through a two-term regularization loss. To optimize the parameters of GACN, we also propose a joint training framework for graph generative adversarial learning and graph contrastive learning. Experimental results show that GACN generates richer and more plausible contrastive views and thus achieves better performance on downstream tasks.

  • Citation format:

Cheng Wu, Chaokun Wang, Jingcao Xu, Ziyang Liu, Kai Zheng, Xiaowei Wang, Yang Song, and Kun Gai. 2023. Graph Contrastive Learning with Generative Adversarial Network. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23), August 6–10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3580305.3599370

ABSTRACT

Problem: Existing GCL methods do not take the distribution of the graph into account, so they miss potential (unseen) edges. Exploiting this information improves GCL performance in experiments.
Method: This paper uses a GAN to learn the graph distribution and improve data augmentation, and then jointly trains the graph GAN and the GCL model; this is the proposed GACN model. GACN uses the views produced by the generator to train the GNN encoder with two self-supervised losses: a graph contrastive loss and a Bayesian personalized ranking (BPR) loss.
Experiments: Experiments on 7 datasets show that GACN outperforms 12 current SOTA baselines.
Conclusions and Findings: The views GACN generates during data augmentation conform to the well-known preferential attachment rule in online networks.

Preferential Attachment: from the Barabási–Albert model proposed by Albert-László Barabási and Réka Albert in 1999. Preferential attachment means new links are assigned in proportion to nodes' existing degrees, which further amplifies the advantage of well-connected nodes. In this model, nodes with higher degree are more likely to receive new links. The resulting network is highly skewed: a small number of nodes have very large degrees, while the rest have much smaller degrees.
Build a random graph model with n nodes that has a preferential attachment component, generated by the following algorithm:
Step 1: Execute Step 2 with probability p, otherwise execute Step 3;
Step 2: Connect the new node to an existing node selected uniformly at random;
Step 3: Connect the new node to an existing node selected with probability proportional to its degree.
The degree distribution of this graph follows a power law: let $p_k$ be the probability that a randomly selected node has degree $k$; then $p_k \propto k^{-\alpha}$.
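As a concrete illustration, here is a minimal Python sketch of this growth process (the function name and the two-node seed graph are my own choices, not part of the original model description):

```python
import random

def preferential_attachment_graph(n, p, seed=None):
    """Grow a graph of n nodes following the algorithm above: each new
    node attaches to one existing node, chosen uniformly at random with
    probability p (Step 2) or in proportion to degree with probability
    1 - p (Step 3)."""
    rng = random.Random(seed)
    edges = [(0, 1)]                  # seed graph: a single edge 0-1
    degree = {0: 1, 1: 1}
    for new in range(2, n):
        if rng.random() < p:
            target = rng.choice(list(degree))       # Step 2: uniform
        else:
            nodes = list(degree)                    # Step 3: proportional to degree
            target = rng.choices(nodes, weights=[degree[v] for v in nodes])[0]
        edges.append((new, target))
        degree[new] = 1
        degree[target] += 1
    return edges
```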
Reference blog for this paragraph: [SNA] Social Network Analysis (3): Graph Theory and Graph Learning

1 INTRODUCTION

Previous GCL augmentation strategies did not consider the distribution and evolution of the graph. Graphs in online networks evolve dynamically: the absence of an edge now does not mean it will not exist in the future. Taking this into account enables better graph data augmentation.

For Fig. 2, Simple-GCL randomly replaces some existing edges with new edges in the augmented view and then evaluates link prediction performance; introducing new edges brings gains. However, because graph distributions differ across datasets, different rates of new edges are required to achieve the best performance.
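A minimal sketch of such an edge-replacement probe, assuming a PyG-style `edge_index` tensor (the helper name and interface are hypothetical, not from the paper's code):

```python
import torch

def replace_edges(edge_index, num_nodes, rate):
    """Randomly drop a fraction `rate` of existing edges and add the same
    number of uniformly random new edges (the probe described above)."""
    num_edges = edge_index.size(1)
    num_replace = int(rate * num_edges)
    keep = torch.randperm(num_edges)[num_replace:]       # surviving edges
    new = torch.randint(0, num_nodes, (2, num_replace))  # random new edges
    return torch.cat([edge_index[:, keep], new], dim=1)
```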

This paper argues that the data augmentation process of GCL needs to systematically consider the evolution of the graph, and proposes to use GAN to learn the distribution of the graph. The main challenges are as follows:

  • Automatically capturing graph features for augmentation: the graph distribution is difficult to characterize, graph data is discrete, and sampling-based graph generators are hard to train end-to-end. A graph GAN that can generate high-quality augmented views must be designed;
  • Jointly training the graph GAN and the GCL model: training the two models independently cannot guarantee that views which fool the GAN discriminator can be encoded well by the GCL model, and maintaining two sets of GNN parameters is also unnecessary. For both effectiveness and efficiency, a parameter-sharing strategy and a joint learning framework must be designed;

This paper proposes GACN to address the above challenges. GACN designs a graph GAN with a view generator and a view discriminator that learns to generate augmented views through a minimax game. Two self-supervised losses, the graph contrastive loss and the Bayesian personalized ranking (BPR) loss, are then designed to optimize the GNN encoder's parameters. To train GACN, a joint learning framework is proposed that iteratively optimizes the view generator, the view discriminator and the graph encoder in sequence.

2 RELATED WORK

2.1 Graph Contrastive Learning

The key idea of GCL is to maximize the mutual information of instances (e.g., nodes, subgraphs, graphs) between the original graph and views augmented from it. GCL has many data augmentation tricks:

  • DGI: performs row-wise shuffling on the attribute matrix and then contrasts at the node-graph level;
  • MVGRL: uses edge-diffusion augmentation;
  • GraphCL: on top of attribute masking, proposes several topology-based augmentations, including node dropout, edge dropout and subgraph sampling, to incorporate various priors (contrasting views at the graph level);
  • GRACE, GCA, GROC: use node-level same-scale contrast to learn node-level representations;
  • JOAO: a bi-level optimization framework;

Drawback: many GCL methods require trial-and-error selection and domain knowledge to choose augmented views.

2.2 Graph Generative Adversarial Network

GANs have achieved great success by devising game-theoretic minimax games. Several studies have begun to apply GANs to graphs; two important tasks are graph generation and graph embedding.

Recent work such as GASSL and AD-GCL tries to combine adversarial learning with graph contrastive learning, adversarially optimizing the graph augmentation strategy to avoid capturing redundant information. However, in graph data mining, adopting a GAN to learn the graph distribution and generate views for GCL has not been explored.

3 PRELIMINARIES


The basic definitions of Graph Representation Learning, Graph Neural Networks (GNNs) and Graph Contrastive Learning (GCL) are omitted here; please see the original paper. The GNN encoder used in this paper is the LightGCN architecture, and the paper focuses on node-level GCL.
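Since the encoder is LightGCN, here is a minimal sketch of its propagation rule, assuming a precomputed sparse normalized adjacency; details such as per-layer weighting may differ from the paper's implementation:

```python
import torch

def lightgcn_embed(adj_norm, emb0, num_layers=3):
    """LightGCN propagation: no feature transforms or nonlinearities,
    just repeated neighborhood averaging with the symmetrically
    normalized adjacency D^{-1/2} A D^{-1/2}, followed by a mean over
    the embeddings of all layers.
    adj_norm: sparse normalized adjacency; emb0: learnable embeddings."""
    layer_embs = [emb0]
    emb = emb0
    for _ in range(num_layers):
        emb = torch.sparse.mm(adj_norm, emb)   # one propagation step
        layer_embs.append(emb)
    return torch.stack(layer_embs).mean(dim=0)
```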

4 METHODS

4.1 Overview


As shown in the figure, GACN contains three modules: the View Generator, the View Discriminator and the Graph Encoder. The View Generator learns the distribution of edges and generates augmented views through edge sampling. The View Discriminator is designed to distinguish views produced by the generator from views produced by predefined augmentation strategies (e.g., edge dropout). The generator and discriminator are trained in an adversarial manner to produce high-quality views, which are subsequently used by the Graph Encoder to learn robust node representations; the encoder shares node representations with the view discriminator.
Note that no principle of graph generation is explicitly encoded in the model design; surprisingly, the views GACN learns to generate obey the well-known preferential attachment rule.

4.2 View Generator

Each edge $(i,j)$ follows a Bernoulli distribution with parameter $W_{i,j}$: the edge exists (takes value 1) with probability $W_{i,j}$ and is absent (takes value 0) with probability $1-W_{i,j}$. Here $\mathbf{W}$ is a learnable parameter matrix. However, edges in discrete form cannot be learned end-to-end directly, so a relaxation is applied to turn the sampled $P_{i,j}$ into a continuous value in $(0,1)$, which is then handled as a continuous variable.

  • $\tau_g \in (0,1]$: a hyperparameter that adjusts the continuous $P_{i,j}$. After the subtraction, the numerator of the fraction lies in $(-1,1)$; dividing by a scale factor in $(0,1]$ amplifies it, so the range of the whole fraction can theoretically expand to $(-\infty,+\infty)$ before the sigmoid is applied (the sigmoid curve is shown in the figure; see the sampling sketch below).
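For intuition, a common binary-concrete-style relaxation can be sketched as follows; the paper's exact relaxation may differ in form, and the helper below is only illustrative:

```python
import torch

def relaxed_edge_sample(w_logits, tau_g=1e-4, eps=1e-10):
    """Differentiable edge sampling sketch: inject Logistic(0, 1) noise,
    divide by the temperature tau_g, and squash with a sigmoid so each
    candidate edge becomes a soft value in (0, 1).
    w_logits: learnable scores for candidate edges (hypothetical name)."""
    u = torch.rand_like(w_logits).clamp(eps, 1 - eps)
    noise = torch.log(u) - torch.log(1 - u)           # Logistic(0, 1) noise
    return torch.sigmoid((w_logits + noise) / tau_g)  # soft edges in (0, 1)
```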


Here $\mathbf{C}$ is a candidate set of new edges; a dense matrix considering all node pairs is not used (the memory overhead would be too large!). The implementation in this paper only considers edges incident to the top-2,000 nodes by degree. The authors also tried top-5,000 and top-10,000, but training time increased enormously while the performance gain was not obvious.

4.3 View Discriminator

The view discriminator is a graph-level classifier that identifies generated views. Its input is an adjacency matrix; the output "true" indicates that the matrix was produced by a predefined augmentation strategy ($G_p$), while "false" means it was produced by the generator ($G_g$). The GNN encoder $f$ encodes a representation for each node, from which the graph-level prediction is made.
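A minimal sketch of such a graph-level classifier, reusing the shared GNN encoder; the mean-pooling readout and MLP head here are my assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ViewDiscriminator(nn.Module):
    """Graph-level classifier sketch: encode nodes with the shared GNN
    encoder f, mean-pool them into a graph vector, and predict whether
    the view came from a predefined augmentation (true) or the
    generator (false)."""
    def __init__(self, encoder, dim):
        super().__init__()
        self.encoder = encoder                   # shared GNN encoder f
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 1))

    def forward(self, adj_norm, emb0):
        h = self.encoder(adj_norm, emb0)         # node representations
        g = h.mean(dim=0)                        # graph-level mean readout
        return torch.sigmoid(self.head(g))       # P(view is "real")
```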

4.4 Graph Encoder


The contrastive loss takes the dot product of the node embeddings from two different generated views and then uses a softmax to compute the loss.
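Minimal sketches of the encoder's two self-supervised losses, assuming node embeddings `z1`, `z2` from the two views and index tensors `u`, `i`, `j` for BPR triples; the exact formulations in the paper may differ:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, tau_f=0.5):
    """InfoNCE-style node contrastive loss: the same node in the two
    views is the positive pair; all other nodes act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau_f                 # pairwise dot products
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)       # softmax over candidates

def bpr_loss(z, u, i, j):
    """Bayesian Personalized Ranking: score the observed edge (u, i)
    above the unobserved edge (u, j)."""
    pos = (z[u] * z[i]).sum(dim=1)
    neg = (z[u] * z[j]).sum(dim=1)
    return -F.logsigmoid(pos - neg).mean()
```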

4.5 Model Optimization

This section introduces the parameter optimization process of GACN; the three modules are optimized in sequence.
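A toy illustration of this alternating update pattern (the modules and losses below are placeholders standing in for GACN's generator, discriminator and encoder objectives):

```python
import torch

# Three stand-in modules, each with its own optimizer, updated one after
# another per epoch. The squared-output losses are placeholders for
# GACN's adversarial, regularization, contrastive and BPR objectives.
gen = torch.nn.Linear(8, 8)
disc = torch.nn.Linear(8, 1)
enc = torch.nn.Linear(8, 8)
opts = {m: torch.optim.Adam(m.parameters(), lr=1e-3) for m in (gen, disc, enc)}

x = torch.randn(32, 8)
for epoch in range(10):
    for module in (gen, disc, enc):     # generator -> discriminator -> encoder
        opts[module].zero_grad()
        loss = module(x).pow(2).mean()  # placeholder loss
        loss.backward()
        opts[module].step()
```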

5 EXPERIMENTS

  • RQ1: How does GACN perform w.r.t. node classification task?
  • RQ2: How does GACN perform w.r.t. link prediction task?
  • RQ3: What are the benefits of the proposed modules of GACN?
  • RQ4: Can the generator of GACN generate high-quality graphs for contrastive learning?
  • RQ5: How do different settings influence the effectiveness of GACN?

5.1 Experimental Settings

5.1.1 Datasets

  • The nodes of Cora are publications and the edges represent citations. Each node has a 1,433-dimensional sparse 0/1 word vector;
  • Citeseer is similar to Cora; its nodes fall into 6 categories and the node feature dimension is 3,703;
  • UCI is the message network of UCI students;
  • Taobao is user behavior data from Taobao;
  • Amazon is built from product metadata, with edges linking related products; this paper uses the Electronics category;
  • Last.fm is users' music-listening preference data. This paper uses the <user, artist> subset of <user, artist, song> to build the network;
  • Kuaishou is the short-video viewing behavior data of Kuaishou users;

5.1.2 Baseline Methods

Compared with 12 SOTA baselines. Graph representation learning: DeepWalk, LINE, node2vec and LightGCN; graph contrastive learning: SimpleGCL, DGI, GraphCL, GRACE and SGL; graph generative and adversarial learning: GraphGAN, AD-GCL and GraphMAE. Since this paper focuses on node-level tasks, algorithms designed for graph-level tasks, such as GCA, JOAO, MVGRL and GASSL, are not selected.

5.1.3 Parameter Settings

Implemented in PyTorch; training uses the Adam optimizer with a learning rate of 0.001. Default parameters:
$\tau_g=0.0001$, $\tau_f=0.5$, $\lambda_g=0.5$, $\lambda_{cnt}=1$, $\lambda_{new}=0.5$, $\gamma=0.75$.
For Cora, Citeseer and UCI:
$\lambda_{gcl}=1$, $\lambda_{bpr}=0.0001$;
for the rest of the datasets:
$\lambda_{gcl}=0.0001$, $\lambda_{bpr}=1$.
For all baselines, parameters are tuned according to the validation set, and the best results are recorded. The experiment is carried out on a single GTX 1080Ti GPU, and the embedding size is 128.

5.1.4 Metrics

For the node classification task, three metrics are used: P(recision), R(ecall) and F1.
For the link prediction task, two metrics are used: MRR and H(it rate)@k; this article reports H@50, and the results for k=20 and k=100 are similar.
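A small sketch of the two link-prediction metrics, assuming the 1-based rank of each ground-truth item among its candidates has already been computed:

```python
import numpy as np

def mrr_and_hit_at_k(ranks, k=50):
    """MRR and Hit@k given the 1-based rank of each ground-truth item
    among its candidates."""
    ranks = np.asarray(ranks, dtype=float)
    return (1.0 / ranks).mean(), (ranks <= k).mean()

# e.g. mrr_and_hit_at_k([1, 3, 120], k=50) -> (~0.447, ~0.667)
```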

5.2 Node Classification (RQ1)

Node embeddings are generated in an unsupervised manner, and a linear classifier is then trained on top of them to classify nodes. Results are averaged over 10 different random-seed initializations, and the final average accuracy is reported.
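A sketch of this standard linear-evaluation protocol, assuming frozen embeddings `emb` and hypothetical index arrays for the train/test split:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support

def linear_eval(emb, labels, train_idx, test_idx):
    """Fit a linear classifier on frozen unsupervised embeddings and
    report macro P / R / F1 on the held-out nodes."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(emb[train_idx], labels[train_idx])
    pred = clf.predict(emb[test_idx])
    p, r, f1, _ = precision_recall_fscore_support(
        labels[test_idx], pred, average="macro")
    return p, r, f1
```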

5.3 Link Prediction (RQ2)

The authors' explanation: GACN achieves the best results because it can explore unseen edges by learning the graph distribution, thereby providing high-quality views for GCL. SGL also uses a contrastive loss and the BPR loss, but GACN achieves better results, which demonstrates the effectiveness of combining GCL and GAN. [This work seems to be mainly an innovation built on top of SGL.]

5.4 Ablation Study (RQ3)



The ablation study shows: 1) the regularization losses help the generator produce relevant views that promote contrastive learning; 2) the graph GAN is essential to GACN, and joint learning yields positive gains; 3) the self-supervised losses are essential to GACN, because the GAN in GACN performs graph-level classification rather than learning node representations.

5.5 Quality of Generated Graphs (RQ4)

This subsection explores the quality of the views produced by the generator: the effects of $\lambda_{cnt}$ and $\lambda_{new}$, the distribution of new edges, and a case study.

5.5.1 Impact of $\lambda_{cnt}$ and $\lambda_{new}$

The UCI dataset is selected here. Link prediction is performed under different settings of $\lambda_{cnt}$ and $\lambda_{new}$; in each training epoch, 10 views are generated, and the average total number of edges and the average number of new edges are computed (a sketch of this statistic is given below). As shown in the figure, $\lambda_{cnt}$ helps stabilize the number of edges, while $\lambda_{new}$ limits the number of new edges.
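A sketch of how such statistics could be computed, assuming dense 0/1 adjacency matrices for simplicity:

```python
import torch

def view_edge_stats(views, orig_adj):
    """Average total number of edges and average number of new edges
    (edges absent from the original graph) over a list of generated
    0/1 adjacency matrices."""
    totals = [v.sum().item() for v in views]
    news = [((v > 0) & (orig_adj == 0)).sum().item() for v in views]
    return sum(totals) / len(views), sum(news) / len(views)
```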

5.5.2 New Edge Distribution

Compared to adding new edges randomly, GACN adjusts the number of new edges during training and generates more edges for nodes with higher degree.

5.5.3 Case Study

To better understand the views generated by GACN, some nodes in UCI are randomly sampled and their two-hop neighborhoods are visualized. Figure 6 shows that GACN tends to attach nodes to high-degree nodes and remove other edges, which confirms that GACN indeed learns the preferential attachment rule and can generate plausible views for contrastive learning.

5.6 Parameter Sensitivity (RQ5)

This section analyzes the sensitivity of GACN's hyperparameters: the embedding size $s$, the hyperparameters $\tau_g$ and $\lambda_g$, and the contrastive temperature hyperparameter $\tau_f$. The MRR ratio $\eta=\frac{\text{MRR with current settings}}{\text{MRR with default settings}}$ is recorded.

a) The larger the embedding dimension, the better the performance. b) The model is insensitive to $\tau_g$, although larger values may hurt performance. c) The model is sensitive to $\lambda_g$: when its value is too small the views become sparse and carry no information for GCL, and when it is too large dense views are generated, which harms the robustness of node representations; a setting in $[0.5, 0.7]$ is usually better. d) Different datasets require different $\tau_f$ for the best performance; setting it to 0.5 gives competitive performance.

The sensitivity of the view generator's parameters, i.e., $\lambda_{cnt}$, $\lambda_{new}$ and $\gamma$, and how they affect GCL is also analyzed. e) $\lambda_{cnt}$ set to 1 achieves competitive results. f) A small $\lambda_{new}$ is preferred, but setting it to 0 can introduce a large number of unseen edges and hurt performance on some datasets; therefore 0.25 is a good choice for $\lambda_{new}$. g) Sensitivity to $\gamma$ differs across datasets; a setting of 0.75 is relatively good.

6 CONCLUSION

Similar to the abstract; omitted here.


Origin blog.csdn.net/qq_33583069/article/details/132105617