Paper reading - ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning

Table of contents

Summary

1 Introduction

2 Problem Statement

3 Proposed ANEMONE Framework

3.1 Multi-Scale Contrastive Learning Model

3.1.1 Augmented Ego-Network Generation

3.1.2 Patch-Level Contrastive Network

3.1.3 Context-Level Contrastive Network

3.1.4 Joint Training

3.2 Statistical Anomaly Estimator

4 Experiments

4.1 Experimental Setup

4.1.1 Datasets

4.1.2 Baselines

4.1.3 Metric

4.2 Effectiveness Evaluation

4.3 Ablation Study and Parameter Analysis

Paper link: https://shiruipan.github.io/publication/cikm-21-jin/cikm-21-jin.pdf

Summary

        Anomaly detection on graphs plays an important role in various fields such as network security, e-commerce, and financial fraud detection. However, existing graph anomaly detection methods usually consider a single-scale view of graphs, which limits their ability to capture anomalous patterns from different perspectives. To this end, we introduce a novel graph anomaly detection framework, ANEMONE, to identify anomalies at multiple graph scales simultaneously.

        Specifically, ANEMONE first utilizes a graph neural network backbone encoder with a multi-scale contrastive learning objective to capture the pattern distribution of graph data, simultaneously learning the agreement between instances at the patch and context levels. Then, our method adopts a statistical anomaly estimator to evaluate the abnormality of each node according to its degree of agreement from multiple perspectives. Experiments on three benchmark datasets demonstrate the superiority of our method.

Keywords: Anomaly Detection, Graph Neural Networks, Contrastive Learning

1 Introduction

        In recent years, anomaly detection on graphs has received increasing attention in the data mining community [9], owing to the widespread use of graph-structured data in modeling real-world systems such as e-commerce and finance [16]. Taking e-commerce fraud detection as an example, anomaly detection algorithms can help identify fraudulent sellers by analyzing users' profiles (i.e., attributes) and connections (i.e., structures).

        Different from traditional anomaly detection methods, which consider only each sample's attribute information and ignore potential correlations between samples, graph anomaly detection takes both sample (i.e., node) attributes and topological information (i.e., node adjacency) into consideration [5]. Early methods utilized shallow mechanisms such as ego-network analysis [11], residual analysis [4], or CUR decomposition [10] to detect abnormal nodes, and thus cannot learn informative knowledge from high-dimensional attributes. Recently proposed methods [1, 5] leverage deep graph autoencoders for anomaly detection and significantly improve performance. Most recently, by introducing graph self-supervised learning [7], CoLA [6] integrated contrastive learning into graph neural networks (GNNs) [14] to efficiently detect graph anomalies.

        Despite their success, these methods mainly detect anomalies from a single-scale perspective, ignoring the fact that node anomalies in graphs often appear at different scales. For example, some e-commerce cheaters may directly transact with a small number of unrelated items/users (i.e., local anomalies), while others tend to hide within large communities run by underground industries (i.e., global anomalies). This scale heterogeneity leads to suboptimal performance of existing methods.

        To bridge this gap, we propose a graph anomaly detection framework with multi-scale contrastive learning (abbreviated as ANEMONE) to detect abnormal nodes in graphs. First, to capture abnormal patterns at different scales, our framework simultaneously performs patch-level and context-level contrastive learning via two GNN-based models. In addition, ANEMONE adopts a novel anomaly estimator that predicts the abnormality of each node by exploiting statistics over multiple rounds of contrastive scores. The main contributions of this work are summarized as follows:

        We propose ANEMONE, a multi-scale contrastive learning framework for graph anomaly detection, which captures anomalous patterns at different scales.

        We design a novel statistics-based algorithm to estimate node abnormality on top of the proposed contrastive scheme.

        We conduct extensive experiments on three benchmark datasets to demonstrate the superiority of ANEMONE in detecting node-level anomalies on graphs.

2 Problem Statement

        In this paper, we focus on the problem of anomaly detection on attributed graphs. Let $G = (A, X)$ be an attributed graph with node set $V = \{v_1, \cdots, v_n\}$. $A \in \mathbb{R}^{n \times n}$ is the binary adjacency matrix, where $A_{i,j} = 1$ if there is a link between $v_i$ and $v_j$, and $A_{i,j} = 0$ otherwise. $X \in \mathbb{R}^{n \times f}$ is the attribute matrix, whose $i$-th row $X[i,:] \in \mathbb{R}^{f}$ is the attribute vector of $v_i$. Using the above notation, we formalize the graph anomaly detection problem as follows:

        Definition 2.1 (Graph Anomaly Detection). Given an attributed graph $G = (A, X)$, the goal is to learn a scoring function $Y(\cdot): \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times f} \rightarrow \mathbb{R}^{n}$ that takes the graph as input and outputs an anomaly score vector $y$ measuring how abnormal each node is. Specifically, the $i$-th element $y^{(i)}$ of the score vector represents the degree of abnormality of $v_i$: the greater the score, the more abnormal the node.

        It is worth noting that graph anomaly detection is performed in an unsupervised setting, which means that ground-truth labels are not accessible during the training phase.

3 Proposed ANEMONE Framework

        We propose a framework, ANEMONE, based on multi-scale contrastive learning [2] for graph anomaly detection. The overall workflow of our approach is shown in Figure 1. For a selected target node, ANEMONE computes its anomaly score using two main components: a multi-scale contrastive learning model and a statistical anomaly estimator.

        In the multi-scale contrastive learning model, two GNN-based contrastive networks learn patch-level (i.e., node-to-node) agreement and context-level (i.e., node-to-ego-network) agreement, respectively. Afterwards, the statistical anomaly estimator aggregates the patch- and context-level scores obtained from multiple augmented ego networks and computes the final anomaly score of the target node through statistical estimation. We introduce these two components in the following sections.

3.1 Multi-Scale Contrastive Learning Model

3.1.1 Augmented Ego-Network Generation

        In the multi-scale contrastive learning model, we first generate two augmented ego networks of the target node as the inputs of the two contrastive networks.

        The motivation behind ego-network generation is to capture the substructure around target nodes, which has been shown to be highly correlated with node anomalies [6, 8], and to provide sufficient diversity in the input data for model training and for the statistical estimation that follows.

        With the above considerations, we adopt the random-walk-with-restart algorithm (RWR) [12] as our data augmentation strategy. Specifically, centered on the target node, we sample two fixed-size ego networks, one for each contrastive network. In each ego network, the first node in the node set is set to be the central (target) node.

        To prevent information leakage in the subsequent contrastive learning step, a preprocessing step named target node masking is applied to each ego network before it is fed into the contrastive networks. Specifically, we replace the target node's attribute vector with a zero vector, so the masked ego network retains its structure while the central node carries no attribute information.
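        To make this step concrete, the following is a minimal Python sketch of RWR-based ego-network sampling followed by target node masking. It is an illustration under assumptions rather than the authors' code: `adj` is an adjacency-list dict, `X` is the attribute matrix, and the restart probability and ego-network size are placeholder values.

```python
import numpy as np

def rwr_ego_network(adj, target, size=4, restart_p=0.5, rng=None):
    """Sample a fixed-size ego network around `target` via random walk
    with restart (RWR); `adj` maps each node id to a list of neighbours."""
    rng = rng or np.random.default_rng()
    sampled, current = {target}, target
    for _ in range(100 * size):  # step cap guards against isolated nodes
        if len(sampled) >= size:
            break
        if rng.random() < restart_p or not adj[current]:
            current = target                         # restart at the target
        else:
            current = int(rng.choice(adj[current]))  # hop to a random neighbour
        sampled.add(current)
    # convention used by the paper: the central (target) node comes first
    return [target] + [v for v in sampled if v != target]

def mask_target(X_sub):
    """Target node masking: zero the attribute vector of the first (central)
    node so the contrastive networks cannot see the target's own attributes."""
    X_masked = X_sub.copy()
    X_masked[0] = 0.0
    return X_masked

# toy usage: a 4-node graph with 8-dimensional attributes
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
X = np.random.rand(4, 8)
nodes = rwr_ego_network(adj, target=0)
X_ego = mask_target(X[nodes])   # structure kept, target attributes zeroed
```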

3.1.2 Patch-Level Contrastive Network

The goal of the patch-level contrastive network is to learn the agreement between the embedding of the masked target node within its ego network and the embedding of the original target node. First, the node embeddings of the ego network are obtained through a GNN module:

$$H = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} \tilde{X} W\right) \tag{1}$$

        where $\Theta = \{W\}$ is the parameter set of the GNN. For simplicity, here we directly use a single GCN layer [3], where $\tilde{A}$ is the ego network's adjacency matrix with self-loops added, $\tilde{D}$ is its degree matrix, $\tilde{X}$ is the masked attribute matrix, $W \in \mathbb{R}^{f \times d}$ is the weight matrix of the GCN layer with embedding dimension $d$, and $\sigma(\cdot)$ is the ReLU activation function. The GCN here can be replaced by other types of GNNs.

         For patch-level contrastive learning, we take the embedding $h_i$ of the masked target node, i.e., the first row of $H$. It is worth noting that although the corresponding input is a zero vector, this embedding is informative because the GNN aggregates the attributes of the other nodes in the ego network.

         Then, ANEMONE computes the embedding of the original (unmasked) target node. Denoting the attribute vector of $v_i$ as $x_i$, the target node embedding $z_i$ is given by:

$$z_i = \sigma\left(x_i W\right) \tag{2}$$

         The weight matrix $W$ here is shared with the GNN in Equation (1), which ensures that $h_i$ and $z_i$ are projected into the same embedding space.

         Afterwards, a contrastive learning module is constructed to learn the agreement between $h_i$ and $z_i$. Specifically, we utilize a bilinear layer to compute their similarity score:

$$s_i = \sigma\left(h_i W_p z_i^{\top}\right)$$

         where $W_p$ is a trainable scoring matrix and $\sigma(\cdot)$ here denotes the sigmoid function.

         To learn a discriminative contrastive network, we introduce a negative sampling strategy for model training. That is, alongside a given score $s_i$ (denoted the "positive score" for distinction), we compute a negative score $\tilde{s}_i$ by

$$\tilde{s}_i = \sigma\left(h_j W_p z_i^{\top}\right)$$

         where $h_j$ is obtained from the ego network centered on another node $v_j$, ensuring $i \neq j$. In practice, our contrastive learning model is trained in a mini-batch fashion, so $h_j$ can easily be taken from another target node in the same batch. Using $s_i$ and $\tilde{s}_i$, the patch-level contrastive network is trained with a Jensen-Shannon divergence [13] objective:

$$\mathcal{L}_p = -\frac{1}{2n} \sum_{i=1}^{n} \left( \log s_i + \log\left(1 - \tilde{s}_i\right) \right)$$
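        Below is a compact PyTorch sketch of the patch-level network described above. It assumes the normalized ego-net adjacency `A_hat` (i.e., $\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$) is precomputed; the class and function names are ours, and the loss follows the standard BCE-style form of JSD-based contrastive objectives, which may differ in detail from the paper's exact equation.

```python
import torch
import torch.nn as nn

class PatchLevelNet(nn.Module):
    """One GCN layer encodes the masked ego network; the same weight W
    projects the raw target attributes (Eq. (2)); a bilinear layer
    scores their agreement."""
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, emb_dim, bias=False)  # shared weight W
        self.score = nn.Bilinear(emb_dim, emb_dim, 1)    # scoring matrix W_p

    def forward(self, A_hat, X_masked, x_target):
        H = torch.relu(A_hat @ self.W(X_masked))   # Eq. (1): one GCN layer
        h_masked = H[0]                            # first row = masked target
        z = torch.relu(self.W(x_target))           # Eq. (2): shared space
        return torch.sigmoid(self.score(h_masked, z))  # agreement score

def patch_loss(s_pos, s_neg, eps=1e-8):
    """BCE/JSD-style objective: s_pos pairs a node with its own ego net,
    s_neg pairs it with another node's ego net from the same mini-batch."""
    return -0.5 * (torch.log(s_pos + eps) + torch.log(1 - s_neg + eps)).mean()
```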

3.1.3 Context-Level Contrastive Network

         Symmetrically, the context-level contrastive network has an architecture similar to the patch-level network. First, analogous to Equation (1), a siamese GNN module with its own parameter set generates node embeddings from the input ego network, using the same GCN formulation but separate weights.

        Note that the context-level contrastive network has a parameter set different from that of the patch-level contrastive network, since the contrasts at the two scales should be performed in different embedding spaces.

         The main difference between the patch-level and context-level contrasts is that the latter learns the agreement between the target node embedding and the ego-network embedding $e_i$, which is obtained by a readout module:

$$e_i = \text{Readout}(H) = \frac{1}{K} \sum_{k=1}^{K} H[k,:]$$

        In this paper, we adopt average pooling as our readout function, where $K$ is the number of nodes in the ego network.

        To project the attributes of the target node into the same embedding space, a target embedding is computed by an MLP module with its own parameters (similar to Equation (2)). Subsequently, the context-level score $s_i^{(c)}$ is estimated by a bilinear function with a scoring matrix $W_c$, analogous to the patch-level scoring above. Finally, the context-level network is trained with an objective $\mathcal{L}_c$ of the same Jensen-Shannon form as the patch-level one, with $e_i$ in place of $h_i$.
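        Continuing the sketch above (same imports and assumptions), the context-level counterpart differs only in having its own parameters and an average-pooling readout before scoring:

```python
class ContextLevelNet(nn.Module):
    """Same encoder shape as PatchLevelNet but with separate parameters;
    the readout averages all node embeddings of the ego network, and the
    bilinear layer contrasts it against the projected target attributes."""
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, emb_dim, bias=False)  # separate parameters
        self.score = nn.Bilinear(emb_dim, emb_dim, 1)    # scoring matrix W_c

    def forward(self, A_hat, X_masked, x_target):
        H = torch.relu(A_hat @ self.W(X_masked))
        e = H.mean(dim=0)                  # readout: average pooling
        z = torch.relu(self.W(x_target))   # MLP projection of target attrs
        return torch.sigmoid(self.score(e, z))
```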

3.1.4 Joint Training

         During the training phase, we jointly learn the two contrastive networks. The overall objective function is:

$$\mathcal{L} = \alpha \mathcal{L}_p + (1 - \alpha) \mathcal{L}_c$$

        where $\alpha \in [0, 1]$ is a trade-off parameter balancing the importance of the two components.
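        As a one-line sketch, the joint objective is a weighted sum of the two losses; which term $\alpha$ weights is our reading of the text, so treat the assignment as an assumption:

```python
def joint_loss(loss_patch, loss_context, alpha=0.8):
    """Balance the patch- and context-level objectives with alpha in [0, 1]
    (Sec. 4.3 reports 0.8 working best on Cora/PubMed and 0.6 on CiteSeer)."""
    return alpha * loss_patch + (1 - alpha) * loss_context
```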

3.2 Statistical Anomaly Estimator

         After the multi-scale contrastive learning model is trained, ANEMONE uses a statistical anomaly estimator to calculate the anomaly score of each node in the inference stage.

         First, for a given target node, we generate $R$ augmented ego networks for the patch-level and the context-level contrastive networks, respectively, and sample an equal number of negative pairs. Feeding them into the corresponding contrastive networks, we obtain $4R$ scores in total: $R$ positive and $R$ negative scores at each of the two levels.

        We assume that anomalous nodes show less agreement with their neighboring structures and contexts. Therefore, we define the base score as the difference between the negative and positive scores:

$$b_{i,j} = \tilde{s}_{i,j} - s_{i,j}$$

         where the scores come from one view ("p" for patch or "c" for context), and $j \in [1, \cdots, R]$ indexes the sampled ego networks.

         We then use a statistical method for anomaly estimation. The intuition is twofold: 1) abnormal nodes have relatively large base scores; and 2) abnormal nodes have unstable base scores across multiple ego-network samplings. Therefore, we define the statistical anomaly score of each view as the sum of the mean and the standard deviation of its base scores:

$$y_i^{(\text{view})} = \frac{1}{R} \sum_{j=1}^{R} b_{i,j} + \sqrt{\frac{1}{R} \sum_{j=1}^{R} \left( b_{i,j} - \bar{b}_i \right)^2}$$

         where "view" stands for "p" or "c" and $\bar{b}_i$ is the mean base score. Finally, we combine $y_i^{(p)}$ and $y_i^{(c)}$ into the final anomaly score of $v_i$, reusing the trade-off parameter $\alpha$ from the joint training objective:

$$y_i = \alpha \, y_i^{(p)} + (1 - \alpha) \, y_i^{(c)}$$
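        The estimator itself reduces to a few lines of NumPy. The sketch below assumes `pos` and `neg` hold the $R$ positive and negative scores of one view, and reuses $\alpha$ as the inference-time trade-off as described above:

```python
import numpy as np

def view_score(pos, neg):
    """Per-view anomaly score: base scores are negative minus positive over
    R sampled ego networks; return their mean plus standard deviation."""
    base = np.asarray(neg) - np.asarray(pos)   # shape (R,)
    return base.mean() + base.std()

def final_score(pos_p, neg_p, pos_c, neg_c, alpha=0.8):
    """Combine patch- and context-level statistics into the final score."""
    return alpha * view_score(pos_p, neg_p) + (1 - alpha) * view_score(pos_c, neg_c)
```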

 

4 Experiments

4.1 Experimental Setup 

4.1.1 Datasets

         We conduct extensive experiments on three well-known citation network datasets: Cora, CiteSeer, and PubMed. Table 1 summarizes their statistics. Since these citation datasets contain no anomalies by default, we manually inject equal numbers of attribute anomalies and structural anomalies following previous work [1, 6], so that our method can be evaluated on different types of anomalies.

4.1.2 Baselines

        We compare ANEMONE with the following methods: AMEN [11], Radar [4], ANOMALOUS [10], DOMINANT [1], and CoLA [6]. We also add a variant of CoLA that integrates the proposed statistical anomaly estimator into CoLA. Our code, including hyperparameter settings, is available on GitHub.

4.1.3 Metric

        The widely used ROC-AUC metric is adopted to evaluate anomaly detection performance. The ROC curve plots the true positive rate against the false positive rate, and AUC is the area under this curve. AUC lies in [0, 1]; the larger the value, the better the performance.
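        For instance, with scikit-learn (toy labels and scores for illustration):

```python
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 0, 1]              # 1 = injected anomaly, 0 = normal node
y_score = [0.1, 0.3, 0.9, 0.2, 0.7]    # anomaly scores from the estimator
print(roc_auc_score(y_true, y_score))  # 1.0: anomalies ranked above normals
```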

4.2 Effectiveness Evaluation

        The ROC curves are shown in Fig. 2(a)-(c), while the comparison of AUC is given in Table 2. We make the following observations:

        ANEMONE consistently outperforms all baseline methods on the three benchmark datasets, which shows that combining multi-scale contrastive learning with a statistical anomaly estimator significantly benefits node-level anomaly detection.

        Deep learning-based methods, namely DOMINANT, CoLA, and ANEMONE, significantly outperform shallow methods, indicating that shallow mechanisms cannot capture anomalous patterns from high-dimensional attributes and complex underlying graph structures.

        The CoLA variant equipped with our estimator shows a performance gain over the original CoLA, validating the effectiveness of the proposed statistical anomaly estimator.

4.3 Ablation Study and Parameter Analysis

        We further compare ANEMONE with its variants ANEMONE_mean and ANEMONE_std, which use only the mean or only the standard deviation, respectively, when estimating anomaly scores.

        As shown in Table 2, both components of the anomaly estimator contribute to detecting anomalies, and the mean of the base scores correlates more strongly with node-level anomalies. Furthermore, ANEMONE, which combines the two terms, achieves the best performance. The analysis of the trade-off between the two contrastive scales is shown in Figure 2(d): the best performance is obtained when $\alpha$ equals 0.8 on Cora, 0.6 on CiteSeer, and 0.8 on PubMed, while larger or smaller values cause performance degradation. We conclude that patch-level and context-level contrasts each expose anomalies specific to their own scale, and that considering both perspectives together yields the best results.

 
