Graph Anomaly Detection via Multi-Scale Contrastive Learning Networks with Augmented View

Table of contents

Summary

Introduction

Early work on graph anomaly detection

Graph Contrastive Learning

Graph Augmentation

Problem definition

Method

Graph Augmentation

Graph Contrastive Network

Experiments

Experiment Settings

Model Parameters

Results and Analysis

Ablation Study

Sensitivity Analysis


Summary

        Graph anomaly detection (GAD) is an important task in graph-based machine learning and has been widely used in many practical applications. The main goal of GAD is to identify anomalous nodes in graph datasets, i.e., nodes that deviate significantly from the majority of nodes.

        Recent methods focus on various scale contrast strategies for GAD, namely node-subgraph contrast and node-node contrast. However, these methods ignore the subgraph-subgraph contrast information, i.e., the fact that normal and abnormal subgraph pairs behave differently in terms of embedding and structure, resulting in suboptimal task performance.

        In this paper, we propose a multi-view, multi-scale contrastive learning framework with subgraph-subgraph contrast for the first time to implement the above idea. Specifically, we treat the original input graph as the first view and generate the second view through edge modification. Guided by maximizing the similarity of subgraph pairs, the proposed subgraph-subgraph contrast helps improve the robustness of subgraph embeddings under structural changes.

        In addition, the introduced subgraph-subgraph contrast cooperates well with the widely used node-subgraph and node-node contrasts, jointly improving GAD performance.

        We also conduct extensive experiments to study the impact of different graph augmentation methods on detection performance. Comprehensive experimental results demonstrate the superiority of our method over state-of-the-art methods, as well as the effectiveness of the multi-view subgraph-pair contrast strategy for GAD tasks. The source code is published at https://github.com/FelixDJC/GRADATE.

Introduction

        Graph-based machine learning has attracted great attention in the past few years (Wu et al. 2020; Liu et al. 2022c). As a representative task in graph learning, graph anomaly detection (GAD) aims to identify the few nodes that deviate from the majority and has drawn increasing attention from researchers (Ma et al. 2021). Due to its great value in preventing harmful events, GAD has been widely applied in many fields, such as misinformation detection (Wu et al. 2019), financial fraud detection (Huang et al. 2018), and network intrusion detection (Garcia-Teodoro et al. 2009).

        Unlike data in other anomaly detection fields (Cheng et al. 2021a,b; Hu et al. 2022), graph data includes both node features and graph structure. A mismatch between these two types of information produces two typical kinds of abnormal nodes, namely feature anomalies and structural anomalies (Liu et al. 2021). The former are nodes whose features differ from those of their neighboring nodes; the latter are groups of dissimilar but closely connected nodes.

        To detect these two types of anomalies, many previous methods have made great efforts and achieved impressive results. LOF (Breunig et al. 2000) obtains anomaly information by comparing a node's features with those of its context nodes. SCAN (Xu et al. 2007) tackles the GAD task from the network structure. Leveraging both types of information, ANOMALOUS (Peng et al. 2018) detects anomalies based on CUR decomposition and residual analysis. (Müller et al. 2013; Perozzi et al. 2014) perform feature subspace selection and discover abnormal nodes in the subspace. These methods rely on specific domain knowledge and cannot mine the deep nonlinear information in graph datasets, which makes it difficult to further improve detection performance.

        Benefiting from their powerful ability to capture graph information, graph convolutional networks (GCNs) (Kipf and Welling 2017) have recently achieved excellent performance on many graph data tasks and have naturally been applied to detect anomalies in graphs. The seminal work DOMINANT (Ding et al. 2019) first introduced GCNs to this task. Specifically, DOMINANT compares the reconstructed feature and adjacency matrices with the original input matrices; nodes with larger reconstruction errors have a higher probability of being anomalous. Although this method performs well and is simple to implement, some anomaly information is ignored: a GCN generates node representations by aggregating neighborhood information, which makes anomaly information harder to distinguish (Tang et al. 2022). Based on the contrastive learning paradigm, CoLA (Liu et al. 2021) detects anomalies by computing the relationship between a node and its neighbors. This method mines the local features and structural information around nodes; at the same time, it masks the features of the target node, thus mitigating the impact of representation averaging. Differently, ANEMONE (Jin et al. 2021a) adds node-node contrast to the contrastive network, focusing on node-level anomaly information.

        However, existing works ignore further utilization of subgraph information and do not directly optimize subgraph embeddings for graph anomaly detection. (Jiao et al. 2020; Hafdi et al. 2022; Han et al. 2022) have demonstrated that subgraph representation learning benefits graph-based machine learning tasks, as it greatly facilitates mining the local features and structural information of individual subgraphs. For GAD, more representative and intrinsic subgraph embeddings help compute more reliable relationships between nodes and their neighbors, which is a key step in the contrast strategy.

        To solve this problem, we propose a new graph anomaly detection framework based on a multi-scale contrastive learning network with a newly added subgraph-subgraph contrast and an augmented view (called GRADATE). Specifically, we treat the original input graph as the first view and employ edge modification as a graph augmentation to generate the second view. In each view, subgraphs are sampled via random walk. We then construct a multi-view contrastive network with node-subgraph, node-node, and subgraph-subgraph contrasts. The first two contrasts capture subgraph-level and node-level anomaly information in each view. The subgraph-subgraph contrast is defined between the two views and mines more local anomaly information for detection; in this way, the node-subgraph contrast is significantly enhanced. After that, we combine the various anomaly information and calculate the anomaly score of each node. Finally, we explore and analyze the impact of different graph augmentations on subgraph representation learning in GAD. Our main contributions are as follows:

        We are the first to introduce subgraph-subgraph contrast into GAD and propose a multi-scale contrastive learning network framework with an augmented view.

        We study the impact of different graph augmentations on subgraph representation learning for the GAD task.

        Extensive experiments on six benchmark datasets demonstrate the effectiveness of the edge modification-based subgraph-subgraph contrast strategy in graph anomaly detection and the superiority of GRADATE over state-of-the-art methods.

Early work on graph anomaly detection

        Early methods (Li et al. 2017; Perozzi and Akoglu 2016; Peng et al. 2018) usually adopt non-deep paradigms to detect anomaly information from node features and network structure. However, they cannot continually improve their performance without mining deeper information. In recent years, the rise of neural networks (Tu et al. 2021, 2022; Liang et al. 2022; Liu et al. 2022c) has enhanced models' ability to mine deep nonlinear information. The reconstruction-based method DOMINANT (Ding et al. 2019) obtains node anomaly scores by measuring the changes of the feature and structure matrices after a GCN. AAGNN (Zhou et al. 2021) applies a one-class SVM to graph anomaly detection. HCM (Huang et al. 2021) treats the hop-count estimate between a node and its first-order neighbors as its anomaly score. CoLA (Liu et al. 2021) first introduces the contrastive learning paradigm (Yang et al. 2022b,a) to detect node anomalies in graphs. Later methods (Jin et al. 2021a; Zheng et al. 2021; Zhang, Wang, and Chen 2022; Duan et al. 2022) make further improvements based on CoLA.

Graph Contrastive Learning

        Contrastive learning is one of the most important paradigms in unsupervised learning. Graph contrastive learning mines supervisory information for downstream tasks without expensive labels and achieves great success (Liu et al. 2022a).

        According to the negative-sample usage strategy, existing works can be divided into two subcategories: methods with negative samples and methods without them. For the first type, DGI (Velickovic et al. 2019) maximizes the mutual information between nodes and the graph to obtain useful supervisory information. SUBG-CON (Jiao et al. 2020) and GraphCL (Hafdi et al. 2022) form node-subgraph contrasts to learn better node representations. For the second type, BGRL (Thakoor et al. 2021) applies a Siamese network to obtain rich information from two views. (Liu et al. 2022b,d,e) take advantage of Barlow Twins (Zbontar et al. 2021), which designs a special loss function to avoid representation collapse.

Graph Augmentation

        Graph augmentation produces reasonable changes in graph datasets (Ding et al. 2022). It expands the dataset and improves the model's generalization ability without requiring expensive labels (Zhao et al. 2022). Most methods focus on operations on the nodes or edges of the graph. (Wang et al. 2021; Feng et al. 2020; You et al. 2020) focus on modifying node features. RoSA (Zhu et al. 2022) uses random walks with restart as its graph augmentation to learn robust node representations. (Klicpera, Weißenberger, and Günnemann 2019; Zhao et al. 2021) adjust the adjacency matrix by adding or removing edges.

Problem definition

        In this section, we formalize the task of graph anomaly detection. For a given undirected graph G = (V, E), where V is the set of nodes and E is the set of edges, the goal of GAD is to learn an anomaly scoring function that measures the degree of abnormality of each node, such that anomalous nodes receive higher scores than normal ones.

Method

        In this section, we introduce the proposed framework GRADATE, which consists of two main modules.

        In the graph augmentation module, we treat the original graph as the first view and generate the second view through edge modification. In each view, subgraphs are sampled via random walk and paired with target nodes.

        In the graph contrastive network module, we first obtain anomaly information from the feature contrast between nodes and subgraphs, i.e., the node-subgraph contrast. We then build the node-node contrast to capture node-level anomalies. The newly added subgraph-subgraph contrast directly optimizes the subgraph embeddings for GAD between the two views, which in turn improves the performance of the node-subgraph contrast. Afterwards, we train the three contrasts with a joint loss function. Finally, we synthesize the various anomaly information and calculate the anomaly score of each node.

Graph Augmentation

        Graph augmentation is crucial to the self-supervised learning paradigm, as it helps the model mine deeper semantic information from the graph. In this paper, we leverage edge modification to create the second view and then sample subgraphs via random walk. The nodes and subgraphs form the input to the graph contrastive network.

Edge modification.

        Edge modification (EM) constructs the second view by perturbing the edges of the graph. Inspired by (Jin et al. 2021b), we not only delete edges in the adjacency matrix but also add the same number of edges simultaneously. In practice, for a graph with M edges, we first set a fixed proportion P and uniformly at random delete PM/2 edges from the adjacency matrix; PM/2 edges are then added uniformly at random. In this way, we aim to learn robust subgraph representations without destroying the properties of the graph. The graph augmentation methods used to generate the second view are discussed further in the ablation study section.
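
        As a concrete illustration, a minimal Python sketch of this procedure is given below, assuming a SciPy sparse adjacency matrix. The function name and implementation details are ours, not the paper's, and the graph is assumed sparse enough that new edges can be found by rejection sampling.

```python
import numpy as np
import scipy.sparse as sp

def edge_modification(adj: sp.spmatrix, p: float, seed: int = 0) -> sp.csr_matrix:
    """Delete P*M/2 existing edges and add P*M/2 new ones, uniformly at random."""
    rng = np.random.default_rng(seed)
    upper = sp.triu(adj, k=1).tocoo()       # one entry per undirected edge
    n, m = upper.shape[0], upper.nnz        # m is M, the number of edges
    k = int(p * m / 2)

    keep = rng.permutation(m)[k:]           # uniformly drop k existing edges
    rows, cols = upper.row[keep], upper.col[keep]
    existing = set(zip(rows.tolist(), cols.tolist()))

    new_rows, new_cols = [], []
    while len(new_rows) < k:                # uniformly add k edges not already present
        i, j = rng.integers(0, n, size=2)
        i, j = int(min(i, j)), int(max(i, j))
        if i != j and (i, j) not in existing:
            existing.add((i, j))
            new_rows.append(i)
            new_cols.append(j)

    r = np.concatenate([rows, np.array(new_rows, dtype=rows.dtype)])
    c = np.concatenate([cols, np.array(new_cols, dtype=cols.dtype)])
    out = sp.coo_matrix((np.ones(len(r)), (r, c)), shape=(n, n))
    return (out + out.T).tocsr()            # symmetrize back to an undirected graph
```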

Random walk.

        For a target node, an effective anomaly detection strategy is to measure the feature distance between it and its neighbors (Liu et al. 2021). Therefore, we employ random walk with restart (RWR) (Qiu et al. 2020) to sample the subgraph around each node. The lower the feature similarity, the higher the abnormality of the target node.
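
        A compact sketch of RWR-based subgraph sampling is shown below, assuming a networkx graph; the restart probability and step cap are illustrative choices rather than values from the paper (the subgraph size of 4 matches the experimental setting reported later).

```python
import random
import networkx as nx

def rwr_subgraph(g: nx.Graph, target, size: int = 4,
                 restart_p: float = 0.5, max_steps: int = 200) -> nx.Graph:
    """Sample a subgraph of about `size` nodes around `target` via RWR."""
    nodes, current = {target}, target
    for _ in range(max_steps):
        if len(nodes) >= size:
            break
        neighbors = list(g.neighbors(current))
        if not neighbors or random.random() < restart_p:
            current = target                 # restart: jump back to the target node
        else:
            current = random.choice(neighbors)
            nodes.add(current)
    return g.subgraph(nodes).copy()          # may be smaller in tiny components
```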

Graph Contrastive Network

        The contrastive learning paradigm has been shown to be effective for GAD (Liu et al. 2021). We construct a multi-view graph contrastive network consisting of three parts: node-subgraph, node-node, and subgraph-subgraph contrast. The first two contrasts are defined within each view and are enhanced by fusing information from both views. The node-subgraph contrast mainly captures anomaly information in node neighborhoods, while the node-node contrast better detects node-level anomalies. Meanwhile, the subgraph-subgraph contrast works between the two views. It directly optimizes the subgraph embeddings for GAD, which significantly benefits the node-subgraph contrast.

        Node-subgraph contrast. The target node vi forms a positive pair with the subgraph it resides in, and a negative pair with a random subgraph in which another node vj resides. We first adopt a GCN layer to map the features of the nodes in the subgraph into the embedding space. It is worth noting that the features of the target node within the subgraph are masked, i.e., set to 0. The subgraph hidden-layer representation can be defined as:
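
        A standard GCN propagation rule of this kind (following Kipf and Welling 2017; here \widetilde{A}_i is the adjacency matrix of subgraph i with self-loops, \widetilde{D}_i its degree matrix, W^{(\ell)} a learnable weight matrix, and \phi an activation such as ReLU) is:

H_i^{(\ell+1)} = \phi\left( \widetilde{D}_i^{-1/2} \, \widetilde{A}_i \, \widetilde{D}_i^{-1/2} \, H_i^{(\ell)} \, W^{(\ell)} \right)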

 Then, the final representation z_i of the subgraph is calculated through the Readout function. Specifically, we use the average function to implement Readout:
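
        With the average implementation, Readout is simply the mean of the node representations in the subgraph (n_i denotes the number of nodes in subgraph i; the subscript convention is ours):

z_i = \frac{1}{n_i} \sum_{k=1}^{n_i} \left( H_i \right)_k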

 Correspondingly, we utilize an MLP to transform the target node features into the same embedding space as the subgraph. The node hidden-layer representation is expressed as:
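
        A plausible form, mirroring CoLA, is an MLP that shares the weight matrices W^{(\ell)} with the GCN but drops the neighborhood aggregation (the weight sharing is our assumption):

e_i^{(\ell+1)} = \phi\left( e_i^{(\ell)} \, W^{(\ell)} \right)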

         In each view, the anomaly degree of the target node is related to the similarity s_i^1 between the subgraph embedding and the node embedding. We employ a bilinear model to measure this relationship:
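
        A standard bilinear scoring function for this purpose (notation ours; W_b is a learnable matrix and \sigma the sigmoid function) is:

s_i^1 = \sigma\left( z_i \, W_b \, e_i^{\top} \right)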

         In general, the target node and subgraph representations tend to be similar in positive pairs, i.e., s_i^1 should approach 1. Therefore, we employ the binary cross-entropy (BCE) loss (Velickovic et al. 2019) to train this contrast:
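
        With N positive and N negative pairs per view, the BCE loss takes the usual form (our reconstruction):

L_{NS}^1 = -\frac{1}{2N} \sum_{i=1}^{2N} \left( y_i \log s_i^1 + (1 - y_i) \log\left(1 - s_i^1\right) \right)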

         where y_i is equal to 1 in positive pairs and 0 in negative pairs.

        Similarly, we can obtain the similarity s_i^2 and the BCE loss L_{NS}^2 from the other view. It is worth mentioning that the networks in the two views use the same architecture and share parameters. Therefore, the final node-subgraph contrast loss is:
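
        Combining the two views with the trade-off parameter described next gives:

L_{NS} = \alpha L_{NS}^1 + (1 - \alpha) L_{NS}^2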

          where α ∈ (0, 1) is a trade-off parameter used to balance the importance of the two views.

        Node-node contrast. The node-node contrast can effectively detect node-level anomalies. Likewise, the target node's features are masked, and its representation is aggregated from the other nodes in the subgraph; we utilize a new GCN to obtain this representation. In each view, after the MLP, the target node forms a positive pair with its own masked representation and a negative pair with the masked representation of another node.

         At the same time, we use an MLP to map the node features into the same latent space:
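
        Mirroring the earlier MLP (u_i denotes the resulting node embedding; the exact layer form is our assumption):

u_i^{(\ell+1)} = \phi\left( u_i^{(\ell)} \, W^{(\ell)} \right)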

        Similar to the node-subgraph contrast, we adopt a bilinear model to evaluate the relationship between u_i and the masked representation \widehat{e}_i, yielding the similarity \widehat{s}_i^1. The node-node contrast loss function can then be defined as:
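
        With the bilinear similarity \widehat{s}_i^1 = \sigma( u_i \, W_b' \, \widehat{e}_i^{\top} ) (notation ours, W_b' learnable), a BCE loss analogous to the node-subgraph case is:

L_{NN}^1 = -\frac{1}{2N} \sum_{i=1}^{2N} \left( y_i \log \widehat{s}_i^1 + (1 - y_i) \log\left(1 - \widehat{s}_i^1\right) \right)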

         Similarly, the similarity \widehat{s}_i^2 and the loss L_{NN}^2 of the other view can also be calculated. Therefore, the final node-node contrast loss function is:
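
        Again combining the two views:

L_{NN} = \alpha L_{NN}^1 + (1 - \alpha) L_{NN}^2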

         where the view-balance parameter α is shared with the node-subgraph contrast loss.

       Subgraph-subgraph contrast. The subgraph-subgraph contrast is defined between the two views. Its purpose is to learn more representative and intrinsic subgraph embeddings for GAD, thereby helping the node-subgraph contrast identify the relationship between nodes and their neighbors. In practice, we directly optimize the subgraph representations jointly with the node-subgraph contrast loss.

        The subgraph containing the target node vi forms a positive pair with the perturbed subgraph containing the same node in the other view. Different from common graph contrastive methods (You et al. 2020), it forms negative pairs with the two subgraphs, one in each view, in which another node vj is located; vj is the same node whose subgraph forms the negative pair with vi in the node-subgraph contrast. Inspired by (Oord, Li, and Vinyals 2018), we adopt an InfoNCE-style loss function for optimization.
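
        A sketch of this loss in InfoNCE form (our reconstruction; z_i^1 and z_i^2 denote the embeddings of vi's subgraphs in the two views, and z_j^1, z_j^2 those of vj's subgraphs):

L_{SS} = -\sum_{i=1}^{N} \log \frac{\exp\left( z_i^1 \cdot z_i^2 \right)}{\exp\left( z_i^1 \cdot z_j^1 \right) + \exp\left( z_i^1 \cdot z_j^2 \right)}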

        Joint loss function. To combine the advantages of the three contrasts, we optimize the joint loss function:
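
        One plausible weighting, with β and γ the balance parameters examined in the sensitivity analysis below (the exact normalization is our assumption):

L = L_{NS} + \beta L_{NN} + \gamma L_{SS}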

         Anomaly score calculation. In the node-subgraph and node-node contrasts, a normal node is similar to the subgraph or node in its positive pair but dissimilar to the subgraph or node in its negative pair. By contrast, an anomalous node tends to be similar to neither its positive nor its negative pair. Accordingly, we define the anomaly score of the target node as follows:
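
        Following the CoLA convention, the per-contrast score can be written as the difference between the negative-pair and positive-pair similarities, so that larger values indicate higher abnormality (notation ours):

score(v_i) = s_i^{neg} - s_i^{pos}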

 We then comprehensively fuse the anomaly information from the two views and the three contrasts. The anomaly score can be further expressed as:
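
        One plausible fusion, reusing the view weight α within each contrast and a balance weight β between the two contrast scores (our reconstruction; the subgraph-subgraph contrast contributes through the learned embeddings rather than through a separate score term):

score_i = (1 - \beta) \, score_i^{NS} + \beta \, score_i^{NN}, \quad score_i^{NS} = \alpha \, score_i^{NS,1} + (1 - \alpha) \, score_i^{NS,2}

and likewise for score_i^{NN}.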

         Detection with only a single random walk cannot capture enough semantic information, so multiple rounds of detection are crucial for calculating each node's anomaly score. Inspired by (Jin et al. 2021a), we calculate the final anomaly score from the mean and standard deviation of the multi-round detection results:
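
        In the ANEMONE-style aggregation, the mean and standard deviation over the R rounds are simply added, so that nodes with unstable scores across rounds are also flagged (our reconstruction; \overline{score}_i denotes the mean over rounds):

f(v_i) = \overline{score}_i + \sqrt{ \frac{1}{R} \sum_{r=1}^{R} \left( score_i^{(r)} - \overline{score}_i \right)^2 }, \quad \overline{score}_i = \frac{1}{R} \sum_{r=1}^{R} score_i^{(r)}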

         where R is the number of anomaly detection rounds. The overall process of GRADATE is summarized in Algorithm 1.

Experiments

        We conduct extensive experiments on six graph benchmark datasets to verify the performance of GRADATE. The results also confirm the effectiveness of the subgraph-subgraph contrast and edge modification for GAD.

Experiment Settings

        The details of the experimental settings are as follows: (1) Datasets. The proposed method is evaluated on six benchmark datasets; the details are shown in Table 2.

        Datasets include Citation (Yuan et al. 2021), Cora (Sen et al. 2008), WebKB (Craven et al. 1998), UAI2010 (Wang et al. 2018), UAT and EAT (Mrabah et al. 2022).

         (2) Anomaly injection. Following DOMINANT, we inject equal numbers of feature anomalies and structural anomalies into the original datasets, which contain no anomalous nodes beforehand. The total number of anomalies for each dataset is shown in the last column of Table 2.

         (3) Baselines. For the GAD task, we compare with eight well-known baseline methods, summarized in the first column of Table 3. The first two models are non-deep algorithms, and the remaining models are based on graph neural networks. Following CoLA, the node features of all datasets are reduced to 30 dimensions by PCA before running ANOMALOUS.

        (4) Metric. We adopt AUC, a widely used anomaly detection metric, to evaluate the above methods.

Model Parameters

        In the node-subgraph and subgraph-subgraph contrasts, both GCN models have one layer and use ReLU as the activation function. The subgraph size is set to 4. Both node and subgraph features are mapped to a 64-dimensional hidden space. In addition, we train the model for 400 epochs and run 256 rounds of anomaly score calculation.

Results and Analysis

        In this subsection, we evaluate the anomaly detection performance of GRADATE against the eight baseline methods. Figure 2 shows the ROC curves of the nine models, and Table 3 reports the corresponding AUC values. From the results, we draw the following conclusions:

 (Figure 2: ROC curves on the six benchmark datasets. The larger the area under the curve, the better the anomaly detection performance. The black dotted line is the "random line", indicating performance under random guessing.)

        We can intuitively see that GRADATE outperforms its competitors on all six datasets. Specifically, GRADATE achieves AUC gains of 2.54%, 0.62%, 3.64%, 0.45%, 5.31%, and 0.43% on EAT, WebKB, UAT, Cora, UAI2010, and Citation, respectively. As shown in Figure 2, GRADATE's area under the curve is significantly larger than those of its competitors.

         We observe that most neural network-based methods outperform the shallow methods LOF and ANOMALOUS. Shallow methods have inherent limitations in handling the high-dimensional features of graph data.

        Among the deep methods, the contrastive learning-based ones (CoLA, ANEMONE, SL-GAD, Sub-CR, and GRADATE) work better. This shows that the contrastive learning paradigm can effectively detect anomalies by mining the feature and structural information of graphs. GRADATE achieves the best performance thanks to the newly added subgraph-subgraph contrast and the multi-view learning strategy.

Ablation Study

        Contrast strategies at different scales. To verify the effectiveness of the proposed subgraph-subgraph contrast, we conduct ablation experiments. For convenience, NS, NS+SS, NS+NN, and NS+NN+SS denote using node-subgraph contrast only (CoLA), node-subgraph plus subgraph-subgraph contrast, node-subgraph plus node-node contrast (ANEMONE), and all three contrasts (GRADATE), respectively. As shown in Table 4, adding the subgraph-subgraph contrast enhances detection performance by improving the node-subgraph contrast, and the best performance is obtained using all three contrasts.

         Graph augmentation strategies. Meanwhile, we adopt four different graph augmentations to form the second view and explore their impact on performance. Gaussian noise features (GNF) perturb node features with random Gaussian noise. Feature masking (FM) masks random parts of the node features. Graph diffusion (GD) utilizes a graph diffusion matrix generated by a diffusion model (Hassani and Khasahmadi 2020; Klicpera, Weißenberger, and Günnemann 2019). GNF and FM are perturbations of node features, while GD and edge modification (EM) are widely used edge-level graph augmentations. As shown in Table 5, EM achieves the best performance on all datasets. On further analysis, perturbing node features may disrupt the features of normal nodes. This compromises the comparison between a node and its neighbors, which is the basis of contrastive learning for GAD, and may cause some normal nodes to be misclassified as abnormal, degrading performance. Furthermore, GD is a structure-based graph augmentation, but its main purpose is to capture global information. Therefore, EM is more compatible with the subgraph-subgraph contrast than GD and better helps the node-subgraph contrast mine the local neighborhood information of nodes.

Sensitivity Analysis

        Balance parameters α, β, and γ. We discuss the three important balance parameters in the loss function. As shown in Figure 3, suitable hyperparameters α and β effectively improve the detection performance on EAT and UAI2010, and similar phenomena can be observed on the other datasets. In practice, we set α to 0.9, 0.1, 0.7, 0.9, 0.7, and 0.5 on EAT, WebKB, UAT, Cora, UAI2010, and Citation, respectively. Meanwhile, we set β to 0.3, 0.7, 0.1, 0.3, 0.5, and 0.5.

         Figure 4 illustrates the performance change of GRADATE when γ changes from 0.1 to 0.9. From the figure, we observe that GRADATE tends to perform well by setting γ to 0.1 across all benchmarks.

         Edge modification proportion P. We also study the impact of different edge modification proportions. Figure 5 shows that on UAT, UAI2010, and Citation, the detection performance fluctuates only slightly as the proportion P changes. Overall, we fix P = 0.2 on all datasets.

 
