Introduction to the paper
Original title : ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning
Chinese title (translated) : Graph Anomaly Detection Based on Multi-Scale Contrastive Learning
Publication venue : CIKM
Publication date : 2021-10-26
First author : Ming Jin
BibTeX citation :
@inproceedings{jin2021anemone,
title={Anemone: Graph anomaly detection with multi-scale contrastive learning},
author={Jin, Ming and Liu, Yixin and Zheng, Yu and Chi, Lianhua and Li, Yuan-Fang and Pan, Shirui},
booktitle={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
pages={3122--3126},
year={2021}
}
Summary
Anomaly detection on graphs plays an important role in network security, e-commerce, financial fraud detection, and other fields. However, existing graph anomaly detection methods usually consider only a single-scale view of graphs, which limits their ability to capture anomalous patterns from different perspectives. To this end, we introduce a new graph anomaly detection framework, ANEMONE, to identify anomalies at multiple graph scales simultaneously. Specifically, ANEMONE first uses a graph neural network backbone encoder with a multi-scale contrastive learning objective, capturing the pattern distribution of graph data by simultaneously learning the agreement between instances at the patch and context levels. The method then employs a statistical anomaly estimator to estimate the abnormality of each node based on its degree of agreement across multiple views. Experiments on three benchmark datasets demonstrate the superiority of this method.
Problems
Existing methods mainly detect anomalies from a single-scale perspective, ignoring the fact that node anomalies in graphs often manifest at different scales.
Paper contribution
- A multi-scale contrastive learning framework ANEMONE is proposed for graph anomaly detection, which can capture anomaly patterns at different scales.
- A new statistics-based algorithm is designed to estimate node abnormality from the learned contrastive patterns.
- Extensive experiments are conducted on three benchmark datasets to demonstrate the superiority of ANEMONE in detecting node-level anomalies on graphs.
Notes
- Ego network (ego-net): the subgraph centered on a given node.
- Random walk: used here to sample subgraphs around the target node.
- Negative sampling strategy: pairs a target node with a mismatched subgraph to construct negative instances.
1. ANEMONE framework
For a selected target node, ANEMONE computes the node's anomaly score with two main components:
- Multi-scale contrastive learning model : two GNN-based contrastive networks learn patch-level (i.e., node-to-node) agreement and context-level (i.e., node-to-ego-network) agreement, respectively.
- Statistical anomaly estimator : aggregates the patch-level and context-level scores obtained over multiple sampled ego-networks and computes the final anomaly score of the target node via statistical estimation. We introduce these two components in the following sections.
Multi-scale contrastive learning model
Preparation :
- Input graph $G$; select a target node.
- Centered on the target node, use random walks to collect two ego-networks (intuitively, two subgraphs traversed outward from the target node), denoted $G_p$ and $G_c$ (p and c stand for patch-level and context-level, respectively).
- In both $G_p$ and $G_c$, the first node in the node set is fixed to be the center (target) node.
- To prevent information leakage in the subsequent contrastive learning step, a preprocessing step called target node masking is applied to each ego-network before it is fed into the contrastive network: the attribute vector of the target node is replaced with a zero vector.
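As a concrete illustration, the ego-network sampling and target-node masking steps above can be sketched in pure Python (a minimal sketch; the function names, adjacency-dict graph representation, and walk parameters are my own assumptions, not the paper's implementation):

```python
import random

def sample_ego_network(adj, target, walk_length=4, num_walks=8, seed=0):
    """Collect an ego-network around `target` by running several short
    random walks and keeping every visited node (simplified sketch of
    the paper's random-walk-based subgraph sampling)."""
    rng = random.Random(seed)
    nodes = {target}
    for _ in range(num_walks):
        cur = target
        for _ in range(walk_length):
            neighbors = adj.get(cur, [])
            if not neighbors:
                break
            cur = rng.choice(neighbors)
            nodes.add(cur)
    # The target node is placed first in the node set, as the text requires.
    return [target] + sorted(nodes - {target})

def mask_target(features, ego_nodes):
    """Target node masking: replace the target's attribute vector
    (the first node of the ego-network) with a zero vector."""
    masked = {v: list(features[v]) for v in ego_nodes}
    masked[ego_nodes[0]] = [0.0] * len(features[ego_nodes[0]])
    return masked
```

The same routine would be called twice per target node (with different seeds or walk settings) to obtain $G_p$ and $G_c$.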
Patch-level contrastive network :
- Learns the agreement between the embedding of $G_p$ and the embedding of the target node $v_i$. The embedding of $G_p$ is obtained with a GCN and denoted $H_p$; the embedding of the target node $v_i$ is obtained with an MLP and denoted $z_p$.
- Note that the GCN and the MLP share the same parameters $\theta$, so that the target-node embedding and the subgraph embedding are mapped into the same space.
- Note that when computing the subgraph embedding with the GCN, the attribute vector of the target node is the zero vector (masked); the MLP uses the original attribute vector of the target node.
- A bilinear layer computes their similarity score:
$$s_p^{(i)} = \mathrm{Bilinear}\big(h_p^{(i)}, z_p^{(i)}\big) = \sigma\big(h_p^{(i)} W_p\, z_p^{(i)\top}\big)$$
- The network is trained with a negative sampling strategy.
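The bilinear scoring and negative sampling above can be sketched in NumPy (an illustrative sketch: the dimensions and random embeddings are placeholders, whereas in the real model $h$, $z$, and $W_p$ come from the trained GCN, MLP, and bilinear layer):

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_score(h, z, W):
    """s = sigma(h W z^T): agreement between a subgraph embedding h
    (the masked target row of H_p) and the MLP embedding z of the target."""
    return 1.0 / (1.0 + np.exp(-(h @ W @ z)))

d = 4
W = rng.normal(size=(d, d))   # bilinear weight, learned in practice
h_i = rng.normal(size=d)      # GCN embedding of the target row in G_p
z_i = rng.normal(size=d)      # MLP embedding of the target's attributes
z_j = rng.normal(size=d)      # another node's embedding -> negative pair

s_pos = bilinear_score(h_i, z_i, W)  # pushed toward 1 during training
s_neg = bilinear_score(h_i, z_j, W)  # pushed toward 0 during training
```

The negative pair here simply swaps in a mismatched node embedding, which is the essence of the negative sampling strategy.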
Context-level contrastive network :
- Learns the agreement between the embedding of $G_c$ and the embedding of the target node $v_i$. The embedding of $G_c$ is obtained with a GCN followed by a readout function and denoted $h_c$; the embedding of the target node $v_i$ is obtained with an MLP and denoted $z_c$.
- $z_c$ is computed in the same way as in the patch-level network, and the similarity score is computed in the same way as well.
- The network is likewise trained with a negative sampling strategy.
Joint training: the two networks are trained jointly, with the overall loss a weighted combination of the patch-level and context-level contrastive losses, $\mathcal{L} = \alpha \mathcal{L}_p + (1-\alpha)\mathcal{L}_c$, where $\alpha$ balances the two scales.
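The joint objective can be sketched as follows (my own reconstruction of a standard BCE-style contrastive loss with a balance weight; the paper's exact formulation should be checked against the original):

```python
import numpy as np

def bce_contrastive_loss(s_pos, s_neg, eps=1e-8):
    """Binary-cross-entropy-style contrastive loss: positive pairs are
    pulled toward score 1, negative pairs toward score 0."""
    s_pos, s_neg = np.asarray(s_pos), np.asarray(s_neg)
    return -np.mean(np.log(s_pos + eps) + np.log(1.0 - s_neg + eps)) / 2.0

def joint_loss(l_patch, l_context, alpha=0.5):
    """Weighted combination of the patch-level and context-level losses."""
    return alpha * l_patch + (1.0 - alpha) * l_context
```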
Statistical anomaly estimator
After the above computation, over $R$ rounds of ego-network sampling each node $i$ has $4R$ scores:
$$\big[s_{p,1}^{(i)},\dots,s_{p,R}^{(i)},\; s_{c,1}^{(i)},\dots,s_{c,R}^{(i)},\; \tilde{s}_{p,1}^{(i)},\dots,\tilde{s}_{p,R}^{(i)},\; \tilde{s}_{c,1}^{(i)},\dots,\tilde{s}_{c,R}^{(i)}\big]$$
where $\tilde{s}$ denotes the score of a negative pair. Anomalous nodes are assumed to have lower agreement with their adjacent structure and context, so the base score is defined as the difference between the negative and positive scores:
$$b_{view,j}^{(i)} = \tilde{s}_{view,j}^{(i)} - s_{view,j}^{(i)}, \quad view \in \{p, c\}$$
Statistical method for anomaly estimation:
- Anomalous nodes have relatively large base scores. This is because an anomalous node's embedding usually agrees less with the subgraph embedding, so $s_{view,j}^{(i)}$ is small while $\tilde{s}_{view,j}^{(i)}$ is large, yielding a large base score.
- The base scores of anomalous nodes are unstable across multiple ego-network samplings.
- Therefore, the anomaly scores $y_p^{(i)}$ and $y_c^{(i)}$ are defined as the sum of the mean and the standard deviation of the base scores:
$$y_{view}^{(i)} = \mathrm{mean}\big(\{b_{view,j}^{(i)}\}_{j=1}^{R}\big) + \mathrm{std}\big(\{b_{view,j}^{(i)}\}_{j=1}^{R}\big)$$
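The estimator above can be sketched in NumPy for one node and one view (a minimal sketch; the example score values are fabricated purely for illustration):

```python
import numpy as np

def anomaly_score(s_pos, s_neg):
    """Base score per round: b_j = s~_j - s_j; final per-view score is
    mean(b) + std(b) over the R sampling rounds."""
    base = np.asarray(s_neg) - np.asarray(s_pos)
    return base.mean() + base.std()

# Normal node: high, stable positive agreement -> small score.
y_normal = anomaly_score([0.90, 0.92, 0.88], [0.10, 0.12, 0.08])
# Anomalous node: low, unstable agreement -> large score.
y_anom = anomaly_score([0.40, 0.70, 0.20], [0.60, 0.30, 0.80])
```

Both the mean (large base scores) and the standard deviation (instability across samplings) push the anomalous node's score up, matching the two observations above.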
2. Experiment
Datasets
We conduct extensive experiments on three well-known citation network datasets: Cora, CiteSeer, and PubMed.
Experimental results
Summary
Paper content
Methods learned
How to write a paper:
- introduction -> problem statement -> …