Paper Reading: Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection

Table of contents

Summary

1 Introduction

2 Definition and problem statement

2.1 Definition

2.2 Problem statement

3 Methods

 3.1 Overview

 3.2 Pick: Label-balanced Sampler

 3.3 Choose: Neighborhood Sampler

3.4 Aggregation: Message Passing Architecture

 3.5 Training

 4 Experiments


Summary

        Graph-based fraud detection methods have recently attracted much attention due to the rich relational information of graph-structured data, which may benefit fraudster detection. However, GNN-based algorithms may perform poorly when the label distribution of nodes is heavily skewed, which is common in sensitive domains such as financial fraud. To address the class imbalance problem in graph-based fraud detection, we propose a Pick and Choose Graph Neural Network (PC-GNN for short) for imbalanced supervised learning on graphs.

        First, nodes and edges are picked using a designed label-balanced sampler to build subgraphs for mini-batch training.

        Next, for each node in the subgraph, neighbor candidates are chosen by the proposed neighborhood sampler. Finally, information from the chosen neighbors and different relations is aggregated to obtain the final representation of the target node. Experiments on benchmark and real-world graph-based fraud detection tasks show that PC-GNN clearly outperforms the state-of-the-art baselines.

1 Introduction

        Fraud detection is an important task with many high-impact applications in domains such as security [29], finance [22, 39, 49], healthcare [17], and review management [10, 26, 34]. Although many techniques have been developed over the past few years to detect fraudsters in multidimensional point collections, graph-based fraud detection [1, 3, 33] has recently gained attention as graph data become ubiquitous. Essentially, the basic assumption of graph-based fraud detection is that users and fraudsters have rich behavioral interactions when purchasing products or posting reviews; these interactions can be represented as graph-structured data, thereby providing rich, multi-faceted information for fraud detection.

        However, in fraud detection tasks, the number of fraudsters may be much smaller than the number of benign users. For example, in YelpChi [34], a dataset of real reviews on Yelp.com, 14.5% of reviews are spam, while the rest are considered recommended reviews. In Alibaba Group's real financial dataset [49], only 0.5% of users are defaulters who cannot repay credit debts borrowed from financial platforms. Therefore, graph-based fraud detection algorithms often suffer from class imbalance and perform poorly, especially on the small but more important class, namely the fraudsters.

        In recent years, research efforts dedicated to the class imbalance problem in traditional feature-based supervised learning settings fall mainly into two directions, namely resampling and reweighting methods. Resampling methods balance the number of examples by oversampling the minority class [5, 24] or undersampling the majority class [32]. Reweighting methods assign different weights to different classes or even different samples through cost-sensitive adjustments [4, 9, 19, 21] or meta-learning based methods [14, 35, 37].
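As a concrete illustration of the reweighting direction, a minimal cost-sensitive scheme (not from the paper) assigns each class a weight inversely proportional to its frequency, so the rare fraud class contributes more to the loss:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by the inverse of its frequency,
    normalized so that a balanced dataset gets weight 1 per class."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}

# 90% benign (class 0), 10% fraud (class 1): fraud gets 9x the weight.
w = inverse_frequency_weights([0] * 90 + [1] * 10)
```

Frameworks expose the same idea through parameters such as per-class loss weights; the helper above is only a sketch of the principle.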

        While class-imbalanced supervised learning in traditional feature spaces is well studied, graph neural network algorithms that specifically address the class imbalance problem have not been fully explored. DR-GCN [36] is a pioneer in addressing class imbalance on graphs. This approach proposes a class-conditional adversarial regularizer and a latent distribution alignment regularizer, but it does not scale to large graphs.

        We highlight three major challenges from two aspects when designing class-imbalanced graph neural networks for graph-based fraud detection.

        From the application side, fraudsters may forge noisy information, such as camouflage [10], to make themselves difficult to identify. The first challenge arising from this is link information redundancy. For example, spammers post their spam comments through benign accounts, so that many connections exist between spam comments and benign users, and spammers hide among benign users. Many feature-based or label-based similarity measures fail to identify such noisy neighbors, since fraudsters may be close to benign neighbors in Euclidean distance even though their labels differ. The second challenge posed by camouflage is that fraudsters lack necessary link information. For example, in a financial setting, fraudsters avoid transacting with each other so that they are not detected together. As shown in Figure 1, the left fraudulent central node has transaction patterns similar to the right central node, so the features of the right fraudulent node are crucial for identifying the left one. However, there is no link between the two nodes, which degrades the performance of GNN-based methods.

        From an algorithmic perspective, the challenge comes from the message aggregation of GNNs [11, 16, 38, 43, 44], which may dilute the features of the minority class. Recall that a key design of graph neural networks lies in neighborhood aggregation, but in an imbalanced setting, most neighbors of a central node may belong to the majority class. For example, as shown in Figure 1, only one of the six neighbors belongs to the same fraud class as the left central node. As a result, the features of fraudulent neighbors are easily ignored, and predictions are easily dominated by the features of the benign majority.

 (Fig. 1: Illustration of the challenges in graph-based fraud detection. For the fraudulent central node on the left, only one of its six neighbors belongs to the same fraud class due to class imbalance, so messages from fraudulent neighbors are easily masked during message passing. Moreover, the neighborhood of the right fraudulent central node is similar to that of the left one, but the two nodes are not connected.)

        To address the above challenges, in this paper, we propose a GNN-based imbalanced learning method for graph-based fraud detection. For the algorithmic challenge, we design a label-balanced sampler to pick nodes and edges for training. The probability assigned to each node is inversely proportional to its label frequency, so nodes of the minority class are more likely to be picked. Therefore, the subgraph induced by the picked nodes has a balanced label distribution.

        For the application-side challenges, we propose a neighborhood sampler that chooses neighbors with a learnable parameterized distance function.

        Redundant links are filtered by finding the neighbors that are distant from the target node under this distance function and removing them from the neighbor set, while necessary links that benefit fraud prediction are created by finding similar nodes of the fraud class and treating them as neighbors.

        We integrate the above two stages of graph sampling and neighbor selection into a general GNN framework, and name our model Pick and Choose Graph Neural Network (PC-GNN). Our contributions can be listed as follows.

        We formulate graph-based fraud detection as an imbalanced node classification task, and propose a GNN-based imbalanced learning method to address class imbalance on graphs.

        We design a label-balanced sampler to pick nodes and edges for subgraph training, and a neighborhood sampler to choose neighbors, which oversamples the neighborhood of the minority class and undersamples the neighborhood of the majority class.

        Extensive experiments are conducted on two public benchmark datasets and two real-world datasets to verify the effectiveness of the proposed framework.

        The remainder of this paper is organized as follows. Section 2 presents the definition and problem statement of this paper. Section 3 details the proposed PC-GNN framework, and Section 4 illustrates the experiments. Section 5 surveys relevant studies in the literature and Section 6 concludes the paper.

2 Definition and problem statement

2.1 Definition

        Definition 2.1 (Imbalance Ratio). Given a label set C, let C1 and C2 be two classes in C. The imbalance ratio between C1 and C2 is defined as IR = |C1| / |C2|, so IR lies in the range [0, +∞). If IR > 1, C1 is called the majority class and C2 the minority class. In particular, if IR = 1, C is balanced.
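Definition 2.1 can be computed directly from a label list; the toy labels below roughly mirror the 14.5% spam rate of YelpChi mentioned in the introduction:

```python
def imbalance_ratio(labels, c1, c2):
    """IR = |C1| / |C2| as in Definition 2.1."""
    n1 = sum(1 for y in labels if y == c1)
    n2 = sum(1 for y in labels if y == c2)
    return n1 / n2

# 29 of 200 reviews are spam (~14.5%), the rest are benign.
labels = [0] * 171 + [1] * 29
ir = imbalance_ratio(labels, 0, 1)  # benign is the majority class
```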

        Definition 2.2 (Multi-relational Imbalanced Graph). Given a graph G = (V, E, A, X, C), V = {v1, . . . , vN} is the node set, E = {E1, . . . , ER} is the edge set under R relations, and A = {A1, . . . , AR} is the set of the corresponding R relational adjacency matrices. For each node vi ∈ V, xi ∈ X is a d-dimensional feature vector and ci ∈ C is a scalar label, i = 1, . . . , N; X and C are the sets of node features and labels, respectively. If the imbalance ratio between two classes in C is much greater than 1, we call G a multi-relational imbalanced graph.

2.2 Problem statement

        Definition 2.3 (Graph-based Fraud Detection). The graph-based fraud detection problem is defined on a multi-relational imbalanced graph G = (V, E, A, X, C) as formulated in Definition 2.2, where every node in V is labeled as fraudulent or benign in C. The goal is to find the fraudulent nodes that differ significantly from the benign nodes on G, which can be formulated as an imbalanced node classification problem on G.

3 Methods

        In this section, we introduce the proposed PC-GNN framework. First, we give an overview of the whole framework. We then detail the pick process and the choose process in Sections 3.2 and 3.3, respectively. Next, we explain how to aggregate information from different neighbors and relations in Section 3.4.

 (Fig. 2: The figure shows layer ℓ of the proposed PC-GNN framework on an example graph. ❶ shows an example graph containing 11 nodes; solid and dashed lines represent two kinds of relations between these nodes, gray nodes are fraudulent, and white nodes are benign. From ❶ to ❷, a subset of nodes and edges is picked by the label-balanced sampler to build a subgraph for mini-batch training. ❷ illustrates the subgraph induced by the picked nodes, with unsampled nodes and edges blurred. From ❷ to ❸, neighbors are chosen by the neighborhood sampler: for a fraudulent node v in ❷, the neighborhood is oversampled with nodes that are similar to v but not directly connected, while v's original neighbor set is undersampled by removing the neighbors that are distant under the learned distance function. As shown in ❸, the neighbors chosen for v may differ across relations. Finally, the information of all chosen neighbors is aggregated under each relation, and the embeddings from different relations are concatenated to obtain the final representation of v at layer ℓ.)

 3.1 Overview

         We illustrate the pipeline of the proposed framework on an example graph in Figure 2. To obtain the representation of a target entity, there are three main steps: pick, choose, and aggregate. In the pick step, we design a label-balanced sampler to select nodes and edges for subgraph training. Next, in the choose step, we design a neighborhood sampler to oversample the neighbors of the minority class and undersample the neighbors of the majority class. Finally, in the aggregation step, we aggregate information from the sampled neighbors and different relations.

 3.2 Pick: Label-balanced Sampler

         We design a label-balanced graph sampler to pick the nodes and edges that build the subgraphs. The key idea is to incorporate label distribution information into the sampling process, so that minority-class nodes are sampled with higher probability than majority-class nodes.

         Formally, let G = (V, E, A, X, C) be a multi-relational imbalanced graph, let A = A1 + · · · + AR be the sum of the adjacency matrices over all relations, and let Â be the normalized adjacency matrix obtained from A and the diagonal degree matrix D, where Dii = Σj Aij. For a node v ∈ V, the sampling probability is defined in Equation (1): it is proportional to ∥Â(:, v)∥² and inversely proportional to the label frequency of v's class.

         where Â(:, v) is the column of v in the normalized adjacency matrix Â, and LF(C(v)) denotes the label frequency of class C(v). The set of picked nodes is denoted Vp, and Gp = (Vp, Ep, Ap, Xp, Cp) is the subgraph induced by Vp and its one-hop neighbors.
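A minimal sketch of the pick step follows. The symmetric normalization Â = D^{-1/2} A D^{-1/2} is an assumption (the text only states that Â is normalized via the degree matrix), and the probabilities are simply renormalized to sum to 1:

```python
import numpy as np

def pick_probabilities(A, labels):
    """Label-balanced sampling probabilities, a sketch of Eq. (1):
    P(v) proportional to ||A_hat(:, v)||^2 / LF(C(v)).
    Assumes A_hat = D^{-1/2} A D^{-1/2} (symmetric normalization)."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    col_norm = (A_hat ** 2).sum(axis=0)        # ||A_hat(:, v)||^2
    freq = np.bincount(labels) / len(labels)   # label frequency LF(C(v))
    p = col_norm / freq[labels]
    return p / p.sum()

# 4-node ring; node 3 is the only fraud node, so it is picked 3x as often.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
p = pick_probabilities(A, np.array([0, 0, 0, 1]))
```

With equal degrees the structural term is identical for every node, so the ratio between the fraud node's and a benign node's probability is exactly the inverse ratio of their label frequencies.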

 3.3 Choose: Neighborhood Sampler

        After the pick step, the following steps are performed on the induced subgraph Gp. For notational clarity, we omit the subscript p in the following sections, i.e., we denote the adjacency matrices in Ap as {A_r}^R_{r=1} and the edges in Ep as {E_r}^R_{r=1}. The neighbors of node v under each relation Er are collected in the set Nr(v), as shown in Equation (2).

         As discussed in Section 1, a neighborhood definition such as Equation (2) is not appropriate on an imbalanced graph, because Nr(v) may contain disguised neighbors or lack nodes that are critical for prediction. To alleviate this problem, we undersample the neighborhood of the majority class to filter out noisy neighbors, and oversample the neighborhood of the minority class to add useful edges.

        In effect, we add a constraint to the definition of the majority-class neighborhood that filters out the neighbors far away from the target node under a certain distance function. The set of undersampled neighbors of node v is denoted N⁻r(v), as shown in Equation (3); it is obviously a subset of Nr(v).

 3.3.1 Distance function.

        The distance function D(·, ·) depends on a specific metric in the latent space. A widely used choice is the Euclidean distance between features, i.e., D(v, u) = ∥xv − xu∥, where xv ∈ R^d denotes the feature of node v. However, this distance function is inflexible for fraud detection since it ignores label information. Therefore, inspired by LAGCN [6], we adopt a parameterized distance function that combines latent embeddings and ground-truth label information, defined as follows:

         where the fully-connected layer predicts the fraud probability from the learned embedding of node v under relation Er, and its weight matrix parameterizes the distance function. The distance between v and u is then defined as the difference between their predicted fraud probabilities.
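The idea can be sketched as follows; the single-weight-vector scorer `fraud_prob` is a hypothetical stand-in for the fully-connected layer, and only illustrates that the distance compares predicted probabilities rather than raw features:

```python
import math

def fraud_prob(h, w):
    """Hypothetical one-layer scorer standing in for the
    fully-connected layer: sigmoid(w . h)."""
    z = sum(wi * hi for wi, hi in zip(w, h))
    return 1.0 / (1.0 + math.exp(-z))

def distance(h_v, h_u, w):
    """Label-aware distance of Section 3.3.1: the gap between the two
    nodes' predicted fraud probabilities, not their Euclidean distance."""
    return abs(fraud_prob(h_v, w) - fraud_prob(h_u, w))
```

Two nodes with very different raw features can still be close under this metric if the scorer assigns them similar fraud probabilities, which is exactly what makes it robust to camouflage.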

 3.3.2 Neighborhood sampling.

        Therefore, the undersampled neighborhood is rewritten as Equation (5).

         Furthermore, the neighborhood of a minority-class node v can be oversampled with nodes that are distant from v on the graph but similar to v under the learned distance function; this oversampled neighbor set is formulated in Equation (6).

         To sum up, for a target node v of the majority class, the neighborhood of v is undersampled; for a target node v of the minority class, the neighborhood of v is oversampled.
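The two sampling rules can be sketched as below. The fixed threshold `thresh` and the toy `dist` callback are illustrative assumptions; the paper instead learns the distance function and controls how many neighbors are kept or added:

```python
def choose_neighbors(v, neighbors, candidates, dist, is_minority, thresh=0.2):
    """Sketch of the choose step (Section 3.3.2):
    - undersample: keep only neighbors within `thresh` of v;
    - oversample (minority class only): also add non-neighbors
      from `candidates` that fall within `thresh` of v."""
    kept = [u for u in neighbors if dist(v, u) <= thresh]
    if is_minority:
        kept += [u for u in candidates
                 if u != v and u not in neighbors and dist(v, u) <= thresh]
    return kept

# Toy distance: gap between per-node "fraud probabilities".
score = {"v": 0.9, "a": 0.85, "b": 0.1, "c": 0.95}
d = lambda x, y: abs(score[x] - score[y])
chosen = choose_neighbors("v", ["a", "b"], ["a", "b", "c"], d, is_minority=True)
```

For the fraud node "v", the camouflaged benign neighbor "b" is dropped, while the unlinked but similar fraud node "c" is added, mirroring Figure 1.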

 3.3.3 Learning.

        The neighborhood sampler is learnable thanks to the parameterization of the distance function. Its parameters, namely the weights of the fully-connected layers, are optimized with the cross-entropy loss in Equation (7).
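Equation (7) is a standard binary cross-entropy over the sampler's predicted fraud probabilities; a plain-Python sketch:

```python
import math

def distance_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between the predicted fraud
    probabilities and the ground-truth labels (sketch of Eq. 7)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # clamp for numerical safety
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)
```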

3.4 Aggregation: Message Passing Architecture

        After the choose step, the oversampled neighborhood of the minority class or the undersampled neighborhood of the majority class is collected under each relation Er. Message-passing graph neural networks aim to aggregate information from all neighbors and relations. Let h^(ℓ)_{v,r} denote the representation of node v at layer ℓ under relation Er, where v ∈ V, r = 1, . . . , R, ℓ = 1, . . . , L, and L is the number of layers.

         The aggregation step is further divided into two sub-steps. First, under each relation, the information of all chosen neighbors is aggregated as shown in Equation (8), where AGG^(ℓ)_r is the mean aggregator of layer ℓ under relation Er, ⊕ denotes the concatenation operation, and W^(ℓ)_r is the weight matrix.

         Then, each relation-wise representation h^(ℓ)_{v,r} is combined with the representation of v from the previous layer to obtain the layer-ℓ representation of v, as shown in Equation (9), where W^(ℓ) is the weight matrix.
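The two sub-steps can be sketched as below, assuming a mean aggregator and ReLU nonlinearity; the helper names, toy embeddings, and identity weight matrices are illustrative, not the paper's implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def aggregate_relation(h_prev, chosen, W_r):
    """Intra-relation step (Eq. 8 sketch): average the chosen
    neighbors' layer-(l-1) embeddings, then project and activate."""
    msg = np.mean([h_prev[u] for u in chosen], axis=0)
    return relu(msg @ W_r)

def combine_relations(h_v_prev, relation_msgs, W):
    """Inter-relation step (Eq. 9 sketch): concatenate v's previous
    embedding with the message from every relation, then project."""
    z = np.concatenate([h_v_prev] + relation_msgs)
    return relu(z @ W)

# Toy example: node 2 aggregates its chosen neighbors 0 and 1 under one relation.
h_prev = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0]), 2: np.array([1.0, 1.0])}
m1 = aggregate_relation(h_prev, [0, 1], np.eye(2))
out = combine_relations(h_prev[2], [m1], np.eye(4))
```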

 3.5 Training

         After the aggregation step, an MLP classifier is trained together with the graph neural network to minimize the cross-entropy loss.

         The total loss function is formulated in Equation (11), where α is the parameter balancing the classification loss and the distance loss of the neighborhood sampler.

         Algorithm 1 summarizes the overall training procedure. Given a multi-relational imbalanced graph G and a training node set Vtrain, we first pick nodes from Vtrain for training according to the sampling probability in Eq. (1) (Line 3). These nodes are divided into mini-batches of size Nbatch (Line 6). For each node in each batch's subgraph, its neighbors are oversampled or undersampled according to its label frequency (Line 9). Messages from the chosen neighbors are then aggregated (Line 10), and the representations from different relations are concatenated (Line 11).
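The outer loop of Algorithm 1 can be sketched as follows; `run_batch` is a hypothetical callback covering the choose and aggregate steps of Lines 9-11, and the per-node pick probabilities come from Eq. (1):

```python
import random

def train_epoch(train_nodes, pick_prob, n_batch, run_batch, seed=0):
    """Sketch of Algorithm 1's outer loop: pick a label-balanced node
    set (Line 3), split it into mini-batches of size n_batch (Line 6),
    and delegate the per-batch choose/aggregate/update work (Lines 9-11)."""
    rng = random.Random(seed)
    picked = [v for v in train_nodes if rng.random() < pick_prob[v]]
    rng.shuffle(picked)
    for i in range(0, len(picked), n_batch):
        run_batch(picked[i:i + n_batch])
```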

 4 Experiments

         In this section, we investigate the effectiveness of the proposed PC-GNN model on two graph-based fraud detection tasks, namely, opinion fraud detection and financial fraud detection, with the aim of answering the following research questions.

        • RQ1: Does PC-GNN outperform state-of-the-art methods for graph-based anomaly detection?

        • RQ2: How do key components contribute to forecasting?

        • RQ3: How do different training parameters perform?

        • RQ4: If the proposed module is applied to other GNN models, will it lead to performance improvement?


Origin blog.csdn.net/qq_40671063/article/details/130766344