【Paper Reading】EULER: Detecting Network Lateral Movement via Scalable Temporal Link Prediction (NDSS 2022)

Authors: Isaiah J. King, H. Howie Huang (George Washington University)
Citation: King I. J., Huang H. H. Euler: Detecting Network Lateral Movement via Scalable Temporal Link Prediction [C]. Proceedings of the 2022 Network and Distributed System Security Symposium (NDSS), 2022.
Original address: https://dl.acm.org/doi/pdf/10.1145/3588771
Source code address: https://github.com/iHeartGraph/Euler
Dataset: LANL



0. Summary

  EULER, a framework for detecting lateral movement, is proposed. It consists of a model-agnostic graph neural network (GNN) stacked on a model-agnostic sequence encoding layer such as a recurrent neural network (RNN). Models built according to the EULER framework can easily distribute their graph convolutional layers across multiple machines for large performance gains. EULER models can efficiently identify anomalous connections between entities with high accuracy, outperforming other unsupervised techniques.

1. Introduction & Motivation

  The most reliable way to detect the spread of malware is not to exhaustively enumerate every known malicious signature associated with it; instead, it is to train a model to learn what normal activity looks like and to raise an alert when behavior deviates from it. The existing challenges: the detection model must scale to terabytes of log files and must have an extremely low false positive rate.

  In this work, we formulate abnormal lateral movement detection as a temporal graph link prediction problem. Interactions occurring in discrete time units on the network can be abstracted as a series of graphs called snapshots, $G_t = \{V, E_t\}$, where $V$ is the set of entities in the network during time period $t$ and $E_t = \{(u, v) \in V \times V\}$ is the set of edges observed in that period. The temporal link prediction model learns normal behavioral patterns from previous snapshots and assigns likelihood scores to edges that occur in the future. Edges with low likelihood scores are associated with abnormal connections in the network.

  Recent temporal link prediction methods combine graph neural networks (GNNs) with sequence encoders such as recurrent neural networks (RNNs) to capture the topological and temporal features of evolving networks. However, these approaches either require the RNN output during the GNN embedding stage, or simply embed the GNN inside the RNN architecture. As shown in Figure 1a, these models must therefore run serially, one snapshot at a time, so they cannot scale to large datasets.

  Two observations: 1) the most memory-intensive part of existing architectures is the message passing stage of the GNN; 2) there is an imbalance between the huge node input features and the relatively small topological node embeddings. Together these mean that most of the work, and most of the memory usage, happens in the GNNs. If multiple replicated GNNs operate on snapshots independently, they can execute concurrently and performance improves accordingly, as shown in Figure 1b.
[Figure 1]

(a) Previous approaches require RNN output during the GNN embedding stage, or embed GNNs inside the RNN architecture, which forces the model to work serially, one snapshot at a time. In contrast, (b) the EULER framework can leverage multiple worker machines, each holding consecutive snapshots of the discrete-time graph. The workers process their snapshots in parallel through replicated GNNs whose parameters are shared across machines. The GNN outputs are returned to the leader machine, which runs them through a recurrent neural network to create temporal node embeddings that can be used for link prediction.

The summary contributions are as follows:

  • For the first time, temporal graph link prediction is used for anomaly-based intrusion detection. Other studies applying graph analysis to anomaly detection either did not consider the temporal nature of the data or did not use expressive GNN models
  • For temporal link prediction and detection, the proposed simple framework is as accurate as, or more accurate than, state-of-the-art temporal graph autoencoder models
  • A scalable framework for distributed temporal link prediction on big data is proposed

2. Background

  Discrete temporal graph: $\mathcal{G} = \{G_1, G_2, \ldots, G_T\}$ is defined as a series of graphs $G_t = \{V, E_t, X_t\}$ called snapshots. $V$ is the set of all nodes appearing in the network, $E_t$ is the edge set, i.e., the relationships between nodes at time $t$, and $X_t$ is the matrix of node features at time $t$. All graphs are directed, and some have weighted edges, with $W : E \to \mathbb{R}$ giving each edge's frequency within the time period covered by its snapshot. One graph holds all subject, object, timestamp triples $\langle src, dst, ts \rangle$ that fall within a time window $\delta$.
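To make the definition concrete, here is a minimal sketch (not from the paper; the event-tuple layout and window size $\delta$ are assumptions) of grouping $\langle src, dst, ts \rangle$ triples into weighted snapshot edge sets:

```python
from collections import Counter, defaultdict

def build_snapshots(events, delta):
    """Group (src, dst, ts) triples into snapshots of width delta.

    Returns one dict per time window, mapping (src, dst) -> frequency,
    i.e. the weighted edge set E_t of each snapshot.
    """
    buckets = defaultdict(Counter)
    for src, dst, ts in events:
        buckets[int(ts // delta)][(src, dst)] += 1
    if not buckets:
        return []
    return [dict(buckets[t]) for t in range(max(buckets) + 1)]

# Toy example with a 2-unit window:
events = [("A", "SD", 1.0), ("B", "SD", 2.5), ("A", "SD", 3.0), ("A", "SD", 3.2)]
print(build_snapshots(events, delta=2.0))
# [{('A', 'SD'): 1}, {('B', 'SD'): 1, ('A', 'SD'): 2}]
```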
  Temporal link prediction: defined as finding a function that, given previously observed snapshots of the network, describes the likelihood that an edge exists at a given point in time. Observed interactions between entities whose likelihood score falls below a certain threshold are called anomalies. In the context of network monitoring, these anomalous edges often indicate lateral movement.

3. Motivation

  Consider the example shown in Figure 2. The first two time slices show normal activity in the network: first, at $t_0$, Alice and Bob authenticate to their computers; then, at $t_1$, the computers make requests to the shared drive SD. As $t_2$ and $t_3$ show, if Bob does not first authenticate to his computer C1, C1 does not communicate with the shared drive. A simple probability distribution is apparent:

$$\begin{gathered} P\big((C1, SD) \in \mathcal{E}_{t+1} \mid (B, C1) \in \mathcal{E}_t\big) = 1 \\ P\big((C1, SD) \in \mathcal{E}_{t+1} \mid (B, C1) \notin \mathcal{E}_t\big) = 0 \end{gathered}$$

However, at $t_4$ and $t_5$, something unusual happens: computer C1 requests data from the shared drive without Bob authenticating to it first, which could be an attack.

[Figure 2]
  Existing graph-based methods do not consider time, while many event-based methods view each event in isolation; they cannot capture how interactions between other entities in the network relate to an individual event, and would see no difference between the edge (C1, SD) at time $t_1$ and at time $t_5$. To detect the attack in this example, a model needs to consider each event with reference to previous events as well as other interactions in the network. An event between two entities at one point in time cannot be treated as identical to the same event occurring later in a different global context.

Blogger's note: the original text uses $t_1$ and $t_4$ as the example here, but my understanding is that a traditional method would consider $t_5$ normal because $t_0$ happened, when in fact it is not; temporal information must be considered, and the temporal effect of the same information cannot be ignored (e.g., an authentication expires and re-authentication is required). So I changed the example to $t_1$ and $t_5$.


4. EULER

  The framework aims to learn a probability function conditioned on previous states of a temporal graph to determine the likelihood of an edge appearing in a later state.

A. Encoders and decoders

  EULER consists of a model-agnostic graph neural network (GNN) stacked on a model-agnostic recurrent neural network (RNN). Together, these models aim to find an encoding function $f(\cdot)$ and a decoding function $g(\cdot)$. The encoding function maps the nodes of a temporal graph with $T$ snapshots to $T$ low-dimensional embedding matrices. The decoding function ensures that minimal information is lost during encoding, recovering the graph structure from the latent embeddings $Z$:

$$\begin{aligned} Z &= f(\{\mathcal{G}_0, \ldots, \mathcal{G}_T\}) \\ &= \operatorname{RNN}\big(\left[\operatorname{GNN}(\mathbf{X}_0, \mathbf{A}_0), \ldots, \operatorname{GNN}(\mathbf{X}_T, \mathbf{A}_T)\right]\big) \end{aligned}$$

where $\mathbf{A}_t$ is the $|V| \times |V|$ adjacency matrix of the snapshot at time $t$. The $T \times |V| \times d$-dimensional tensor $Z$ is optimized to contain information about the structure of the graph and the dynamics of how it changes over time.

  The decoding function is

$$g(\mathbf{Z}_t) = \Pr(\mathbf{A}_{t+n} = 1 \mid \mathbf{Z}_t) = \sigma(\mathbf{Z}_t \mathbf{Z}_t^{\mathsf{T}}) = \tilde{\mathbf{A}}_{t+n}$$

where $\sigma(\cdot)$ is the logistic sigmoid function and $\tilde{\mathbf{A}}_{t+n}$ is the reconstructed adjacency matrix at time $t + n$.
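A minimal PyTorch sketch of this encoder/decoder pair (an illustration only, assuming PyTorch Geometric's GCNConv; the layer sizes and single-layer GNN are my simplifications, not the paper's exact configuration):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class TemporalEncoder(nn.Module):
    """f(.): one GNN pass per snapshot, then an RNN over the snapshot sequence."""
    def __init__(self, in_dim, hidden_dim, z_dim):
        super().__init__()
        self.gnn = GCNConv(in_dim, hidden_dim)  # weights shared across all snapshots
        self.rnn = nn.GRU(hidden_dim, z_dim)    # consumes (T, |V|, hidden_dim)

    def forward(self, xs, edge_indices):
        # In EULER, these per-snapshot GNN passes are distributed across workers.
        h = torch.stack([torch.tanh(self.gnn(x, ei))
                         for x, ei in zip(xs, edge_indices)])
        z, _ = self.rnn(h)                      # z: (T, |V|, z_dim)
        return z

def decode(z_t, src, dst):
    """g(.): inner-product decoder, sigma(z_u . z_v) = likelihood of edge (u, v)."""
    return torch.sigmoid((z_t[src] * z_t[dst]).sum(dim=-1))
```

Computing $\sigma(\mathbf{Z}_t \mathbf{Z}_t^{\mathsf{T}})$ in full is only feasible for small graphs; scoring individual (src, dst) pairs as above avoids materializing the dense $|V| \times |V|$ matrix.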

B. Workflow

  The core of the EULER framework is replicas of a model-agnostic GNN (called the topological encoder) stacked on a model-agnostic recurrent layer, under some simple constraints. It allows massive parallelism when adapted to a leader/worker paradigm, with the recurrent layer as the leader and multiple topological encoders as workers. The overall workflow is shown in Figure 3 and divides into five stages (a minimal sketch follows the list):

  • The leader spawns the workers and tells them which snapshots to load
  • The leader starts the training loop, and the workers generate topological embeddings
  • After receiving the topological embeddings, the leader processes them through the RNN
  • The output of the RNN is sent back to the workers to calculate losses or scores
  • In training mode, the losses are returned to the leader for backpropagation
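A single-process sketch of this division of labor (the real implementation distributes workers across machines, e.g. via torch.distributed.rpc; here a thread pool merely illustrates stages 1-3, and `gnn`/`rnn` are assumed to be modules like those sketched above):

```python
from concurrent.futures import ThreadPoolExecutor

import torch

def worker_encode(gnn, snapshots):
    """Worker: run the replicated GNN over its assigned snapshots (stage 2)."""
    return [torch.tanh(gnn(x, ei)) for x, ei in snapshots]

def leader_step(gnn, rnn, partitions):
    """Leader: fan snapshot partitions out to workers, then run the RNN over
    the gathered topological embeddings (stages 1-3)."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(worker_encode, gnn, part) for part in partitions]
        # Flatten results in submission order to preserve temporal order.
        embeddings = [h for f in futures for h in f.result()]
    z, _ = rnn(torch.stack(embeddings))  # (T, |V|, z_dim)
    return z
```

The key constraint is that each worker's GNN has no dependence on any other snapshot, so stage 2 parallelizes cleanly; only the cheap RNN over the small embeddings remains serial on the leader.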

C. Training

  There are two training modes: link detection and link prediction. They differ in how the $Z_t$ embeddings sent to the workers in step 4 are used to calculate the loss. Link detectors are inductive; they use partially observed snapshots to generate $Z_t$ and try to reconstruct the complete adjacency matrix $A_t$ with $g(Z_t)$. Audits are then performed manually to identify anomalous connections that have occurred. Link predictors are transductive; they use snapshots to generate $Z_t$, predict the future state $A_{t+n}$, and then score the edges observed there.
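A sketch of the per-snapshot loss a worker might compute: binary cross-entropy over the observed (positive) edges and uniformly sampled negative edges. The negative-sampling scheme here is a common default, not necessarily the paper's exact choice; for detection, $Z_t$ and the positive edges come from the same snapshot, while for prediction the edges come from snapshot $t+n$.

```python
import torch

def link_loss(z_t, pos_edges, num_nodes, neg_ratio=1):
    """BCE over observed edges (label 1) and random non-edges (label 0)."""
    src, dst = pos_edges                              # index tensors, shape (|E_t|,)
    pos = torch.sigmoid((z_t[src] * z_t[dst]).sum(-1))
    n_neg = neg_ratio * src.numel()                   # sample random node pairs
    neg_src = torch.randint(0, num_nodes, (n_neg,))
    neg_dst = torch.randint(0, num_nodes, (n_neg,))
    neg = torch.sigmoid((z_t[neg_src] * z_t[neg_dst]).sum(-1))
    eps = 1e-8                                        # numerical stability
    return -(torch.log(pos + eps).mean() + torch.log(1 - neg + eps).mean())
```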

D. Classification

  Although most of the evaluation relies on regression metrics measuring the fitness of the scores assigned to edges, it is useful to automate the choice of the anomaly threshold so that classification scores can be obtained. To this end, one or more full snapshots are held out as an additional validation set during training. Using the final hidden state of the RNN from the training snapshots as input for the validation snapshots, the optimal cutoff threshold for the edge likelihood scores is found. Given the set of scores of edges present in the validation snapshots, the optimal cutoff threshold $\tau$ satisfies

$$\operatorname{argmin}_{\tau} \left\| (1-\lambda)\,\mathrm{TPR}(\tau) - \lambda\,\mathrm{FPR}(\tau) \right\|$$

where $\mathrm{TPR}(\tau)$ and $\mathrm{FPR}(\tau)$ are the true positive rate and false positive rate of classification at cutoff threshold $\tau$, and $\lambda = 0.6$.
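A naive sketch of that threshold search over labeled validation scores (a literal implementation of the formula as stated above; the variable names and the convention that edges scoring below $\tau$ are flagged as anomalous are assumptions):

```python
import numpy as np

def find_tau(scores, labels, lam=0.6):
    """Return the cutoff minimizing |(1-lam)*TPR(tau) - lam*FPR(tau)|.

    scores: edge likelihood scores; labels: 1 = anomalous edge, 0 = benign.
    Edges with score < tau are classified as anomalous.
    """
    best_tau, best_obj = None, float("inf")
    # Skip the smallest score: that cutoff flags nothing (TPR = FPR = 0)
    # and would trivially minimize the objective.
    for tau in np.unique(scores)[1:]:
        pred = scores < tau
        tpr = (pred & (labels == 1)).sum() / max((labels == 1).sum(), 1)
        fpr = (pred & (labels == 0)).sum() / max((labels == 0).sum(), 1)
        obj = abs((1 - lam) * tpr - lam * fpr)
        if obj < best_obj:
            best_tau, best_obj = tau, obj
    return best_tau
```

With $\lambda = 0.6 > 0.5$, the balance point tolerates fewer false positives per true positive, consistent with the extremely low false positive rate required in the introduction.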

5. Benchmarking

  The reference implementation stacks the most generic GNN available, a GCN, on a GRU. It is very simple, referred to as the "naive method", but it is also the fastest temporal model tested.

  An edge dropout layer is included before the initial forward pass, and a feature dropout layer is included between all layers to prevent overfitting and oversmoothing on small datasets.

  Both the hidden layer and the output are 32-dimensional. The sequence of GCN outputs is passed through a tanh activation, then processed by a single 32-dimensional GRU, and finally projected into 16-dimensional embeddings by an MLP.
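Putting those dimensions together, the benchmark model might look like the following sketch (assuming PyTorch Geometric; the dropout rate and two-layer GCN depth are my guesses, and the edge dropout before the first pass is omitted):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class NaiveEuler(nn.Module):
    """Benchmark reference: GCN -> tanh -> 32-dim GRU -> MLP to 16-dim embeddings."""
    def __init__(self, in_dim, hidden=32, z_dim=16, p_drop=0.25):
        super().__init__()
        self.gcn1 = GCNConv(in_dim, hidden)
        self.gcn2 = GCNConv(hidden, hidden)
        self.drop = nn.Dropout(p_drop)       # feature dropout between layers
        self.gru = nn.GRU(hidden, hidden)    # single 32-dim recurrent layer
        self.mlp = nn.Linear(hidden, z_dim)  # projection to 16-dim output

    def forward(self, xs, edge_indices):
        h = torch.stack([
            torch.tanh(self.gcn2(self.drop(torch.relu(self.gcn1(x, ei))), ei))
            for x, ei in zip(xs, edge_indices)   # one GCN pass per snapshot
        ])                                       # (T, |V|, hidden)
        out, _ = self.gru(h)                     # temporal encoding
        return self.mlp(out)                     # (T, |V|, z_dim)
```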

Other evaluated models: DynGraph2Vec, EvolveGCN, VGRNN, VGAE

Three datasets: Facebook, Enron10 and COLAB

6. Lateral movement detection

The LANL comprehensive cyber security events dataset: 58 days of log files from five different sources, containing normal activity plus labeled red-team activity.
Three encoders are tested with two recurrent neural networks, as well as with no recurrent layer at all, to measure the value of temporal information in the overall embedding. The encoder models are GCN, GAT, and GraphSAGE; the recurrent models are GRU and LSTM.
