【GNN+Encrypted Traffic C】VT-GAT: A Novel VPN Encrypted Traffic Classification Model Based on GAT

Introduction to the paper

Original title : VT-GAT: A Novel VPN Encrypted Traffic Classification Model Based on Graph Attention Neural Network
Published in : Collaborative Computing: Networking, Applications and Worksharing
Year of publication : 2022
Authors : Hongbo Xu, Shuhao Li, Zhenyu Cheng, Rui Qin, Jiang Xie, Peishuai Sun
BibTeX citation :

@inproceedings{xu2022vt,
  title={VT-GAT: A Novel VPN Encrypted Traffic Classification Model Based on Graph Attention Neural Network},
  author={Xu, Hongbo and Li, Shuhao and Cheng, Zhenyu and Qin, Rui and Xie, Jiang and Sun, Peishuai},
  booktitle={International Conference on Collaborative Computing: Networking, Applications and Worksharing},
  pages={437--456},
  year={2022},
  organization={Springer}
}

Abstract

Virtual Private Network (VPN) technology is now widely used in scenarios such as remote work. With the development of proxy technology, VPN traffic identification has become increasingly important to network security and management. Unlike other tasks such as application classification, VPN traffic classification suffers from the single-flow problem. In addition, advances in encryption technology have brought new challenges to VPN traffic identification.

To address these problems, this paper proposes VT-GAT, a VPN traffic graph classification model based on the graph attention network (GAT). Unlike existing VPN encrypted traffic classification techniques, VT-GAT exploits the graph connectivity information contained in the traffic, which earlier techniques ignore. VT-GAT first builds traffic behavior graphs by describing raw traffic data at the packet and flow levels. It then combines a graph neural network with an attention mechanism to automatically extract behavioral features from the traffic graph data. Extensive experiments on the Datacon21 dataset show that VT-GAT achieves more than 99% on all classification metrics. Compared with existing machine learning and deep learning methods, VT-GAT improves F1-Score by approximately 3.02%-63.55%. VT-GAT also remains robust when the number of classification categories changes. These results demonstrate the effectiveness of VT-GAT in VPN traffic classification.

Problems

  1. When users hide their identities behind a VPN application, the number of flows that can be extracted from the traffic drops dramatically. As shown in Figure 1 of the paper, the server-side IP address and port of the packets sent by the user are replaced by those of the VPN application. VPN traffic therefore cannot be split into multiple flows based on server IP address and port. This phenomenon is called the single-flow problem.
  2. Finding practical and robust features is a feasible way to solve the single-flow problem. Previous studies focused mainly on the spatio-temporal characteristics of traffic, and the graph connection behavior implied by the traffic was often ignored. Traditional deep learning methods alone cannot quickly and effectively extract the connection behavior of flows from existing features.
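The single-flow problem described above can be illustrated with a minimal sketch. The packet records, the addresses (taken from the documentation-only IP ranges), and the helper `split_into_flows` are all hypothetical and are not from the paper; the sketch only shows why grouping by server IP and port stops working once a VPN rewrites the destination.

```python
from collections import defaultdict


def split_into_flows(packets):
    """Group packets by (server IP, server port), the key that a VPN hides."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[(pkt["dip"], pkt["dport"])].append(pkt)
    return dict(flows)


# Hypothetical packets from one client to three different services.
direct = [
    {"dip": "192.0.2.10", "dport": 443, "size": 517},
    {"dip": "198.51.100.7", "dport": 443, "size": 1200},
    {"dip": "198.51.100.9", "dport": 80, "size": 310},
]

# Under a VPN, every packet is addressed to the VPN server instead.
VPN_SERVER = ("203.0.113.10", 1194)  # made-up endpoint, for illustration only
tunneled = [{**p, "dip": VPN_SERVER[0], "dport": VPN_SERVER[1]} for p in direct]

print(len(split_into_flows(direct)))    # 3 separate flows
print(len(split_into_flows(tunneled)))  # 1 flow: the single-flow problem
```

With the destination rewritten, the three services collapse into one flow, so per-flow features alone carry much less information.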

Paper contribution

  1. A method for extracting traffic behavior graphs from VPN encrypted traffic is proposed. It transforms the traffic classification problem into a graph classification problem. Experiments show that this method effectively improves the classification accuracy of the model.
  2. The VT-GAT model, based on the graph attention network, is proposed. To the best of the authors' knowledge, this is the first model to perform VPN traffic classification using graph neural networks. VT-GAT combines the spatio-temporal characteristics of traffic with the behavioral characteristics of graphs for classification, making up for the shortcomings of existing techniques. In addition, VT-GAT uses the graph attention mechanism to aggregate the features of adjacent nodes, which improves the robustness of the model.
  3. A traffic graph dataset suitable for VPN encrypted traffic classification is proposed. A prototype system was implemented based on the VT-GAT model, and experiments were conducted on the recently released Datacon21 dataset.

The paper’s approach to solving the above problems:

A graph neural network model VT-GAT that integrates graph behavioral characteristics and spatio-temporal traffic characteristics is proposed to solve the above problems.

Task addressed by the paper:

Graph classification

1. Method

  • Traffic behavior graph construction

    Node feature extraction (with CICFlowMeter):

    • Aggregate features : overall properties of a network flow, including total duration, total number of packets, total packet length, etc.
    • Time features : raw and statistical time-related features, such as average inter-packet send interval, total send interval time, etc.
    • Statistical features : statistics on packet size (excluding the aggregate features), including the number of upstream packets per second, average packet length, standard deviation of packet length, etc.
    • Content features : features taken from packet fields, including the number of FIN packets, the number of SYN packets, the number of ACK packets, etc.
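As a rough illustration of the four feature groups listed above, the sketch below computes a few of them from one flow's packets. It is not CICFlowMeter's implementation or API; the function `flow_features`, the tuple layout, and the feature names are assumptions made for this example, and the packets-per-second value ignores direction for simplicity.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Compute example features for one flow.

    packets: non-empty, time-ordered list of
    (timestamp_seconds, length_bytes, tcp_flags) tuples,
    where tcp_flags is a string such as "S", "SA", "A", "FA".
    """
    times = [t for t, _, _ in packets]
    lengths = [n for _, n, _ in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    duration = times[-1] - times[0]
    return {
        # Aggregate features
        "duration": duration,
        "total_packets": len(packets),
        "total_length": sum(lengths),
        # Time features
        "mean_interval": mean(gaps) if gaps else 0.0,
        "total_interval": sum(gaps),
        # Statistical features
        "packets_per_second": len(packets) / duration if duration > 0 else 0.0,
        "mean_length": mean(lengths),
        "std_length": pstdev(lengths),
        # Content features (TCP flag counts)
        "syn_count": sum("S" in f for _, _, f in packets),
        "ack_count": sum("A" in f for _, _, f in packets),
        "fin_count": sum("F" in f for _, _, f in packets),
    }


# Hypothetical four-packet flow.
sample = [(0.0, 60, "S"), (1.0, 60, "SA"), (2.0, 120, "A"), (4.0, 60, "FA")]
features = flow_features(sample)
print(features["total_packets"], features["total_length"], features["syn_count"])
```

In the graph described in this post, a vector of this kind would serve as the node feature.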

    Edge construction method:

    1. Set a window size $T$ and a sliding interval $M_s$; this eventually yields $n$ windows.
    2. Slide the window over the flow sequence $P$. For example, in the initial window the flow sequence obtained is $O = (O_1, \ldots, O_m)$.
    3. Traverse each flow in the window's flow sequence. Taking $O_1$ as an example, extract its (sip, sport) and (dip, dport). If the vertex set $V$ does not yet include (sip, sport) or (dip, dport), add it, and set the corresponding entry of the edge weight matrix to 1. If the entry already exists, add 1 to its current value.
    4. For this window, build the graph structure $g_1$ from the collected vertex set $V$, edge weight matrix $D$, and edge set $E$. Repeating this for every window gives all graph structures $g_1, g_2, g_3, \ldots, g_n$.
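The four steps above can be sketched as follows. This is one possible reading of the procedure, not the authors' code: the function name `build_behavior_graphs`, the flow tuple layout, the half-open window convention, and the use of a dictionary as a sparse edge weight matrix are all assumptions.

```python
def build_behavior_graphs(flows, window, step):
    """Build one graph per sliding window.

    flows: list of (start_time, sip, sport, dip, dport) tuples.
    window: window size T; step: sliding interval M_s.
    Window k covers [t0 + k*step, t0 + k*step + window).
    """
    if not flows:
        return []
    flows = sorted(flows)
    t0, t_end = flows[0][0], flows[-1][0]
    graphs = []
    start = t0
    while start <= t_end:
        vertices = []   # vertex set V (ordered for reproducibility)
        weights = {}    # sparse edge weight matrix D: (u, v) -> count
        for ts, sip, sport, dip, dport in flows:
            if not (start <= ts < start + window):
                continue
            u, v = (sip, sport), (dip, dport)
            for node in (u, v):
                if node not in vertices:
                    vertices.append(node)
            # First occurrence sets the weight to 1, later ones add 1.
            weights[(u, v)] = weights.get((u, v), 0) + 1
        graphs.append({"V": vertices, "E": list(weights), "D": weights})
        start += step
    return graphs


# Hypothetical flows towards a made-up VPN endpoint.
vpn = ("203.0.113.10", 1194)
flows = [
    (0, "10.0.0.2", 5000, *vpn),
    (1, "10.0.0.2", 5000, *vpn),
    (2, "10.0.0.2", 5001, *vpn),
    (6, "10.0.0.3", 6000, *vpn),
]
graphs = build_behavior_graphs(flows, window=4, step=2)
print(len(graphs), len(graphs[0]["V"]), graphs[0]["D"])
```

Because the windows overlap whenever the step is smaller than the window size, one flow can contribute to several consecutive graphs.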
  • Model

    GAT
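    This post does not describe the VT-GAT layers in detail, so the sketch below shows only the standard single-head graph attention layer from the original GAT formulation: each node scores its neighbors with LeakyReLU(a · [W h_i || W h_j]), normalizes the scores with a softmax, and takes the weighted sum of the transformed neighbor features. The self-loop, the omission of the final nonlinearity, the mean readout used to obtain a graph-level vector, and all names and parameter values are assumptions for illustration, not the paper's architecture.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(h, neighbors, W, a):
    """One single-head graph attention layer.

    h: list of node feature vectors.
    neighbors: dict mapping node index -> list of neighbor indices.
    W: weight matrix (out_dim x in_dim).
    a: attention vector of length 2 * out_dim.
    """
    z = [matvec(W, x) for x in h]  # transformed features W h_i
    out = []
    for i in range(len(h)):
        nbrs = sorted(set(neighbors.get(i, [])) | {i})  # include a self-loop
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]  # attention coefficients
        out.append([
            sum(al * z[j][k] for al, j in zip(alphas, nbrs))
            for k in range(len(z[i]))
        ])
    return out


def mean_readout(h):
    """Average node embeddings into one graph-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]


# Toy graph: a path 0 - 1 - 2 with two-dimensional node features.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity, for readability
a = [0.0, 0.0, 0.0, 0.0]       # zero vector gives uniform attention
node_embeddings = gat_layer(h, neighbors, W, a)
print(mean_readout(node_embeddings))
```

    For graph classification, a vector like the one returned by `mean_readout` would be passed to a classifier; in practice W and a are learned, and several attention heads are usually combined.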

2. Experiment


Summary

Tools

  • CICFlowMeter

Datasets

  • ISCX VPN-nonVPN
  • Datacon21

Origin blog.csdn.net/Dajian1040556534/article/details/132824793