【GNN + Encrypted Traffic C】MEMG: Mobile Encrypted Traffic Classification With Markov Chains and Graph Neural Network

Introduction to the paper

Original title :
Chinese title :
Publication conference :
Publication year :
Author :
latex citation :


Abstract

In recent years, user privacy and information security have received widespread attention, and the proportion of mobile traffic that is encrypted has increased significantly, posing considerable challenges to traditional traffic classification methods. Machine learning and deep learning have become the mainstream approaches to this problem. However, existing machine learning methods rely on manually designed features and cannot adapt to newly emerging traffic patterns. Deep learning methods can learn features automatically from raw traffic sequences, but at a higher computational cost. To address these challenges, this paper proposes MEMG, a mobile encrypted traffic classification method based on Markov chains and graph neural networks. We use Markov chains to mine the hidden topological information of flows. A graph structure is then constructed for each flow on this basis, and the sequence information of the flow is added to the node features of the graph. We also design a graph neural network-based classifier that learns topological and sequential features from these graphs. The classifier maps graph structures into an embedding space and distinguishes different graph structures by the differences between their embedding vectors. We conduct comprehensive experiments on both a real-world dataset and public datasets. The real-world dataset contains recently collected traffic from 29 commonly used mobile encrypted applications, more than 116,000 flows in total. Our method achieves 6.1% and 3.5% higher accuracy than state-of-the-art methods on our dataset and on the public datasets, respectively. It also reduces training time overhead and GPU memory usage by 40% and 46%, respectively.

Problems

Previous Markov-based methods build a category-level topology by analyzing all samples of the same category in the training set, and then classify test samples by maximum likelihood. This approach fails to capture the sequence information of an individual flow, which proves to be crucial.
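
To make the limitation concrete, here is a minimal sketch of such a category-level Markov classifier. It is an illustration only, not the implementation of any cited baseline: the function names, the 0-based state indices, the add-one smoothing (`alpha`), and the choice to ignore the initial-state distribution are all assumptions.

```python
import math
from collections import defaultdict

NUM_STATES = 10  # assumed number of packet-length states (0-based here)


def fit_class_chains(sequences, labels, num_states=NUM_STATES, alpha=1.0):
    """Estimate one transition matrix per class from all of its training flows."""
    # Every count starts at alpha (additive smoothing, an assumption).
    counts = defaultdict(lambda: [[alpha] * num_states for _ in range(num_states)])
    for seq, label in zip(sequences, labels):
        for a, b in zip(seq, seq[1:]):
            counts[label][a][b] += 1.0
    # Row-normalize the counts into transition probabilities.
    return {
        label: [[c / sum(row) for c in row] for row in mat]
        for label, mat in counts.items()
    }


def log_likelihood(seq, chain):
    """Log-probability of the transitions in seq under one class chain."""
    return sum(math.log(chain[a][b]) for a, b in zip(seq, seq[1:]))


def classify(seq, chains):
    """Pick the class whose chain gives the flow the highest likelihood."""
    return max(chains, key=lambda label: log_likelihood(seq, chains[label]))


# Toy data: two classes with clearly different transition behaviour.
train = [[0, 1, 0, 1, 0, 1], [0, 1, 0, 1], [5, 5, 5, 5], [5, 5, 5, 6, 5]]
labels = ["a", "a", "b", "b"]
chains = fit_class_chains(train, labels)
print(classify([0, 1, 0, 1], chains))
```

In this sketch the score of a flow depends only on how often each transition occurs, so two flows with the same transitions in a different order, such as `[0, 1, 1, 0]` and `[1, 1, 0, 1]`, receive the same likelihood. That is the per-flow sequence information the paragraph above says is lost.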

Paper contribution

  1. A graph structure representation of traffic, MarkovGraph (MG), is proposed, which captures both the hidden topological information and the sequence information of a flow. Experiments demonstrate the advantage of MG in reducing computational consumption and time overhead.
  2. A GNN-based classifier containing a graph convolutional network (GCN) is designed to extract topological features. In addition, a multi-layer perceptron (MLP) is used to learn sequence features, which are fused with the topological information to reduce the bias caused by relying on a single feature and to improve classification performance.
  3. More than 116,000 real-world traffic flows from 29 applications were collected on a campus network. The accuracy and efficiency of MEMG are demonstrated on this dataset and on public datasets. Compared with state-of-the-art classification methods, MEMG achieves higher classification accuracy with the lowest training time and the lowest computational resource consumption.
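
The second contribution can be sketched roughly as follows. This is a hedged illustration under several assumptions, not the MEMG architecture itself: it uses a single GCN layer with the common symmetric normalization with self-loops, a one-layer MLP branch, concatenation as the fusion step, mean pooling as the graph readout, and random untrained weights. All function names, parameter names, and layer sizes are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def normalize_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def forward(adj, feats, params):
    """Return class logits for one graph (adjacency + node features)."""
    a_norm = normalize_adjacency(adj)
    # Topological branch: one GCN layer that mixes features along edges.
    topo = relu(matmul(matmul(a_norm, feats), params["w_gcn"]))
    # Sequence branch: an MLP layer applied to each node's own features.
    seq = relu(matmul(feats, params["w_mlp"]))
    # Fusion (assumed): concatenate both embeddings for every node.
    fused = [t + s for t, s in zip(topo, seq)]
    # Readout (assumed): mean over nodes gives one vector per graph.
    graph_vec = [sum(col) / len(fused) for col in zip(*fused)]
    return matmul([graph_vec], params["w_out"])[0]


rng = random.Random(0)
feat_dim, hidden, num_classes = 4, 5, 2
params = {
    "w_gcn": random_matrix(feat_dim, hidden, rng),
    "w_mlp": random_matrix(feat_dim, hidden, rng),
    "w_out": random_matrix(2 * hidden, num_classes, rng),
}
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # toy 3-node graph
feats = random_matrix(3, feat_dim, rng)
print(forward(adj, feats, params))
```

The weights here are random, so the printed logits carry no meaning; the point is only the data flow, in which the GCN branch sees the graph topology while the MLP branch sees each node's sequence-derived features, and both feed the final classifier.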

The paper’s approach to solving the above problems:

Use a GNN to extract both topological information and sequence information.

Paper task:

Graph classification

1. Graph structure abstraction for encrypted traffic flow

MarkovGraph construction:

Definitions:

  • Number of samples: $N$
  • Number of labels: $M$
  • $x_i$: the $i$-th flow, $x_i = [p_1^i, p_2^i, \ldots, p_{l_i}^i]$, $0 < i < N$
  • $p_a^i$: the $a$-th packet of the $i$-th flow
  • $Y_i$: the label of the $i$-th flow, $0 < Y_i \le M$

State sequence conversion:

Assuming MTU = 1500 bytes, take the lengths of the first 100 packets of each flow.

  • State set: $\{S_1, S_2, S_3, \ldots, S_{10}\}$, where $S_i = [(i-1) \times 150,\ i \times 150]$ bytes. For example, a packet with a length of 200 bytes belongs to $S_2$.
  • State transition matrix: $W$, with shape [10, 10]
  • Initial vector: the first packet in the flow
  • Maximum length of the state sequence: 100
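
The conversion above can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements from the paper: a length that falls exactly on a multiple of 150 is assigned to the lower state, and $W$ is taken to be the row-normalized matrix of transition counts.

```python
BIN_WIDTH = 150    # 1500-byte MTU split into 10 equal-width states
NUM_STATES = 10
MAX_PACKETS = 100  # only the first 100 packets of a flow are used


def length_to_state(length):
    """Map a packet length in bytes to a 1-based state index (S_1..S_10)."""
    # Ceiling division; lengths on a bin boundary go to the lower state (assumed).
    state = (max(length, 1) + BIN_WIDTH - 1) // BIN_WIDTH
    # Clamp anything larger than the MTU into the last state.
    return min(state, NUM_STATES)


def state_sequence(packet_lengths):
    """State sequence of a flow, truncated to the first MAX_PACKETS packets."""
    return [length_to_state(n) for n in packet_lengths[:MAX_PACKETS]]


def transition_matrix(states):
    """Row-normalized 10x10 matrix of transition frequencies (assumed form of W)."""
    counts = [[0] * NUM_STATES for _ in range(NUM_STATES)]
    for a, b in zip(states, states[1:]):
        counts[a - 1][b - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that never occur as a source keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


flow = [200, 1500, 60, 200, 1500]  # toy packet lengths in bytes
states = state_sequence(flow)
W = transition_matrix(states)
print(states)
```

Each flow gets its own matrix in this sketch, which is what distinguishes the per-flow graph from the category-level topology of earlier Markov methods.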

Nodes and node features:

  • Node: each state in the Markov graph
  • Node features: Because each node (state) covers multiple packets, it is difficult to record sequence information for every packet individually. The packet state sequence is therefore sliced: the context of each packet consists of the n packets before it and the n packets after it in the state sequence (n = 2 in the experiments). These subsequences around each central packet describe local sequence information, and a recurrent neural network compresses the contexts of all packets that share the same state into a single p-dimensional vector (p = 128), which serves as the node feature.
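
The slicing step can be sketched as below. This covers only the context extraction and the grouping by state; the recurrent network that compresses each group into a 128-dimensional vector is not shown. The function names are hypothetical, and two choices are assumptions: the window includes the central packet itself, and positions beyond the ends of the sequence are padded with 0.

```python
from collections import defaultdict

PAD = 0  # assumed padding symbol; real states are 1..10


def packet_contexts(states, n=2):
    """Window of n states before and n after each packet, center included."""
    padded = [PAD] * n + list(states) + [PAD] * n
    return [padded[i:i + 2 * n + 1] for i in range(len(states))]


def contexts_by_state(states, n=2):
    """Group every packet's context window under that packet's own state."""
    groups = defaultdict(list)
    for state, ctx in zip(states, packet_contexts(states, n)):
        groups[state].append(ctx)
    return dict(groups)


states = [2, 10, 1, 2, 10]  # toy state sequence
for state, contexts in sorted(contexts_by_state(states).items()):
    print(state, contexts)
```

Each list in the result holds all context windows for one node of the graph; an encoder such as the recurrent network described above would take that list as input and produce the node's feature vector.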

MEMG model:
(Figure: MEMG model architecture)

2. Experiment

Comparative experiments:
(Figures: comparative experiment results)

Summary

Advantages

Combines a Markov model to construct the graph structure, uses packet context to describe node features, and uses a jumping knowledge network to reduce the size of the graph structure.

Dataset

  • MaMPF: Encrypted traffic classification based on multi-attribute Markov probability fingerprints

References worth reading

Papers for the baseline models used in the comparative experiments:

  • Deep fingerprinting: Undermining website fingerprinting defenses with deep learning (DF)
  • Robust smartphone app identification via encrypted network traffic analysis (AppScanner)
  • Adaptive encrypted traffic fingerprinting with bidirectional dependence (BIND)
  • Website fingerprinting at internet scale (CUMUL)
  • MaMPF: Encrypted traffic classification based on multi-attribute Markov probability fingerprints (MaMPF)

Traffic collection:

  • Robust smartphone app identification via encrypted network traffic analysis

Origin blog.csdn.net/Dajian1040556534/article/details/132848656