website_fingerprinting usage notes

website_fingerprinting

Currently this project supports the following models:

  • Deep Fingerprinting

  • SDAE

  • LSTM

  • CNN

The remaining two are statistical machine learning models. These two models are currently not well adapted, but their feature extraction is effective:

  • CUMULATION

  • AppScanner

Instructions

Data preparation

First, prepare the data in the expected format: organize the network traffic into the following 6 files and place them in the same directory, using exactly these file names.

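X_train_pkt_length.pkl : packet length sequences, training set.
X_valid_pkt_length.pkl : packet length sequences, validation set.
X_test_pkt_length.pkl : packet length sequences, test set.
y_train_pkt_length.pkl : traffic labels, training set.
y_valid_pkt_length.pkl : traffic labels, validation set.
y_test_pkt_length.pkl : traffic labels, test set.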

X_*_pkt_length.pkl is a numpy matrix saved with pickle.dump(), with shape m × l, where m is the number of samples and l is the length of the packet length sequence. The packet length sequences of all samples in the same data set must be padded to the same length.
y_*_pkt_length.pkl is also a numpy matrix saved with pickle.dump(), with shape m × 1, where m is the number of samples. The i-th element is an integer: the label of the i-th sample of the corresponding training, validation, or test set.
Save the data set with steps similar to the following:

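        import gzip
        import pickle

        # feature_name is "pkt_length" for the files listed above.
        # Protocol -1 selects the highest pickle protocol available.
        with gzip.GzipFile(path_dir+"/"+"X_train_"+feature_name+".pkl","wb") as fp: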
            pickle.dump(X_train,fp,-1)
        with gzip.GzipFile(path_dir+"/"+"X_valid_"+feature_name+".pkl","wb") as fp:
            pickle.dump(X_valid,fp,-1)
        with gzip.GzipFile(path_dir+"/"+"X_test_"+feature_name+".pkl","wb") as fp:
            pickle.dump(X_test,fp,-1)

        with gzip.GzipFile(path_dir+"/"+"y_train_"+feature_name+".pkl","wb") as fp:
            pickle.dump(y_train,fp,-1)
        with gzip.GzipFile(path_dir+"/"+"y_valid_"+feature_name+".pkl","wb") as fp:
            pickle.dump(y_valid,fp,-1)
        with gzip.GzipFile(path_dir+"/"+"y_test_"+feature_name+".pkl","wb") as fp:
            pickle.dump(y_test,fp,-1)

Within each split (training, validation, and test), the number of packet length sequences must equal the number of traffic labels.
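The files can be read back and checked against this requirement before training. The following is a minimal sketch, not part of the project: load_split is a hypothetical helper, and it assumes the files were written through gzip.GzipFile as in the saving code above.

```python
import gzip
import pickle


def load_split(data_dir, split, feature_name="pkt_length"):
    """Load X_<split>_<feature>.pkl and y_<split>_<feature>.pkl from data_dir."""
    loaded = []
    for prefix in ("X", "y"):
        path = f"{data_dir}/{prefix}_{split}_{feature_name}.pkl"
        # The files are gzip-compressed pickles, so open them with gzip.
        with gzip.GzipFile(path, "rb") as fp:
            loaded.append(pickle.load(fp))
    X, y = loaded
    # Each split needs exactly one label per packet length sequence.
    if len(X) != len(y):
        raise ValueError(f"{split}: {len(X)} sequences but {len(y)} labels")
    return X, y
```

Calling load_split(dataset_dir, "train"), and likewise for "valid" and "test", raises a ValueError as soon as a split has mismatched counts.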

The project provides an example data set, app_dataset, with 55 categories. The packet length sequence of each sample has length 1000: shorter sequences are padded with 0, and longer sequences are truncated to 1000.
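Padding and truncation to a fixed length can be sketched as follows. This is an illustration rather than the project's own preprocessing code: pad_or_truncate is a hypothetical helper, and appending the zeros at the end of the sequence is an assumption, since the text only says that short sequences are filled with 0.

```python
def pad_or_truncate(pkt_lengths, target_len=1000):
    """Return a list of exactly target_len values: zero-padded or truncated."""
    clipped = list(pkt_lengths[:target_len])
    # Assumption: zeros are appended at the end of short sequences.
    return clipped + [0] * (target_len - len(clipped))


# Two hypothetical flows of different lengths, fixed to length 5 for brevity.
flows = [[60, 1500, 52], [60, 60, 1500, 1500, 52, 52, 60]]
rows = [pad_or_truncate(flow, target_len=5) for flow in flows]
print(rows)  # [[60, 1500, 52, 0, 0], [60, 60, 1500, 1500, 52]]
```

Passing the resulting list of equal-length rows to numpy.asarray gives the m × l matrix described above.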


Modify the data directory

After preparing the data according to the above steps, you need to modify the data directory.
In the file website_fingerprinting/data_utils.py, modify the NB_CLASSES variable and the default data set directory variable dataset_dir.
NB_CLASSES is the number of distinct labels in the data set.
dataset_dir is the directory of the default data set.
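If the number of distinct labels is not known in advance, it can be counted from a label matrix. The sketch below is only an illustration: count_classes is a hypothetical helper that is not part of data_utils.py, and it assumes each row of the m × 1 label matrix holds a single integer label.

```python
def count_classes(y):
    """Count distinct integer labels in an m x 1 label matrix (or a flat list)."""
    labels = set()
    for row in y:
        # Rows of an m x 1 matrix hold one label each; flat lists hold bare ints.
        value = row[0] if hasattr(row, "__len__") else row
        labels.add(int(value))
    return len(labels)


print(count_classes([[0], [2], [1], [2]]))  # 3
```

The value returned for the training labels is the candidate for NB_CLASSES.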


Configuration model

Before running the model, you need to modify their configuration files.
Currently, the configuration file of each model is in a directory named after the model. For example, the configuration file of the Deep Fingerprinting model is df_model_config.py in the df directory.
In the configuration file, modify the number of categories and the length parameter of the packet length sequence. The parameters that need to be modified are marked in each model's configuration file.

Run the model

Run X_example.py to train a model, where X can be df, cnn, lstm, or sdae.
Run X_eval.py to test the model, where X can be df, cnn, lstm, or sdae.

For example, the result of running df_example.py on the app_dataset data set is:
[Figure: output of df_example.py on app_dataset]

Origin blog.csdn.net/jmh1996/article/details/109055781