A Lightweight IoT Cryptojacking Detection Mechanism in Heterogeneous Smart Home Networks

Network and Distributed Systems Security (NDSS) Symposium 2022, 24-28 April 2022

Abstract

Recently, cryptojacking malware has become an easy way to profit from a large number of victims. Previously studied cryptojacking detection systems focused on either in-browser or host-based cryptojacking malware.

However, none of these earlier works investigated different attack configurations and network settings in this context. For example, an attacker with an aggressive monetization strategy may push computing resources to maximum utilization to gain more in a short period of time, while a stealthy attacker may prefer to stay on the victim's device for a longer period without being detected.

The accuracy of detection mechanisms may vary between aggressive and stealthy attackers. The monetization strategy, type of cryptojacking malware, victim's device, and network settings can all play a key role in the performance evaluation of a detection mechanism.

In addition, a smart home network with multiple IoT devices can easily be exploited by attackers to mine cryptocurrency. However, no previous work has investigated the impact of cryptojacking malware on IoT devices and compromised smart home networks.

  • This paper first proposes an accurate and efficient IoT cryptojacking detection mechanism based on network traffic characteristics, which can detect both in-browser and host-based cryptojacking.
  • We then focus on cryptojacking implementations on new device categories, such as IoT, and design several new experimental scenarios to evaluate our detection mechanism to cover the attacker's current attack surface. In particular, we test our mechanism in various attack configurations and network settings. For this, we use a network trace dataset consisting of 6.4M network packets.

The results show that our detection algorithm can achieve up to 99% accuracy with only one hour of training data. To the best of our knowledge, this work is the first study focusing on cryptojacking in IoT and the first to analyze various attack behaviors and network settings in the field of cryptojacking detection.

1. INTRODUCTION

Background

Blockchain technology eliminates central authority and ensures the immutability of transactions on the chain through a consensus model based on computing power. This consensus model is called Proof of Work (PoW) and is used in blockchain networks such as Bitcoin, Ethereum, and Monero. PoW depends on the computing power of the hardware: solving its hash-based puzzles requires a lot of energy, and this energy cost is one of the main expenses of mining operations.

In this ecosystem, cryptojacking is the act of using a victim's processing power without the victim's knowledge and consent. There are two main ways that attackers can misuse a victim's computing power for cryptojacking.

  • Inject scripts into websites.
  • Deliver a mining program to the host.

As cryptojacking attacks grow in popularity and reach larger attack domains, IoT devices are increasingly becoming a target for attackers. However, IoT devices are usually resource-constrained, so attackers generally do not profit from individual devices. Instead, they use techniques such as botnets to control IoT devices on a large scale and make them attack or mine on the attackers' behalf.

  • Mirai, the botnet behind massive distributed denial-of-service (DDoS) attacks, was used by attackers in 2017 to mine Bitcoin, turning the botnet into a crypto mining pool.
  • Recently, LiquorBot, another botnet inspired by Mirai, also started mining Monero on IoT devices.

As the IoT industry and the capabilities of IoT devices continue to evolve, attackers also gain more room to expand their attack surface.

A few reasons why IoT networks are more attractive to attackers than non-IoT networks (regular computers, servers):

  1. The diversity of vendors, communication protocols, and hardware makes it difficult to develop a standard/unified defense solution.
  2. Due to limited capabilities and resources, IoT devices are not easy to configure. IoT devices are always in communication with the cloud for real-time and remote access capabilities, which provides another attack surface for attackers to spread and maintain botnets.
  3. IoT devices have security flaws (e.g., default passwords, no authentication) due to their rapidly growing market and lack of security awareness, making them ideal targets for cryptojacking malware attackers.

In this paper, we focus on IoT devices, and our goal is to design a detection system capable of detecting in-browser and host-based IoT cryptojacking malware.

Detecting IoT cryptojacking is challenging because most IoT devices cannot be programmed to collect hardware-level or browser-specific features. However, features based on network traffic can be collected through a unified interface on the router, which requires no modification of the devices at all. Therefore, in this paper, we use network-based features to detect IoT cryptojacking malware and propose an accurate, lightweight, and easy-to-implement cryptojacking detection system that can detect both types of cryptojacking attacks.

We conduct a series of experiments to design and evaluate an optimal IoT cryptojacking detection mechanism.

  • First, we conduct experiments to find the best-ranking features, the most accurate classifier, and the optimal training size.
  • We then evaluate the effectiveness of our IoT cryptojacking detection mechanism through 12 experiments designed to evaluate various attacker behaviors and network settings.

To this end, we safely implement cryptojacking malware on IoT devices, laptops, and servers.

Contributions

  • We propose an accurate and effective cryptojacking detection algorithm for IoT networks. Since we use network traffic-based features, our algorithms are able to detect browser and host-based cryptojacking malware without relying on the cloud or the device.
  • To evaluate our algorithm, we design several novel experimental scenarios. We evaluate various attack configurations (e.g., cryptojacking types, monetization strategies, devices, and throttling values) and network settings (e.g., fully or partially compromised). This paper is the first to analyze various attack strategies and network settings in the field of cryptojacking detection.
  • To overcome some practical issues in implementing cryptojacking malware on IoT devices, we use new techniques that can be improved upon in other future studies.
  • To accelerate research in this area, we release the dataset and code.

https://github.com/cslfiu/IoTCryptojacking2

Summary of Study Findings

In addition to our lightweight and high-accuracy IoT detection mechanism, we conducted extensive experiments to evaluate various attacker behaviors and network settings, which led to some interesting results worth noting:

  • We find that the highest malicious packet generation rate is 72% lower than the lowest packet generation rate of the benign dataset given in Table III. This suggests that cryptojacking malware does not generate as many packets as everyday web browsing and application data.
  • We find that while in-browser malware uses evasive techniques such as throttling the CPU and minimizing network traffic, host-based malware attempts to exploit devices at maximum computing power.
  • We observe that cryptojacking attackers are more likely to be detected when attacking server-type devices than other device types (i.e., laptops and IoT devices).
  • We find that the malicious scenario with a stealthy strategy (i.e., 10% throttling) is detected less accurately than the robust (i.e., 50% throttling) and aggressive (i.e., 100% throttling) attack scenarios. This means that the attacker's obfuscation method can still make a difference during the detection phase.

Throttling: limiting resource usage per unit time, much as a valve restricts water flow to avoid excessive flow.

2. BACKGROUND

A. Cryptocurrency Mining

Cryptocurrency mining, the process by which new cryptocurrencies enter circulation, is a key component in the maintenance and continuity of the distributed blockchain ledger. The immutability of the blockchain network is provided by the consensus mechanism, i.e. cryptocurrency mining.

Cryptocurrency mining is a laborious, expensive process, and miners' rewards depend on an element of luck. The work-based consensus mechanism relies on the diffusion property of the hashing algorithm, which prevents miners from predicting the hash value and gaining an asymmetric advantage.
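To make the "laborious with an element of luck" point concrete, the following is a minimal sketch of a hash-based puzzle. It assumes SHA-256 and a leading-zero-bits difficulty rule purely for illustration; real networks use their own algorithms and block formats (Monero, for instance, uses RandomX), and the function names here are hypothetical.

```python
import hashlib


def digest_value(block_data: bytes, nonce: int) -> int:
    """Hash the block data together with a nonce and return it as an integer."""
    payload = block_data + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")


def mine(block_data: bytes, difficulty_bits: int, max_nonce: int = 1_000_000):
    """Try nonces in order until the hash falls below the difficulty target."""
    # Illustrative rule: more difficulty bits -> smaller target -> harder puzzle.
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        if digest_value(block_data, nonce) < target:
            return nonce
    return None  # no valid nonce in the searched range


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs one hash, while finding it costs many."""
    return digest_value(block_data, nonce) < (1 << (256 - difficulty_bits))
```

Because a one-bit change in the input changes the digest unpredictably, a miner cannot do better than trying nonces one after another, which is why mining cost scales with raw computing power.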

B. Types of Cryptojacking

This section explains the details of different types of cryptojacking malware and their similarities and differences.

  • In-browser cryptojacking: Attackers leverage technologies such as JS libraries and Wasm to implement in-browser cryptojacking malware.
  • Host-based cryptojacking: The attacker hides malware on the victim's host system, where it mines cryptocurrency.

C. Machine Learning Tools

  1. Feature extraction and selection tools: Feature extraction is a dimensionality reduction operation in which a dataset is reduced to a more manageable and usable form for processing. After computing features with tsfresh, the significance levels (P-values) are ranked and correlation tables are built to eliminate less important features and improve the classification process.
  2. Machine Learning Classifier: Classification is a technique for determining the category of a dependent variable based on one or more independent variables. We used several different classification models (e.g. Logreg, KNN, SVM, RF) to train our model and obtained accurate results in this paper.

3. ADVERSARY MODEL AND ATTACK SCENARIOS

We evaluate 7 attack cases and conduct 12 discrete experiments to test the cryptojacking detection mechanism proposed in this paper. In this section, we explain how IoT devices can be targeted by malware and how we track these adversaries in our experiments.

A. Cryptojacking with Service Providers

Attackers usually exploit code injection vulnerabilities in web pages and web applications to inject mining scripts provided by service providers. Over the past year, IoT frameworks have developed rapidly. Attackers combine the functions of these frameworks with known vulnerabilities to run their malware on these devices.

We implemented the WebOS IoT cryptojacking malware using LG's WebOS development framework [40] and developed a basic WebOS application that invokes a cryptojacking script when the user starts running the application.

In order to be able to create a stable and controlled cryptojacking environment, we prepared a website under a controlled server and hosted several different cryptojacking scripts. We have chosen Webmine [41] as our main service provider. We ran the scripts using combinations of different levels of computing hardware to observe the characteristic results of these scripts.

B. Cryptojacking using command and control (C&C) servers

A C&C server is a computer that an adversary uses to send instructions to compromised devices. To protect their identity, attackers typically host these servers on cloud-based platforms.

Figure 1 shows the basic configuration of a C&C service connected to a mining pool. In the cryptojacking domain, C&C servers work as a subset of mining pools, receiving tasks from mining pools and distributing tasks to IoT devices.


In this paper, we focus on the communication pipeline between the attacked device and the C&C server. To demonstrate the process and data communication in this setup, we created a C&C server that sends mining tasks at different time intervals. This interval can be adjusted to the block frequency of the blockchain network. We successfully implemented this scenario using an LG WebOS [40] smart TV and test platform.

4. IOT CRYPTOJACKING DETECTION VIA NETWORK TRAFFIC

Network traffic classification and identification technology has been widely used in the past few years, and the creation of user or device profiles on the server side and the local network side is a well-known technology.

In this article, we consider a smart home network setup where many IoT and non-IoT devices are connected to a router in order to reach the internet. Each device can be identified by its MAC address. Therefore, we denote the devices in the network by their MAC addresses $(MAC_0, MAC_1, \ldots, MAC_n)$.

We hypothesized that one or more devices in this network were compromised by an attacker to perform cryptocurrency mining on behalf of the attacker, and our aim was to detect the device doing so by monitoring its network traffic over a certain period of time.

For this, we use machine learning algorithms previously trained on malicious and benign data. Devices generate continuous network traffic, which must be converted into a format from which the machine learning algorithm can predict whether a device is mining or not.

We filter each packet with the following filter before converting it to the correct format.

$(MAC_{src} == MAC_i)\;OR\;(MAC_{dst} == MAC_i)$

Then, extract the following metadata from each packet:

$Pkt_i = [MAC_i,\, timestamp,\, packet\,\, length]$

At the end of the process, we have, for each device, a series of packet lengths with their arrival times. Finally, we use groups of 10 packets to compute the features and use these features to train and test the machine learning algorithm.
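The filtering and metadata steps in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the `Packet` type and function names are hypothetical, and splitting the stream into consecutive, non-overlapping groups of 10 is an assumption, since the text only states that 10 packets are used per feature computation.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed packet; only the fields the filter needs."""
    src_mac: str
    dst_mac: str
    timestamp: float  # seconds
    length: int       # bytes


def device_packets(packets, mac):
    """Keep packets sent or received by `mac`, reduced to [MAC, timestamp, length]."""
    return [
        (mac, p.timestamp, p.length)
        for p in packets
        if p.src_mac == mac or p.dst_mac == mac
    ]


def windows(records, size=10):
    """Split records into consecutive groups of `size` (assumed non-overlapping)."""
    full = len(records) - len(records) % size  # drop the incomplete tail
    return [records[i:i + size] for i in range(0, full, size)]
```

Each group returned by `windows` would then become one feature vector in the later feature extraction step.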

5. DATASET COLLECTION

The data we focus on in this article is network communication data between IoT devices and cryptojacking service providers. In this section, we explain the main dataset collection and creation process by focusing on the topology, tools, methodologies, and other implementation details we use in an IoT environment.

A. Topology

The goal is to detect unauthorized mining by compromised devices in the network. The topology is as follows:


  1. A regular smart home network, with all devices going through a single internet router.
  2. In this network, a dedicated computer collects all Internet traffic using port mirroring and ARP rerouting.
  3. Compromised devices in the network connect to cryptojacking service providers or malware C&C servers to accept tasks and return calculation results.

B. Devices


We conducted experiments on four different devices representing different computing capabilities. Raspberry Pi and LG Smart TV represent IoT devices in real-world networks, while laptops represent regular devices, and Tower servers represent computing-powerful devices. Table I shows the devices we used in our experiments and their specifications.

Furthermore, in the given topology in Figure 2:

  • Use TP-link Archer C7 V5 as router.
  • Use Ettercap to manipulate the ARP protocol and forward network traffic to the IP address of the data collection computer.

With this network configuration, we were able to collect all network data using Wireshark packet collector and analyzer.

C. Implementation Methodology

Implementations of in-browser and host-based cryptojacking differ in several ways. In the next subsections, we explain the details of each implementation.

Implementing In-browser Cryptojacking

To perform in-browser cryptojacking in a safe environment, we launched a basic WordPress [49] website containing several different malware samples. We placed different HTML-based malware samples in the source code of different pages of the test website. We connected the test devices to these pages and collected at least 12 hours of network traffic data for each use case scenario, as described in Section III. We used scripts distributed by the Webmine.io and WebminePool [50] service providers for in-browser cryptocurrency mining.

Implementing Host-based Cryptojacking

Implementing host-based cryptojacking on Raspberry Pis, laptops, and Tower Servers is easy. We downloaded the cryptocurrency mining binary MinerGate V1.7 and let it run.

However, doing this on an LG Smart TV is more difficult, because the device would have to be able to execute the MinerGate binary. Therefore, we developed a basic application using the LG WebOS framework, which runs the cryptojacking malware whenever the application is running. We use a cloud server configured with 1 GB RAM, a 1-core CPU, and Ubuntu Server 18.04. After creating the malicious application for the WebOS-supported smart TV and the C&C server, we implemented two different models for the actual mining process:

  1. Disconnected Mining Pool: Our first implementation uses the RandomX PoW algorithm [3], [52]. When the application is activated, it sends a connection request to the C&C server, after which the C&C server sends mining tasks to the malicious application. Mining tasks contain three variables: hashrate, nonce value range, and difficulty target. The application keeps mining until the C&C server issues a new command or until it finds a hash and nonce that meet the difficulty target.
  2. Connecting mining pools via API: The difference between this implementation and the previous one is that instead of creating mining tasks itself, the C&C server receives these tasks from the mining pool through its API framework and sends them to the malicious application.

The only difference between these two methods is the entity that creates the mining task (C&C server vs mining pool). For our dataset, we used Ant mining pool [53] and collected data flow between smart applications and C&C servers.

D. Labeling

While Wireshark captured all network data, we ran all the attack scenarios presented in Section III. Network traffic generated by devices performing mining was labeled as malicious, while data collected from devices that were not mining was labeled as benign.

E. Initial Data Analysis

We devised different scenarios to collect malicious data using our controlled environment setup to evaluate various configurations an attacker might use and possible network setups in a real-world smart home environment. See Section VI for more details on configuration and results.

On the other hand, for the first set of experiments, we downloaded a benign dataset from a public repository [54], and for the second set of experiments, we collected our own benign dataset for the same set of devices that we used for malicious data collection. The complete details of the dataset are shown in Table II, Table III and Table XI.


Benign vs. Malicious

The main goal of this paper is to distinguish between malicious and benign network data. To this end, we performed some preliminary data analysis on the cryptojacking network data and list the results as follows:

  • Packets per second (PPS) rate: an important statistic for differentiating malicious data from benign data. As we can see from Table II, the most powerful device we used produced the highest PPS rate when it ran the binary cryptojacking malware. However, the highest malicious PPS rate is still 72% lower than the lowest PPS rate of the benign dataset given in Table III. This suggests that cryptojacking malware does not generate as many packets as everyday web browsing and application data. This is an important challenge in both the data collection and analysis phases. We discuss this challenge in detail in Section VII.
  • Average packet size (APS): the average size of all inbound and outbound network packets. The highest malicious APS was produced by a Raspberry Pi mining in-browser with Webmine.io, but the highest APS for malicious data was still 35% lower than the lowest APS for benign data.
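The two statistics above can be computed directly from packet metadata. The sketch below is illustrative only: the function names are hypothetical, measuring PPS over the span between the first and last packet is an assumption, and the numbers in the usage note are made up rather than taken from Table II or Table III.

```python
def pps_and_aps(records):
    """records: list of (timestamp_seconds, length_bytes) for one capture."""
    if len(records) < 2:
        raise ValueError("need at least two packets to measure a rate")
    times = [t for t, _ in records]
    duration = max(times) - min(times)
    if duration <= 0:
        raise ValueError("packets must span a positive time interval")
    pps = len(records) / duration                      # packets per second
    aps = sum(n for _, n in records) / len(records)    # average packet size
    return pps, aps


def percent_lower(value, reference):
    """How many percent `value` lies below `reference`."""
    return 100.0 * (reference - value) / reference
```

For example, `percent_lower(28.0, 100.0)` gives `72.0`, which is the form of comparison used when a malicious rate is said to be a given percentage lower than a benign one.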

Host-Based Cryptojacking vs. Browser-Based Cryptojacking

To see the different patterns produced by different devices under different attack scenarios, we performed in-browser and binary cryptojacking on all devices used in this paper and summarized the results in Table II. We can summarize our observations as follows:

  • In-browser mining: Usually generates very low PPS and APS rates.
  • Service providers for in-browser mining: There is no significant difference between Webmine.io and WebminePool.
  • Binary mining: Shows a pattern that does not appear in in-browser mining applications.
  • In-browser mining vs. binary mining: In-browser mining always generates a small amount of network traffic, and there is no significant correlation between hardware power, PPS rate, and APS rate. Binary mining, however, reveals a completely different pattern, where APS and PPS rates are directly related to device power (Raspberry Pi vs. server).

Raspberry Pi vs. Laptop vs. Server

Finally, we performed device-specific analysis:

  • All devices provide nearly identical PPS and APS results for In-browser applications.
  • For all devices performing binary mining (host-based), we observed that the binary cryptojacking malware is correlated with the power of the victim host system, and the PPS and APS rates are also directly affected by the power of the victim host system.

In summary, in terms of network traffic, we found that in-browser malware attempted to remain stealthy and avoid data-intensive communication, whereas host-based malware generated significant network traffic. This is because host-based cryptojacking malware often runs in conjunction with other computationally intensive applications.

6. EVALUATION

In this section, we design four sets of experiments to design and evaluate IoT cryptojacking detection mechanisms that are accurate, efficient, and applicable to different configurations and network settings:

  • First, we conduct a set of experiments to design an optimal IoT detection mechanism with high prediction accuracy and minimum training size and time.
  • Second, we conduct experiments to evaluate the detection mechanism devised in the first set under different configurations, such as different devices.
  • Third, we conduct a set of experiments to evaluate the proposed mechanism in various smart home network settings.
  • Fourth, we conduct a set of experiments to evaluate the sensitivity of the proposed classifier.

A. Designing an IoT Cryptojacking Detection Mechanism

After the dataset collection and labeling process, we created a full dataset consisting of a malicious dataset and a benign dataset with an equal number of packets. Table IV gives the size of the dataset and the total feature extraction and classification time for the whole dataset.

In this subsection, our goal is to design an IoT detection mechanism based on network traffic characteristics and use a machine learning (ML) classifier for classification.

To do this, we performed the following steps:

  • We use feature extraction to create feature vectors from the original dataset.
  • We select the best features and remove irrelevant features through feature selection algorithm.
  • We train and test several ML classifiers and decide which classifier performs best.
  • We test the best algorithms with different training sizes to optimize the training data and time, and calculate the prediction time to evaluate the feasibility of the algorithms in practical applications.

1. Feature extraction

We use tsfresh [37] to extract features from the dataset. The tsfresh library is a Python package that automatically computes statistical features from time series data. In our case, we used 10 packets for each feature vector, and tsfresh computed 788 different statistical features from the timestamps and packet lengths.
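To give a feel for what a statistical feature vector over 10 packets looks like, here is a hand-rolled stand-in that computes eight simple statistics. It is not tsfresh and does not reproduce its 788 features; the feature names and the use of inter-arrival gaps as the timestamp-derived series are assumptions made for illustration.

```python
import statistics


def summarize(values, prefix):
    """Four basic statistics of one series, keyed by a name prefix."""
    return {
        f"{prefix}_mean": statistics.fmean(values),
        f"{prefix}_std": statistics.pstdev(values),
        f"{prefix}_min": min(values),
        f"{prefix}_max": max(values),
    }


def feature_vector(window):
    """window: list of (timestamp, length) for one device, in time order."""
    lengths = [n for _, n in window]
    # Inter-arrival gaps between consecutive packets (assumed feature source).
    gaps = [b[0] - a[0] for a, b in zip(window, window[1:])]
    return {**summarize(lengths, "length"), **summarize(gaps, "gap")}
```

A library such as tsfresh automates this idea at a much larger scale, computing many more statistics per series than the handful shown here.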

2. Feature Selection

Feature selection is the process of selecting a relevant subset of features for our model. Not all features in the dataset are equally relevant. To refine the extracted features and use only the most relevant ones, we calculated P-values. The P-value is the probability of obtaining an outcome at least as extreme as the observed sample when the null hypothesis is true. If the P-value is small, the observed data would be very unlikely under the null hypothesis, so we have reason to reject it; the smaller the P-value, the stronger the reason. We found 290 statistically significant features in the dataset and trained our model with these features. We repeated the same process for the rest of the experiments.
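The idea of keeping only features with small P-values can be sketched with a permutation test, which follows the definition above directly: shuffle the labels many times and count how often the shuffled data looks at least as extreme as the real data. This is a stand-in chosen because it needs no external library; it is not the significance test that tsfresh applies, and the threshold of 0.05 and all names are assumptions.

```python
import random
import statistics


def permutation_p_value(values, labels, n_permutations=500, seed=0):
    """Two-sided permutation test on the difference of class means."""
    def mean_gap(lbls):
        pos = [v for v, l in zip(values, lbls) if l == 1]
        neg = [v for v, l in zip(values, lbls) if l == 0]
        return abs(statistics.fmean(pos) - statistics.fmean(neg))

    observed = mean_gap(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # null hypothesis: labels carry no information
        if mean_gap(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def select_features(columns, labels, alpha=0.05):
    """columns: dict of feature name -> values. Keep features with p < alpha."""
    return [
        name for name, vals in columns.items()
        if permutation_p_value(vals, labels) < alpha
    ]
```

A feature whose values differ sharply between benign and malicious samples gets a small P-value and is kept, while a feature that is identical across both classes gets a P-value of 1 and is dropped.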

3. Classifier selection

We implemented four machine learning classifiers to test the accuracy of the features described in the previous subsection. During the implementation of these classifiers, we used 75% of the data for training and 25% for testing the classifiers. In the first three sets of experiments, we used scikit-learn's default parameters [57], while in Section VI-D3, we tested non-default parameters.

We used 5-fold cross-validation (CV) to evaluate the effectiveness of the classifiers, and used accuracy, precision, recall, F1 score, and ROC as metrics for all our experiments.

SVM is a well-established supervised machine learning classifier that works very well when the classes are separated by sharp boundaries. SVM classifiers are also stable: small changes in the dataset do not lead to significant changes in the results. Therefore, we use the SVM classifier in the rest of this paper for further analysis and for the other use case scenarios.
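The threshold-based metrics used in this subsection all derive from the four confusion-matrix counts. The sketch below shows that relationship for a binary malicious/benign task; the function name and the choice of label 1 for "malicious" are assumptions, and ROC is left out because it needs classifier scores rather than hard predictions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In a cross-validated setup, these values would be computed once per fold and then averaged.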

4. Training size and time

In this section, we conduct experiments with different training sizes. Through this experiment, we analyze the impact of dataset collection time on classification accuracy and overall classification time.

To obtain reference results, we first reduced the original dataset to a representative 12 hours of data, then to 6 hours, 3 hours, and finally 1 hour, repeating the classification at each training size to measure accuracy and timing. Figure 3 summarizes the four results covered in this section:

  • Accuracy: Even after reducing the dataset to 1 hour of data, the accuracy did not drop below 94%. This shows that our model does not depend heavily on the size of the dataset and can give accurate results even when the data collection time is short.
  • Prediction time per feature vector: The time required to predict the category of each feature vector. Experiments show that this time is related to the size of the dataset: for larger datasets, more time is required to evaluate each vector. However, after feature extraction, the time per vector is reduced by 100-150 ms.
  • Feature extraction and classification time: The time required to compute the features of each dataset and classify them. As shown in panels (c) and (d) of Figure 3, this time is directly related to the dataset size. However, we obtained near-perfect results in a very short time.

We can conclude that:

  • We implement a successful detection system without incurring significant overhead inside the device or network.
  • We can use a smaller dataset to train our model without sacrificing accuracy.
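The reduction of the training data to 12, 6, 3, and 1 hours can be sketched as a simple time cutoff. This assumes the reduction keeps the earliest part of the capture; the text does not say how the original dataset was reduced, and the function name is hypothetical.

```python
def first_hours(records, hours):
    """Keep records captured within `hours` of the earliest timestamp.

    records: list of (timestamp_seconds, payload) tuples in any order.
    """
    if not records:
        return []
    start = min(t for t, _ in records)
    cutoff = start + hours * 3600
    return [(t, x) for t, x in records if t < cutoff]
```

Running the same train-and-test procedure on `first_hours(data, h)` for `h` in `(12, 6, 3, 1)` would give one accuracy and timing measurement per training size.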

B. Evaluation With Different Adversarial Behaviours

In this subsection, our goal is to evaluate the IoT cryptojacking detection mechanism we designed in Section VI-A using various attack configurations.

Attackers can target different victim devices, choose different monetization strategies, or choose between in-browser and host-based cryptojacking. We evaluate our mechanism by conducting a comprehensive set of experiments testing these three configurations.

All scenarios and experiments are implemented using the same feature extraction and selection process described in Section VI-A. This approach to implementation allows us to observe the results of how to effectively use a feature set for different use case scenarios.

We created a balanced dataset for each of the three scenarios to minimize the impact of the imbalanced dataset problem. Table VII presents the dataset sizes and sources we used to implement these three scenarios. We use the SVM classifier for model training.
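One common way to obtain a balanced dataset is to downsample the larger class at random, sketched below. This is an assumption for illustration; the text does not state how the balanced datasets were built, and the function name is hypothetical.

```python
import random


def balance(malicious, benign, seed=0):
    """Downsample the larger class so both classes have the same size."""
    n = min(len(malicious), len(benign))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    return rng.sample(malicious, n), rng.sample(benign, n)
```

With equal class sizes, a classifier cannot reach high accuracy simply by always predicting the majority class.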

  • In Scenario 1 (testing different device types), we achieved almost perfect scores in all three experiments. Servers showed the highest accuracy; that is, cryptojacking attackers are more likely to be detected when attacking server-type devices.
  • In Scenario 2 (testing different monetization strategies), the stealthy strategy (i.e., 10% throttling) and the robust strategy (i.e., 50% throttling) were both detected less accurately than the aggressive strategy (i.e., 100% throttling). While an accuracy of 87% or 91% is still high, it also means that an attacker's obfuscation method can still make a difference in the detection phase.
  • In Scenario 3 (testing malware types), the in-browser malware results were only slightly worse than the host-based cryptojacking case. Because cryptojacking malware can compromise many kinds of devices, a detection system needs to detect an ongoing cryptojacking process without relying on any particular device.

We see from the results of three scenarios and eight discrete experiments that our extracted features can achieve near-perfect scores without relying on any device.

C. Adversarial Models of Compromised Device Numbers in Smart Home Network

In this section, we investigate different adversarial models in a simulated smart home network.

We implement four different scenarios and present their results in the rest of this section:

  1. Scenario 4 (Fully Compromised): In this scenario, the attacker exploits all the devices in the smart home environment. This scenario may apply to several network-based attacks. For this experiment, we used the entire dataset.
  2. Scenario 5 (Partially Compromised): The attacker uses two devices of different categories (here, IoT and laptop) to carry out different cryptojacking attacks. The IoT device is compromised by a host-based cryptojacking attack and performs binary mining, while the laptop is compromised by an in-browser cryptojacking attack. This case may also represent two discrete attacks performed by different malicious entities. Either way, both devices use the same gateway (e.g., router, ADSL modem, Ethernet port) for internet communication.
  3. Scenario 6 (Single Device Compromised): When an attacker injects malware through a specific vulnerability, only one or very few devices may be exploitable. A rogue device is difficult to detect when it is the only compromised device in the network. In this scenario, our goal is to test the case where only one IoT device is compromised.
  4. Scenario 7 (Compromised IoT): In this scenario, two IoT devices from different domains are exploited by two different types of cryptojacking malware. To simulate this environment, we used an LG WebOS smart TV running a host-based malicious application and a Raspberry Pi exploited by a malicious webpage to mine in the browser.


The results of the last two scenarios are important because they reflect most attack scenarios of Mirai [13] and other known IoT botnets [58]. Our results show that the scenario where all devices are compromised is the easiest for our detection mechanism to detect, while relatively high accuracy (>92%) is still achieved when only IoT devices are compromised. This means that our detection model and feature set can successfully detect a variety of attack scenarios in a home environment. Overall, the combination of our chosen classifier and feature set detects cryptojacking malware with high accuracy.

D. Classifier Sensitivity Evaluation

In this section, we conduct more experiments to test the sensitivity of the classifier.

To validate the results in the previous section with our own dataset, we repeated the same scenarios (Scenarios 1-7). The dataset sizes and results are given in Tables XVI and XVII in the Appendix.

Furthermore, to test the sensitivity of the classifier, we designed three additional experiments: 1) imbalanced dataset, 2) transferability, and 3) non-default parameter experiments. In the remainder of this section, we explain the details of these experiments.

  1. Scenario 8: Imbalanced dataset. We created a set of imbalanced datasets to test this scenario.

Table XIII shows the performance of our detection system on the different imbalanced dataset scenarios. Our trained model classified the samples correctly in all imbalanced settings, with an overall accuracy of 98%.
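
On imbalanced data, accuracy alone can look good even when the minority class is missed, which is why precision, recall, and F1 are usually reported alongside it. The following is a minimal sketch of that effect, not the authors' evaluation code; the 95/5 class split and all function names are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels, where 1 = malicious, 0 = benign."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical imbalanced set: 95 benign samples, 5 malicious samples.
y_true = [0] * 95 + [1] * 5
# A trivial "classifier" that labels everything as benign.
always_benign = [0] * 100
baseline = metrics(y_true, always_benign)
print(baseline)
```

Here the trivial baseline reaches an accuracy of 0.95 while its recall on the malicious class is 0, so a high accuracy on an imbalanced dataset is only meaningful when recall and precision are high as well.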

  2. Scenario 9: Transferability of classifiers. This scenario tests the transferability of our model and how robust our detection system is against new attack surfaces. To that end, we train and test with different malware.

Table XV shows that our proposed detection system detects cryptojacking malware independently of the platform and service provider. These results indicate that our machine learning-based detection system can provide effective protection.
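
A transferability experiment of this kind can be sketched as a train/test split in which the malicious samples in the two sets come from different malware. The snippet below is only an illustration under stated assumptions: the sample layout, the `source` field, and the names `miner_a` and `miner_b` are hypothetical and do not come from the paper.

```python
def transfer_split(samples, train_malware, test_malware):
    """Split samples so that training and testing see different malware.

    Malicious samples are routed by their malware source. Benign samples
    (source is None) are alternated between the two sets so that both
    contain negative examples.
    """
    if train_malware == test_malware:
        raise ValueError("train and test malware must differ")
    train, test = [], []
    benign_seen = 0
    for sample in samples:
        if sample["source"] is None:
            (train if benign_seen % 2 == 0 else test).append(sample)
            benign_seen += 1
        elif sample["source"] == train_malware:
            train.append(sample)
        elif sample["source"] == test_malware:
            test.append(sample)
        # Samples from any other malware source are left out of both sets.
    return train, test


# Hypothetical samples: label 1 = malicious, label 0 = benign.
samples = [
    {"features": [0.1, 0.9], "label": 1, "source": "miner_a"},
    {"features": [0.2, 0.8], "label": 1, "source": "miner_b"},
    {"features": [0.7, 0.1], "label": 0, "source": None},
    {"features": [0.6, 0.2], "label": 0, "source": None},
    {"features": [0.3, 0.7], "label": 1, "source": "miner_a"},
    {"features": [0.4, 0.6], "label": 1, "source": "miner_b"},
]

train_set, test_set = transfer_split(samples, "miner_a", "miner_b")
print(len(train_set), len(test_set))
```

If a classifier trained on `train_set` still scores well on `test_set`, that suggests it has learned traffic patterns that are common to mining in general and not specific to a single malware sample.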
  3. Scenario 10: Experiment with non-default parameters. To test the SVM classifier, we tuned three parameters: the kernel, the regularization parameter (C), and gamma.
    - The kernel is the function that maps low-dimensional input data into a higher-dimensional space.
    - The regularization parameter (C) controls the trade-off between a wide decision margin and the classification error on the training data.
    - When the gamma parameter is high, only nearby points have a strong influence on the decision boundary.

Our results show that different parameter values can significantly change the results of the SVM classifier. With the default parameters, the classifier achieves an average training score of 87%. However, after varying the parameters and evaluating all 15 possible combinations, we obtained training scores ranging from 0.52 to 0.89. Some parameter settings also lead to overfitting. Nevertheless, our cryptojacking detection mechanism is highly configurable and can be customized as needed.
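
The role of gamma can be illustrated with the RBF kernel, k(x, y) = exp(-gamma * ||x - y||^2), which is a common kernel choice for SVMs. This is a small sketch for intuition only: the post does not state which kernels or parameter values were used, so the RBF kernel and the gamma values below are assumptions.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical feature vectors with squared distance 2.
point_a = (0.0, 0.0)
point_b = (1.0, 1.0)

# Assumed gamma values, chosen only to show the trend.
for gamma in (0.1, 1.0, 10.0):
    similarity = rbf_kernel(point_a, point_b, gamma)
    print(f"gamma={gamma}: k(a, b)={similarity:.6f}")
```

As gamma grows, the kernel value for the same pair of points shrinks toward zero, so each training point only affects its close neighborhood. Very high gamma values therefore tend to produce tightly fitted decision boundaries, which is consistent with the overfitting observed for some parameter settings.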

7. DISCUSSION

8. RELATED WORK

9. CONCLUSION

  • In this paper, an accurate and effective cryptojacking detection mechanism is proposed based on features extracted from network traffic.

  • Our mechanism is capable of detecting both in-browser and host-based cryptojacking malware. We used one hour of network traffic data to train a machine learning classifier and achieved 99% detection accuracy.

  • We also design novel attack scenarios to test our mechanism across various attack configurations and home network settings.

  • Additionally, we analyzed cryptojacking attacks on several different platforms to understand the efficiency of our detection mechanism.

  • We show how the different configurations an attacker might use, and the different network settings in which mining is performed, affect detection accuracy.

  • Furthermore, we openly share our collected network traffic and code to accelerate research in this area.

Origin blog.csdn.net/Sky_QiaoBa_Sum/article/details/127529428