Using Compression-Based Graph Mining for Behavior-Based Malware Detection

Abstract - Behavior-based detection approaches are commonly used to address the threat of statically obfuscated malware. Such approaches typically represent process or system behavior as graphs and usually rely on frequency-based graph mining to extract characteristic patterns from the collected malware graphs. Recent studies in the molecule mining domain suggest that frequency-based graph mining algorithms often perform sub-optimally when searching for highly discriminating patterns. We propose a malware detection approach that applies compression-based mining to quantitative data flow graphs in order to obtain more accurate detection models. Our experiments on a large and diverse malware data set show that our approach outperforms frequency-based detection models in detection effectiveness by more than 600%.

Keywords - malware detection, quantitative data flow analysis, data mining, graph mining, machine learning

I. Introduction

Malware remains one of the biggest IT security threats: thousands of new variants appear every day, causing losses of billions of dollars per year. Because developing malware has become a profitable business model [1] [2], today's malware is highly sophisticated and makes use of a variety of obfuscation and anti-debugging techniques [3] [4]. This threatens traditional signature-based detection, because polymorphic malware automatically creates obfuscated binaries that look entirely different from their siblings.

In response, behavior-based malware detection has been developed extensively over the past decade. Unlike static analysis and detection, behavior-based approaches do not inspect the malware binary; instead, they learn typical malware behavior and later detect it. Popular behavior models are based on system call graphs [5] [6] [7] [8] [9] [10] [11] or resource dependency graphs [12] [13]. Most such graph-based approaches detect malware by scanning unknown graphs for typical malware behavior patterns (i.e., subgraphs; for brevity, we use the terms pattern and subgraph interchangeably, and likewise the terms system call and Windows API call). These pattern repositories are either specified manually or extracted from the graphs of known malware by means of graph mining.

The core idea of graph mining is to identify, within a training set, discriminating patterns that are shared by many graphs: patterns that help to accurately separate the graphs of known malicious samples from those of benign samples. Most graph mining approaches to malware detection judge the usefulness of a pattern from a frequency perspective [5] [6] [7] [8] [9] [10] [11] [14] [15]. That is, the usefulness of a pattern depends on how often it occurs in the analyzed samples, irrespective of any other property of the pattern. Consequently, behavior-based malware detection is usually built on frequency-based graph mining algorithms such as AGM [16], gFSG [17], or gSpan [18].
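To make the frequency criterion concrete, here is a minimal sketch, not one of the cited algorithms. It assumes a strongly simplified setting in which every graph is a set of labeled edges and node labels are unique within a graph, so that "pattern occurs in graph" reduces to a subset test. The names `support`, `frequent_patterns`, and all sample data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]      # (source label, target label)
Graph = FrozenSet[Edge]     # simplification: a graph is just its edge set


def support(pattern: Graph, graphs: List[Graph]) -> float:
    """Fraction of graphs that contain every edge of the pattern."""
    if not graphs:
        return 0.0
    hits = sum(1 for g in graphs if pattern <= g)
    return hits / len(graphs)


def frequent_patterns(candidates: List[Graph], graphs: List[Graph],
                      min_support: float) -> List[Graph]:
    """Keep the candidates whose support reaches the threshold.

    Only frequency matters here; the size or structure of a pattern
    plays no role, which is the limitation discussed in the text.
    """
    return [p for p in candidates if support(p, graphs) >= min_support]


# Hypothetical behavior graphs of three malware samples.
malware_graphs = [
    frozenset({("proc", "registry"), ("proc", "file"), ("file", "proc2")}),
    frozenset({("proc", "registry"), ("socket", "proc")}),
    frozenset({("proc", "registry"), ("proc", "file"), ("file", "proc2")}),
]

simple_pattern = frozenset({("proc", "registry")})
complex_pattern = frozenset({("proc", "file"), ("file", "proc2")})

selected = frequent_patterns([simple_pattern, complex_pattern],
                             malware_graphs, min_support=0.9)
```

With a threshold of 0.9, only the single-edge registry pattern is selected, although the two-edge pattern describes a more specific behavior. Real miners such as gSpan additionally have to solve subgraph isomorphism, which this sketch sidesteps through the unique-label assumption.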

Recent findings from the molecule (graph) mining domain indicate that frequency-based mining generally yields less interesting, and hence less discriminating, patterns than so-called compression-based mining [19]. Unlike frequency-based mining, compression-based mining also considers the structural complexity of a pattern when deciding whether it is useful. It does so by taking into account how well a pattern compresses the set of mined graphs. In other words, compression-based mining favors the patterns that compress the graph set the most: a pattern that occurs less frequently than a simpler pattern with lower compression capability may still be more discriminating than that simpler but more frequent pattern. (To avoid confusion, note that "compression" should be understood intuitively rather than in the information-theoretic sense: the compression we consider is usually lossy.)
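The trade-off between frequency and structural complexity can be illustrated with a small sketch. The size measure below is an assumption made for illustration only and is not the measure used in the paper: each non-overlapping occurrence of a pattern is replaced by one placeholder unit, and the pattern itself is stored once. The function names and numbers are hypothetical.

```python
def compressed_size(total_size: int, pattern_size: int,
                    occurrences: int) -> int:
    """Size of a graph set after substituting a pattern.

    Assumed (simplified) model: sizes are counted in edges, occurrences
    do not overlap, every occurrence shrinks to one placeholder unit,
    and the pattern definition is stored once.
    """
    if pattern_size * occurrences > total_size:
        raise ValueError("occurrences cannot cover more than the graph set")
    remaining = total_size - occurrences * pattern_size
    return remaining + occurrences + pattern_size


def compression_ratio(total_size: int, pattern_size: int,
                      occurrences: int) -> float:
    """Original size divided by compressed size; higher is better."""
    return total_size / compressed_size(total_size, pattern_size, occurrences)


TOTAL_EDGES = 100  # hypothetical size of the whole graph set

# Simple but very frequent pattern: 1 edge, 30 occurrences.
simple_ratio = compression_ratio(TOTAL_EDGES, pattern_size=1, occurrences=30)

# Complex but rarer pattern: 5 edges, 10 occurrences.
complex_ratio = compression_ratio(TOTAL_EDGES, pattern_size=5, occurrences=10)
```

Under these assumptions the single-edge pattern does not compress the graph set at all (101 units instead of 100), whereas the rarer five-edge pattern reduces it to 65 units. A purely frequency-based ranking would prefer the first pattern; a compression-based ranking prefers the second.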

To the best of our knowledge, the effectiveness of compression-based graph mining for malware detection has not been investigated so far. We see good reason to believe that the findings from the molecule mining domain carry over to malware analysis. This assumption is supported by the results of a preliminary study in which we applied a state-of-the-art frequency-based mining technique [18] to data flow graphs obtained from a large number of malware samples. The resulting patterns, although in principle discriminating and usable for malware detection, consisted entirely of very simple actions, such as reading system libraries or writing registry entries. Using such simple patterns for malware detection is problematic because they a) are very sensitive to changes in the analyzed malware families, b) are, for the same reason, comparatively easy to circumvent [20], and c) may miss more relevant and more complex malware-specific behavior patterns, such as self-replication.

We therefore propose to use compression-based graph mining to extract behavior patterns. Building on the behavior-based malware detection model of [13], which represents malware behavior as quantitative data flow graphs (QDFGs), we show that patterns mined with a compression-based algorithm yield better malware detection rates than patterns mined with a purely frequency-based approach. Furthermore, we show that taking the quantified data flows encoded in QDFGs into account when determining the compression level yields better results than computing the compression factor from the structural properties of the graph alone.

Problem. We address the problem of finding interesting patterns in malware behavior graphs: patterns that are discriminating enough to provide high detection accuracy at a reasonable mining cost. In particular, we aim for a notion of pattern usefulness that leads to better detection effectiveness than the plain consideration of pattern frequencies found in related work.

Solution. To mine highly discriminating malware behavior patterns, we use a well-known compression-based graph mining algorithm that we adapted to QDFGs. QDFGs model malware behavior as a set of quantified data flows between system entities, induced by the executed system calls. By matching the patterns obtained from known malware and benign software, we train a supervised classifier that is then used to classify unknown samples.
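As a rough illustration of the data structure involved, the following sketch builds a QDFG-like mapping from a list of data flow events and turns mined patterns into a binary feature vector that a classifier could consume. The event format, the entity names, and the helper functions `build_qdfg` and `pattern_features` are assumptions for illustration; they are not taken from the paper or from [13].

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Flow = Tuple[str, str]          # (source entity, destination entity)
Event = Tuple[str, str, int]    # (source, destination, transferred bytes)


def build_qdfg(events: Iterable[Event]) -> Dict[Flow, int]:
    """Aggregate events into edges weighted by the total bytes moved."""
    flows: Dict[Flow, int] = defaultdict(int)
    for src, dst, size in events:
        flows[(src, dst)] += size
    return dict(flows)


def pattern_features(qdfg: Dict[Flow, int],
                     patterns: List[Set[Flow]]) -> List[int]:
    """One binary feature per pattern: 1 if all its edges are in the graph.

    Simplification: patterns are matched on exact entity names and the
    edge weights are ignored.
    """
    return [int(all(edge in qdfg for edge in p)) for p in patterns]


# Hypothetical events, e.g. derived from intercepted file and registry calls.
events = [
    ("file:a.exe", "process:a.exe", 4096),     # process reads its own image
    ("process:a.exe", "file:b.exe", 4096),     # ...and writes it to a new file
    ("process:a.exe", "file:b.exe", 1024),     # second write to the same file
    ("process:a.exe", "registry:Run", 64),     # writes a registry entry
]
qdfg = build_qdfg(events)

# Two hypothetical mined patterns: a copy-like flow and a network flow.
patterns = [
    {("file:a.exe", "process:a.exe"), ("process:a.exe", "file:b.exe")},
    {("process:a.exe", "socket:1.2.3.4")},
]
features = pattern_features(qdfg, patterns)
```

Feature vectors of this kind, computed for known malware and benign samples, would form the training set of the supervised classifier; the choice of classifier is left open here.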

Contribution. To the best of our knowledge, we are i) the first to use compression-based graph mining on quantitative data flow graphs for behavior-based malware detection, and ii) we show that patterns obtained with compression-based mining improve detection effectiveness by 600% compared with patterns obtained with a common frequency-based mining algorithm.

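Organization. Section 2 gives an overview of graph mining and of the quantitative data flow model that underlies our approach, and Section 3 presents the concrete steps of the approach. Section 4 discusses the evaluation metrics, Section 5 presents our results, and Section 6 concludes.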

---------------------------------------------------

The original paper was published in TDSC 2019. Link: https://ieeexplore.ieee.org/document/7867799
