Executables Malware Analysis

This document collects papers related to the analysis of PC executables.

Executables Malware

Survey

A Survey on Malware Detection Using Data Mining Techniques. ACM Computing Surveys, 2017. paper
A survey on automated dynamic malware-analysis techniques and tools. ACM Computing Surveys, 2008. paper
A survey of malware detection techniques. Purdue University, 2007. paper
Behavioral detection of malware: from a survey towards an established taxonomy. Journal in Computer Virology, 2008. paper

DataSet and Benchmark

Static Analysis + Data Mining Techniques

2016

DL4MD: A Deep Learning Framework for Intelligent Malware Detection. paper

Working from Windows API calls extracted from PE files, the deep learning framework developed in this paper outperformed ANN, SVM, NB, and DT in malware detection.

In this paper, based on the Windows Application Programming Interface (API) calls extracted from the Portable Executable (PE) files, we study how a deep learning architecture using the stacked AutoEncoders (SAEs) model can be designed for intelligent malware detection. The SAEs model performs as a greedy layerwise training operation for unsupervised feature learning, followed by supervised parameter fine-tuning (e.g., weights and offset vectors). To the best of our knowledge, this is the first work that deep learning using the SAEs model based on Windows API calls is investigated in malware detection for real industrial application.

2015

Saxe J, Berlin K. Deep neural network based malware detection using two dimensional binary program features. 2015:11-20. paper

Using the statically extracted features, their system achieves a 95% detection rate at 0.1% false positive rate, based on more than 400,000 software binaries.
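
An operating point such as "detection rate at a fixed false positive rate" can be computed from classifier scores roughly as follows. This is a minimal sketch, not the authors' evaluation code: the function name `tpr_at_fpr` and the toy scores are assumptions made for illustration.

```python
def tpr_at_fpr(benign_scores, malware_scores, max_fpr):
    """Pick a threshold whose false positive rate is at most max_fpr.

    A sample is flagged as malware when its score is strictly greater than
    the threshold. Returns (threshold, true_positive_rate).
    """
    benign_sorted = sorted(benign_scores, reverse=True)
    # Number of benign samples we may misclassify at this operating point.
    allowed_fp = int(max_fpr * len(benign_sorted))
    if allowed_fp >= len(benign_sorted):
        threshold = float("-inf")  # every sample may be flagged
    else:
        # At most `allowed_fp` benign scores are strictly above this value.
        threshold = benign_sorted[allowed_fp]
    detected = sum(1 for s in malware_scores if s > threshold)
    return threshold, detected / len(malware_scores)


# Toy scores (hypothetical): higher means "more likely malware".
benign = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.50, 0.60, 0.90]
malware = [0.30, 0.55, 0.70, 0.80, 0.95]
threshold, tpr = tpr_at_fpr(benign, malware, max_fpr=0.1)
print(f"threshold={threshold}, TPR={tpr}")
```

Reporting the true positive rate at a very low false positive rate matters because a deployed detector scans far more benign files than malicious ones.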

In this paper we introduce a deep neural network based malware detection system that Invincea has developed, which achieves a usable detection rate at an extremely low false positive rate and scales to real world training example volumes on commodity hardware.

2014

Tamersoy A, Roundy K, Chau D H. Guilt by association: large scale malware detection by mining file-relation graphs. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014:1524-1533. paper

The increasing sophistication of malicious software calls for new defensive techniques that are harder to evade, and are capable of protecting users against novel threats. We present Aesop, a scalable algorithm that identifies malicious executable files by applying Aesop’s moral that “a man is known by the company he keeps.” We use a large dataset voluntarily contributed by the members of Norton Community Watch, consisting of partial lists of the files that exist on their machines, to identify close relationships between files that often appear together on machines. Aesop leverages locality-sensitive hashing to measure the strength of these inter-file relationships to construct a graph, on which it performs large scale inference by propagating information from the labeled files (as benign or malicious) to the preponderance of unlabeled files. Aesop attained early labeling of 99% of benign files and 79% of malicious files, over a week before they are labeled by the state-of-the-art techniques, with a 0.9961 true positive rate at flagging malware, at 0.0001 false positive rate.
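
The idea of measuring how strongly two files co-occur on machines with locality-sensitive hashing can be illustrated with MinHash, a common LSH family for set similarity. This is only a sketch under assumptions: the abstract does not say which hashing scheme Aesop uses, and the function names, the salted SHA-1 construction, and the machine IDs below are all hypothetical.

```python
import hashlib


def minhash_signature(items, num_hashes=64):
    """MinHash signature of a non-empty set of strings.

    Each of the `num_hashes` hash functions is SHA-1 with a different salt,
    truncated to 64 bits, so results are deterministic across runs.
    """
    signature = []
    for seed in range(num_hashes):
        salt = str(seed).encode()
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(salt + b":" + item.encode()).digest()[:8], "big")
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions; estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical data: for each file, the set of machines it was seen on.
machines_of_file_a = {f"machine-{i}" for i in range(1, 9)}   # machines 1..8
machines_of_file_b = {f"machine-{i}" for i in range(5, 13)}  # machines 5..12
similarity = estimated_jaccard(minhash_signature(machines_of_file_a),
                               minhash_signature(machines_of_file_b))
print(f"estimated co-occurrence similarity: {similarity:.2f}")
```

Files whose estimated similarity is high would become neighbours in the file-relation graph, over which labels can then be propagated.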

2013

Karampatziakis N, Stokes J W, Thomas A, et al. Using File Relationships in Malware Classification. Detection of Intrusions and Malware, and Vulnerability Assessment. Springer Berlin Heidelberg, 2012:1-20. paper

Typical malware classification methods analyze unknown files in isolation. However, this ignores valuable relationships between malware files, such as containment in a zip archive, dropping, or downloading. We present a new malware classification system based on a graph induced by file relationships, and, as a proof of concept, analyze containment relationships, for which we have much available data. However our methodology is general, relying only on an initial estimate for some of the files in our data and on propagating information along the edges of the graph. It can thus be applied to other types of file relationships. We show that since malicious files are often included in multiple malware containers, the system’s detection accuracy can be significantly improved, particularly at low false positive rates which are the main operating points for automated malware classifiers. For example at a false positive rate of 0.2%, the false negative rate decreases from 42.1% to 15.2%. Finally, the new system is highly scalable; our basic implementation can learn good classifiers from a large, bipartite graph including over 719 thousand containers and 3.4 million files in a total of 16 minutes.
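
Propagating an initial estimate along the edges of a container-file graph can be sketched with a simple damped averaging scheme. This is an illustrative stand-in, not the classifier described in the paper: the function name `propagate_scores`, the damping factor, the neutral prior of 0.5, and the example archive names are all assumptions.

```python
def propagate_scores(containers, initial, iterations=10, damping=0.5):
    """Spread maliciousness estimates over a bipartite container-file graph.

    containers: dict mapping container id -> non-empty list of file ids.
    initial:    dict mapping file id -> prior score in [0, 1] (known files only).
    Files without a prior start at the neutral value 0.5.
    """
    files = {f for members in containers.values() for f in members}
    file_to_containers = {f: [] for f in files}
    for container, members in containers.items():
        for f in members:
            file_to_containers[f].append(container)

    score = {f: initial.get(f, 0.5) for f in files}
    for _ in range(iterations):
        # A container's score is the mean score of the files it holds.
        container_score = {
            c: sum(score[f] for f in members) / len(members)
            for c, members in containers.items()
        }
        # A file mixes its own prior with the mean score of its containers.
        score = {
            f: damping * initial.get(f, 0.5)
            + (1 - damping) * sum(container_score[c] for c in cs) / len(cs)
            for f, cs in file_to_containers.items()
        }
    return score


# Hypothetical archives: one holds a known-bad file, one a known-good file.
containers = {
    "a.zip": ["known_bad.exe", "unknown_1.exe"],
    "b.zip": ["known_good.exe", "unknown_2.exe"],
}
initial = {"known_bad.exe": 1.0, "known_good.exe": 0.0}
scores = propagate_scores(containers, initial)
print(scores["unknown_1.exe"], scores["unknown_2.exe"])
```

An unlabeled file that shares a container with known malware ends up with a score above the neutral prior, which is the intuition behind using containment relationships.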

2012

Eskandari M, Hashemi S. A graph mining approach for detecting unknown malwares. Journal of Visual Languages & Computing, 2012, 23(3):154-162. paper


2011

Santos I, Brezo F, Ugarte-Pedrero X, et al. Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences, 2013, 231(9):64-82. paper, dataSet

In this paper, we propose a new method to detect unknown malware families. This model is based on the frequency of the appearance of opcode sequences. Furthermore, we describe a technique to mine the relevance of each opcode and assess the frequency of each opcode sequence. In addition, we provide empirical validation that this new method is capable of detecting unknown malware.
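
Representing an executable by the frequency of its opcode sequences can be sketched as follows. This covers only the sequence-frequency part and omits the per-opcode relevance weighting mentioned above. The function name and the toy opcode trace are assumptions, and a real pipeline would first obtain the opcodes from a disassembler.

```python
from collections import Counter


def opcode_ngram_frequencies(opcodes, n=2):
    """Relative frequency of each length-n opcode sequence in a trace."""
    ngrams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    total = len(ngrams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(ngrams).items()}


# Hypothetical disassembled opcode trace.
trace = ["push", "mov", "push", "mov", "call"]
features = opcode_ngram_frequencies(trace, n=2)
for gram, freq in sorted(features.items()):
    print(gram, freq)
```

The resulting dictionary can be turned into a fixed-length feature vector by choosing a vocabulary of opcode sequences observed in the training set.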

Ye Y, Li T, Zhu S, et al. Combining file content and file relations for cloud based malware detection. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2011:222-230. paper

In this paper, we study how file relations can be used to improve malware detection results and develop a file verdict system (named “Valkyrie”) building on a semi-parametric classifier model to combine file content and file relations together for malware detection. To the best of our knowledge, this is the first work of using both file content and file relations for malware detection. A comprehensive experimental study on a large collection of PE files obtained from the clients of anti-malware products of Comodo Security Solutions Incorporation is performed to compare various malware detection approaches. Promising experimental results demonstrate that the accuracy and efficiency of our Valkyrie system outperform other popular anti-malware software tools such as Kaspersky AntiVirus and McAfee VirusScan, as well as other alternative data mining based detection systems. Our system has already been incorporated into the scanning tool of Comodo’s Anti-Malware software.

Chau D H, Nachenberg C, et al. Polonium: Tera-Scale Graph Mining for Malware Detection. 2008. paper

We present Polonium, a scalable and effective technology for detecting malware. We evaluated it with the largest anonymized file submissions dataset ever published, which spans over 60 terabytes of disk space. We formulated the problem of detecting malware as a large-scale graph mining and inference task, for which we construct a huge bipartite graph of almost 1 billion nodes from our data, 48 million of which are users, and 903 million are files. Edges, each denoting a file appearing on a machine, exceeds 37 billion. Our method for identifying malware is to locate files with low reputation. Our Polonium algorithm computes file reputation based on the fast and scalable Belief Propagation algorithm (O(|E|)), which iteratively improves inference quality. With one iteration, our method attained 85% true positive rate (in detecting malware). With more iterations, the true positive rate further improves for an additional 2%, which is a significant improvement given the baseline performance is already very good. We detail important design and implementation features of our method which enable its successful application on our dataset. We also present empirical observations on characteristics and patterns in our large billion-node graph.

Dynamic Analysis + Data Mining Techniques

2010

Firdausi I, Lim C, Erwin A, et al. Analysis of Machine learning Techniques Used in Behavior-Based Malware Detection. International Conference on Advances in Computing. 2010:201-203. paper

The increase of malware that are exploiting the Internet daily has become a serious threat. The manual heuristic inspection of malware analysis is no longer considered effective and efficient compared against the high spreading rate of malware. Hence, automated behavior-based malware detection using machine learning techniques is considered a profound solution. The behavior of each malware on an emulated (sandbox) environment will be automatically analyzed and will generate behavior reports. These reports will be preprocessed into sparse vector models for further machine learning (classification). The classifiers used in this research are k-Nearest Neighbors (kNN), Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), and Multilayer Perceptron Neural Network (MLP).
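
The step of turning behavior reports into sparse vectors and classifying them can be sketched with a tiny k-Nearest Neighbors classifier, one of the classifier families listed above. This is a minimal illustration, not the authors' pipeline: the event names, the count-based vectors, and the use of cosine similarity are assumptions.

```python
from collections import Counter


def to_sparse_vector(report_events):
    """Map a list of behavior event strings to a sparse {event: count} dict."""
    return dict(Counter(report_events))


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sum(x * x for x in u.values()) ** 0.5
    norm_v = sum(x * x for x in v.values()) ** 0.5
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(train, key=lambda item: cosine_similarity(item[0], query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical sandbox reports, already reduced to event names.
train = [
    (to_sparse_vector(["create_remote_thread", "write_registry_run_key",
                       "connect_socket"]), "malware"),
    (to_sparse_vector(["write_registry_run_key", "delete_file"]), "malware"),
    (to_sparse_vector(["open_file", "read_file", "close_file"]), "benign"),
    (to_sparse_vector(["open_file", "write_file"]), "benign"),
]
query = to_sparse_vector(["write_registry_run_key", "create_remote_thread"])
print(knn_predict(train, query, k=3))
```

Sparse dictionaries keep memory use proportional to the events actually observed, which matters when the vocabulary of possible behaviors is large.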

2009

Kolbitsch C, Comparetti P M, et al. Effective and efficient malware detection at the end host. Proceedings of the 18th USENIX Security Symposium (SSYM'09), pages 351-366, Montreal, Canada, August 2009. paper, detailed information and notes

In this paper, we propose a novel malware detection approach that is both effective and efficient, and thus, can be used to replace or complement traditional anti-virus software at the end host. Our approach first analyzes a malware program in a controlled environment to build a model that characterizes its behavior. Such models describe the information flows between the system calls essential to the malware’s mission, and therefore, cannot be easily evaded by simple obfuscation or polymorphic techniques. Then, we extract the program slices responsible for such information flows. For detection, we execute these slices to match our models against the runtime behavior of an unknown program. Our experiments show that our approach can effectively detect running malicious code on an end user’s host with a small overhead.

Hybrid Analysis + Data Mining Techniques

2017

Bounouh T, Brahimi Z, Al-Nemrat A, et al. A Scalable Malware Classification Based on Integrated Static and Dynamic Features. International Conference on Global Security, Safety, and Sustainability. Springer International Publishing, 2017:113-124. paper

2013

Islam R, Tian R, Batten L M, et al. Classification of malware based on integrated static and dynamic features. Journal of Network & Computer Applications, 2013, 36(2):646-656. paper

Concept Drift

2017

Transcend: Detecting Concept Drift in Malware Classification Models paper

Building machine learning models of malware behavior is widely accepted as a panacea towards effective malware classification. A crucial requirement for building sustainable learning models, though, is to train on a wide variety of malware samples. Unfortunately, malware evolves rapidly and it thus becomes hard, if not impossible, to generalize learning models to reflect future, previously-unseen behaviors. Consequently, most malware classifiers become unsustainable in the long run, becoming rapidly antiquated as malware continues to evolve. In this work, we propose Transcend, a framework to identify aging classification models in vivo during deployment, much before the machine learning model’s performance starts to degrade. This is a significant departure from conventional approaches that retrain aging models retrospectively when poor performance is observed. Our approach uses a statistical comparison of samples seen during deployment with those used to train the model, thereby building metrics for prediction quality. We show how Transcend can be used to identify concept drift based on two separate case studies on Android and Windows malware, raising a red flag before the model starts making consistently poor decisions due to out-of-date training.
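
One way to compare a deployment-time sample statistically with the training data is a conformal-style p-value over nonconformity scores. The sketch below is generic, and the abstract alone does not establish that it matches Transcend's exact statistics. The nonconformity scores, the function names, and the 0.05 cut-off are assumptions.

```python
def conformal_p_value(calibration_scores, new_score):
    """Fraction of calibration nonconformity scores >= the new sample's score.

    A small value means the new sample looks unlike the data the model was
    trained on, according to the chosen nonconformity measure.
    """
    at_least = sum(1 for s in calibration_scores if s >= new_score)
    return at_least / len(calibration_scores)


def looks_drifted(calibration_scores, new_score, min_p=0.05):
    """Flag a sample whose p-value falls below the (assumed) cut-off."""
    return conformal_p_value(calibration_scores, new_score) < min_p


# Hypothetical nonconformity scores, e.g. distance to the predicted class.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(conformal_p_value(calibration, 0.35), looks_drifted(calibration, 0.35))
print(conformal_p_value(calibration, 5.0), looks_drifted(calibration, 5.0))
```

Tracking how often deployed samples are flagged gives an early signal that the model is aging, before labeled ground truth is available to measure accuracy directly.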

Other Motivating Program Analysis Using Machine Learning

2017

Guanhua Yan, Junchen Lu, Zhan Shu, and Yunus Kucuk. ExploitMeter: Combining Fuzzing with Machine Learning for Automated Evaluation of Software Exploitability. Proceedings of The 1st IEEE Symposium on Privacy-Aware Computing (PAC’17), Washington DC, USA, August 1-3, 2017. paper, code

In this work, we propose ExploitMeter, a fuzzing-based framework of quantifying software exploitability that facilitates decision-making for software assurance and cyber insurance. Designed to be dynamic, efficient and rigorous, ExploitMeter integrates machine learning-based prediction and dynamic fuzzing tests in a Bayesian manner. Using 100 Linux applications, we conduct extensive experiments to evaluate the performance of ExploitMeter in a dynamic environment.

Some Other Tools

Obfuscator

Download

Obfuscator protects your code against reverse engineering.
Software protection at the source code level allows the use of several methods that are not possible with compiled binaries.
Here you can download the demo version of Obfuscator. Installation & application files are all digitally signed and tested with the latest antivirus software.

Unest

Web page

Github

How to use?

Purpose: unest obfuscates binary code, effectively obstructing malicious software analysis.
Execution environment: php + nasm
Instruction set: Intel x86
Format: binary code segment (such as shellcode, etc.); COFF Obj (compiled by vc6, vs2010, etc.).
