Convolutional Neural Network for Classification of Malware Assembly Code

27 Oct 2017 · Daniel Gibert, Javier Béjar, Carles Mateu, Jordi Planes, Daniel Solis, Ramon Vicens

Traditional signature-based methods are becoming inadequate for dealing with next-generation malware that uses sophisticated obfuscation techniques (polymorphic and metamorphic) to evade detection. Recently, researchers have applied machine learning techniques to malware detection and classification. Despite these efforts, most methods are built on shallow learning architectures and rely on the extraction of hand-crafted features. In this paper, we present a convolutional neural network architecture that takes as input assembly language code, extracted from disassembled binary files and embedded into vectors, and learns a set of discriminative patterns able to cluster malware files into their families. To demonstrate the suitability of our approach, we evaluated our model on the data provided by Microsoft for the BigData Innovators Gathering 2015 Anti-Malware Prediction Challenge. Experiments show that the method achieves competitive results without relying on manual feature extraction and is resilient to the most common obfuscation techniques.
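The abstract describes the pipeline only at a high level. As a rough illustration, the following minimal pure-Python sketch (forward pass only, with random untrained weights) extracts opcode mnemonics from disassembly lines, embeds them, applies 1D convolutions of several widths followed by ReLU and max-over-time pooling, and outputs a softmax over the nine malware families of the Microsoft dataset. The IDA-style line format, the regular expression, the names `extract_opcodes` and `ShallowOpcodeCNN`, and every size (embedding dimension, filter widths, number of filters) are hypothetical choices for illustration, not the authors' architecture or hyperparameters.

```python
import math
import random
import re

# ASSUMED listing format (hypothetical), e.g.
#   .text:00401001 8D 44 24 08      lea     eax, [esp+8]
# i.e. address, uppercase hex opcode bytes, then a lowercase mnemonic.
LINE_RE = re.compile(r"^\.text:[0-9A-F]{8}\s+(?:[0-9A-F]{2}\+?\s)+\s*([a-z]+)\b")


def extract_opcodes(lines):
    """Return the opcode mnemonics found in the listing, in program order."""
    ops = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            ops.append(match.group(1))
    return ops


class ShallowOpcodeCNN:
    """Forward pass only: embedding -> 1D convolutions -> ReLU -> max-over-time -> softmax.

    All sizes are illustrative assumptions; weights are random and untrained.
    """

    def __init__(self, vocab, emb_dim=8, widths=(2, 3), n_filters=4, n_classes=9, seed=0):
        rng = random.Random(seed)
        # Index 0 is reserved for opcodes that are not in the vocabulary.
        self.index = {op: i + 1 for i, op in enumerate(vocab)}
        self.emb_dim = emb_dim
        self.emb = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(len(vocab) + 1)]
        # For each width w: n_filters filters, each a (w x emb_dim) weight matrix.
        self.banks = [
            (w, [[[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(w)]
                 for _ in range(n_filters)])
            for w in widths
        ]
        n_feats = len(widths) * n_filters
        self.out_w = [[rng.gauss(0, 0.1) for _ in range(n_feats)] for _ in range(n_classes)]
        self.out_b = [0.0] * n_classes

    def features(self, opcodes):
        seq = [self.emb[self.index.get(op, 0)] for op in opcodes]
        widest = max(w for w, _ in self.banks)
        # Zero-pad sequences shorter than the widest filter.
        seq += [[0.0] * self.emb_dim for _ in range(widest - len(seq))]
        feats = []
        for w, filters in self.banks:
            for f in filters:
                best = 0.0  # max over t of ReLU(s_t) equals max(0, max over t of s_t)
                for t in range(len(seq) - w + 1):
                    s = sum(fk * xk for k in range(w) for fk, xk in zip(f[k], seq[t + k]))
                    best = max(best, s)
                feats.append(best)
        return feats

    def predict_proba(self, opcodes):
        h = self.features(opcodes)
        logits = [sum(wi * hi for wi, hi in zip(row, h)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


listing = [
    ".text:00401000 56                               push    esi",
    ".text:00401001 8D 44 24 08                      lea     eax, [esp+8]",
    ".text:00401005 E8 F6 FF FF FF                   call    sub_401000",
    ".text:0040100A ; comment lines carry no opcode and are skipped",
]
ops = extract_opcodes(listing)  # ['push', 'lea', 'call']
model = ShallowOpcodeCNN(vocab=["push", "pop", "mov", "lea", "call", "ret"])
probs = model.predict_proba(ops)  # 9 probabilities summing to 1
```

Max-over-time pooling makes the feature vector length independent of the number of instructions in a file, which is one reason this family of text-style CNNs suits variable-length disassembly. Training (for example, minimizing cross-entropy by gradient descent) is omitted; a real model would normally be built in a deep learning framework rather than plain Python.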

Task: Malware Classification
Dataset: Microsoft Malware Classification Challenge
Model: Opcode-based Shallow CNN

Metric               Value    Global Rank
Accuracy (10-fold)   0.9917   #5
LogLoss              0.0244   #1
Macro F1 (10-fold)   0.9856   #6
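The results above report accuracy, multiclass log loss, and macro-averaged F1. The sketch below shows how these three metrics are commonly computed from per-file class probabilities; it is illustrative only, and the function names are hypothetical. The row rescaling and the clipping to [1e-15, 1 - 1e-15] in the log loss are assumed to follow the challenge's published metric description. For the "(10-fold)" entries, per-fold values would typically be averaged over the ten cross-validation folds.

```python
import math


def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    ASSUMPTION: each row is rescaled to sum to 1 and then clipped to
    [eps, 1 - eps], as we understand the challenge's metric description.
    """
    total = 0.0
    for label, row in zip(y_true, proba):
        p = row[label] / sum(row)
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of files whose predicted family equals the true family."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-family F1 over families seen in y_true or y_pred."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)


# Toy example with 3 families and 4 files (made-up numbers, not data from the paper).
y_true = [0, 1, 2, 2]
proba = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]
y_pred = [max(range(len(row)), key=row.__getitem__) for row in proba]  # argmax per file

acc = accuracy(y_true, y_pred)           # 0.75
ll = multiclass_log_loss(y_true, proba)  # about 0.535
f1 = macro_f1(y_true, y_pred)            # 7/9, about 0.778
```

Unlike accuracy and F1, log loss rewards well-calibrated probabilities and heavily penalizes confident mistakes, which is why a model can rank differently on it than on the other two metrics.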
