Malicious Software Classification Using VGG16 Deep Neural Network's Bottleneck Features

Abstract

Malicious software (malware) has been extensively employed for illegal purposes, and thousands of new samples are discovered every day. The ability to classify samples with similar characteristics into families makes it possible to create mitigation strategies that work for a whole class of programs. In this paper, we present a malware family classification approach using a VGG16 deep neural network’s bottleneck features. Malware samples are represented as byteplot grayscale images, and the convolutional layers of a VGG16 deep neural network pre-trained on the ImageNet dataset are used for bottleneck feature extraction. These features are used to train an SVM classifier for the malware family classification task. The experimental results on a dataset comprising 10,136 samples from 20 different families showed that our approach can effectively be used to classify malware families with an accuracy of 92.97%, outperforming similar approaches proposed in the literature that require feature engineering and considerable domain expertise.
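
As an illustration of the byteplot representation mentioned in the abstract, the following is a minimal sketch of how a file's raw bytes can be arranged into a grayscale image, one byte per pixel. It is not the paper's code: the function name `byteplot`, the row width of 256, and the zero padding of the last row are assumptions made here for the example. The later stages (VGG16 bottleneck feature extraction and SVM training) are not shown.

```python
from typing import List


def byteplot(data: bytes, width: int = 256) -> List[List[int]]:
    """Arrange raw bytes row by row into a 2-D grid of grayscale values (0-255).

    The default width and the zero padding are assumptions for this sketch,
    not values taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad the tail with zero bytes so that the last row is complete.
    remainder = len(data) % width
    if remainder:
        data += bytes(width - remainder)
    # Each byte becomes one pixel intensity; each slice becomes one image row.
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


# Stand-in for the contents of a binary file (hypothetical sample data).
sample = bytes(range(10))
image = byteplot(sample, width=4)
for row in image:
    print(row)
```

The resulting grid can be converted to an array and resized to the input shape expected by a pre-trained convolutional network before feature extraction.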

Publication
Information Technology-New Generations - Springer
Antônio Theóphilo
Ph.D. Student

I am a Ph.D. student at the Institute of Computing/University of Campinas (UNICAMP) in the fields of Artificial Intelligence and Natural Language Processing. My research interests include Artificial Intelligence, Natural Language Processing, and Information Security.
