Enhancing Ransomware Classification with Multi-stage Feature Selection and Data Imbalance Correction

Image credit: CSO

Abstract

Ransomware is a critical security threat, and reliable detection tools are essential. Machine learning models can detect and classify ransomware, but two obstacles limit their effectiveness. First, ransomware datasets are high-dimensional and divided into several feature groups, such as API calls, directory, and registry logs, which makes it difficult to build effective models. Second, class imbalance leads to poor results when classifying ransomware families. To tackle these challenges, we propose a three-stage feature selection method that reduces the dimensionality of the data while accounting for the varying importance of the different feature groups in classifying ransomware families. To address data imbalance, we apply cost-sensitive learning and re-sample the training data using SMOTE. We evaluate these techniques on the EldeRan ransomware dataset. Our results show that the proposed feature selection method significantly improves ransomware detection compared to other state-of-the-art studies using the same dataset. Furthermore, both data balancing techniques (cost-sensitive learning and SMOTE) were effective in the multi-class classification of ransomware.
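
The two imbalance corrections named in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the paper: the helper names (balanced_class_weights, smote_like), the toy labels, and the two-dimensional points are all hypothetical, and the interpolation is a simplified stand-in for SMOTE written with the Python standard library only. Cost-sensitive learning is represented here by inverse-frequency class weights, one common way to make errors on rare families cost more.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling (illustrative assumption).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical toy data: 6 majority samples and 3 samples of one rare family.
labels = ["goodware"] * 6 + ["family_a"] * 3
minority = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]

weights = balanced_class_weights(labels)
new_points = smote_like(minority, n_new=3)
```

In this sketch the rare family receives twice the weight of the majority class, and the three synthetic points bring the minority count up to the majority count. In practice, re-sampling is applied to the training split only, as the abstract states, so that synthetic samples do not leak into evaluation.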

Publication
7th International Symposium on Cyber Security, Cryptology, and Machine Learning
Faithful Chiagoziem OWNUEGBUCHE
PhD Candidate in Machine Learning and Blockchain Technology

My research focuses on the intersection of machine learning and blockchain technology, particularly their applications in the fields of cybersecurity and finance.