Ransomware Early Detection using Machine Learning Approach and Pre-Encryption Boundary Identification

Wira Zanoramy; Mohd Faizal Abdollah; Othman Abdollah; S.M. Warusia Mohamed S.M.M

doi:10.37934/araset.47.2.121137

Authors

Wira Zanoramy, MyCERT, Cybersecurity Malaysia, Menara Cyber Axis, Jalan Impact, 63000 Cyberjaya, Selangor, Malaysia
Mohd Faizal Abdollah, Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal, Melaka, Malaysia
Othman Abdollah, Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal, Melaka, Malaysia
S.M. Warusia Mohamed S.M.M, Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal, Melaka, Malaysia

Keywords:

Ransomware, Early detection, Pre-encryption, Pre-encryption boundary, Crypto-ransomware, Cryptographic ransomware

Abstract

The escalating ransomware threat has catalysed the formation of a sophisticated network of cybercriminal enterprises. Addressing this issue, our research provides a detailed exploration of the ransomware menace and an evaluation of contemporary detection methodologies. A successful ransomware attack leverages many factors: robust encryption methods that defy decryption, the anonymity of cryptocurrencies, and the widespread availability of ransomware kits that enable even inexperienced actors to launch attacks. Such dynamics have cultivated a niche for cybercriminal specialists in the digital underworld. In response to these challenges, our study proposes a detection framework based on machine learning, a domain where regression algorithms have gained popularity without yielding a definitive protective model. We use API call analysis as the foundation for assessing how effectively various machine learning classifiers identify ransomware. The evaluation demonstrates that the Naive Bayes classifier underperforms because of its suboptimal accuracy, making it unsuitable for this application. Conversely, Logistic Regression, with an area under the ROC curve (AUC) of 0.951, minimal training time, and substantial efficacy gains, emerges as a strong contender. The Decision Tree and Random Forest classifiers exhibit comparable proficiency; however, the Decision Tree's interpretability and the Random Forest's computational speed offer distinct advantages. The Support Vector Machine (SVM) and Gradient Boosted Trees achieve the highest AUC and gains, at the cost of longer training times. Our findings affirm the pivotal role of API call analysis in ransomware detection and the ability of machine learning approaches to learn from extensive datasets and identify novel malware strains. Given the continual evolution of malware, detection methodologies must adapt correspondingly. This study's comparative analysis clarifies the trade-offs between accuracy, computational speed, and training time, guiding the selection of a suitable machine learning algorithm for robust ransomware detection.
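The evaluation described in the abstract (train a classifier on API call features, then compare AUC and training time) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' pipeline or data: the traces are hypothetical toy samples, the function names are invented, and a small Bernoulli Naive Bayes model is used only because it fits in a few lines of standard-library Python, not because it is the recommended classifier.

```python
import math
import time

# Hypothetical toy traces: each sample is the set of API calls observed
# early in execution. Labels: 1 = ransomware, 0 = benign. Not real data.
TRAIN = [
    ({"CryptAcquireContextW", "CryptGenKey", "FindFirstFileW", "ReadFile"}, 1),
    ({"CryptGenKey", "FindFirstFileW", "FindNextFileW", "DeleteFileW"}, 1),
    ({"CryptAcquireContextW", "FindNextFileW", "WriteFile"}, 1),
    ({"CreateWindowExW", "ReadFile", "RegOpenKeyExW"}, 0),
    ({"CreateWindowExW", "WriteFile", "GetMessageW"}, 0),
    ({"RegOpenKeyExW", "GetMessageW", "ReadFile"}, 0),
]
TEST = [
    ({"CryptGenKey", "FindFirstFileW", "WriteFile"}, 1),
    ({"CryptAcquireContextW", "DeleteFileW"}, 1),
    ({"CreateWindowExW", "GetMessageW"}, 0),
    ({"ReadFile", "RegOpenKeyExW", "WriteFile"}, 0),
]


def fit_bernoulli_nb(samples):
    """Estimate per-class API-call presence probabilities (Laplace smoothing)."""
    vocab = sorted(set().union(*(calls for calls, _ in samples)))
    model = {"vocab": vocab, "prior": {}, "p": {}}
    for cls in (0, 1):
        rows = [calls for calls, label in samples if label == cls]
        model["prior"][cls] = len(rows) / len(samples)
        model["p"][cls] = {
            api: (sum(api in r for r in rows) + 1) / (len(rows) + 2)
            for api in vocab
        }
    return model


def score(model, calls):
    """Log-odds that a trace is ransomware; higher means more suspicious."""
    log_odds = math.log(model["prior"][1] / model["prior"][0])
    for api in model["vocab"]:
        p1, p0 = model["p"][1][api], model["p"][0][api]
        if api in calls:
            log_odds += math.log(p1 / p0)
        else:
            log_odds += math.log((1 - p1) / (1 - p0))
    return log_odds


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Time only the fitting step, mirroring the training-time comparison.
start = time.perf_counter()
model = fit_bernoulli_nb(TRAIN)
train_seconds = time.perf_counter() - start

test_labels = [label for _, label in TEST]
test_scores = [score(model, calls) for calls, _ in TEST]
test_auc = auc(test_labels, test_scores)
print(f"AUC={test_auc:.3f}  training time={train_seconds:.6f}s")
```

Swapping `fit_bernoulli_nb` and `score` for another classifier while keeping `auc` and the timing code unchanged gives the kind of side-by-side comparison of AUC against training time that the abstract reports.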