Framework for the detection and classification of malware using machine learning
Abstract
Malware is a major threat to network infrastructure, which is vulnerable to devastating attacks such as viruses and ransomware. Traditional antimalware software is of limited effectiveness at removing malware because malware keeps evolving evasion techniques such as polymorphism. Antimalware software removes only the malware for which it has signatures and is ineffective against zero-day attacks. Several research works have used supervised and unsupervised learning algorithms to detect and classify malware, but false positives persist. This research used machine learning to detect and classify malware, employing feature selection techniques together with Grid Search hyperparameter optimization. Principal Component Analysis was combined with Chi-Square to mitigate the curse of dimensionality. Support Vector Machine, K-Nearest Neighbor and Decision Tree models were each trained separately on two datasets. The models were evaluated with the confusion matrix, precision, recall and F1 score. On the CICMalmem dataset, which has an equal number of malware and benign files, accuracies of 99%, 98.64% and 100% were achieved with K-Nearest Neighbor, Decision Tree and Support Vector Machine respectively, and K-Nearest Neighbor produced no false positives. On the Dataset_Malware.csv dataset, accuracies of 97.7%, 70% and 96% were achieved with K-Nearest Neighbor, Decision Tree and Support Vector Machine respectively, and K-Nearest Neighbor produced 38 false positives. The models were trained separately with the default hyperparameters of the chosen algorithms and with the optimal hyperparameters obtained from Grid Search. Optimizing the hyperparameters and combining the features obtained with Principal Component Analysis and Chi-Square, when training on the dataset with an equal number of benign and malicious files (the CICMalmem dataset), yielded the best performance, which was achieved with Support Vector Machine.
Future work includes employing deep learning and ensemble learning as classifiers, as well as implementing other hyperparameter optimization techniques.
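The evaluation measures named in the abstract (confusion matrix, accuracy, precision, recall and F1 score) can be illustrated with a minimal sketch. It is not the authors' code: the class name `ConfusionMatrix`, the helper `confusion_from_labels`, the choice of malware as the positive class, and the example counts are all assumptions made for illustration, and the counts are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; malware is assumed to be the positive class."""
    tp: int  # malware correctly flagged as malware
    fp: int  # benign files wrongly flagged as malware (false positives)
    fn: int  # malware wrongly passed as benign
    tn: int  # benign files correctly passed as benign

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_from_labels(y_true, y_pred, positive=1) -> ConfusionMatrix:
    """Count the four outcomes from paired true and predicted labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    cm = ConfusionMatrix(tp=90, fp=10, fn=5, tn=95)
    print(f"accuracy={cm.accuracy:.4f} precision={cm.precision:.4f} "
          f"recall={cm.recall:.4f} f1={cm.f1:.4f}")
```

Reporting the false-positive count alongside accuracy matters here because a benign file flagged as malware lowers precision even when overall accuracy stays high, which is why the abstract states both figures for K-Nearest Neighbor.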