Cyber Attack Detection in A Global Network Using Machine Learning Approach


Nureni A. Azeez
Oluwaseun T. Odeyemi
Chinyere C. Isiekwene
Ademola P. Abidoye

Abstract

In this digital age, inter-device communication is key to seamless handshaking between systems. Such communication spans the Internet of Things (IoT), autonomous vehicles, mobile communication, and a plethora of other uses, and it needs to be protected against attacks. Unfortunately, with the widespread use of the internet, cyberattacks have become rampant. This research compares the effectiveness of seven (7) machine learning models and four ensemble methods for intrusion detection. Network traffic was categorized as The Onion Router (TOR) or non-TOR traffic, and further categorized as Benign or Bot/Infiltration traffic. The machine learning models used are Naïve Bayes, Decision Tree, K-Nearest Neighbor, Logistic Regression, Neural Network, Quadratic Discriminant Analysis, and Support Vector Machine. The ensemble models used are AdaBoost, Gradient Boosting, Random Forest, and Max Voting. The "CIC IDS 2017" dataset and the "CSE-CIC-IDS2018" dataset ("01-03-2018" and "02-03-2018") were used. For dataset 1, among the regular machine learning models, Decision Tree had the highest scores, with an accuracy of 97.46% and a precision of 89.88%. The highest ensemble performer was Random Forest, with an accuracy of 98.28% and a precision of 93.20%. For dataset 2, Decision Tree again had the highest scores among the regular models, with an accuracy of 99.86% and a precision of 99.66%. The highest ensemble performer was Random Forest, with an accuracy of 99.89% and a precision of 99.70%. For dataset 3, among the regular machine learning models, Neural Network had the highest accuracy, 78.68%, with a precision of 72.92%, while the highest ensemble performer was Gradient Boosting, with an accuracy of 79.16% and a precision of 81.25%. The results were presented using line charts and confusion matrices.
From the experiment, it is evident that, among the traditional machine learning models, Decision Tree is the most efficient, while Random Forest is the most efficient of the ensemble models.
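
As an illustration of two ideas named in the abstract, the sketch below shows how a Max Voting ensemble can combine the predictions of several classifiers, and how accuracy and precision can be computed from the resulting labels. It is a minimal example under stated assumptions, not the authors' implementation: the function names `max_vote` and `accuracy_and_precision`, the three model outputs, and the toy labels are all hypothetical and do not come from the datasets used in the study.

```python
from collections import Counter


def max_vote(predictions):
    """Return, for each sample, the label predicted by most models.

    `predictions` is a list of per-model prediction lists of equal length.
    With an odd number of models and binary labels, ties cannot occur.
    """
    # zip(*predictions) groups the i-th prediction of every model together.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy_and_precision(y_true, y_pred, positive=1):
    """Compute accuracy and precision for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if p == positive and t == positive)
    false_pos = sum(1 for t, p in pairs if p == positive and t != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Precision is undefined when nothing is predicted positive; return 0.0 then.
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0
    return accuracy, precision


# Hypothetical toy data: 1 = attack traffic, 0 = benign traffic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 1, 0, 1, 1, 0]
model_b = [1, 1, 1, 1, 0, 0, 0, 0]
model_c = [0, 0, 1, 1, 0, 0, 1, 0]

ensemble = max_vote([model_a, model_b, model_c])

for name, preds in [("model_a", model_a), ("ensemble", ensemble)]:
    acc, prec = accuracy_and_precision(y_true, preds)
    print(f"{name}: accuracy={acc:.2%}, precision={prec:.2%}")
```

In this toy case each individual model makes two errors, but the errors fall on different samples, so the majority vote recovers the correct label every time. Real traffic data will rarely behave this cleanly; the example only shows the mechanics of the voting and scoring steps.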