
Cyber Persecution: Classification Using Ensemble Learning


O.B. Okunoye
N.A. Azeez
C.C. Isiekwene
O.A. Sennaike

Abstract

Cyber persecution, popularly known as cyber harassment, is one of the major crimes committed daily in the cyber-world. Virtual harassment includes remarks made in chat rooms, the sending of rude or nasty emails, and disturbing others through comments on blogs or social networking sites. This paper classifies harassment in cyberspace using an ensemble learning approach. It compares traditional classifiers with ensemble learning in classifying virtual harassment on online social media networks by training both groups of models on four different datasets. Seven machine learning algorithms (Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbour (KNN), Logistic Regression (LR), Neural Network (NN), Quadratic Discriminant Analysis (QDA), and Support Vector Machine (SVM)) and four ensemble learning models (AdaBoost, Gradient Boosting, Random Forest, and Max Voting) were evaluated. Finally, the study compared the results using twelve evaluation metrics to show the validity of the algorithms: Accuracy, Precision, Recall, F1-measure, Specificity, Matthews Correlation Coefficient (MCC), Cohen's Kappa Coefficient (KAPPA), Area Under Curve (AUC), False Discovery Rate (FDR), False Negative Rate (FNR), False Positive Rate (FPR), and Negative Predictive Value (NPV). At the end of the experiments, for Dataset 1, Logistic Regression had the highest accuracy among the machine learning algorithms (0.6923), while the Max Voting ensemble had the highest accuracy among the ensemble models (0.7047). For Dataset 2, K-Nearest Neighbour, Support Vector Machine, and Logistic Regression shared the highest accuracy among the machine learning algorithms (0.8769), while the Random Forest and Gradient Boosting ensembles both achieved the highest accuracy of 0.8779. For Dataset 3, the Support Vector Machine had the highest accuracy among the machine learning algorithms (0.9243), while the Random Forest ensemble had the highest accuracy of 0.9258.
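As an illustration of the evaluation metrics listed in the abstract, the following minimal sketch computes eleven of the twelve from the four cells of a binary confusion matrix. It is not the authors' code: the function name binary_metrics, the helper _safe_div, and the example counts are hypothetical and are not taken from the paper's datasets. AUC is left out because it requires ranked prediction scores rather than confusion-matrix counts.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute confusion-matrix metrics for a binary classifier.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs).
    """
    total = tp + fp + tn + fn
    accuracy = _safe_div(tp + tn, total)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)          # also called sensitivity
    specificity = _safe_div(tn, tn + fp)
    npv = _safe_div(tn, tn + fn)
    f1 = _safe_div(2 * tp, 2 * tp + fp + fn)

    # Matthews Correlation Coefficient
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_den)

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_yes = _safe_div(tp + fp, total) * _safe_div(tp + fn, total)
    p_no = _safe_div(tn + fn, total) * _safe_div(tn + fp, total)
    p_chance = p_yes + p_no
    kappa = _safe_div(accuracy - p_chance, 1 - p_chance)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "mcc": mcc,
        "kappa": kappa,
        "fdr": 1 - precision if (tp + fp) else 0.0,    # False Discovery Rate
        "fnr": 1 - recall if (tp + fn) else 0.0,       # False Negative Rate
        "fpr": 1 - specificity if (tn + fp) else 0.0,  # False Positive Rate
        "npv": npv,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```

Reporting MCC and Kappa alongside accuracy is useful when the harassment and non-harassment classes are imbalanced, since accuracy alone can look high for a classifier that mostly predicts the majority class.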


Journal Identifiers


eISSN: 2636-6134