Heterogeneous Ensemble Feature Selection and Multilevel Ensemble Approach to Machine Learning Phishing Attack Detection
Abstract
Over the past decade, technology has given people easy means of accomplishing complex tasks seamlessly, especially in communication. Attackers exploit this: malicious links are deliberately crafted to resemble legitimate ones and sent by email to millions of users at once at low cost. Since the emergence of phishing and related attacks, mitigation measures have repeatedly proven unsuccessful because of the dynamic nature of the attacks. Machine learning (ML) has been adopted as a remedy for phishing detection, with its performance depending on several steps, especially feature selection. Most studies in this problem domain concentrate more on model optimization than on developing a reliable feature selection system, and fail to integrate reliable feature selection with the classification model. As a result, the systems are fed low-quality data that hampers model performance. Recognizing the contribution of feature selection to the performance of ML models, the authors developed a novel Heterogeneous Ensemble Feature Selection (HEFS) framework for multilevel ensemble ML-based phishing detection. In HEFS, three filter-based statistical techniques were used to produce primary subsets of phishing features, and the variables selected by each technique were automatically aggregated to produce the baseline features. The techniques were chosen so that each offsets the limitations of the others, since their ranking principles differ. The experiment revealed that the multilevel (stacked) ensemble trained on the baseline features outperformed the alternatives, including the multilevel model trained on the features from each individual filter-based method, with an accuracy of 98.8%.
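The pipeline described in the abstract can be illustrated with a minimal sketch using scikit-learn. The abstract does not name the three filter techniques, the aggregation rule, or the learners in the stacked ensemble, so everything specific here is an assumption made for illustration: chi-squared, ANOVA F-test, and mutual information as the filters; a set union as the aggregation; random forest and k-nearest neighbours as base learners with a logistic regression meta-learner; and synthetic data in place of a phishing dataset. The helper names (`top_k_indices`, `baseline_features`, `run_demo`) are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler


def top_k_indices(scores, k):
    """Return the indices of the k highest-scoring features as a set."""
    scores = np.nan_to_num(np.asarray(scores, dtype=float))
    return set(np.argsort(scores)[::-1][:k].tolist())


def baseline_features(X, y, k=8):
    """Aggregate the top-k picks of three filter methods.

    Assumption: the aggregation is a set union; the abstract only says
    the selected variables were "automatically aggregated".
    """
    chi2_scores, _ = chi2(X, y)        # requires non-negative features
    f_scores, _ = f_classif(X, y)      # ANOVA F-test
    mi_scores = mutual_info_classif(X, y, random_state=0)
    picks = [top_k_indices(s, k) for s in (chi2_scores, f_scores, mi_scores)]
    return sorted(set.union(*picks))


def run_demo(k=8, seed=0):
    """Select baseline features, then fit a stacked ensemble on them."""
    # Synthetic stand-in for a phishing dataset (assumption).
    X, y = make_classification(
        n_samples=400, n_features=30, n_informative=8, random_state=seed
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Scale to [0, 1] so chi2 sees non-negative training values.
    scaler = MinMaxScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

    # Feature selection uses the training split only, to avoid leakage.
    cols = baseline_features(X_tr, y_tr, k=k)

    # Stacked (multilevel) ensemble; the learner choices are assumptions.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
            ("knn", KNeighborsClassifier()),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    stack.fit(X_tr[:, cols], y_tr)
    return cols, stack.score(X_te[:, cols], y_te)


if __name__ == "__main__":
    cols, acc = run_demo()
    print(f"{len(cols)} baseline features, test accuracy {acc:.3f}")
```

A union keeps any feature that at least one filter ranks highly, which matches the stated aim of letting filters with different ranking principles offset one another. An intersection or a vote threshold would be an equally plausible reading of "aggregated". The accuracy this sketch prints comes from synthetic data and is unrelated to the 98.8% reported above.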