
Detection of unhealthy websites using machine learning


O.A. Gbadamosi
A.M. Oduwale

Abstract

In recent years, advances in Internet and cloud technologies have led to a significant increase in electronic trading, in which consumers make purchases and other transactions online. This growth has been accompanied by abuses such as unauthorized access to users' sensitive information and damage to enterprise resources. Phishing is a common attack that tricks users into accessing malicious content in order to obtain their information. Most phishing webpages look identical to the genuine pages they imitate, and various strategies for detecting them, such as blacklisting and heuristics, have been suggested. Existing research shows that the performance of phishing detection systems is limited and that there is a demand for intelligent techniques to protect users from cyber-attacks. This study aims to develop an efficient machine-learning program that detects phishing websites with high accuracy. A Uniform Resource Locator (URL) detection technique based on a supervised machine learning approach, Naïve Bayes, is employed and implemented in the Python programming language. The efficacy of this approach was evaluated on a phishing dataset made up of 7900 malicious and 5800 legitimate sites. The results show that the proposed methodology can achieve an accuracy of 96% by combining stacking and filtering with Naïve Bayes and logistic regression. The study investigates the use of machine learning with features extracted from URLs, identifies words that commonly distinguish phishing (unhealthy) websites from good ones, and offers end users guidance based on recent approaches to malicious URL detection.
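To make the URL-based approach concrete, the following is a minimal sketch of a Naïve Bayes classifier over words extracted from URLs, written in plain Python. It is not the authors' implementation: the tokenizer, the `UrlNaiveBayes` class, and the six toy URLs are assumptions made purely for illustration, and the stacking, filtering, and logistic regression steps mentioned in the abstract are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (an assumed feature scheme)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class UrlNaiveBayes:
    """Multinomial Naive Bayes over URL tokens with add-one (Laplace) smoothing."""

    def fit(self, urls, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for url, label in zip(urls, labels):
            self.token_counts[label].update(tokenize(url))
        # Vocabulary is the union of tokens seen in any class.
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, url):
        """Return the unnormalized log posterior for each class."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            counts = self.token_counts[c]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for tok in tokenize(url):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, url):
        scores = self.log_scores(url)
        return max(scores, key=scores.get)


# Toy, made-up training data; the study's dataset is not reproduced here.
train_urls = [
    "http://secure-login.example-bank.verify-account.test/update",
    "http://paypa1-verify.test/login/confirm",
    "http://free-gift.test/claim/verify/account",
    "https://www.example.org/docs/tutorial",
    "https://news.example.com/articles/python",
    "https://shop.example.com/products/books",
]
train_labels = ["phishing"] * 3 + ["legitimate"] * 3

model = UrlNaiveBayes().fit(train_urls, train_labels)
print(model.predict("http://account-verify.test/login"))
print(model.predict("https://www.example.com/docs"))
```

The per-class token counts in `model.token_counts` are also one simple way to inspect which words are most associated with each class, in the spirit of the common-word analysis the abstract describes.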


Journal Identifiers


eISSN:
print ISSN: 2714-3716