Detection of unhealthy websites using machine learning
Abstract
In recent years, advances in Internet and cloud technologies have led to a significant increase in electronic commerce, in which consumers make online purchases and transactions. This growth has been accompanied by threats such as unauthorized access to users' sensitive information and damage to enterprise resources. Phishing is one of the most common of these attacks: it tricks users into accessing malicious content so that attackers can obtain their information. This study aims to develop an efficient machine learning program that detects phishing websites with high accuracy. Most phishing webpages look identical to the genuine pages they imitate, and various strategies for detecting them, such as blacklisting and heuristics, have been proposed. However, existing research shows that the performance of phishing detection systems remains limited, and there is a demand for intelligent techniques that protect users from cyber-attacks. In this work, a Uniform Resource Locator (URL) detection technique based on a supervised machine learning approach, Naïve Bayes, is implemented in the Python programming language. Its efficacy was evaluated on a phishing dataset of 7,900 malicious and 5,800 legitimate sites. The results show that the proposed methodology achieves an accuracy of 96% by combining filtering and stacking with Naïve Bayes and logistic regression. This study thoroughly investigates the use of machine learning with features extracted from URLs, identifies common words that distinguish phishing (unhealthy) websites from legitimate ones, and offers end users guidance on recent approaches to malicious URL detection.
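To make the core idea concrete, the following is a minimal sketch of a Naïve Bayes classifier over words extracted from URLs. It assumes that each URL is lowercased and split into word tokens on non-alphanumeric characters, and it uses only Python's standard library; the tokenizer, the class name `UrlNaiveBayes`, and the toy URLs are hypothetical and are not the authors' feature set, code, or dataset. Only the Naïve Bayes step is shown, together with a simple log-likelihood-ratio ranking for listing the words most indicative of each class; the filtering and the stacking with logistic regression reported above are not reproduced.

```python
import math
import re
from collections import Counter


def tokenize_url(url: str) -> list[str]:
    """Lowercase a URL and split it into word tokens on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


class UrlNaiveBayes:
    """Multinomial Naive Bayes over URL tokens with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.doc_counts: Counter = Counter()        # number of URLs per label
        self.token_counts: dict[str, Counter] = {}  # token frequencies per label
        self.total_tokens: dict[str, int] = {}      # total token count per label
        self.vocab: set[str] = set()

    def fit(self, urls: list[str], labels: list[str]) -> "UrlNaiveBayes":
        for url, label in zip(urls, labels):
            tokens = tokenize_url(url)
            self.doc_counts[label] += 1
            self.token_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {lab: sum(c.values()) for lab, c in self.token_counts.items()}
        return self

    def log_likelihood(self, token: str, label: str) -> float:
        """Smoothed log P(token | label)."""
        count = self.token_counts[label][token]
        denom = self.total_tokens[label] + self.alpha * len(self.vocab)
        return math.log((count + self.alpha) / denom)

    def predict(self, url: str) -> str:
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # log prior
            for token in tokenize_url(url):
                if token in self.vocab:  # tokens unseen in training are skipped
                    score += self.log_likelihood(token, label)
            scores[label] = score
        return max(scores, key=scores.get)

    def indicative_tokens(self, label: str, other: str, k: int = 5) -> list[str]:
        """The k tokens whose log-likelihood ratio most favours `label` over `other`."""
        ratio = {
            tok: self.log_likelihood(tok, label) - self.log_likelihood(tok, other)
            for tok in self.vocab
        }
        # Sort by descending ratio; break ties alphabetically for a stable result.
        return sorted(ratio, key=lambda tok: (-ratio[tok], tok))[:k]


# Hypothetical toy data, for illustration only.
urls = [
    "http://secure-login.example-bank.verify-account.tk/update",
    "http://account-verify.example.xyz/login/confirm",
    "http://free-gift.example.tk/claim/login",
    "https://www.example.org/wiki/phishing",
    "https://docs.python.org/3/library/re.html",
    "https://www.example.com/about/contact",
]
labels = ["phishing"] * 3 + ["legitimate"] * 3

model = UrlNaiveBayes().fit(urls, labels)
print(model.predict("http://login-verify.example.tk/account"))  # phishing
print(model.predict("https://www.example.org/library"))  # legitimate
print(model.indicative_tokens("phishing", "legitimate", k=2))  # ['http', 'login']
```

Scores are accumulated in log space so that long URLs do not underflow to zero, and additive smoothing keeps a token that never appeared in one class from vetoing that class outright. On a toy set this small, even the URL scheme ("http" versus "https") surfaces as an indicative word; on a realistic dataset such rankings would need to be read alongside the stronger lexical cues.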