
Development of a Medical Condition Prediction Model Using Natural Language Processing with K-Nearest Neighbour


Bolaji Omodunbi
Afeez Soladoye
Nnamdi Okomba
Charity Odeyemi
Mutiu Ayinla

Abstract

Capturing the effects of drugs as reported by patients, and using these reviews to predict the medical condition the patients face, is a practical approach to predicting medical conditions. Many researchers use clinical and demographic data (risk factors) to predict diseases. The limitations of this approach are that not all instances have the right clinical results, and that such studies often suffer from missing values, low prediction accuracy, inadequately pre-processed datasets, failure to consider feature selection, and no experimentation with alternative values of K when using K-nearest neighbour. Drug reviews help address this, because the effects and symptoms that users report in their reviews capture the relevant information. This study employed an open-access drug review dataset to predict the medical condition. The dataset is distributed as separate training and testing splits, which were merged and then re-split 80-20 with stratification. The dataset went through natural language processing techniques such as tokenization, removal of stop words, stemming, lemmatization, and vectorization, among others. Forward-backward feature selection was employed, with the comments having a significant effect on the prediction of the condition. K-nearest neighbour was then used to predict the medical condition, with the drug review as the feature and the condition as the target variable. The model was trained with different numbers of nearest neighbours; k=1 gave the best average predictive accuracy of 89%, with a weighted average precision of 90%. The model gave the same average accuracy of 84% when k was set to 3, 4, 5 and 6. Moreover, the model obtained better results than existing systems. Therefore, with the use of artificial intelligence, medical doctors and patients can use drug reviews to predict certain medical conditions through a clinical predictive decision support system.
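The core of the pipeline described in the abstract (tokenization, stop-word removal, vectorization, then K-nearest neighbour over several values of k) can be illustrated with a minimal sketch. This is not the authors' implementation. The reviews, labels, stop-word list, and all function names below are invented for illustration. TF-IDF weighting and cosine similarity are assumptions, since the abstract does not name the vectorizer or the distance measure. Stemming, lemmatization, the stratified 80-20 split, and forward-backward feature selection are omitted for brevity.

```python
import math
import re
from collections import Counter

# Tiny hand-made stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "my", "i", "it", "is", "was", "are",
              "but", "this", "after", "at"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(tokens, idf):
    """Map a token list to an L2-normalised sparse TF-IDF vector (a dict)."""
    counts = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def knn_predict(query, train_vecs, train_labels, k):
    """Return the majority label among the k most similar training reviews."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (review text, condition). Not taken from the study's dataset.
train = [
    ("This pill lowered my blood pressure but caused dizziness", "Hypertension"),
    ("Blood pressure readings dropped after two weeks", "Hypertension"),
    ("My blood pressure is stable and headaches are gone", "Hypertension"),
    ("Mood improved and the sadness lifted after a month", "Depression"),
    ("Less sadness, better mood, but trouble sleeping", "Depression"),
    ("My mood is brighter and I feel less hopeless", "Depression"),
]
test = [
    ("Dizziness at first but my blood pressure came down", "Hypertension"),
    ("Sadness faded and my mood is much better", "Depression"),
]

train_tokens = [tokenize(text) for text, _ in train]
idf = fit_idf(train_tokens)
train_vecs = [tfidf(tokens, idf) for tokens in train_tokens]
train_labels = [label for _, label in train]

def evaluate(k):
    """Fraction of test reviews whose condition is predicted correctly."""
    hits = sum(
        knn_predict(tfidf(tokenize(text), idf), train_vecs, train_labels, k) == label
        for text, label in test
    )
    return hits / len(test)

# Compare several neighbourhood sizes, as the study does for k.
accuracy_by_k = {k: evaluate(k) for k in (1, 3)}
```

On a real dataset, the same loop over k would be run on the held-out 20% split, and the per-k accuracy and weighted precision compared as in the reported results.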


Journal Identifiers


eISSN: 2579-0617
print ISSN: 2579-0625