An Improved Data Privacy and Data Availability Model for Medical Diagnosis System Using a Hybrid Weighted KNN and Rule-Based Algorithm
Abstract
Background: This study focuses on improving medical diagnosis systems, with particular emphasis on the challenges of data availability and data privacy in such systems.
Objective: The goal is to develop a model that can be trained on large amounts of data and can accurately diagnose medical conditions while ensuring the privacy of patient data.
Method: To achieve this objective, we employ a combination of techniques, including synthetic data: 100,000 samples generated by the Synthpop package from an original sample of 4,390.
Results: The synthetic data closely mimic the characteristics of the original observations, enabling us to overcome limited data availability. Additionally, this research introduces an approach to protecting patient privacy in clinical data sharing. It explores techniques for encapsulating data that maintain the statistical properties of the original data, allowing researchers to perform analysis without directly accessing sensitive patient information.
Conclusions: The hybrid weighted KNN and rule-based model outperforms other conventional models, achieving an accuracy of 98% on both the training data and the test data, with 100% precision and recall.
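The abstract does not specify how the weighted KNN and the rules are combined, so the following Python code is only a minimal sketch of one plausible arrangement. It assumes inverse-distance weighting for the KNN vote and assumes that any matching rule takes precedence over the KNN prediction. The function names, the toy data, and the single example rule are all hypothetical and are not taken from the study.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=3, eps=1e-9):
    """Distance-weighted KNN vote: closer neighbours contribute more.

    Assumption: weights are 1 / distance (the abstract does not state
    which weighting scheme the authors use).
    """
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of k closest points
    weights = 1.0 / (dists[nearest] + eps)        # eps avoids division by zero
    scores = {}
    for label, w in zip(y_train[nearest], weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)            # label with largest total weight

def hybrid_predict(X_train, y_train, x, rules, k=3):
    """Apply rules first; fall back to weighted KNN if none matches.

    Assumption: rules override the KNN vote. `rules` is a list of
    (condition, label) pairs, where condition is a function of x.
    """
    for condition, label in rules:
        if condition(x):
            return label
    return weighted_knn_predict(X_train, y_train, x, k)

# Toy data (hypothetical): two features, two classes.
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
y_train = np.array([0, 0, 1, 1])

# Hypothetical rule: a very high first feature always means class 1.
rules = [(lambda x: x[0] > 10.0, 1)]

knn_case = hybrid_predict(X_train, y_train, np.array([1.1, 1.0]), rules)
rule_case = hybrid_predict(X_train, y_train, np.array([11.0, 0.0]), rules)
```

In this sketch the first query matches no rule and is classified by the weighted vote of its three nearest neighbours, while the second query is decided by the rule alone. Other combinations, such as using rules only to break ties or to post-filter KNN output, are equally consistent with the abstract.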