Evaluating the Significance of Data Engineering Techniques in Multi-Class Prediction: Multi-Factor Educational Data Mining Experiments


A.Z. Umar
H.S. Tuge
Y.G. Ibrahim

Abstract

Artificial Intelligence, particularly predictive modelling, is increasingly influencing education. For instance, one algorithm predicted, with 74% accuracy, which students would fail within three weeks of a course. Such results could lead to interventions that promote inclusivity and personalized learning, supporting the UN's goals of quality education and reduced inequalities. While predictive analytics holds great promise for education, educational datasets often suffer from small sample sizes and class imbalances, which can result in inaccurate predictions and biased machine learning models. In this study, we evaluate the significance of various data engineering techniques in the context of educational data mining using a multi-factor supervised learning experiment. We applied data augmentation and balancing techniques to assess their impact on model performance. In addition, we implemented and evaluated data discretization for continuous features and feature selection to identify the most relevant features for model training. The experimental design followed a 2 × 2 × 3 × 3 factorial structure, incorporating different combinations of these techniques. We employed three models: Random Forest, Decision Tree, and Feedforward Neural Network, with performance measured using accuracy and F1 score. The results show that data augmentation and balancing techniques appear to improve testing accuracy and F1 scores slightly, particularly for simpler models such as Decision Trees. Feedforward Neural Networks perform more consistently across different datasets, while Decision Trees and Random Forests are more prone to overfitting, particularly without proper data balancing or augmentation.
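
To make the experimental setup concrete, the sketch below illustrates one possible configuration of the kind of pipeline described in the abstract: class balancing, discretization of continuous features, feature selection, and a comparison of the three models on accuracy and macro F1. It is not the authors' implementation; the library choices (scikit-learn and imbalanced-learn's SMOTE), the synthetic placeholder dataset, and all parameter values are assumptions chosen purely for illustration.

# Minimal sketch of one pipeline configuration (assumed libraries and parameters).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score
from imblearn.over_sampling import SMOTE

# Placeholder multi-class dataset, deliberately small and imbalanced
# (stands in for the educational data used in the study).
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Balancing / augmentation step, here via SMOTE oversampling of minority classes.
X_train, y_train = SMOTE(random_state=0).fit_resample(X_train, y_train)

# Discretization of continuous features into quantile bins.
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
X_train = disc.fit_transform(X_train)
X_test = disc.transform(X_test)

# Feature selection: keep the features most relevant to the target.
selector = SelectKBest(mutual_info_classif, k=8)
X_train = selector.fit_transform(X_train, y_train)
X_test = selector.transform(X_test)

# The three model families compared in the study.
models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "FFNN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          "accuracy=%.3f" % accuracy_score(y_test, pred),
          "macro-F1=%.3f" % f1_score(y_test, pred, average="macro"))

In the factorial design, each data engineering step above would be switched on or off (or varied across levels) and crossed with the three models, so the script represents only a single cell of the full experiment.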


Journal Identifiers


eISSN: 2635-3490
print ISSN: 2476-8316