Evaluating the Significance of Data Engineering Techniques in Multi-Class Prediction: Multi-Factor Educational Data Mining Experiments
Abstract
Artificial Intelligence, particularly predictive modelling, is increasingly influencing education. For instance, one algorithm predicted with 74% accuracy, within the first three weeks of a course, which students would fail. Such results could lead to interventions that promote inclusivity and personalized learning, supporting the UN's goals of quality education and reduced inequalities. While predictive analytics holds great promise for education, educational datasets often suffer from small sample sizes and class imbalances, which can result in inaccurate predictions and biased machine learning models. In this study, we evaluate the significance of various data engineering techniques in the context of educational data mining using a multi-factor supervised learning experiment. We applied data augmentation and class-balancing techniques to assess their impact on model performance. Additionally, we implemented and evaluated data discretization for continuous features and feature selection to identify the features most relevant for model training. The experimental design followed a 2 × 2 × 3 × 3 factorial structure, incorporating different combinations of these techniques. We employed three models: Random Forest, Decision Tree, and Feedforward Neural Network, and measured performance using accuracy and F1 score. The results show that data augmentation and balancing techniques seem to improve testing accuracy and F1 scores slightly, particularly for simpler models such as Decision Trees. Feedforward Neural Networks perform more consistently across different datasets, while Decision Trees and Random Forests are more prone to overfitting, particularly without proper data balancing or augmentation.
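As a rough illustration of the kind of pipeline the abstract describes, the sketch below combines class balancing, discretization of continuous features, and feature selection around a Decision Tree, then reports accuracy and macro F1. This is a minimal, hypothetical example using scikit-learn and synthetic data; the dataset, bin counts, and `k` value are assumptions, not the paper's actual configuration, and balancing is approximated here with the model's `class_weight` option rather than a specific augmentation method.

```python
# Hypothetical sketch of one cell of a multi-factor experiment:
# (balancing) x (discretization) x (feature-selection level) x (model).
# All parameter choices below are illustrative, not taken from the study.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

# Small, imbalanced multi-class dataset standing in for educational data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, weights=[0.6, 0.3, 0.1],
                           n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# One factor combination: discretize continuous features, keep the k most
# relevant ones, and counter class imbalance via class_weight="balanced".
pipe = Pipeline([
    ("discretize", KBinsDiscretizer(n_bins=4, encode="ordinal",
                                    strategy="uniform")),
    ("select", SelectKBest(f_classif, k=5)),
    ("model", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
])
pipe.fit(X_tr, y_tr)
pred = pipe.predict(X_te)
print("accuracy:", round(accuracy_score(y_te, pred), 3))
print("macro F1:", round(f1_score(y_te, pred, average="macro"), 3))
```

Iterating this over every combination of the engineering factors and the three models would reproduce the factorial structure the abstract outlines, with each cell scored on held-out accuracy and F1.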