Comparing the Prediction Accuracy of Ridge, Lasso and Elastic Net Regression Models with Linear Regression Using Breast Cancer Data
Abstract
Regularised regression methods were developed to overcome the shortcomings of ordinary least squares (OLS) regression, which may not perform well with respect to both prediction accuracy and model complexity. The OLS method may fail, or produce regression estimates with high variance, in the presence of multicollinearity or when the number of predictor variables exceeds the number of observations. This study compares the predictive performance of, and the additional information gained from, the Ridge, Lasso and Elastic net regularised methods with the classical OLS method, using data on breast cancer patients. The findings show that, with all the predictor variables included, the OLS method failed because of multicollinearity, while the regularised Ridge, Lasso and Elastic net methods produced results in which most of the predictor variables were significant. Using the training data, the Elastic net and Lasso appeared to identify more significant predictor variables than the Ridge method. The results also indicated that breast cancer patients in the 30-39 age group, those who are married, and those in stage 1 of the disease have longer survival times, while patients in stage 2 and stage 3 have shorter survival times. The OLS regression produced results only when four of the predictor variables were excluded; even then, the regularised methods still outperformed the OLS regression in terms of prediction accuracy.
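
The failure of OLS under collinearity can be illustrated with a minimal sketch. It does not use the study's breast cancer data or its software: the toy values and the helper names `gram_and_moment` and `solve_penalised` are hypothetical, chosen only for illustration. With two perfectly collinear predictors the normal equations have a singular matrix, so OLS has no unique solution, whereas adding a Ridge penalty makes the system solvable. Only Ridge is shown because it has a closed-form solution; Lasso and Elastic net are normally fitted iteratively.

```python
# Toy data (hypothetical, not from the study): x2 is exactly 2 * x1,
# so the two predictors are perfectly collinear.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0 * v for v in x1]
y = [3.0, 6.0, 9.0, 12.0]


def gram_and_moment(x1, x2, y):
    """Return the entries of X^T X (2x2, symmetric) and X^T y, no intercept."""
    a = sum(u * u for u in x1)              # (X^T X)[0][0]
    b = sum(u * v for u, v in zip(x1, x2))  # (X^T X)[0][1] == [1][0]
    d = sum(v * v for v in x2)              # (X^T X)[1][1]
    p = sum(u * t for u, t in zip(x1, y))   # (X^T y)[0]
    q = sum(v * t for v, t in zip(x2, y))   # (X^T y)[1]
    return a, b, d, p, q


def solve_penalised(x1, x2, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y by Cramer's rule.

    lam = 0 corresponds to OLS; lam > 0 corresponds to Ridge regression.
    Returns None when the 2x2 system is singular.
    """
    a, b, d, p, q = gram_and_moment(x1, x2, y)
    a, d = a + lam, d + lam
    det = a * d - b * b
    if abs(det) < 1e-12:
        return None
    return ((d * p - b * q) / det, (a * q - b * p) / det)


ols = solve_penalised(x1, x2, y, lam=0.0)    # singular system: no unique OLS fit
ridge = solve_penalised(x1, x2, y, lam=1.0)  # penalty restores a unique solution
print("OLS:", ols)
print("Ridge:", ridge)
```

In this toy case the Ridge coefficients share the signal between the two collinear predictors in proportion to their scale, which is one reason Ridge tends to retain all predictors while Lasso and Elastic net can set some coefficients exactly to zero.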