Abstract
Collinearity among predictor variables is a severe problem in least-squares regression analysis. It contributes to the instability of regression coefficients and leads to unreliable prediction accuracy. Despite these problems, studies are still conducted with large numbers of observed and derived variables linked to a response variable. The aim of this study is to improve understanding of the misleading effect of collinearity introduced by derived variables and to assess the efficiency of alternative methods. Twelve variable selection models were each subjected to five parameter estimation methods characterized by their ability to reduce the effect of collinearity. The response variable, eight anthropometric variables, and two derived variables were collected from 200 children aged 5 to 10 years. We found that the selection methods do not mitigate collinearity among the selected variables, that the size of the selected subset depends on the collinearity of the data sample, and that no significant correlation exists between the collinearity of the sample and that of the selected subset. The analysis shows that predictive quality did not improve with the introduction of derived variables. The alternative methods did not significantly improve prediction quality. We recommend avoiding derived variables when establishing regression equations intended for prediction.
Keywords: Collinearity, prediction, regression, ridge regression, conditional likelihood, basal metabolic rate.
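
As a rough illustration of the mechanism summarized in the abstract, the sketch below simulates two anthropometric variables, builds one derived variable from them, and compares two common collinearity diagnostics (the condition number and the variance inflation factors) with and without the derived variable. It then contrasts least-squares and ridge coefficients. Everything here is an assumption made for illustration: the simulated data, the variable names `height`, `weight`, and `derived`, the response `y`, and the ridge penalty `lam` are hypothetical and are not the study's data, variables, or estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # matches the study's sample size; the data themselves are simulated

# Hypothetical anthropometric variables (NOT the study's measurements).
height = rng.normal(1.25, 0.10, n)                   # metres
weight = 16.0 * height**2 + rng.normal(0, 2.0, n)    # kilograms
derived = weight / height**2                         # derived from the two above
y = 20.0 * weight + rng.normal(0, 50.0, n)           # hypothetical response


def collinearity(X):
    """Return (condition number, VIFs) from the predictor correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    eig = np.linalg.eigvalsh(R)
    cond = np.sqrt(eig.max() / eig.min())  # condition number of standardized X
    vif = np.diag(np.linalg.inv(R))        # VIF_j is the j-th diagonal of R^-1
    return cond, vif


def ridge(X, y, lam):
    """Closed-form ridge coefficients on standardized predictors (lam=0 is OLS)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ yc)


X_obs = np.column_stack([height, weight])            # observed variables only
X_all = np.column_stack([height, weight, derived])   # plus the derived variable

cond_obs, vif_obs = collinearity(X_obs)
cond_all, vif_all = collinearity(X_all)

beta_ols = ridge(X_all, y, lam=0.0)
beta_ridge = ridge(X_all, y, lam=10.0)  # lam chosen arbitrarily for the sketch

print("condition number, observed only:", round(cond_obs, 2))
print("condition number, with derived: ", round(cond_all, 2))
print("max VIF, observed only:", round(vif_obs.max(), 2))
print("max VIF, with derived: ", round(vif_all.max(), 2))
print("OLS coefficients:  ", np.round(beta_ols, 2))
print("ridge coefficients:", np.round(beta_ridge, 2))
```

Because the derived variable is a function of the other two predictors, adding it cannot lower either diagnostic, and in this simulation it raises both. Ridge shrinks the coefficient vector relative to least squares, which stabilizes the estimates but does not by itself guarantee better predictions.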