The impact of pre-selected variance inflation factor thresholds on the stability and predictive power of logistic regression models in credit scoring
Abstract
Standard Bank, South Africa, currently employs a methodology when developing application or behavioural scorecards that involves logistic regression. A key aspect of building logistic regression models entails variable selection, which involves dealing with multicollinearity. The objective of this study was to investigate the impact of using different variance inflation factor (VIF) thresholds on the performance of these models in a predictive and discriminatory context and to study the stability of the estimated coefficients in order to advise the bank. The impact of the choice of VIF thresholds was researched by means of an empirical and simulation study. The empirical study involved analysing two large data sets that represent the typical size encountered in a retail credit scoring context. The first analysis concentrated on fitting the various VIF models and comparing the fitted models in terms of the stability of coefficient estimates and goodness-of-fit statistics, while the second analysis focused on evaluating the fitted models' predictive ability over time. The simulation study was used to study the effect of multicollinearity in a controlled setting. All the above-mentioned studies indicate that the presence of multicollinearity in large data sets is of much less concern than in small data sets and that the VIF criterion could be relaxed considerably when models are fitted to large data sets. The recommendations in this regard have been accepted and implemented by Standard Bank.
Key words: Logistic regression, multicollinearity, variance inflation factor, variation of coefficient estimates, elastic net, prediction and discriminatory power, large credit scoring data sets, risk analysis.
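As an illustration of what a VIF threshold does during variable selection, the following minimal sketch computes the VIF of each predictor as 1 / (1 - R²), where R² comes from regressing that predictor on all the others, and then drops predictors until every VIF is at or below a chosen threshold. The function names `vif` and `drop_by_vif`, the synthetic data, the two threshold values and the "drop the worst predictor first" rule are assumptions made for illustration only; the abstract does not describe the selection procedure the bank actually uses.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        # VIF = 1 / (1 - R^2) with R^2 = 1 - ss_res / ss_tot.
        out[j] = ss_tot / ss_res
    return out


def drop_by_vif(X, names, threshold):
    """Drop the highest-VIF column repeatedly until all VIFs <= threshold.

    This greedy rule is an assumption, not the procedure from the study.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        values = vif(X)
        worst = int(np.argmax(values))
        if values[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names


# Synthetic predictors: x3 is almost a copy of x1, x2 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=5000)
x2 = rng.normal(size=5000)
x3 = x1 + 0.1 * rng.normal(size=5000)
X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

strict = drop_by_vif(X, names, threshold=10)     # a conventional cut-off
relaxed = drop_by_vif(X, names, threshold=1000)  # a much looser cut-off
print(np.round(vif(X), 1), strict, relaxed)
```

With a strict threshold one of the two nearly collinear predictors is removed, while the relaxed threshold retains all three. Whether retaining them harms coefficient stability or predictive power at a given sample size is the question the study examines.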