Multiple Imputation: An Iterative Regression Imputation
Abstract
Multiple imputation (MI) is a commonly applied method for handling missing data statistically. It imputes the missing values repeatedly in order to account for the variability introduced by imputation. Several MI techniques have proven effective and are available in many statistical software packages. However, bias, the main problem that arises when handling missing data, remains. Indeed, because multiple imputation techniques are simulation-based, estimates obtained from the completed data may vary substantially from one application to the next, even when the same original data and the same implementation method are used.
Therefore, the uncertainty is often under- or overestimated, which results in poor predictive capability. A new MI approach based on regression is presented. The proposed approach constructs a plausible lower and upper bound around the sum of squared residuals (SSE) that would have been obtained in the complete case (that is, if there were no missing data). Regression imputation (RI) is then applied to replace the missing values, and a new SSE is computed from the completed data. If the new SSE does not fall within the constructed bounds, RI is repeated until the estimated SSE falls within them. The SSEs of the predictions are used to compare the performance of the proposed approach with that of expectation-maximization (EM) imputation and multiple imputation by chained equations (MICE). The results indicate that all three methods work reasonably well in many situations, particularly when the amount of missingness is low and the data are missing at random (MAR) or missing completely at random (MCAR). However, when the proportion of missingness is severe and the data are missing not at random (MNAR), the proposed method performs better than MICE and EM.
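The accept-or-repeat loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: values are missing in the response variable only, each RI pass adds normally distributed noise to the regression predictions (so that repeated passes differ), and the bounds are passed in as given numbers, since the abstract does not state how they are constructed. The function names (`design`, `ols_sse`, `iterative_regression_imputation`) and the placeholder bound rule in the usage part are hypothetical.

```python
import numpy as np


def design(X):
    """Prepend an intercept column to the predictor matrix."""
    return np.column_stack([np.ones(len(X)), X])


def ols_sse(X, y):
    """Fit ordinary least squares and return (coefficients, SSE)."""
    A = design(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)


def iterative_regression_imputation(X, y, lower, upper, max_iter=1000, seed=0):
    """Repeat regression imputation until the completed-data SSE lies in [lower, upper].

    Assumptions (not taken from the source): only y contains missing values
    (NaN), and each pass adds Gaussian noise to the predicted values.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(y)
    observed = ~missing

    # Fit the regression on the observed cases only.
    beta_obs, sse_obs = ols_sse(X[observed], y[observed])
    dof = observed.sum() - X.shape[1] - 1
    sigma = np.sqrt(sse_obs / dof)  # residual standard deviation
    pred_missing = design(X[missing]) @ beta_obs

    for n_iter in range(1, max_iter + 1):
        # Impute: regression prediction plus a random residual draw.
        y_completed = y.copy()
        y_completed[missing] = pred_missing + rng.normal(0.0, sigma, missing.sum())

        # Refit on the completed data and check the SSE against the bounds.
        _, sse_new = ols_sse(X, y_completed)
        if lower <= sse_new <= upper:
            return y_completed, sse_new, n_iter

    raise RuntimeError("SSE did not fall within the bounds in max_iter passes")


# Usage on simulated data with 20% of the responses removed at random.
data_rng = np.random.default_rng(42)
n = 200
X = data_rng.normal(size=(n, 1))
y = 2.0 + 3.0 * X[:, 0] + data_rng.normal(size=n)
y[data_rng.choice(n, size=40, replace=False)] = np.nan

# Placeholder bounds (an assumption): scale the observed-case SSE up to the
# full sample size and allow a 25% margin on either side.
obs = ~np.isnan(y)
_, sse_obs = ols_sse(X[obs], y[obs])
target = sse_obs * n / obs.sum()
lower, upper = 0.75 * target, 1.25 * target

y_completed, sse_new, n_iter = iterative_regression_imputation(X, y, lower, upper)
print(f"accepted after {n_iter} pass(es), SSE = {sse_new:.2f}")
```

The bounds here are deliberately wide, so the loop usually accepts an early pass; narrower bounds would force more repetitions. Raising an error after `max_iter` passes is a design choice of this sketch to avoid looping indefinitely when the bounds cannot be reached.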