Using Multiple Imputation and Inverse Probability Weighting to Adjust for Missing Data in HIV Prevalence Estimates: A Cross-Sectional Study in Mwanza, North Western Tanzania

Abstract

Introduction

Population surveys and demographic studies are the gold standard for estimating HIV prevalence. However, non-response in these surveys is a major concern: when it is not random, complete case analysis is no longer an appropriate method and can give biased estimates. An analysis that accounts for the missing data is therefore needed to obtain unbiased HIV prevalence estimates.

Methods

Serological samples were collected from residents of the Kisesa Demographic Surveillance System (DSS) in Tanzania. HIV prevalence was estimated using three methods. The first, complete case analysis (CCA), assumed that data were missing completely at random (MCAR). The other two, multiple imputation (MI) and inverse probability weighting (IPW), assumed that non-response was missing at random (MAR). For MI, a logistic regression model adjusting for age, sex, residence, and marital status was used to impute 20 datasets, from which HIV prevalence was re-estimated. For IPW, the propensities for participating in the sero-survey and for being tested for HIV, given age, sex, residence, and marital status, were estimated using logistic regression models. Inverse probability weights were then derived from these propensity scores for participants who were tested for HIV.
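The three estimators can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: each record carries a single hypothetical `stratum` label standing in for the age, sex, residence, and marital-status covariates; the probability of being tested is estimated as the observed testing rate within each stratum (equivalent to the fitted values of a saturated logistic model on those categorical covariates) rather than by the study's own logistic models; and one `tested` flag collapses the participation and testing steps into a single step. The MI imputation step itself is omitted; `rubin_pool` only shows how per-dataset estimates (for example, from 20 imputed datasets) are combined with Rubin's rules. All function names, field names, and numbers are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def complete_case_prevalence(records):
    """Prevalence among tested participants only (unbiased only under MCAR)."""
    results = [r["hiv"] for r in records if r["tested"]]
    return sum(results) / len(results)


def ipw_prevalence(records):
    """Weight each tested person by 1 / P(tested | stratum).

    P(tested | stratum) is the observed testing rate in the stratum, which is
    what a saturated logistic model on categorical covariates would fit
    (an assumption; the study's own logistic models may differ).
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_tested, n_total]
    for r in records:
        counts[r["stratum"]][0] += int(r["tested"])
        counts[r["stratum"]][1] += 1
    num = den = 0.0
    for r in records:
        if r["tested"]:  # n_tested >= 1 here, so the division is safe
            n_tested, n_total = counts[r["stratum"]]
            weight = n_total / n_tested  # inverse of the estimated propensity
            num += weight * r["hiv"]
            den += weight
    return num / den


def rubin_pool(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    return q_bar, within + (1 + 1 / m) * between


# Hypothetical toy data: stratum "A" has 2 of 4 members tested, "B" has 2 of 2.
records = [
    {"stratum": "A", "tested": True, "hiv": 1},
    {"stratum": "A", "tested": True, "hiv": 0},
    {"stratum": "A", "tested": False, "hiv": None},
    {"stratum": "A", "tested": False, "hiv": None},
    {"stratum": "B", "tested": True, "hiv": 0},
    {"stratum": "B", "tested": True, "hiv": 0},
]
print(complete_case_prevalence(records))  # 0.25
print(ipw_prevalence(records))            # 0.333..., tested members of "A" get weight 2

# Hypothetical per-imputation prevalences and variances (not study results).
pooled, pooled_var = rubin_pool([0.06, 0.07, 0.08], [1e-4, 1e-4, 1e-4])
half_width = 1.96 * math.sqrt(pooled_var)  # normal approximation to the 95% CI
print(f"{pooled:.3f} ({pooled - half_width:.3f}-{pooled + half_width:.3f})")
```

In the toy data, all untested people fall in stratum "A", which holds the only positive result, so CCA gives 0.25 while IPW up-weights the tested members of "A" and gives 1/3. The interval uses a normal approximation for brevity; Rubin's rules strictly use a t reference distribution whose degrees of freedom depend on the within- and between-imputation variances.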

Results

Using CCA, the overall HIV prevalence was 6.6% (95% CI: 6.0-7.2), with 5.4% (95% CI: 4.6-6.3) in males and 7.3% (95% CI: 6.6-8.1) in females. Using MI, the overall HIV prevalence was 6.8% (95% CI: 6.2-7.5), with 6.2% (95% CI: 5.1-7.3) in males and 7.4% (95% CI: 6.6-8.2) in females. Using IPW, the overall HIV prevalence was 6.7% (95% CI: 6.1-7.4), with 5.5% (95% CI: 4.7-6.5) in males and 7.7% (95% CI: 7.0-8.6) in females. HIV prevalence differed significantly between age groups (p<0.001), being highest in males aged 35-39 years and females aged 40-44 years, and lowest in both males and females aged 15-19 years.

Conclusion

Complete case analysis underestimates HIV prevalence compared with methods that adjust for missing data. Of the three methods compared (CCA, MI, and IPW), we found multiple imputation to be the best method for adjusting for missing data in population surveys.

Journal Identifiers

eISSN: 2953-2663
print ISSN: 2591-6769

East African Journal of Applied Health Monitoring and Evaluation, Vol. 6 (2023)

Authors: Tinashe Mhike, Jim Todd, Mark Urassa, Neema R. Mosha
