Leveraging External Validation Dataset to Adjust for Missing Confounders in an Enhanced Two-Stage Zero-Inflated Poisson Model Design: A Methodological and Simulation Study
Author(s)
Wu DBC1, Lin HW2
1Johnson & Johnson Innovative Medicine, Singapore, Singapore, 2Soochow University, Taipei, Taiwan
OBJECTIVES: Analyzing large administrative claims databases has become popular in medical research because of their cost efficiency and data availability. However, these databases often lack detailed clinical and socioeconomic confounding variables (CVs). This limitation can bias estimated treatment effects in observational studies because unobserved CVs cannot be controlled for. This study developed a statistical method to adjust for missing CVs and enhance statistical power.
METHODS: We developed a two-stage-calibration zero-inflated Poisson (TSC-ZIP) model that accounts for an excess of zeros. In stage 1, a ZIP model was fitted to a large dataset (LD) using the observed CVs. In stage 2, another ZIP model was fitted to a second, smaller external dataset containing the CVs missing from the LD. To mitigate the risk of overfitting and potential divergence of the estimated parameters, propensity scores for stages 1 and 2 were calculated from the covariates available at each stage by fitting a multivariate logistic regression model. A series of simulations was performed to compare the performance of the TSC-ZIP model with that of the ZIP model.
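As background for the ZIP building block used in both stages, the following is a minimal sketch, not the authors' TSC-ZIP method. It assumes an intercept-only ZIP with no covariates, no propensity scores, and no two-stage calibration, where `pi` is the probability of a structural zero and `lam` is the Poisson mean. All function names are hypothetical, and the fit uses a standard EM algorithm chosen here for illustration.

```python
import math
import random


def zip_pmf(y, pi, lam):
    """P(Y = y) for a ZIP with structural-zero probability pi and Poisson mean lam."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi * (1.0 if y == 0 else 0.0) + (1.0 - pi) * poisson


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_zip(n, pi, lam, seed=0):
    """Simulate n ZIP counts: a structural zero with probability pi, else a Poisson draw."""
    rng = random.Random(seed)
    return [0 if rng.random() < pi else sample_poisson(lam, rng) for _ in range(n)]


def fit_zip_em(ys, n_iter=200):
    """Fit an intercept-only ZIP by EM; returns (pi_hat, lam_hat)."""
    n = len(ys)
    n_zero = sum(1 for y in ys if y == 0)
    total = sum(ys)
    # Crude starting values (an assumption of this sketch).
    pi, lam = 0.5 * n_zero / n, max(total / n, 1e-8)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is a structural zero.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi = n_zero * w / n
        lam = total / (n - n_zero * w)
    return pi, lam


# Illustrative parameter values only; they are not taken from the study.
ys = simulate_zip(20000, pi=0.3, lam=2.0, seed=1)
pi_hat, lam_hat = fit_zip_em(ys)
print(f"pi_hat={pi_hat:.3f}, lam_hat={lam_hat:.3f}")
```

A regression version would replace `pi` and `lam` with logit- and log-linked functions of the covariates. The extension to two stages with an external dataset is specific to the TSC-ZIP design and is not attempted here.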
RESULTS: First, we proved mathematically that the regression coefficients derived from the TSC-ZIP model were unbiased and consistent, and that their variances were reduced, leading to improved power compared with a single ZIP model. Second, the simulations showed that, under different levels of assumed true treatment effect, the statistical power of the TSC-ZIP model was 0.608, 0.826, and 0.904, respectively, while that of the ZIP model was 0.460, 0.678, and 0.878, i.e., an average improvement of 30%. Third, a larger sample size in stage 1 reduced the variances and thus increased the power of the TSC-ZIP model. Fourth, the regression coefficients of the TSC-ZIP model remained unbiased even when the missing CVs had a stronger association with the outcome.
CONCLUSIONS: The TSC-ZIP model is demonstrated to be a reliable framework that effectively adjusts for missing CVs.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
MSR101
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Confounding, Selection Bias Correction, Causal Inference, Electronic Medical & Health Records
Disease
No Additional Disease & Conditions/Specialized Treatment Areas