Dealing With Missing Data in Health Economic Studies of Small to Moderate Size: How to Minimize Bias
Author(s)
Simon Lafrance, PT, PhD1, Simon LaRue, MSc2, Denis Talbot, PhD1, Rose Gagnon, PT, MSc1, Simon Berthelot, MD, MSc1, Jason Robert Guertin, PhD1.
1Université Laval, Quebec City, QC, Canada, 2CHU de Québec, Quebec City, QC, Canada.
OBJECTIVES: Missing data in health economic evaluations can introduce bias, particularly in studies with small sample sizes or high levels of missingness. Methods such as complete case analysis (CCA) and multiple imputation using predictive mean matching (MI-PMM) are commonly used to address missing data, but it remains unclear which approach is most valid for a given sample size and level of missingness. The aim of this study was to estimate the error of various approaches for handling missing data in health economic studies across a range of sample sizes and levels of missingness.
METHODS:
1) Dataset simulation: four complete datasets (n = 80, 100, 200, and 400) were generated to reflect a two-arm study design, including baseline covariates and follow-up data at 6, 12, and 26 weeks. 2) Reference analysis: between-arm differences in costs and quality-adjusted life years (QALYs) were calculated for each dataset and used as the reference. 3) Missing data simulation: three levels of missingness (10%, 20%, and 40%) were introduced in the cost and utility variables of all datasets under a missing completely at random mechanism. 4) Approaches: each scenario was analyzed using CCA, MI-PMM applied to the full cohort (MI-PMMcohort), and MI-PMM applied separately by arm (MI-PMMarm). 5) Performance analysis: between-arm differences in costs and QALYs were calculated for each approach and compared with the reference analysis. 6) Repetition: steps 3-5 were repeated 1,000 times, and mean absolute and relative errors were calculated for each approach in each scenario.
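As a rough illustration of steps 1-6, the sketch below runs a single replicate of one scenario in Python, using statsmodels' MICEData (which imputes via predictive mean matching). The dataset structure, covariate, variable names, number of imputations, and pooling by simple averaging are assumptions made for illustration; they are not the authors' actual simulation code.

```python
# Minimal sketch of one replicate of one scenario (assumed data-generating model).
import numpy as np
import pandas as pd
from statsmodels.imputation.mice import MICEData  # MICE with predictive mean matching

rng = np.random.default_rng(2025)

def make_complete_data(n):
    """Toy two-arm dataset: one baseline covariate, total cost, and QALYs (illustrative)."""
    arm = rng.integers(0, 2, n).astype(float)        # 0 = control, 1 = intervention
    age = rng.normal(50, 10, n)
    cost = 1500 + 400 * arm + 5 * age + rng.normal(0, 300, n)
    qaly = 0.35 + 0.02 * arm - 0.001 * age + rng.normal(0, 0.05, n)
    return pd.DataFrame({"arm": arm, "age": age, "cost": cost, "qaly": qaly})

def arm_difference(df, outcome):
    """Between-arm difference in means (intervention minus control)."""
    return df.loc[df.arm == 1, outcome].mean() - df.loc[df.arm == 0, outcome].mean()

def add_mcar(df, cols, prop):
    """Step 3: set a proportion of values to missing completely at random."""
    out = df.copy()
    for c in cols:
        out.loc[rng.random(len(out)) < prop, c] = np.nan
    return out

def mi_pmm_cohort(df, outcome, m=20, burn=5):
    """MI-PMM on the full cohort; point estimates pooled by averaging over imputations."""
    imp = MICEData(df.copy(), k_pmm=5)
    imp.update_all(burn)                             # burn-in MICE cycles
    ests = []
    for _ in range(m):
        imp.update_all(1)                            # one cycle per imputed dataset
        ests.append(arm_difference(imp.data, outcome))
    return float(np.mean(ests))

def mi_pmm_by_arm(df, outcome, m=20, burn=5):
    """MI-PMM run separately within each arm, then the arm means are differenced."""
    means = {}
    for a in (0.0, 1.0):
        sub = df[df.arm == a].drop(columns="arm").reset_index(drop=True)
        imp = MICEData(sub.copy(), k_pmm=5)
        imp.update_all(burn)
        vals = []
        for _ in range(m):
            imp.update_all(1)
            vals.append(imp.data[outcome].mean())
        means[a] = np.mean(vals)
    return float(means[1.0] - means[0.0])

# Steps 1-2: complete dataset and reference between-arm difference (cost shown here).
complete = make_complete_data(200)
reference = arm_difference(complete, "cost")

# Steps 3-5: one missingness scenario (20% MCAR) analyzed with each approach.
observed = add_mcar(complete, ["cost", "qaly"], prop=0.20)
estimates = {
    "CCA": arm_difference(observed.dropna(), "cost"),
    "MI-PMMcohort": mi_pmm_cohort(observed, "cost"),
    "MI-PMMarm": mi_pmm_by_arm(observed, "cost"),
}
for name, est in estimates.items():
    abs_err = abs(est - reference)
    print(f"{name}: absolute error {abs_err:.1f}, relative error {abs_err / abs(reference):.1%}")
# Step 6 would wrap steps 3-5 in a loop of 1,000 replicates per scenario and average the errors.
```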
RESULTS: Across the twelve scenarios, CCA consistently produced the highest error. On average, its relative error was more than twice that of the imputation approaches. Among the imputation approaches, MI-PMMcohort outperformed MI-PMMarm, with an average relative error that was 15% lower. Overall, errors increased with smaller sample sizes and higher proportions of missing data.
CONCLUSIONS: The imputation approaches consistently outperformed CCA across all scenarios. MI-PMMcohort demonstrated slightly better performance than MI-PMMarm.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR66
Topic
Clinical Outcomes, Economic Evaluation, Methodological & Statistical Research
Topic Subcategory
Missing Data