IMPUTING MISSING DATA IN OBSERVATIONAL STUDIES: WHAT METHODS ARE BETTER?

Author(s)

Vincent McCarty, MSc¹, John S. Sampalis, PhD², Neil R. Brett, PhD², Marielle Bassel, BA²
¹Montréal, QC, Canada; ²Thermo Fisher Scientific, Montreal, QC, Canada
OBJECTIVES: Missing outcome data are common in longitudinal observational studies and can introduce bias if not handled appropriately. Although several methods are available to impute missing data, it remains uncertain how much bias each method introduces, particularly for longitudinal disease activity measures.
METHODS: We used data from 286 patients with rheumatoid arthritis in an observational study with complete longitudinal assessments of the Clinical Disease Activity Index (CDAI), Simple Disease Activity Index (SDAI), and Disease Activity Score 28-C-Reactive Protein (DAS-28 CRP) at 0, 3, 6, 9, 12, and 18 months of follow-up (1,716 observations). Excluding baseline values, 10% of data points were randomly set to missing. The missing values were then imputed using four commonly applied methods: Last Observation Carried Forward (LOCF), Expectation-Maximization (EM), Linear Regression (REG), and Multiple Imputation (MI). Each imputed dataset was compared with the original complete dataset using descriptive statistics and one-sample t-tests.
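The masking-and-imputation design described in the methods can be sketched in plain Python. This is a minimal sketch, not the authors' code: it assumes the data for one outcome are held as a patients × visits list of lists (286 × 6 in the study), and it shows only two of the four methods. All function names are hypothetical, and the REG variant (regressing each visit on the previous, already filled visit) is an assumed specification, since the abstract does not state the regression model. EM and MI are omitted because they require a fitted multivariate model.

```python
import random
import statistics


def mask_non_baseline(data, fraction=0.10, seed=0):
    """Return a copy of `data` (rows = patients, columns = visits) in which a
    random `fraction` of the non-baseline cells is set to None."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(data) for j in range(1, len(row))]
    masked = [list(row) for row in data]
    for i, j in rng.sample(cells, round(fraction * len(cells))):
        masked[i][j] = None
    return masked


def locf(data):
    """Last Observation Carried Forward: replace each None with the most recent
    observed (or carried) value for the same patient. Baseline is never masked,
    so every row starts with an observed value."""
    filled = []
    for row in data:
        out, last = [], None
        for value in row:
            if value is None:
                value = last
            out.append(value)
            last = value
        filled.append(out)
    return filled


def regression_impute(data):
    """Hypothetical REG variant (assumed model): for each visit j >= 1, fit
    ordinary least squares of visit j on the already filled visit j-1 using
    patients observed at visit j, then predict the missing visit-j values.
    Assumes at least one patient is observed at every visit."""
    filled = [list(row) for row in data]
    for j in range(1, len(filled[0])):
        pairs = [(row[j - 1], row[j]) for row in filled if row[j] is not None]
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        slope = sxy / sxx if sxx else 0.0  # flat fit if previous visit is constant
        intercept = y_bar - slope * x_bar
        for row in filled:
            if row[j] is None:
                row[j] = intercept + slope * row[j - 1]
    return filled
```

With 286 patients and 6 visits there are 286 × 5 = 1,430 non-baseline cells, so `mask_non_baseline` with the default 10% would blank 143 of them. Processing visits in order lets the regression sketch use imputed earlier visits as predictors, which is one simple way to handle consecutive missing values.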
RESULTS: Across all 1,716 observations, only the LOCF-imputed means differed significantly from the original data (p<0.001), and this held for all outcomes. The relative difference between imputed and original data ranged from 0.7% to 3.2% for LOCF and from 0.01% to 0.7% for the other methods. However, at later follow-up time points (12 and 18 months), the differences between the four imputation methods became less pronounced. Overall, MI produced the smallest differences from the original data.
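The two comparison statistics reported in the results can be sketched as follows. The abstract does not define either precisely, so this sketch assumes that "relative difference" means the absolute difference between the imputed and original means expressed as a percentage of the original mean, and that the one-sample t-test is applied to the imputed-minus-original differences at the masked cells with a null mean of zero. The function names are hypothetical. For a p-value, `scipy.stats.ttest_1samp(diffs, 0.0)` returns the same t statistic together with its p-value.

```python
import math
import statistics


def relative_difference_pct(original, imputed):
    """Absolute difference between the imputed and original means, as a
    percentage of the original mean (assumed definition)."""
    orig_mean = statistics.fmean(original)
    return abs(statistics.fmean(imputed) - orig_mean) / abs(orig_mean) * 100


def one_sample_t(values, mu0=0.0):
    """One-sample t statistic for H0: mean(values) == mu0, using the sample
    standard deviation (n - 1 denominator). Needs at least two values that
    are not all identical."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - mu0) / se
```

Under these assumptions, a method that systematically shifts imputed values in one direction (as carrying forward an earlier, higher disease activity score would) yields differences whose mean is away from zero, which is what the t statistic detects.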
CONCLUSIONS: In longitudinal observational studies, MI-based imputation produces less biased results, whereas LOCF is expected to introduce the most bias. These findings support the use of MI for handling missing outcome data in real-world studies and suggest that stratifying imputation approaches by follow-up period may further mitigate bias over extended observation periods.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR211

Topic

Methodological & Statistical Research

Topic Subcategory

Missing Data

Disease

SDC: Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal)
