Methods for Recovering Missing Longitudinal Biomarker Values from Electronic Health Record Data
Author(s)
Rodriguez P¹, Heagerty PJ¹, Hahn EE², Haupt EC², Bansal A¹
¹University of Washington, Seattle, WA, USA; ²Southern California Permanente Medical Group, Pasadena, CA, USA
OBJECTIVES: Longitudinal biomarker values in electronic health record (EHR) data reflect patient health and are typically correlated with important adverse events. These data can be valuable in healthcare research (e.g., risk prediction) but are often subject to a high degree of missingness. We develop an imputation approach that leverages and preserves the relationship between the biomarker and adverse events. We demonstrate our approach using carcinoembryonic antigen (CEA) biomarker values in the setting of colorectal cancer (CRC) surveillance, where elevated CEA is associated with recurrence.
METHODS: We used real-world EHR data on patients diagnosed with CRC between 2008 and 2013 and monitored for recurrence after primary treatment (n=3,156). Patients were followed for up to 5 years, unless recurrence, informative censoring (death, hospice initiation, second primary cancer), or random censoring (membership end) occurred first. To impute missing CEA at 3-month, guideline-consistent intervals, we used a pattern mixture model (PMM) that modeled CEA trajectories as a function of the type of event (recurrence or censoring) and its timing, since patients with earlier recurrence are expected to have steeper slopes than those with later recurrence. Random measurement error was added to fitted PMM values. We validated our approach by summarizing imputed biomarker values among high-risk (recurrence in 1-2 years), medium-risk (recurrence in 3-5 years), low-risk (no recurrence by 5 years), informatively censored, and randomly censored groups.
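The imputation step described above can be illustrated with a minimal sketch. It is a simplification, not the authors' model: it assumes each patient has already been assigned to a discrete pattern (a hypothetical label combining event type and a timing bin, such as "recurrence_early"), fits one straight line per pattern by ordinary least squares, and adds Gaussian noise scaled by the residual standard deviation as a stand-in for the measurement error. The function names, the month-based time scale, and the noise distribution are all assumptions made for illustration.

```python
import numpy as np


def fit_pattern_lines(times, values, patterns):
    """Fit one straight line per pattern to the observed biomarker values.

    times, values, patterns are 1-D arrays of equal length (one entry per
    observed measurement). Returns {pattern: (intercept, slope, residual_sd)}.
    Assumption: a linear trajectory within each pattern.
    """
    fits = {}
    for p in np.unique(patterns):
        mask = patterns == p
        # np.polyfit returns coefficients from highest degree to lowest
        slope, intercept = np.polyfit(times[mask], values[mask], deg=1)
        resid = values[mask] - (intercept + slope * times[mask])
        dof = max(int(mask.sum()) - 2, 1)
        resid_sd = float(np.sqrt(np.sum(resid ** 2) / dof))
        fits[str(p)] = (float(intercept), float(slope), resid_sd)
    return fits


def impute_on_grid(fits, pattern, grid, rng):
    """Draw values on a visit grid: fitted pattern mean plus random error.

    Assumption: measurement error is Gaussian with the pattern's residual SD.
    """
    intercept, slope, resid_sd = fits[pattern]
    noise = rng.normal(0.0, resid_sd, size=grid.shape)
    return intercept + slope * grid + noise
```

With time measured in months, a 3-month visit grid over 5 years could be built as `np.arange(0, 61, 3, dtype=float)` and passed to `impute_on_grid` together with a generator from `np.random.default_rng`. In practice only the missing visits would be filled in, with observed values kept as recorded.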
RESULTS: Imputed trajectories preserved the association between CEA and recurrence risk. Baseline CEA and slope were highest in the high-risk group (n=250; baseline=1.15; slope=0.12), followed by the medium-risk (n=121; baseline=0.78; slope=0.05) and low-risk (n=2,070; baseline=0.64; slope=0.00) groups. CEA was stable for those with random censoring events (n=280; slope=0.00) but increased for those with informative censoring events (n=435; slope=0.04).
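A group-level summary of this kind can be sketched as follows. The abstract does not state how baseline and slope were computed, so this example makes its own assumptions: each patient has a complete row of imputed values on a common visit grid, a straight line is fitted per patient, baseline is taken as the fitted value at time 0, and the group figures are simple means. The function name and input layout are hypothetical.

```python
import numpy as np


def summarize_by_group(imputed, grid, groups):
    """Summarize imputed trajectories per risk or censoring group.

    imputed: array of shape (n_patients, n_visits), complete on the grid.
    grid:    1-D array of visit times, length n_visits.
    groups:  1-D array of group labels, length n_patients.
    Returns {group: (n, mean_baseline, mean_slope)}.
    """
    summary = {}
    for g in np.unique(groups):
        rows = imputed[groups == g]
        # With 2-D y, np.polyfit fits each column separately, so transpose
        # to make every patient a column. Row 0 holds slopes, row 1 intercepts.
        slopes, intercepts = np.polyfit(grid, rows.T, deg=1)
        summary[str(g)] = (
            int(rows.shape[0]),
            float(np.mean(intercepts)),
            float(np.mean(slopes)),
        )
    return summary
```

Patients who are censored before the end of follow-up would have shorter trajectories, so a real summary would need to handle rows of unequal length, which this sketch does not.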
CONCLUSIONS: A PMM approach that imputes longitudinal biomarker trajectories while preserving the underlying relationships between biomarker values and adverse events is feasible and can be applied to several disease areas.
Conference/Value in Health Info
Value in Health, Volume 25, Issue 6, S1 (June 2022)
Code
MSR37
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Electronic Medical & Health Records, Missing Data
Disease
Personalized and Precision Medicine