Methods for Recovering Missing Longitudinal Biomarker Values from Electronic Health Record Data

Author(s)

Rodriguez P1, Heagerty PJ1, Hahn EE2, Haupt EC2, Bansal A1
1University of Washington, Seattle, WA, USA, 2Southern California Permanente Medical Group, Pasadena, CA, USA

OBJECTIVES: Longitudinal biomarker values in EHR data measure patient health and are typically correlated with important adverse events. These data can be valuable in healthcare research (e.g., risk prediction) but are often subject to a high degree of missingness. We develop an imputation approach that leverages and preserves the relationship between the biomarker and adverse events. We demonstrate our approach using carcinoembryonic antigen (CEA) biomarker values in the setting of colorectal cancer (CRC) surveillance, where elevated CEA is associated with recurrence.

METHODS: We used real-world EHR data on patients diagnosed with CRC between 2008 and 2013 and monitored for recurrence after primary treatment (n=3,156). Patients were followed for up to 5 years, unless recurrence, informative censoring (death, hospice initiation, second primary cancer) or random censoring (membership end) occurred first. To impute missing CEA at 3-month, guideline-consistent intervals, we used a pattern mixture model (PMM) that modeled CEA trajectories as a function of the type and timing of the recurrence or censoring event, since patients with earlier recurrence are expected to have steeper slopes than those with later recurrence. Random measurement error was added to fitted PMM values. We validated our approach by summarizing imputed biomarker values among high-risk (recurrence in 1-2 years), medium-risk (recurrence in 3-5 years), low-risk (no recurrence by 5 years), informatively censored, and randomly censored groups.
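
The imputation step described above can be illustrated with a minimal sketch. It is not the authors' model: it assumes each patient has already been assigned to a discrete pattern (a hypothetical label such as "early_recurrence"), fits a straight line per pattern by ordinary least squares, and fills missing 3-month visits with the fitted value plus Gaussian error drawn with the residual standard deviation. The function names, pattern labels, and numbers are invented for illustration, and the abstract's PMM conditions on event timing more finely than this.

```python
import random
from collections import defaultdict


def fit_line(ts, ys):
    """Least-squares intercept, slope, and residual SD for one pattern.

    Assumes at least two distinct time points; otherwise sxx is zero.
    """
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, ys))
    sd = (rss / (n - 2)) ** 0.5 if n > 2 else 0.0
    return intercept, slope, sd


def fit_patterns(records):
    """records: iterable of (pattern, month, value) observed measurements."""
    by_pattern = defaultdict(lambda: ([], []))
    for pattern, month, value in records:
        by_pattern[pattern][0].append(month)
        by_pattern[pattern][1].append(value)
    return {p: fit_line(ts, ys) for p, (ts, ys) in by_pattern.items()}


def impute(fits, pattern, observed, grid, rng):
    """Keep observed values; fill the rest with fitted value + random error."""
    intercept, slope, sd = fits[pattern]
    return [
        observed[m] if m in observed
        else intercept + slope * m + rng.gauss(0.0, sd)
        for m in grid
    ]


# Synthetic, made-up observations (not study data).
records = [
    ("early_recurrence", 0, 1.0),
    ("early_recurrence", 6, 1.6),
    ("early_recurrence", 12, 2.2),
    ("no_recurrence", 0, 0.6),
    ("no_recurrence", 12, 0.6),
    ("no_recurrence", 24, 0.6),
]
fits = fit_patterns(records)
grid = list(range(0, 25, 3))  # 3-month schedule over 24 months
filled = impute(fits, "early_recurrence", {0: 1.0, 6: 1.6}, grid,
                random.Random(0))
```

Because the noise is drawn from the pattern-specific residual spread, imputed values vary around the fitted trajectory rather than sitting exactly on it, which is the role the abstract assigns to the added measurement error.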

RESULTS: Imputed trajectories preserved the association between CEA and recurrence risk. Baseline CEA and slope were highest in the high-risk group (n=250; baseline=1.15; slope=0.12), followed by the medium-risk (n=121; baseline=0.78; slope=0.05) and low-risk (n=2,070; baseline=0.64; slope=0.00) groups. CEA was stable for those with random censoring events (n=280; slope=0.00), but increasing for those with informative censoring events (n=435; slope=0.04).
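
A group-level summary of this kind could be computed along the following lines. This is a hedged sketch only: it assumes completed trajectories are stored per patient on a common time grid, takes baseline as the first grid value, and takes slope as a per-patient least-squares slope averaged within each group. The abstract does not state how baseline and slope were computed or in what units, and the helper names and data below are hypothetical.

```python
from statistics import mean


def ols_slope(ts, ys):
    """Least-squares slope of ys on ts for a single trajectory."""
    t_bar, y_bar = mean(ts), mean(ys)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    return sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sxx


def summarize(trajectories, grid):
    """trajectories: dict of group -> list of per-patient value lists,
    each aligned to grid. Returns n, mean baseline, and mean slope."""
    summary = {}
    for group, rows in trajectories.items():
        summary[group] = {
            "n": len(rows),
            "baseline": mean(row[0] for row in rows),
            "slope": mean(ols_slope(grid, row) for row in rows),
        }
    return summary


# Synthetic, made-up trajectories (not study data).
demo_grid = [0, 3, 6]
demo = {
    "high_risk": [[1.0, 1.3, 1.6], [1.2, 1.5, 1.8]],
    "low_risk": [[0.6, 0.6, 0.6]],
}
demo_summary = summarize(demo, demo_grid)
```

In this sketch the slope is expressed per unit of the grid (months here), so the values are not directly comparable to the slopes reported above.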

CONCLUSIONS: A PMM approach that imputes longitudinal biomarker trajectories, while preserving underlying relationships between biomarker values and adverse events, is feasible and can be applied to several disease areas.

Conference/Value in Health Info

2022-05, ISPOR 2022, Washington, DC, USA

Value in Health, Volume 25, Issue 6, S1 (June 2022)

Code

MSR37

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Electronic Medical & Health Records, Missing Data

Disease

Personalized and Precision Medicine
