CHARACTERIZING ONCOLOGY PATIENT JOURNEYS AND HEALTH STATE TRANSITIONS USING A DATA-DRIVEN MARKOV TRANSITION MATRIX IN LARGE-SCALE ELECTRONIC HEALTH RECORDS
Author(s)
Youngwon Kim, PhD¹, Wilson Lau, PhD¹, Ehsan Alipour, MD, PhD¹, Sihang Zeng, PhD Candidate², Anand Oka, PhD¹, Jay Nanduri, MTech, MBA¹
¹Truveta, Bellevue, WA, USA; ²University of Washington, Seattle, WA, USA
OBJECTIVES: Patient journeys for people with cancer are non-linear, with periods of stability, deterioration, and remission. Many EHR studies preselect features based on oncology-specific knowledge, potentially missing latent patterns in longitudinal data. Markov transition matrices (MTMs) offer a data-driven way to empirically define health states and estimate transition probabilities over time without prior feature selection. This study evaluates the descriptive and structural validity of MTMs derived from longitudinal EHR data to characterize condition persistence and comorbidity in cancer populations and to support parameterization of state-transition models in health outcomes research.
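The abstract does not state which estimator was used. As a point of reference only, the standard maximum-likelihood estimate for a first-order Markov chain, assuming transitions are counted between consecutive condition occurrences, would be:

```latex
\hat{P}_{ij} \;=\; \hat{P}(S_{t+1} = j \mid S_t = i) \;=\; \frac{n_{ij}}{\sum_{k=1}^{K} n_{ik}},
\qquad \sum_{j=1}^{K} \hat{P}_{ij} = 1
```

Here $n_{ij}$ is the (assumed) count of times condition $j$ directly follows condition $i$ in a patient's record, and $K$ is the number of states ($K = 30$ in this study). The diagonal entries $\hat{P}_{ii}$ are the self-transition (persistence) probabilities.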
METHODS: We analyzed EHR data from a 5% random sample drawn from approximately 2 million real-world U.S. patients with cancer (N=104,810; 43,296,642 clinical conditions). MTMs were constructed over a 0-24-month window following each patient's first hospital visit to estimate transition probabilities $P(S_{t+1} \mid S_t)$ from sequential condition occurrences. To support stable estimation and interpretable state spaces, transitions were summarized among the 30 most frequent conditions. Persistence (self-transition) and cross-condition transitions (comorbidity) were evaluated against known clinical relationships.
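A minimal sketch of the construction described above, under several assumptions not stated in the abstract: events arrive as hypothetical `(day_offset, condition)` tuples per patient, the 24-month window is approximated as 730 days, same-day events are ordered alphabetically, and a transition is counted only when both consecutive conditions fall in the top-k state space. The function name `build_mtm` is hypothetical; this is not the authors' pipeline.

```python
from collections import Counter

import numpy as np


def build_mtm(events, top_k=30, window_days=730):
    """Estimate a first-order Markov transition matrix over the top-k conditions.

    events: dict mapping a (hypothetical) patient ID to a list of
            (day_offset, condition) tuples, where day_offset counts days
            since the patient's first hospital visit.
    Returns (states, P), where P[i, j] estimates P(S_{t+1} = j | S_t = i).
    """
    # Keep events inside the 0-24 month window (approximated as 730 days);
    # same-day events are ordered by condition name as an arbitrary tie-break.
    windowed = [
        sorted((d, c) for d, c in evs if 0 <= d <= window_days)
        for evs in events.values()
    ]

    # The state space is the top_k most frequent conditions in the window.
    freq = Counter(c for evs in windowed for _, c in evs)
    states = [c for c, _ in freq.most_common(top_k)]
    index = {c: i for i, c in enumerate(states)}

    # Count consecutive condition pairs; a pair is kept only when both
    # conditions fall inside the top-k state space.
    counts = np.zeros((len(states), len(states)))
    for evs in windowed:
        seq = [c for _, c in evs]
        for a, b in zip(seq, seq[1:]):
            if a in index and b in index:
                counts[index[a], index[b]] += 1

    # Row-normalize counts into conditional probabilities (MLE);
    # rows with no outgoing transitions are left as zeros.
    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return states, P
```

Dropping pairs that touch an out-of-space condition, rather than deleting those conditions and re-linking their neighbors, avoids creating adjacencies that never occurred in the record; either choice is a modeling assumption.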
RESULTS: The mean self-transition probability across the top 30 conditions was 0.20. Persistence was highest for atrial fibrillation (0.35) and malignant tumor of breast (0.33), while other chronic conditions such as diabetes mellitus (0.22) and essential hypertension (0.21) were slightly above the mean. Transition structure revealed clinically coherent clustering, with dense bidirectional transitions among cardiometabolic states (hypertension, hyperlipidemia, diabetes) and strong linkages among respiratory symptoms and diagnoses (dyspnea, asthma, chronic obstructive pulmonary disease). Cancer-related diagnoses demonstrated stability through high self-transition probabilities and few transitions to unrelated conditions. Transitions from anxiety and asthma to depressive episodes were strong, while the reverse transitions were weak.
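The summary statistics reported above can be read directly off an estimated matrix. A minimal sketch, assuming a row-stochastic NumPy matrix `P` with matching `states` labels: the mean self-transition probability is the mean of the diagonal, and directional asymmetry (for example anxiety to depressive episode versus the reverse) compares `P[i, j]` with `P[j, i]`. The function names and the `min_gap` threshold are hypothetical, not taken from the study.

```python
import numpy as np


def summarize_persistence(states, P, top_n=5):
    """Return the mean self-transition probability and the most persistent states."""
    diag = np.diag(P)
    # Sort states by descending self-transition probability.
    order = np.argsort(diag)[::-1][:top_n]
    return float(diag.mean()), [(states[i], float(diag[i])) for i in order]


def asymmetric_pairs(states, P, min_gap=0.05):
    """List (a, b, P(a->b), P(b->a)) where the forward transition exceeds
    the reverse one by at least min_gap, largest gap first."""
    pairs = []
    n = len(states)
    for i in range(n):
        for j in range(n):
            if i != j and P[i, j] - P[j, i] >= min_gap:
                pairs.append((states[i], states[j], float(P[i, j]), float(P[j, i])))
    return sorted(pairs, key=lambda t: t[2] - t[3], reverse=True)
```

Because each row of `P` is a conditional distribution, a strong forward and weak reverse transition indicates directional ordering in the records, not causation.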
CONCLUSIONS: Data-driven MTMs provide an interpretable summary of condition persistence and comorbidity, as well as empirically grounded transition parameters for state-transition models, estimated from EHR data in a self-supervised manner. The results align with expected clinical relationships while highlighting the intersections between physical and emotional health. This framework provides a transparent foundation for parameterizing state-transition models in oncology and other chronic disease populations.
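To illustrate how such a matrix could parameterize a state-transition model, the sketch below runs a standard Markov cohort trace, $p_{t+1} = p_t P$. This assumes, hypothetically, that one step between condition occurrences can be treated as one model cycle, which the abstract does not establish; `propagate_cohort` is an illustrative name only.

```python
import numpy as np


def propagate_cohort(p0, P, n_cycles):
    """Return the cohort trace: state-occupancy vectors for cycles 0..n_cycles.

    p0: initial distribution over states (sums to 1).
    P:  row-stochastic transition matrix, P[i, j] = P(S_{t+1} = j | S_t = i).
    """
    p = np.asarray(p0, dtype=float)
    trace = [p]
    for _ in range(n_cycles):
        # One Markov cycle: redistribute occupancy along transition probabilities.
        p = p @ P
        trace.append(p)
    return np.vstack(trace)
```

For a two-state matrix `[[0.9, 0.1], [0.2, 0.8]]` starting from `[1, 0]`, occupancy after one cycle is `[0.9, 0.1]` and after two cycles is `[0.83, 0.17]`.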
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR68
Topic
Methodological & Statistical Research
Disease
SDC: Oncology