CAN A GENERATIVE PATIENT JOURNEY FOUNDATION MODEL ALLEVIATE THE BURDEN OF CANCER SCREENING?
Author(s)
Wilson Lau, PhD1, Ehsan Alipour, PhD1, Youngwon Kim, PhD1, Sihang Zeng, B.Eng1, Anand Oka, PhD1, Jay Nanduri2;
1Truveta, Bellevue, WA, USA, 2Truveta, Issaquah, USA
OBJECTIVES: The cost of a cancer screening procedure, such as a mammogram or colonoscopy, can range from hundreds to over a thousand dollars. This study explores the potential of building a cancer foundation model based on Generative Pre-trained Transformers (GPT) to predict future outcomes for patients. We assess the model's prediction accuracy and the feasibility of using it to inform when screening should be prioritized, thereby reducing the burden of unnecessary procedures.
METHODS: We extended the GPT architecture and pre-trained it on a subset of Truveta Data containing electronic health records (EHRs) from the journeys of 1.4 million de-identified patients across the United States diagnosed with four types of cancer (lung, breast, colorectal, prostate). For validation, 500 patients were randomly sampled from our test data, in which each type of cancer diagnosis constituted 20%-29% of the samples. We used the model to generate synthetic future patient journeys for the selected patients and compared the predicted outcomes with the actual cancer diagnoses within one year.
RESULTS: The model achieved sensitivities of 57% (lung), 71% (breast), 38% (colorectal), and 79% (prostate), with corresponding positive predictive values (PPV) of 86%, 83%, 85%, and 67%. More importantly, it demonstrated high specificities of 97% (lung), 94% (breast), 98% (colorectal), and 89% (prostate), with corresponding negative predictive values (NPV) of 85%, 89%, 86%, and 94%.
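For readers less familiar with the four metrics reported above, the following is a minimal sketch of how each is derived from a confusion matrix for one cancer type. The `ConfusionCounts` class, the `screening_metrics` function, and the example counts are hypothetical illustrations; they are not the study's code or data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for one cancer type (hypothetical container, not from the study)."""
    tp: int  # predicted cancer, cancer diagnosed within the follow-up window
    fp: int  # predicted cancer, no diagnosis within the window
    fn: int  # predicted no cancer, cancer diagnosed within the window
    tn: int  # predicted no cancer, no diagnosis within the window


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, and NPV as fractions in [0, 1].

    Assumes every denominator is non-zero; a real pipeline would guard this.
    """
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true cases detected
        "specificity": c.tn / (c.tn + c.fp),  # share of non-cases cleared
        "ppv": c.tp / (c.tp + c.fp),          # share of positive calls that are right
        "npv": c.tn / (c.tn + c.fn),          # share of negative calls that are right
    }


# Illustrative counts only; these are NOT the study's results.
example = ConfusionCounts(tp=40, fp=10, fn=10, tn=140)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are conditioned on the true outcome, whereas PPV and NPV are conditioned on the model's prediction, which is why the latter two depend on how common the cancer is in the evaluated sample.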
CONCLUSIONS: The high specificities and NPVs indicate the feasibility of applying a generative foundation model pre-trained on EHR data to predict negative cancer outcomes with high accuracy. Since the percentage of screening tests leading to a positive cancer diagnosis is relatively low, the predicted negative outcomes offer valuable signals for clinicians, who can use them to complement their expert assessment, avoid unnecessary screening, and thereby reduce the burden of screening costs.
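Because the argument above rests on NPV and on how rarely screening leads to a positive diagnosis, it may help to see how NPV follows from sensitivity, specificity, and prevalence via Bayes' rule. The sketch below uses the lung sensitivity and specificity reported in the results; the function name `npv_at_prevalence` and the prevalence values are assumptions chosen for illustration, not figures from the study.

```python
def npv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value implied by Bayes' rule.

    NPV = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    """
    true_negative_mass = specificity * (1.0 - prevalence)
    false_negative_mass = (1.0 - sensitivity) * prevalence
    return true_negative_mass / (true_negative_mass + false_negative_mass)


# Sensitivity and specificity for lung cancer as reported in the abstract.
LUNG_SENSITIVITY = 0.57
LUNG_SPECIFICITY = 0.97

# Prevalence values are hypothetical, chosen only to show the trend.
for prevalence in (0.01, 0.05, 0.25):
    npv = npv_at_prevalence(LUNG_SENSITIVITY, LUNG_SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f} -> NPV={npv:.3f}")
```

For fixed sensitivity and specificity, NPV increases as prevalence decreases, so an NPV measured on a validation sample is best read together with the share of positive cases in that sample.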
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR172
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, SDC: Oncology