ZERO-SHOT LUNG CANCER RISK PREDICTION FROM LONGITUDINAL ELECTRONIC HEALTH RECORDS WITH CHAIN-OF-AGENTS FRAMEWORK
Author(s)
Sihang Zeng, BS1, Youngwon Kim, PhD1, Wilson Lau, PhD1, Ehsan Alipour, MD, PhD1, Ruth Etzioni, PhD2, Meliha Yetisgen, PhD3, Anand Oka, PhD1, Jay Nanduri, MBA1.
1Truveta, Bellevue, WA, USA, 2Fred Hutch Cancer Center, Seattle, WA, USA, 3University of Washington, Seattle, WA, USA.
OBJECTIVES: Early identification of individuals at higher risk for lung cancer can improve outcomes and help target screening resources. We evaluate whether a large language model (LLM)-based chain-of-agents (CoA) framework can estimate 1-year lung cancer risk directly from raw longitudinal electronic health record (EHR) data, reducing the need for data cleaning, feature engineering, and task-specific model training required by traditional machine learning (ML) models.
METHODS: Using Truveta Data (de-identified EHR for 120 million patients from leading US health systems), we identified lung cancer cases with clinician-curated diagnostic codes and randomly sampled a test cohort of 500 cases and 125,000 controls. For each patient, we used all EHR history recorded up to one year before diagnosis (or index date). The CoA framework applied sequential LLM agents to summarize key clinical events from chronological EHR segments, then aggregated these summaries into a consolidated risk profile to predict a 1-year lung cancer risk score from 1 to 10. We compared CoA performance with common ML models such as XGBoost, as well as with a single-agent LLM baseline.
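The sequential summarize-then-score flow described in METHODS can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `(date, text)` event shape, the prompt wording, the `segment_size` parameter, and the `llm` callable (any function that maps a prompt string to a reply string) are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical record shape: (ISO-8601 date string, free-text event description).
Event = Tuple[str, str]


def split_into_segments(events: List[Event], segment_size: int) -> List[List[Event]]:
    """Sort events chronologically and cut them into fixed-size segments."""
    ordered = sorted(events, key=lambda e: e[0])  # ISO dates sort correctly as strings
    return [ordered[i:i + segment_size] for i in range(0, len(ordered), segment_size)]


def chain_of_agents_risk(
    events: List[Event],
    llm: Callable[[str], str],  # assumed interface: prompt in, reply text out
    segment_size: int = 50,
) -> int:
    """Pass a running summary through one worker call per segment, then score it."""
    summary = ""
    for segment in split_into_segments(events, segment_size):
        lines = "\n".join(f"{date}: {text}" for date, text in segment)
        # Each worker sees the previous summary plus its own segment of raw records.
        worker_prompt = (
            "WORKER\n"
            f"Summary of earlier history:\n{summary}\n"
            f"New records:\n{lines}\n"
            "Update the summary of clinical events relevant to lung cancer risk."
        )
        summary = llm(worker_prompt)

    # A final call turns the consolidated profile into a 1-10 score.
    manager_prompt = (
        "MANAGER\n"
        f"Consolidated risk profile:\n{summary}\n"
        "Reply with a single integer from 1 to 10 for 1-year lung cancer risk."
    )
    score = int(llm(manager_prompt).strip())
    return max(1, min(10, score))  # clamp to the 1-10 range used in the abstract
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real deployment would also need to handle replies that are not a bare integer.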
RESULTS: CoA based on GPT-4.1-mini achieved strong discrimination (AUROC 0.871; 95% CI: 0.855-0.885). Using a threshold chosen to balance sensitivity and specificity, CoA achieved NPV 0.999, sensitivity 0.772, specificity 0.825, and PPV 0.017 in this low-incidence cohort. Performance was comparable to, or slightly lower than, that of trained ML models, but was obtained without feature engineering or model training. Further evaluation showed that CoA produced more complete and temporally coherent clinical reasoning than the single-agent LLM, and that this reasoning aligned well with clinical knowledge.
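The low PPV alongside the high NPV follows from the cohort's case-to-control ratio. The short calculation below rebuilds an approximate confusion matrix from the reported cohort size (500 cases, 125,000 controls), sensitivity, and specificity. The counts are reconstructed from rounded published figures, so they are an approximation and not the authors' actual confusion matrix; the function names are illustrative.

```python
def confusion_counts(n_cases: int, n_controls: int,
                     sensitivity: float, specificity: float):
    """Approximate TP, FN, TN, FP from cohort sizes and rounded rates."""
    tp = round(n_cases * sensitivity)      # cases flagged as high risk
    fn = n_cases - tp                      # cases missed
    tn = round(n_controls * specificity)   # controls correctly cleared
    fp = n_controls - tn                   # controls flagged as high risk
    return tp, fn, tn, fp


def predictive_values(tp: int, fn: int, tn: int, fp: int):
    """PPV = TP / (TP + FP); NPV = TN / (TN + FN)."""
    return tp / (tp + fp), tn / (tn + fn)


tp, fn, tn, fp = confusion_counts(500, 125_000, 0.772, 0.825)
ppv, npv = predictive_values(tp, fn, tn, fp)
print(tp, fn, tn, fp)                # 386 114 103125 21875
print(round(ppv, 3), round(npv, 3))  # 0.017 0.999
```

With about 21,875 false positives against 386 true positives, PPV stays near 0.017 even at 82.5% specificity, which matches the reported values.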
CONCLUSIONS: The zero-shot LLM-based CoA framework can predict lung cancer risk directly from heterogeneous real-world EHR data, with clinically meaningful reasoning and performance comparable to ML models, while eliminating costly data pre-processing and model training. This approach may lower implementation barriers and support scalable deployment of early detection tools to improve lung cancer outcomes.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR67
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Oncology