Performance Assessment and Validation of Real-World Response Data Generated Using a Deep Learning-Based Natural Language Processing Model Across Multiple Solid Tumors
Author(s)
Kelly Magee, MS, RN, Qianyu Yuan, PhD, Auriane Blarre, MEng, Aaron B. Cohen, MD, MSCE, Aaron Dolor, PhD, Konstantin Krismer, PhD, Tori Williams, BA, Qianyi Zhang, MS
Flatiron Health, New York, NY, USA
OBJECTIVES: This study describes the reliability, completeness, and internal validity of a novel machine learning (ML)-generated real-world response (rwR) approach.
METHODS: This study used the nationwide Flatiron Health electronic health record (EHR)-derived, de-identified database. A deep learning-based natural language processing model extracted clinicians’ documentation of changes in disease burden (complete response [CR], partial response [PR], stable disease, progressive disease, or unknown) at imaging timepoints. Data from 18 treatment- and/or biomarker-defined cohorts across 7 solid tumors were used to train the model and to test the correlation between human-abstracted and ML-extracted real-world response rate (rwRR). In 15 cohorts of common solid tumors, the following were examined for first (1L), second (2L), and third (3L) lines of therapy: the proportion of treated patients with at least 1 assessment, the times to first, second, and third assessments, and the median number of assessments. Additionally, real-world overall survival (rwOS) was compared for responders (ever achieved CR or PR) versus non-responders (never achieved CR or PR) for the most frequent regimens in 1L to 3L for each disease (with ≥30 patients).
RESULTS: Within the test cohort (n = 4047), the correlation between human-abstracted and ML-extracted rwRR was r = 0.86. The solid tumor cohorts included 3406 to 129 807 treated patients. Between 57.8% and 80.6% of patients had at least 1 assessment, with a median of 1 to 3 assessments within 1L, 2L, and 3L. Median times to first, second, and third assessments for 1L-3L were 1.9-4.4, 3.8-8.7, and 5.7-12.9 months, respectively. Across the most frequent 1L-3L regimens for each disease, responders in each cohort had significantly longer survival than non-responders (P < .05).
CONCLUSIONS: This study establishes the performance and validity of a novel ML approach for capturing rwR data from EHRs, supporting the efficient and reliable generation of valuable outcome data across large cohorts.
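ILLUSTRATIVE SKETCH: The responder definition and the rwRR comparison described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation. The abstract reports "r" without naming the statistic, so Pearson's correlation across cohort-level rwRR values is assumed here. The response rate denominator (all patients passed in, including those with no assessment) is likewise an assumption, and the function names and label strings ("CR", "PR", "SD", "PD") are hypothetical.

```python
from math import sqrt

# Labels counted as a response, following the abstract's responder
# definition (ever achieved CR or PR). The string codes are hypothetical.
RESPONSE_LABELS = {"CR", "PR"}


def is_responder(assessments):
    """Return True if any assessment label for the patient is CR or PR."""
    return any(label in RESPONSE_LABELS for label in assessments)


def response_rate(patients):
    """Fraction of patients who ever achieved CR or PR.

    `patients` is a list of per-patient assessment label lists.
    Assumption: every patient in the list counts in the denominator.
    """
    if not patients:
        raise ValueError("cohort has no patients")
    return sum(is_responder(a) for a in patients) / len(patients)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Assumed use: xs are human-abstracted rwRR values and ys are
    ML-extracted rwRR values, one pair per cohort.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

In this sketch, response_rate would be computed once per cohort from the human-abstracted labels and once from the ML-extracted labels, and pearson_r would then be applied to the two resulting lists of cohort-level rates.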
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
MSR142
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, SDC: Oncology