Assessing Bias in LLM-Extracted Real-World Data: A Health Equity Analysis of Access to Care and Outcomes in Metastatic Breast Cancer

Author(s)

Olive M. Mbah, PhD, MHS, Gene G. Ho, MPH, Catherine Keane, BSN, MSN, Qianyu Yuan, PhD, Cleo Ryals, PhD.
Flatiron Health, New York, NY, USA.
OBJECTIVES: As large language models (LLMs) are increasingly used to extract clinical data from electronic health records, assessing model fairness is critical. We evaluated the reproducibility of scientific conclusions when using LLM-extracted versus human-abstracted data to examine inequities in biomarker testing and overall survival among patients with HR+/HER2- metastatic breast cancer (mBC).
METHODS: We used the US-based Flatiron Health Research Database to select women diagnosed with HR+/HER2- mBC between January 2018 and March 2025. LLMs and human abstraction were used to curate clinical variables from unstructured clinical documents. Cox proportional hazards models were used to assess associations of race/ethnicity and social determinants of health (SDOH) with (1) biomarker testing and (2) overall survival. We compared hazard ratios (HRs) and the overlap of their 95% CIs across cohorts.
RESULTS: Patients in the LLM-extracted (N = 25,055) and human-abstracted (N = 8530) cohorts exhibited similar sociodemographic and clinical characteristics. Across both cohorts, Latinx, Black, and Asian patients were generally less likely to undergo biomarker testing than White patients (e.g., Black versus White, LLM: HR=0.89; 95%CI:0.84-0.93 versus human: HR=0.91; 95%CI:0.83-0.99). SDOH estimates were also similar across cohorts (e.g., patients from the most affluent neighborhoods were more likely to receive biomarker testing than patients from the least affluent neighborhoods [LLM: HR=1.23; 95%CI:1.17-1.29 versus human: HR=1.28; 95%CI:1.17-1.40]). Survival estimates were similar across cohorts, with worse survival among Black patients [LLM: HR=1.27; 95%CI:1.19-1.36 versus human: HR=1.34; 95%CI:1.21-1.48] and among those living in low-income [highest versus lowest income: LLM: HR=0.75; 95%CI:0.70-0.80 versus human: HR=0.77; 95%CI:0.69-0.86], rural [LLM: HR=1.14; 95%CI:1.08-1.21 versus human: HR=1.13; 95%CI:1.02-1.26], and predominantly Black neighborhoods [LLM: HR=1.33; 95%CI:1.24-1.43 versus human: HR=1.38; 95%CI:1.21-1.57].
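The cross-cohort comparison described in the Methods (hazard ratios and overlap of 95% CIs) can be illustrated with a minimal sketch that uses only the estimates reported in the Results. The function names and labels below are hypothetical, the comparators for the rural and predominantly Black neighborhood estimates are not stated in the abstract, and this is not the authors' analysis code; it only checks interval overlap and the ratio of the two point estimates.

```python
# Hypothetical sketch: compare LLM-extracted vs human-abstracted estimates
# using only the hazard ratios and 95% CIs reported in the abstract.

# Each entry: label -> ((LLM HR, lower, upper), (human HR, lower, upper)).
ESTIMATES = {
    "Biomarker testing, Black vs White": ((0.89, 0.84, 0.93), (0.91, 0.83, 0.99)),
    "Biomarker testing, most vs least affluent": ((1.23, 1.17, 1.29), (1.28, 1.17, 1.40)),
    "Survival, Black vs White": ((1.27, 1.19, 1.36), (1.34, 1.21, 1.48)),
    "Survival, highest vs lowest income": ((0.75, 0.70, 0.80), (0.77, 0.69, 0.86)),
    "Survival, rural residence": ((1.14, 1.08, 1.21), (1.13, 1.02, 1.26)),
    "Survival, predominantly Black neighborhood": ((1.33, 1.24, 1.43), (1.38, 1.21, 1.57)),
}


def cis_overlap(a, b):
    """Return True if intervals a=(lower, upper) and b=(lower, upper) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare(llm, human):
    """Compare one LLM estimate with the matching human-abstracted estimate."""
    hr_llm, lo_llm, up_llm = llm
    hr_hum, lo_hum, up_hum = human
    return {
        "overlap": cis_overlap((lo_llm, up_llm), (lo_hum, up_hum)),
        # A ratio near 1 means the two point estimates are close.
        "ratio_of_hrs": hr_llm / hr_hum,
    }


if __name__ == "__main__":
    for label, (llm, human) in ESTIMATES.items():
        result = compare(llm, human)
        print(f"{label}: overlap={result['overlap']}, "
              f"LLM/human HR ratio={result['ratio_of_hrs']:.3f}")
```

CI overlap is a conservative criterion: non-overlapping 95% CIs indicate a statistically significant difference, but overlapping intervals do not by themselves establish equivalence, so a ratio of point estimates is shown alongside it.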
CONCLUSIONS: Health equity analyses using LLM-derived data mirrored findings from analyses using abstracted data, indicating model fairness and appropriateness for use in equity-focused cancer research. With appropriate validation, LLMs offer a scalable and algorithmically fair alternative to manual abstraction.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

MSR39

Topic

Methodological & Statistical Research, Real World Data & Information Systems

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference

Disease

Oncology
