Using a Large Language Model (LLM) for Data Extraction of Studies: Learnings From a Targeted Literature Review (TLR) in Non-Small Cell Lung Cancer (NSCLC)

Author(s)

Mariana Farraia, PhD¹, Anuja Pandey, MD², Eugenia Priedane, MA³, Allie Cichewicz, MSc⁴, Caroline von Wilamowitz-Moellendorff, PhD².
¹Thermo Fisher Scientific, Ede, Netherlands, ²Thermo Fisher Scientific, London, United Kingdom, ³HEOR EU and New Markets, BeOne Medicines (UK), Ltd., London, United Kingdom, ⁴Thermo Fisher Scientific, Waltham, MA, USA.

OBJECTIVES: LLMs can accurately extract data from published literature, reducing the human effort required for literature reviews. However, the underlying challenges faced by evidence synthesis experts in addressing complex research questions, such as those involving mixed populations and subgroups, are not fully understood. This study aimed to evaluate GPT-4-assisted extraction of clinical outcomes in an NSCLC subpopulation and highlight the learnings/challenges from its application.
METHODS: A TLR assessed the effectiveness/safety of treatments for NSCLC with programmed death-ligand-1 (PD-L1) expression ≥50%. Data from sixteen publications covering ten observational studies were extracted using a proprietary LLM. Zero-shot prompts were developed, tested, and optimised using one publication, then applied to all publications. The LLM outputs were copied into a pre-defined data extraction table to capture study/patient characteristics and effectiveness/safety outcomes, including subgroup data (e.g., PD-L1, sex, age). Extractions were validated by an experienced investigator, and the main challenges were noted.
RESULTS: Two main challenges were identified: difficulties in isolating data for subpopulations (PD-L1 ≥50%) in mixed population studies, and incorrect or missing data extracted by the LLM for subgroups. Detailed validation of results, additional extraction and re-validation of subgroup data, and correction of formatting issues resulted in time expenditure equal to or greater than that of validating manual extractions. The lack of standardisation in reporting observational studies also contributed to errors in LLM-assisted extraction. In addition, the LLM did not recognise related publications reporting on the same studies.
CONCLUSIONS: Using LLMs for data extraction for nuanced populations may not yield significant time savings due to increased validation efforts. Experienced human supervision and validation remain crucial for accuracy and completeness. Separate prompts for subpopulations and for related publications are recommended, and reviews must account for the time spent developing and optimising these prompts to capture relevant subpopulations across publications. Future work should explore LLM capabilities to better handle complex data.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

SA101

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)