Generative AI-Powered Extraction of Immune-Related Adverse Events From Oncology Case Reports
Author(s)
Fernando Andres Martin, MS1, Manuel Cossio, MPhil, MS2.
1Cytel, Boston, MA, USA, 2Director, Artificial Intelligence Lead, Cytel, Dubendorf, Switzerland.
OBJECTIVES: This study assesses the feasibility of using an open AI large language model (LLM) to automatically construct a database of immune-related adverse events (irAEs) from unstructured case reports.
METHODS: Open-access pembrolizumab case reports were retrieved from Google Scholar using the terms “oncology” and “pembrolizumab.” A prompt was designed and iteratively optimized across six cycles on an initial report to extract eight fields: article (author, year), patient age, sex, ethnicity, baseline weight, initial findings, oncological treatment, and reported irAEs. Extraction accuracy was assessed against manual, human-in-the-loop annotations. Concordance was scored as follows: 2 points for full concordance, 1 point for partial concordance, and 0 points for hallucinated or incorrect content.
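The 2/1/0 concordance scheme can be illustrated with a minimal sketch. The abstract does not state how points were converted into the 0 to 1 accuracy values it reports, so the normalization used here (points earned divided by the maximum attainable points) is an assumption, as are the function names and the toy scores, which are hypothetical and not study data.

```python
from collections import defaultdict

# Concordance points as described in METHODS.
FULL, PARTIAL, INCORRECT = 2, 1, 0
MAX_POINTS = FULL  # assumption: accuracy = earned points / maximum points


def field_accuracy(report_scores):
    """Return normalized accuracy per field.

    report_scores: list of dicts, one per case report,
    mapping field name -> concordance points (0, 1, or 2).
    """
    earned = defaultdict(int)
    scored = defaultdict(int)
    for report in report_scores:
        for field, points in report.items():
            if points not in (FULL, PARTIAL, INCORRECT):
                raise ValueError(f"invalid score {points!r} for field {field!r}")
            earned[field] += points
            scored[field] += 1
    return {f: earned[f] / (MAX_POINTS * scored[f]) for f in earned}


def mean_accuracy(per_field):
    """Unweighted mean of the per-field accuracies."""
    return sum(per_field.values()) / len(per_field)


# Hypothetical scores for two reports and two fields (illustration only).
toy_scores = [
    {"age": FULL, "reported_irAEs": PARTIAL},
    {"age": FULL, "reported_irAEs": INCORRECT},
]
per_field = field_accuracy(toy_scores)
overall = mean_accuracy(per_field)
```

Under this assumed normalization, a field scored with full concordance in every report reaches 1.0, and partial concordance contributes half credit.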
RESULTS: After uploading the 38 reports as a knowledge source, the LLM generated the complete extraction table in 18 s. Mean accuracy across all fields was 0.78. Field-specific performance varied: oncological treatment 0.97; sex, ethnicity, and weight 0.86; author + year 0.84; age 0.78; and reported irAEs 0.73. The lowest accuracy corresponded to the field with the highest contextual complexity.
CONCLUSIONS: The proposed generative-AI workflow produced a structured irAE database from complex narrative reports in under 20 s with acceptable overall accuracy (mean 0.78). Further prompt refinement, particularly for context-rich variables such as reported irAEs, is warranted to achieve fully reliable automated curation.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MT21
Topic
Clinical Outcomes, Medical Technologies, Methodological & Statistical Research
Topic Subcategory
Digital Health
Disease
Oncology