VALIDATION OF AN AGENTIC LARGE LANGUAGE MODEL (LLM) SYSTEM IN THE EXTRACTION STAGE OF A REAL-TIME AI-ASSISTED LIVING SYSTEMATIC LITERATURE REVIEW (REAL-SLR): A SOLUTION TO INSTANT AND EASY ACCESS TO CLINICAL TRIAL DATA (CTD)

Author(s)

Rozee Liu, MSc¹, Rhiannon Campden, PhD¹, Eddie Xiaole Liu, BSc², Triston Grayston, BSc³, Oscar Correa, BSc³, Anna Forsythe, MBA, MSc, PharmD¹
¹Oncoscope-AI, Miami, FL, USA, ²Independent, Toronto, ON, Canada, ³Eviviz Inc., Vancouver, BC, Canada

OBJECTIVES: To address challenges faced by health economics and outcomes research (HEOR) professionals in staying current with clinical trial data (CTD) and the time-intensive nature of de novo systematic literature reviews (SLRs), we evaluated the feasibility of using an agentic large language model (LLM) to develop a REAL-SLR for CTD, assessing extraction accuracy and potential time savings.
METHODS: An agentic LLM system was developed to generate CTD extraction annotations autonomously, without human input. The system combined multiple LLMs (OpenAI GPT-5 and GPT-4.1, Gemini 2.5 Pro, and Claude Sonnet 4.5) in a matrix of processes designed to emulate trained human reviewers: following a standardized annotation manual, decomposing tasks into subtasks, and recording reasoning for traceability. The architecture used retrieval-augmented generation (RAG) with semantic embeddings, parallel extraction chains aligned to population, intervention/comparator, outcome, and study design (PICOS), and study-type-adaptive prompting. Annotations were generated for 32 extraction variables, and accuracy was evaluated against human annotations of prostate cancer (PC) and breast cancer (BC) publications.
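As an illustration only, the retrieval step and the parallel PICOS-aligned chains described in METHODS might be sketched as follows. This is not the authors' implementation: the query strings, function names, and sample text are hypothetical, a bag-of-words cosine similarity stands in for semantic embeddings, and a stub stands in for the LLM call.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PICOS-aligned retrieval queries; the abstract does not list the real prompts.
PICOS_QUERIES = {
    "population": "patients enrolled population",
    "intervention_comparator": "treatment arm versus comparator",
    "outcome": "overall survival median hazard ratio",
    "study_design": "randomized phase trial design",
}

# Made-up publication chunks, used only to exercise the sketch.
CHUNKS = [
    "This was a randomized phase 3 trial with a double-blind design.",
    "Patients enrolled had metastatic castration-resistant prostate cancer.",
    "Median overall survival was longer in the treatment group, with a hazard ratio below one.",
    "The treatment arm received drug A versus placebo as comparator.",
]


def embed(text):
    """Toy stand-in for a semantic embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def run_chain(name, query, chunks):
    """One extraction chain: retrieve context for a single PICOS element."""
    context = retrieve(query, chunks)
    # A real chain would pass `context` to an LLM here; this stub returns it unchanged.
    return name, context


def extract_all(chunks):
    """Run one chain per PICOS element in parallel and collect the results."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(run_chain, name, query, chunks)
            for name, query in PICOS_QUERIES.items()
        ]
        return dict(future.result() for future in futures)
```

Calling `extract_all(CHUNKS)` returns a dictionary keyed by PICOS element, each holding the chunk retrieved for that chain. Running the chains independently means a failure or low-quality retrieval in one element does not block the others.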
RESULTS: Our agentic LLM system generated annotations for 32 extraction variables across 3,098 publications (1,200 PC; 1,898 BC). Twelve of the 32 variables achieved accuracy above 90%, and half of those exceeded 95%. For example, overall survival (OS) extraction comprises 3 variables: OS measure, median, and landmark. The system strictly adhered to the specified format: OS measure (“OS”), median (months; hazard ratio; confidence interval; p-value), and landmark (percentage, p-value). Accuracy for these 3 variables was 93.62%, 83.6%, and 90.9%, respectively.
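Per-variable accuracy of the kind reported in RESULTS could be computed along these lines. This is a minimal sketch under stated assumptions: the abstract does not say how a match with the human annotation was determined, so a case- and whitespace-insensitive exact match is assumed here, and the variable names and values are placeholders rather than study data.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(str(value).lower().split())


def per_variable_accuracy(llm_rows, human_rows):
    """Percentage of publications where the LLM value matches the human value, per variable.

    Both arguments map publication id -> {variable name -> annotated value}.
    Human annotations are treated as the reference (an assumption of this sketch).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pub_id, human in human_rows.items():
        llm = llm_rows.get(pub_id, {})
        for variable, expected in human.items():
            totals[variable] += 1
            if normalize(llm.get(variable, "")) == normalize(expected):
                hits[variable] += 1
    return {variable: 100.0 * hits[variable] / totals[variable] for variable in totals}


# Placeholder annotations for two hypothetical publications.
human = {
    "pub1": {"os_measure": "OS", "os_median": "value A"},
    "pub2": {"os_measure": "OS", "os_median": "value B"},
}
llm = {
    "pub1": {"os_measure": "os", "os_median": "value A"},
    "pub2": {"os_measure": "OS", "os_median": "value C"},
}
accuracy = per_variable_accuracy(llm, human)
```

With these placeholders, `os_measure` matches in both publications and `os_median` in one of two. A stricter or looser matching rule (for example, comparing numeric fields within a tolerance) would change the resulting percentages.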
CONCLUSIONS: The agentic LLM system with a RAG architecture demonstrated high accuracy in extracting CTD from publications. These findings suggest the system can enable real-time clinical data generation, supporting faster evidence development for HEOR decision-making and potentially improving patient access.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR221

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

SDC: Oncology

Presentation (CTI)