VALIDATION OF AN AGENTIC LARGE LANGUAGE MODEL (LLM) SYSTEM IN THE EXTRACTION STAGE OF A REAL-TIME AI-ASSISTED LIVING SYSTEMATIC LITERATURE REVIEW (REAL-SLR): A SOLUTION TO INSTANT AND EASY ACCESS TO CLINICAL TRIAL DATA (CTD)

Author(s)

Rozee Liu, MSc1, Rhiannon Campden, PhD1, Eddie Xiaole Liu, BSc2, Triston Grayston, BSc3, Oscar Correa, BSc3, Anna Forsythe, MBA, MSc, PharmD1;
1Oncoscope-AI, Miami, FL, USA, 2Independent, Toronto, ON, Canada, 3Eviviz Inc., Vancouver, BC, Canada
OBJECTIVES: Health economics and outcomes research (HEOR) professionals struggle to stay current with clinical trial data (CTD), and de novo systematic literature reviews (SLRs) are time-intensive. To address these challenges, we evaluated the feasibility of using an agentic large language model (LLM) system to build a REAL-SLR for CTD, assessing extraction accuracy and potential time savings.
METHODS: An agentic LLM system was developed to generate clinical trial data extraction annotations autonomously, without human input. The system combined multiple LLMs (OpenAI GPT-5 and GPT-4.1, Gemini 2.5 Pro, and Claude Sonnet 4.5) in a matrix of processes designed to emulate trained human reviewers: following a standardized annotation manual, decomposing tasks into subtasks, and recording reasoning for traceability. The implementation used a retrieval-augmented generation (RAG) architecture with semantic embeddings, parallel extraction chains aligned with population, intervention/comparator, outcomes, and study design (PICOS), and study-type-adaptive prompting. Annotations were generated for 32 extraction variables, and accuracy was evaluated against human annotations of prostate cancer (PC) and breast cancer (BC) publications.
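The retrieval and parallel-chain steps in the methods can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' implementation. It makes several assumptions. A bag-of-words count vector stands in for the semantic embedding model. The PICOS queries and the study-type prompt templates are invented. `llm` is any callable that maps a prompt to a response, so no real model API is used. All function names (`embed`, `retrieve`, `build_prompt`, `run_chains`) are illustrative.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumption: a bag-of-words count vector stands in for a semantic
# embedding model; a real system would call an embedding API here.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k publication chunks most similar to the query (the 'R' in RAG)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Hypothetical retrieval query for each PICOS extraction chain.
PICOS_QUERIES = {
    "population": "patients enrolled eligibility population",
    "intervention": "treatment arm intervention dose",
    "comparator": "control arm comparator placebo",
    "outcomes": "overall survival median hazard ratio",
    "study_design": "randomized phase trial design",
}

# Hypothetical study-type-adaptive prompt templates.
TEMPLATES = {
    "rct": "Following the annotation manual, extract {domain} for this randomized trial:\n{context}",
    "single_arm": "Following the annotation manual, extract {domain} for this single-arm study:\n{context}",
}

def build_prompt(domain: str, chunks: list[str], study_type: str) -> str:
    context = "\n".join(retrieve(PICOS_QUERIES[domain], chunks))
    return TEMPLATES[study_type].format(domain=domain, context=context)

def run_chains(chunks: list[str], study_type: str, llm: Callable[[str], str]) -> dict[str, str]:
    """Run one retrieval-plus-extraction chain per PICOS domain in parallel."""
    def chain(domain: str) -> str:
        return llm(build_prompt(domain, chunks, study_type))

    domains = list(PICOS_QUERIES)
    with ThreadPoolExecutor(max_workers=len(domains)) as pool:
        return dict(zip(domains, pool.map(chain, domains)))
```

The control flow stays the same if you replace `embed` with a real embedding model and `llm` with a model client. A thread pool fits here because each chain spends most of its time waiting on model calls.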
RESULTS: The agentic LLM system generated annotations for 32 extraction variables across 3,098 publications (1,200 PC; 1,898 BC). Twelve of the 32 variables exceeded 90% accuracy, and half of these (6) exceeded 95%. For example, overall survival (OS) extraction comprises 3 variables: OS measure, OS median, and OS landmark. The system strictly adhered to the specified output format: OS measure (“OS”), median (months; hazard ratio; confidence interval; p-value), and landmark (percentage, p-value). Accuracies for these 3 variables were 93.62%, 83.6%, and 90.9%, respectively.
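The format check and accuracy tallies in the results can be illustrated with a short sketch. It rests on several assumptions. The abstract does not define how accuracy was computed, so this sketch treats it as a case- and whitespace-insensitive exact match against the human annotation. The concrete syntax for the confidence interval ("low-high") and the p-value inside the OS median field is also assumed. The names `parse_os_median`, `accuracy`, and `threshold_summary` are hypothetical.

```python
import re
from typing import Optional

# Assumed rendering of the OS median field, "months; HR; CI; p-value",
# e.g. "34.7; 0.72; 0.61-0.85; <0.001". The CI and p-value syntax are
# illustrative assumptions, not the published annotation manual.
NUM = r"\d+(?:\.\d+)?"
OS_MEDIAN = re.compile(
    rf"^\s*(?P<months>{NUM})\s*;\s*(?P<hr>{NUM})\s*;"
    rf"\s*(?P<ci_low>{NUM})\s*-\s*(?P<ci_high>{NUM})\s*;"
    rf"\s*(?P<p>[<>]?\s*{NUM})\s*$"
)

def parse_os_median(value: str) -> Optional[dict]:
    """Split a formatted OS median value into its fields, or return None if malformed."""
    match = OS_MEDIAN.match(value)
    return match.groupdict() if match else None

def normalize(value: str) -> str:
    # Assumption: matching ignores letter case and repeated whitespace.
    return " ".join(value.lower().split())

def accuracy(predicted: list[str], reference: list[str]) -> float:
    """Percentage of publications where the LLM value matches the human value."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need one prediction per reference annotation")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predicted, reference))
    return 100 * hits / len(reference)

def threshold_summary(acc_by_variable: dict[str, float]) -> tuple[int, int]:
    """Count variables above 90% accuracy, and how many of those exceed 95%."""
    above_90 = [a for a in acc_by_variable.values() if a > 90]
    return len(above_90), sum(a > 95 for a in above_90)
```

Applied to the three OS accuracies reported above, `threshold_summary({"OS measure": 93.62, "OS median": 83.6, "OS landmark": 90.9})` returns `(2, 0)`. Two of the three OS variables clear 90%, and none clears 95%.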
CONCLUSIONS: The agentic LLM system with a RAG architecture demonstrated high accuracy in extracting data from publications. These findings suggest the system can enable real-time generation of clinical trial data, supporting faster evidence development for HEOR decision-making and potentially improving patient access.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR221

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

SDC: Oncology
