VALIDATION OF AN AGENTIC LARGE LANGUAGE MODEL (LLM) SYSTEM IN THE REVIEW STAGE OF A REAL-TIME AI-ASSISTED LIVING SYSTEMATIC LITERATURE REVIEW (REAL-SLR): A SOLUTION TO INSTANT AND EASY ACCESS TO CLINICAL TRIAL DATA (CTD)

Author(s)

Rozee Liu, MSc¹, Rhiannon Campden, PhD¹, Eddie Xiaole Liu, BSc², Oscar Correa, BSc³, Anna Forsythe, MBA, MSc, PharmD¹.
¹Oncoscope-AI, Miami, FL, USA, ²Independent, Toronto, ON, Canada, ³Eviviz Inc., Vancouver, BC, Canada.
OBJECTIVES: Health economics and outcomes research (HEOR) professionals often struggle to stay updated on the latest published CTD. Traditionally, building de novo SLRs requires extensive time and manual effort. To address these challenges, we explored assembling a REAL-SLR of CTD using an agentic LLM system to generate review annotations and evaluated the system’s accuracy and associated time savings.
METHODS: Agentic LLM systems are autonomous systems in which multiple LLMs control how they accomplish tasks, with no human input or supervised training. Our system used two OpenAI LLMs (GPT-5, GPT-4.1), Gemini 2.5 Pro, and Claude Sonnet 4.5 in a matrix of processes that emulates trained human experts by following an annotation manual, subdividing complex processes into smaller subtasks, and documenting its reasoning for traceable results. Annotations were created independently for 4 review variables (population, intervention/comparator, outcome, study design; PICOS). Review accuracy was evaluated against human results on publications in five cancers: non-small cell lung cancer (NSCLC), prostate cancer (PC), breast cancer (BC), bladder cancer (BldC), and multiple myeloma (MM).
RESULTS: Our agentic LLM system generated annotations for 4 review variables for 61,069 publications (17,085 NSCLC, 15,114 PC, 21,904 BC, 9,719 BldC, 6,966 MM). Accuracy ranged from 93.73% to 99.82%. Sensitivity ranged from 93.86% to 99.58%, and specificity from 86.08% to 98.15%. The false negative rates were 0.34%, 0.33%, 0.88%, and 0.00% for the 4 PICOS variables, with a 0.30% cumulative rate. Our system completed the review in 33.93 hours, compared to an estimated 763.36 hours for trained human researchers, a 95.56% time savings.
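The time-savings figure above can be reproduced from the two reported durations. The sketch below also lists the standard confusion-matrix definitions of accuracy, sensitivity, specificity, and false negative rate. The abstract does not state its formulas or denominators, so these definitions and the function names `time_savings_pct` and `confusion_metrics` are illustrative assumptions, not the authors' method.

```python
def time_savings_pct(system_hours: float, human_hours: float) -> float:
    """Percentage of human review time saved by the system."""
    return (1 - system_hours / human_hours) * 100


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard screening metrics from confusion-matrix counts.

    Assumed textbook definitions; the abstract does not specify
    which denominators it used.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_negative_rate": fn / (tp + fn),  # 1 - sensitivity
    }


# Durations reported in RESULTS: 33.93 system hours vs 763.36 human hours.
print(round(time_savings_pct(33.93, 763.36), 2))  # 95.56
```

Only the two durations come from the abstract; any counts passed to `confusion_metrics` would have to come from the study's own data, which is not reported here.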
CONCLUSIONS: Our agentic LLM system can accurately review publications with performance superior to human experts. This level of accuracy highlights our system’s potential to deliver real-time clinical data, empowering HEOR professionals with expedited evidence generation, with the hope of ultimately improving patient access.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR18

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

SDC: Oncology
