ADAPTING AGENTIC LARGE LANGUAGE MODELS (ALLM) TRAINED IN SOLID TUMORS FOR SYSTEMATIC LITERATURE REVIEW (SLR) IN HEMATOLOGICAL MALIGNANCIES: VALIDATION IN MULTIPLE MYELOMA (MM) AND CHRONIC LYMPHOCYTIC LEUKEMIA (CLL)
Author(s)
Rhiannon Campden, PhD¹, Rozee Liu, MSc¹, Eddie Xiaole Liu, BSc², Anna Forsythe, MBA, MSc, PharmD¹
¹Oncoscope-AI, Miami, FL, USA; ²Independent, Toronto, ON, Canada
OBJECTIVES: We have previously published data showing that agentic large language model (aLLM) systems perform strongly in automating systematic literature reviews (SLRs) in solid tumors. Extending these systems to hematological malignancies presents distinct challenges, including differences in therapeutic classes, endpoints, disease definitions, and study designs. This study evaluates the adaptation and validation of a Real-time AI-assisted Living SLR (REAL-SLR) system, originally developed for solid tumors, for identifying and extracting clinical trial evidence in MM and CLL.
METHODS: Our aLLM system comprises multiple autonomous LLMs, including GPT-5, GPT-4.1, Gemini 2.5 Pro, and Claude Sonnet 4.5, that operate without direct supervision and are designed to emulate trained human reviewers. Models were adapted using hematology-specific treatment guidelines and annotation manuals aligned with PRISMA and Cochrane standards and structured around the Population, Intervention/Comparator, Outcomes, and Study Design (PICOS) framework. Inclusion and exclusion decisions were recorded independently for each PICOS element and benchmarked against expert human screening. The annotation manual was refined iteratively until the >95% accuracy thresholds were met.
RESULTS: In MM, the aLLM reviewed 800 abstracts, achieving an initial accuracy of 93.7% with a false negative rate of 3.5%. Following targeted refinement of instructions addressing hematology-specific interventions, outcomes, and study designs, final accuracy increased to 97.4% (Population 98.3%, Intervention/Comparator 97.3%, Outcomes 97.5%, Study Design 96.9%), exceeding single human reviewer performance. The overall false negative rate was <0.7%, below the predefined 1% threshold. In CLL, evaluation of 298 abstracts yielded a final overall accuracy of 96.6% and a false negative rate of 0.34% after initial adaptation.
CONCLUSIONS: Agentic LLM systems originally trained in solid tumors can be successfully adapted to hematological malignancies through disease-specific instruction and governance. This approach enables accurate, scalable, and real-time SLRs in MM and CLL, supporting living evidence generation for HEOR, HTA, and clinical decision making.
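The benchmarking described in METHODS, with per-element PICOS decisions compared against expert screening, can be illustrated with a minimal sketch. This is not the authors' implementation. The record structure, the function names, the rule that an abstract is included only when every PICOS element is marked "include", and the toy data are all assumptions made for illustration. The abstract does not state which denominator its false negative rate uses, so the sketch returns the raw counts and shows two common choices.

```python
from dataclasses import dataclass

# PICOS elements named in the abstract; the key spellings are hypothetical.
PICOS_ELEMENTS = ("population", "intervention_comparator", "outcomes", "study_design")


@dataclass(frozen=True)
class Screened:
    """One abstract. True means 'include' for that PICOS element."""
    model: dict   # hypothetical aLLM decisions per element
    expert: dict  # hypothetical expert reference decisions per element


def decisions(**overrides):
    """Build a decision dict that includes on every element unless overridden."""
    base = {element: True for element in PICOS_ELEMENTS}
    base.update(overrides)
    return base


def element_accuracy(records, element):
    """Share of abstracts where model and expert agree on one PICOS element."""
    agree = sum(r.model[element] == r.expert[element] for r in records)
    return agree / len(records)


def overall_include(decision):
    """Assumed rule: include an abstract only if every element says include."""
    return all(decision[element] for element in PICOS_ELEMENTS)


def false_negative_counts(records):
    """Return (false negatives, expert-included abstracts)."""
    fn = sum(overall_include(r.expert) and not overall_include(r.model) for r in records)
    expert_includes = sum(overall_include(r.expert) for r in records)
    return fn, expert_includes


# Toy data, invented for illustration only.
records = [
    Screened(model=decisions(), expert=decisions()),                     # agreement, include
    Screened(model=decisions(outcomes=False), expert=decisions()),       # false negative
    Screened(model=decisions(population=False),
             expert=decisions(population=False)),                        # agreement, exclude
    Screened(model=decisions(), expert=decisions(study_design=False)),   # false positive
]

if __name__ == "__main__":
    for element in PICOS_ELEMENTS:
        print(f"{element}: accuracy {element_accuracy(records, element):.2%}")
    fn, expert_includes = false_negative_counts(records)
    print(f"false negatives / expert includes: {fn / expert_includes:.2%}")
    print(f"false negatives / all abstracts:   {fn / len(records):.2%}")
```

Keeping the decisions separate per element makes it possible to see which part of the annotation manual needs refinement, for example low agreement on Study Design, before looking at the overall include/exclude outcome.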
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR170
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Oncology