A MULTI-MODEL LARGE LANGUAGE FRAMEWORK FOR AUTOMATING SYSTEMATIC LITERATURE REVIEWS

Author(s)

Ritesh Dubey, PharmD, Ankita Sood, PharmD, Vedant Soni, B.Tech, Gagandeep Kaur, M.Pharm, Rajdeep Kaur, PhD, Barinder Singh, RPh;
Pharmacoevidence, Mohali, India

OBJECTIVES: The first artificial intelligence (AI)-assisted health technology assessment (HTA) submission, accepted by the NICE Evidence Assessment Group, demonstrated that using AI as a second reviewer for title and abstract screening can reduce systematic literature review (SLR) time and cost by approximately 50%. The objective of this study is to assess whether additional efficiencies can be achieved through fully automated SLR screening using multiple large language models (LLMs) and confidence-guided outputs.
METHODS: A Python-based interface was developed to automate title/abstract screening using multiple LLMs (Claude Sonnet 3.7, Gemini Flash 2.5, and GPT-4o mini), guided by predefined inclusion and exclusion criteria. Screening decisions were finalized based on the model confidence matrix. Records with low confidence or conflicting model decisions were escalated for manual review. A subject matter expert (SME) optimized and fine-tuned the final prompt and conducted quality control on the records excluded by AI.
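The confidence-guided escalation described in the methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Vote` class, the `resolve` function, the unanimity rule, and the 0.8 confidence threshold are all assumptions made for illustration, since the abstract does not specify how the confidence matrix is built or applied.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vote:
    """One model's screening decision for a single record (hypothetical structure)."""
    model: str
    decision: str      # "include" or "exclude"
    confidence: float  # 0.0 to 1.0, assumed to be reported alongside the decision


def resolve(votes: List[Vote], min_confidence: float = 0.8) -> str:
    """Return 'include', 'exclude', or 'manual_review' for one record.

    Assumed rule: a decision is finalized only when every model agrees
    and every model's confidence meets the threshold. Anything else
    (conflict, low confidence, or no votes) goes to a human reviewer.
    """
    decisions = {v.decision for v in votes}
    if len(decisions) != 1:
        return "manual_review"  # conflicting decisions, or no votes at all
    if any(v.confidence < min_confidence for v in votes):
        return "manual_review"  # agreement, but at least one model is unsure
    return decisions.pop()


# Illustrative votes only; these are not study data.
agreed = [Vote("model_a", "exclude", 0.95),
          Vote("model_b", "exclude", 0.90),
          Vote("model_c", "exclude", 0.88)]
conflict = [Vote("model_a", "include", 0.95),
            Vote("model_b", "exclude", 0.90),
            Vote("model_c", "include", 0.85)]
unsure = [Vote("model_a", "include", 0.95),
          Vote("model_b", "include", 0.60),
          Vote("model_c", "include", 0.90)]

print(resolve(agreed), resolve(conflict), resolve(unsure))
```

Requiring unanimity plus a confidence floor is a deliberately conservative choice: it sends more records to manual review but lowers the risk of an automated exclusion that a human would have overturned.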
RESULTS: Compared to the semi‑automated benchmark (AI as a second reviewer), which provides approximately 50% time savings, the multi-LLM screening approach substantially enhanced efficiency, reducing screening time by approximately 90%. Only 7% of the screened citations (1840 in total) were flagged for human review. Moreover, SME assessment of all the AI-excluded citations confirmed that the LLMs did not exclude any relevant citation. The models showed a modest over‑inclusion rate of approximately 1-2%. Overall, the findings suggest that LLM‑based screening can substantially accelerate SLR workflows while maintaining decision quality comparable to human reviewers.
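The quality-control quantities reported in the results (share of records flagged for human review, relevant records wrongly excluded, and over-inclusion) can be computed along the following lines. This is a hedged sketch: the `qc_summary` function, the record layout, and the choice to express over-inclusion as a fraction of all screened records are assumptions, because the abstract does not define its denominators. The toy records are illustrative and are not study data.

```python
from typing import Dict, List


def qc_summary(records: List[Dict]) -> Dict[str, float]:
    """Summarize screening quality from AI decisions and SME judgments.

    Each record is assumed to carry:
      'ai'       -> "include", "exclude", or "manual_review"
      'relevant' -> True if the SME judged the citation relevant
    """
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarize")

    flagged = sum(1 for r in records if r["ai"] == "manual_review")
    # Relevant citations that the AI excluded (the abstract reports none).
    missed = sum(1 for r in records if r["ai"] == "exclude" and r["relevant"])
    # Citations the AI included that the SME judged not relevant.
    over_included = sum(
        1 for r in records if r["ai"] == "include" and not r["relevant"]
    )

    return {
        "flagged_fraction": flagged / total,
        "missed_relevant": missed,
        "over_inclusion_rate": over_included / total,
    }


# Toy data for illustration only.
toy_records = [
    {"ai": "include", "relevant": True},
    {"ai": "include", "relevant": False},
    {"ai": "exclude", "relevant": False},
    {"ai": "exclude", "relevant": False},
    {"ai": "manual_review", "relevant": True},
]

summary = qc_summary(toy_records)
print(summary)
```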
CONCLUSIONS: This study shows that a fully automated, multi-LLM approach can produce high-quality SLRs up to ten times faster than conventional methods. The approach reduces screening time by around 90% and lowers costs, while maintaining human oversight. Moreover, it is scalable and practical, and may support faster evidence synthesis and decision-making in healthcare systems.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR78

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
