ACCELERATING LITERATURE REVIEWS WITH LARGE LANGUAGE MODELS (LLMS): AN EVALUATION OF PERFORMANCE AND EFFICIENCY

Author(s)

Raju Gautam, PhD1, Saeed Anwar, MSc2, Tushar Srivastava, MSc1, Ratna Pandey, MSc2;
1ConnectHEOR, London, United Kingdom, 2ConnectHEOR, Delhi, India
OBJECTIVES: Systematic and targeted literature reviews (SLRs/LRs) are crucial for health research and evidence-based decision making but are often time- and labor-intensive. Artificial intelligence (AI) tools such as LLMs have shown promise for automating these processes. The aim of this research was to evaluate the performance and efficiency of an AI-SLR tool.
METHODS: A retrospective analysis was conducted to evaluate the performance and efficiency of a web-based AI-SLR tool (EasySLR™) across four LRs (2 targeted, 2 SLRs; 2 clinical, 2 economic). AI performance (accuracy, sensitivity and specificity) was assessed for title/abstract screening, full-text screening and data-extraction. An AI-only approach was used for title/abstract screening in targeted reviews and for data-extraction in all reviews, while a hybrid AI-human reviewer approach was applied for all other review stages. AI-only and AI-human hybrid performance were compared with retrospectively completed human-only reviews.
RESULTS: Sample sizes ranged from 794−1,594 studies (title/abstract screening), 12−92 (full-text screening), and 5−92 (data extraction). Across all four LRs, AI-human accuracy ranged from 84%-100% for title/abstract screening, 60%-92% for full-text screening, and 9%-60% for data-extraction. Sensitivity (correct inclusion by AI) ranged from 70%-97% for title/abstract screening and 90%-100% for full-text screening. Specificity (correct exclusion by AI) ranged from 84%-100% for title/abstract screening and 70%-88% for full-text screening. Performance was considerably poorer for clinical reviews than for economic reviews. Compared with human-only LRs, the AI-only reviewer improved efficiency by 100%-150% for title/abstract screening and 300%-500% for data-extraction, but with low accuracy, whereas the hybrid approach improved efficiency by 40%-60% for title/abstract screening and 12%-20% for full-text screening.
CONCLUSIONS: The evaluated AI-SLR tool appears promising for straightforward reviews, and its AI-only reviewer feature saved considerable time in title/abstract screening and data-extraction. Performance for complex reviews and for data-extraction requires further improvement. Nevertheless, ongoing and future model developments may improve suitability for data-extraction and complex reviews.
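The abstract reports accuracy, sensitivity (correct inclusion by AI), and specificity (correct exclusion by AI) against the human-only reviews. As a minimal sketch of how such figures are conventionally computed, assuming each study has one binary include/exclude decision from the AI and one from the human reference, the standard confusion-matrix definitions can be written as below. The function name `screening_metrics` and the toy decisions are hypothetical illustrations; they are not the tool's API or the study data.

```python
from typing import Dict, Sequence


def screening_metrics(ai: Sequence[bool], human: Sequence[bool]) -> Dict[str, float]:
    """Compare AI include/exclude decisions with a human reference standard.

    True means "include", False means "exclude". The human decision is
    treated as the reference (an assumption based on the abstract's design).
    """
    if len(ai) != len(human):
        raise ValueError("ai and human must contain one decision per study")

    pairs = list(zip(ai, human))
    tp = sum(1 for a, h in pairs if a and h)          # correctly included
    tn = sum(1 for a, h in pairs if not a and not h)  # correctly excluded
    fp = sum(1 for a, h in pairs if a and not h)      # wrongly included
    fn = sum(1 for a, h in pairs if not a and h)      # wrongly excluded

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy example with five studies (made-up decisions, not study data).
human_decisions = [True, True, False, False, False]
ai_decisions = [True, False, False, False, True]
metrics = screening_metrics(ai_decisions, human_decisions)
print(metrics)
```

In this toy example the AI agrees with the human on three of five studies (accuracy 0.6), includes one of the two studies the human included (sensitivity 0.5), and excludes two of the three studies the human excluded (specificity about 0.67).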

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR190

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
