From Manual to Machine: A Deep Dive into AI-Powered Systematic Literature Reviews

Author(s)

Gagandeep Kaur, M.Pharm¹, Barinder Singh, RPh², Rajdeep Kaur, PhD¹, Ankita Sood, PharmD¹
¹Pharmacoevidence, SAS Nagar Mohali, India; ²Pharmacoevidence, London, United Kingdom

OBJECTIVES: Artificial intelligence (AI) and machine learning (ML) offer promising solutions for streamlining the systematic literature review (SLR) process, particularly screening, data extraction, and evidence synthesis. This study aimed to assess the performance of AI/ML tools in conducting SLRs compared with manual review processes.
METHODS: Key biomedical databases (EMBASE® and MEDLINE®) were searched for studies published in the last five years that evaluated the integration of AI/ML techniques into the SLR process across various healthcare domains. The review followed standard SLR methodology, in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
RESULTS: Of the 92 included studies, 37 compared AI/ML tools with manual processes at various stages of the SLR. The most frequently evaluated tools were GPT models (n=8), followed by DistillerAI (n=6). Most studies (n=31) applied AI tools to title and abstract screening. Sensitivity ranged from 14% (DistillerAI) to 98.1% (FSL model), and specificity from 11.6% (GPT-3.5) to 99.6% (GPT-4). Accuracy ranged from 38.0% (GPT-3.5) to 100% (EXACT). Two studies reported kappa coefficients: one showed almost perfect agreement between GPT and human reviewers in abstract screening (Cohen's kappa >0.9), and the other reported moderate inter-reviewer reliability (kappa ≥0.75). Sixteen studies also reported a significant reduction in the time required compared with human reviewers. In addition to generative AI models, other NLP and ML models were used for screening, including BERT, Alberta, srBERT, support vector machines (SVM), naïve Bayes, and BIBOT.
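The metrics above are reported without definitions. As a minimal sketch, assuming each record receives a binary include/exclude decision and the human reviewers' decisions serve as the reference standard, sensitivity, specificity, accuracy, and Cohen's kappa can all be derived from the same 2×2 confusion matrix. The function name `screening_metrics` and the example counts are hypothetical illustrations, not data from the included studies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreeningMetrics:
    sensitivity: float
    specificity: float
    accuracy: float
    kappa: float


def screening_metrics(ai: Sequence[bool], human: Sequence[bool]) -> ScreeningMetrics:
    """Compare AI screening decisions with human (reference) decisions.

    Hypothetical helper for illustration. True = include, False = exclude.
    Assumes the human decisions contain at least one include and one exclude.
    """
    if len(ai) != len(human) or not ai:
        raise ValueError("need two non-empty sequences of equal length")

    # Build the 2x2 confusion matrix with humans as the reference standard.
    tp = sum(a and h for a, h in zip(ai, human))
    tn = sum((not a) and (not h) for a, h in zip(ai, human))
    fp = sum(a and not h for a, h in zip(ai, human))
    fn = sum((not a) and h for a, h in zip(ai, human))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference decisions must contain both includes and excludes")
    n = tp + tn + fp + fn

    # Share of human-included records that the AI also included.
    sensitivity = tp / (tp + fn)
    # Share of human-excluded records that the AI also excluded.
    specificity = tn / (tn + fp)
    # Overall proportion of records on which AI and humans agree.
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for agreement expected by chance,
    # where chance agreement uses each rater's marginal include/exclude rates.
    p_o = accuracy
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (p_o - p_e) / (1 - p_e)

    return ScreeningMetrics(sensitivity, specificity, accuracy, kappa)


if __name__ == "__main__":
    # Hypothetical screen of 100 abstracts: humans include 20 and exclude 80;
    # the AI agrees on 18 includes and 72 excludes.
    human = [True] * 20 + [False] * 80
    ai = [True] * 18 + [False] * 2 + [True] * 8 + [False] * 72
    print(screening_metrics(ai, human))
```

In this made-up example, sensitivity, specificity, and accuracy are all 0.90, yet kappa is only about 0.72, because the heavy imbalance toward excluded records inflates the agreement expected by chance. This is one reason accuracy alone can overstate tool performance in screening tasks.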
CONCLUSIONS: Although AI/ML tools delivered notable efficiency gains, such as reduced manual screening effort, limitations persist, including dependence on domain expertise and difficulty in assessing study quality. The findings underscore AI’s potential to enhance SLR processes but highlight the need for further refinement in accuracy and interpretability.

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

MSR110

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
