How Much Time Does Artificial Intelligence Really Save in Evidence Synthesis? A Systematic Literature Review
Author(s)
Ania Bobrowska, BSc, MSc, PhD1, Molly Murton2, Liz Ashworth, BA2, Kallista Chan, PhD2.
1Principal Consultant, Costello Medical, Cambridge, United Kingdom, 2Costello Medical, Cambridge, United Kingdom.
OBJECTIVES: Traditional literature reviews (LRs) can be time- and resource-intensive. We aimed to understand the extent to which artificial intelligence (AI) can save time and reduce workload in the conduct of LRs.
METHODS: MEDLINE and Embase were searched in June 2025. Records were reviewed at the title/abstract stage by two experienced reviewers and at full text by a single reviewer. We included primary research studies that reported the time or workload saved by using AI for a specific aspect of an LR compared with humans. LRs identified by the searches were hand-searched and then excluded. Data were extracted and synthesised qualitatively owing to heterogeneity in outcome reporting. Where possible, hours saved per study were calculated; where ranges were reported, their midpoints were used. Authors' conclusions were subjectively judged as "positive", "cautiously positive" or "neutral/negative" towards AI-generated efficiencies in LRs.
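As an illustration of the synthesis approach described above, the minimal Python sketch below shows how range-reported values could be collapsed to midpoints before computing a median. The input values and the to_point helper are hypothetical, for illustration only; they are not the authors' data or code.

from statistics import median

# Hypothetical saved-hours values: scalars, or (low, high) ranges whose
# midpoints are used, as described in METHODS.
reported = [0.010, (0.005, 0.030), 0.020, (0.010, 0.025)]

def to_point(value):
    """Collapse a (low, high) range to its midpoint; pass scalars through."""
    if isinstance(value, tuple):
        low, high = value
        return (low + high) / 2
    return value

points = [to_point(v) for v in reported]
print(f"Median hours saved per study: {median(points):.3f}")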
RESULTS: Searches produced 2,091 unique hits; 2,011 records were removed after title/abstract review. Ultimately, 56 studies were included. Studies used proprietary tools (n=29), widely available general AI tools such as ChatGPT (n=16) or trained, bespoke algorithms (n=11). Most time savings were reported for study selection at the title/abstract stage (n=45 studies), with fewer studies reporting time saved on quality assessment (n=6), data extraction (n=2), or deduplication, feasibility assessment and search strategy generation (n=1 each). The median time saved per study was 0.017 hours (n=31 data points); the median workload saved was 65% (n=25 data points) and the median time saved was 60% (n=8 data points). Authors were generally positive (n=27) or cautiously positive (n=17), rather than neutral/negative (n=12), about the potential of AI to help conduct LRs.
CONCLUSIONS: Most benefits of AI are currently seen at the screening stage of an LR rather than at the data extraction or quality assessment stages. Comparisons are hampered by the lack of a unified outcome measure for the performance of AI in LRs, in terms of both precision and efficiencies gained.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
SA50
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas