Traditional vs. Generative AI: A Rapid Systematic Review Assessing Accuracy and Efficiency of AI in Title/Abstract Screening
Author(s)
Emily Hardy, MBiol, Amelia Peddle, MSc, Judith Peatman, MSc, Janine Ross, MSc, Shona Lang, PhD.
Petauri Evidence, Bicester, United Kingdom.
OBJECTIVES: The aim of this systematic literature review (SLR) was to assess the accuracy and efficiency of artificial intelligence (AI) tools for SLR title/abstract screening.
METHODS: Electronic database searches were conducted in Embase® from inception to March 2025 and supplemented with desktop research. Search results were uploaded to Laser AI for screening and data extraction; two independent reviewers performed screening, with data extraction performed by one reviewer and checked by a second. AI platforms were categorised as commercially available tools (i.e. machine learning and natural language processing models) or generative large language models (LLMs). Prototype or in-house algorithms/models were excluded. Quantitative data on comparative accuracy and efficiency were extracted, alongside a qualitative summary of factors influencing these outcomes.
RESULTS: The SLR identified 51 studies investigating commercially available tools, the most common being Distiller (n=14), Abstrackr (n=9), and ASReview (n=9), plus 45 studies investigating LLMs, primarily OpenAI’s GPT models (n=41). Accuracy (tools, n=43; LLMs, n=45) was reported more frequently than efficiency (tools, n=37; LLMs, n=15); however, AI use cases and outcome definitions were highly heterogeneous across publications. Time and cost savings were clear, but the impact on accuracy was less consistent. There were notable differences in practical methodology for integrating commercially available tools versus LLMs into SLR workflows (i.e. a variable human role), and in factors influencing outcomes of interest. However, factors consistently modulating accuracy and efficiency potential included review methodology, research question complexity, and review size.
CONCLUSIONS: Integration of AI technology into SLR workflows offers clear efficiency savings versus conventional methodology; however, conclusions regarding comparative accuracy are less clear. Researchers considering the use of AI should identify similar use cases and manage expectations based on potential modulating factors. Clear guidelines on AI methodology across health economics and outcomes research are essential to validate and quantify both accuracy and efficiency.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR203
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas