COMPARING AN EXPERT-IN-THE-LOOP AI WORKFLOW WITH MANUAL SYSTEMATIC LITERATURE REVIEW: IMPLICATIONS FOR EFFICIENCY AND EVIDENCE QUALITY IN HEOR

Author(s)

Elizabeth Weathers, PhD, RN, RGN, FAAN¹, Melissa Bathish, PhD, RN, CPNP-PC², Samy Ateia, MSc³, Julie Thompson, PhD⁴, Thomas Hopkins, MD, MBA⁵, Nicole Darling, PhD, EIT⁶.
¹College of Health and Agricultural Sciences, University College Dublin, Dublin, Ireland, ²School of Nursing, University of Michigan, Ann Arbor, MI, USA, ³Computer Science, University of Regensburg, Regensburg, Germany, ⁴Statistics, Duke University, Raleigh, NC, USA, ⁵AccuVein Inc., Boston, MA, USA, ⁶ECNE Research, LLC, Denver, CO, USA.

OBJECTIVES: Systematic literature reviews (SLRs) are essential to health economics and outcomes research (HEOR) but are resource-intensive and time-consuming. This study evaluated whether an expert-in-the-loop AI-enabled SLR workflow achieves comparable evidence coverage and classification accuracy to manual dual-human review while reducing workload and cognitive burden.
METHODS: A comparative methods evaluation was conducted using two parallel SLR workflows applied to the same literature dataset: (A) an AI-assisted workflow with structured human validation and (B) dual independent human review using conventional SLR procedures. Reviewers completed a calibration exercise to align inclusion, exclusion, and extraction criteria. Primary outcomes included sensitivity, specificity, precision, classification accuracy, and missed-evidence rates; these were summarized with 95% confidence intervals and compared between workflows using McNemar’s test. Efficiency outcomes included time spent on screening and data extraction and were analyzed using paired t-tests or non-parametric Wilcoxon tests. Reviewer cognitive load and usability were assessed using the Paas Mental Effort Scale, the Rating Scale of Mental Effort, and the System Usability Scale. Error analyses quantified false positive and false negative rates, and a structured error taxonomy summarized discordant decisions.
RESULTS: The AI-assisted workflow demonstrated classification performance comparable to manual review, with high sensitivity and precision for study inclusion and low missed-evidence rates. A substantial reduction in total review time was found, driven primarily by faster, streamlined screening and data extraction. Cognitive load scores indicated lower perceived effort, and usability scores were acceptable. Error analyses indicated low false positive and false negative rates, with discordant decisions summarized using a structured error taxonomy.
CONCLUSIONS: An expert-in-the-loop AI-enabled workflow can substantially reduce the time and cognitive burden required to conduct SLRs while maintaining evidence coverage and classification accuracy comparable to manual review. For HEOR applications, this approach may support more efficient evidence generation for economic evaluations, value assessments, and real-world evidence synthesis without compromising methodological rigor.
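ILLUSTRATIVE SKETCH: The primary outcomes named in METHODS can be computed from a 2x2 table of screening decisions against a reference standard. The following is a minimal Python sketch under stated assumptions, not the authors' analysis code. The function names are hypothetical, the missed-evidence rate is assumed to equal the false negative rate (1 minus sensitivity), the Wilson score interval is one common choice of 95% confidence interval (the abstract does not say which was used), and the exact binomial form of McNemar's test is assumed. The abstract reports no counts, so none are reproduced here.

```python
from math import comb, sqrt


def screening_metrics(tp, fp, fn, tn):
    """Classification metrics for include/exclude decisions.

    tp, fp, fn, tn are counts against a reference standard.
    Assumption: missed-evidence rate = fn / (tp + fn), i.e. 1 - sensitivity.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "missed_evidence_rate": fn / (tp + fn),
    }


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: records classified correctly by workflow A only.
    c: records classified correctly by workflow B only.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)
```

Only the discordant pairs enter McNemar's test, which is why it suits a design in which both workflows classify the same records: records on which the two workflows agree carry no information about which one performs better.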

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

SA48

Topic

Study Approaches

Topic Subcategory

Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)