PERFORMANCE OF AI METHODS FOR TITLE/ABSTRACT SCREENING: COMPARING CRITERIA-BASED AND ADVANCEMENT PROBABILITY-BASED SCREENING APPROACHES
Author(s)
Darsh Devani, MS1, Ella Jones, BS2, Grace E. Fox, PhD1;
1OPEN Health, New York, NY, USA, 2OPEN Health, London, United Kingdom
OBJECTIVES: With increasing use of artificial intelligence (AI) in literature reviews, understanding the performance of different screening approaches is important. This study compares criteria-based and advancement probability-based AI methods for title/abstract screening against a human-reviewed reference standard.
METHODS: This analysis evaluated 2 AI-supported title/abstract screening approaches using a literature review dataset. One human reviewer identified includable references, which served as the reference standard. Criteria-based screening applied predefined inclusion/exclusion criteria, while advancement probability-based screening prioritized references using model-estimated relevance. Both approaches were applied independently to the same dataset. Sensitivity (capture of human-included references), specificity (correct exclusion), accuracy (overall agreement), and precision (confirmation of AI-included references) were assessed by comparing AI included/excluded decisions with the human reference standard. Screening was implemented using the Nested Knowledge platform.
RESULTS: Criteria-based screening flagged 471 references for inclusion and identified 97 of the 98 references included by human review (sensitivity = 0.99), missing 1 human-included reference. Of the references it excluded, 99 matched human exclusions; 374 of the 471 flagged references were not included by the human reviewer. Specificity was 0.21, accuracy 0.34, and precision 0.21. Advancement probability-based screening advanced 327 references and identified 70 of the 98 human-included references (sensitivity = 0.71), missing 28 human-included references. Of the references it excluded, 216 matched human exclusions; 257 of the 327 advanced references were not included by the human reviewer. Specificity was 0.46, accuracy 0.50, and precision 0.21.
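The reported metrics follow from the standard confusion-matrix definitions given in METHODS. As an illustrative check (a minimal sketch; the counts are taken from RESULTS, and the `screening_metrics` helper is our own naming, not part of the Nested Knowledge platform):

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute the four reported metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # capture of human-included references
        "specificity": tn / (tn + fp),   # correct exclusion of human-excluded references
        "accuracy": (tp + tn) / total,   # overall agreement with human decisions
        "precision": tp / (tp + fp),     # human confirmation of AI-included references
    }

# Criteria-based: 97 true positives, 374 false positives,
# 99 true negatives, 1 false negative (571 references total)
criteria = screening_metrics(tp=97, fp=374, tn=99, fn=1)

# Advancement probability-based: 70 TP, 257 FP, 216 TN, 28 FN
advancement = screening_metrics(tp=70, fp=257, tn=216, fn=28)

print({k: round(v, 2) for k, v in criteria.items()})
# {'sensitivity': 0.99, 'specificity': 0.21, 'accuracy': 0.34, 'precision': 0.21}
print({k: round(v, 2) for k, v in advancement.items()})
# {'sensitivity': 0.71, 'specificity': 0.46, 'accuracy': 0.5, 'precision': 0.21}
```

Rounding each value to two decimals reproduces the figures reported above for both approaches.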
CONCLUSIONS: The 2 AI screening approaches produced different screening patterns when applied to the same dataset. Criteria-based screening captured more human-included references, while advancement probability-based screening advanced a smaller subset of references for review. These findings indicate that the choice of AI screening approach affects screening workload and the balance between completeness and efficiency, and suggest that future AI-based screening should account for performance differences across methods, as design choices can lead to distinct screening outcomes.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR233
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference
Disease
No Additional Disease & Conditions/Specialized Treatment Areas