PERFORMANCE OF AI METHODS FOR TITLE/ABSTRACT SCREENING: COMPARING CRITERIA-BASED AND ADVANCEMENT PROBABILITY-BASED SCREENING APPROACHES
Author(s)
Darsh Devani, MS1, Ella Jones, BS2, Grace E. Fox, PhD1;
1OPEN Health, New York, NY, USA, 2OPEN Health, London, United Kingdom
OBJECTIVES: With increasing use of artificial intelligence (AI) in literature reviews, understanding the performance of different screening approaches is important. This study compares criteria-based and advancement probability-based AI methods for title/abstract screening against a human-reviewed reference standard.
METHODS: This analysis evaluated 2 AI-supported title/abstract screening approaches using a literature review dataset. One human reviewer identified includable references, which served as the reference standard. Criteria-based screening applied predefined inclusion/exclusion criteria, while advancement probability-based screening prioritized references using model-estimated relevance. Both approaches were applied independently to the same dataset. Sensitivity (capture of human-included references), specificity (correct exclusion), accuracy (overall agreement), and precision (confirmation of AI-included references) were assessed by comparing AI included/excluded decisions with the human reference standard. Screening was implemented using the Nested Knowledge platform.
RESULTS: Criteria-based screening flagged 471 references for inclusion and identified 97 of the 98 references included by human review (sensitivity = 0.99), missing 1 human-included reference. Of the references it excluded, 99 matched human exclusions; 374 of the 471 flagged references were not included by the human reviewer. Specificity was 0.21, accuracy 0.34, and precision 0.21. Advancement probability-based screening advanced 327 references and identified 70 of the 98 human-included references (sensitivity = 0.71), missing 28 human-included references. Of the references it excluded, 216 matched human exclusions; 257 of the 327 advanced references were not included by the human reviewer. Specificity was 0.46, accuracy 0.50, and precision 0.21.
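The reported metrics follow from the standard confusion-matrix definitions given in METHODS. As an illustrative check (a minimal sketch; the counts are taken from RESULTS, and the `screening_metrics` helper is our own naming, not part of the Nested Knowledge platform):

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute the four reported metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # capture of human-included references
        "specificity": tn / (tn + fp),   # correct exclusion of human-excluded references
        "accuracy": (tp + tn) / total,   # overall agreement with human decisions
        "precision": tp / (tp + fp),     # human confirmation of AI-included references
    }

# Criteria-based: 97 true positives, 374 false positives,
# 99 true negatives, 1 false negative (571 references total)
criteria = screening_metrics(tp=97, fp=374, tn=99, fn=1)

# Advancement probability-based: 70 TP, 257 FP, 216 TN, 28 FN
advancement = screening_metrics(tp=70, fp=257, tn=216, fn=28)

print({k: round(v, 2) for k, v in criteria.items()})
# {'sensitivity': 0.99, 'specificity': 0.21, 'accuracy': 0.34, 'precision': 0.21}
print({k: round(v, 2) for k, v in advancement.items()})
# {'sensitivity': 0.71, 'specificity': 0.46, 'accuracy': 0.5, 'precision': 0.21}
```

Rounding each value to two decimals reproduces the figures reported above for both approaches.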
CONCLUSIONS: The 2 AI screening approaches produced different screening patterns when applied to the same dataset. Criteria-based screening captured more human-included references, while advancement probability-based screening advanced a smaller subset of references for review. These findings indicate that the choice of AI screening approach affects screening workload and the balance between completeness and efficiency, and suggest that future AI-based screening should account for performance differences across methods, as design choices can lead to distinct screening outcomes.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR233
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference
Disease
No Additional Disease & Conditions/Specialized Treatment Areas