OLD DATA, NEW TRICKS: ACCURACY AND EFFICIENCY GAINS FOR AI-DRIVEN SLR UPDATES

Author(s)

Kelly Bell, MS, PharmD¹, Ramsha Khan, PhD², Sara Lucas, PhD³, Caitlyn Solem, PhD⁴
¹GSK, Phoenixville, PA, USA, ²Cytel, Toronto, ON, Canada, ³Cytel, London, United Kingdom, ⁴GSK US, Bethesda, MD, USA

Presentation Documents

ISPOR26_Bell_MSR238_POSTER.pdf

OBJECTIVES: AI-driven models can be trained to screen citations for systematic literature reviews (SLRs), reducing reviewer burden and accelerating timelines. Our primary objective was to compare the accuracy of machine-learning (ML)-based screening with human screening under two training scenarios when screening a 3-month SLR update. A secondary objective was to evaluate the efficiency gains of AI-assisted versus fully human screening.
METHODS: ML training scenarios were tested in two “nests” in Nested Knowledge®. Scenario 1: the model was trained on a fully human-screened Clinical SLR (n=5,850); scenario 2: the model was trained on the minimum number of human-screened citations (n=50). Both models screened a 3-month SLR update (n=244) at title/abstract level. Results were compared with fully human screening to assess concordance, sensitivity, and specificity. To determine efficiencies, models were trained using human-screened SLRs (Clinical [n=5,850]; HRQoL [n=1,790]; Economic [n=955]), and the time taken was compared with a traditional approach (two experienced human reviewers, rate: 85 citations/hour) and a hybrid approach (one human reviewer plus one AI reviewer).
RESULTS: Model performance varied by training scenario. The scenario 1 model included n=34/224 citations and the scenario 2 model included n=169/224 citations. Human screeners included n=72/224 citations at title/abstract level and n=6 studies at full-text level. The scenario 2 model included all relevant studies; the scenario 1 model missed one. Compared with two human reviewers, substantial efficiency gains were noted for clinical (n=2,873 studies reviewed; traditional: 68h; hybrid: 34h; fully AI: 1h), HRQoL (n=4,247 studies; traditional: 100h; hybrid: 50h; fully AI: 0.6h), and economic (n=5,288 studies; traditional: 124h; hybrid: 62h; fully AI: 0.6h) updates.
CONCLUSIONS: Results demonstrate that AI-assisted screening can substantially reduce workload and shorten turnaround times for SLR execution and updates. Models can feasibly replace one reviewer while maintaining high sensitivity, but a human is still required for nuance and for final decisions on included studies. The balance between sensitivity, specificity, and human oversight will depend on the acceptable risks and intended purpose of the review.
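
The reviewer-hour figures and the accuracy comparison in the abstract can be approximated with simple arithmetic. The Python sketch below is illustrative only and is not the authors' method. It assumes that the 85 citations/hour rate applies per reviewer, that each human reviewer screens every citation, that the six studies included at full-text level serve as the reference standard, and that the update contained 224 citations (the denominator used in the Results; the Methods state n=244). All function names are hypothetical.

```python
# Illustrative sketch only: names and assumptions are not taken from the abstract.

def screening_hours(n_citations, rate_per_hour=85, human_reviewers=2):
    """Human reviewer-hours, assuming every reviewer screens every citation."""
    return human_reviewers * n_citations / rate_per_hour


def confusion_counts(n_total, n_relevant, n_model_included, n_relevant_missed):
    """Derive (TP, FP, FN, TN) for a model from the counts given in the Results.

    Assumes the relevant studies (full-text includes) are the reference standard.
    """
    tp = n_relevant - n_relevant_missed   # relevant studies the model included
    fn = n_relevant_missed                # relevant studies the model excluded
    fp = n_model_included - tp            # non-relevant citations included
    tn = n_total - n_relevant - fp        # non-relevant citations excluded
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Update sizes reported in the Results.
updates = {"Clinical": 2873, "HRQoL": 4247, "Economic": 5288}

# (traditional, hybrid) hours: two human reviewers vs. one human plus one AI.
hours = {
    name: (
        round(screening_hours(n, human_reviewers=2)),
        round(screening_hours(n, human_reviewers=1)),
    )
    for name, n in updates.items()
}

# Scenario 1: 34/224 included, one relevant study missed.
# Scenario 2: 169/224 included, no relevant study missed.
scenario_1 = confusion_counts(224, 6, 34, 1)
scenario_2 = confusion_counts(224, 6, 169, 0)

for label, (tp, fp, fn, tn) in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
    print(label, f"sensitivity={sensitivity(tp, fn):.2f}", f"specificity={specificity(tn, fp):.2f}")
```

Under these assumptions the rounded hours match the reported traditional and hybrid figures (68h/34h, 100h/50h, 124h/62h). The derived accuracy values illustrate the trade-off described in the Conclusions: the minimally trained scenario 2 model misses no relevant study but excludes far fewer citations than the scenario 1 model. These values are measured against full-text inclusion and are not the concordance statistics the authors computed against human title/abstract decisions.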

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR238

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)