A Simulation Study Assessing a Stopping Rule to Reduce Literature Review Title and Abstract Screening Burden

Speaker(s)

Dolin O1, Langford B1, Masselot P2, Zhang H1, Gonçalves-Bradley D3
1Symmetron Limited, London, UK, 2London School of Hygiene and Tropical Medicine, London, UK, 3Symmetron Limited, London, UK

OBJECTIVES: Literature review screening automation research has focused on prioritised screening: continually using screener decisions to re-sort records by predicted relevance. Its use is widespread; however, reviewers often still screen every record, minimising efficiency gains. To reduce screening burden, stopping rules – criteria to stop screening before reviewing all records – exploit prioritised screening’s ability to push relevant records up the queue. One rule uses order statistics to quantify the certainty that a desired recall threshold (e.g. 95%) has been reached during screening, based on an initial random sample of records. This work uses an extensive simulation study to quantify the screening burden reduction that rule permits when used alongside prioritised screening.
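For intuition only, the sketch below (Python, matching the ASReview tooling described under METHODS) implements a stopping check of the same flavour, but substitutes a simpler one-sided hypergeometric confidence bound on the total number of relevant records for the order-statistics formulation evaluated in this study; every function name, parameter, and number is an illustrative assumption, not the study's implementation.

# Illustrative sketch only: bound the total number of relevant records from an
# initial random sample, then stop once enough relevant records have been found
# to meet the recall target even under that worst-case bound.
from scipy.stats import hypergeom


def upper_bound_relevant(total_records, sample_size, relevant_in_sample, confidence=0.95):
    # Largest number of relevant records in the dataset that is still
    # consistent with the random-sample result at the given confidence level.
    alpha = 1.0 - confidence
    upper = relevant_in_sample
    for r_total in range(relevant_in_sample, total_records + 1):
        # P(observing <= relevant_in_sample hits in the sample | r_total relevant overall)
        if hypergeom.cdf(relevant_in_sample, total_records, r_total, sample_size) >= alpha:
            upper = r_total
        else:
            break  # the CDF only decreases as r_total grows
    return upper


def should_stop(relevant_found, total_records, sample_size, relevant_in_sample,
                recall_target=0.95, confidence=0.95):
    # Stop when the recall target holds even for the largest plausible relevant count.
    r_upper = max(upper_bound_relevant(total_records, sample_size,
                                       relevant_in_sample, confidence),
                  relevant_found)  # the true count is at least what has been found
    return relevant_found >= recall_target * r_upper


# Hypothetical example: 3801 records, a 300-record pilot sample with 12 relevant hits,
# and 240 relevant records found so far under prioritised screening.
print(should_stop(relevant_found=240, total_records=3801,
                  sample_size=300, relevant_in_sample=12))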

METHODS: We used gold-standard title and abstract screening decisions (two reviewers, with dispute resolution) from six systematic literature review datasets. We used the ASReview Python package to run 2000 prioritised screening simulations for each dataset. For each simulation we determined where the stopping rule would have triggered at different recall thresholds and confidence levels, and estimated the resulting reduction in screening burden.
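As a rough illustration of how a single simulated run might be scored (an assumed sketch, not the study's actual analysis code; the function and variable names are invented for the example), one can take the order in which a simulation would present records, together with the index at which a stopping rule fired, and compute the achieved recall and burden reduction:

# Assumed sketch of scoring one simulated run; not the study's pipeline.
def score_run(labels_in_screening_order, stop_index):
    # labels_in_screening_order: 1 = relevant, 0 = irrelevant, in the order the
    # simulation would have presented records to the screener.
    # stop_index: number of records screened before the stopping rule fired.
    total = len(labels_in_screening_order)
    total_relevant = sum(labels_in_screening_order)
    found = sum(labels_in_screening_order[:stop_index])
    return {
        "recall": found / total_relevant if total_relevant else 1.0,
        "burden_reduction": 1.0 - stop_index / total,  # share of records never screened
    }


# Toy ordering of 10 records in which screening stopped after 6:
print(score_run([1, 1, 0, 1, 0, 0, 0, 1, 0, 0], stop_index=6))
# achieved recall 0.75, burden reduction 0.40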

RESULTS: At a 95% confidence level and a 95% recall threshold, the median screening burden reduction was 32–38% for the two largest datasets (3801 and 2996 records), 10–15% for the two midsize datasets (1267 and 1497 records), and below 10% for the two smallest datasets (<500 records). In contrast, at a 99% confidence level and a 99% recall threshold, the median screening burden reduction was <10% even for the largest dataset. Increasing the size of the initial random sample, or the confidence and recall thresholds, increased screening burden across datasets.

CONCLUSIONS: Pairing this stopping rule with prioritised screening is a promising approach to reducing screening burden, especially for targeted and pragmatic reviews, where not capturing every relevant record may be acceptable. This is particularly true for large datasets (1000+ records) with many relevant records.

Code

MSR163

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas