Automation of Abstract Screening: A Case Study of Multiple Systematic Literature Reviews
Author(s)
Cichewicz A1, Kadambi A2, Lavoie L3, Mittal L4, Pierre V3, Raorane R5
1Evidera, Waltham, MA, USA, 2Evidera, San Francisco, CA, USA, 3Evidera, Montreal, QC, Canada, 4Evidera, Bengaluru, India, 5Evidera, London, LON, UK
OBJECTIVES: Integrating artificial intelligence (AI) into screening of systematic literature reviews (SLRs) may reduce time and effort needed to conduct robust, fully comprehensive reviews. This study aimed to determine the ability of AI to accurately identify relevant literature on the following topics: epidemiology, treatment patterns, health utilities, treatment guidelines, and SLRs and/or meta-analyses (MA).
METHODS: Title and abstract screening decisions from five SLRs previously completed by human reviewers were replicated using the DistillerSR AI reviewer. The AI reviewer for each SLR was trained with a selection of relevant and irrelevant references pertaining to each SLR’s selection criteria. These training sets accounted for 10% of each search yield (53-188 references). The AI was run on the remaining search yield using a prediction score threshold of 0-0.2 for exclusions and 0.8-1 for inclusions. Screening decisions from each SLR were compared between AI and human reviewers, and agreement between reviewers was calculated using kappa statistics.
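The thresholding and agreement steps described above can be sketched as follows. This is an illustrative example only, not the study's actual pipeline: the function names, prediction scores, and screening decisions are hypothetical, and Cohen's kappa is computed directly from its definition for two raters with binary include/exclude decisions.

```python
# Illustrative sketch (hypothetical data): map AI prediction scores to
# screening decisions using the abstract's thresholds (<=0.2 exclude,
# >=0.8 include), then compute Cohen's kappa between AI and human reviewers.

def ai_decision(score, lo=0.2, hi=0.8):
    """Map a prediction score to a screening decision; None = left for humans."""
    if score <= lo:
        return "exclude"
    if score >= hi:
        return "include"
    return None

def cohens_kappa(a, b):
    """Cohen's kappa for two raters' include/exclude decisions."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # expected chance agreement from each rater's marginal frequencies
    pe = sum((a.count(c) / n) * (b.count(c) / n)
             for c in ("include", "exclude"))
    return (po - pe) / (1 - pe)

# Hypothetical example: 10 references screened by both AI and a human reviewer
scores = [0.05, 0.1, 0.95, 0.15, 0.9, 0.02, 0.85, 0.12, 0.88, 0.03]
ai = [ai_decision(s) for s in scores]
human = ["exclude", "exclude", "include", "exclude", "include",
         "exclude", "include", "include", "include", "exclude"]
print(round(cohens_kappa(ai, human), 3))  # → 0.8
```

In practice DistillerSR reports these metrics itself; the sketch only makes explicit how a threshold rule and a kappa statistic combine to quantify AI-human agreement.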
RESULTS: Following training, the AI reviewer screened roughly 75% of references for each SLR, with the exception of the treatment guidelines SLR (90% reviewed by AI). The AI and human reviewers had the best agreement with the treatment guidelines SLR (98.6%), followed by epidemiology (98.3%), treatment patterns (91.8%), and utilities (91.2%). The SLR of SLRs/MAs was the most difficult for the AI reviewer (73.1% agreement); this SLR also had the smallest training set (n=53). Of the incorrect screening decisions made by AI, most (>80% in 3 of the SLRs) were exclusions that humans deemed relevant for the SLR.
CONCLUSIONS: With a robust training set (10% of total yield), the DistillerSR AI reviewer consistently screened most references from various SLRs, although inter-rater reliability was impacted by the type of SLR and its pre-specified selection criteria. Further exploration is underway to assess generalizability to other SLRs and the impact of variations in training sets.
Conference/Value in Health Info
Value in Health, Volume 25, Issue 12S (December 2022)
Code
MSR67
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas