Automation of Abstract Screening: A Case Study of Multiple Systematic Literature Reviews

Author(s)

Cichewicz A1, Kadambi A2, Lavoie L3, Mittal L4, Pierre V3, Raorane R5
1Evidera, Waltham, MA, USA, 2Evidera, San Francisco, CA, USA, 3Evidera, Montreal, QC, Canada, 4Evidera, Bengaluru, India, 5Evidera, London, LON, UK


OBJECTIVES: Integrating artificial intelligence (AI) into screening of systematic literature reviews (SLRs) may reduce time and effort needed to conduct robust, fully comprehensive reviews. This study aimed to determine the ability of AI to accurately identify relevant literature on the following topics: epidemiology, treatment patterns, health utilities, treatment guidelines, and SLRs and/or meta-analyses (MA).

METHODS: Title and abstract screening decisions from five SLRs previously completed by human reviewers were replicated using the DistillerSR AI reviewer. The AI reviewer for each SLR was trained with a selection of relevant and irrelevant references pertaining to each SLR’s selection criteria. These training sets accounted for 10% of each search yield (53-188 references). The AI was run on the remaining search yield using a prediction score threshold of 0-0.2 for exclusions and 0.8-1 for inclusions. Screening decisions from each SLR were compared between AI and human reviewers, and agreement between reviewers was calculated using kappa statistics.
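The threshold rule and agreement measure described above can be sketched as follows. This is an illustrative reconstruction, not the DistillerSR implementation: the function names, score values, and reviewer decisions below are hypothetical, and only the 0-0.2/0.8-1 thresholds and the use of Cohen's kappa come from the abstract.

```python
def ai_decision(score, lo=0.2, hi=0.8):
    """Map an AI prediction score to a screening decision.
    Per the thresholds in the abstract, scores of 0-0.2 are
    exclusions and 0.8-1 are inclusions; references scoring in
    between are left for human review."""
    if score <= lo:
        return "exclude"
    if score >= hi:
        return "include"
    return "human_review"

def cohens_kappa(a, b):
    """Cohen's kappa for two raters' decisions over the same references."""
    labels = sorted(set(a) | set(b))
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # expected chance agreement from each rater's marginal frequencies
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: AI decisions vs. human screening decisions
ai_calls = ["include", "exclude", "exclude", "include", "exclude"]
human_calls = ["include", "exclude", "include", "include", "exclude"]
print(round(cohens_kappa(ai_calls, human_calls), 2))  # prints 0.62
```

Kappa is used rather than raw percent agreement because it discounts the agreement two raters would reach by chance given how often each selects each label.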

RESULTS: Following training, the AI reviewer screened roughly 75% of references for each SLR, with the exception of the treatment guidelines SLR (90% reviewed by AI). The AI and human reviewers had the best agreement with the treatment guidelines SLR (98.6%), followed by epidemiology (98.3%), treatment patterns (91.8%), and utilities (91.2%). The SLR of SLRs/MAs was the most difficult for the AI reviewer (73.1% agreement); this SLR also had the smallest training set (n=53). Of the incorrect screening decisions made by AI, most (>80% in 3 of the SLRs) were exclusions that humans deemed relevant for the SLR.

CONCLUSIONS: With a robust training set (10% of total yield), the DistillerSR AI reviewer consistently screened most references from various SLRs, although inter-rater reliability was impacted by the type of SLR and its pre-specified selection criteria. Further exploration is underway to assess generalizability to other SLRs and the impact of variations in training sets.

Conference/Value in Health Info

2022-11, ISPOR Europe 2022, Vienna, Austria

Value in Health, Volume 25, Issue 12S (December 2022)

Code

MSR67

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
