Machines As a Second Reviewer in Systematic Literature Reviews

Author(s)

Queiros L¹, Witzmann A², Sumner M³, Wehler P³, Baehrens D³, Abogunrin S⁴
¹F. Hoffmann-La Roche, Basel, Switzerland, ²F. Hoffmann-La Roche, Kaiseraugst, AG, Switzerland, ³Averbis GmbH, Freiburg, Germany, ⁴F. Hoffmann-La Roche Ltd., Basel, BS, Switzerland

Background: Systematic literature reviews (SLRs) are burdensome, especially when records must be double screened, e.g. for health technology assessment documentation. Various artificial intelligence methods, including support vector machines (SVMs), have been studied for automating title and abstract screening (TIABS). We explored the role of SVM-based classifiers as a second reviewer during TIABS.

Methods: Ten retrospective SLRs addressing different health-related problems were each assessed independently by two approaches: a binary classifier using one SVM and an ensemble classifier using multiple SVMs. Each classifier was used separately to assign an accept or reject status to every title and abstract (TIAB) record. The results of the two classifiers were then compared with the human results using confusion matrices, precision, and recall. Work saved over sampling at 95% recall (WSS@95) was computed to estimate the human effort averted when using either classifier.
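
The comparison metrics named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 1/0 coding of accept/reject, and the use of a per-record classifier score to rank records are all assumptions made for illustration. WSS@95 is computed here using the commonly cited definition, (TN + FN) / N - 0.05, evaluated at the score cutoff that first reaches 95% recall.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # human accept, machine accept
    fp: int  # human reject, machine accept
    tn: int  # human reject, machine reject
    fn: int  # human accept, machine reject


def confusion(human, machine):
    """Build a confusion matrix; labels are 1 (accept) or 0 (reject)."""
    pairs = list(zip(human, machine))
    return Confusion(
        tp=sum(h == 1 and m == 1 for h, m in pairs),
        fp=sum(h == 0 and m == 1 for h, m in pairs),
        tn=sum(h == 0 and m == 0 for h, m in pairs),
        fn=sum(h == 1 and m == 0 for h, m in pairs),
    )


def precision(c):
    """Share of machine-accepted records that humans also accepted."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    """Share of human-accepted records that the machine also accepted."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def wss_at_recall(human, scores, target=0.95):
    """WSS@target = (records left unscreened) / N - (1 - target).

    Records are ranked by classifier score (hypothetical input, highest
    first) and screened in that order until the target share of
    human-accepted records has been found. Assumes a non-empty record set.
    """
    n = len(human)
    # Number of relevant records that must be found to reach the target.
    needed = math.ceil(target * sum(human) - 1e-9)
    ranked = sorted(zip(scores, human), key=lambda p: p[0], reverse=True)
    found = 0
    screened = 0
    for _, label in ranked:
        if found >= needed:
            break
        screened += 1
        found += label
    return (n - screened) / n - (1 - target)
```

For example, with 10 records of which 2 are relevant, reaching both relevant records after screening the 3 highest-scored records leaves 7 of 10 unscreened, which gives a WSS@95 of 0.70 - 0.05 = 0.65.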

Results: Sample sizes for the research questions ranged from 319 to 16,962 records, and the SLRs covered haematology, infectious diseases and oncology. For the binary classifier, recall ranged from 0.53 to 1.00, precision from 0.07 to 0.65, and WSS from 0.57 to 0.81. In comparison, the ensemble classifier showed recall from 0.47 to 0.95, precision from 0.17 to 0.83, and WSS from 0.71 to 0.90. The proportion of conflicts ranged from 12.3% to 33.1% for the binary classifier and from 4.5% to 20.6% for the ensemble classifier.
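
The abstract does not state how the proportion of conflicts was calculated. One plausible reading, assumed here purely for illustration, is the share of records on which the machine's accept/reject decision differs from the human decision. The function name and the 1/0 label coding below are hypothetical.

```python
def conflict_proportion(human, machine):
    """Share of records where human and machine decisions differ.

    Assumption: a "conflict" is any record with mismatched accept (1) /
    reject (0) decisions; the abstract does not define the term.
    """
    if len(human) != len(machine):
        raise ValueError("human and machine must have the same length")
    if not human:
        raise ValueError("at least one record is required")
    conflicts = sum(h != m for h, m in zip(human, machine))
    return conflicts / len(human)
```

Under this reading, 2 mismatched decisions out of 8 records would be reported as a conflict proportion of 25%.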

Conclusions: The percentage of conflicts between humans and the two machines was relatively low, implying that a machine could be employed as a second reviewer when conducting SLRs. The findings suggest that using machines to augment TIABS may shorten the time to complete SLRs and thus enable swifter decision-making. Further work should assess which automated methods can be used for optimal single screening of TIAB records.

Conference/Value in Health Info

2021-11, ISPOR Europe 2021, Copenhagen, Denmark

Value in Health, Volume 24, Issue 12, S2 (December 2021)

Code

POSB317

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Specific Disease
