Machines As a Second Reviewer in Systematic Literature Reviews
Author(s)
Queiros L (1), Witzmann A (2), Sumner M (3), Wehler P (3), Baehrens D (3), Abogunrin S (4)
(1) F. Hoffmann La Roche, Basel, Switzerland; (2) F. Hoffmann La Roche, Kaiseraugst, AG, Switzerland; (3) Averbis GmbH, Freiburg, Germany; (4) F. Hoffmann-La Roche Ltd., Basel, BS, Switzerland
Background: Systematic literature reviews (SLRs) are burdensome, especially when records must be double screened, e.g. for health technology assessment documentation. Various artificial intelligence methods, including support vector machines (SVMs), have been studied for the automation of title and abstract screening (TIABS). We explored the role of SVM-based classifiers as a second reviewer during TIABS.
Methods: Ten retrospective SLRs addressing different health-related problems were each screened independently using two approaches. A binary classifier using one SVM and an ensemble classifier using multiple SVMs were separately used to assign accept or reject statuses to title and abstract (TIAB) records. The results of the two classifiers were then compared to the human results using confusion matrices, precision, and recall. Work saved over sampling at 95% recall (WSS@95) was computed to determine the human effort averted when using either classifier.
Results: The SLRs' sample sizes varied between 319 and 16,962 records and covered haematology, infectious diseases and oncology. For the binary classifier, recall ranged from 0.53 to 1.00, precision from 0.07 to 0.65, and WSS from 0.57 to 0.81. In comparison, the ensemble classifier showed a recall between 0.47 and 0.95, precision between 0.17 and 0.83, and WSS between 0.71 and 0.90. The proportion of conflicts ranged from 12.3% to 33.1% for the binary classifier and from 4.5% to 20.6% for the ensemble classifier.
Conclusions: The percentage of conflicts between humans and the two machines was relatively low, implying that a machine could be employed as a second reviewer when conducting SLRs. The findings show that using machines to augment TIABS may shorten the time to complete SLRs and thus enable swifter decision-making. Further work should assess which automatic methods can be used for optimal single screening of TIAB records.
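The evaluation metrics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the metrics were computed, so the function names, the toy data, the definition of a conflict as any human/machine disagreement, and the use of the commonly cited WSS formula (fraction of records left unscreened at the target recall, minus one minus that recall) are all assumptions made here for illustration.

```python
import math

def confusion_counts(human, machine):
    """Count TP, FP, FN, TN, treating the human decision as the reference.
    Labels are 1 (accept) or 0 (reject)."""
    pairs = list(zip(human, machine))
    tp = sum(1 for h, m in pairs if h and m)
    fp = sum(1 for h, m in pairs if not h and m)
    fn = sum(1 for h, m in pairs if h and not m)
    tn = sum(1 for h, m in pairs if not h and not m)
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    """Precision and recall of the machine against the human decisions."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def conflict_proportion(tp, fp, fn, tn):
    """Assumed definition: share of records where human and machine disagree."""
    return (fp + fn) / (tp + fp + fn + tn)

def wss_at_recall(human, scores, target=0.95):
    """Work saved over sampling at a target recall.
    Records are ranked by classifier score (highest first) and screened in
    that order until the target share of human-accepted records is found."""
    n = len(human)
    n_pos = sum(human)
    if n_pos == 0:
        raise ValueError("WSS is undefined without any accepted records")
    needed = math.ceil(target * n_pos)
    ranked = sorted(zip(scores, human), key=lambda p: p[0], reverse=True)
    found = 0
    screened = 0
    for screened, (_, label) in enumerate(ranked, start=1):
        found += label
        if found >= needed:
            break
    recall = found / n_pos
    return (n - screened) / n - (1 - recall)

# Made-up toy data (not from the study): ten records, three accepted by the human.
toy_human = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
toy_machine = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.4, 0.15, 0.25]

tp, fp, fn, tn = confusion_counts(toy_human, toy_machine)
precision, recall = precision_recall(tp, fp, fn)
conflicts = conflict_proportion(tp, fp, fn, tn)
wss95 = wss_at_recall(toy_human, toy_scores)
```

A WSS@95 near 0 means ranking by classifier score saves little over screening records in random order, while values closer to 0.95 mean most records could be left unscreened at that recall level.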
Conference/Value in Health Info
2021-11, ISPOR Europe 2021, Copenhagen, Denmark
Value in Health, Volume 24, Issue 12, S2 (December 2021)
Code
POSB317
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Specific Disease