A Mixed-Methods Review of the Systematic Reviewer's Inter-Reviewer Reliability Using the Kappa Statistic
Speaker(s)
Hanegraaf P1, Mosselman JJ1, Abogunrin S2, Queiros L3, Van der Pol S4, Boersma C4, Postma M4
1Pitts, Zeist, UT, Netherlands, 2F. Hoffmann-La Roche, Basel, BS, Switzerland, 3F. Hoffmann-La Roche Ltd., Basel, Switzerland, 4Health-Ecore, Zeist, UT, Netherlands
OBJECTIVES: Artificial intelligence (AI) is a potential solution for reducing the workload of systematic literature reviews (SLRs). However, the level of inter-reviewer reliability (IRR) typically achieved in SLRs is unclear. We aimed to establish an average baseline IRR for SLRs, providing a benchmark for evaluating AI applications against human performance. We also aimed to establish the minimum IRR expected of a human-AI reviewer team.
METHODS: Using a mixed-methods approach that combined a structured scoping PubMed search with a survey, we identified records of SLRs reporting IRR between reviewers using the kappa statistic. Abstracts of these records were screened for reporting of the kappa statistic of inter-reviewer agreement at any literature review step. Full texts of eligible abstracts were then critically appraised to confirm the reporting of IRR. Records using AI in any literature review step were ineligible. Authors of the included SLRs were surveyed about their performance standards for human as well as AI reviewers across different types of SLRs.
RESULTS: Most records identified by the scoping review did not report a kappa statistic and/or IRR. Where these data were reported, reporting was inconsistent across the different literature review steps. Variation in inter-reviewer kappa estimates was likely driven by differences in researcher experience. Further detailed results will be presented on kappa statistic and/or IRR reporting in the included studies, along with recommendations for the expected performance of AI applications based on the survey findings.
CONCLUSIONS: Currently, no commonly accepted inter-reviewer kappa standard is reported in the scientific literature. As best scientific practice, future SLRs should report IRR kappa scores to demonstrate how well the systematic reviewers understood the topic being researched.
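Since the abstract centers on the kappa statistic as the measure of inter-reviewer agreement, a minimal sketch of how Cohen's kappa is computed for two screeners may be useful; the function name, labels, and decision data below are illustrative assumptions, not data from the study.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label frequencies.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Two reviewers' hypothetical include/exclude decisions on ten abstracts.
a = ["inc", "inc", "exc", "exc", "inc", "exc", "exc", "inc", "exc", "exc"]
b = ["inc", "exc", "exc", "exc", "inc", "exc", "exc", "inc", "exc", "inc"]
print(round(cohens_kappa(a, b), 2))  # → 0.58 (moderate agreement)
```

Kappa corrects raw percentage agreement (here 80%) for the agreement expected by chance given each reviewer's include/exclude rates, which is why it is preferred over simple agreement for screening-step IRR.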
Code
SA12
Topic
Methodological & Statistical Research, Organizational Practices, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Best Research Practices, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas