A Mixed-Methods Review of the Systematic Reviewer's Inter-Reviewer Reliability Using the Kappa Statistic
Speaker(s)
Hanegraaf P1, Mosselman JJ1, Abogunrin S2, Queiros L3, Van der Pol S4, Boersma C4, Postma M4
1Pitts, Zeist, UT, Netherlands, 2F. Hoffmann-La Roche, Basel, BS, Switzerland, 3F. Hoffmann-La Roche Ltd., Basel, Switzerland, 4Health-Ecore, Zeist, UT, Netherlands
OBJECTIVES: Artificial intelligence (AI) is a potential solution for reducing the workload of systematic literature reviews (SLRs). However, the level of inter-reviewer reliability (IRR) typically achieved in SLRs is unclear. We aimed to establish an average baseline IRR for SLRs, providing a benchmark for evaluating AI applications against human performance. We also aimed to establish the minimum IRR expected of a human-AI reviewer team.
METHODS: Using a mixed-methods approach that combined a structured scoping PubMed search with a survey, we identified records of SLRs reporting IRR between reviewers using the kappa statistic. Abstracts of these records were screened for reporting of the kappa statistic of inter-reviewer agreement at any literature review step. Full texts of eligible abstracts were then critically appraised to confirm the reporting of IRR. Records using AI in any literature review step were ineligible. Authors of the included SLRs were surveyed about their performance standards for human as well as AI reviewers across different types of SLRs.
RESULTS: Most records identified by the scoping review did not report a kappa statistic and/or IRR. Where these data were reported, reporting was inconsistent across the different literature review steps. Variation in inter-reviewer kappa estimates was likely driven by differences in researcher experience. Further detailed results will be presented on kappa statistic and/or IRR reporting in the included studies, along with recommendations for the expected performance of AI applications based on the survey findings.
CONCLUSIONS: Currently, no commonly accepted inter-reviewer kappa standard is reported in the scientific literature. As best scientific practice, future SLRs should report IRR kappa scores to demonstrate how well the systematic reviewers understood the topic being researched.
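Since the abstract centers on the kappa statistic as the measure of inter-reviewer agreement, a minimal sketch of how Cohen's kappa is computed for two screeners may be useful; the function name, labels, and decision data below are illustrative assumptions, not data from the study.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label frequencies.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Two reviewers' hypothetical include/exclude decisions on ten abstracts.
a = ["inc", "inc", "exc", "exc", "inc", "exc", "exc", "inc", "exc", "exc"]
b = ["inc", "exc", "exc", "exc", "inc", "exc", "exc", "inc", "exc", "inc"]
print(round(cohens_kappa(a, b), 2))  # → 0.58 (moderate agreement)
```

Kappa corrects raw percentage agreement (here 80%) for the agreement expected by chance given each reviewer's include/exclude rates, which is why it is preferred over simple agreement for screening-step IRR.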
Code
SA12
Topic
Methodological & Statistical Research, Organizational Practices, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Best Research Practices, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas