Evaluation of the Use of Topic Modelling (TM) to Support Systematic Literature Review (SLR) Updates for Screening Truncation of Non-Relevant Citations
Author(s)
Bravo À1, Kodjamanova P2, Atanasov P3
1Amaris Consulting, Barcelona, Spain, 2Amaris Consulting, London, LON, UK, 3Amaris Consulting, Barcelona, B, Spain
OBJECTIVES: In this study, we explore whether topic modelling (TM) can make the identification of relevant publications during the screening process of an SLR more efficient, aiming to safely exclude a proportion of non-relevant publications and so save working time.
METHODS: In an SLR, a large set of citations is appraised based on titles and abstracts (TAs) and full texts (FTs), and citations are selected according to pre-specified criteria. Several Natural Language Processing (NLP) and Machine Learning (ML) techniques have been proposed to assist the screening process. TM is an ML technique used to extract hidden topics from large volumes of documents, such as those typically screened in SLRs. Here, we applied latent Dirichlet allocation (LDA), an unsupervised TM method, to analyse 9,594 citations from an SLR and its three subsequent updates. We observed the proportion of relevant and non-relevant citations (in TA and FT) for each topic, and we applied the TM model to multiple updates. Based on the observed proportions, we evaluated the performance of removing the topics with the fewest relevant citations.
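The evaluation step described above (ranking topics by their observed proportion of relevant citations, removing the lowest-ranked ones, and measuring what is lost) might be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each citation has already been assigned a single dominant topic by a fitted LDA model, and it ranks and evaluates on the same data for brevity. The function name `evaluate_truncation` and the toy data are hypothetical.

```python
from collections import Counter

def evaluate_truncation(topics, relevant, n_remove):
    """Drop the n_remove topics with the lowest share of relevant citations.

    topics:   dominant topic id per citation (assumed to come from an LDA model)
    relevant: True/False screening decision per citation (TA or FT stage)
    Returns the removed topics, the share of citations removed, and the
    share of relevant citations lost by removing them.
    """
    total = Counter(topics)  # citations per topic
    hits = Counter(t for t, r in zip(topics, relevant) if r)  # relevant per topic
    # Rank topics by proportion of relevant citations, ascending.
    # Ties are broken by topic id so the result is deterministic.
    ranked = sorted(total, key=lambda t: (hits[t] / total[t], t))
    removed = set(ranked[:n_remove])
    n_removed = sum(total[t] for t in removed)
    n_lost = sum(hits[t] for t in removed)
    n_relevant = sum(hits.values())
    return {
        "removed_topics": sorted(removed),
        "share_removed": n_removed / len(topics),
        "share_relevant_lost": n_lost / n_relevant if n_relevant else 0.0,
    }

# Hypothetical toy data: 10 citations spread over 3 topics.
topics = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
relevant = [False, False, False, False, True, False, False, True, True, False]

for k in (1, 2):
    print(k, evaluate_truncation(topics, relevant, k))
```

In this toy data, removing the single least relevant topic discards 40% of the citations without losing any relevant one, while removing two topics discards 70% of the citations and loses one of the three relevant ones. In practice the ranking would be derived from an earlier review and applied to a later update, so that the loss is measured on citations the ranking has not seen.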
RESULTS: In our best-performing case, removing the 12 least relevant topics out of 25 (59% of the dataset) would result in no loss of relevant citations in FT (and a loss of 21% of relevant citations in TA). On average across the three updates, removing the 2 least relevant topics (22% of the citations) loses 5% of relevant citations in FT (and 12% in TA). Combining the datasets from the original SLR and the first two updates, removing the 9 least relevant topics out of 25 (40% of the dataset) loses 4% of relevant citations in FT (and 2% in TA).
CONCLUSIONS: We show that TM and ML, applied to SLRs, can effectively assist the review process by accelerating the identification of relevant citations.
Conference/Value in Health Info
Value in Health, Volume 25, Issue 12S (December 2022)
Code
MSR76
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas