Evaluation of the Use of Topic Modelling (TM) to Support Systematic Literature Review (SLR) Updates for Screening Truncation of Non-Relevant Citations

Author(s)

Bravo À¹, Kodjamanova P², Atanasov P³
¹Amaris Consulting, Barcelona, Spain, ²Amaris Consulting, London, LON, UK, ³Amaris Consulting, Barcelona, B, Spain

Presentation Documents

ISPOR Europe 2022 poster - 121768.pdf

OBJECTIVES: In this study, we explore whether topic modelling (TM) can make the identification of relevant publications in the screening process of an SLR more efficient, by safely excluding a proportion of non-relevant publications and thereby saving working time.

METHODS: In an SLR, a large set of citations is appraised based on titles and abstracts (TAs) and full texts (FTs), and selected according to pre-specified criteria. Several natural language processing (NLP) and machine learning (ML) techniques have been proposed to assist the screening process. TM is an ML technique used to extract hidden topics from large volumes of documents, such as the citation sets typically screened in SLRs. Here, we applied latent Dirichlet allocation (LDA), an unsupervised TM method, to analyse 9,594 citations from an SLR and its three subsequent updates. We observed the proportion of relevant and non-relevant citations (in TA and FT) for each topic, and we applied the TM model to multiple updates. Based on the observed proportions, we evaluated the performance of removing the topics with the fewest relevant citations.
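The evaluation step described in the methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each citation has already been assigned a single dominant topic by a fitted LDA model, and the function name `evaluate_truncation`, the `(topic_id, is_relevant)` tuple layout, and the toy data are all hypothetical.

```python
from collections import defaultdict


def evaluate_truncation(citations, n_remove):
    """Rank topics by their share of relevant citations and drop the lowest.

    citations: non-empty list of (topic_id, is_relevant) pairs, one per
               citation (hypothetical layout, assumed for this sketch).
    Returns (removed_topics, share_of_citations_removed, share_of_relevant_lost).
    """
    totals = defaultdict(int)
    relevant = defaultdict(int)
    for topic, is_relevant in citations:
        totals[topic] += 1
        relevant[topic] += int(is_relevant)

    # Lowest proportion of relevant citations first; ties broken by topic id.
    ranked = sorted(totals, key=lambda t: (relevant[t] / totals[t], t))
    removed = ranked[:n_remove]

    n_removed = sum(totals[t] for t in removed)
    n_lost = sum(relevant[t] for t in removed)
    n_relevant = sum(relevant.values())

    share_removed = n_removed / len(citations)
    share_lost = n_lost / n_relevant if n_relevant else 0.0
    return removed, share_removed, share_lost


# Toy data (made up for illustration): 10 citations across 3 topics.
toy = [(0, False)] * 4 + [(1, False)] * 3 + [(1, True)] + [(2, True)] * 2
removed, share_removed, share_lost = evaluate_truncation(toy, n_remove=1)
```

Running the same function separately on TA-level and FT-level relevance labels would give the two loss figures reported per scenario.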

RESULTS: In our best-performing case, removing the 12 least relevant of 25 topics (59% of the dataset) would result in no loss of relevant citations in FT (and a loss of 21% of relevant citations in TA). On average across the three updates, removing the 2 least relevant topics (22% of the citations) loses 5% of relevant citations in FT (and 12% in TA). When the datasets from the original SLR and the first two updates are combined, removing the 9 least relevant of 25 topics (40% of the dataset) loses 4% of relevant citations in FT (and 2% in TA).
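One way to read these figures is as a trade-off between screening workload saved and relevant citations lost when topics ranked on earlier screening rounds are dropped from a new update. The sketch below illustrates that reading only; the abstract does not describe the exact protocol, and the function names `least_relevant_topics` and `truncate_update`, the data layout, and the toy numbers are hypothetical.

```python
from collections import Counter


def least_relevant_topics(history, n_remove):
    """Pick the n_remove topics with the lowest share of relevant citations.

    history: list of (topic_id, is_relevant) pairs from already-screened
             rounds (hypothetical layout, assumed for this sketch).
    """
    totals = Counter(topic for topic, _ in history)
    hits = Counter(topic for topic, rel in history if rel)
    ranked = sorted(totals, key=lambda t: (hits[t] / totals[t], t))
    return set(ranked[:n_remove])


def truncate_update(update, dropped):
    """Drop citations in `dropped` topics from a new, non-empty update.

    Returns (kept_citations, workload_saved, recall_loss).
    """
    kept = [(t, rel) for t, rel in update if t not in dropped]
    lost = sum(1 for t, rel in update if rel and t in dropped)
    total_relevant = sum(1 for _, rel in update if rel)
    workload_saved = 1 - len(kept) / len(update)
    recall_loss = lost / total_relevant if total_relevant else 0.0
    return kept, workload_saved, recall_loss


# Toy data (made up): earlier rounds, then a new update to be screened.
history = ([(0, False)] * 5 + [(1, False)] * 4 + [(1, True)]
           + [(2, True)] * 3 + [(2, False)] * 2)
update = ([(0, False)] * 3 + [(0, True)] + [(1, False)] * 2
          + [(2, True)] * 3 + [(2, False)])

dropped = least_relevant_topics(history, n_remove=1)
kept, workload_saved, recall_loss = truncate_update(update, dropped)
```

In this toy case a relevant citation in the update falls into a topic that looked irrelevant historically, which is the kind of loss the TA and FT percentages quantify.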

CONCLUSIONS: We show that TM and ML applied to SLRs can effectively assist the SLR process by accelerating the identification of relevant citations.

Conference/Value in Health Info

2022-11, ISPOR Europe 2022, Vienna, Austria

Value in Health, Volume 25, Issue 12S (December 2022)

Code

MSR76

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference, Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
