A Natural Language Processing Solution for Health Economics and Outcomes Research Systematic Literature Review
Du J1, Manion F1, Wang D2, He L1, Wang X3, Li Y4, Eckels D2, Cossrow N5, Yao L6
1Melax Tech, Houston, TX, USA, 2Merck & Co., Inc., Rahway, NJ, USA, 3Melax Tech, Stamford, CT, USA, 4Merck, Chatham, NJ, USA, 5Center for Observational and Real-World Evidence, Merck & Co., Inc., Rahway, NJ, USA, 6Merck & Co., Inc., Rahway, NJ, USA
OBJECTIVES: Systematic literature review (SLR), an important tool in health economics and outcomes research (HEOR), is routinely conducted to understand the research landscape and synthesize evidence related to disease burden and treatment options. Conducting an SLR is time-consuming and labor-intensive. In this work, we report on a Natural Language Processing (NLP) solution that automates and accelerates tasks in the SLR process.
METHODS: We followed an agile, iterative software engineering methodology to build a customized intelligent NLP solution for SLR tasks. Multiple machine learning-based NLP algorithms were adopted to automate the article screening and data element extraction processes. Following a human-in-the-loop design, the NLP prediction results can be further reviewed and verified by domain experts. We integrated Explainable AI (XAI) to provide supporting evidence for the NLP predictions and to add transparency to the extracted literature data elements. The system was further validated on three existing SLR projects: epidemiology studies of human papillomavirus-associated diseases, the disease burden of pneumococcal diseases, and cost-effectiveness studies of pneumococcal vaccines.
RESULTS: As a user-centered, end-to-end intelligent solution, the resulting system covers the major SLR steps, including study protocol setting, literature retrieval, abstract screening, full-text screening, data element extraction from full-text articles, results summary, and data visualization. The NLP algorithms achieved accuracy scores of 0.86 to 0.89 on two article screening tasks and macro-average F1 scores of 0.52 to 0.74 on three data element extraction tasks.
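For readers less familiar with the two metrics reported above, the following is a minimal sketch of how accuracy and macro-average F1 are conventionally computed. It is not the authors' evaluation code: the function names, the "include"/"exclude" labels, and the toy predictions are assumptions made purely for illustration, and none of the numbers come from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro-average F1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the reference says otherwise
            fn[t] += 1  # reference class t was missed
    labels = set(y_true) | set(y_pred)
    per_class = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class has no counts
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical screening decisions (illustrative only, not study data).
reference = ["include", "include", "exclude", "exclude", "exclude"]
predicted = ["include", "exclude", "exclude", "exclude", "include"]

print(f"accuracy = {accuracy(reference, predicted):.2f}")
print(f"macro F1 = {macro_f1(reference, predicted):.2f}")
```

Because the macro average weights every class equally regardless of how often it occurs, rare data elements pull the score down as much as common ones, which is one reason extraction scores can sit well below screening accuracy.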
CONCLUSIONS: Our solution integrates cutting-edge NLP algorithms to automate and accelerate the SLR process, giving scientists more time to focus on the quality of data and the synthesis of evidence in HEOR studies. In line with the living systematic literature review concept, the solution has the potential to keep literature data up to date, enabling scientists to stay current with HEOR-related literature prospectively and continuously.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 6, S2 (June 2023)
Methodological & Statistical Research, Study Approaches
Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis
Infectious Disease (non-vaccine)