Enhancing Systematic Literature Reviews With GenAI: Development, Applications, and Performance Evaluation

Speaker(s)

Li Y1, Datta S2, Lee K3, Paek H2, Bergrath E4, Glasgow J2, Liston C2, He L2, Rastergar-mojarad M2, Wang X5, Xu Y4
1Regeneron Pharmaceuticals, Inc., Scarsdale, NY, USA, 2IMO Health, Rosemont, IL, USA, 3IMO Health, Ardsley, NY, USA, 4Regeneron Pharmaceuticals, Inc., Tarrytown, NY, USA, 5IMO Health, Westport, CT, USA

OBJECTIVES: A systematic literature review (SLR) is a crucial step in establishing the existing evidence base for informed health technology assessments (HTAs). Conducting an SLR is labor-intensive, time-consuming, and costly, and the ever-increasing volume of scientific publications adds to that burden. With the advent of large language models (LLMs), we aim to explore whether these models can serve as an assistive tool to facilitate and streamline the SLR process.

METHODS: We developed a customized AI-assisted SLR system consisting of five distinct steps: 1. Searching the PubMed database using medical terms combined with Boolean strategies; 2. Setting up the study protocol in Population, Intervention/Comparison, Outcome, and Study Design (PICOs) format; 3. Performing abstract screening against the PICOs criteria using an LLM; 4. Implementing targeted data extraction using an LLM on abstracts accepted for full-text review; 5. Generating summary reports to assist researchers in developing a detailed data extraction strategy.
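
Steps 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `PicosCriteria`, the prompt wording, the INCLUDE/EXCLUDE reply format, and the caller-supplied `ask_llm` function are all assumptions made for illustration, and no specific LLM API is implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PicosCriteria:
    """Hypothetical container for a study protocol in PICOs format."""
    population: str
    intervention_comparison: str
    outcome: str
    study_design: str


def build_screening_prompt(criteria: PicosCriteria, title: str, abstract: str) -> str:
    """Assemble one screening prompt (the wording is an assumption)."""
    return (
        "You are screening abstracts for a systematic literature review.\n"
        f"Population: {criteria.population}\n"
        f"Intervention/Comparison: {criteria.intervention_comparison}\n"
        f"Outcome: {criteria.outcome}\n"
        f"Study design: {criteria.study_design}\n"
        "Answer with exactly one word first: INCLUDE or EXCLUDE.\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
    )


def parse_decision(reply: str) -> bool:
    """Map a model reply to True (relevant) or False (irrelevant)."""
    token = reply.strip().upper()
    if token.startswith("INCLUDE"):
        return True
    if token.startswith("EXCLUDE"):
        return False
    # Anything else is ambiguous and should go to a human reviewer.
    raise ValueError(f"Unrecognized screening reply: {reply!r}")


def screen(criteria: PicosCriteria, title: str, abstract: str,
           ask_llm: Callable[[str], str]) -> bool:
    """Screen one record; `ask_llm` is any function mapping prompt -> reply."""
    return parse_decision(ask_llm(build_screening_prompt(criteria, title, abstract)))
```

Passing the model call in as `ask_llm` keeps the sketch independent of any vendor, and raising on an unparseable reply routes ambiguous cases to a human rather than guessing.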

RESULTS: A random sample of 49 titles/abstracts for multiple myeloma (MM) and 50 titles/abstracts for melanoma was selected from the Step 1 search results. Domain experts manually reviewed these titles/abstracts for relevance, thereby creating a reference standard. The relevant/irrelevant decisions made by the LLM, based on the predefined PICOs criteria, were compared against this reference standard. The LLM-based system achieved accuracies of 87.76% for MM and 92% for melanoma. Cohen's kappa scores of 0.74 for MM and 0.84 for melanoma indicate substantial agreement between the human reviewers and the AI-assisted SLR system.
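
The two agreement metrics reported above can be computed as in the following Python sketch, which uses the standard definitions of accuracy and Cohen's kappa. The function names and the four-record toy labels are illustrative assumptions, not study data. As a consistency check that is an inference rather than something the abstract states, 43 of 49 matching decisions gives 87.76% and 46 of 50 gives 92%.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(reference: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Fraction of records where the LLM decision matches the reference."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohen_kappa(reference: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(reference)
    p_o = accuracy(reference, predicted)          # observed agreement
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Agreement expected by chance from each rater's label frequencies.
    p_e = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (1 = relevant, 0 = irrelevant); NOT data from the study.
toy_reference = [1, 1, 0, 0]
toy_predicted = [1, 0, 0, 0]
print(accuracy(toy_reference, toy_predicted))     # 0.75
print(cohen_kappa(toy_reference, toy_predicted))  # 0.5
```

Kappa is lower than raw accuracy here because it discounts the agreement two raters would reach by chance given how often each assigns every label.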

CONCLUSIONS: The initial evaluation showed that the AI-assisted SLR system achieved a reasonably high level of accuracy and agreement with human reviewers. The next phase involves assessing the system against a larger-scale reference standard. The system still requires human experts to review the results it generates and to provide final approval.

Code

MSR234

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas