Breaking Through Limitations: Enhanced Systematic Literature Reviews With Large Language Models

Author(s)

Reason T1, Langham J2, Malcolm B3, Klijn S4, Gimblett A1
1Estima Scientific Ltd, London, UK, 2Estima Scientific Ltd, London, LON, UK, 3Bristol Myers Squibb, Middlesex, LON, UK, 4Bristol Myers Squibb, Lawrence Township, NJ, USA

OBJECTIVES: The potential for utilising AI to improve the efficiency of systematic literature reviews (SLRs) is increasingly recognised, and the capabilities of Large Language Models (LLMs) such as GPT-4 warrant further investigation. Our objective was to assess the accuracy of GPT-4 in selecting eligible randomised controlled trials (RCTs) from titles and abstracts for an SLR and network meta-analysis (NMA) on overall survival of adult patients with advanced non-small cell lung cancer.

METHODS: Titles and abstracts of RCTs identified in a systematic literature search of EMBASE, MEDLINE and CENTRAL were screened by two human reviewers and by GPT-4. GPT-4 was applied using a series of prompts delivered via a Python API to identify data relevant to the key inclusion and exclusion criteria and to assess eligibility. The results of screening by AI and human reviewers were compared to assess the level of agreement and the successful identification of publications used in the final NMA.
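A prompt-driven screening step of this kind could be sketched as below. Note that this is an illustrative sketch only: the prompt wording, the eligibility criteria strings, and the reply-parsing rule are hypothetical, as the abstract does not report the authors' actual prompts or parsing logic.

```python
# Hypothetical sketch of prompt-based title/abstract screening.
# Prompt wording, criteria, and parsing below are illustrative,
# not the prompts actually used in the study.

def build_screening_prompt(title: str, abstract: str, criteria: list[str]) -> str:
    """Assemble an eligibility-assessment prompt for one record."""
    criteria_text = "\n".join(f"- {c}" for c in criteria)
    return (
        "You are screening records for a systematic literature review.\n"
        f"Inclusion criteria:\n{criteria_text}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "Answer INCLUDE or EXCLUDE, then give a one-sentence justification."
    )

def parse_decision(model_reply: str) -> bool:
    """Map the model's free-text reply to an include/exclude flag."""
    first_word = model_reply.strip().split()[0].upper().strip(".,:")
    return first_word == "INCLUDE"
```

In practice each assembled prompt would be sent to the model through an API client (for example, the OpenAI Python library's chat-completions endpoint) in a loop over all deduplicated records, with the parsed decisions collected for comparison against the human reviewers.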

RESULTS: After deduplication, GPT-4 screened 1994 abstracts, identifying 14.6% of abstracts as fulfilling the criteria for inclusion (compared with 6% by human reviewers). Against the human reviewers, GPT-4 achieved 80.9% sensitivity, 89.2% specificity and 88.8% accuracy. Both the reviewers and GPT-4 identified all studies providing data for the NMA. AI screening took 4 hours to process and deliver output.
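The agreement metrics reported above follow from a standard 2x2 confusion matrix, treating the human reviewers' decisions as the reference. A minimal sketch, using illustrative counts rather than the study's actual table:

```python
# Computing sensitivity, specificity, and accuracy for AI-vs-human
# screening agreement. The counts passed in by a caller are treated
# as a 2x2 confusion matrix with the human decision as reference:
#   tp/fn = human-included records the AI did / did not also include
#   fp/tn = human-excluded records the AI did / did not also exclude

def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Return agreement metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # of human includes, share AI caught
        "specificity": tn / (tn + fp),          # of human excludes, share AI rejected
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```

For example, `screening_metrics(tp=8, fp=5, fn=2, tn=85)` yields 80% sensitivity, roughly 94% specificity, and 93% accuracy on 100 illustrative records; these numbers are hypothetical and do not reconstruct the study's counts.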

CONCLUSIONS: This study shows the potential of using LLMs to quickly and correctly identify a shortlist of studies from titles and abstracts, given specific instructions about type of study design, population and intervention. This may help minimise the risk of human error and improve the accessibility of results. Further studies are required to test the generalisability of these results and to examine how variations in prompts affect the sensitivity and specificity of results.

Conference/Value in Health Info

2023-11, ISPOR Europe 2023, Copenhagen, Denmark

Value in Health, Volume 26, Issue 11, S2 (December 2023)

Code

MSR46

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis, Meta-Analysis & Indirect Comparisons

Disease

Drugs, Oncology

