Breaking Through Limitations: Enhanced Systematic Literature Reviews With Large Language Models
Author(s)
Reason T1, Langham J2, Malcolm B3, Klijn S4, Gimblett A1
1Estima Scientific Ltd, London, UK, 2Estima Scientific Ltd, London, UK, 3Bristol Myers Squibb, Middlesex, UK, 4Bristol Myers Squibb, Lawrence Township, NJ, USA
OBJECTIVES: The potential for utilising AI to improve the efficiency of systematic literature reviews (SLRs) is increasingly recognised, and the capabilities of large language models (LLMs) such as GPT-4 warrant further investigation. Our objective was to assess the accuracy of GPT-4 in selecting eligible randomised controlled trials (RCTs) from titles and abstracts for an SLR and network meta-analysis (NMA) of overall survival in adult patients with advanced non-small cell lung cancer.
METHODS: Titles and abstracts of RCTs identified in a systematic literature search of EMBASE, MEDLINE and CENTRAL were screened by two human reviewers and by GPT-4. GPT-4 was queried via a Python API with a series of prompts to identify data relevant to the key inclusion and exclusion criteria and to assess eligibility. The screening results of the AI and the human reviewers were compared to assess the level of agreement and the successful identification of the publications used in the final NMA.
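The abstract does not publish the authors' prompts or code, but a title/abstract screening step of this kind can be sketched as below. The prompt wording, the `gpt-4` model name, the eligibility criteria phrasing and the helper names are illustrative assumptions, not the study's actual pipeline; the API call itself is shown in a comment because it requires network access and a key.

```python
# Hypothetical sketch of LLM-based title/abstract screening.
# Prompt text and criteria are assumptions for illustration only.

SCREENING_PROMPT = """You are screening records for a systematic literature review.
Inclusion criteria:
- Randomised controlled trial (RCT)
- Adult patients with advanced non-small cell lung cancer
- Reports overall survival
Answer with exactly one word, INCLUDE or EXCLUDE, followed by a brief reason.

Title: {title}
Abstract: {abstract}
"""


def build_prompt(title: str, abstract: str) -> str:
    """Fill the screening template with one record's title and abstract."""
    return SCREENING_PROMPT.format(title=title, abstract=abstract)


def parse_decision(response_text: str) -> bool:
    """Map the model's free-text reply to an include/exclude decision."""
    first_word = response_text.strip().split()[0].upper().strip(".,:-")
    return first_word == "INCLUDE"


# The API call itself (illustrative; requires an OpenAI API key):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": build_prompt(title, abstract)}],
# ).choices[0].message.content
# include = parse_decision(reply)
```

Constraining the reply to a single leading keyword makes the model's output machine-parseable, so each of the ~2000 records can be processed in a loop without manual review of the raw responses.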
RESULTS: After deduplication, GPT-4 screened 1994 abstracts, identifying 14.6% as fulfilling the inclusion criteria (compared with 6% for the human reviewers). Relative to the human reviewers, GPT-4 achieved 80.9% sensitivity, 89.2% specificity and 88.8% overall accuracy. Both the reviewers and GPT-4 identified all studies providing data for the NMA. AI screening took 4 hours to process and deliver output.
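The agreement metrics reported above follow the standard 2x2 confusion-matrix definitions, treating the human reviewers' decisions as the reference standard. A minimal sketch, using hypothetical round counts rather than the study's actual confusion matrix:

```python
# Sensitivity, specificity and accuracy of LLM screening versus the
# human-reviewer reference standard. Counts below are hypothetical.

def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute agreement metrics from a 2x2 confusion matrix.

    tp: included by both LLM and humans
    fp: included by LLM, excluded by humans
    fn: excluded by LLM, included by humans
    tn: excluded by both
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # share of human-included records found
        "specificity": tn / (tn + fp),   # share of human-excluded records rejected
        "accuracy": (tp + tn) / total,   # overall agreement rate
    }


# Illustrative example over 1000 records (not the study's data):
metrics = screening_metrics(tp=90, fp=100, fn=10, tn=800)
# metrics["sensitivity"] -> 0.90, metrics["specificity"] -> ~0.889,
# metrics["accuracy"] -> 0.89
```

Note that a screening tool can over-include (here, 14.6% vs 6%) and still be useful, provided sensitivity is high enough that no study needed for the NMA is missed; the false positives are then removed at full-text review.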
CONCLUSIONS: This study shows the potential of LLMs to quickly and correctly identify a shortlist of studies from titles and abstracts, given specific instructions about study design, population and intervention. This may help minimise the risk of human error and improve the accessibility of results. Further studies are required to test the generalisability of these results and to examine how variations in prompts affect the sensitivity and specificity of results.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 11, S2 (December 2023)
Code
MSR46
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis, Meta-Analysis & Indirect Comparisons
Disease
Drugs, Oncology