Transforming Systematic Literature Reviews: Unleashing the Potential of GPT-4: A Cutting-Edge Large Language Model, to Elevate Research Synthesis
Author(s)
Attri S1, Kaur R1, Singh B2, Rai P1
1Pharmacoevidence, Mohali, India, 2Pharmacoevidence, SAS Nagar Mohali, PB, India
OBJECTIVES: Artificial intelligence (AI) is increasingly integrated into research because it can process large volumes of data efficiently, which may improve the speed and precision of literature screening relative to traditional manual methods. This study explores how efficiently an advanced large language model, the generative pre-trained transformer GPT-4, can automate the screening procedures involved in systematic literature reviews (SLRs).
METHODS: Embase®, Medline®, and Cochrane were searched to identify relevant randomised controlled trials (RCTs) in patients with post-traumatic stress disorder. A subject matter expert (SME) with over ten years of experience in conducting SLRs optimised and refined the prompt, which was delivered through a Python FastAPI service, to identify evidence meeting the key inclusion and exclusion criteria. The screening results of GPT-4 and the human reviewer were compared to measure their level of agreement and to assess how precisely the publications incorporated into the SLR were identified.
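The abstract does not publish the prompt or the service code, so the following is only a minimal sketch of how a title-and-abstract screening prompt might be assembled and how a model reply might be turned into a decision. Everything in it is an assumption made for illustration: the names `Criteria`, `build_prompt`, and `parse_decision`, the example criteria wording, and the INCLUDE/EXCLUDE reply convention are hypothetical, and the GPT-4 call and the FastAPI layer are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    """Hypothetical container for screening criteria (not from the abstract)."""
    include: list[str]
    exclude: list[str]


def build_prompt(criteria: Criteria, title: str, abstract: str) -> str:
    """Assemble one screening prompt for a single publication."""
    include = "\n".join(f"- {item}" for item in criteria.include)
    exclude = "\n".join(f"- {item}" for item in criteria.exclude)
    return (
        "You are screening publications for a systematic literature review.\n"
        f"Inclusion criteria:\n{include}\n"
        f"Exclusion criteria:\n{exclude}\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        "Answer with INCLUDE or EXCLUDE as the first word, then a short reason."
    )


def parse_decision(reply: str) -> bool:
    """Return True for INCLUDE and False for EXCLUDE; reject anything else."""
    words = reply.strip().split()
    first = words[0].strip(".,:;").upper() if words else ""
    if first == "INCLUDE":
        return True
    if first == "EXCLUDE":
        return False
    raise ValueError(f"Unrecognised screening decision: {reply!r}")


# Illustrative criteria only; the study's actual criteria are not given.
example_criteria = Criteria(
    include=["Randomised controlled trial", "Patients with post-traumatic stress disorder"],
    exclude=["Non-randomised or observational design"],
)
example_prompt = build_prompt(example_criteria, "A trial title", "A trial abstract")
```

Forcing the decision into the first word keeps parsing trivial and makes malformed replies fail loudly, which matters when the output is later compared against a human reviewer.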
RESULTS: A total of 545 publications were identified from the biomedical databases. After deduplication, 519 publications were considered for title and abstract screening. Automated screening using GPT-4 resulted in the inclusion of 17.73% of publications, compared with 10.98% by the human reviewer. The overall agreement (accuracy) between GPT-4 and the human reviewer was 90.17%, with reported sensitivity and specificity of 85.96% and 90.6%, respectively. Both screening techniques identified all relevant publications; however, the human reviewer required an additional five hours to complete the screening of the 519 publications.
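The reported metrics can be related to one another through a two-by-two confusion matrix. The sketch below is an illustration under stated assumptions: the abstract reports only percentages, so the four cell counts are inferred here as the integers that reproduce those percentages for 519 publications, with the human reviewer's decisions treated as the reference standard. The class name `ConfusionMatrix` is hypothetical. With these inferred counts, specificity works out to roughly 90.69%, slightly above the reported 90.6%, which may reflect truncation rather than rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """GPT-4 decisions scored against the human reviewer as reference."""
    tp: int  # included by both GPT-4 and the human reviewer
    fp: int  # included by GPT-4 only
    fn: int  # included by the human reviewer only
    tn: int  # excluded by both

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        """Share of publications on which both decisions agree."""
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        """Share of human-included publications that GPT-4 also included."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of human-excluded publications that GPT-4 also excluded."""
        return self.tn / (self.tn + self.fp)


# ASSUMPTION: counts inferred from the reported percentages, not reported
# in the abstract itself.
inferred = ConfusionMatrix(tp=49, fp=43, fn=8, tn=419)

gpt4_inclusion_rate = (inferred.tp + inferred.fp) / inferred.total
human_inclusion_rate = (inferred.tp + inferred.fn) / inferred.total

print(f"accuracy:    {inferred.accuracy():.2%}")
print(f"sensitivity: {inferred.sensitivity():.2%}")
print(f"specificity: {inferred.specificity():.2%}")
```

Read this way, GPT-4's higher inclusion rate comes mostly from publications it included that the human reviewer excluded, which is the cautious direction of error at the title and abstract stage.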
CONCLUSIONS: GPT-4, with accuracy comparable to that of a subject matter expert (SME), could replace one of the two reviews in a standard human-driven SLR. Further investigation is needed to determine whether these benefits extend to other language models and how changes to the prompt might affect the results.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
MSR57
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis
Disease
Injury & Trauma, Mental Health (including addiction)