Transforming Systematic Literature Reviews: Unleashing the Potential of GPT-4: A Cutting-Edge Large Language Model, to Elevate Research Synthesis

Author(s)

Attri S1, Kaur R1, Singh B2, Rai P1
1Pharmacoevidence, Mohali, India, 2Pharmacoevidence, SAS Nagar Mohali, PB, India

OBJECTIVES: Artificial intelligence (AI) is increasingly integrated into research because it can process large volumes of data efficiently, which can improve the speed and precision of literature screening compared with traditional methods. This study evaluates the efficiency of an advanced large language model, generative pre-trained transformer 4 (GPT-4), in automating the procedures involved in systematic literature reviews (SLRs).

METHODS: Embase®, Medline®, and Cochrane were searched to identify relevant randomised controlled trials (RCTs) in patients with post-traumatic stress disorder. A subject matter expert with over ten years of experience in conducting SLRs optimised and refined the prompt, which was delivered through a Python FastAPI service, to identify evidence meeting the key inclusion and exclusion criteria. Screening decisions from GPT-4 and a human reviewer were then compared to measure agreement and to assess how accurately each identified the publications included in the SLR.
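
As an illustration only, the prompt-based screening step described above might be sketched as follows. The abstract does not publish the actual prompt, so the criteria wording, the `Citation` class, and the `build_screening_prompt` and `parse_decision` helpers are all hypothetical; the FastAPI service and the call to GPT-4 are deliberately omitted.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """A record awaiting title and abstract screening."""
    title: str
    abstract: str


# Hypothetical criteria paraphrased from the METHODS text; the study's
# real prompt and its full inclusion/exclusion list are not published.
INCLUSION_CRITERIA = [
    "Population: patients with post-traumatic stress disorder",
    "Study design: randomised controlled trial",
]


def build_screening_prompt(citation: Citation) -> str:
    """Assemble a screening prompt for one citation (illustrative wording)."""
    criteria = "\n".join(f"- {item}" for item in INCLUSION_CRITERIA)
    return (
        "You are screening publications for a systematic literature review.\n"
        f"Include a publication only if it meets all of these criteria:\n{criteria}\n\n"
        f"Title: {citation.title}\n"
        f"Abstract: {citation.abstract}\n\n"
        "Answer with INCLUDE or EXCLUDE, followed by a one-line reason."
    )


def parse_decision(reply: str) -> bool:
    """Return True when the model reply starts with INCLUDE (case-insensitive)."""
    return reply.strip().upper().startswith("INCLUDE")


example = Citation(
    title="Placeholder title",
    abstract="Placeholder abstract text.",
)
prompt = build_screening_prompt(example)
```

Keeping the decision format fixed (INCLUDE or EXCLUDE first) makes the model's reply easy to parse and to compare against a human reviewer's decisions.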

RESULTS: A total of 545 publications were identified from the biomedical databases. After deduplication, 519 publications underwent title and abstract screening. Automated screening with GPT-4 included 17.73% of publications, compared with 10.98% included by the human reviewer. Overall agreement (accuracy) between GPT-4 and the human reviewer was 90.17%, with sensitivity and specificity of 85.96% and 90.6%, respectively. Both screening approaches identified all relevant publications; however, the human reviewer required an additional five hours to screen the 519 publications.
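
To show how the reported metrics relate to one another, the sketch below recomputes them from a 2×2 confusion matrix. The abstract does not report the cell counts; the values used here are back-calculated from the published percentages and should be read as an assumption, with the human reviewer's decisions taken as the reference standard.

```python
# Assumed counts, back-calculated from the reported percentages; they are
# NOT stated in the abstract. The human reviewer is the reference standard.
TP = 49   # included by both GPT-4 and the human reviewer
FN = 8    # included by the human reviewer only
FP = 43   # included by GPT-4 only
TN = 419  # excluded by both


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to two decimal places."""
    return round(100 * numerator / denominator, 2)


def screening_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute inclusion rates and agreement metrics from a 2x2 table."""
    total = tp + fn + fp + tn
    return {
        "total": total,
        "gpt4_inclusion": pct(tp + fp, total),   # share included by GPT-4
        "human_inclusion": pct(tp + fn, total),  # share included by the human
        "accuracy": pct(tp + tn, total),         # overall agreement
        "sensitivity": pct(tp, tp + fn),         # human includes found by GPT-4
        "specificity": pct(tn, tn + fp),         # human excludes matched by GPT-4
    }


metrics = screening_metrics(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Under these assumed counts the inclusion rates, accuracy, and sensitivity match the reported figures to two decimal places. Specificity comes out at 90.69%, whereas the abstract reports 90.6%, which may reflect truncation rather than rounding.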

CONCLUSIONS: GPT-4, with accuracy comparable to that of a human subject matter expert, could replace one of the two independent reviews in a standard human-driven SLR. Further investigation is needed to determine whether these benefits extend to other language models and how changes to the prompt affect the results.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

MSR57

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

Injury & Trauma, Mental Health (including addiction)
