Revolutionizing Systematic Literature Reviews: Harnessing the Power of Large Language Model (GPT-4) for Enhanced Research Synthesis
Speaker(s)
Kaur R1, Rai P1, Attri S1, Kaur G1, Singh B2
1Pharmacoevidence, Mohali, India, 2Pharmacoevidence, SAS Nagar Mohali, PB, India
OBJECTIVES: The use of artificial intelligence (AI) in research is growing, driven by its potential advantages over conventional methods: it can reduce human error and workload, increase productivity, shorten turnaround times, and maintain consistency. This study examines the capabilities of large language models, such as generative pre-trained transformer 4 (GPT-4), in automating the complex processes of systematic literature reviews (SLRs).
METHODS: Embase®, Medline®, and Cochrane were searched to identify relevant randomised controlled trials (RCTs) in patients with schizophrenia. A subject matter expert (SME) with over a decade of experience in conducting SLRs optimized and fine-tuned the final prompt, which was delivered through a Python FastAPI application to identify evidence meeting the key inclusion and exclusion criteria. The screening results obtained from GPT-4 and from a human reviewer were compared to evaluate agreement and to assess whether the publications included in the final SLR were successfully identified.
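The prompt-based screening step described above can be sketched as follows. This is a minimal illustration only: the abstract does not publish the actual prompt, the model settings, or the FastAPI service, so the prompt wording, the function names (`build_prompt`, `parse_decision`, `screen`), and the `fake_model` stand-in are all hypothetical. The model call is passed in as a plain callable so the sketch runs without any external API.

```python
from typing import Callable, Dict, List

# Hypothetical prompt wording; the abstract does not disclose the real prompt.
PROMPT_TEMPLATE = (
    "You are screening records for a systematic literature review.\n"
    "Include only randomised controlled trials in patients with schizophrenia.\n"
    "Answer with exactly one word: INCLUDE or EXCLUDE.\n\n"
    "Title: {title}\n"
    "Abstract: {abstract}\n"
)


def build_prompt(title: str, abstract: str) -> str:
    """Fill the screening prompt with one record's title and abstract."""
    return PROMPT_TEMPLATE.format(title=title.strip(), abstract=abstract.strip())


def parse_decision(reply: str) -> bool:
    """Return True for include, False for exclude.

    Assumed design choice: any reply that is not clearly EXCLUDE is kept,
    so ambiguous answers go forward to human review rather than being lost.
    """
    return not reply.strip().upper().startswith("EXCLUDE")


def screen(records: List[Dict[str, str]], ask_model: Callable[[str], str]) -> List[bool]:
    """Screen each record by sending its prompt to the supplied model callable."""
    return [
        parse_decision(ask_model(build_prompt(r["title"], r["abstract"])))
        for r in records
    ]


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to make the sketch runnable."""
    record_text = prompt.split("Title:", 1)[1].lower()
    return "INCLUDE" if "randomised" in record_text else "EXCLUDE"


records = [
    {"title": "A randomised trial of drug X in schizophrenia", "abstract": "Double-blind RCT."},
    {"title": "A cohort study of sleep quality", "abstract": "Observational design."},
]
decisions = screen(records, fake_model)
```

In a real deployment, `fake_model` would be replaced by a call to the language model, and `screen` could sit behind a web endpoint; neither detail is specified in the abstract.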
RESULTS: Title and abstract screening of 985 publications was performed by a human reviewer and by GPT-4. GPT-4 considered 18.78% of publications for inclusion, compared with 15.12% for the human reviewer. Using predictive analytics, the overall agreement (i.e., accuracy) between GPT-4 and the human reviewer was 94.91%. The sensitivity and specificity of GPT-4 were 95.30% and 94.85%, respectively. While both screening approaches identified all relevant publications, the human reviewer required an additional 10 hours to thoroughly assess the 985 publications.
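The agreement statistics above follow from a 2×2 confusion matrix with the human reviewer's decisions taken as the reference. The sketch below shows the arithmetic. The cell counts are not reported in the abstract; they are back-calculated here from the published percentages and the total of 985, so they are an assumption for illustration. With these counts the reported figures are reproduced to within about 0.02 percentage points (the published values appear to be truncated rather than rounded).

```python
from typing import Dict


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Agreement metrics for AI screening, with the human decision as reference.

    tp: included by both; fp: included by AI only;
    fn: included by human only; tn: excluded by both.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ai_inclusion_rate": (tp + fp) / total,
        "human_inclusion_rate": (tp + fn) / total,
    }


# Assumed counts, inferred from the reported percentages (not from the source).
metrics = screening_metrics(tp=142, fp=43, fn=7, tn=793)

for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

Under these assumed counts, GPT-4 would have flagged 43 publications that the human reviewer excluded and missed 7 that the human reviewer included at the title and abstract stage.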
CONCLUSIONS: This investigation highlights the efficiency of GPT-4 compared with traditional SLR methods. In practice, attaining a concurrence rate of almost 95% is challenging even with a two-reviewer human process. The high accuracy of GPT-4, comparable to that of an SME, suggests that one of the two reviews in the traditional approach could be replaced with a GPT-4 review to expedite screening. Future research should explore these benefits across other language models and assess the impact of different prompts on outcomes.
Code
MSR15
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis, Meta-Analysis & Indirect Comparisons
Disease
Mental Health (including addiction)