Role of Generative Artificial Intelligence in Assisting Systematic Review Process in Health Research: A Systematic Review

Author(s)

Muhammed Rashid, MPharm., PhD1, Cheng Su Yi, MHS2, Suwapat Lawin, PharmD Candidate3, Pongsapat Limhensin, PharmD Candidate3, Suppachai Insuk, PharmD3, Sajesh K. Veettil, PhD4, Nai Ming Lai, MBBS5, Xiangyang Ye, PhD1, Nathorn Chaiyakunapruk, PharmD, PhD1, Teerapon Dhippayom, PharmD, PhD3;
1College of Pharmacy, University of Utah, Pharmacotherapy, Salt Lake City, UT, USA, 2Independent Researcher, Kuala Lumpur, Malaysia, 3Faculty of Pharmaceutical Sciences, Naresuan University, Phitsanulok, Thailand, 4School of Pharmacy, IMU University, Pharmacy Practice, Kuala Lumpur, Malaysia, 5Faculty of Health and Medical Sciences, Taylor's University, Subang Jaya, Malaysia

OBJECTIVES: Generative artificial intelligence (GAI) is widely used in healthcare for various purposes, including the systematic review (SR) process. We aimed to summarize the evidence on performance metrics of GAI in the SR process.
METHODS: PubMed, EMBASE, and ProQuest Dissertations & Theses Global were searched from inception to May 2024. Only experimental studies that compared GAI with other GAIs or with human reviewers at any stage of the SR were included. A modified QUADAS-2 tool was used to assess the quality of studies that used GAI in the study selection process. We summarized the findings of the included studies using a narrative approach.
RESULTS: Of 3663 records identified, 8 studies were included. The included studies used a variety of methods for prompt development, evaluation, reliability assessment, and model training. Three studies used GAI for study selection alone. One study each used GAI for PICO development, literature search, data extraction, risk of bias assessment, and combined study selection and data extraction. GPT-3.5 and GPT-4 demonstrated good accuracy in PICO question formulation. The performance of GAI in the study selection process varied across studies. Although GPT-4 performed better in title/abstract screening, its performance was low in full-text screening and in combined title/abstract and full-text screening. This variation may be attributed to differences in the prompts used, the field of study, and the nature of the performance assessment. GPT-3.5 had good agreement with human reviewers when extracting simple information, but not when extracting complex information. Agreement between the Cochrane SRs and GPT-4 was lower for risk of bias assessment using ROBINS-I. Studies that used GAI for study selection had a low risk of bias based on the modified QUADAS-2.
CONCLUSIONS: GAI can assist in PICO formulation and simple data extraction. Although GAI is revolutionizing healthcare, more practically validated evidence is needed to integrate it into the SR process.
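The abstract does not state which performance metrics the included studies reported. As an illustration only, the sketch below shows one common way to compare GAI screening decisions against human reviewers: sensitivity, specificity, and Cohen's kappa computed from paired include/exclude decisions. The function name `screening_metrics` and the input format are hypothetical and are not taken from the included studies.

```python
import math
from typing import Dict, Sequence


def screening_metrics(human: Sequence[bool], model: Sequence[bool]) -> Dict[str, float]:
    """Compare model include/exclude decisions with human decisions.

    Hypothetical helper for illustration: True means "include the record".
    Human decisions are treated as the reference standard (an assumption).
    """
    if len(human) != len(model) or len(human) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Build the 2x2 confusion table.
    tp = sum(1 for h, m in zip(human, model) if h and m)
    tn = sum(1 for h, m in zip(human, model) if not h and not m)
    fp = sum(1 for h, m in zip(human, model) if not h and m)
    fn = sum(1 for h, m in zip(human, model) if h and not m)
    n = tp + tn + fp + fn

    # Sensitivity: share of human-included records the model also included.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: share of human-excluded records the model also excluded.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else math.nan

    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}
```

A call such as `screening_metrics(human_decisions, model_decisions)` on two equal-length lists of booleans returns the three values in a dictionary. Kappa is included alongside sensitivity and specificity because raw agreement can look high when most records are excluded by both raters.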

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

MT14

Topic

Medical Technologies

Topic Subcategory

Digital Health

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
