Can ChatGPT Generate Synthetic Data to Train Systematic Literature Review Machine Learning Models?

Author(s)

Abogunrin S1, Marti-Gil Y1, Lane M1, Witzmann A2
1F. Hoffmann-La Roche, Basel, BS, Switzerland, 2F. Hoffmann La Roche, Kaiseraugst, AG, Switzerland

OBJECTIVES: Large language models like ChatGPT show promise for aiding systematic review-related tasks. It is unclear if, and how, they can be used to generate text for training other machine learning (ML) models to overcome limitations such as small or unbalanced datasets. This research investigates the feasibility of employing ChatGPT to generate realistic synthetic abstracts that resemble those in peer-reviewed journals.

METHODS: Subject matter experts (SMEs) asked OpenAI’s ChatGPT 3.5 to create abstracts for a clinical research question, using the chain-of-thought prompting method together with the PICOS framework (Population, Intervention/Comparison, Outcome, Study Design). Two groups of abstracts were generated. The first was expected to meet all the pre-specified inclusion criteria for the research question. The second group covered each pre-specified exclusion criterion separately. The SMEs qualitatively evaluated the abstracts against the research question to assess the reliability and effectiveness of ChatGPT-generated versus human-written abstracts.
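The two-group prompting setup described above could be sketched as follows. This is a minimal illustration only: the authors iterated their prompts interactively in the ChatGPT interface, and the PICOS values and prompt wording below are hypothetical assumptions, not the prompts actually used.

```python
# Hypothetical sketch of the PICOS-based prompt construction.
# All PICOS values and wording here are illustrative assumptions.

PICOS = {
    "Population": "adults with condition X",                # assumption
    "Intervention/Comparison": "drug A versus placebo",     # assumption
    "Outcome": "overall survival",                          # assumption
    "Study design": "randomised controlled trial",          # assumption
}

def build_prompt(picos, excluded_element=None):
    """Compose a chain-of-thought style prompt asking for a synthetic,
    journal-looking abstract. If excluded_element is given, that PICOS
    element is swapped for an out-of-scope value so the resulting
    abstract should fail the corresponding inclusion criterion."""
    lines = [
        "You are writing a realistic peer-reviewed journal abstract.",
        "Think step by step: plan the background, methods, results and "
        "conclusion first, then write the abstract.",
        "The abstract must reflect the following PICOS elements:",
    ]
    for element, value in picos.items():
        if element == excluded_element:
            # Violate exactly this criterion for the exclusion group.
            value = f"anything EXCEPT {value}"
        lines.append(f"- {element}: {value}")
    return "\n".join(lines)

# Group 1: one prompt meeting all inclusion criteria.
include_prompt = build_prompt(PICOS)

# Group 2: one prompt per exclusion criterion, each violating one element.
exclude_prompts = {
    element: build_prompt(PICOS, excluded_element=element)
    for element in PICOS
}
```

In practice, each prompt string would be sent to the model (e.g. through the ChatGPT interface or a chat-completions API) and the returned abstracts reviewed by SMEs, as described above.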

RESULTS: At least 11 prompts were required for ChatGPT to generate realistic abstracts. ChatGPT performed well when asked to generate abstracts that mentioned all the PICOS inclusion criteria (10/10). For the exclusion reasons, ChatGPT generated realistic abstracts relating to Excluded Population (10/10), Excluded Intervention/Comparison (9/10-10/10), and Excluded Study Design (10/10). For Excluded Outcomes, 6/10-10/10 of the abstracts were appropriate, although the exact numbers reported within the abstracts were fabricated. The generated abstracts were limited by ChatGPT unnecessarily assuming the same output structure across prompts, and the model was sometimes unable to understand the prompts until they were rephrased.

CONCLUSIONS: ChatGPT can generate synthetic abstracts that resemble those in peer-reviewed journals, but finding the prompts that produce realistic results requires iteration. There is still a need for a human-in-the-loop with subject matter expertise to assess the appropriateness of the machine’s output. Future research should explore the reliability of such synthetically generated text for ML model training.

Conference/Value in Health Info

2023-11, ISPOR Europe 2023, Copenhagen, Denmark

Value in Health, Volume 26, Issue 11, S2 (December 2023)

Code

MSR153

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Missing Data

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
