Can ChatGPT Generate Synthetic Data to Train Systematic Literature Review Machine Learning Models?
Speaker(s)
Abogunrin S1, Marti-Gil Y1, Lane M1, Witzmann A2
1F. Hoffmann-La Roche, Basel, BS, Switzerland, 2F. Hoffmann-La Roche, Kaiseraugst, AG, Switzerland
OBJECTIVES: Large language models (LLMs) such as ChatGPT show promise for aiding systematic review tasks. It is unclear whether, and how, they can be used to generate text for training other machine learning (ML) models in order to overcome limitations such as small or unbalanced datasets. This research investigates the feasibility of employing ChatGPT to generate realistic synthetic abstracts that resemble those of peer-reviewed journal articles.
METHODS: Subject matter experts (SMEs) asked OpenAI’s ChatGPT 3.5 to create abstracts for a clinical research question using the chain-of-thought prompting method and the Population (P), Intervention/Comparison (I/C), Outcome (O), Study Design (S) framework (PICOS). Two groups of abstracts were generated: the first was expected to meet all the pre-specified inclusion criteria for the research question; the second covered each pre-specified exclusion criterion separately. The SMEs qualitatively evaluated the abstracts against the research question to assess the reliability and effectiveness of ChatGPT-generated abstracts compared with human-written ones.
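To illustrate the approach, the following is a minimal Python sketch of how PICOS-guided, chain-of-thought prompting could be scripted against OpenAI's API. It is an illustration under stated assumptions, not the authors' actual workflow: the study used ChatGPT 3.5 with iteratively refined prompts, and the model name (gpt-3.5-turbo), client library (openai>=1.0), example PICOS criteria, and prompt wording here are all hypothetical.

# Minimal sketch (assumptions, not the study's actual prompts or tooling):
# generate a synthetic abstract expected to meet all PICOS inclusion criteria.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical PICOS criteria for an illustrative research question.
picos = {
    "Population": "adults with moderate-to-severe rheumatoid arthritis",
    "Intervention/Comparison": "biologic DMARD versus methotrexate",
    "Outcome": "ACR50 response at 24 weeks",
    "Study design": "randomised controlled trial",
}
criteria = "\n".join(f"- {key}: {value}" for key, value in picos.items())

# Chain-of-thought framing: ask the model to reason through each criterion
# before writing the abstract itself.
prompt = (
    "Think step by step. First, restate each PICOS criterion below and plan "
    "how a study abstract would satisfy it. Then write a realistic, "
    "peer-reviewed-journal-style structured abstract (Background, Methods, "
    "Results, Conclusions) that meets ALL of these criteria:\n"
    f"{criteria}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You write clinical study abstracts."},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)

In practice (see RESULTS), such a prompt would typically need several rounds of rephrasing before the output looked realistic, and an analogous prompt would be written for each exclusion criterion separately to produce the second group of abstracts.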
RESULTS: At least 11 prompts were required before ChatGPT generated realistic abstracts. ChatGPT performed well when asked to generate abstracts that mentioned all the PICOS inclusion criteria (10/10). For the exclusion reasons, ChatGPT generated realistic abstracts relating to Excluded Population (10/10), Excluded Intervention/Comparison (9/10-10/10), and Excluded Study Design (10/10). For Excluded Outcomes, 6/10-10/10 of the abstracts were appropriate, although the exact numbers reported within the abstracts were fabricated. The generated abstracts were limited by ChatGPT unnecessarily assuming the same output structure for every abstract, and the model was sometimes unable to understand the prompts until they were rephrased.
CONCLUSIONS: ChatGPT can generate synthetic abstracts that resemble those of peer-reviewed journal articles, but finding the prompts that produce realistic results requires iteration. A human-in-the-loop with subject matter expertise is still needed to assess the appropriateness of the machine’s output. Future research should explore the reliability of such synthetically generated text for ML model training.
Code
MSR153
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Missing Data
Disease
No Additional Disease & Conditions/Specialized Treatment Areas