Automated Non-Interventional Research Protocol Generation: A Case Study in Melanoma


Langham J1, Benbow E1, Reason T2, Malcolm B3, Gimblett A1, Hill N4
1Estima Scientific Ltd, London, UK, 2Estima Scientific Ltd, South Ruislip, LON, UK, 3Bristol Myers Squibb, Middlesex, LON, UK, 4Bristol Myers Squibb Company, Princeton, NJ, USA

OBJECTIVES: Assess the potential to utilise large language models, such as GPT-4, for the automation of Non-Interventional Research (NIR) study protocols to enhance efficiency in the ability to conduct research

METHODS: To automate the development of specific sections of a protocol a Python API was used to send prompts to GPT-4 and receive output. Prompts were developed to provide specific inputs for each protocol, such as the population of interest, the aims and objectives, and the data source. Further information about the structure and content required for each section, and a template or example text for GPT-4 to modify was also developed and provided for each protocol section. The accuracy and completeness of GPT-4’s outputs were qualitatively assessed against the original human-produced protocol content, focusing on the identification of critical points, and noting any omissions or inaccuracies.

RESULTS: Two protocols for retrospective cohort studies with objectives to describe patient characteristics, treatment patterns, and clinical outcomes for melanoma patients were autogenerated. Overall, there was close alignment between the original text and autogenerated text for the Study Design and Study Population sections. GPT-4 gave general aspects of data collection but lacked specifics related to the data sources and their use unless it was specified in the prompt. There was a substantial match in the description of statistical methods, with GPT-4 following the overall guidelines and providing clear methodology for analysis for each objective.

CONCLUSIONS: GPT-4 demonstrates potential in automating the drafting of sections of NIR protocols, with a high degree of alignment with original human-generated content. There was no inaccurate text reported. Where details were missing, the GPT-4 text could be enhanced by incorporating more specific details in the prompts, for example, subgroup analyses, how patients are selected from a data source, and the definition of the index date.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)




Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Prospective Observational Studies


No Additional Disease & Conditions/Specialized Treatment Areas, Oncology

Explore Related HEOR by Topic

Your browser is out-of-date

ISPOR recommends that you update your browser for more security, speed and the best experience on Update my browser now