Leveraging Large Language Models for Efficient Observational Research Protocol Drafting: Insights and Best Practices for OpenLLaMa

Author(s)

Texeira BC¹, Merrill C², Gao W³, Gao C⁴, Bao Y², Anstatt D²
¹Bristol Myers Squibb, Uxbridge, UK, ²Bristol Myers Squibb, Princeton Pike, NJ, USA, ³Bristol Myers Squibb, Buffalo Grove, IL, USA, ⁴Bristol Myers Squibb, Princeton, NJ, USA

OBJECTIVES: Protocols are mandatory for conducting research studies and are time-consuming to develop. While specialised knowledge is required, protocols typically have a repetitive flow and content structure, which may benefit from automated generation. Recent advancements in artificial intelligence, particularly in Large Language Models (LLMs) such as OpenLLaMa, offer a promising avenue to streamline protocol writing. Our study investigated the ability of OpenLLaMa to generate initial protocol drafts in terms of accuracy and time savings.

METHODS: We adopted a stepwise methodology. First, OpenLLaMa was trained with a dataset of 700 observational research protocols approved within the last two years. The model then generated protocols based on provided contexts and research questions. These drafts were assessed for adherence to standard content flow, scientific and methodological appropriateness, procedural and administrative adequacy, estimated time savings, and identification of areas for improvement.
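
As an illustration of the first assessment criterion only (adherence to standard content flow), the following is a minimal sketch of how a draft's section headings could be compared with an expected order. The section names in `EXPECTED_SECTIONS`, the `FlowCheck` type, and the `check_content_flow` function are hypothetical and are not taken from the study, which does not describe its assessment tooling.

```python
from dataclasses import dataclass

# Hypothetical template: the abstract does not list the sections it expected.
EXPECTED_SECTIONS = [
    "Introduction",
    "Objectives",
    "Data Sources",
    "Study Design",
    "Statistical Methods",
]


@dataclass
class FlowCheck:
    missing: list        # expected sections absent from the draft
    out_of_order: list   # sections appearing earlier than their predecessor

    @property
    def adherent(self):
        return not self.missing and not self.out_of_order


def check_content_flow(draft_headings, expected=EXPECTED_SECTIONS):
    """Compare a draft's headings against an expected section order."""
    normalized = [h.strip().lower() for h in draft_headings]
    missing = [s for s in expected if s.lower() not in normalized]
    present = [s for s in expected if s.lower() in normalized]
    # Position of each present section within the draft.
    positions = [normalized.index(s.lower()) for s in present]
    # Flag a section when it appears before the expected section preceding it.
    out_of_order = [
        present[i + 1]
        for i in range(len(positions) - 1)
        if positions[i + 1] < positions[i]
    ]
    return FlowCheck(missing=missing, out_of_order=out_of_order)


if __name__ == "__main__":
    draft = ["Introduction", "Data Sources", "Objectives", "Statistical Methods"]
    result = check_content_flow(draft)
    print(result.missing, result.out_of_order, result.adherent)
```

A check of this kind covers structure only. Scientific and methodological appropriateness, as assessed in the study, would still require expert review.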

RESULTS: Protocols drafted by OpenLLaMa demonstrated competent introductions, including detailed descriptions of interventions or diseases and data sources. They proposed suitable statistical methods aligned with study objectives. Notably, these LLM-generated drafts served as comprehensive starting points, reducing the time needed to develop a submission-ready protocol to roughly one-third of the typical duration.

CONCLUSIONS: Our findings indicate that LLMs like OpenLLaMa can substantially streamline the process of drafting research protocols. While further refinement by human experts is needed, the initial versions produced by the LLM are promising and could lead to considerable savings in work hours.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

MSR86

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
