Leveraging Large Language Models for Efficient Observational Research Protocol Drafting: Insights and Best Practices for OpenLLaMa
Author(s)
Texeira BC1, Merrill C2, Gao W3, Gao C4, Bao Y2, Anstatt D2
1Bristol Myers Squibb, Uxbridge, UK, 2Bristol Myers Squibb, Princeton Pike, NJ, USA, 3Bristol Myers Squibb, Buffalo Grove, IL, USA, 4Bristol Myers Squibb, Princeton, NJ, USA
OBJECTIVES: Protocols are mandatory for conducting research studies and are time-consuming to develop. Although specialised knowledge is required, protocols typically follow a repetitive flow and content structure, which may lend itself to automated generation. Recent advances in artificial intelligence, particularly Large Language Models (LLMs) such as OpenLLaMa, offer a promising avenue for streamlining protocol writing. Our study investigated the ability of OpenLLaMa to generate initial protocol drafts, in terms of accuracy and time savings.
METHODS: We adopted a stepwise methodology. First, OpenLLaMa was trained with a dataset of 700 observational research protocols approved within the last two years. The model then generated protocols based on provided contexts and research questions. These drafts were assessed for adherence to standard content flow, scientific and methodological appropriateness, procedural and administrative adequacy, estimated time savings, and identification of areas for improvement.
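As an illustration of the "adherence to standard content flow" criterion only, a minimal sketch of an automated check might look like the following. The section names in `EXPECTED_SECTIONS` and the function `check_content_flow` are hypothetical; the abstract does not state which protocol template was used or whether any part of the assessment was automated.

```python
# Hypothetical section order; the abstract does not list the actual template.
EXPECTED_SECTIONS = [
    "Background",
    "Objectives",
    "Study Design",
    "Data Sources",
    "Statistical Analysis",
]


def check_content_flow(draft, expected=EXPECTED_SECTIONS):
    """Report which expected headings are missing and whether those
    that are present appear in the expected order.

    A heading counts as present only if it occupies a line by itself
    (case-insensitive). This is an assumption made for the sketch.
    """
    lines = [line.strip().lower() for line in draft.splitlines()]

    # Line index of the first occurrence of each heading, or None.
    positions = {}
    for name in expected:
        key = name.lower()
        positions[name] = next(
            (i for i, line in enumerate(lines) if line == key), None
        )

    missing = [name for name in expected if positions[name] is None]
    found = [positions[name] for name in expected if positions[name] is not None]
    in_order = all(a < b for a, b in zip(found, found[1:]))
    return {"missing": missing, "in_order": in_order}
```

A check of this kind would cover only structural flow; scientific and methodological appropriateness would still need expert review.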
RESULTS: Protocols drafted by OpenLLaMa contained competent introductions, including detailed descriptions of interventions or diseases and of data sources. They proposed suitable statistical methods aligned with study objectives. Notably, the LLM-generated drafts served as comprehensive starting points, reducing the time needed to develop a submission-ready protocol to roughly one-third of the typical duration.
CONCLUSIONS: Our findings indicate that LLMs like OpenLLaMa can substantially streamline the drafting of research protocols. While further refinement by human experts is needed, the initial versions produced by the LLM are promising and could lead to considerable savings in work hours.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
MSR86
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas