USE ARTIFICIAL INTELLIGENCE TO PREDICT REGULATORY APPROVAL BASED ON PHASE III ONCOLOGY CLINICAL TRIAL PUBLICATIONS
Author(s)
Beverly Fuerte, PharmD, Da Sol Kim, PharmD, Kenneth Youens, MD, Timothy Reynolds, PharmD, MS, Linda Chen, PharmD, MS, Paul Godley, PharmD, Harry Liu, MS, PhD
Baylor Scott & White Health, Temple, TX, USA
OBJECTIVES: Better prediction of regulatory outcomes using artificial intelligence (AI) could streamline research and development. Such a shift could save the industry billions in wasted capital while giving society faster access to effective treatments. This study assesses the accuracy of AI in predicting regulatory approval from phase III oncology clinical trial publications and identifies key drivers of regulatory outcomes.
METHODS: A set of 208 phase III oncology clinical trials with overall survival endpoints, initiated between January 1990 and January 2021, was curated from the literature. Published phase III trial manuscripts were systematically processed by large language models (LLMs) to predict U.S. Food and Drug Administration (FDA) approval or denial using only information from the manuscript text. OpenAI’s GPT-4.1-mini and GPT-5-mini were selected for their cost-efficiency and comparable outputs. For each of twelve study attributes, the LLMs were prompted to assign a relative weight between -1 and 1, where positive values indicated a contribution towards approval and negative values a contribution towards denial. Weights were then rescaled to between 0 and 1, where larger values indicated greater contributions to the decision. A sensitivity analysis was conducted using retrieval-augmented generation (RAG) with FDA guidance serving as the knowledge base.
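The weight rescaling step can be illustrated with a minimal Python sketch. The abstract does not state the rescaling formula, so this assumes one plausible reading: the magnitude |w| of each signed weight is taken, so that strong contributions towards either approval or denial map to values near 1. The function name and the example weights are hypothetical and are not taken from the study.

```python
def rescale_weights(signed_weights):
    """Map signed weights in [-1, 1] to [0, 1] by taking their magnitude.

    ASSUMPTION: the abstract does not give the formula; using |w| is one
    reading of "larger values indicated greater decision contributions".
    """
    rescaled = {}
    for attribute, weight in signed_weights.items():
        if not -1.0 <= weight <= 1.0:
            raise ValueError(f"weight for {attribute!r} is outside [-1, 1]: {weight}")
        rescaled[attribute] = abs(weight)
    return rescaled


# Hypothetical weights for three of the twelve attributes (illustrative only).
example = {
    "primary_endpoint_type": 0.9,   # pushes towards approval
    "hazard_ratio": -0.7,           # pushes towards denial
    "overall_survival": 0.4,
}
importance = rescale_weights(example)
ranked = sorted(importance, key=importance.get, reverse=True)
```

Under this reading the sign carries the direction of the contribution and the rescaled value carries only its strength, which is what allows attributes to be ranked by salience.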
RESULTS: Of 208 studies, 79 were associated with FDA approval. Across all testing conditions, the models achieved 100% sensitivity for approval. GPT-4.1-mini achieved specificity of 85.3% with RAG and 84.5% without RAG. GPT-5-mini achieved specificity of 92.3% with RAG and 89.9% without RAG. The highest F1 score (0.940) and balanced accuracy (96%) were achieved by GPT-5-mini with RAG. The LLMs identified primary endpoint type, hazard ratio (HR), and overall survival (OS) as the most salient features influencing their decisions on regulatory outcomes.
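The relationship between the reported metrics can be checked with a short Python sketch. The abstract does not report a confusion matrix, so the counts below are an assumption back-calculated from the stated figures for GPT-5-mini with RAG: 79 approvals all predicted correctly, and roughly 10 false positives among the 129 non-approved trials. The function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall for the approval class
    specificity = tn / (tn + fp)            # recall for the denial class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# ASSUMED counts (not reported in the abstract): 208 trials, 79 approvals,
# no missed approvals, and 10 denials misclassified as approvals.
metrics = classification_metrics(tp=79, fp=10, tn=119, fn=0)
```

With these assumed counts the F1 score and balanced accuracy round to the reported 0.940 and 96%, while specificity comes out slightly below the reported 92.3%, so the true counts may differ from this reconstruction.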
CONCLUSIONS: AI predicted regulatory outcomes with high sensitivity (100%) and specificity (84.5% to 92.3%). Of the 12 study features assessed, performance on the primary endpoint, especially with regard to HR and OS, was identified as the key factor in predicting regulatory outcomes.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR165
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Oncology