AI Models for Predicting Clinical Trial Success: Capabilities and Risks of Pattern-Driven Approaches
Author(s)
Ruth Bartelli Grigolon, PhD1, Julia Lima, MSc1, Otavio Clark, PhD2, Elise Berliner, PhD3, Renato Mantelli Picoli, PhD4.
1Oracle Life Sciences, São Paulo, Brazil, 2Oracle Life Sciences, New York, NY, USA, 3Oracle Life Sciences, Austin, TX, USA, 4Oracle Life Sciences, Monte Azul Paulista, Brazil.
OBJECTIVES: This study investigates the use of artificial intelligence (AI) to predict clinical trial outcomes, focusing on two key approaches: HINT (Hierarchical Interaction Network for Clinical Trial Outcome Predictions) and TrialBench (Multi-Modal AI-Ready Clinical Trial Datasets). Both aim to improve drug development efficiency by forecasting trial success, patient dropout, adverse events, and dosing decisions using multimodal data.
METHODS: HINT employs a hierarchical neural network that integrates drug structures, disease phenotypes, and trial design criteria. TrialBench compiles 23 standardized datasets from ClinicalTrials.gov and other sources. Both platforms use deep learning models that combine textual, tabular, and ontological data.
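To make the modeling approach concrete, the sketch below shows a generic multimodal fusion classifier of the kind both platforms describe: separate encoders for textual, tabular, and ontological inputs whose embeddings are concatenated and passed to a prediction head. This is an illustrative assumption in PyTorch, not the published HINT or TrialBench architecture; all layer sizes, input dimensions, and names are hypothetical.

# Illustrative sketch only: a generic multimodal fusion classifier in the
# spirit of the models described above, NOT the published architectures.
# All dimensions and feature choices are hypothetical assumptions.
import torch
import torch.nn as nn

class MultimodalTrialClassifier(nn.Module):
    def __init__(self, text_dim=768, tabular_dim=32, onto_dim=128, hidden=256):
        super().__init__()
        # One encoder per modality: text (e.g., eligibility criteria),
        # tabular (trial design fields), ontology (disease codes).
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.tab_enc = nn.Sequential(nn.Linear(tabular_dim, hidden), nn.ReLU())
        self.onto_enc = nn.Sequential(nn.Linear(onto_dim, hidden), nn.ReLU())
        # Fusion head: concatenate modality embeddings, output success probability.
        self.head = nn.Sequential(
            nn.Linear(hidden * 3, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, text_emb, tabular, onto_emb):
        fused = torch.cat(
            [self.text_enc(text_emb), self.tab_enc(tabular), self.onto_enc(onto_emb)],
            dim=-1,
        )
        return torch.sigmoid(self.head(fused)).squeeze(-1)

# Smoke test with random inputs for a batch of 4 hypothetical trials.
model = MultimodalTrialClassifier()
p_success = model(torch.randn(4, 768), torch.randn(4, 32), torch.randn(4, 128))
print(p_success.shape)  # torch.Size([4])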
RESULTS: TrialBench showed high performance on tasks such as patient-dropout prediction (F1 > 0.95) and serious adverse event (SAE) prediction (F1 ≈ 0.93) but underperformed on dosing prediction, trial approval classification, and failure-cause identification (F1 < 0.50). HINT presented mixed results: F1 scores for outcome prediction were 0.66 in Phase I, 0.62 in Phase II, and 0.84 in Phase III, with precision varying by disease area (from 0.58 for neoplasms to 0.86 for respiratory diseases). Despite these promising results, both models face important challenges. They offer limited interpretability, complicating clinical and regulatory adoption, and their data sources are often incomplete, inconsistently labeled, or biased. Annotations produced by generative models may introduce further uncertainty. Both approaches mainly target small-molecule drugs, limiting applicability to biologics, vaccines, and devices. They also overlook social, operational, and contextual factors that can affect trial outcomes. Performance across tasks remains uneven, especially in classifying trial failure causes.
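For reference, F1 is the harmonic mean of precision and recall, so the scores above balance false positives against false negatives. The minimal sketch below computes F1 and precision with scikit-learn on synthetic labels; the values are placeholders, not results from either model.

# Minimal sketch of the evaluation metric reported above. The labels are
# synthetic placeholders, not data from HINT or TrialBench.
from sklearn.metrics import f1_score, precision_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]  # 1 = trial succeeded, 0 = failed
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]  # hypothetical model predictions

print("F1:", f1_score(y_true, y_pred))            # 2PR / (P + R)
print("Precision:", precision_score(y_true, y_pred))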
CONCLUSIONS: Both models show potential for predicting clinical trial outcomes but have important limitations. Their reliance on historical data may reinforce "me-too" drug development that concentrates on well-known therapeutic areas and standard dosing. Databases such as ClinicalTrials.gov overrepresent Phase I-III trials, Western regions, and successful outcomes, creating feedback loops that amplify bias and underrepresentation. Finally, both models lack robust validation.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR19
Topic
Clinical Outcomes, Methodological & Statistical Research, Study Approaches
Disease
No Additional Disease & Conditions/Specialized Treatment Areas