Optimizing Baseline Characteristics in Clinical Trials: Leveraging Machine Learning for Enhanced Trial Design and Cohort Selection
Author(s)
Paul Choudhury S1, Dutta S1, Dutta Majumdar A2, Chatterjee K1, Mahon R3
1PharmaQuant Insights Pvt. Ltd., Kolkata, West Bengal, India, 2PharmaQuant Insights Pvt. Ltd., Kolkata, WB, India, 3University of Galway, Galway, Ireland
OBJECTIVES: After framing the research question, selecting the study population for a clinical trial is crucial. Heterogeneity in baseline characteristics (BAC) is often observed among existing trials because trial populations, study settings, inclusion and exclusion criteria, and outcome assessment measures are not comparable, potentially leading to inconsistent estimates. Under the new European Health Technology Assessment (HTA) regulation, the optimality of trial designs at an early stage is poised to play a critical role. We propose a machine-learning (ML) based approach to arrive at optimized (lower-entropy) marginal distributions for chosen BACs during early-stage trial planning, aiding cohort selection and accurate subgroup analyses and improving sample specificity and recruitment for future trials.
METHODS: A simulated dataset with six baseline characteristics from forty clinical trials was used for the analysis. Based on the study features, the elbow method was used to choose the number of clusters: the within-cluster sum of squares (WCSS) was calculated for a range of candidate cluster counts, and the 'elbow' point, beyond which adding clusters yields only marginal reductions in WCSS, was identified. Next, K-means clustering was applied and adjusted for the available features using the Hartigan and Wong (1979) algorithm. Relative Importance Analysis (RIA) was conducted to evaluate the variability in BAC across trials and to pinpoint the factors contributing to this heterogeneity among the studies.
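The elbow step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes forty hypothetical trials with six baseline proportions drawn around arbitrary illustrative means, and it swaps in plain Lloyd-style K-means iterations from the Python standard library in place of the Hartigan and Wong (1979) algorithm named in the abstract. All names (`trials`, `feature_means`, `kmeans`, `wcss_by_k`) are assumptions introduced for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd-style K-means (assumption: stands in for Hartigan-Wong).

    Returns (centroids, labels, wcss).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid is the per-feature mean of the cluster members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the old centroid if the cluster is empty
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss

# Hypothetical simulated data: 40 trials x 6 baseline proportions.
# The means below are arbitrary placeholders, not values from the study.
data_rng = random.Random(42)
feature_means = [0.50, 0.55, 0.70, 0.75, 0.40, 0.30]
trials = [tuple(min(1.0, max(0.0, data_rng.gauss(m, 0.05))) for m in feature_means)
          for _ in range(40)]

# Fit K-means for each candidate number of clusters and record the WCSS.
fits = {k: kmeans(trials, k) for k in range(1, 7)}
wcss_by_k = {k: fit[2] for k, fit in fits.items()}

for k in range(1, 7):
    if k == 1:
        print(f"k={k}  WCSS={wcss_by_k[k]:.4f}")
    else:
        drop = (wcss_by_k[k - 1] - wcss_by_k[k]) / wcss_by_k[k - 1]
        print(f"k={k}  WCSS={wcss_by_k[k]:.4f}  relative drop={drop:.1%}")
```

WCSS generally keeps shrinking as k grows, so the elbow is read off as the k after which the relative drop becomes small, not as the k with the smallest WCSS. When the data form a single homogeneous group, as in this sketch, no pronounced elbow appears beyond k = 1, and the k = 1 centroid is simply the per-feature mean across trials.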
RESULTS: K-means clustering identified one optimal cluster. The baseline characteristics associated with the cluster's centroid (PD-L1 positive patients: 48%; ECOG 1 patients: 52%; male patients: 76%; white patients: 80%; patients with a smoking history: 35%) can inform the BAC distribution in future trials targeting specific cohorts. The RIA revealed that PD-L1 positive status (50%) and white race (20%) are the top BACs contributing to heterogeneity within the cluster.
CONCLUSIONS: ML can be a valuable tool in guiding clinical trial design by identifying critical baseline characteristics that ensure homogeneity and comparability across studies. This approach contributes to more effective and precise evidence-based medical decision-making.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
MSR227
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Trials
Disease
No Additional Disease & Conditions/Specialized Treatment Areas