Optimizing Baseline Characteristics in Clinical Trials: Leveraging Machine Learning for Enhanced Trial Design and Cohort Selection

Author(s)

Paul Choudhury S¹, Dutta S¹, Dutta Majumdar A², Chatterjee K¹, Mahon R³
¹PharmaQuant Insights Pvt. Ltd., Kolkata, West Bengal, India, ²PharmaQuant Insights Pvt. Ltd., Kolkata, West Bengal, India, ³University of Galway, Galway, Ireland

Presentation Documents

Optimizing Baseline Characteristics in Clinical Trials142432.pdf

OBJECTIVES: Once the research question is framed, selecting the study population is a crucial step in designing a clinical trial. Baseline characteristics (BACs) are often heterogeneous across existing trials because trial populations, study settings, inclusion and exclusion criteria, and outcome assessment measures are not comparable, which can lead to inconsistent estimates. Under the new European Health Technology Assessment (HTA) regulation, optimizing trial design at an early stage is poised to play a critical role. We propose a machine-learning (ML) based approach that derives optimized (lower-entropy) marginal distributions for selected BACs during early trial planning, with the aim of aiding cohort selection, supporting accurate subgroup analyses, and improving sample specificity and recruitment in future trials.

METHODS: A simulated dataset with six baseline characteristics from forty clinical trials was analyzed. Using the study features, the elbow method was applied: the within-cluster sum of squares (WCSS) was calculated for a range of cluster counts, and the 'elbow' point, beyond which adding clusters yields only marginal reductions in WCSS, was identified. Next, K-means clustering was applied and adjusted for the available features using the Hartigan and Wong (1979) algorithm. Relative Importance Analysis (RIA) was conducted to evaluate the variability in BACs across trials and to identify the characteristics contributing most to heterogeneity among the studies.
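As an illustration of the elbow method and K-means steps described in METHODS, the following is a minimal sketch and not the authors' implementation. It makes several assumptions: it uses Lloyd's algorithm rather than the Hartigan and Wong (1979) algorithm named in the abstract, it picks the elbow by the largest second difference of the WCSS curve (one of several possible heuristics), and the `trials` data are synthetic, with only two hypothetical characteristics instead of six to keep the example short.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, n_iter=100):
    """Lloyd's algorithm (a stand-in, NOT Hartigan-Wong).
    Returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct trials as starting centroids
    labels = []
    for _ in range(n_iter):
        # Assignment step: each trial joins its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss

def best_of_restarts(points, k, n_restarts=25):
    """Run k-means from several seeds and keep the lowest-WCSS solution."""
    return min((kmeans(points, k, seed) for seed in range(n_restarts)),
               key=lambda result: result[2])

def elbow_k(wcss_by_k):
    """Return the k with the largest second difference of the WCSS curve.
    This heuristic cannot return the smallest or largest k examined."""
    ks = sorted(wcss_by_k)
    return max(ks[1:-1],
               key=lambda k: wcss_by_k[k - 1] - 2 * wcss_by_k[k] + wcss_by_k[k + 1])

# Synthetic data (assumption): 40 trials, two characteristics expressed as
# proportions, drawn around three hypothetical group centres.
data_rng = random.Random(42)
group_centres = [(0.2, 0.2), (0.5, 0.8), (0.8, 0.3)]
group_sizes = [14, 13, 13]
trials = [
    tuple(c + data_rng.gauss(0, 0.03) for c in centre)
    for centre, size in zip(group_centres, group_sizes)
    for _ in range(size)
]

wcss_by_k = {k: best_of_restarts(trials, k)[2] for k in range(1, 7)}
best_k = elbow_k(wcss_by_k)
centroids, labels, _ = best_of_restarts(trials, best_k)
print(best_k, [tuple(round(v, 2) for v in c) for c in centroids])
```

The centroid of a chosen cluster is simply the per-characteristic mean of the trials assigned to it, which is how centroid values such as those reported in RESULTS can be read as target proportions.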

RESULTS: K-means clustering identified one optimal cluster. The baseline characteristics at the cluster's centroid (PD-L1-positive patients: 48%; ECOG 1 patients: 52%; male patients: 76%; white patients: 80%; patients with a smoking history: 35%) can inform the BAC distribution in future trials targeting specific cohorts. The RIA showed that PD-L1 positivity (50%) and the proportion of white patients (20%) were the top BACs contributing to heterogeneity within the cluster.
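The abstract does not specify how the RIA was computed. As a rough stand-in only, one simple way to express each characteristic's contribution to heterogeneity within a cluster is its share of the cluster's total sum of squared deviations from the centroid. The sketch below assumes this variance-share definition, and the function name, characteristic names, and values are all hypothetical; it is not the authors' method and does not reproduce their results.

```python
def heterogeneity_shares(rows, names):
    """Share of the within-cluster sum of squares attributable to each
    characteristic. A simple proxy, NOT necessarily the abstract's RIA."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]  # the cluster centroid
    sums_of_squares = [
        sum((x - m) ** 2 for x in col) for col, m in zip(columns, means)
    ]
    total = sum(sums_of_squares)
    if total == 0:
        raise ValueError("all trials are identical; shares are undefined")
    return {name: ss / total for name, ss in zip(names, sums_of_squares)}

# Hypothetical cluster of four trials; values are proportions of patients.
names = ["pd_l1_positive", "ecog_1", "male"]
cluster_trials = [
    (0.30, 0.50, 0.75),
    (0.70, 0.54, 0.77),
    (0.40, 0.50, 0.75),
    (0.60, 0.54, 0.77),
]

shares = heterogeneity_shares(cluster_trials, names)
top = max(shares, key=shares.get)
for name in sorted(shares, key=shares.get, reverse=True):
    print(f"{name}: {shares[name]:.1%}")
```

Because every characteristic here is a proportion on the same 0 to 1 scale, the raw sums of squares are directly comparable; characteristics measured in different units would need to be standardized first.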

CONCLUSIONS: ML can be a valuable tool for guiding clinical trial design by identifying the critical baseline characteristics that drive homogeneity and comparability across studies. This approach can contribute to more effective and precise evidence-based medical decision-making.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR227

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Trials

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
