DATA-DRIVEN BASELINE MATCHING: ENHANCING INDIRECT COMPARISONS WITH A MACHINE LEARNING-INFORMED FRAMEWORK FOR SELECTING HOMOGENEOUS TRIAL SETS

Author(s)

Saswata Paul Choudhury, MSc, Sekhar K. Dutta, MSc, Subhajit Gupta, MSc;
PharmaQuant Insights Private Limited, Kolkata, India
OBJECTIVES: Ensuring baseline comparability across randomized trials is essential for indirect treatment comparisons (ITCs). We present a machine-learning-informed methodology that quantifies similarity across trials and ranks all possible trial combinations, so that investigators can transparently select the most homogeneous pools for downstream comparative work.
METHODS: Ten trial-level baseline characteristics were simulated using appropriate underlying distributions across 8 trials (termed T1-T8). We applied multiple clustering algorithms to identify inherent groupings and computed pairwise dissimilarities between trials in a reduced latent-space representation. For every non-empty subset of trials, we derived complementary subset-level metrics that quantify typical within-group separation (e.g. mean pairwise distance), the maximum internal discordance, and the sample-size-weighted proximity to a pooled centroid. Distance-based metrics were then mapped to bounded similarity indices via a smooth kernel transformation. Penalization was applied to prevent very small trial subsets from being over-favored, as limited pairwise comparisons can exaggerate apparent homogeneity and reduce the robustness of network meta-analysis results.
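As an illustration only, the subset-scoring step described in METHODS might be sketched as follows. The abstract does not specify the kernel, the penalty, or how the metrics are combined, so everything here is an assumption: the 2-D latent coordinates and sample sizes are randomly generated placeholders, the kernel is taken to be exp(-d/scale), the penalty is a simple ramp on subset size, and the composite is an unweighted mean. The names subset_metrics, similarity, size_penalty, and composite are hypothetical.

```python
import itertools
import math
import random

random.seed(0)

# Placeholder inputs (assumptions): 2-D latent coordinates and sample sizes.
trials = [f"T{i}" for i in range(1, 9)]
latent = {t: (random.gauss(0, 1), random.gauss(0, 1)) for t in trials}
n_patients = {t: random.randint(100, 600) for t in trials}


def subset_metrics(subset):
    """Return (mean pairwise distance, max pairwise distance,
    sample-size-weighted mean distance to the pooled centroid)."""
    pts = [latent[t] for t in subset]
    pairs = [math.dist(p, q) for p, q in itertools.combinations(pts, 2)]
    mean_pair = sum(pairs) / len(pairs) if pairs else 0.0
    max_pair = max(pairs) if pairs else 0.0
    total_n = sum(n_patients[t] for t in subset)
    # Assumed: the pooled centroid is itself weighted by sample size.
    centroid = tuple(
        sum(n_patients[t] * latent[t][k] for t in subset) / total_n
        for k in range(2)
    )
    to_centroid = sum(
        n_patients[t] * math.dist(latent[t], centroid) for t in subset
    ) / total_n
    return mean_pair, max_pair, to_centroid


def similarity(d, scale=1.0):
    """Assumed smooth kernel: maps a distance in [0, inf) to (0, 1]."""
    return math.exp(-d / scale)


def size_penalty(k, k_ref=3):
    """Assumed penalty: down-weights subsets smaller than k_ref trials."""
    return min(1.0, k / k_ref)


def composite(subset):
    """Unweighted mean of the three similarity indices, times the penalty."""
    sims = [similarity(d) for d in subset_metrics(subset)]
    return size_penalty(len(subset)) * sum(sims) / len(sims)


# Every non-empty subset of the 8 trials: 2**8 - 1 = 255.
subsets = [c for k in range(1, 9) for c in itertools.combinations(trials, k)]
ranked = sorted(subsets, key=composite, reverse=True)

for s in ranked[:5]:
    print(", ".join(s), round(composite(s), 3))
```

Without the penalty, a single-trial subset would score a perfect 1.0 because it has no pairwise distances at all; the ramp is one simple way to keep such subsets from dominating the ranking, and other penalty shapes would serve the same purpose.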
RESULTS: A total of 255 non-empty trial subsets were analyzed. A dendrogram from the hierarchical clustering was used to visualize trial subset selection pathways. The composite similarity metric suggested 2 optimal trial combinations. T3, T5, and T6 were the most cohesive (mean 0.438; similarity 0.566) among 3-trial combinations, while T2, T3, T5, and T6 showed the highest internal consistency (mean 0.433; similarity 0.537) among 4-trial combinations.
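The merge sequence that a dendrogram visualizes could be produced along the following lines. This is a minimal sketch, not the authors' procedure: the abstract does not state the linkage method or distance measure, so average linkage on Euclidean distance is assumed, and the latent coordinates are random placeholders. The function names average_linkage and agglomerate are hypothetical.

```python
import itertools
import math
import random

random.seed(1)

# Placeholder latent coordinates for 8 trials (assumption, not study data).
points = {f"T{i}": (random.gauss(0, 1), random.gauss(0, 1)) for i in range(1, 9)}


def average_linkage(a, b):
    """Mean Euclidean distance over all cross-cluster pairs of trials."""
    total = sum(math.dist(points[s], points[t]) for s in a for t in b)
    return total / (len(a) * len(b))


def agglomerate(labels):
    """Repeatedly merge the two closest clusters; return the merge history
    as (cluster_a, cluster_b, merge_height) tuples."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            itertools.combinations(clusters, 2),
            key=lambda pair: average_linkage(*pair),
        )
        height = average_linkage(a, b)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, height))
    return merges


merges = agglomerate(sorted(points))
for a, b, height in merges:
    print(f"{'+'.join(a)} | {'+'.join(b)} at height {height:.3f}")
```

Each printed line corresponds to one junction in the dendrogram; reading the merges from first to last gives one possible pathway from tightly matched pairs toward larger, less homogeneous pools.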
CONCLUSIONS: This methodology provides a framework for scoring and ranking trial combinations and suggesting optimal homogeneous study pools. By providing comparable similarity metrics and visualizations across trial combinations, the approach enables informed pooling decisions and structured sensitivity analyses for indirect comparisons, underscoring the utility of ML-driven methods for balancing trial homogeneity in ITCs. Future validation is required to evaluate effects on bias and precision in comparative effectiveness.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR39

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
