SYNTHETIC PROGRESSION-FREE SURVIVAL DATA GENERATION FROM PUBLISHED AGGREGATE TRIAL DATA
Author(s)
Neha Tripathi, MPH1, Parampal Bajaj, BTech1, Akanksha Sharma, MSc1, Shubhram Pandey, MSc2
1Heorlytics Pvt. Ltd., Mohali, India, 2Heorlytics Pvt. Ltd., SAS Nagar, Mohali, India
OBJECTIVES: Individual patient data (IPD) are often unavailable in health technology assessment (HTA), which restricts indirect treatment comparisons for time-to-event outcomes. In oncology, survival results are frequently reported only in aggregate form, so alternative methods are needed to derive IPD. This study develops a framework for reconstructing progression-free survival (PFS) from published aggregate data and generating synthetic control cohorts for exploratory comparative analyses.
METHODS: A case study was conducted using aggregate data from the docetaxel monotherapy control arm of a phase II randomised trial for which IPD were unavailable. PFS times were simulated from a Weibull distribution calibrated to the reported median PFS of 3.9 months, with independent censoring applied. Baseline patient characteristics (age, sex, ECOG performance status, and prior therapy burden) were generated to match the published aggregate distributions. A synthetic control cohort was created using classification and regression tree (CART) methods to enable propensity-based matching. The aggregate-calibrated and synthetic cohorts were compared using Kaplan-Meier curves and median PFS estimates.
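The Weibull calibration and independent censoring step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract reports only the median PFS of 3.9 months, so the shape parameter, the exponential censoring distribution, its mean, the cohort size, and the function names `weibull_scale_from_median` and `simulate_pfs` are all assumptions chosen for illustration. The scale follows from setting S(median) = exp(-(median/scale)^shape) = 0.5.

```python
import math
import numpy as np


def weibull_scale_from_median(median, shape):
    """Return the Weibull scale for which the survival function equals 0.5 at `median`."""
    return median / math.log(2) ** (1.0 / shape)


def simulate_pfs(n, median=3.9, shape=1.2, censor_mean=12.0, seed=0):
    """Simulate observed PFS times (months) and progression indicators.

    Assumptions (not reported in the abstract): shape=1.2 and
    exponential censoring with mean `censor_mean` months.
    """
    rng = np.random.default_rng(seed)
    scale = weibull_scale_from_median(median, shape)
    # Latent time to progression or death, Weibull with the calibrated scale.
    event_time = scale * rng.weibull(shape, size=n)
    # Censoring times drawn independently of the event times.
    censor_time = rng.exponential(censor_mean, size=n)
    observed = np.minimum(event_time, censor_time)
    progressed = event_time <= censor_time  # True = event observed
    return observed, progressed


# Hypothetical cohort size; the abstract does not state it.
times, events = simulate_pfs(n=60)
```

With a small cohort, the sample median of the simulated times will generally differ from 3.9 months even though the underlying distribution has exactly that median.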
RESULTS: The aggregate-simulated control arm yielded a Kaplan-Meier median PFS of 4.5 months, a 15.4% deviation from the published median of 3.9 months. The synthetic control cohort showed a comparable PFS profile with overlapping confidence intervals. These deviations were consistent with the sampling variability expected in small samples and did not suggest systematic bias. Kaplan-Meier curves demonstrated adequate alignment between simulated and published survival trajectories.
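The reported deviation can be checked directly: (4.5 - 3.9) / 3.9 ≈ 15.4%. The sketch below shows one way to obtain a Kaplan-Meier median from simulated data and compute that deviation. The function names `km_median` and `relative_deviation_pct` are hypothetical, and the product-limit estimator is written out by hand only to keep the example self-contained; it is not a claim about the software used in the study.

```python
import numpy as np


def km_median(time, event):
    """Kaplan-Meier median: first event time at which the survival estimate is <= 0.5.

    Returns NaN if the estimated curve never reaches 0.5.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    for t in np.unique(time[event]):  # distinct event times, ascending
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        surv *= 1.0 - deaths / at_risk
        if surv <= 0.5:
            return float(t)
    return float("nan")


def relative_deviation_pct(estimate, reference):
    """Percentage deviation of `estimate` from `reference`."""
    return 100.0 * (estimate - reference) / reference


# Deviation of the simulated median (4.5 months) from the published median (3.9 months).
deviation = relative_deviation_pct(4.5, 3.9)
print(f"{deviation:.1f}%")
```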
CONCLUSIONS: Aggregate-driven simulation provides a pragmatic and reproducible approach to generating synthetic IPD when only aggregate trial data are available. Although it does not replace true IPD, it enables exploratory analyses and evidence synthesis in data-limited settings. This methodology may be valuable in HTA contexts requiring timely decision-making with limited data access. Future research should validate this approach against real IPD across different disease areas and outcome measures.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR91
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas