SYNTHETIC PROGRESSION-FREE SURVIVAL DATA GENERATION FROM PUBLISHED AGGREGATE TRIAL DATA

Author(s)

Neha Tripathi, MPH¹, Parampal Bajaj, BTech¹, Akanksha Sharma, MSc¹, Shubhram Pandey, MSc²
¹Heorlytics Pvt. Ltd., Mohali, India, ²Heorlytics Pvt. Ltd., SAS Nagar, Mohali, India

Presentation Documents

MSR91_ISPOR _Poster_synthetic PFS data generation Final.pdf

OBJECTIVES: Individual patient data (IPD) are often unavailable in health technology assessment (HTA), which restricts indirect treatment comparisons for time-to-event outcomes. In oncology, survival results are frequently reported only in aggregate form, so alternative methods are needed to derive IPD. This study develops a framework for reconstructing progression-free survival (PFS) from published aggregate data and generating synthetic control cohorts for exploratory comparative analyses.
METHODS: A case study was conducted using aggregate data from the docetaxel monotherapy control arm of a phase II randomised trial for which IPD were unavailable. PFS times were simulated from a Weibull distribution fitted to the reported median PFS of 3.9 months, with independent censoring applied. Baseline patient characteristics (age, sex, ECOG performance status, and prior therapy burden) were generated to match the published aggregate distributions. A synthetic control cohort was created using classification and regression tree (CART) methods to enable propensity-based matching. The aggregate-calibrated and synthetic cohorts were compared using Kaplan-Meier curves and median PFS estimates.
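As an illustration of the simulation step described in METHODS, the sketch below draws PFS times from a Weibull distribution whose scale is chosen so that its median equals the reported 3.9 months, then applies independent censoring. It is a minimal sketch and not the authors' implementation: the Weibull shape, the exponential censoring distribution and its mean, the sample size, and the function names are all assumptions, since the abstract reports none of them. The scale follows from the Weibull survival function S(t) = exp(-(t/scale)^shape) set to 0.5 at the median.

```python
import math
import random


def weibull_scale_from_median(median, shape):
    """Return the scale for which S(median) = 0.5 under S(t) = exp(-(t/scale)**shape)."""
    return median / (math.log(2) ** (1.0 / shape))


def simulate_pfs(n, median, shape, censor_mean, seed=0):
    """Simulate n (observed_time, progressed) pairs with independent censoring.

    shape, censor_mean and n are illustrative assumptions, not values
    taken from the abstract.
    """
    rng = random.Random(seed)
    scale = weibull_scale_from_median(median, shape)
    rows = []
    for _ in range(n):
        # random.weibullvariate takes (scale, shape) in that order.
        event_time = rng.weibullvariate(scale, shape)
        # Exponential censoring time drawn independently of the event time.
        censor_time = rng.expovariate(1.0 / censor_mean)
        observed = min(event_time, censor_time)
        rows.append((observed, event_time <= censor_time))
    return rows


# Hypothetical settings: only the 3.9-month median comes from the abstract.
cohort = simulate_pfs(n=60, median=3.9, shape=1.2, censor_mean=12.0, seed=1)
```

Because the median alone fixes only one of the two Weibull parameters, the shape has to come from elsewhere (for example another reported quantile or a sensitivity range); here it is simply set by hand.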
RESULTS: The aggregate-simulated control arm yielded a Kaplan-Meier median PFS of 4.5 months, a 15.4% deviation from the published median of 3.9 months. The synthetic control cohort showed a comparable PFS profile with overlapping confidence intervals. These deviations were consistent with the sampling variability expected in small samples and did not suggest systematic bias. Kaplan-Meier curves demonstrated adequate alignment between simulated and published survival trajectories.
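The two quantities reported in RESULTS can be sketched as follows: a Kaplan-Meier median (taken here as the smallest observed time at which the estimated survival falls to 0.5 or below) and the relative deviation of an estimate from the published median. This is a minimal sketch under that median convention; the function names are hypothetical and the abstract does not state which software or convention the authors used.

```python
def km_median(times, events):
    """Smallest time at which the Kaplan-Meier survival estimate is <= 0.5.

    times  : observed follow-up times
    events : True if progression was observed, False if censored
    Returns None if the estimate never reaches 0.5.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Handle all subjects tied at time t together.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_removed += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            if surv <= 0.5:
                return t
        at_risk -= n_removed
    return None


def relative_deviation_pct(estimate, reference):
    """Absolute deviation of estimate from reference, as a percentage of reference."""
    return 100.0 * abs(estimate - reference) / reference


# Values from the abstract: simulated median 4.5 vs published median 3.9 months.
deviation = relative_deviation_pct(4.5, 3.9)
```

With the abstract's values, the deviation is 0.6 / 3.9, which rounds to the reported 15.4%.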
CONCLUSIONS: Aggregate-driven simulation provides a pragmatic and reproducible approach to generating synthetic IPD when only aggregate trial data are available. Although it does not replace true IPD, it enables exploratory analyses and evidence synthesis in data-limited settings. This methodology may be valuable in HTA contexts that require timely decision-making with limited data access. Future research should validate the approach against real IPD across different disease areas and outcome measures.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR91

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)