Survival Extrapolations With and Without Incorporating External Evidence: Evaluating the Performance of Bayesian M-Spline Models vs. Parametric Models in First-Line Metastatic Castration-Resistant Prostate Cancer (mCRPC)
Author(s)
Jack Williams, PhD¹, Robert Hettle, MSc¹, Iain Reid Timmins, PhD².
¹Health Technology Assessment and Modelling Science, AstraZeneca, Cambridge, United Kingdom, ²Statistical Innovation, AstraZeneca, Cambridge, United Kingdom.
OBJECTIVES: To evaluate the performance of the survextrap R package, a Bayesian M-spline approach to survival extrapolation that can incorporate trial and external evidence, versus standard parametric models.
METHODS: Pseudo-patient data were recovered from overall survival (OS) Kaplan-Meier (KM) plots for enzalutamide from two mCRPC studies: TALAPRO-2 (NCT03395197) and PREVAIL (NCT01212991). TALAPRO-2 reported OS curves up to 41.4 months at primary analysis and 66.5 months at final analysis. PREVAIL reported OS curves up to 81.1 months. Standard parametric models and M-spline models (with and without incorporation of historical PREVAIL data) were fitted to the TALAPRO-2 primary analysis data. Extrapolation accuracy was evaluated by comparing landmark survival at 48 and 60 months against the TALAPRO-2 final analysis, and by estimating the root mean squared error (RMSE) of the extrapolations versus the final data over the duration of the curve.
RESULTS: TALAPRO-2 final OS was an estimated 39.5% and 30.7% at 48 and 60 months, respectively. The best-fitting parametric models (Weibull and gamma), selected by visual inspection against external PREVAIL evidence, and the M-spline model incorporating external evidence all provided a good fit to the TALAPRO-2 final data (within 5% at both landmark timepoints). The M-spline model without external data performed poorly (17.5%–21.2% difference across landmarks). RMSE was lowest for the M-spline model with external evidence (0.029), with the Weibull and gamma models slightly higher (0.031–0.037). The M-spline model without external evidence had a much higher RMSE (0.132), reflecting its poor fit.
CONCLUSIONS: The Bayesian M-spline model from the survextrap R package performed well when high-quality external evidence with long-term follow-up was available; in this example, it performed similarly to the best-fitting parametric models. However, when extrapolating into time periods without trial or external evidence, the M-spline extrapolations were highly uncertain and generally unreliable. Moreover, extrapolations were heavily influenced by the choice of knot number and knot location.
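The two accuracy measures named in the abstract (landmark differences and RMSE against the final-analysis curve) can be illustrated with a minimal Python sketch. It is not the authors' code, and the abstract does not state how the RMSE grid was defined, so the monthly grid is an assumption. The Weibull shape and scale and the KM values at 12, 24 and 36 months are hypothetical placeholders; only the 48- and 60-month survival estimates (39.5% and 30.7%) come from the abstract.

```python
import math

def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def km_step(t, km_times, km_surv):
    """Value of a Kaplan-Meier step function at time t (1.0 before the first drop)."""
    s = 1.0
    for time, surv in zip(km_times, km_surv):
        if time <= t:
            s = surv
        else:
            break
    return s

def landmark_diff(model, t, observed):
    """Extrapolated minus observed survival at a landmark time."""
    return model(t) - observed

def rmse(model, km_times, km_surv, horizon, step=1.0):
    """RMSE between a model curve and a KM step function.

    Assumption: errors are evaluated on a regular grid over (0, horizon].
    """
    n = int(horizon / step)
    grid = [step * (i + 1) for i in range(n)]
    sq_err = [(model(t) - km_step(t, km_times, km_surv)) ** 2 for t in grid]
    return math.sqrt(sum(sq_err) / len(sq_err))

# Hypothetical final-analysis KM summary (months, survival proportion).
# Only the 48- and 60-month values are taken from the abstract.
km_times = [12, 24, 36, 48, 60]
km_surv = [0.90, 0.72, 0.54, 0.395, 0.307]

# Hypothetical Weibull parameters standing in for a fit to primary-analysis data.
def model(t):
    return weibull_survival(t, shape=1.4, scale=50.0)

for t, obs in [(48, 0.395), (60, 0.307)]:
    print(f"{t} months: difference = {landmark_diff(model, t, obs):+.3f}")
print(f"RMSE over 60 months: {rmse(model, km_times, km_surv, horizon=60):.3f}")
```

With real data, the KM step function would be built from the reconstructed pseudo-patient data rather than a five-point summary, and the same two functions would be applied to each candidate model.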
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR191
Topic
Health Technology Assessment, Methodological & Statistical Research
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, Oncology