Testing a Survival Extrapolation Algorithm for Cancer Immunotherapies: Pass or Fail?

Author(s)

Latimer N1, Taylor K2, Hatswell A3, Ho S4, Okorogheye G4, Chen C4, Kim I5, Borrill J4, Bertwistle D4
1University of Sheffield & Delta Hat Limited, Sheffield, DBY, Great Britain, 2Delta Hat Limited, Long Eaton, NGM, UK, 3Delta Hat Ltd, Nottingham, UK, 4Bristol Myers Squibb, Uxbridge, UK, 5Bristol Myers Squibb, Lawrenceville, NJ, USA

OBJECTIVES: Accurately extrapolating survival beyond trial follow-up is essential in health technology assessment, and model choice often has a substantial impact on estimates of clinical benefit and cost-effectiveness. Immuno-oncology is especially affected because survival curves can flatten over time, suggesting durable long-term benefit. Palmer et al. (2022) developed an algorithm to aid survival model selection for immunotherapies. We present a practical demonstration of this algorithm using multiple data-cuts from the CheckMate-649 (CM-649) study. We aimed to assess the practical applicability of the algorithm and whether the survival models it identified, when fitted to earlier data-cuts, accurately predicted outcomes observed in later data-cuts.

METHODS: The Palmer et al. algorithm was used to (i) identify candidate survival models given external data, previously expressed expert beliefs, and diagnostic analyses undertaken on the CM-649 data, and (ii) define plausibility criteria that models must satisfy to be considered credible. Candidate models were fitted to the 12- and 24-month data-cuts, and their predictions were compared with the plausibility criteria and with outcomes observed in longer-term follow-up.
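
As an illustration of step (ii), the following minimal Python sketch checks whether a candidate model's extrapolated survival falls within pre-specified plausibility ranges at landmark times. It is not the authors' implementation: the function names, the parameter values, and the plausibility ranges are hypothetical and chosen only for illustration. The log-logistic and log-normal survival functions use their standard closed forms.

```python
import math

def loglogistic_survival(t, scale, shape):
    """Standard log-logistic survival: S(t) = 1 / (1 + (t / scale) ** shape)."""
    return 1.0 / (1.0 + (t / scale) ** shape)

def lognormal_survival(t, mu, sigma):
    """Standard log-normal survival: S(t) = 1 - Phi((ln t - mu) / sigma), t > 0."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def meets_criteria(survival_fn, criteria):
    """Return True if S(t) lies within [low, high] at every landmark time t."""
    return all(low <= survival_fn(t) <= high
               for t, (low, high) in criteria.items())

# Hypothetical plausibility criteria: landmark (months) -> (min, max) survival.
criteria = {60: (0.05, 0.20), 120: (0.02, 0.15)}

# Hypothetical fitted parameters, for illustration only (not CM-649 estimates).
candidates = {
    "log-logistic": lambda t: loglogistic_survival(t, scale=12.0, shape=1.2),
    "log-normal": lambda t: lognormal_survival(t, mu=2.5, sigma=1.3),
}

# Map each candidate model name to whether it satisfies every criterion.
plausible = {name: meets_criteria(fn, criteria) for name, fn in candidates.items()}
```

In practice the parameters would come from models fitted to each data-cut, and the criteria from the external data and expert beliefs described above.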

RESULTS: The algorithm was simple to use and offered a systematic procedure for model selection, encouraging highly detailed analyses and ensuring that crucial stages in the selection process were not overlooked. In our case study, log-normal, log-logistic, generalized gamma, cubic spline, and cure models were identified as candidate models. Of these, only log-logistic and non-mixture cure models (with cure assumed at 10-15 years post-baseline) provided survival estimates that met the plausibility criteria. Log-logistic models appeared to underestimate the survival observed in the 36-month data-cut, whereas non-mixture cure models performed well.
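
One simple way to compare an extrapolation with a later data-cut is to contrast the model's predicted survival at a landmark time with the Kaplan-Meier estimate from the longer follow-up. The sketch below assumes this approach; the abstract does not state how the comparison was made, and the helper names and toy data are hypothetical rather than taken from CM-649.

```python
def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of S(t).

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    surv = 1.0
    # Distinct event times up to and including t, in increasing order.
    event_times = sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, e in zip(times, events) if ti == u and e == 1)
        surv *= 1.0 - deaths / at_risk
    return surv

def prediction_error(predicted, times, events, landmark):
    """Predicted minus observed survival; a negative value means under-prediction."""
    return predicted - kaplan_meier(times, events, landmark)

# Toy data for illustration only: months of follow-up and event indicators.
toy_times = [6, 10, 14, 20, 30, 36, 36, 36]
toy_events = [1, 1, 0, 1, 1, 0, 0, 0]

# Hypothetical model prediction of survival at 36 months.
error_at_36 = prediction_error(0.40, toy_times, toy_events, landmark=36)
```

A negative error at the landmark would correspond to the kind of underestimation reported for the log-logistic models.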

CONCLUSIONS: The Palmer et al. algorithm provides a systematic framework for identifying suitable survival models, and for defining plausibility criteria for extrapolation validity. The algorithm requires that model selection is based on explicit justification and evidence. Use of this approach could reduce discordance in technology appraisals.