Random Survival Forest for Survival Extrapolation—Feasibility and Performance vs Parametric Modeling: An Application in Lung Cancer

Speaker(s)

Jewiti-Rigondza KJ¹, Neff-Baro S², Gauthier A³
¹Amaris Consulting, Paris, 75, France; ²Amaris Consulting, Tucson, AZ, USA; ³Amaris Consulting, London, UK

OBJECTIVES: This study assessed whether Random Survival Forest (RSF) can provide more reliable survival extrapolations than current methods based on parametric modeling.

METHODS: The study used data from the National Lung Screening Trial (NLST), which included 2,058 patients diagnosed with lung cancer.

Categorical variables were pre-processed using one-hot encoding. The RSF model was trained on a randomly drawn 70% of the sample. Validation used a test dataset truncated at two years, so that model predictions could be compared with observed values beyond two years.
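The pre-processing steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the records are hypothetical toy data (not NLST), the column names `stage`, `time_years`, and `event` are invented for the example, and the abstract does not detail how the two-year truncation was applied, so `truncate_follow_up` shows only one plausible form (administrative censoring at the horizon).

```python
import random

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per level."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Randomly assign train_fraction of the rows to training, the rest to test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def truncate_follow_up(rows, horizon_years=2.0):
    """Censor every patient still under follow-up after the horizon."""
    truncated = []
    for row in rows:
        new_row = dict(row)
        if row["time_years"] > horizon_years:
            new_row["time_years"] = horizon_years
            new_row["event"] = 0  # event after the horizon is not observed
        truncated.append(new_row)
    return truncated

# Hypothetical toy records (assumption), not NLST data.
patients = [
    {"stage": "I", "time_years": 4.1, "event": 0},
    {"stage": "II", "time_years": 1.2, "event": 1},
    {"stage": "III", "time_years": 0.6, "event": 1},
    {"stage": "I", "time_years": 3.3, "event": 1},
    {"stage": "IV", "time_years": 0.4, "event": 1},
    {"stage": "II", "time_years": 2.7, "event": 1},
    {"stage": "III", "time_years": 1.9, "event": 0},
    {"stage": "I", "time_years": 5.0, "event": 0},
    {"stage": "IV", "time_years": 0.9, "event": 1},
    {"stage": "II", "time_years": 2.2, "event": 0},
]

encoded = one_hot(patients, "stage")
train_set, test_set = split_train_test(encoded)
test_truncated = truncate_follow_up(test_set)
```

Fixing the random seed keeps the 70/30 split reproducible. In a real analysis the encoded matrix would be passed to an RSF implementation, which is outside the scope of this sketch.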

RSF was compared with the seven parametric distributions most commonly used in health technology assessment (HTA) submissions. Performance was assessed using the Mean Absolute Error (MAE) between each model's predicted survival probabilities and the observed probabilities in the validation dataset.
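As an illustration of the MAE comparison, the sketch below computes observed survival probabilities with a Kaplan-Meier estimator and compares them with a model's predictions on a grid of time points. The follow-up data, the evaluation grid, and the predicted values are hypothetical, and the abstract does not state which time points the study used, so the grid is an assumption.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def survival_at(curve, t):
    """Step-function lookup: last estimate at or before t (1.0 before the first event)."""
    value = 1.0
    for event_time, s in curve:
        if event_time <= t:
            value = s
        else:
            break
    return value

def mean_absolute_error(predicted, observed):
    """Average absolute difference between paired survival probabilities."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical follow-up data in years (event = 1 means death observed).
times = [1, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1]
km_curve = kaplan_meier(times, events)

# Assumed evaluation grid and hypothetical model predictions at those times.
grid = [2.5, 3.5, 4.5]
observed = [survival_at(km_curve, t) for t in grid]
predicted = [0.62, 0.55, 0.33]
mae = mean_absolute_error(predicted, observed)
```

The same `mean_absolute_error` call can be repeated for each candidate model, so that RSF and the parametric fits are scored against an identical set of observed probabilities.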

RESULTS: All variables related to patient characteristics and treatment patterns with less than 10% missing data were included in the model, resulting in the selection of 547 variables.
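The missing-data rule above can be expressed as a simple filter. The following is a minimal sketch, assuming missing values are coded as `None`; the variable names and toy records are hypothetical and do not come from NLST.

```python
def select_variables(rows, max_missing_fraction=0.10):
    """Keep variables whose share of missing values (None) is below the threshold."""
    n = len(rows)
    kept = []
    for name in rows[0].keys():
        missing = sum(1 for row in rows if row.get(name) is None)
        if missing / n < max_missing_fraction:
            kept.append(name)
    return kept

# Hypothetical records: 'pack_years' is 5% missing, 'biomarker' is 25% missing.
records = [
    {
        "age": 60 + i,
        "pack_years": None if i == 0 else 30,
        "biomarker": None if i < 5 else 1.2,
    }
    for i in range(20)
]

selected = select_variables(records)
```

With these toy records, `age` and `pack_years` pass the threshold and `biomarker` is dropped.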

The RSF-predicted survival curve fell within the 95% confidence interval of the observed Kaplan-Meier curve across the time horizon, whereas none of the parametric models achieved this.
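A check of this kind can be sketched as follows. The abstract does not state how the confidence interval was computed, so the plain (linear) Greenwood interval used here is an assumption, as are the toy data and the helper names `km_with_ci` and `within_band`.

```python
import math

def km_with_ci(times, events, z=1.96):
    """Kaplan-Meier estimate with linear Greenwood limits at each event time.

    Returns a list of (t, S(t), lower, upper) tuples.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival, greenwood_sum = 1.0, 0.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            if at_risk > deaths:  # avoid division by zero when everyone at risk dies
                greenwood_sum += deaths / (at_risk * (at_risk - deaths))
            se = survival * math.sqrt(greenwood_sum)
            lower = max(0.0, survival - z * se)
            upper = min(1.0, survival + z * se)
            rows.append((t, survival, lower, upper))
        at_risk -= removed
    return rows

def within_band(rows, predict):
    """True if predict(t) lies inside the interval at every event time."""
    return all(lower <= predict(t) <= upper for t, _, lower, upper in rows)

# Hypothetical follow-up data in years (event = 1 means death observed).
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 0, 1, 1, 0, 1, 0, 0]
band = km_with_ci(times, events)

# Hypothetical model: exponential survival with an assumed hazard of 0.1 per year.
inside = within_band(band, lambda t: math.exp(-0.1 * t))
```

Here `inside` is a single pass/fail flag; reporting the time points at which a model leaves the band would give a more informative comparison between models.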

RSF also outperformed the parametric models on MAE, with a value of 0.01% compared with a range of 0.04% to 0.1% for the parametric models, whose Akaike information criterion (AIC) values were very similar up to two years.

CONCLUSIONS: This study suggests that predictive modeling using RSF has the potential to provide more reliable survival estimates than traditional parametric modeling while incorporating a large number of variables that reflect prognostic factors. Further research is needed to determine the data requirements for implementing this algorithm.

Code

MSR90

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Oncology