FORECASTING CLINICAL OUTCOMES FROM LIMITED REAL-WORLD DATA: A COMPARATIVE SIMULATION STUDY
Author(s)
Awa Diop, PhD1, Sheena Kayaniyil, PhD2, Lise Retat, PhD3, Sarah Collier, MSc4, Diar Fattah, PhD5, Stefan Franzén, PhD6.
1AstraZeneca, Mississauga, ON, Canada, 2AstraZeneca, Mississauga, ON, Canada, 3AstraZeneca Spain, Barcelona, Spain, 4AstraZeneca Canada, Mississauga, ON, Canada, 5AstraZeneca, Barcelona, Spain, 6AstraZeneca Sweden, Gothenburg, Sweden.
OBJECTIVES: There is growing appreciation for population-based, location-specific real-world data (RWD) to drive targeted, informed decision-making. Estimating and evaluating future trends in clinical outcomes at national and sub-national levels is essential for effective local planning. We therefore compared modeling approaches to identify which is best suited to forecasting future trends in key clinical outcomes. We assessed the robustness and short-horizon performance of a linear model (LM), a generalized additive model (GAM), and an autoregressive integrated moving average (ARIMA) model, each combined with interrupted time series (ITS) terms to capture COVID-related disruptions.
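For orientation, one common way to write the ITS component of a linear model is the segmented regression below. This is a generic textbook form, not necessarily the specification used in this study; the symbols $t_0$ (assumed start of the disruption), $D_t$, and the coefficients $\beta_0, \dots, \beta_3$ are illustrative assumptions.

```latex
% Segmented-regression form of an interrupted time series (illustrative assumption)
y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 (t - t_0) D_t + \varepsilon_t,
\qquad
D_t =
\begin{cases}
0, & t < t_0,\\
1, & t \ge t_0.
\end{cases}
```

Here $\beta_2$ captures an immediate level shift at the interruption and $\beta_3$ a change in slope afterwards. In a GAM or ARIMA setting the same indicator terms would typically enter as covariates alongside smooth or autoregressive components.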
METHODS: A simulation study of 40 scenarios was conducted using a data-generating process calibrated to our data, which comprised one national series and 18 regional series. Scenarios covered both chronic obstructive pulmonary disease (COPD) hospital admission rates and counts, with and without seasonality, and T = 10 to 50 time points of follow-up. Sub-national regions were analyzed separately and pooled, with and without random effects. Forecasts were evaluated via rolling origin at horizons of 1, 3, and 5 future time points, and accuracy and uncertainty metrics were reported.
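The rolling-origin evaluation described above can be sketched as follows for the LM + ITS case. This is a minimal illustration under stated assumptions, not the study's code: the function names (`its_design`, `rolling_origin_mae`), the interruption index `t0`, the minimum training length, the use of mean absolute error, and all simulated parameter values are hypothetical.

```python
import numpy as np

def its_design(t, t0):
    """Design matrix for a segmented linear model:
    intercept, linear trend, level shift at t0, slope change after t0."""
    post = (t >= t0).astype(float)
    return np.column_stack([np.ones(len(t)), t, post, post * (t - t0)])

def rolling_origin_mae(y, t0, horizons=(1, 3, 5), min_train=16):
    """Refit at every forecast origin and return the mean absolute
    error per horizon. min_train should exceed t0 by a few points so
    that the post-interruption terms are identifiable."""
    t = np.arange(len(y), dtype=float)
    errors = {h: [] for h in horizons}
    for origin in range(min_train, len(y)):
        # Fit by ordinary least squares on observations 0 .. origin-1.
        beta, *_ = np.linalg.lstsq(its_design(t[:origin], t0),
                                   y[:origin], rcond=None)
        for h in horizons:
            target = origin - 1 + h  # h steps past the last training point
            if target < len(y):
                pred = its_design(t[target:target + 1], t0) @ beta
                errors[h].append(abs(y[target] - pred[0]))
    return {h: float(np.mean(e)) for h, e in errors.items() if e}

# Hypothetical simulated series: T = 30 points, interruption at t0 = 12.
rng = np.random.default_rng(0)
T, t0 = 30, 12
t = np.arange(T, dtype=float)
signal = 20 + 0.1 * t - 3.0 * (t >= t0) + 0.05 * (t >= t0) * (t - t0)
y = signal + rng.normal(0.0, 0.5, size=T)

mae = rolling_origin_mae(y, t0)
mae_noise_free = rolling_origin_mae(signal, t0)
print(mae)
```

Refitting at each origin mimics how forecasts would be produced as new data arrive. The GAM and ARIMA variants would replace only the fitting and prediction steps inside the loop.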
RESULTS: Without seasonality, across 10 to 50 time points, LM + ITS, GAM + ITS, and ARIMA + ITS achieved similar prediction errors (e.g., for T = 10 and a horizon of 3 future time points, mean errors were 1.78, 1.79, and 1.80, respectively). With seasonality, ARIMA + ITS achieved lower forecasting errors overall. At the regional level, ARIMA + ITS without pooling consistently delivered the best forecasts. As the number of time points increases, the benefit of modeling cross-region correlation grows: pooled and hierarchical fits can better distinguish shared shocks from region-specific noise. However, adding correlations mainly helps uncertainty quantification rather than markedly reducing point error.
CONCLUSIONS: ARIMA + ITS is a flexible and reliable approach for timely, informed decision-making. It adapts to different data configurations and can account for seasonality and interruptions in the time series.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR22
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Respiratory-Related Disorders (Allergy, Asthma, Smoking, Other Respiratory)