Boosting Predictive Power: Thoughtfully Unleashing the Potential of Machine Learning In Real-World Healthcare Outcome Estimation
Author(s)
Achal Patel, PhD¹, Pluto Zhang, MS², Zhiyu Xia, PhD¹, Gurleen S. Jhuti, MSc¹, Daniel Sheinson, PhD³
¹Genentech, South San Francisco, CA, USA, ²Harvard, Boston, MA, USA, ³Genentech, Inc., Principal Data Scientist, South San Francisco, CA, USA
OBJECTIVES: Machine learning (ML) has been proposed as a more accurate approach to predictive modeling in health economics and outcomes research (HEOR) compared to traditional methods (TM). However, supporting evidence comparing the two approaches is limited by inconsistent reporting of results, biased comparisons, and incorrect validation procedures. A rigorous comparison of ML vs TM is needed to better understand the added value of ML in real-world data (RWD) studies.
METHODS: We evaluated the performance of ML vs TM for predicting annual healthcare costs (typically challenging to predict using TM) across multiple dimensions, including sample size and the number of baseline independent variables, in cohorts of multiple sclerosis (MS) and breast cancer (BC) patients identified in the IQVIA PharMetrics® Plus closed-claims database. For ML, we applied XGBoost and deep neural networks with hyperparameter tuning; for TM, we employed linear regression with restricted cubic splines for continuous independent variables. Performance was assessed using 10-fold cross-validation and quantified by R² and the slope of the calibration curve.
RESULTS: ML outperformed TM at large sample sizes using conventional HEOR variables for prediction, while performance of the models was comparable at smaller sample sizes. Adding variables based on clinical classification of claim codes improved ML predictive performance. Using splines improved performance for the MS cohort but less so for the BC cohort in the TM analysis. XGBoost offered the best predictive performance among ML methods.
CONCLUSIONS: ML approaches performed as well as TM at smaller sample sizes and improved cost prediction in larger samples. Potential efficiencies in study conduct were observed with ML. Researchers should balance the potential gains in predictive performance against the interpretability of results when considering ML for a given study. This work allows us to make recommendations for optimal ML use in HEOR.
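ILLUSTRATIVE SKETCH: The evaluation described in METHODS (10-fold cross-validation scored by R² and calibration slope) can be sketched in a few lines of Python. This is not the authors' pipeline. It assumes synthetic cost data, a single-predictor least-squares model standing in for both the ML and TM models, out-of-fold predictions pooled across folds before scoring, and a calibration slope defined as the slope from regressing observed on predicted values. All function names are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, computed on held-out predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def calibration_slope(observed, predicted):
    """Slope from regressing observed on predicted; 1.0 is ideal."""
    return fit_line(predicted, observed)[1]


def cross_validate(x, y, k=10, seed=0):
    """Pool out-of-fold predictions from k folds, then score them once."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    out_of_fold = [0.0] * len(x)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            out_of_fold[i] = intercept + slope * x[i]
    return r_squared(y, out_of_fold), calibration_slope(y, out_of_fold)


# Synthetic example (assumption): cost rises linearly with one baseline covariate.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [1000 + 250 * xi + rng.gauss(0, 400) for xi in x]

r2, slope = cross_validate(x, y, k=10)
print(f"cross-validated R2 = {r2:.3f}, calibration slope = {slope:.3f}")
```

Scoring only held-out predictions keeps the comparison between model families fair, since neither model is evaluated on data it was fitted to. A calibration slope below 1 would suggest predictions that are too extreme, which is a common sign of overfitting.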
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
P15
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Neurological Disorders, SDC: Oncology