Healthcare Costs in Older Adults: XGBoost, NNA, and Logit
Author(s)
Jie Chen, PhD.
Professor and Chair, University of Maryland, College Park, MD, USA.
OBJECTIVES: To predict healthcare costs among older adults using a comprehensive dataset integrating Medicare claims, socioeconomic measures, and patient-reported healthcare experience measures, and to compare the performance of logistic regression, XGBoost, and neural network analysis (NNA) in predicting high total medical costs.
METHODS: We linked CMS fee-for-service (FFS) Medicare claims data with Consumer Assessment of Healthcare Providers and Systems (CAHPS) survey data. The study focused on Medicare FFS beneficiaries aged 65 and older. The primary outcome was high total annual Medicare cost, defined as the top 25% of the cost distribution. This cross-sectional analysis was guided by the Andersen Behavioral Model of Health Services Use. Following this framework, we controlled for predisposing, need, and enabling factors, including socioeconomic status (SES) and self-reported health measures. We implemented logistic regression, XGBoost, and NNA models and compared their performance in predicting the outcome.
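As an illustration of the outcome definition, the top-quartile flag could be constructed along the following lines. This is a minimal sketch, not the authors' code: the function name `flag_high_cost`, the sample cost values, and the choice of the "inclusive" quantile method are all assumptions made for the example.

```python
from statistics import quantiles


def flag_high_cost(costs):
    """Return 1 for each cost above the 75th percentile, else 0.

    Hypothetical helper: the abstract only states that "high cost" means
    the top 25% of the cost distribution, not how the cut point was computed.
    """
    # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = quantiles(costs, n=4, method="inclusive")[2]
    return [1 if c > q3 else 0 for c in costs]


# Made-up annual costs for eight beneficiaries (illustrative only).
annual_costs = [1200, 3400, 560, 18000, 7400, 2300, 950, 41000]
labels = flag_high_cost(annual_costs)
```

With these eight made-up values, the two largest costs fall above the third quartile and receive the high-cost flag.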
RESULTS: Depression and heart disease emerged as the strongest predictors of high cost. The NNA and XGBoost models performed comparably to logistic regression, with similar sensitivity, specificity, positive predictive value, and negative predictive value.
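The four performance measures named above all derive from the same confusion-matrix counts. The sketch below shows one way to compute them for any binary classifier; the function name `classification_metrics` and the example labels are hypothetical and do not reproduce the study's data or results.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, PPV, and NPV from binary labels.

    Hypothetical helper for illustration; 1 = high cost, 0 = not high cost.
    A metric is returned as None when its denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true high-cost cases detected
        "specificity": ratio(tn, tn + fp),  # share of non-high-cost cases detected
        "ppv": ratio(tp, tp + fp),          # share of predicted high-cost that are correct
        "npv": ratio(tn, tn + fn),          # share of predicted non-high-cost that are correct
    }


# Made-up labels and predictions (illustrative only).
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Applying the same function to the predictions of each model on a common held-out sample gives a like-for-like comparison of the kind described in the results.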
CONCLUSIONS: These findings underscore the importance of preventive strategies targeting mental health and heart disease to promote whole-person care and reduce healthcare costs. Moreover, traditional statistical approaches such as logistic regression can perform as effectively as machine learning methods such as XGBoost and NNA, which emphasizes the value of theory-driven, interpretable models for informing healthcare practice and policy.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
HSD56
Topic
Economic Evaluation, Health Policy & Regulatory, Health Service Delivery & Process of Care
Disease
Geriatrics