PREDICTING TREATMENT-RESISTANT DEPRESSION AMONG MEDICARE BENEFICIARIES USING MACHINE LEARNING
Author(s)
Tim C. Lai, MSc1, Jingyi Zheng, PhD2, Jingjing Qian, PhD1, Cherry W. Jackson, PharmD3, Kimberly B. Garza, MBA, PharmD, PhD1, Richard H. Chapman, MS, PhD4, Surachat Ngorsuraches, PhD1
1Auburn University, Health Outcomes Research and Policy, College of Pharmacy, Auburn, AL, USA, 2Auburn University, Mathematics and Statistics, College of Science and Mathematics, Auburn, AL, USA, 3Auburn University, Pharmacy Practice, College of Pharmacy, Auburn, AL, USA, 4Center for Innovation & Value Research, Alexandria, VA, USA
OBJECTIVES: To develop machine learning models predicting treatment-resistant depression (TRD) among Medicare beneficiaries in the United States.
METHODS: We analyzed data from the 2017-2022 Medicare Current Beneficiary Survey (MCBS). The study cohort included beneficiaries with diagnosed depression. Outcomes were categorized as TRD (exposure to ≥2 antidepressants without remission, defined as an 8-item Patient Health Questionnaire [PHQ-8] score ≥5, or use of atypical antipsychotics or ≥3 antidepressants) versus non-TRD (remission [PHQ-8 score <5] following any antidepressant treatment). Predictors included demographics, comorbidities, physical or cognitive function (e.g., difficulty concentrating), and social determinants of health (SDoH; e.g., social activity, food insecurity). We trained random forest (RF), support vector machine (SVM), and eXtreme Gradient Boosting (XGBoost) models, using logistic regression as a benchmark. We applied five-fold cross-validation during training and used a temporal split (training: 2017-2021; test: 2022) to assess out-of-sample performance. To enhance model utility, we applied the AUC-weighted least absolute shrinkage and selection operator (LASSO) to select a subset of training features. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), with 95% confidence intervals (CIs) obtained via bootstrap.
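The evaluation design described above (a temporal split followed by a bootstrap CI for the test-set AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the `score` field stands in for any fitted model's predicted risk, and the helper names (`temporal_split`, `bootstrap_auc_ci`), the percentile method, and the number of resamples are all hypothetical choices that the abstract does not specify.

```python
import random

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def temporal_split(rows, last_train_year=2021):
    """Train on survey years up to last_train_year, test on later years."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test

def bootstrap_auc_ci(y_true, scores, n_boot=500, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI (assumed method, for illustration)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, scores), lo, hi

# Synthetic stand-in data: 200 rows per year, roughly 31% positives.
data_rng = random.Random(42)
rows = []
for year in range(2017, 2023):
    for _ in range(200):
        trd = 1 if data_rng.random() < 0.31 else 0
        # Hypothetical model score: higher on average for TRD cases.
        rows.append({"year": year, "trd": trd,
                     "score": data_rng.gauss(1.0 if trd else 0.0, 1.0)})

train, test = temporal_split(rows)
y_test = [r["trd"] for r in test]
s_test = [r["score"] for r in test]
point, lo, hi = bootstrap_auc_ci(y_test, s_test)
print(f"Test AUC: {point:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

Holding out the most recent year, rather than a random subset, tests whether a model fitted on earlier survey years still discriminates in later data.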
RESULTS: We included 1,363 observations, of which 31% were classified as TRD. Compared with the logistic benchmark (AUC: 0.78; 95% CI: 0.71-0.85), the RF (AUC: 0.85; 95% CI: 0.78-0.91; p=0.046), XGBoost (AUC: 0.83; 95% CI: 0.76-0.89; p=0.148), and SVM (AUC: 0.82; 95% CI: 0.75-0.88; p=0.092) models had numerically higher AUCs, although only the RF p-value was below 0.05. Age, limited social activity, and cognitive decline had high variable importance. Notably, the RF model trained on the LASSO-selected variables retained robust performance (AUC: 0.84; 95% CI: 0.78-0.90).
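The abstract reports a p-value for each model against the logistic benchmark but does not state which test produced them. One common way to compare two models scored on the same test set is a paired bootstrap of the AUC difference, sketched below as an assumption rather than as the authors' procedure. The data are synthetic, and the function name `paired_bootstrap_auc_diff` and its parameters are hypothetical.

```python
import random

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_auc_diff(y, scores_a, scores_b, n_boot=300, alpha=0.05, seed=0):
    """Observed AUC(a) - AUC(b) with a percentile CI from paired resampling.

    Each resample draws the same row indices for both models, so the
    correlation between their scores on shared cases is preserved.
    """
    rng = random.Random(seed)
    n = len(y)
    observed = auc(y, scores_a) - auc(y, scores_b)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        a = auc(ys, [scores_a[i] for i in idx])
        b = auc(ys, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return observed, lo, hi

# Synthetic test set: model A sees the underlying signal with less noise than model B.
data_rng = random.Random(7)
y, model_a, model_b = [], [], []
for _ in range(200):
    label = 1 if data_rng.random() < 0.31 else 0
    signal = 1.0 if label else 0.0
    y.append(label)
    model_a.append(signal + data_rng.gauss(0.0, 0.8))
    model_b.append(signal + data_rng.gauss(0.0, 1.5))

diff, lo, hi = paired_bootstrap_auc_diff(y, model_a, model_b)
print(f"AUC difference (A - B): {diff:.3f} (95% CI: {lo:.3f} to {hi:.3f})")
```

A CI for the difference that excludes zero points to a real gap between the two models on this test set, which is more informative than comparing two overlapping single-model CIs.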
CONCLUSIONS: Machine learning models using MCBS data identified age, cognitive function, and limited social activity as critical predictors of TRD. The robust performance of a parsimonious RF model suggests that health systems and policymakers could use these factors with an RF model to proactively identify high-risk populations for individualized interventions.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR8
Topic
Methodological & Statistical Research