A PILOT STUDY APPLYING MACHINE LEARNING (ML) TO IDENTIFY PREDICTORS OF HIGH-COST CARE AMONG ADULTS WITH NON-HODGKIN LYMPHOMA (NHL) USING MEPS 2020-2023
Author(s)
Philip Yeung, MBA, MS, MSc, PharmD1, Dimitrios Tzachanis, MD, PhD2
1KUMC, Student, San Diego, CA, USA, 2Scripps Health, Medical Oncology, San Diego, CA, USA
OBJECTIVES: Prior studies demonstrated that total healthcare expenditures among patients with NHL exceed those of many other cancer populations. This study aimed to identify predictors associated with extremely high healthcare expenditures among U.S. adults with NHL.
METHODS: Adults with NHL were identified from the Medical Expenditure Panel Survey (MEPS) 2020-2023. Patients in the top 20% of annual healthcare expenditures were classified as high-cost. Person-level records were linked with condition-level data to construct demographic, socioeconomic, access-to-care, mental health, and clinical variables, including a modified Charlson Comorbidity Index (CCI) derived from ICD-10-CM codes. A weighted eXtreme Gradient Boosting (XGBoost) classifier was trained using an 80/20 train-test split with five-fold cross-validation to identify top predictors of high-cost status; model interpretability was evaluated using global and local SHAP (SHapley Additive exPlanations) values. To avoid data leakage, health expenditures, utilization measures, and prescription counts were excluded from model training.
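The labeling and leakage-exclusion steps described in the methods can be illustrated with a minimal sketch. It assumes the 80th-percentile cutoff is computed with survey weights (the abstract does not say whether the threshold was weighted), and every column name and toy record below is hypothetical rather than an actual MEPS variable or study value.

```python
from typing import Dict, List, Tuple

# Hypothetical column names standing in for expenditure, utilization,
# and prescription-count variables that would leak the outcome.
LEAKAGE_COLUMNS = {"total_expenditure", "office_visits", "rx_fills"}
# The survey weight is kept out of the feature set; it would be passed
# to the classifier as a sample weight instead (an assumption).
NON_FEATURE_COLUMNS = LEAKAGE_COLUMNS | {"survey_weight"}


def weighted_quantile(values: List[float], weights: List[float], q: float) -> float:
    """Return the smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running / total >= q:
            return value
    return pairs[-1][0]


def label_and_strip(
    records: List[Dict[str, float]], q: float = 0.80
) -> Tuple[float, List[int], List[Dict[str, float]]]:
    """Flag records above the weighted q-th expenditure quantile,
    then drop leakage and weight columns from the feature dictionaries."""
    cutoff = weighted_quantile(
        [r["total_expenditure"] for r in records],
        [r["survey_weight"] for r in records],
        q,
    )
    labels = [int(r["total_expenditure"] > cutoff) for r in records]
    features = [
        {k: v for k, v in r.items() if k not in NON_FEATURE_COLUMNS}
        for r in records
    ]
    return cutoff, labels, features


# Toy records, invented purely for illustration.
records = [
    {"total_expenditure": 500.0, "survey_weight": 2.0, "office_visits": 1, "rx_fills": 0, "age": 45, "cci": 0},
    {"total_expenditure": 1200.0, "survey_weight": 3.0, "office_visits": 3, "rx_fills": 2, "age": 58, "cci": 1},
    {"total_expenditure": 3000.0, "survey_weight": 2.0, "office_visits": 4, "rx_fills": 5, "age": 63, "cci": 2},
    {"total_expenditure": 8000.0, "survey_weight": 2.0, "office_visits": 9, "rx_fills": 8, "age": 70, "cci": 3},
    {"total_expenditure": 90000.0, "survey_weight": 1.0, "office_visits": 20, "rx_fills": 15, "age": 74, "cci": 5},
]

cutoff, labels, features = label_and_strip(records)
print(cutoff, labels, sorted(features[0]))
```

Deriving the label before the leakage columns are removed keeps the outcome definition intact while ensuring that nothing computed from spending or utilization reaches the model as a predictor.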
RESULTS: The final analytic cohort included 269 NHL patients (mean age 60 years; weighted N=849,583) with non-zero total healthcare expenditures. The high-cost cohort (weighted n=174,653) incurred substantially higher mean healthcare expenditures than the non-high-cost cohort: total ($74,352.0 vs. $6,437.3), medications ($21,554.8 vs. $1,553.7), and hospitalizations ($18,420.6 vs. $216.4) (all p<0.001). High-cost patients reported lower family incomes ($68,522 vs. $95,517; p=0.042). The ML model achieved a best cross-validated AUC of 0.71. SHAP analyses identified key predictors of high-cost status, including higher modified CCI score, older age, lower family income, difficulty reaching providers after hours, travel time to care exceeding one hour, and cost-related dental care delays.
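As a quick arithmetic check on the reported figures, the short sketch below recomputes the weighted share of the high-cost cohort and the high-cost to non-high-cost ratio for each mean expenditure. The numbers are taken directly from the results; the variable names are illustrative only.

```python
# Weighted population counts reported in the results.
weighted_total = 849_583
weighted_high_cost = 174_653
high_cost_share = weighted_high_cost / weighted_total

# Mean expenditures in USD: (high-cost cohort, non-high-cost cohort).
mean_costs = {
    "total": (74_352.0, 6_437.3),
    "medications": (21_554.8, 1_553.7),
    "hospitalizations": (18_420.6, 216.4),
}

# Ratio of high-cost to non-high-cost mean spending per category.
cost_ratios = {name: high / low for name, (high, low) in mean_costs.items()}

print(f"Weighted high-cost share: {high_cost_share:.1%}")
for name, ratio in cost_ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

The weighted share comes out at about 20.6%, in line with the top-20% definition, and the gap is widest for hospitalizations (about 85-fold) compared with more than 11-fold for total spending and about 14-fold for medications.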
CONCLUSIONS: Although predictive performance was modest, this analysis of a nationally representative sample demonstrates the value of interpretable machine-learning approaches for understanding drivers of high-cost care among adults with NHL. The findings highlight the importance of addressing comorbidity burden, cost-related delays in care, and access barriers in patients with NHL.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
EPH208
Topic
Epidemiology & Public Health
Disease
SDC: Oncology