Examining Fairness in Machine Learning Predictions of Healthcare Costs in Older Women With Osteoarthritis: XGBoost Regression

Speaker(s)

Elchehabi S1, Dehghan A2, Park C3, Sambamoorthi N4, Shen C5, Sambamoorthi U6
1University of North Texas Health Science Center, League City, TX, USA, 2University of North Texas Health Science Center, Denton, TX, USA, 3The University of Texas at Austin, Austin, TX, USA, 4University of North Texas Health Science Center, Fort Worth, TX, USA, 5Penn State College of Medicine, Hershey, PA, USA, 6University of North Texas Health Science Center, Denton, TX, USA

OBJECTIVES: Osteoarthritis (OA) is the most common joint disorder worldwide, and studies suggest that women are at higher risk than men. OA is also associated with high direct healthcare costs because its disease management is complex. Machine learning (ML) methods have been used to identify the leading predictors of expenditures among older women with OA. However, ML predictions of economic burden must be equitable across sensitive attributes such as race and ethnicity and socioeconomic status. This study quantifies the fairness of an ML model's expenditure predictions across these sensitive attributes.

METHODS: A cross-sectional study was conducted using data on women aged 65 years or older (≥65 years) with OA from the 2021 Medical Expenditure Panel Survey (MEPS). Key predictors of log-transformed expenditures were identified using eXtreme Gradient Boosting (XGBoost) regression and SHapley Additive exPlanations (SHAP). The sensitive attributes were race and ethnicity and socioeconomic status (education and poverty status). Model fairness was assessed in two ways: group fairness, by comparing MAE, RMSE, and R-squared across subgroups, and counterfactual fairness, using swap tests.
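The two fairness checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `subgroup_metrics` and `swap_test` are hypothetical, the sensitive attribute is assumed to be a numerically coded column of the feature matrix, and `predict` stands in for any fitted regressor's prediction function (for example, the `predict` method of an XGBoost `XGBRegressor`). The abstract does not describe the exact swap protocol; here each record's sensitive level is flipped between two chosen levels and the resulting change in predicted log expenditure is recorded.

```python
import numpy as np


def subgroup_metrics(y_true, y_pred, groups):
    """Hypothetical helper: MAE, RMSE, and R-squared computed within each subgroup."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    groups = np.asarray(groups)
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        err = y_true[mask] - y_pred[mask]
        mae = float(np.mean(np.abs(err)))
        rmse = float(np.sqrt(np.mean(err ** 2)))
        # R-squared within the subgroup; undefined if the outcome is constant.
        ss_tot = np.sum((y_true[mask] - y_true[mask].mean()) ** 2)
        r2 = float(1.0 - np.sum(err ** 2) / ss_tot) if ss_tot > 0 else float("nan")
        results[g] = (mae, rmse, r2)
    return results


def swap_test(predict, X, col, level_a, level_b):
    """Hypothetical counterfactual swap test.

    Assumes column `col` holds a numerically coded sensitive attribute.
    For records at level_a or level_b, the level is swapped to the other
    value and all other features are held fixed; the function returns the
    per-record change in prediction (swapped minus original).
    """
    X = np.asarray(X, dtype=float)
    keep = np.isin(X[:, col], [level_a, level_b])
    X_orig = X[keep]
    X_swap = X_orig.copy()
    X_swap[:, col] = np.where(X_orig[:, col] == level_a, level_b, level_a)
    return predict(X_swap) - predict(X_orig)
```

For a model that ignores the sensitive attribute, every swap difference is zero; large mean absolute differences suggest the prediction depends on group membership itself. Comparing subgroup RMSE values (rather than only the overall RMSE) is what reveals disparities that a strong overall fit can hide.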

RESULTS: The overall mean healthcare expenditure was $18,619. Incremental expenditures relative to privileged counterparts were: $3,398 (non-Hispanic Black, NHB), $1,255 (Hispanic), -$1,903 (non-Hispanic Asian, NHA), $4,005 (Other race), $4,866 (poor), $4,866 (near poor), $8,717 (middle income), $4,291 (less than high school [HS]), $4,544 (HS), and -$1,574 (some college). SHAP analysis identified age, HS education, some college education, and comorbidities as leading predictors. Overall model fit was strong (RMSE = 0.20 and R-squared = 0.96 for log-transformed expenditures). However, RMSE was higher for NHB, Hispanic, NHA, and Other race; poor income; and HS education than for non-Hispanic White (NHW) race, high income, and college education.

CONCLUSIONS: Although the ML model showed high overall accuracy (low RMSE and high R-squared), subgroup fairness metrics and swap testing suggested bias in its predictions. The value of identifying leading predictors with ML models should be further assessed, with fairness metrics reported for sensitive attributes.

Code

MSR16

Topic

Economic Evaluation, Health Policy & Regulatory, Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Health Disparities & Equity, Surveys & Expert Panels

Disease

Geriatrics, Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal)