Ethical AI in Healthcare: Evaluating Fairness in Colorectal Cancer Survivor Healthcare Expenditures Predictions with Interpretable Machine Learning (ML) Models

Author(s)

Zhou B¹, Gupta M², Pathak M³, Siddiqui ZA⁴, Sambamoorthi N¹, Niranjan S⁵, Sambamoorthi U³
¹University of North Texas Health Sciences Center, Fort Worth, TX, USA, ²Southern Methodist University, Dallas, TX, USA, ³University of North Texas Health Sciences Center, Denton, TX, USA, ⁴West Virginia University, Morgantown, WV, USA, ⁵University of North Texas, Denton, TX, USA

OBJECTIVES: ML predictions of healthcare expenditures need to be fair across sensitive attributes such as gender, race and ethnicity, socioeconomic status, and geographic location to achieve health equity. This study quantifies the fairness of an ML algorithm's healthcare expenditure predictions across sensitive attributes.

METHODS: We conducted a retrospective cohort analysis, with a one-year baseline and a one-year follow-up period, of older adult survivors of incident colorectal cancer diagnosed between 2010 and 2017 (N = 32,702) using the SEER-Medicare database. Total expenditures (in 2018 dollars) were derived from Medicare payments for inpatient, outpatient, prescription drugs, home health, and other services. Sensitive attributes comprised sex, geographic location, poverty status, and race and ethnicity. eXtreme Gradient Boosting (XGBoost) and Shapley additive explanations (SHAP) with patient-, county-, and census tract-level features (N = 86) were used. Fairness was evaluated using differences in RMSE, R-squared, and MAE across subgroups, and counterfactual fairness was evaluated using flip tests. Subgroup differences in these metrics between -0.1 and +0.1 were considered fair.
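The subgroup comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the toy data, the choice of a reference group, and the inclusive treatment of the ±0.1 bounds are all assumptions, and the abstract does not state the scale on which expenditures were modeled.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

METRICS = {"rmse": rmse, "mae": mae, "r2": r_squared}

def subgroup_gaps(y_true, y_pred, groups, reference, tolerance=0.1):
    """Metric gap of each subgroup versus a reference subgroup.

    Returns {group: {metric: (gap, is_fair)}}, where is_fair means
    -tolerance <= gap <= +tolerance (inclusive bounds are an assumption).
    """
    def subset(group):
        idx = [i for i, g in enumerate(groups) if g == group]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    ref_true, ref_pred = subset(reference)
    ref_scores = {name: fn(ref_true, ref_pred) for name, fn in METRICS.items()}

    report = {}
    for group in sorted(set(groups) - {reference}):
        g_true, g_pred = subset(group)
        report[group] = {}
        for name, fn in METRICS.items():
            gap = fn(g_true, g_pred) - ref_scores[name]
            report[group][name] = (gap, abs(gap) <= tolerance)
    return report

# Toy data (hypothetical): group "A" is predicted perfectly, group "B" is not.
y_true = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 2.5, 3.5]
groups = ["A"] * 4 + ["B"] * 4

report = subgroup_gaps(y_true, y_pred, groups, reference="A")
for metric, (gap, is_fair) in report["B"].items():
    print(f"B vs A, {metric}: gap={gap:+.2f}, fair={is_fair}")
```

Comparing each subgroup against a single reference group (for example, NHB versus NHW) is one of several possible designs; pairwise or subgroup-versus-overall comparisons would follow the same pattern.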

RESULTS: The distribution of sensitive attributes was: 58% female, 9% non-metro, 31% poor, 8% NHB, 6% Hispanic, 7% AA/PI/AN, and 1% other race. Mean healthcare expenditures were $66,753 ± $53,770. The incremental expenditures were: -$5,463 (females), -$9,877 (non-metro), $15,881 (poor), $8,943 (NHB), $8,143 (Hispanic), $3,945 (AA/PI/AN), and $8,999 (multiple race). For the overall model, the RMSE and R-squared on the test dataset were 0.65 and 0.50, respectively. The top 4 leading predictors were multimorbidity, cancer treatment (chemotherapy and surgery), and localized cancer stage. RMSE was higher for NHB, poor, and male survivors than for NHW, non-poor, and female survivors. Counterfactual flip tests did not reveal any differences in fairness metrics.
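A flip test of the kind referred to above can be illustrated as follows. This is a hedged sketch only: `toy_predict` is a hypothetical stand-in for a trained model, the feature names are invented, and summarizing the result as a mean prediction shift and a share of changed predictions (reusing 0.1 as the tolerance) is an assumption, since the abstract does not describe how the flipped predictions were compared.

```python
def flip_test(predict, rows, attribute, flip, tolerance=0.1):
    """Re-predict each row with one sensitive attribute flipped.

    predict   : callable mapping a feature dict to a numeric prediction
    rows      : list of feature dicts
    attribute : name of the sensitive attribute to flip
    flip      : callable mapping the observed value to its counterfactual
    Returns the mean prediction shift and the share of rows whose
    prediction moved by more than `tolerance`.
    """
    shifts = []
    for row in rows:
        counterfactual = dict(row)  # copy so the original row is untouched
        counterfactual[attribute] = flip(row[attribute])
        shifts.append(predict(counterfactual) - predict(row))
    mean_shift = sum(shifts) / len(shifts)
    share_changed = sum(abs(s) > tolerance for s in shifts) / len(shifts)
    return {"mean_shift": mean_shift, "share_changed": share_changed}

def toy_predict(row):
    # Hypothetical model that ignores the sensitive attribute entirely.
    return 2.0 + 0.5 * row["multimorbidity"]

def biased_predict(row):
    # Hypothetical model that adds a fixed amount for one group.
    return toy_predict(row) + 0.3 * row["female"]

rows = [
    {"female": 1, "multimorbidity": 3},
    {"female": 0, "multimorbidity": 1},
]

fair_result = flip_test(toy_predict, rows, "female", lambda v: 1 - v)
biased_result = flip_test(biased_predict, rows, "female", lambda v: 1 - v)
print(fair_result)
print(biased_result)
```

Note that the mean shift can be zero even when every individual prediction changes, because shifts in opposite directions cancel; reporting the share of changed predictions alongside it guards against that.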

CONCLUSIONS: Fairness metrics of XGBoost models suggested bias in predictions for NHB, males, and poor colorectal cancer survivors. Predictive ML models need to routinely report fairness metrics for sensitive attributes along with model performance and leading predictors.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Acceptance Code

P9

Topic

Economic Evaluation, Health Policy & Regulatory, Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Health Disparities & Equity

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, Oncology
