Machine Learning in Healthcare: Analyzing Cost Prediction Biases to Ensure Health Equity Among Older Adults With Incident Hodgkin’s Lymphoma

Speaker(s)

Siddiqui ZA1, Mbous Y2, Nduaguba S1, LeMasters T1, Scott VG3, Patel J4, Sambamoorthi U5
1West Virginia University, Morgantown, WV, USA, 2West Virginia University, Chicago, IL, USA, 3West Virginia University, School of Pharmacy, Morgantown, WV, USA, 4Temple University, Philadelphia, PA, USA, 5University of North Texas Health Sciences Center, Denton, TX, USA

OBJECTIVES: The objective of this study was to assess bias and fairness in machine learning (ML) models that predict healthcare expenditures across patients' sensitive attributes.

METHODS: This retrospective study used SEER-Medicare data on older adults diagnosed with primary incident Hodgkin's lymphoma (HL) between 2009 and 2017. XGBoost with ten-fold cross-validation and random forest (RF) ML models were used to predict log-transformed healthcare expenditures (Medicare payments). Sensitive attributes consisted of gender, race, and poverty status, measured as dual Medicare/Medicaid eligibility. Cost predictions for historically underprivileged groups (women, non-White, and poor) were compared with those for privileged groups (men, White, and not poor). Model bias and fairness across sensitive attributes were assessed using R2, RMSE, and error distributions. Additionally, the robustness of model performance was evaluated by varying the optimized hyperparameters and visually summarized with kernel density estimation (KDE) plots. Group fairness was also evaluated through counterfactual analyses by flipping membership between underprivileged and privileged groups.
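The sketch below illustrates the general form of the subgroup error analysis and counterfactual "flip test" described above; it is not the study's actual code. Column names (log_cost, female, non_white, dual_eligible), the feature set, and the hyperparameter values are hypothetical placeholders assumed for illustration, and SEER-Medicare data cannot be shared here.

```python
# Minimal sketch, assuming a preprocessed DataFrame with a log-transformed cost
# outcome ("log_cost") and binary sensitive attributes coded 1 = underprivileged,
# 0 = privileged. All names and hyperparameters are illustrative, not the study's.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_predict
from xgboost import XGBRegressor


def subgroup_rmse(y_true, y_pred, group_mask):
    """RMSE within one level of a sensitive attribute."""
    m = np.asarray(group_mask, dtype=bool)
    return float(np.sqrt(mean_squared_error(np.asarray(y_true)[m], np.asarray(y_pred)[m])))


def fit_and_evaluate(df, features, sensitive_cols, target="log_cost"):
    """Overall and subgroup performance from ten-fold cross-validated predictions."""
    X, y = df[features], df[target]
    models = {
        "xgboost": XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05),
        "random_forest": RandomForestRegressor(n_estimators=500, random_state=0),
    }
    results = {}
    for name, model in models.items():
        y_pred = cross_val_predict(model, X, y, cv=10)  # out-of-fold predictions
        overall = {
            "r2": r2_score(y, y_pred),
            "rmse": float(np.sqrt(mean_squared_error(y, y_pred))),
        }
        # Subgroup error analysis: RMSE for underprivileged vs. privileged levels.
        by_group = {
            col: {
                "underprivileged": subgroup_rmse(y, y_pred, df[col] == 1),
                "privileged": subgroup_rmse(y, y_pred, df[col] == 0),
            }
            for col in sensitive_cols
        }
        results[name] = {"overall": overall, "by_group": by_group}
    return results


def flip_test(model, df, features, sensitive_col, target="log_cost"):
    """Counterfactual check: flip one sensitive attribute and compare predictions.

    The sensitive column must be among the model features for the flip to matter.
    """
    X = df[features]
    fitted = model.fit(X, df[target])
    X_flipped = X.copy()
    X_flipped[sensitive_col] = 1 - X_flipped[sensitive_col]  # swap group membership
    delta = fitted.predict(X_flipped) - fitted.predict(X)
    return float(np.mean(delta)), float(np.std(delta))
```

In this setup, a large gap between the underprivileged and privileged RMSEs, or a systematic shift in predictions after flipping a sensitive attribute, would flag potential unfairness for that attribute.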

RESULTS: The study analyzed 902 patients (52.5% female, 18.6% non-White, 20.3% poor). The XGBoost and RF models had overall R2 (RMSE) scores of 0.42 (0.57) and 0.36 (0.60), respectively. Subgroup error analysis indicated that model performance varied across sensitive attributes, with a lower RMSE for non-White patients (0.52 [95% CI: 0.42-0.62] vs. 0.576 [95% CI: 0.49-0.66]) and higher RMSEs for female and poor patients in both XGBoost and RF. The models were less robust for the underprivileged groups across the sensitive attributes. Counterfactual fairness assessment with the flip test revealed no significant differences in model performance.

CONCLUSIONS: Model performance and the counterfactual fairness assessment did not reveal unfairness across the sensitive attributes. However, the robustness analysis showed potential bias against underprivileged groups. Predictive ML models of healthcare expenditures need to incorporate fairness measures to ensure equitable predictions across all sensitive demographic attributes.

Code

RWD75

Topic

Health Policy & Regulatory, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Health Disparities & Equity

Disease

Oncology, Rare & Orphan Diseases