Can an Ensemble Machine-Learning Approach Outperform Traditional Models While Enhancing Accuracy, Fairness, and Interpretability in Clinical Risk Prediction?
Author(s)
Salma Barkaoui, PhD1, Mohammed BENNANI2, Sena Nur Bilgin, Master1, Jerome Vetillard, PhD, MD1.
1Qualees, Paris, France, 2PhD, QUALEES, PARIS, France.
OBJECTIVES: To assess whether an ensemble machine learning (ML) approach can improve predictive performance, fairness, and interpretability in both classification and regression tasks for clinical risk prediction, compared to traditional ML and statistical models.
METHODS: We evaluated an ensemble ML framework against standard models, including Ridge Regression, Random Forest, Decision Tree, Support Vector Regression (SVR), and XGBoost, across four healthcare datasets (Diabetes, Parkinson’s Disease, Cervical Cancer, and Breast Cancer). The ensemble integrates multiple base learners and employs Bayesian optimization for performance tuning. Classification metrics included precision, recall, F1 score, and AUC-ROC. For regression, Mean Squared Error (MSE) was used. SMOTETomek was applied to correct class imbalance.
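The abstract does not say how the base learners' outputs are combined. As one common possibility, and purely as an illustrative assumption, the sketch below shows weighted soft voting: each base learner's class probabilities are averaged and the class with the highest average is predicted. The function name `soft_vote` and the example probabilities are hypothetical, not taken from the study.

```python
def soft_vote(prob_lists, weights=None):
    """Combine class-probability vectors from several base learners.

    prob_lists: one probability vector per base learner, all of equal length.
    weights:    optional per-learner weights (assumed non-negative, not all zero).
    Returns (predicted_class_index, averaged_probabilities).
    """
    if not prob_lists:
        raise ValueError("at least one base learner is required")
    n_classes = len(prob_lists[0])
    if any(len(p) != n_classes for p in prob_lists):
        raise ValueError("all probability vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(prob_lists)  # plain (unweighted) average
    if len(weights) != len(prob_lists):
        raise ValueError("one weight per base learner is required")

    total_weight = sum(weights)
    # Weighted mean of each class probability across learners.
    averaged = [
        sum(w * p[k] for w, p in zip(weights, prob_lists)) / total_weight
        for k in range(n_classes)
    ]
    # Predict the class with the largest averaged probability.
    predicted = max(range(n_classes), key=lambda k: averaged[k])
    return predicted, averaged


# Hypothetical probabilities from three base learners for one patient.
label, probs = soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
print(label, [round(p, 3) for p in probs])
```

In a scheme like this, the per-learner weights are the kind of quantity a hyperparameter search could tune, though the abstract does not state which parameters the Bayesian optimization covered.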
RESULTS: In classification tasks, the ensemble model achieved a weighted F1 score of 0.995, outperforming all baseline models (e.g., Artificial Neural Network [ANN]: 0.9845). AUC values exceeded 0.99 across datasets. Confusion matrices indicated a strong predictive balance across classes. For regression tasks, the ensemble model also showed superior stability and lower error: its MSE sensitivity was five times lower than that of Ridge Regression. These results demonstrate the ensemble's robustness and consistency under varying conditions. Additionally, the approach showed improved interpretability potential, though further validation is required in real-world settings.
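For readers unfamiliar with the headline metric, the weighted F1 score is the mean of the per-class F1 scores, each weighted by that class's share of the true labels. The sketch below is a minimal pure-Python illustration of that standard definition; the function name `weighted_f1` and the toy labels are illustrative assumptions and are not data from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: samples of this class predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy example: three negatives and one positive, with one false positive.
print(round(weighted_f1([0, 0, 0, 1], [0, 0, 1, 1]), 4))
```

Because each class is weighted by its support, a high weighted F1 can be dominated by the majority class, which is why it is usually read alongside per-class precision and recall on imbalanced clinical data.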
CONCLUSIONS: An ensemble ML approach optimized via Bayesian methods offers significant performance and stability gains over traditional models in both classification and regression for healthcare risk prediction. It effectively handles class imbalance, enhances fairness, and maintains computational efficiency, making it a strong candidate for scalable, real-world clinical decision support systems.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR51
Topic
Methodological & Statistical Research, Real World Data & Information Systems, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Diabetes/Endocrine/Metabolic Disorders (including obesity), No Additional Disease & Conditions/Specialized Treatment Areas