Leading Predictors of Incident Hypertension Among Patients with Cancer in Community Health Centers: A Machine Learning Approach with Shapley Additive Explanations Using Electronic Health Records
Author(s)
Park C1, Han S2, Sambamoorthi U3
1The University of Texas at Austin, Austin, TX, USA, 2The University of Texas at Austin, Austin, TX, USA, 3University of North Texas Health Sciences Center, Denton, TX, USA
OBJECTIVES: Hypertension and cancer are interconnected through common risk factors, pathophysiological pathways, and the effects of certain anticancer drugs. This study aims to develop machine learning (ML)-based prediction models for new-onset hypertension among adults with cancer and to identify key contributing factors.
METHODS: This retrospective cohort study used nationwide electronic health records (EHRs) from the OCHIN community health information network, a multistate collaboration of Community Health Centers (CHCs). OCHIN’s EHR data cover 170 health systems and 1,600 clinic sites serving more than 2.8 million adults (≥18 years of age) across 33 states. Hypertension-free patients with cancer were identified using diagnosis records and blood pressure values, with hypertension defined as diastolic blood pressure (DBP) ≥80 mmHg or systolic blood pressure (SBP) ≥130 mmHg. We employed a decision tree and three ensemble algorithms (random forest, AdaBoost, and XGBoost), incorporating features that included cancer types, biological factors (age and sex), comorbidities, co-medications, and vitals (body mass index (BMI), and DBP and SBP within the normotensive range). Models were fit on training data and tuned with 5×2-fold cross-validation, and the final model was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F2 score. The Shapley Additive exPlanations (SHAP) method was used to identify leading predictors.
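The 5×2-fold cross-validation scheme named in the methods can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's code: the function name `five_by_two_splits`, the random seed, and the toy sample size are assumptions, and no stratification by outcome is performed because the abstract does not say whether it was used.

```python
import random

def five_by_two_splits(n_samples, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated 2-fold cross-validation.

    Hypothetical helper for illustration; with n_repeats=5 it yields
    10 train/test pairs (5 repetitions x 2 folds).
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        half = n_samples // 2
        fold_a, fold_b = shuffled[:half], shuffled[half:]
        # Each half serves once as the training set and once as the held-out set.
        yield fold_a, fold_b
        yield fold_b, fold_a

# Toy usage with 10 samples (assumed size, unrelated to the study cohort).
splits = list(five_by_two_splits(10))
```

Each repetition reshuffles the data before splitting it in half, so a candidate hyperparameter setting is scored on ten held-out halves rather than on a single split.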
RESULTS: Among 88,237 patients with cancer (mean age=45.44 years; 70.93% female), 6.19% (n=5,465) developed new-onset hypertension. XGBoost displayed the best performance (AUC=0.83; accuracy=0.68; precision=0.14; recall=0.83; F2 score=0.42). Older age, higher SBP within the normotensive range, higher BMI, and higher DBP within the normotensive range were identified as the leading predictors of new-onset hypertension.
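The reported F-score of 0.42 matches the F2 score named in the methods rather than the more common F1 score, which can be checked from the reported precision and recall. The sketch below uses the standard F-beta formula; the helper name `f_beta` is an assumption for illustration, and the inputs are the rounded values from the abstract.

```python
def f_beta(precision, recall, beta):
    """Standard F-beta score: weights recall beta times as heavily as precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom

# Rounded precision and recall reported for XGBoost in the abstract.
f2 = f_beta(0.14, 0.83, beta=2.0)
f1 = f_beta(0.14, 0.83, beta=1.0)
```

Rounded to two decimals, `f2` reproduces the reported 0.42, whereas `f1` is about 0.24, which shows how much the recall-weighted metric favors a model with high recall and low precision.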
CONCLUSIONS: The study demonstrates that ML can predict new-onset hypertension in patients with cancer, with XGBoost performing best among the algorithms tested. The identified predictors, namely age, BMI, and blood pressure levels even within the normotensive range, are crucial for managing and preventing hypertension in this group.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
CO8
Topic
Clinical Outcomes, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Outcomes Assessment
Disease
Cardiovascular Disorders (including MI, Stroke, Circulatory), Oncology