DEFINING COUNTY LEVEL SOCIAL DETERMINANTS OF HEALTH BASED PHENOTYPES RELATED TO DEPRESSION IN THE UNITED STATES USING MACHINE LEARNING APPROACH
Author(s)
Md Fitrat Hossain, PhD, Taraneh Mousavi, PharmD, Fadia T. Shaya, MPH, PhD.
University of Maryland, Baltimore, MD, USA.
OBJECTIVES: Depression prevalence has been associated with social determinants of health (SDoH). To better characterize this association, we developed SDoH-based phenotypes related to depression using machine learning algorithms.
METHODS: Data on 19 SDoH variables were collected for 2347 counties across the US from PolicyMap®, a geographic information system platform. To ensure consistency of the predictor variables between the training and test sets, counties were divided into two clusters using k-means clustering based on these predictors. From each cluster, 70% of the counties were selected for the training set and 30% for the test set. A classification and regression tree (CART) model was fitted on the training set to define phenotypes from SDoH. Random forest analysis was also conducted to identify additional risk factors.
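The pipeline described in METHODS can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not name the software used, so scikit-learn is an assumption, the data are synthetic stand-ins for the PolicyMap® variables, and the hyperparameters (`max_leaf_nodes=3`, `n_estimators=200`, the random seeds) are hypothetical choices. The impurity-based `feature_importances_` is used here as a rough analogue of the node-purity ranking mentioned in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the county table: 2347 rows, 19 SDoH predictors.
n_counties, n_features = 2347, 19
X = rng.normal(size=(n_counties, n_features))
# Hypothetical outcome: a depression rate driven mostly by the first predictor.
y = 20 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=2.0, size=n_counties)

# Step 1: two k-means clusters computed on the predictors only.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: 70/30 split drawn within each cluster (stratified on cluster label).
X_train, X_test, y_train, y_test, c_train, c_test = train_test_split(
    X, y, clusters, test_size=0.30, stratify=clusters, random_state=0
)

# Step 3: a small regression tree; its leaves act as candidate phenotypes.
cart = DecisionTreeRegressor(max_leaf_nodes=3, random_state=0).fit(X_train, y_train)
n_phenotypes = cart.get_n_leaves()

# Step 4: random forest, ranking predictors by impurity-based importance.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
top_features = np.argsort(forest.feature_importances_)[::-1][:5]
```

Stratifying the split on the cluster label keeps the share of each cluster nearly identical in the training and test sets, which is the consistency property the METHODS text aims for.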
RESULTS: The CART model identified three phenotypes (A, B, C) for depression, characterized by food insecurity and access to healthcare providers. These divided the counties into three distinct groups with significantly different mean depression rates (F-statistic: 280.7, p-value < 0.001). Counties with Phenotype-C (food insecurity rate ≥ 15% and access to healthcare ≥ 74%) had the highest depression rates (mean 25.57, 95% CI [25.3, 25.8]). There was little difference in model performance between the training and test sets (training RMSE: 2.76, test RMSE: 2.86). Based on node purity, the random forest models identified additional risk factors: percentage of beneficiaries, foreign-born residents, and people with disability.
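As an illustration of how the reported Phenotype-C rule and the RMSE metric could be applied, the sketch below encodes only the two thresholds quoted above. The cutoffs for Phenotypes A and B are not given in the abstract, so they are not encoded. The function names and the example counties are hypothetical, and the values are made up rather than taken from the study.

```python
import numpy as np

def is_phenotype_c(food_insecurity_pct, healthcare_access_pct):
    """Flag counties meeting the Phenotype-C thresholds quoted in the abstract."""
    food = np.asarray(food_insecurity_pct, dtype=float)
    access = np.asarray(healthcare_access_pct, dtype=float)
    return (food >= 15.0) & (access >= 74.0)

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted rates."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical counties (made-up percentages, not study data).
food = [16.2, 12.0, 15.0, 18.5]
access = [80.0, 90.0, 74.0, 60.0]
flags = is_phenotype_c(food, access)
```

Both thresholds are inclusive, so a county with exactly 15% food insecurity and 74% access is flagged. Computing `rmse` separately on training and test predictions gives the kind of comparison reported above.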
CONCLUSIONS: The phenotypes based on food insecurity and access to healthcare, derived using CART and random forest, can help identify geographic areas at high risk of depression. This will help target interventions more precisely in those areas.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
EPH17
Topic
Epidemiology & Public Health
Topic Subcategory
Public Health