CAN NEIGHBORHOOD-LEVEL SOCIAL DETERMINANTS OF HEALTH MEASURES IMPROVE MACHINE LEARNING CLASSIFICATION TASKS? DETECTING ALCOHOL USE DISORDER AS A USE CASE

Author(s)

Ruston M. Koonce, PharmD1, Bradley Martin, RPh, PharmD, PhD2;
1University of Arkansas for Medical Sciences, Student, Little Rock, AR, USA, 2University of Arkansas for Medical Sciences (UAMS), Little Rock, AR, USA
OBJECTIVES: Neighborhood-level characteristics are frequently cited as important determinants of health, yet little is known about how much these characteristics improve prediction or classification tasks. This study evaluated the utility of adding features derived from the Agency for Healthcare Research and Quality’s Community-Level Health Database (AHRQ-CLD) to Alcohol Use Disorder (AUD) classification models based on EHR-derived features.
METHODS: Study subjects included patients from a large academic health system with an outpatient visit between 10/1/2017 and 10/1/2023. We evaluated the change in discrimination metrics (AUROC and AUPRC) between machine learning models targeting AUD built solely on EHR-derived features and those same models with added features from AHRQ-CLD. AHRQ-CLD aggregates several community surveys into ZIP code-based variables across five topic domains (demographics, economics, education, physical infrastructure, and health), which were linked to EHR data via an honest broker. Both feature sets were run through a data pipeline of feature selection, model training, tuning, and evaluation in a hold-out test set for four machine learning algorithms: Random Forest, Multilayer Perceptron, Logistic Regression, and Stochastic Gradient Boosting.
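The comparison described in the methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the column split into "EHR" and "community" features is hypothetical, only one of the four algorithms (Logistic Regression) is shown, hyperparameter tuning is omitted, and scikit-learn's `average_precision_score` is assumed to be an acceptable stand-in for AUPRC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate(X_train, X_test, y_train, y_test, k=10):
    """Fit scaling + feature selection + classifier, then score the hold-out set."""
    model = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=k),        # simple univariate feature selection
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores), average_precision_score(y_test, scores)


# Synthetic, imbalanced stand-in data (assumption: NOT the study data).
X, y = make_classification(
    n_samples=5000, n_features=30, n_informative=8,
    weights=[0.95], random_state=0,
)

# Hypothetical split: first 20 columns play the role of EHR-derived features,
# the remaining 10 play the role of linked community-level features.
feature_sets = {
    "ehr": np.arange(20),
    "ehr+community": np.arange(30),
}

# One stratified split, reused for both feature sets so the comparison is paired.
idx_train, idx_test = train_test_split(
    np.arange(len(y)), test_size=0.25, stratify=y, random_state=0
)

results = {}
for name, cols in feature_sets.items():
    Xs = X[:, cols]
    results[name] = evaluate(Xs[idx_train], Xs[idx_test], y[idx_train], y[idx_test])

# Gain (positive) or loss (negative) from adding the community-level features.
gain_auroc = results["ehr+community"][0] - results["ehr"][0]
gain_auprc = results["ehr+community"][1] - results["ehr"][1]
print(f"AUROC change: {gain_auroc:+.4f}, AUPRC change: {gain_auprc:+.4f}")
```

Reusing the same train/test indices for both feature sets keeps the difference in metrics attributable to the added features rather than to a different split.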
RESULTS: A total of 184,295 subjects were included, of whom 4,483 (2.4%) were AUD positive. Across the four classifiers, the average change in test-set AUROC after adding AHRQ-CLD features ranged from -0.0015 to 0.0172, and the change in AUPRC ranged from -0.0173 to 0.0093 (negative values indicate a loss). The Multilayer Perceptron classifier benefited most from adding AHRQ-CLD, with AUROC increasing by 0.0172 and AUPRC by 0.0063. Stochastic Gradient Boosting performed best with the EHR feature set (AUROC=0.8846; AUPRC=0.300), but its AUROC and AUPRC decreased when AHRQ-CLD features were added.
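As context for reading the AUPRC values, the short calculation below uses only the counts and the best EHR-only AUPRC reported above. It relies on the general fact that a no-skill classifier's expected AUPRC is approximately the positive-class prevalence; the "lift" ratio is an illustrative quantity, not one reported by the authors.

```python
# Counts and metric taken from the reported results.
n_subjects = 184_295
n_positive = 4_483
best_ehr_auprc = 0.300  # Stochastic Gradient Boosting, EHR feature set

# Positive-class prevalence, which approximates the no-skill AUPRC baseline.
prevalence = n_positive / n_subjects

# Illustrative ratio of the reported AUPRC to that baseline.
lift_over_baseline = best_ehr_auprc / prevalence

print(f"Prevalence (no-skill AUPRC baseline): {prevalence:.4f}")
print(f"Reported AUPRC relative to baseline: {lift_over_baseline:.1f}x")
```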
CONCLUSIONS: Adding neighborhood-level characteristics to EHR-based AUD classifiers only slightly improves model discrimination and in some instances may lead to overfitting. Although such datasets are readily available, they may not provide adequate granularity for individual-level disease classification models.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR107

Topic

Methodological & Statistical Research
