DEVELOPMENT OF AN ALCOHOL USE DISORDER IDENTIFICATION MODEL USING ELECTRONIC HEALTH RECORD-DERIVED FEATURES FOR CLINICAL IMPLEMENTATION
Author(s)
Ruston M. Koonce, PharmD1, Corey J Hayes, PharmD, PhD2, Michael Cucciare, PhD2, Clare Brown, PhD2, Melody Greer, PhD2, Bradley Martin, RPh, PharmD, PhD2.
1University of Arkansas for Medical Sciences, Student, Little Rock, AR, USA, 2University of Arkansas for Medical Sciences (UAMS), Little Rock, AR, USA.
OBJECTIVES: To develop a machine learning classifier that uses electronic health record (EHR) data to identify undiagnosed alcohol use disorder (AUD) and thereby address its underdiagnosis.
METHODS: A cross-sectional, EHR-derived feature set was assembled for subjects with and without AUD, covering the two years up to and including an index visit. The working framework was that subjects without a recorded AUD diagnosis whom a classifier labels as having AUD (model false positives) are more likely to have undiagnosed, latent AUD (LAUD). Study subjects were patients of a large academic health system who had an outpatient clinic visit between 10/1/2017 and 10/1/2023. To validate this framework, subjects without AUD at index were followed for an AUD diagnosis during the 1 year after index, and the ratio of the LAUD rate among false positives (FP) to the LAUD rate among true negatives (TN) was calculated:
RRLAUD = (LAUD among FP / FP) / (LAUD among TN / TN)
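The ratio described in the methods can be computed from four counts. The following is a minimal sketch, assuming the ratio is the later-diagnosis rate among false positives divided by the rate among true negatives; the function name `relative_risk_laud` and the example counts are hypothetical and are not taken from the study.

```python
def relative_risk_laud(laud_fp: int, n_fp: int, laud_tn: int, n_tn: int) -> float:
    """Ratio of the later-diagnosis (LAUD) rate in false positives to that in true negatives.

    Assumed definition: (laud_fp / n_fp) / (laud_tn / n_tn).
    """
    if n_fp <= 0 or n_tn <= 0:
        raise ValueError("group sizes must be positive")
    rate_fp = laud_fp / n_fp  # share of false positives diagnosed during follow-up
    rate_tn = laud_tn / n_tn  # share of true negatives diagnosed during follow-up
    if rate_tn == 0:
        raise ZeroDivisionError("no later diagnoses among true negatives")
    return rate_fp / rate_tn


# Hypothetical counts for illustration only (not study data).
example_ratio = relative_risk_laud(laud_fp=30, n_fp=1000, laud_tn=60, n_tn=20000)
```

A ratio above 1 would indicate that subjects flagged by the model despite having no recorded diagnosis are diagnosed more often during follow-up than subjects the model did not flag.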
A data pipeline covering intake, preprocessing, feature selection, model training, tuning, and validation in a hold-out test set was built to compare performance, measured by area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC), across 4 algorithms: Random Forest, Multilayer Perceptron, Logistic Regression, and Stochastic Gradient Boosting.
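To make the two comparison metrics concrete, here is a small dependency-free sketch of how AUROC and AUPRC might be computed on a hold-out set. It is illustrative only: the abstract does not state which implementation was used, AUPRC is approximated here by average precision, tied scores are not given special handling in `average_precision`, and the labels and scores below are toy values.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of precision values at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return precision_sum / hits


# Toy hold-out labels and predicted probabilities (not study data).
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
toy_scores = [0.02, 0.10, 0.35, 0.40, 0.80, 0.05, 0.20, 0.15]
toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

For context, a classifier with no discriminative ability has an expected AUPRC close to the outcome prevalence, so AUPRC is usually read against the base rate rather than against 0.5.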
RESULTS: A total of 184,295 subjects were included, of whom 4,483 (2.4%) were AUD positive at index and 1,744 (0.95%) were diagnosed with AUD within one year after index. Stochastic Gradient Boosting performed best, with an AUROC of 0.884 and an AUPRC of 0.300 in the testing sample. At a prediction threshold of 10% (the US population prevalence of AUD), the model amplified detection of latent AUD, with an RRLAUD of 9.54. At this threshold, 113 of 228 (49.6%) subjects with LAUD were predicted positive by the model.
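The threshold-based result above can be illustrated with a short sketch that flags subjects whose predicted probability meets a cutoff and reports the share of later-diagnosed subjects who were flagged. The function name `capture_rate`, the inclusive comparison (`>=`), and the toy inputs are assumptions for illustration; only the counts 113 and 228 come from the abstract.

```python
def capture_rate(probabilities, later_diagnosed, threshold=0.10):
    """Share of later-diagnosed subjects whose predicted probability meets the threshold."""
    if len(probabilities) != len(later_diagnosed):
        raise ValueError("inputs must have the same length")
    flagged_cases = 0
    total_cases = 0
    for prob, diagnosed in zip(probabilities, later_diagnosed):
        if diagnosed:
            total_cases += 1
            if prob >= threshold:  # assumption: the threshold is inclusive
                flagged_cases += 1
    if total_cases == 0:
        raise ValueError("no later-diagnosed subjects")
    return flagged_cases / total_cases


# Toy predicted probabilities and follow-up diagnosis flags (not study data).
toy_probs = [0.03, 0.12, 0.40, 0.08, 0.10]
toy_later = [0, 1, 1, 1, 0]
toy_rate = capture_rate(toy_probs, toy_later)

# Arithmetic check of the reported proportion: 113 of 228 subjects with LAUD.
reported_pct = round(100 * 113 / 228, 1)
```

Lowering the threshold would flag more subjects and raise this share at the cost of more screening prompts, which is the trade-off a clinical deployment would need to weigh.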
CONCLUSIONS: This classifier could be incorporated into EHR systems to prompt clinicians to screen for AUD in individuals most likely to have latent AUD, which could facilitate engagement in AUD treatment. External validation of the classifier and evaluation of potential algorithmic bias would support clinical adoption.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR54
Topic
Methodological & Statistical Research