CHALLENGES IN MACHINE LEARNING PREDICTION OF RARE EVENTS: IPW-ADJUSTED PREDICTORS AND THE ASSOCIATION OF GLP-1 RA WITH INCIDENT ALCOHOL-RELATED DISORDERS

Author(s)

Rinoj Gautam, PhD¹, Hao Wang, MD², Anjali Rajadhyaksha, PhD³, Rolake A. Neba, PharmD⁴, Bo Zhou, PhD⁵, Usha Sambamoorthi, MA, PhD⁶;
¹University of North Texas Health, Institute for Health Disparities, Fort Worth, TX, USA, ²JPS Healthcare, Fort Worth, TX, USA, ³Lewis Katz School of Medicine at Temple University, Philadelphia, PA, USA, ⁴University of North Texas College of Pharmacy, Fort Worth, TX, USA, ⁵University of North Texas Health Sciences Center, Fort Worth, TX, USA, ⁶UNTHSC, College of Pharmacy, Professor, Associate Dean of Health Outcomes Research, Denton, TX, USA

OBJECTIVES: Large administrative datasets are commonly used to study rare outcomes because they allow a sufficient number of events to be observed. However, accurately predicting these events at the individual level is challenging. Incident alcohol-related disorders are a low-frequency outcome of clinical and public health interest. Emerging evidence suggests that glucagon-like peptide-1 receptor agonists (GLP-1 RAs), prescribed for diabetes and obesity, may reduce alcohol intake and craving. This study analyzed leading predictors of incident alcohol-related disorders using inverse probability-weighted machine learning models.
METHODS: We conducted a retrospective cohort study using the MarketScan Multi-State Medicaid Database (N=96,460) with a 1-year baseline period (2022) and a follow-up period (2023). Adults (aged 18-64 years) with continuous enrollment and prescription drug coverage during 2022-2023 and no evidence of alcohol-related disorders in 2022 were included. Baseline features included demographic characteristics, social risk factors identified using Z codes, comorbid conditions, health plans, and GLP-1 RA use. Inverse probability weighting (IPW) was applied to adjust for selection bias in GLP-1 RA use. Multiple machine learning models were developed to predict incident alcohol-related disorders, including logistic regression, XGBoost, LightGBM, and models with oversampling (SMOTE) or undersampling. Data were split 70/30 into training and test sets. Missing values were imputed, features were standardized, and models were trained with five-fold stratified cross-validation and probability calibration. Model performance was evaluated using ROC-AUC, recall, and precision. SHAP values were used to assess the contributions of features to model predictions.
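The IPW step can be illustrated with a minimal Python sketch. It is not the study's code: the abstract does not state how propensity scores were estimated or whether the weights were stabilized. The sketch therefore assumes a single hypothetical binary covariate (diabetes status), estimates the propensity score as the treated share within each covariate level, and uses stabilized weights. All function names and the toy counts are illustrative.

```python
from collections import Counter


def stabilized_ipw(treated, strata):
    """Return stabilized inverse probability weights.

    treated: list of 0/1 treatment indicators (here, hypothetical GLP-1 RA use).
    strata:  list of covariate patterns, one per person.
    Assumes positivity: every stratum contains both treated and untreated people.
    """
    n = len(treated)
    p_treat = sum(treated) / n  # marginal probability of treatment
    n_stratum = Counter(strata)
    n_treated = Counter(s for t, s in zip(treated, strata) if t == 1)

    weights = []
    for t, s in zip(treated, strata):
        ps = n_treated[s] / n_stratum[s]  # propensity score within the stratum
        if t == 1:
            weights.append(p_treat / ps)
        else:
            weights.append((1 - p_treat) / (1 - ps))
    return weights


def weighted_share(weights, treated, strata, group, level):
    """Weighted proportion of people in stratum `level` within a treatment group."""
    num = sum(w for w, t, s in zip(weights, treated, strata)
              if t == group and s == level)
    den = sum(w for w, t in zip(weights, treated) if t == group)
    return num / den


# Toy cohort (invented counts): treatment is far more common with diabetes.
strata = ["diabetes"] * 40 + ["no_diabetes"] * 60
treated = [1] * 20 + [0] * 20 + [1] * 6 + [0] * 54

weights = stabilized_ipw(treated, strata)
unit = [1.0] * len(treated)  # unweighted comparison

for label, w in (("unweighted", unit), ("IPW-weighted", weights)):
    share_treated = weighted_share(w, treated, strata, 1, "diabetes")
    share_untreated = weighted_share(w, treated, strata, 0, "diabetes")
    print(f"{label}: diabetes share treated={share_treated:.3f}, "
          f"untreated={share_untreated:.3f}")
```

After weighting, the covariate is distributed identically in the treated and untreated groups, which is the balance IPW aims for. In a study like this one, the weights would typically come from a propensity model with many covariates and would be passed to the outcome model as sample weights; that part is an assumption and is not shown here.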
RESULTS: Overall, 21.7% (N=20,899) used GLP-1 RAs, and 2.1% (N=2,004) had incident alcohol-related disorders. Logistic regression achieved the best discrimination (ROC-AUC=0.69). Precision was low (<0.10), while recall was moderate (0.62). SHAP analysis identified GLP-1 RA use among the top 15 predictors, with GLP-1 RA use associated with a lower predicted risk of incident alcohol-related disorders.
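The combination of moderate recall and low precision follows largely from the low outcome prevalence. The short Python sketch below works through the arithmetic using Bayes' rule. It assumes the test-set prevalence equals the overall cohort prevalence (2,004/96,460), which the abstract does not state, and the function names are illustrative.

```python
def precision_from_rates(prevalence, recall, false_positive_rate):
    """Precision (positive predictive value) implied by Bayes' rule."""
    true_pos = recall * prevalence                     # share of cohort: true positives
    false_pos = false_positive_rate * (1 - prevalence) # share of cohort: false positives
    return true_pos / (true_pos + false_pos)


def max_fpr_for_precision(prevalence, recall, target_precision):
    """Largest false positive rate that still reaches the target precision."""
    true_pos = recall * prevalence
    return true_pos * (1 - target_precision) / (target_precision * (1 - prevalence))


prevalence = 2004 / 96460  # overall incidence reported in the abstract
recall = 0.62              # recall reported in the abstract

fpr_needed = max_fpr_for_precision(prevalence, recall, 0.10)
print(f"prevalence = {prevalence:.4f}")
print(f"max false positive rate for precision 0.10 = {fpr_needed:.4f}")

# An uninformative classifier (false positive rate equal to recall)
# has precision equal to the prevalence.
print(f"uninformative precision = {precision_from_rates(prevalence, recall, recall):.4f}")
```

Under these assumptions, a model with recall 0.62 would need a false positive rate below roughly 12% to reach a precision of 0.10, so a precision under 0.10 is still compatible with discrimination well above chance.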
CONCLUSIONS: Although prediction of low-incidence alcohol-related disorders remained challenging, SHAP analyses identified GLP-1 RA use as a potentially important predictive feature, warranting further investigation of its possible protective role.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR102

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity)

Presentation (CTI)