Predicting Cannabis Use Among US Adults Using Machine Learning and Deep-Learning Approaches

Author(s)

Xiangxiang Jiang, MS¹, Gang Lv, MD², Z. Kevin Lu, PhD¹.
¹University of South Carolina College of Pharmacy, Columbia, SC, USA; ²The First Medical Center of Chinese PLA General Hospital, Beijing, China.
OBJECTIVES: With cannabis use increasing across the U.S., identifying key predictors across biological, behavioral, physical/built environment, sociocultural, and health care system domains is essential for developing effective public health strategies. Few studies have used nationally representative data with advanced modeling approaches to capture the complex factors influencing cannabis use. This study aimed to develop and compare machine learning and deep learning models to predict cannabis use among U.S. adults.
METHODS: We analyzed data from the 2023 Behavioral Risk Factor Surveillance System (BRFSS). Cannabis use was self-reported and treated as a binary outcome. A total of 82 predictors—guided by the National Institute on Minority Health and Health Disparities (NIMHD) Research Framework—were included across multiple domains. Five machine learning models (logistic regression, random forest, k-nearest neighbors, extreme gradient boosting, and naïve Bayes) were used. Additionally, 32 deep learning models with varied activation functions were developed. Model performance was assessed using accuracy, sensitivity, specificity, F1 score, and area under the curve (AUC).
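As an illustration of the evaluation metrics named above, the following is a minimal Python sketch of how accuracy, sensitivity, specificity, F1 score, and AUC can be computed for a binary outcome. It is not the authors' code: the abstract does not state the software, the classification threshold, or the validation scheme, so the function names (`binary_metrics`, `auc_rank`), the 0.5 threshold, and the eight-row toy data set are all assumptions made for illustration and are not drawn from BRFSS.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute threshold-based metrics from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, len(pairs))
    return BinaryMetrics(accuracy, sensitivity, specificity, f1)


def auc_rank(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = reports cannabis use, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]  # made-up model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

print(binary_metrics(y_true, y_pred))
print(round(auc_rank(y_true, scores), 4))
```

Accuracy, sensitivity, specificity, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. With an imbalanced outcome, accuracy and F1 can also shift noticeably depending on whether the evaluation set was rebalanced, so the metrics are best read together.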
RESULTS: A total of 18,346 individuals were included, of whom 18.33% reported cannabis use. Among all models, the random forest achieved the highest predictive performance (accuracy: 0.89, sensitivity: 0.92, specificity: 0.87, F1 score: 0.90, AUC: 0.96). The best deep learning model reached an AUC of 0.80, with strong sensitivity (0.86) but modest specificity (0.59). Feature importance analysis identified binge drinking, mental health, and smoking as the top predictors of cannabis use.
CONCLUSIONS: This study demonstrates that machine learning, particularly random forest, offers high accuracy in predicting cannabis use among U.S. adults using diverse social and health-related predictors. Deep learning models showed competitive sensitivity but lower specificity. These findings highlight the value of integrating multidimensional data to identify high-risk individuals and inform targeted public health interventions. Future research should focus on longitudinal modeling and the integration of temporal or clinical data to enhance predictive performance.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

MSR168

Topic

Epidemiology & Public Health, Health Policy & Regulatory, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
