Machine Learning Prediction of Drug Overdoses Among Young People
Speaker(s)
Dai-Woodys LL
University of Dundee, Dundee, Scotland, UK
OBJECTIVES: Drug overdose deaths are rising among young people in the United States. This study examines whether machine learning can predict drug overdoses among young people.
METHODS: Based on the United States National Survey on Drug Use and Health from 2005 to 2014, the sample includes 356,704 people aged 15 to 24 years, described by binary variables in eight categories: demographics, drug use, drug dependence, drug abuse, mental health, insurance status, health behaviors and health conditions. The modelling applies two algorithms, Decision Tree and Logistic Regression, to determine which performs better. Different proportions of the sample (10%, 30%, 50%, 80% and 100%) are tested to assess whether a smaller sample can predict drug overdoses as well as the full sample. In addition, this study reports the variables with high information gain as a simplified set of predictors for practical applications.
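As an illustration of the information gain ranking mentioned above, the following minimal sketch computes the information gain of binary predictors with respect to a binary outcome. The abstract does not give the study's exact computation, so the entropy-based definition used here is an assumption, and the variable names and toy values are hypothetical, not survey data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting the sample on a feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data (1 = yes, 0 = no); not taken from the survey.
overdose = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "pain_reliever_dependence": [1, 1, 1, 1, 0, 0, 0, 0],
    "insured": [1, 0, 1, 0, 1, 0, 1, 0],
}

gains = {name: information_gain(col, overdose) for name, col in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)

for name in ranked:
    print(f"{name}: {gains[name]:.4f}")
```

Predictors at the top of such a ranking are the ones that would be kept in a simplified set.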
RESULTS: Decision Tree yields an area under the curve (AUC) of 0.66 (95% confidence interval [CI]: 0.62-0.71), whereas Logistic Regression yields an AUC of 0.62 (95% CI: 0.58-0.67). As the proportion of the sample used for training increases, the AUCs of Decision Tree and Logistic Regression stay around 0.65 and 0.62 respectively, consistent with the values obtained on the full set. Variables with high information gain are pain reliever dependence, marijuana dependence, depression, heroin dependence, alcohol dependence, cocaine dependence, anxiety, stimulant dependence, unemployed status, marijuana abuse and hallucinogen abuse.
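For readers who want to reproduce this kind of summary, the sketch below computes an AUC and a 95% interval around it. The abstract does not state how the study's intervals were obtained; the percentile bootstrap used here is an assumption, and the scores and labels are hypothetical toy values.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < len(ys):
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical predicted risks and observed outcomes (1 = overdose).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

point = auc(scores, labels)
lower, upper = bootstrap_ci(scores, labels)
print(f"AUC = {point:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

With a sample as small as this toy one the interval is very wide; it narrows as the number of cases grows.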
CONCLUSIONS: Drug overdoses among young people are predictable by machine learning, and Decision Tree performs better than Logistic Regression. The results for different proportions of the training set indicate that the full sample gives the best performance in predicting drug overdoses among young people. Furthermore, the variables with high information gain are recommended for practical applications of the prediction.
Code
HTA343
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas