Predicting Future Type 1 Diabetes Onset Risk: A Machine Learning Approach
Author(s)
David Hood, MS1, Ketan Walia, MS1, Onkar Kshirsagar, MS2, Sukriti Poddar, BSc, MS1, Brent Mankin, MS1, Keni Cs Lee, MS3, Lukas Adamek, MS4, Brandon Rufino, MS4, Jared Josleyn, JD5.
1Axtria, Berkeley Heights, NJ, USA, 2Axtria, Bangalore, India, 3Sanofi, Paris, France, 4Sanofi, Toronto, ON, Canada, 5Sanofi, Chicago, IL, USA.
OBJECTIVES: Predicting Type 1 Diabetes (T1D) is challenging, leading to under-informed patients, missed intervention opportunities, and increased risk of Diabetic Ketoacidosis (DKA). This study explores the potential of Machine Learning (ML) to predict T1D onset and determine how early it can be accurately predicted.
METHODS: US claims data were used to develop two algorithms for identifying an incident T1D cohort, one optimized for recall and one for Bayes Factor (BF). Clean T1D and non-T1D cohorts were established using a modified Klompas algorithm. Feature extraction, selection, and engineering were conducted before testing several traditional ML models: random forest, decision tree, logistic regression, and XGBoost. To account for class imbalance, oversampling techniques such as SMOTE were applied. To address the need for early detection in this disease space, a sliding window approach was used to assess the earliest timeframe at which onset could be accurately predicted. Additionally, a novel BERT-based deep neural network was trained and tested, with SHAP used to generate hypotheses about drivers correlated with T1D. Model performance was evaluated using precision, recall, F1 score, and BF.
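The sliding window idea can be illustrated with a minimal sketch. The abstract does not specify window lengths, step sizes, or how dates were aligned, so the 12-month lookback, 6-month step, 24-month maximum lead, and the helper names `add_months` and `sliding_windows` are all hypothetical choices made only for illustration, not the authors' implementation.

```python
from datetime import date


def add_months(d, months):
    """Shift a date by a number of months, snapping to the first of the month."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)


def sliding_windows(index_date, max_lead_months=24, step_months=6,
                    lookback_months=12):
    """Return (lead, start, end) observation windows.

    Each window ends `lead` months before the diagnosis (index) date and
    covers the preceding `lookback_months` of claims history. All default
    values are assumptions for this sketch.
    """
    windows = []
    for lead in range(0, max_lead_months + 1, step_months):
        end = add_months(index_date, -lead)
        start = add_months(end, -lookback_months)
        windows.append((lead, start, end))
    return windows


# Hypothetical diagnosis date for one patient.
windows = sliding_windows(date(2023, 6, 15))
for lead, start, end in windows:
    print(f"lead={lead:>2} months: features from {start} to {end}")
```

A model trained and scored separately on each window would show how performance changes as the prediction point moves further from diagnosis.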
RESULTS: For identifying patients at risk 12-24 months prior to diagnosis, the BF-optimized model had a precision of 2.1% (about 1 in 47 flagged patients, compared with a T1D prevalence of 1 in 200), recall of 20%, F1 score of 0.04, and a BF of 4.67. The recall-optimized model achieved 95% recall, albeit with lower precision.
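As a quick arithmetic check on the reported figures, the sketch below recomputes precision and F1 from the stated 1:47 ratio and 20% recall. The enrichment ratio shown is a plain precision-to-prevalence ratio; the abstract does not define how BF was calculated, so this is not an attempt to reproduce the 4.67 value. Function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 1 / 47      # 1 true case per 47 flagged patients (from the abstract)
recall = 0.20           # reported recall of the BF-optimized model
prevalence = 1 / 200    # T1D prevalence cited in the abstract

f1 = f1_score(precision, recall)
enrichment = precision / prevalence  # simple ratio, NOT the reported BF

print(f"precision  = {precision * 100:.1f}%")
print(f"F1         = {f1:.2f}")
print(f"enrichment = {enrichment:.2f}x over prevalence")
```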
CONCLUSIONS: Predicting diabetes has traditionally relied on basic factors like family history and age. Our predictive models, applicable to the general population, can expand access to preventative testing, education, and treatment, potentially improving patient outcomes. Early identification through ML could transform T1D management by enabling timely interventions, reducing the incidence of DKA, and enhancing the quality of life for at-risk individuals.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
PCR216
Topic
Patient-Centered Research
Disease
SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity)