Development of an Algorithm to Identify the Type of Diabetes in the French Administrative Health Care Database “Système National Des Données De Santé” (SNDS)

Author(s)

Bretin O¹, Casarotto E¹, Bessou A¹, Maurel F¹, Serusclat P², Joubert M³, Fagherazzi G⁴, Berteau C⁵, Pouyet A⁶, Maillard C⁷
¹IQVIA France, Courbevoie, France; ²Groupe Hospitalier Mutualiste Les Portes du Sud, Venissieux, France; ³CHU de Caen, Caen, France; ⁴Paris South, Paris-Saclay University, Villejuif, France; ⁵Roche Diabetes Care France, Meylan, France; ⁶TIMKL, Montbonnot Saint Martin, France; ⁷IQVIA Opérations France, La Défense, France

OBJECTIVES: The French administrative health care database (SNDS), which covers 99% of the French population, is a powerful tool for epidemiological and pharmacoeconomic studies on diabetes. However, because it lacks clinical information, the type of diabetes is difficult to identify accurately. The objective was to develop an accurate machine learning algorithm to determine the type of diabetes in the SNDS, validated through linkage with primary care clinical data.

METHODS: Electronic medical records (EMR) from a network of French general practitioners (GPs) were probabilistically linked with the SNDS. This linkage made it possible to build a population of patients with diabetes whose diabetes type was retrieved from GP consultation records. About 200 predictors were derived from SNDS data to help discriminate between type 1 diabetes (T1D) and type 2 diabetes (T2D). Several machine learning algorithms (penalized logistic regressions, random forest, XGBoost) were trained on the training set, with hyperparameters optimized by 10-fold cross-validation. The best model was selected on the basis of its F1-score for predicting T1D on the test set. Its performance was then benchmarked against previously published algorithms applied to the same test set.
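The tuning step described in METHODS is an L1-penalized (LASSO) logistic regression whose penalty strength is chosen by 10-fold cross-validation on the T1D F1-score. It can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. The abstract does not name the software or solver used, so this sketch fits the model by proximal gradient descent (ISTA) and uses stratified folds. All function names (`fit_lasso_logistic`, `cv_f1`, `select_lambda`, `make_toy_data`) are hypothetical. The data are synthetic: 1 positive per 10 rows, with 5 features of which only 2 carry signal. They do not represent SNDS predictors. The random forest and XGBoost candidates and the final test-set comparison are not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrinks v toward 0 and sets it to 0 if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_lasso_logistic(X, y, lam, lr=0.5, n_iter=100):
    """LASSO logistic regression fitted by proximal gradient descent (ISTA).
    The intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Gradient step on the log-loss, then L1 shrinkage (can zero out coefficients)
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return w, b


def predict(w, b, X, threshold=0.5):
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold else 0 for xi in X]


def f1_score(y_true, y_pred):
    """F1-score for the positive class (hypothetically, T1D = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_folds(y, k, seed=0):
    """Split row indices into k folds, keeping the class ratio roughly constant."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_f1(X, y, lam, k=10):
    """Mean validation F1 of a LASSO logistic regression over k stratified folds."""
    scores = []
    for fold in stratified_folds(y, k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        w, b = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train], lam)
        scores.append(f1_score([y[i] for i in fold], predict(w, b, [X[i] for i in fold])))
    return sum(scores) / k


def select_lambda(X, y, lambdas, k=10):
    """Return the penalty with the highest cross-validated F1, plus all scores."""
    scores = {lam: cv_f1(X, y, lam, k) for lam in lambdas}
    return max(scores, key=scores.get), scores


def make_toy_data(n=200, seed=1):
    """Synthetic imbalanced data: every 10th row is positive; 2 of 5 features are informative."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 10 == 0 else 0
        informative = [rng.gauss(2.0 * label, 1.0), rng.gauss(-1.5 * label, 1.0)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append(informative + noise)
        y.append(label)
    return X, y
```

For example, calling `X, y = make_toy_data()` and then `select_lambda(X, y, [0.001, 0.01, 0.1, 1.0])` returns the best penalty and a dict of mean F1 per penalty. On this toy data, the largest penalty shrinks every coefficient to zero. The model then predicts no positives and scores an F1 of 0. This is the LASSO's variable selection pushed to its extreme, which is why the penalty has to be tuned. Stratified folds (an assumption here) keep the minority class present in every validation fold. This matters when the positive class is as rare as T1D.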

RESULTS: A cohort of 40,774 people with diabetes was constituted, including 39,122 (95.9%) with T2D and 1,652 (4.1%) with T1D. A LASSO-penalized regression achieved the best performance (F1: 0.79 (T1D); precision: 84.6% (T1D), 98.9% (T2D); sensitivity: 73.8% (T1D), 99.4% (T2D)), outperforming Charbonnel's decision tree (F1: 0.66) and the best retrained logistic regression from Fuentes (F1: 0.59).
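The reported T1D F1-score can be checked against the reported precision and sensitivity, because F1 is their harmonic mean. The short check below uses a hypothetical function name, `f1_from`, and reproduces 0.79 from 84.6% and 73.8%. Applying the same formula to the T2D figures gives about 0.99. That value is derived here from rounded inputs and is not reported in the abstract.

```python
def f1_from(precision, sensitivity):
    """F1-score: the harmonic mean of precision (positive predictive value) and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Values reported for the LASSO model, expressed as proportions
f1_t1d = f1_from(0.846, 0.738)  # about 0.788, which rounds to the reported 0.79
f1_t2d = f1_from(0.989, 0.994)  # about 0.991; derived here, not reported in the abstract
print(f"T1D F1 = {f1_t1d:.2f}, T2D F1 = {f1_t2d:.2f}")  # T1D F1 = 0.79, T2D F1 = 0.99
```

The gap between the two values shows why an F1-score on the minority class (T1D, 4.1% of the cohort) is a more demanding selection criterion than a metric dominated by the majority class.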

CONCLUSIONS: Using a novel linkage between the SNDS and EMR data, we developed a high-performance classification model that outperforms existing published algorithms for identifying the type of diabetes in a large medico-administrative database. The model can be reused by the scientific community to conduct epidemiological and pharmacoeconomic studies on each type of diabetes in the French population.

Conference/Value in Health Info

2023-11, ISPOR Europe 2023, Copenhagen, Denmark

Value in Health, Volume 26, Issue 11, S2 (December 2023)

Code

MSR2

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Electronic Medical & Health Records

Disease

Diabetes/Endocrine/Metabolic Disorders (including obesity), No Additional Disease & Conditions/Specialized Treatment Areas
