Streamlining SNDS Data for RWE Studies: Victoria, an Efficient Transformation Pipeline
Speaker(s)
Doublet M1, Arnée E2, Glatt N3, Thollot R3, Ansolabehere X3, Jaubourg N3, Fabre A3, Harti F3, Leclere M3, Du Chayla F3
1Clinityx, Boulogne Billancourt, France, 2Clinityx, Boulogne Billancourt, 92, France, 3Clinityx, Boulogne Billancourt, Ile de France, France
OBJECTIVES: The French National Health Data System (SNDS) is one of the most extensive healthcare databases, covering over 65 million individuals. It combines information from health insurance claims (DCIR), hospital stays (PMSI), and the death register (CépiDc), with over 1.2 billion claims and 11 million hospitalizations recorded each year. Despite its comprehensiveness, the SNDS is difficult to use because of redundant information, multiple join keys, and its sheer size (over 100 tables with 4000 variables). To address the heterogeneity of Real-World Evidence study objectives and ensure reproducibility, Clinityx developed Victoria, an automatic, maintainable, and scalable pipeline that transforms SNDS data into structured formats suitable for healthcare studies. To validate the completeness and accuracy of its output, we analyzed all French diabetic patients and compared the results with public health insurance data.
METHODS: Victoria's pipeline has two main steps: a cleaning step that formats records and removes empty, erroneous, or duplicate ones, and a merging step that creates a data lake of 29 tables and 188 variables. Each event is indexed by a unique patient identifier, enabling targeted patient selection on multiple criteria. Following the health insurance algorithm, diabetes is defined by 3 dispensations of ATC A10 drugs within a year, or by a hospitalization in the past two years with a diagnosis of diabetes or one of its complications. Type 2 diabetes is identified by the use of oral antidiabetic drugs, and type 1 diabetes by basal insulin without oral antidiabetics.
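The identification rule described in METHODS can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Victoria's actual code: the `Dispensation` and `Hospitalization` records are hypothetical, "3 dispensations within a year" is read as at least three in the study's calendar year, ICD-10 codes E10 to E14 stand in for the diabetes diagnoses (complication codes are omitted), and the ATC prefixes A10A (insulins) and A10B (non-insulin glucose-lowering drugs) are used as rough proxies for basal insulin and oral antidiabetics.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Dispensation:
    """Hypothetical record: one drug dispensation for one patient."""
    patient_id: str
    atc_code: str
    day: date


@dataclass(frozen=True)
class Hospitalization:
    """Hypothetical record: one hospital stay with its coded diagnosis."""
    patient_id: str
    icd10_code: str
    day: date


# Assumption: ICD-10 E10-E14 approximate "diabetes diagnosis";
# the abstract does not list the codes actually used.
DIABETES_ICD10_PREFIXES = ("E10", "E11", "E12", "E13", "E14")


def is_diabetic(dispensations, hospitalizations, year):
    """Apply the two criteria to one patient's events for a study year."""
    a10_count = sum(
        1 for d in dispensations
        if d.atc_code.startswith("A10") and d.day.year == year
    )
    if a10_count >= 3:  # assumption: "3" read as "at least 3"
        return True
    # Hospitalization in the study year or the year before.
    return any(
        h.icd10_code.startswith(DIABETES_ICD10_PREFIXES)
        and h.day.year in (year - 1, year)
        for h in hospitalizations
    )


def diabetes_type(dispensations, year):
    """Classify a treated patient from the drugs dispensed in the study year."""
    codes = {d.atc_code for d in dispensations if d.day.year == year}
    if any(code.startswith("A10B") for code in codes):
        return "type 2"  # proxy for oral antidiabetic drugs
    if any(code.startswith("A10A") for code in codes):
        return "type 1"  # proxy for basal insulin without oral drugs
    return "untreated"
```

Because every event carries the patient identifier, such a function could be applied per patient after grouping the merged tables by that identifier.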
RESULTS: In 2021, health insurance data identified 4,171,550 diabetic patients in France, 6% of whom had type 1 diabetes. Our algorithm identified 4,288,470 diabetic patients. Among treated patients, 238,720 (6.5%) had type 1 diabetes. Among type 2 diabetics, we identified 1,296,190 patients (35.1%) on first-line treatments (biguanides or sulfonylureas), 1,523,830 (41.3%) on second-line treatment, and 139,050 (3.8%) treated with SGLT2 inhibitors.
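To put the two population counts side by side, the relative difference between the pipeline's count and the health insurance count can be computed directly from the figures reported in RESULTS. The function and constant names below are illustrative only.

```python
def relative_difference(observed: int, reference: int) -> float:
    """Signed difference of `observed` from `reference`, as a fraction."""
    return (observed - reference) / reference


HEALTH_INSURANCE_2021 = 4_171_550  # diabetic patients, health insurance data
VICTORIA_2021 = 4_288_470          # diabetic patients, Victoria pipeline

diff = relative_difference(VICTORIA_2021, HEALTH_INSURANCE_2021)
print(f"{diff:.1%}")  # prints 2.8%
```

The pipeline's count is thus about 2.8% above the published figure.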
CONCLUSIONS: The Victoria pipeline transforms raw SNDS data into a structured, patient-level format that supports comprehensive epidemiologic and medico-economic analyses. The comparison of diabetic populations between the Victoria pipeline and the raw SNDS database validated its reliability.
Code
RWD85
Topic
Epidemiology & Public Health, Health Policy & Regulatory, Real World Data & Information Systems
Topic Subcategory
Disease Classification & Coding, Health & Insurance Records Systems, Insurance Systems & National Health Care
Disease
Diabetes/Endocrine/Metabolic Disorders (including obesity), No Additional Disease & Conditions/Specialized Treatment Areas