Data Standardization of Claims Data in the InGef Research Database: A Comparison of Data Models for Epidemiological and Economic Studies of Rare Genetic Diseases in Germany

Speaker(s)

Jacob J1, Norris R2, Obermüller D3, Alibone M4, Ludwig M4
1Institute for Applied Health Research Berlin (InGef), Berlin, Germany, 2InGef - Institute for Applied Health Research Berlin GmbH, Berlin, Germany, 3InGef - Institute for Applied Health Research Berlin GmbH, Berlin, Germany, 4Institute for Applied Health Research Berlin (InGef), Berlin, Germany

BACKGROUND: Common data models (CDMs), such as the Observational Medical Outcomes Partnership (OMOP) CDM, give datasets a shared structure and vocabulary, which makes it easier to exchange data and run the same analysis across countries and institutions. The OMOP CDM could be particularly helpful when investigating rare genetic diseases, where distributed analyses across different datasets and countries are essential to generate valid, robust evidence.

OBJECTIVES: This study aimed to estimate the prevalence and annual direct costs of cystic fibrosis (CF), Huntington's disease (HD), hereditary retinal dystrophy (HRD), beta-thalassemia (BT), and spinal muscular atrophy type I (SMA) in Germany using anonymized claims data from the InGef research database (RDB), and to compare the results obtained from the data in its original source format with those obtained from the same data in the OMOP CDM format.

METHODS: We conducted cross-sectional analyses for each disease using data from the years 2017 to 2022. Patients and comorbidities were identified using ICD-10-GM codes in the source data and the corresponding OMOP CDM concepts in the OMOP data. Prevalence was directly standardized to the German population. Costs were analyzed as average direct costs per patient per year.
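
As an illustration of the direct standardization mentioned in the methods, the following minimal Python sketch weights stratum-specific prevalence by the share of each stratum in a reference population. The stratification by age group, the function name, and all numbers are hypothetical assumptions made for illustration; the abstract does not specify the strata or the implementation that was used.

```python
from typing import Dict, Tuple


def directly_standardized_prevalence(
    strata: Dict[str, Tuple[int, int]],
    reference_population: Dict[str, int],
    per: int = 100_000,
) -> float:
    """Directly standardized prevalence per `per` persons.

    strata: stratum -> (prevalent cases, insured persons) in the study database.
    reference_population: stratum -> persons in the reference population.
    Both dictionaries are hypothetical inputs for this sketch.
    """
    total_reference = sum(reference_population.values())
    if total_reference <= 0:
        raise ValueError("reference population must be positive")

    standardized = 0.0
    for stratum, reference_count in reference_population.items():
        cases, persons = strata[stratum]
        if persons <= 0:
            raise ValueError(f"no persons at risk in stratum {stratum!r}")
        # Stratum-specific prevalence, weighted by the stratum's share
        # of the reference population.
        standardized += (cases / persons) * (reference_count / total_reference)
    return standardized * per


if __name__ == "__main__":
    # Made-up example data, not taken from the InGef RDB.
    study = {"0-17": (30, 200_000), "18-64": (60, 600_000), "65+": (10, 200_000)}
    reference = {"0-17": 14_000_000, "18-64": 51_000_000, "65+": 18_000_000}
    print(round(directly_standardized_prevalence(study, reference), 1))
```

Applying the same function to case counts derived from the source data and from the OMOP data would yield the pair of standardized estimates that the comparison relies on.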

RESULTS: The standardized prevalence in 2022 per 100,000 individuals for CF (14.2 vs. 14.2), HD (6.9 vs. 6.9), HRD (41.3 vs. 41.1), BT (12.5 vs. 12.3), and SMA (1.3 vs. 1.3) was comparable between the source and OMOP data. Total median costs in € were also similar for all diseases between the source and standardized data (CF 8,442 vs. 8,324; HD 2,839 vs. 2,836; HRD 1,899 vs. 1,844; BT 1,181 vs. 1,148; SMA 8,238 vs. 8,776). The minor discrepancies observed between source and OMOP results were mainly caused by differences in data preprocessing. Efforts to harmonize data preprocessing are ongoing.
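
To put the reported discrepancies in proportion, the relative difference between the two values reported for each disease can be computed directly from the figures above. The sketch below is illustrative only: it assumes that the first value in each pair is the source result and the second the OMOP result, and the function and variable names are hypothetical.

```python
from typing import Dict, Tuple

# Values copied from the results; the (source, OMOP) ordering is an assumption.
prevalence_2022: Dict[str, Tuple[float, float]] = {
    "CF": (14.2, 14.2),
    "HD": (6.9, 6.9),
    "HRD": (41.3, 41.1),
    "BT": (12.5, 12.3),
    "SMA": (1.3, 1.3),
}
median_costs_eur: Dict[str, Tuple[float, float]] = {
    "CF": (8442, 8324),
    "HD": (2839, 2836),
    "HRD": (1899, 1844),
    "BT": (1181, 1148),
    "SMA": (8238, 8776),
}


def relative_difference_pct(source: float, omop: float) -> float:
    """Difference of the OMOP value from the source value, in percent of source."""
    return (omop - source) / source * 100.0


def summarize(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Relative difference per disease, rounded to one decimal place."""
    return {
        disease: round(relative_difference_pct(source, omop), 1)
        for disease, (source, omop) in results.items()
    }


if __name__ == "__main__":
    print("prevalence:", summarize(prevalence_2022))
    print("costs:", summarize(median_costs_eur))
```

Under that ordering assumption, the largest relative gap is in the SMA median costs (about 6.5%), while all prevalence estimates differ by less than 2%.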

CONCLUSIONS: Owing to its size and longitudinal nature, the InGef RDB is a valid database for studying rare diseases, both in its source format and after transformation into the OMOP CDM.

Code

RWD55

Topic

Real World Data & Information Systems, Study Approaches

Topic Subcategory

Data Protection, Integrity, & Quality Assurance, Distributed Data & Research Networks, Reproducibility & Replicability

Disease

Rare & Orphan Diseases