Identifying Individuals With Severe Hemophilia With Limited Data of Clotting Factor Activity: An Algorithm Validation in Chinese Electronic Health Records
Author(s)
Enkhgerel Nasanbat, MPH1, Asma Hamid, MD, MPH1, Martina Furegato, MSc1, Mingyue Gao, MPH2, Marion Afonso, MSc3.
1Oracle, Paris, France, 2Formerly of Cerner Enviza, Shanghai, China, 3Sanofi, Gentilly, France.
OBJECTIVES: Severe hemophilia is a rare disorder that requires specific lifelong therapeutic management. The level of disease severity implies more frequent treatments and potentially more clinical complications. Identifying these patients is essential to assessing disease burden in database studies. To increase the accuracy of the identification of people with severe hemophilia, an algorithm was developed and evaluated using structured and unstructured data from a Chinese electronic health records (EHR) database.
METHODS: ICD-10 codes do not indicate disease severity, and clotting factor activity (Cfa) levels are often unavailable or can be transiently elevated after factor replacement therapy (FRT), which limits the use of either for identifying severe cases. Based on experts’ input, the algorithm classifies hemophilia as severe if an individual meets any of the following criteria during the study period (2019 to March 2024): ≥6 FRT prescriptions/year, ≥2 joint replacement procedures, ≥1 inpatient hemorrhage diagnosis/year, or prescription of prophylactic treatment. Individuals with a documented Cfa level <1% during the study period served as the reference standard for estimating true positives (TP), false negatives (FN), and sensitivity.
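The "any of the following criteria" rule can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: the `PatientSummary` record, its field names, and the idea that per-year rates are already computed upstream are all assumptions, since the abstract does not say how annual rates or prophylaxis were derived from the EHR data.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary over the study period (field names are assumptions)."""
    frt_prescriptions_per_year: float      # factor replacement therapy prescriptions per year
    joint_replacement_procedures: int      # total joint replacement procedures
    inpatient_hemorrhages_per_year: float  # inpatient hemorrhage diagnoses per year
    prophylaxis_prescribed: bool           # any prescription of prophylactic treatment


def is_classified_severe(p: PatientSummary) -> bool:
    """Return True if the patient meets at least one of the four criteria."""
    return (
        p.frt_prescriptions_per_year >= 6
        or p.joint_replacement_procedures >= 2
        or p.inpatient_hemorrhages_per_year >= 1
        or p.prophylaxis_prescribed
    )


# Example: a patient who meets only the prophylaxis criterion is classified as severe.
example = PatientSummary(2.0, 0, 0.0, True)
print(is_classified_severe(example))
```

Because the criteria are combined with a logical OR, a patient who meets none of them is left unclassified as severe even when a Cfa level <1% is documented, which is how false negatives arise under this rule.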
RESULTS: A cohort of 602 people with hemophilia (PWH) was identified. The algorithm classified 410 (68.1%) of them as having severe disease. The algorithm was evaluated on 178 PWH (29.6%) with severe hemophilia confirmed by a documented Cfa <1%. Of these, the algorithm correctly identified 109 (61.2%) as severe (TP), while 69 (38.8%) were not identified (FN) as they did not meet any algorithm criteria, resulting in a sensitivity of 61.2%.
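The reported sensitivity follows directly from the TP and FN counts among the 178 PWH with a documented Cfa <1%. The short Python sketch below reproduces that arithmetic; the function name is illustrative only.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity = TP / (TP + FN), computed within the reference-standard group."""
    return true_positives / (true_positives + false_negatives)


tp, fn = 109, 69  # counts reported in the abstract (109 + 69 = 178)
print(f"Sensitivity: {sensitivity(tp, fn):.1%}")  # prints "Sensitivity: 61.2%"
```

Only sensitivity can be estimated this way, because the reference standard contains confirmed severe cases only; TP and FN are the only cells of the confusion matrix that it populates.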
CONCLUSIONS: Given the challenges of case ascertainment in Chinese EHR data, this level of sensitivity is acceptable for real-world rare disease research. However, most EHR-based algorithms, though aligned with clinical guidelines, may underperform in settings where prophylaxis is recommended but inconsistently implemented, resulting in substantial FN rates. Cfa data remain essential to enhance algorithm sensitivity and to assess whether a database is fit for studying severe cases.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
PT26
Topic
Clinical Outcomes, Methodological & Statistical Research, Real World Data & Information Systems
Topic Subcategory
Confounding, Selection Bias Correction, Causal Inference
Disease
Rare & Orphan Diseases