Three Different Methods to Estimate a Rare Disease (RD) Prevalence in a Country with a Limited Dataset: Which One Fits?

Author(s)

Israel Rico Alba, Ph.D.¹, Cesar Victoria, PhD², Adolfo Gabriel Hernandez, MSc, ScD, MD², Jorge Guzman, M.A.², Eunice Carpio, M.S.¹, Alberto Retana Guzman, M.B.A.¹
¹Strategic Market Access, NOVO NORDISK MEXICO, México, Mexico; ²AHS Health Consulting, México, Mexico.
OBJECTIVES: To estimate the current and future prevalence of an RD with three different methods in a middle-income country.
METHODS: Ordinary Least Squares (OLS), Multivariate Least Squares (MLS) and Machine Learning (ML) models were used to estimate the prevalence of hemophilia A (HA) and B (HB) in Mexico from 2022 to 2033. Baseline prevalence estimates and their potential explanatory variables were obtained from national official sources and the World Federation of Hemophilia dataset for 2006 to 2021. For OLS, linear, exponential, logarithmic and polynomial adjustments were tested; logarithmic regression gave the best fit. For the MLS and ML modeling, the variables “total population”, “number of births”, “health coverage” and “mortality” were considered. Because the available datasets were limited, the ML model used a non-deep-learning approach. The main features that determined the behavior of hemophilia in Mexico were extracted to train the model and predict prevalence over the next decade. Several equations combining the explanatory variables were explored to obtain the best estimates of current and future prevalence.
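As an illustration of the logarithmic OLS adjustment mentioned above, the following is a minimal sketch only. The abstract does not give the fitted equation, so the functional form y = a + b·ln(t), the time index t = year − base_year + 1, and the function names `fit_log_trend` and `predict_log_trend` are all assumptions made for this example, not the authors' specification.

```python
import math

def fit_log_trend(years, values, base_year):
    """Fit values = a + b * ln(t) by ordinary least squares.

    Assumption (not from the abstract): t = year - base_year + 1,
    so the first observed year maps to t = 1.
    """
    x = [math.log(y - base_year + 1) for y in years]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(values) / n
    # Closed-form OLS estimates for a single regressor.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (vi - mean_y) for xi, vi in zip(x, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_log_trend(a, b, year, base_year):
    """Project the fitted logarithmic trend to a given calendar year."""
    return a + b * math.log(year - base_year + 1)
```

With annual observations for 2006 to 2021, `fit_log_trend(years, values, 2006)` would return the intercept and slope, and `predict_log_trend` could then be evaluated for 2022 through 2033.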
RESULTS: The three methods estimated a similar number of cases. For 2022 and 2033, respectively, each model estimated the following numbers of HA cases: OLS 4,877/5,567; MLS 4,940/5,664; and ML 4,963/5,717. For HB, the numbers of cases were: OLS 99/114; MLS 101/116; and ML 101/117. Differences among the methods of about 2% (HA n=86; HB n=2) in 2022 and about 3% (HA n=150; HB n=3) in 2033 were found. Over the decade, the prevalence of HA and HB is projected to increase by 15%.
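The reported differences and the decade increase can be checked from the case counts above. The sketch below is an arithmetic check only: the abstract does not state which denominator was used for the percentages, so using the smallest estimate as the base is an assumption, and the first-listed HB pair is treated as the OLS estimate. The names `cases`, `spread` and `growth` are introduced here for illustration.

```python
# Case counts (2022, 2033) as reported in RESULTS.
cases = {
    "HA": {"OLS": (4877, 5567), "MLS": (4940, 5664), "ML": (4963, 5717)},
    "HB": {"OLS": (99, 114), "MLS": (101, 116), "ML": (101, 117)},
}

def spread(disease, idx):
    """Return (absolute, percent) gap between the highest and lowest estimate.

    idx 0 selects 2022, idx 1 selects 2033. The percent uses the lowest
    estimate as the base, which is an assumption of this sketch.
    """
    vals = [pair[idx] for pair in cases[disease].values()]
    lo, hi = min(vals), max(vals)
    return hi - lo, 100 * (hi - lo) / lo

def growth(disease, method):
    """Percent increase in estimated cases from 2022 to 2033."""
    start, end = cases[disease][method]
    return 100 * (end - start) / start

for disease in cases:
    for idx, year in enumerate((2022, 2033)):
        n, pct = spread(disease, idx)
        print(f"{disease} {year}: n={n}, {pct:.1f}%")
    for method in cases[disease]:
        print(f"{disease} {method}: +{growth(disease, method):.1f}% over the decade")
```

Under these assumptions the absolute gaps match the reported n values (HA 86 and 150; HB 2 and 3), and the per-method increases for 2022 to 2033 fall between roughly 14% and 16%.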
CONCLUSIONS: ML is an accurate tool for estimating the current prevalence of hemophilia, even in data-limited settings. However, the differences between the prevalence estimates of the three methods (ML, MLS and OLS) were marginal. For countries with limited datasets and constrained resources, simpler and cheaper tools such as OLS or MLS could be feasible options for estimating the prevalence of RDs.

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

MSR42

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, SDC: Rare & Orphan Diseases, SDC: Systemic Disorders/Conditions (Anesthesia, Auto-Immune Disorders (n.e.c.), Hematological Disorders (non-oncologic), Pain)
