Machine Learning for Predictive Modeling of Hospitalization Risk in Myasthenia Gravis Patients Using the MGFA Global Patient Registry: Addressing Imbalanced Data in Real-World Evidence Scenarios

Author(s)

Amaia Zurinaga, MSc¹, Richard Nowak, MD², Kelly Gwathmey, MD³, Jean-Francois Ricci, MBA, DrPH, PharmD⁴, Minjee Park, MSc⁵.
¹Alira Health, Barcelona, Spain, ²Yale University, New Haven, CT, USA, ³Virginia Commonwealth University, Richmond, VA, USA, ⁴Alira Health, Basel, Switzerland, ⁵Director, Alira Health, Basel, Switzerland.

Presentation Documents

ISPOR25_Ricci_MSR86_POSTER.pdf

OBJECTIVES: Myasthenia Gravis (MG) is an autoimmune disease in which antibodies target the neuromuscular junction. This study aimed to develop a predictive model to identify patients at risk of overnight hospitalization within 12 months of enrollment.
METHODS: The Myasthenia Gravis Foundation of America (MGFA) Global MG Patient Registry (MGFAPR) is an online, longitudinal, patient-reported registry initiated in 2013 and hosted on the Health Storylines platform since 2022. Enrollment data collected between July 2013 and December 2024 were used in this study. Participants aged 18 years or older with a self-reported, physician-confirmed MG diagnosis who had completed the first follow-up within 12 months of enrollment were included (n=1,346). Participants with incomplete responses to the overnight hospitalization question were excluded (n=85). A descriptive analysis was followed by Lasso regression to identify the most informative variables. Multiple imputation was applied to variables with less than 30% missing data. An XGBoost classifier was used to build the predictive model. To address class imbalance in hospitalization outcomes, the training dataset was balanced by oversampling the minority class (hospitalized) and undersampling the majority class (not hospitalized). Model performance was assessed with cross-validation using accuracy, precision, recall, F1-score, and the area under the precision-recall curve (AUCPR).
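The abstract does not specify how the oversampling and undersampling were performed (for example, random duplication versus a synthetic method such as SMOTE), what class ratio was targeted, or whether resampling was repeated inside each cross-validation fold. The pure-Python sketch below shows one common arrangement under those assumptions: random resampling of both classes to a common midpoint size, applied only to the training portion of a stratified k-fold split so that each held-out fold keeps its original imbalance. The function names (`balance_training_set`, `stratified_folds`, `cv_splits`) and the toy data are hypothetical, and the Lasso selection, multiple imputation, and XGBoost steps are not reproduced.

```python
import random
from collections import Counter


def balance_training_set(X, y, seed=0):
    """Randomly oversample the minority class and undersample the majority
    class so that both classes end up with the same number of rows.

    Assumption (not stated in the abstract): both classes are resampled to
    the midpoint between the two original class counts.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    (minority, n_min), (majority, n_maj) = sorted(counts.items(), key=lambda kv: kv[1])
    target = (n_min + n_maj) // 2

    idx_min = [i for i, label in enumerate(y) if label == minority]
    idx_maj = [i for i, label in enumerate(y) if label == majority]

    # Oversample: keep every minority row and add random duplicates (with replacement).
    extra_min = [rng.choice(idx_min) for _ in range(target - n_min)]
    # Undersample: keep a random subset of majority rows (without replacement).
    kept_maj = rng.sample(idx_maj, target)

    chosen = idx_min + extra_min + kept_maj
    rng.shuffle(chosen)
    return [X[i] for i in chosen], [y[i] for i in chosen]


def stratified_folds(y, k=5, seed=0):
    """Split row indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cv_splits(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test) for each fold.

    Resampling touches only the training portion, so every held-out fold
    keeps the real-world class imbalance.
    """
    for f, test_idx in enumerate(stratified_folds(y, k, seed)):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr, y_tr = balance_training_set(
            [X[i] for i in train_idx], [y[i] for i in train_idx], seed=seed + f
        )
        yield X_tr, y_tr, [X[i] for i in test_idx], [y[i] for i in test_idx]


# Toy example: 100 rows with a 23% positive (hospitalized = 1) rate.
X = list(range(100))  # row ids stand in for real feature vectors
y = [1] * 23 + [0] * 77
for X_tr, y_tr, X_te, y_te in cv_splits(X, y, k=5, seed=42):
    print(Counter(y_tr), Counter(y_te))
```

In a full pipeline, each balanced training fold would be passed to the classifier and the untouched test fold used for scoring. Libraries such as imbalanced-learn provide `RandomOverSampler` and `RandomUnderSampler` for the same two resampling operations.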
RESULTS: The study included 1,261 MG patients, 23% of whom experienced an overnight hospitalization. The XGBoost model achieved a best AUCPR of 0.88 across cross-validation folds. Other performance metrics were accuracy (75.73%), precision (46.15%), recall (65.06%), and F1-score (54%). The five predictors contributing most to the model were the number of emergency room (ER) visits, the number of intensive care unit (ICU) visits (in the last 5 years), the MG Activities of Daily Living (MG-ADL) score, body mass index (BMI) at enrollment, and age at hospitalization.
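The F1-score reported above is the harmonic mean of precision and recall, and AUCPR summarizes precision across all recall levels. A minimal pure-Python sketch of these metrics follows. The function names are hypothetical, and because the abstract does not say which AUCPR estimator was used, the step-wise average-precision form below is an assumption. The last line checks that the reported precision (46.15%) and recall (65.06%) imply an F1-score of about 54%.

```python
def f1_from(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = hospitalized)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def average_precision(y_true, scores):
    """Step-wise AUCPR (average precision): the mean of the precision values
    observed at the rank of each positive case, ranking by descending score.
    Tied scores are kept in input order, a simplification."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this point on the curve
    return total / n_pos if n_pos else 0.0


# Consistency check against the reported precision and recall.
print(round(f1_from(0.4615, 0.6506), 2))  # 0.54
```

Because AUCPR focuses on the positive class, a classifier with no skill scores roughly the positive-class prevalence of the evaluated data (about 0.23 at the cohort's hospitalization rate) rather than 0.5, which is the natural reference point when reading an AUCPR value. scikit-learn's `average_precision_score` implements the same step-wise definition with proper handling of tied scores.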
CONCLUSIONS: The predictive model showed moderate performance, limited by class imbalance in the test data. Future studies should evaluate models built with alternative methods, such as LightGBM or logistic regression, to improve robustness and clinical relevance.

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

MSR86

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

SDC: Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), SDC: Neurological Disorders, SDC: Rare & Orphan Diseases
