Machine Learning for Predictive Modeling of Hospitalization Risk in Myasthenia Gravis Patients Using the MGFA Global Patient Registry: Addressing Imbalanced Data in Real-World Evidence Scenarios
Author(s)
Amaia Zurinaga, MSc1, Richard Nowak, MD2, Kelly Gwathmey, MD3, Jean-Francois Ricci, MBA, DrPH, PharmD4, Minjee Park, MSc5.
1Alira Health, Barcelona, Spain, 2Yale University, New Haven, CT, USA, 3Virginia Commonwealth University, Richmond, VA, USA, 4Alira Health, Basel, Switzerland, 5Director, Alira Health, Basel, Switzerland.
OBJECTIVES: Myasthenia Gravis (MG) is an autoimmune condition in which antibodies target the neuromuscular junction. This study aimed to develop a predictive model to identify patients at risk of overnight hospitalization within 12 months after enrollment.
METHODS: The Myasthenia Gravis Foundation of America (MGFA) Global MG Patient Registry (MGFAPR) is an online, longitudinal, patient-reported registry initiated in 2013 and hosted on the Health Storylines platform since 2022. Enrollment data collected between July 2013 and December 2024 were used in this study. Participants aged 18 years and older with a self-reported (physician-confirmed) MG diagnosis who had completed the first follow-up within 12 months of enrollment were included (n=1,346). Participants with incomplete responses to the overnight hospitalization question were excluded (n=85). Descriptive analysis was followed by Lasso regression to identify the most informative variables. Multiple imputation was applied to variables with less than 30% missing data. An XGBoost classifier was used to build the predictive model. To address class imbalance in hospitalization outcomes, the training dataset was balanced by oversampling the minority class (hospitalized) and undersampling the majority class (not hospitalized). Model performance was evaluated with cross-validation using accuracy, precision, recall, F1-score, and the Area Under the Precision-Recall Curve (AUCPR).
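A minimal sketch of the class-balancing step, in plain Python. It assumes (the abstract does not say) that resampling is applied only to the training portion of each cross-validation fold and that both classes are resized to the midpoint of their original counts. The helpers `stratified_folds`, `rebalance`, and `cv_with_rebalanced_training` are hypothetical, label 1 stands for hospitalized, and the XGBoost fit itself is omitted; in practice a classifier such as `xgboost.XGBClassifier` would be trained on each rebalanced training set.

```python
import random
from collections import Counter


def stratified_folds(y, k=5, seed=0):
    """Split row indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        # Deal each class's rows round-robin so every fold gets a similar share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def rebalance(X, y, seed=0):
    """Hypothetical helper: random oversampling of the minority class plus
    random undersampling of the majority class to a common size.

    Assumption: binary labels, and both classes are resized to the midpoint
    of their original counts (target sizes are not reported in the abstract).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    minority, majority = sorted(counts, key=lambda c: counts[c])
    minority_rows = [i for i, v in enumerate(y) if v == minority]
    majority_rows = [i for i, v in enumerate(y) if v == majority]
    target = (len(minority_rows) + len(majority_rows)) // 2
    # Oversample: keep every minority row, then add random duplicates.
    extra = [rng.choice(minority_rows) for _ in range(target - len(minority_rows))]
    # Undersample: draw a random subset of majority rows without replacement.
    kept_majority = rng.sample(majority_rows, target)
    idx = minority_rows + extra + kept_majority
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def cv_with_rebalanced_training(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test) for each fold.

    Only the training part is rebalanced; the held-out fold keeps the
    original class imbalance, so test metrics reflect the observed prevalence.
    """
    for f, test_idx in enumerate(stratified_folds(y, k, seed)):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = rebalance([X[i] for i in train_idx],
                               [y[i] for i in train_idx], seed + f)
        yield X_tr, y_tr, [X[i] for i in test_idx], [y[i] for i in test_idx]


if __name__ == "__main__":
    # Synthetic cohort (not registry data): 1,261 rows, about 23% labeled 1.
    # Feature rows are just row ids here; real predictors would replace them.
    y = [1] * 290 + [0] * 971
    X = list(range(len(y)))
    for X_tr, y_tr, X_te, y_te in cv_with_rebalanced_training(X, y):
        print(Counter(y_tr), Counter(y_te))
```

Rebalancing before splitting would let duplicated minority rows land in both training and test folds and inflate test metrics; keeping the held-out fold untouched is also why the test data remain imbalanced. Library equivalents of the two resampling steps exist in imbalanced-learn (`RandomOverSampler`, `RandomUnderSampler`).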
RESULTS: The study included 1,261 MG patients, 23% of whom experienced an overnight hospitalization. The best AUCPR achieved by the XGBoost model across cross-validation folds was 0.88. Other performance metrics were accuracy (75.73%), precision (46.15%), recall (65.06%), and F1-score (54%). The five predictors contributing most to the model were number of ER visits, number of ICU visits (in the last 5 years), MG-ADL score, BMI at enrollment, and age at hospitalization.
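The reported metrics follow standard confusion-matrix definitions (label 1 = hospitalized). The sketch below computes them in plain Python and checks that the reported precision and recall reproduce the reported F1-score, since F1 is their harmonic mean, F1 = 2PR / (P + R). The abstract does not name its AUCPR estimator; `average_precision` here is an assumed step-wise estimator (the same idea as scikit-learn's `average_precision_score` when scores are untied), and all function names are hypothetical. As a reference point, a classifier that ranks cases at random has an expected AUCPR close to the positive-class prevalence, about 0.23 in this cohort.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def average_precision(y_true, scores):
    """Step-wise AUCPR estimate: precision averaged over the rank of each
    positive case, with cases sorted by descending score (ties are not
    grouped, a simplification)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos if n_pos else 0.0


# Consistency check with the reported precision (46.15%) and recall (65.06%).
print(round(f1_from(0.4615, 0.6506), 2))  # 0.54
```

Because precision, recall, and F1 are computed on the imbalanced held-out data at a single decision threshold, they can look modest even when a threshold-free ranking metric such as AUCPR is high.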
CONCLUSIONS: The predictive model showed moderate performance, limited by the class imbalance in the test data. Future studies should evaluate alternative methods, such as LightGBM or logistic regression, to improve robustness and clinical relevance.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
MSR86
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), SDC: Neurological Disorders, SDC: Rare & Orphan Diseases