Machine Learning Models for Predicting Metabolic Dysfunction-Associated Steatohepatitis (MASH) in the General United States Population

Speaker(s)

Khalid J¹, Aparasu RR²
¹University of Houston, College of Pharmacy, Houston, TX, USA; ²University of Houston, College of Pharmacy, Houston, TX, USA


OBJECTIVES: Metabolic dysfunction-associated steatohepatitis (MASH), a severe liver condition, is the second most common indication for liver transplantation in the United States. Current diagnostic methods rely on invasive procedures and imaging techniques, which face accessibility challenges. This study therefore used machine learning (ML) techniques to predict MASH using data from the National Health and Nutrition Examination Survey (NHANES).

METHODS: This retrospective analysis used 2017–2020 NHANES data for participants aged 18 years and older with valid FibroScan® measurements. Participants with high alcohol consumption, pregnancy, or other potential causes of liver disease were excluded. MASH was identified with FibroScan® vibration-controlled transient elastography using the controlled attenuation parameter (CAP). The analysis included approximately 41 demographic, socioeconomic, and clinical variables. Five ML algorithms, K-nearest neighbor (KNN), support vector machine (SVM) classification, decision tree, random forest (RF), and a neural network, along with logistic regression, were used to predict MASH, with the data split into training and test sets (70:30). Model performance was compared using the area under the receiver operating characteristic curve (AUROC), accuracy, and F1-score.
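The modeling workflow described above can be sketched in Python with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the abstract does not report software, preprocessing, or hyperparameters, so synthetic data from `make_classification` stand in for the NHANES variables, `MLPClassifier` is assumed as the neural network, stratification of the 70:30 split is assumed, and scikit-learn defaults are used elsewhere.

```python
# Hypothetical sketch: synthetic data replace NHANES; model settings are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 41 features, roughly 46% positive class (not study data).
X, y = make_classification(n_samples=7570, n_features=41, n_informative=10,
                           weights=[0.54], random_state=0)

# 70:30 train/test split, stratified (assumption) so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Scaling for distance-, margin-, and gradient-based models; trees use raw features.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "NN": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

def evaluate(model):
    """Fit on the training set and score on the held-out test set."""
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]  # positive-class scores for AUROC
    pred = model.predict(X_test)              # hard labels for accuracy and F1
    return {"auroc": roc_auc_score(y_test, prob),
            "accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred)}

results = {name: evaluate(m) for name, m in models.items()}
for name, r in results.items():
    print(f"{name}: AUROC={r['auroc']:.3f} accuracy={r['accuracy']:.2%} F1={r['f1']:.2f}")
```

Tree-based models are insensitive to monotone rescaling of features, so only the other models are wrapped with `StandardScaler`; fitting the scaler inside each pipeline keeps test-set statistics out of training.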

RESULTS: A total of 7,570 participants met the inclusion criteria, of whom 45.8% (n=3,469) had a MASH diagnosis. Model performance varied across models: KNN (AUROC: 0.734, accuracy: 72.45%, F1-score: 0.68), SVM (AUROC: 0.840, accuracy: 75.09%, F1-score: 0.73), RF (AUROC: 0.841, accuracy: 75.20%, F1-score: 0.74), and neural network (AUROC: 0.790, accuracy: 84.50%, F1-score: 0.75). Logistic regression achieved a comparable AUROC of 0.834, with an accuracy of 75.20%. Feature importance analysis identified body mass index and diabetes mellitus as the most important predictors of MASH.
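The three metrics reported above can be written out in a few lines of dependency-free Python. The toy labels and scores below are hypothetical, and the 0.5 classification threshold is an assumption, since the abstract does not state the cutoff used for accuracy and F1-score.

```python
# Hypothetical toy data; not study results.

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred):
    # Harmonic mean of precision and recall for the positive (MASH) class.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auroc(y_true, scores):
    # Probability that a random positive outranks a random negative
    # (ties count half); this equals the area under the ROC curve.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # accuracy and F1 need a threshold; AUROC does not

print(accuracy(y_true, y_pred), f1(y_true, y_pred), auroc(y_true, scores))
# prints: 0.75 0.75 0.9375
```

Accuracy and F1-score depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds. This is one possible reason a model can rank differently on the two kinds of measure, as the neural network does relative to SVM and RF in the results above.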

CONCLUSIONS: Performance in identifying patients with MASH from available demographic and clinical data varied across the ML models and logistic regression. Further studies are needed to refine these ML models and to validate them externally.

Code

MSR50

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Diabetes/Endocrine/Metabolic Disorders (including obesity), Gastrointestinal Disorders