Applying Machine Learning Techniques to Identify Undiagnosed Patients with Nonalcoholic Steatohepatitis (NASH)
Author(s)
Baser O1, Mete F2, Yapar N2, Baser E3
1City University of New York, New York, NY, USA, 2Columbia Data Analytics, New York, NY, USA, 3Columbia Data Analytics, New York, UNITED STATES
OBJECTIVES: Nonalcoholic Steatohepatitis (NASH) is liver inflammation and damage caused by a buildup of fat in the liver. NASH is underdiagnosed because patients are often asymptomatic or present with non-specific symptoms. This study aimed to develop a machine learning model that identifies patients in a veterans health system who likely have NASH but are undiagnosed.
METHODS: Models were built with scikit-learn, a Python machine learning library. The study population was selected from Veterans Health Administration data and consisted of patients with NASH-prone conditions. Patients were labeled with 150 condition category flags and split into actual positive NASH cases, actual negative NASH cases, and unlabeled cases. The study population was then randomly divided into a training subset and a testing subset. The training subset was used to train 30 models and to select the highest performing model, and the testing subset was used to evaluate the performance of the best machine learning model.
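The workflow described in the methods can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, only 20 flags and three candidate models are used instead of 150 and 30, and the selection criterion (cross-validated F1), the split ratio, and all variable names are assumptions that the abstract does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for binary condition category flags (hypothetical data).
n_patients, n_flags = 2000, 20
X = rng.integers(0, 2, size=(n_patients, n_flags))

# Hypothetical outcome rule: flags 0 and 1 raise the chance of a positive label.
logit = 2.0 * X[:, 0] + 2.0 * X[:, 1] - 3.0
y_true = (rng.random(n_patients) < 1 / (1 + np.exp(-logit))).astype(int)

# Mark roughly 30% of patients as unlabeled (-1), mimicking the three groups.
labels = y_true.copy()
labels[rng.random(n_patients) < 0.3] = -1

labeled = labels != -1
X_lab, y_lab = X[labeled], labels[labeled]
X_unl = X[~labeled]

# Random split of the labeled cases into training and testing subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X_lab, y_lab, test_size=0.25, stratify=y_lab, random_state=0
)

# A small set of candidate models (the abstract reports 30; three shown here).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Select the best candidate using the training subset only (assumed criterion: F1).
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)

# Evaluate once on the held-out testing subset, then score the unlabeled cases.
test_accuracy = best_model.score(X_test, y_test)
nash_probability = best_model.predict_proba(X_unl)[:, 1]
```

Keeping model selection inside the training subset means the testing subset is touched only once, so the reported performance is not inflated by the selection step.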
RESULTS: The study population consisted of 30,415 actual positive NASH cases, 265,965 actual negative NASH cases, and 181,375 unlabeled cases. In the best performing model, the precision, recall, and accuracy were 0.90, 0.82, and 0.88, respectively. The best performing model estimated that the number of patients likely to have NASH was about 6 times the number of patients directly identified as NASH-positive through a claims analysis in the study population. The most important features in assigning NASH probability were the presence or absence of diagnosis codes related to obesity or diabetes.
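For readers less familiar with the three reported metrics, the sketch below shows how precision, recall, and accuracy are derived from a confusion matrix using scikit-learn. The ten labels and predictions are made up for illustration and are not drawn from the study.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical test-set labels (1 = NASH, 0 = no NASH) and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)
accuracy = accuracy_score(y_true, y_pred)    # (tp + tn) / all cases

print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

Precision reflects how many flagged patients truly have the condition, while recall reflects how many true cases the model finds; for an underdiagnosed disease, recall indicates how many missed patients a model could surface.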
CONCLUSIONS: The prevalence of NASH is increasing, but more concerning is the disproportionate increase in those with advanced fibrosis, hepatocellular carcinoma, and hepatic decompensation. In the United States, NASH is currently the leading indication for liver transplant in women and those over 65 years of age. Machine learning techniques can help identify undiagnosed patients so that upcoming treatments can be applied broadly to delay disease progression.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 6, S2 (June 2023)
Code
MSR39
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Urinary/Kidney Disorders