Development and Validation of a Machine Learning-Based Screening Algorithm to Predict High Risk of Hepatitis C Infection

Author(s)

Suk-Chan Jang, PharmD, PhD1, Pilar Hernández Con, MD1, Chanakan Jenjai, PharmD1, Ashley Stultz, BS1, Shunhua YAN, MEd1, Debbie L Wilson, PhD1, Weihsuan Jenny Lo-Ciganic, MS, PhD2, James Huang, PhD1, Ashley Norse, MD3, Faheem Guirgis, MD1, Robert L. Cook, MD, MPH1, Christine Gage, DO3, Khoa Nguyen, PharmD1, David R. Nelson, MD1, Haesuk Park, PhD1;
1University of Florida, Gainesville, FL, USA, 2University of Pittsburgh, Pittsburgh, PA, USA, 3University of Florida, Jacksonville, FL, USA

OBJECTIVES: Hepatitis C virus (HCV) infections are rising sharply in the United States amid the opioid epidemic. Due to its asymptomatic nature, nearly half of HCV-infected individuals are unaware of their infection. This study aimed to develop and validate a machine learning-based screening tool to identify individuals at high risk of HCV infection.
METHODS: We conducted prognostic modeling with retrospective cohort data from the 2016-2023 OneFlorida+ database, an all-payer electronic health records system covering approximately 75% of Floridians. This study included individuals tested for HCV (antibody, RNA, or genotype) and evaluated 275 potential predictors during a 6-month baseline period. These predictors included sociodemographic and clinical characteristics (e.g., comorbidities, procedures, medications). Four machine learning algorithms - elastic net (EN), random forest (RF), gradient boosting machine (GBM), and deep neural network (DNN) - were developed and validated to predict HCV infection. Risk stratification was performed by deciles of the predicted risk score.
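The decile-based risk stratification described above can be illustrated with a minimal sketch. The function names and the toy data are hypothetical and are not taken from the study; the sketch assumes only that each individual has a predicted risk score and a binary test result, and it breaks tied scores by input order.

```python
def assign_deciles(scores):
    """Return a decile label (1 = lowest risk, 10 = highest) for each score."""
    n = len(scores)
    # Indices sorted by ascending score; ties keep their input order.
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def share_of_positives_in_top(scores, labels, top_k):
    """Fraction of all positives that fall in the top_k highest-risk deciles."""
    deciles = assign_deciles(scores)
    total_pos = sum(labels)
    captured = sum(y for d, y in zip(deciles, labels) if d > 10 - top_k)
    return captured / total_pos


# Toy example (made-up data): 20 people, 4 of whom tested positive.
toy_scores = [i / 20 for i in range(20)]
toy_labels = [0] * 20
for i in (3, 15, 18, 19):
    toy_labels[i] = 1

toy_share = share_of_positives_in_top(toy_scores, toy_labels, top_k=3)
print(toy_share)  # 0.75: three of the four positives sit in deciles 8-10
```

In this toy data, testing only the top three deciles would cover 30% of individuals while capturing 75% of positives; the same calculation applied to real predicted scores gives the capture rate for any chosen number of deciles.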
RESULTS: Among 445,624 individuals tested for HCV, 11,834 (2.65%) tested positive. The training (75%) and validation (25%) samples had similar characteristics (mean age 45 years; 37% male; 54% White; 19% Medicaid). The GBM model demonstrated the best performance (C statistic [95% CI]: 0.92 [0.91-0.92]), outperforming EN (0.89 [0.88-0.89]), RF (0.85 [0.85-0.86]), and DNN (0.91 [0.90-0.91]). At the threshold selected with the Youden index, the GBM model achieved 79.4% sensitivity and 89.1% specificity, with a testing yield of one positive HCV case per six tests. Over 90% of HCV-positive patients were classified in the top three deciles of predicted risk, suggesting that targeted screening of those deciles could reduce testing by 70%. Key risk predictors included non-Hispanic ethnicity, White race, older age, smoking, a history of HIV testing and of prothrombin time testing, and fewer outpatient visits, whereas commercial insurance was associated with lower risk.
CONCLUSIONS: Machine learning algorithms can effectively predict and stratify HCV infection risk, offering a promising targeted screening tool in clinical settings.

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

EPH75

Topic

Epidemiology & Public Health

Topic Subcategory

Public Health

Disease

SDC: Gastrointestinal Disorders, SDC: Infectious Disease (non-vaccine)
