FINDING UNDIAGNOSED PATIENTS WITH HEPATITIS C VIRUS: AN APPLICATION OF STATE-OF-THE-ART MACHINE LEARNING METHODS
Author(s)
Doyle OM, Jayanti H, Homola D, Rigg J
QuintilesIMS, London, UK
OBJECTIVES: Hepatitis C virus (HCV) infection is a chronic, life-threatening disease that is substantially under-diagnosed. Accelerating time to diagnosis can lead to earlier treatment and improved patient outcomes. This retrospective database study aimed to develop an algorithm for identifying undiagnosed patients with HCV from routinely collected patient data, and to compare the effectiveness of non-parametric machine learning methods with that of more conventional parametric methods.
METHODS: Data were extracted from US prescription and open-source medical claims between 2010 and 2016. The outcome was coded as 1 for HCV patients and 0 for non-HCV patients. The index date for HCV patients was the first observed date of diagnosis, ensuring that only pre-diagnosis predictors were used. For non-HCV patients, the date of the most recent activity was used as the index date. Features captured information on demographics, treatments, procedures and symptomatology, including temporal associations between the timing of events and the index date. Binary classifiers were estimated using conventional parametric methods (unconstrained logistic regression and logistic regression with penalty) and non-parametric machine learning methods (random forest, gradient boosting and an ensemble of classifiers based on logistic regression). Five-fold cross-validation was used to identify optimal hyperparameters, which included a differential misclassification penalty. Positive predictive value (PPV) at 50% sensitivity, computed on hold-out data, was used to evaluate model performance.
RESULTS: The sample comprised 120,000 HCV and 60,000,000 non-HCV patients. PPV (based on an HCV to non-HCV ratio of 1:34) was 72.3%, 70.8%, 65.0%, 52.1% and 51.7% for the ensemble, gradient boosting, random forest, logistic regression with penalty and unconstrained logistic regression, respectively.
CONCLUSIONS: The evidence suggests that algorithms leveraging routinely collected real-world data could be an effective way to screen for undiagnosed HCV patients. State-of-the-art machine learning approaches also substantially outperformed conventional approaches, highlighting the potential value of these methods.
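The evaluation metric named in the abstract, PPV at 50% sensitivity reported for an HCV to non-HCV ratio of 1:34, can be illustrated with a minimal sketch. The abstract does not say how the 1:34 ratio was applied, so the rescaling below (PPV = sensitivity / (sensitivity + r × false positive rate), with r negatives per positive) is an assumption, as are the function name, its parameters and the toy data; none of them come from the study.

```python
def ppv_at_sensitivity(scores, labels, target_sensitivity=0.5,
                       negatives_per_positive=None):
    """Return (ppv, sensitivity, threshold) at the highest score threshold
    whose sensitivity reaches target_sensitivity.

    scores: predicted risk scores on hold-out data (higher = more likely HCV).
    labels: 1 for HCV, 0 for non-HCV.
    negatives_per_positive: if given (e.g. 34 for a 1:34 ratio), PPV is
        rescaled to that class ratio; this rescaling is an assumption, not
        a documented step of the study.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Flag every patient tied at the current score as positive together.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        sensitivity = tp / n_pos
        if sensitivity >= target_sensitivity:
            break

    if negatives_per_positive is None:
        # PPV at the class ratio observed in the evaluation sample.
        ppv = tp / (tp + fp)
    else:
        # PPV re-expressed for a population with r negatives per positive.
        false_positive_rate = fp / n_neg
        ppv = sensitivity / (
            sensitivity + negatives_per_positive * false_positive_rate
        )
    return ppv, sensitivity, threshold


# Toy hold-out set (hypothetical values, for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 1, 0, 1, 0]
sample_ppv, sample_sens, sample_thr = ppv_at_sensitivity(toy_scores, toy_labels)
rescaled_ppv, _, _ = ppv_at_sensitivity(
    toy_scores, toy_labels, negatives_per_positive=34
)
```

Fixing sensitivity before comparing PPV puts every model on the same footing: each must recover the same share of true HCV patients, and the comparison then shows how many false alarms each raises to do so. Because PPV depends strongly on prevalence, the assumed class ratio has to be stated alongside the figure, as the abstract does.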
Conference/Value in Health Info
2017-11, ISPOR Europe 2017, Glasgow, Scotland
Value in Health, Vol. 20, No. 9 (October 2017)
Code
PRM85
Topic
Methodological & Statistical Research
Topic Subcategory
Modeling and simulation
Disease
Gastrointestinal Disorders