Gleaning Novel Insights from Real World Data: A Machine-Learning Guided Analytical Framework
Speaker(s)
Xie H1, Lo-Ciganic WH2, Wang J3, Mychaskiw M4, Zhang Y5, Tian MY6
1Teva Branded Pharmaceutical Products R&D, Inc., Blue Bell, PA, USA, 2University of Pittsburgh, Pittsburgh, PA, USA, 3KMK Consulting, Inc, Morristown, NJ, USA, 4Teva Branded Pharmaceutical Products R&D, Inc., West Chester, PA, USA, 5Teva Branded Pharmaceutical Products R&D, Inc., Malvern, PA, USA, 6Teva Branded Pharmaceutical Products R&D, Inc., Skillman, NJ, USA
OBJECTIVES: The ISPOR PALISADE checklist provides important guidance for machine learning (ML) applications using real-world data (RWD). Although ML methods continue to advance, little guidance exists for identifying reliable, important predictors from ML models. The aim of this study was to develop an analytical framework that leverages ML to identify reliable predictors and provide novel clinical insights from RWD.
METHODS: Prognostic modeling was used to develop and validate ML algorithms in an RWD case study investigating treatment instability of oral antipsychotics in patients with schizophrenia, using the 2012-2022 Merative™ MarketScan® claims databases. Feature engineering was applied to demographics, baseline diagnoses (ICD-10), procedures (CPT/HCPCS), medications, and healthcare resource utilization factors (measured as binary and continuous variables). We split the cohort into training and testing sets at a 75%/25% ratio and included features with ≥1% prevalence in model development using LASSO, elastic-net (EN), random forests, and XGBoost with 5-fold cross-validation. We identified the best-performing model using the C-statistic and other metrics, and used several approaches (e.g., top features identified by at least 2 models, features identified in all models) to identify reliable/novel predictors. Finally, we fitted these predictors in a multivariable logistic regression (mLR) to obtain adjusted odds ratios (aORs) with 95% confidence intervals (95%CIs) to improve interpretation.
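The selection workflow described in METHODS can be illustrated with a minimal sketch. It is not the authors' code. It assumes scikit-learn, runs on synthetic binary features rather than claims data, and substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost to avoid an extra dependency. The ranking rule (absolute coefficients for penalized regressions, impurity importances for tree models), `TOP_K`, and the tuning grids are illustrative assumptions, not values from the study.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for engineered binary features (NOT the study data).
n_patients, n_features = 2000, 60
prevalence = rng.uniform(0.002, 0.30, size=n_features)
prevalence[:5] = 0.25  # five common features that carry real signal
X = (rng.random((n_patients, n_features)) < prevalence).astype(float)
logit = -0.5 + 1.0 * X[:, :5].sum(axis=1)
y = (rng.random(n_patients) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# 75%/25% training/testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Keep features with >=1% prevalence (computed here on the training set).
kept_idx = np.flatnonzero(X_train.mean(axis=0) >= 0.01)
X_tr = X_train[:, kept_idx]

def make_en():
    """Elastic-net logistic regression with an assumed 50/50 L1/L2 mix."""
    return LogisticRegression(
        penalty="elasticnet", l1_ratio=0.5, solver="saga", max_iter=5000
    )

# Candidate models with small, illustrative tuning grids.
candidates = {
    "lasso": (
        LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
        {"C": [0.05, 0.5]},
    ),
    "elastic_net": (make_en(), {"C": [0.05, 0.5]}),
    "random_forest": (
        RandomForestClassifier(n_estimators=200, random_state=0),
        {"max_depth": [3, 6]},
    ),
    "boosting": (  # stand-in for XGBoost
        GradientBoostingClassifier(random_state=0),
        {"max_depth": [2, 3]},
    ),
}

def importance(model):
    """Absolute coefficients for linear models, impurity importances for trees."""
    if hasattr(model, "coef_"):
        return np.abs(model.coef_).ravel()
    return model.feature_importances_

TOP_K = 10  # assumed cut-off for this small example
votes = Counter()
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validation, scored by the C-statistic (ROC AUC).
    search = GridSearchCV(estimator, grid, cv=5, scoring="roc_auc")
    search.fit(X_tr, y_train)
    imp = importance(search.best_estimator_)
    top = [i for i in np.argsort(imp)[::-1][:TOP_K] if imp[i] > 0]
    votes.update(int(kept_idx[i]) for i in top)

# "Reliable" predictors: ranked in the top K by at least 2 models.
consensus = sorted(f for f, n in votes.items() if n >= 2)

# Refit an elastic-net model on the consensus features and score the test set.
final = make_en().fit(X_train[:, consensus], y_train)
c_stat = roc_auc_score(y_test, final.predict_proba(X_test[:, consensus])[:, 1])
print(f"{len(consensus)} consensus features, test C-statistic = {c_stat:.2f}")
```

Counting votes across models, rather than trusting a single ranking, is one way to implement the "identified by at least 2 models" rule. Requiring a vote from every model would give the stricter "identified in all models" variant.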
RESULTS: This case study included 4,671 adult patients with schizophrenia; 78.2% had treatment instability within 6 months after initiating oral antipsychotics. A total of 14,165 features were created. Across 12 ML modeling iterations, the EN model using the top 20 features identified by at least 2 models performed best (C-statistic=0.61; precision=0.82; sensitivity=0.66). The top 3 predictors of treatment instability were substance abuse (aOR=1.58, 95%CI=1.19-2.10), emergency department visits (aOR=1.08, 95%CI=1.03-1.13), and less frequent psychotherapy (aOR=0.92, 95%CI=0.87-0.97).
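For readers interpreting the aORs above, the following sketch shows how an odds ratio and its 95% CI relate to a logistic regression coefficient. It assumes Wald-type intervals, exp(beta ± z·SE), which the abstract does not state; the function names are hypothetical. Under that assumption each reported aOR should sit at the geometric midpoint of its interval, which gives a quick consistency check on the published figures.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def beta_se_from_ci(lower, upper, z=Z_95):
    """Back out the coefficient and standard error implied by a Wald interval."""
    beta = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se

# Values as reported in RESULTS: (aOR, CI lower, CI upper).
reported = {
    "substance abuse": (1.58, 1.19, 2.10),
    "emergency department visits": (1.08, 1.03, 1.13),
    "psychotherapy": (0.92, 0.87, 0.97),
}

implied = {}
for name, (aor, lower, upper) in reported.items():
    beta, se = beta_se_from_ci(lower, upper)
    implied[name] = odds_ratio_ci(beta, se)
    print(f"{name}: reported aOR={aor}, implied aOR={implied[name][0]:.3f}, "
          f"beta={beta:.3f}, SE={se:.3f}")
```

The implied and reported aORs agree to within rounding for all three predictors, as expected if the intervals are symmetric on the log-odds scale.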
CONCLUSIONS: Our analytical framework demonstrated that combining ML's predictive capability with mLR's interpretability can provide novel clinical insights from high-dimensional RWD. Further investigations are needed to examine the underlying associations.
Code
PT20
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Neurological Disorders, No Additional Disease & Conditions/Specialized Treatment Areas