Gleaning Novel Insights from Real World Data: A Machine-Learning Guided Analytical Framework

Speaker(s)

Xie H¹, Lo-Ciganic WH², Wang J³, Mychaskiw M⁴, Zhang Y⁵, Tian MY⁶
¹Teva Branded Pharmaceutical Products R&D, Inc., Blue Bell, PA, USA, ²University of Pittsburgh, Pittsburgh, PA, USA, ³KMK Consulting, Inc, Morristown, NJ, USA, ⁴Teva Branded Pharmaceutical Products R&D, Inc., West Chester, PA, USA, ⁵Teva Branded Pharmaceutical Products R&D, Inc., Malvern, PA, USA, ⁶Teva Branded Pharmaceutical Products R&D, Inc., Skillman, NJ, USA

Presentation Documents

ISPOR24_Xie_PT20_POSTER139407.pdf

OBJECTIVES: The ISPOR PALISADE checklist provides important guidance for machine learning (ML) applications using real-world data (RWD). While ML methods continue to advance, little guidance exists on how to identify reliably important predictors from ML models. The aim of this study was to develop an analytical framework that leverages ML to identify reliable predictors and provide novel clinical insights from RWD.

METHODS: Prognostic modeling was used to develop and validate ML algorithms in an RWD case study of treatment instability with oral antipsychotics among patients with schizophrenia, using the 2012-2022 Merative™ MarketScan® claims databases. Feature engineering was applied to demographics, baseline diagnoses (ICD-10), procedures (CPT/HCPCS), medications, and healthcare resource utilization factors (measured as binary and continuous variables). We split the cohort 75%/25% into training and testing sets and included features with ≥1% prevalence in model development using LASSO, elastic net (EN), random forests, and XGBoost with 5-fold cross-validation. We identified the best-performing model using the C-statistic and other metrics, and used several approaches (e.g., top features identified by at least 2 models, features identified in all models) to identify reliable or novel predictors. Finally, we fitted these predictors in multivariate logistic regression (mLR) to obtain adjusted odds ratios (aOR) with 95% confidence intervals (95%CI) to improve interpretability.
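
The selection steps described above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, feature names, and toy rankings are hypothetical, and each model's ranking is assumed to come from whatever importance measure that fitted model provides. The last helper shows the standard conversion of a logistic regression coefficient and its standard error into an odds ratio with a 95% confidence interval.

```python
import math


def filter_by_prevalence(rows, min_prevalence=0.01):
    """Keep binary features present in at least min_prevalence of patients.

    rows: list of dicts mapping feature name -> 0/1 (one dict per patient).
    """
    n = len(rows)
    return [
        name
        for name in rows[0]
        if sum(1 for r in rows if r[name]) / n >= min_prevalence
    ]


def consensus_features(rankings, top_k=20, min_models=2):
    """Return features that appear in the top_k of at least min_models models.

    rankings: dict mapping model name -> list of features, most important first.
    """
    counts = {}
    for ranked in rankings.values():
        for feature in ranked[:top_k]:
            counts[feature] = counts.get(feature, 0) + 1
    return sorted(f for f, c in counts.items() if c >= min_models)


def adjusted_odds_ratio(beta, se, z=1.96):
    """Convert a logistic coefficient and its SE to (aOR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy cohort (hypothetical): 200 patients, one common and one rare feature.
toy_rows = [{"common_dx": int(i < 50), "rare_dx": int(i == 0)} for i in range(200)]
kept = filter_by_prevalence(toy_rows)  # rare_dx has 0.5% prevalence and is dropped
print(kept)  # ['common_dx']

# Toy rankings (hypothetical feature names), most important first.
toy_rankings = {
    "lasso": ["substance_abuse", "ed_visits", "psychotherapy"],
    "elastic_net": ["ed_visits", "substance_abuse", "age"],
    "random_forest": ["ed_visits", "age", "sex"],
    "xgboost": ["psychotherapy", "ed_visits", "substance_abuse"],
}
shared = consensus_features(toy_rankings, top_k=2, min_models=2)
print(shared)  # ['ed_visits', 'substance_abuse']

# A coefficient of 0 corresponds to an odds ratio of exactly 1.
print(adjusted_odds_ratio(0.0, 0.1))
```

Setting `min_models` to the number of models gives the stricter "identified in all models" rule mentioned in the text.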

RESULTS: This case study included 4,671 adult patients with schizophrenia; 78.2% had treatment instability within 6 months after initiating oral antipsychotics. A total of 14,165 features were created. Across 12 ML modeling iterations, EN using the top 20 features identified by at least 2 models performed best (C-statistic=0.61; precision=0.82; sensitivity=0.66). The top 3 predictors of treatment instability were substance abuse (aOR=1.58, 95%CI=1.19-2.10), emergency department visits (aOR=1.08, 95%CI=1.03-1.13), and less frequent psychotherapy (aOR=0.92, 95%CI=0.87-0.97).
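
For readers less familiar with the three reported metrics, the sketch below shows how each is typically computed from test-set labels and predictions. It is an illustration under stated assumptions rather than the study's code: the function names, the toy labels and scores, and the 0.5 classification threshold are all hypothetical.

```python
def c_statistic(y_true, scores):
    """Share of (positive, negative) pairs where the positive has the higher score.

    Ties count as half. For a binary outcome this equals the area under the ROC curve.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return concordant / (len(pos) * len(neg))


def precision_and_sensitivity(y_true, y_pred):
    """Return (precision, sensitivity) for binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)


# Toy data (hypothetical): 1 = treatment instability, 0 = no instability.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = c_statistic(y_true, scores)  # 5 of 6 pairs are concordant
precision, sensitivity = precision_and_sensitivity(y_true, y_pred)
print(round(auc, 3), round(precision, 3), round(sensitivity, 3))
```

Because 78.2% of the cohort had the outcome, precision is best read against that base rate: a rule that flagged every patient would already reach a precision of about 0.78.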

CONCLUSIONS: Our analytical framework demonstrated that combining ML’s predictive capability with mLR’s interpretability can provide novel clinical insights from high-dimensional RWD. Further investigation is needed to examine the underlying associations.

Code

PT20

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Neurological Disorders, No Additional Disease & Conditions/Specialized Treatment Areas

ISPOR 2024

May 5-8, 2024