Gleaning Novel Insights from Real World Data: A Machine-Learning Guided Analytical Framework

Author(s)

Xie H1, Lo-Ciganic WH2, Wang J3, Mychaskiw M4, Zhang Y5, Tian MY6
1Teva Branded Pharmaceutical Products R&D, Inc., Blue Bell, PA, USA, 2University of Pittsburgh, Pittsburgh, PA, USA, 3KMK Consulting, Inc., Morristown, NJ, USA, 4Teva Branded Pharmaceutical Products R&D, Inc., West Chester, PA, USA, 5Teva Branded Pharmaceutical Products R&D, Inc., Malvern, PA, USA, 6Teva Branded Pharmaceutical Products R&D, Inc., Skillman, NJ, USA


OBJECTIVES: The ISPOR PALISADE checklist provides important guidance for machine learning (ML) applications using real-world data (RWD). While ML methods continue to advance, little guidance exists on identifying reliable, important predictors from ML models. The aim of this study was to develop an analytical framework that leverages ML to identify reliable predictors and provide novel clinical insights from RWD.

METHODS: Prognostic modeling was used to develop and validate ML algorithms in an RWD case study investigating treatment instability with oral antipsychotics in patients with schizophrenia, using the 2012-2022 Merative™ MarketScan® claims databases. Feature engineering was applied to demographics, baseline diagnoses (ICD-10), procedures (CPT/HCPCS), medications, and healthcare resource utilization factors, measured as binary and continuous variables. We split the cohort into training and testing sets at a 75%:25% ratio and included features with ≥1% prevalence in model development using LASSO, elastic net (EN), random forests, and XGBoost with 5-fold cross-validation. We identified the best-performing model using C-statistics and other metrics, and used several approaches (e.g., top features identified by at least 2 models, features identified in all models) to identify reliable and novel predictors. Finally, we fitted these predictors in a multivariate logistic regression (mLR) to obtain adjusted odds ratios (aORs) with 95% confidence intervals (95%CI) and improve interpretability.
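The two screening steps in the framework, the ≥1% prevalence filter and the selection of features ranked highly by at least 2 of the 4 models, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the row layout (one dict per patient, with any nonzero value counted as "present"), the per-model `top_k` cutoff, the tie-breaking order, and all function and feature names are hypothetical. In practice the per-model rankings would come from LASSO/EN coefficients or random-forest/XGBoost importances fitted with 5-fold cross-validation on the training split, which is not shown here.

```python
from collections import Counter


def prevalence_filter(rows, min_prevalence=0.01):
    """Return names of features present in at least `min_prevalence` of patients.

    Hypothetical layout: `rows` is a list with one dict per patient mapping
    feature name -> value; any nonzero value counts as "present".
    """
    n = len(rows)
    present = Counter()
    for row in rows:
        for name, value in row.items():
            if value:
                present[name] += 1
    return {name for name, count in present.items() if count / n >= min_prevalence}


def consensus_features(rankings, top_k=20, min_models=2, max_features=20):
    """Select features that appear in the top `top_k` of at least `min_models` models.

    `rankings` maps a model name to its features ordered from most to least
    important. Ties in vote count are broken by the best (lowest) rank any
    model gave the feature, then alphabetically (an illustrative choice).
    """
    votes = Counter()
    best_rank = {}
    for ranked in rankings.values():
        for rank, name in enumerate(ranked[:top_k]):
            votes[name] += 1
            best_rank[name] = min(rank, best_rank.get(name, rank))
    selected = [name for name, v in votes.items() if v >= min_models]
    selected.sort(key=lambda name: (-votes[name], best_rank[name], name))
    return selected[:max_features]


# Toy rankings with made-up feature names (not results from the study).
rankings = {
    "lasso": ["f1", "f2", "f3", "f4"],
    "elastic_net": ["f1", "f3", "f2", "f5"],
    "random_forest": ["f2", "f4", "f1", "f6"],
    "xgboost": ["f2", "f6", "f5", "f1"],
}
print(consensus_features(rankings, top_k=3))                # ['f2', 'f1', 'f3']
print(consensus_features(rankings, top_k=3, min_models=4))  # ['f2']
```

Setting `min_models` to the number of models reproduces the "features identified in all models" variant; `max_features=20` mirrors the top-20 feature set mentioned in the results.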

RESULTS: This case study included 4,671 adult patients with schizophrenia; 78.2% had treatment instability within 6 months of initiating oral antipsychotics. A total of 14,165 features were created. Across twelve ML modeling iterations, the EN model using the top 20 features identified by at least 2 models performed best (C-statistic=0.61; precision=0.82; sensitivity=0.66). The top 3 predictors of treatment instability were substance abuse (aOR=1.58, 95%CI=1.19-2.10), emergency department visits (aOR=1.08, 95%CI=1.03-1.13), and less frequent psychotherapy (aOR=0.92, 95%CI=0.87-0.97).
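To make the reported quantities concrete, the following stdlib-only Python sketch computes a C-statistic (the probability that a randomly chosen patient with treatment instability receives a higher predicted score than one without, with ties counted as one half), precision and sensitivity at a probability threshold, and an adjusted odds ratio with a Wald-type 95% CI from a logistic-regression coefficient β and its standard error SE, i.e. aOR = exp(β) and CI = exp(β ± 1.96·SE). The function names, the 0.5 threshold, and the toy inputs are hypothetical; the abstract does not state which classification threshold or CI method the authors used.

```python
import math


def c_statistic(y_true, scores):
    """Share of (positive, negative) patient pairs in which the positive case
    has the higher predicted score; tied scores count as half a pair."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def precision_sensitivity(y_true, scores, threshold=0.5):
    """Precision and sensitivity (recall) after labeling score >= threshold as positive."""
    y_pred = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


def adjusted_odds_ratio(beta, se, z=1.96):
    """aOR = exp(beta) with a Wald-type CI exp(beta -/+ z * se) from a logistic-regression fit."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy inputs (not study data): outcome 1 = treatment instability.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(c_statistic(y_true, scores))            # 0.75
print(precision_sensitivity(y_true, scores))  # (0.5, 0.5)
aor, lo, hi = adjusted_odds_ratio(0.0, 0.1)
print(round(aor, 2), round(lo, 2), round(hi, 2))  # 1.0 0.82 1.22
```

Unlike the C-statistic, precision and sensitivity change with the chosen threshold, so they are only comparable across models evaluated at the same cutoff.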

CONCLUSIONS: Our analytical framework demonstrated that combining the predictive capability of ML with the interpretability of mLR can provide novel clinical insights from high-dimensional RWD. Further investigations are needed to examine the underlying associations.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

PT20

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Neurological Disorders, No Additional Disease & Conditions/Specialized Treatment Areas
