Claims-Based Machine Learning Algorithms to Predict ECOG Performance Status (ECOG) for Pan-Cancer (Prostate, Breast, Colorectal, Gastric, Non-small Cell Lung (NSCLC), and Pancreatic Cancer) Patients

Author(s)

Saha J1, Zhang N2, Das A3, Liao J4
1Guardant Health, Minnetonka, MN, USA, 2Guardant Health, Palo Alto, CA, USA, 3Guardant Health, Cambridge, MA, USA, 4Guardant Health, San Mateo, CA, USA

OBJECTIVES: ECOG is used extensively in oncology to assess progression and to determine treatment and prognosis. Because it is a clinical measurement, claims databases do not capture it. Previous studies relied on logistic regression models to predict ECOG from claims information. In this study, we assessed whether automated machine learning (AutoML) models can improve prediction accuracy.

METHODS: Patients with prostate, breast, colorectal, gastric, NSCLC, or pancreatic cancer were identified in the claims-based clinical-genomic database GuardantINFORM between June 2014 and December 2022. Patients were included if they had a valid ECOG value extracted from their pathology reports (gold standard) and at least 2 medical claims in the 180 days before the date of the ECOG test result. ECOG values were dichotomized as 0-1 (good) vs 2+ (poor). The data were split into 60% training and 40% testing sets. The H2O R package was used to obtain the top 20 AutoML models, ranked by the area under the precision-recall curve (AUPRC). In a secondary analysis, we applied principal component analysis (PCA) to improve model efficiency.
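
The labelling and splitting steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the study's pipeline (which used the H2O R package). The function names, the toy records, the random seed, and the use of a simple unstratified shuffle are all assumptions, since the abstract does not say how the 60/40 split was drawn.

```python
import random


def dichotomize_ecog(ecog):
    """Map a raw ECOG score (0-5) to a binary label: 0 = good (0-1), 1 = poor (2+)."""
    if ecog not in range(0, 6):
        raise ValueError(f"ECOG score out of range: {ecog!r}")
    return 1 if ecog >= 2 else 0


def split_train_test(rows, train_fraction=0.6, seed=0):
    """Shuffle rows reproducibly and cut them into training and testing sets.

    Assumption: a plain random split; the abstract does not mention stratification.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy records: (patient_id, raw ECOG score). Not study data.
records = [(pid, pid % 5) for pid in range(10)]
labelled = [(pid, dichotomize_ecog(score)) for pid, score in records]
train, test = split_train_test(labelled)
```

With a positive class of roughly 16%, a stratified split would keep the share of poor-ECOG patients similar in both sets, but whether the authors did this is not stated.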

RESULTS: Data from 8673 patients, encompassing 85 claims-based variables, were used. Of these patients, 1375 (16%) had a poor ECOG score. The best AutoML-generated model was a gradient boosting machine (AUPRC=0.35; AUC=0.73; sensitivity=62%; specificity=93%). Across all the top models, age, number of hospitalizations, outpatient visits, and Charlson comorbidity score were the major contributing variables. With PCA (first 49 principal components, representing 75% of total variability), running efficiency improved and a Lasso regression model was the best model identified (AUPRC=0.34; AUC=0.72; sensitivity=53%; specificity=78%).
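
For readers less familiar with the reported metrics, the sketch below shows one common way to compute sensitivity, specificity, and a step-wise AUPRC estimate (average precision) from true labels and model scores. It is an illustration only: the toy labels, scores, and the 0.5 threshold are hypothetical, ties in scores are not handled specially, and the abstract does not state which threshold or AUPRC estimator was used. For context, a classifier with no skill has an expected AUPRC close to the positive-class prevalence, about 0.16 in this cohort.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = poor ECOG."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Ranks cases by descending score and averages the precision observed
    at the rank of each true positive.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


# Hypothetical toy data, not study results.
y_true = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
auprc = average_precision(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, while AUPRC and AUC summarize performance across all thresholds, which is why the two groups of metrics can move differently between models.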

CONCLUSIONS: Using AutoML, we developed highly specific models to predict the ECOG score for six different cancer types from claims data. This framework can be extended to other claims databases in which some patients have linked ECOG status from medical records.

Conference/Value in Health Info

2024-05, ISPOR 2024, Atlanta, GA, USA

Value in Health, Volume 27, Issue 6, S1 (June 2024)

Code

PT13

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Oncology
