Comparison of Nomogram and Machine Learning Methods to Predict Survival in Patients with Non-Small Cell Lung Cancer Using Multicenter Databases
Author(s)
ABSTRACT WITHDRAWN
OBJECTIVES: Patients with non-small cell lung cancer (NSCLC) often have a poor prognosis. Predicting overall survival (OS) early, at the time of diagnosis, has many benefits, such as helping providers design the best treatment plan for each patient. In this study, we aimed to evaluate the prognostic factors in NSCLC patients, construct a nomogram, and develop machine learning models to predict OS.
METHODS: Multiple machine learning models, including logistic regression, random forest, XGBoost, decision tree, and LightGBM, were applied to a retrospective cohort of patients from the Surveillance, Epidemiology, and End Results (SEER) dataset (n=34567) and patients from the Chongqing University Cancer Hospital, China (CUCH) dataset (n=6586). Independent prognostic factors for NSCLC were determined using Cox proportional hazards regression analysis. We modeled OS and vital status as the outcomes, and constructed and validated a nomogram to predict the OS of NSCLC patients. Model performance was evaluated using accuracy, sensitivity, specificity, precision, and the area under the receiver operating characteristic curve (AUC).
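As an illustration of the evaluation metrics named above, the following is a minimal sketch, not the authors' code. It assumes binary labels (1 = death, 0 = survival) and hypothetical function names; AUC is computed by the pairwise-ranking definition rather than by any particular library.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and precision from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative one.

    Ties count as half a win. This is O(n_pos * n_neg), which is fine for a
    sketch but slow for cohorts as large as those described here.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Sensitivity and specificity are reported alongside accuracy because survival outcomes are often imbalanced, in which case accuracy alone can be misleading.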
RESULTS: We conducted group LASSO regression and multivariate Cox regression analyses to assess how patient factors affect OS in NSCLC. The eight most relevant variables were selected for modeling. For the SEER dataset, XGBoost had the best prediction performance among the classifiers, with an AUC of 0.806. For the CUCH dataset, random forest performed best, with an AUC of 0.773. The prediction accuracy of both the nomogram and the machine learning models varied as the follow-up time ranged from 1 to 5 years.
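The abstract does not state how vital status was turned into a classification target at each follow-up time. One common approach, shown here purely as an assumption, labels each patient at a fixed horizon and excludes patients censored before that horizon. The function name and the month-based inputs are hypothetical.

```python
from typing import Optional


def label_at_horizon(survival_months: float, died: bool, horizon_years: int) -> Optional[int]:
    """Binary outcome at a fixed follow-up horizon (assumed labeling scheme).

    Returns 1 if the patient died within the horizon, 0 if the patient is
    known to have survived past it, and None if the patient was censored
    before the horizon, so the status at that time is unknown.
    """
    horizon_months = 12 * horizon_years
    if died and survival_months <= horizon_months:
        return 1
    if survival_months >= horizon_months:
        return 0
    return None  # censored early: exclude from this horizon's evaluation


# Example: one hypothetical patient who died at 30 months, labeled at each
# yearly horizon from 1 to 5 years.
labels = {years: label_at_horizon(30, True, years) for years in range(1, 6)}
```

Under this scheme the evaluable cohort and the class balance both change with the horizon, which is one reason performance can differ between 1-year and 5-year predictions.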
CONCLUSIONS: The results demonstrated that machine learning-based classifier models are capable of predicting the survival of patients with NSCLC. The Cox regression-based nomogram interpreted the results well. Although validation results were not fully consistent across datasets or follow-up times, the differences in the evaluation metrics were slight, indicating that our model is adaptable and generalizable for future applications.
Conference/Value in Health Info
Value in Health, Volume 25, Issue 6, S1 (June 2022)
Code
CO74
Topic
Clinical Outcomes, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Outcomes Assessment
Disease
Oncology