Machine Learning Models in Prediction of Head and Neck Squamous Cell Carcinoma Survivability
Author(s)
Rohatgi O1, Agrawal N2, Goswami S3, Vivek V2, Chaudhuri M2, Aparasu RR4
1Complete HEOR Solutions (CHEORS), Jaipur, RJ, India, 2Complete HEOR Solutions (CHEORS), Chalfont, PA, USA, 3Complete HEOR Solutions (CHEORS), Irvine, CA, USA, 4University of Houston, College of Pharmacy, Houston, TX, USA
OBJECTIVES: Predicting cancer survival is important for patient care management. The objective of the study was to evaluate the performance of common machine learning (ML) models in predicting 5-year overall survival in patients with head and neck squamous cell carcinoma (HNSCC) based on clinical and demographic prognostic factors.
METHODS: Patients aged ≥20 years who were diagnosed with malignant HNSCC between 2008 and 2019 were identified from the Surveillance, Epidemiology, and End Results (SEER) 17 registries database. Patients with unknown or missing values or with multiple primary sites were excluded. The study evaluated the performance of three ML models, decision trees (DT), random forest (RF), and support vector machine (SVM), along with logistic regression (LR). The predictors of 5-year survival included patient demographic and clinical characteristics. The classification performance of the models was evaluated based on accuracy, the area under the receiver operating characteristic curve (AUROC), and the F1 score.
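The three evaluation metrics named in the methods can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the choice of 1 as the positive class are all assumptions made for illustration, since the abstract does not state them.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive case is scored above a random
    negative case; ties count as one half (pairwise definition of AUROC)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive class, scores are model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]

# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUROC:", auroc(y_true, y_score))
```

Accuracy and F1 depend on the chosen threshold, whereas AUROC is computed from the ranking of scores alone, which is one reason the three metrics can order a set of models differently.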
RESULTS: The study cohort included 54,263 patients with HNSCC. Most of the selected patients were White (84.3%) and male (74.0%); 44.9% were aged ≥65 years, and 9.4% had oropharyngeal cancer. The machine learning algorithms under study were able to predict the overall survival of HNSCC patients. DT outperformed RF, SVM, and LR in terms of AUROC (DT: 72%, RF: 70%, SVM: 66%, LR: 67%). SVM and LR had the highest accuracy, followed by DT and RF (SVM: 70%, LR: 70%, DT: 69%, RF: 67%), while F1 scores were comparable across models (SVM: 56%, LR: 57%, DT: 55%, RF: 56%).
CONCLUSIONS: Overall, performance varied across the models, with DT performing best based on the AUROC. However, the F1 scores were similar across the three ML models and LR. Further work, including external validation, is needed to evaluate these ML models before they are considered for use in healthcare.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 6, S2 (June 2023)
Code
MSR75
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Missing Data, PRO & Related Methods
Disease
No Additional Disease & Conditions/Specialized Treatment Areas