Supervised Machine Learning for Predicting Mortality in Acute Myeloid Leukemia Patients Using Electronic Health Record Data
Author(s)
Marinaro X, Meng Z, Zhang X, Lodaya K, Hayashida DK, Munson S, D'Souza F
Boston Strategic Partners, Inc., Boston, MA, USA
OBJECTIVES: This study implements supervised machine learning (ML) to predict mortality in acute myeloid leukemia (AML) patients and to determine the important features in this prediction.

METHODS: Patients were selected from a large US electronic health records database (Cerner Real-World Data) containing records for over 87 million patients. We investigated the first visit for patients with an AML ICD-10 diagnosis code, an inpatient stay, a length of stay of at least 48 hours, and non-missing gender and age. Patient characteristics, hospital characteristics, Charlson Index, quick sequential organ failure assessment (qSOFA), interventions (e.g., mechanical ventilation), and lab values (e.g., minimum white blood cell count) were included in this analysis. Several ML algorithms were compared through 10-fold cross-validation; the best-performing algorithm was tuned and evaluated with a test dataset. Feature importance was estimated from the final model using permutation importance.

RESULTS: There were 8,968 patients included in this study. The ML algorithms compared were (mean cross-validation accuracy ± cross-validation standard deviation): logistic regression (72.9% ± 1.6%); random forests (77.5% ± 1.5%); extreme gradient boosting (XGBoost) (78.0% ± 1.2%); k-nearest neighbors (70.8% ± 1.1%); and support vector machines (75.8% ± 1.3%). XGBoost was selected for the final model and, after hyperparameter tuning, had a prediction accuracy of 80.0%. The final model had an F1 score of 0.52, an area under the receiver operating characteristic curve (AUC ROC) of 0.79, a precision of 0.68, and a recall of 0.42. The top five most important features in this prediction were mechanical ventilation, qSOFA, age, intensive care unit admission, and minimum white blood cell count.

CONCLUSIONS: Supervised ML performed well in predicting mortality in AML patients (accuracy 80.0%, AUC ROC 0.79) while identifying the most important features. Similar ML algorithms may identify higher-risk AML patients earlier in the hospital stay to support earlier efforts to modify routine management.
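As an illustration of two quantities named in the abstract, the following minimal Python sketch shows how an F1 score follows from precision and recall, and how permutation importance can be computed as the mean drop in accuracy when one feature column is shuffled. It is not the authors' code: the function names, the toy data, and the toy classifier `toy_predict` are all hypothetical, and the study's actual XGBoost model and EHR features are not reproduced here.

```python
import random


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic toy data (NOT study data): feature 0 is a binary flag that fully
# determines the outcome, feature 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.randint(0, 1), data_rng.random()] for _ in range(200)]
labels = [row[0] for row in rows]


def toy_predict(row):
    """Hypothetical stand-in for a fitted model: predicts from feature 0 only."""
    return row[0]


importances = permutation_importance(toy_predict, rows, labels)

# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.68, 0.42)
print(f"F1 from reported precision/recall: {reported_f1:.2f}")
print("Permutation importances (toy data):", importances)
```

With the reported precision of 0.68 and recall of 0.42, the harmonic mean rounds to the reported F1 of 0.52. In the toy example, the feature the classifier relies on shows a large accuracy drop when shuffled, while the ignored noise feature shows none, which is the behavior permutation importance is meant to surface.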
Conference/Value in Health Info
2021-05, ISPOR 2021, Montreal, Canada
Value in Health, Volume 24, Issue 5, S1 (May 2021)
Code
PCN244
Topic
Clinical Outcomes, Health Service Delivery & Process of Care, Methodological & Statistical Research, Real World Data & Information Systems
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Outcomes Assessment, Disease Management, Health & Insurance Records Systems
Disease
Oncology, Rare and Orphan Diseases, Systemic Disorders/Conditions