Supervised Machine Learning for Predicting Mortality in Acute Myeloid Leukemia Patients Using Electronic Health Record Data

Author(s)

Marinaro X, Meng Z, Zhang X, Lodaya K, Hayashida DK, Munson S, D'Souza F
Boston Strategic Partners, Inc., Boston, MA, USA

OBJECTIVES

This study applies supervised machine learning (ML) to predict mortality in acute myeloid leukemia (AML) patients and to identify the features most important to that prediction.

METHODS

Patients were selected from a large US electronic health records database (Cerner Real-World Data) that contains over 87 million patients. We analyzed the first visit of each patient who had an AML ICD-10 diagnosis code, an inpatient stay, a length of stay of at least 48 hours, and non-missing gender and age. Patient characteristics, hospital characteristics, Charlson Comorbidity Index, quick sequential organ failure assessment (qSOFA), interventions (e.g., mechanical ventilation), and lab values (e.g., minimum white blood cell count) were included in the analysis. Several ML algorithms were compared through 10-fold cross-validation; the best-performing algorithm was tuned and evaluated with a test dataset. Feature importance was extracted from the final model through permutation importance.
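Permutation importance, the method named above, scores a feature by how much model performance drops when that feature's values are shuffled across patients. The following is a minimal, library-free sketch of the idea, not the authors' implementation: the toy data, the `toy_model` rule, the use of accuracy as the score, and the `n_repeats` and `seed` parameters are all illustrative assumptions.

```python
import random
from typing import Callable, List

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return correct / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: List[int],
    n_repeats: int = 10,  # assumption: number of shuffles averaged per feature
    seed: int = 0,        # assumption: fixed seed for reproducibility
) -> List[float]:
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 fully determines the outcome, column 1 is noise.
X = [[float(i % 2), float(i % 7)] for i in range(100)]
y = [int(row[0]) for row in X]


def toy_model(row: Row) -> int:
    """Stand-in for a fitted classifier; it looks only at column 0."""
    return int(row[0] >= 0.5)


importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this toy setup the noise column receives an importance of exactly zero because the stand-in model never reads it, while shuffling the informative column removes most of the model's accuracy. With a real fitted model, the same procedure would be run on data not used for training.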

RESULTS

The study included 8,968 patients. The ML algorithms compared were (mean cross-validation accuracy ± cross-validation standard deviation): logistic regression (72.9% ± 1.6%); random forests (77.5% ± 1.5%); extreme gradient boosting (XGBoost) (78.0% ± 1.2%); k-nearest neighbors (70.8% ± 1.1%); and support vector machines (75.8% ± 1.3%). XGBoost was selected for the final model and, after hyperparameter tuning, achieved a prediction accuracy of 80.0%. The final model had an F1 score of 0.52, an area under the receiver operating characteristic curve (AUC ROC) of 0.79, a precision of 0.68, and a recall of 0.42. The five most important features in this prediction were mechanical ventilation, qSOFA, age, intensive care unit admission, and minimum white blood cell count.
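The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the figures stated above; the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the final model.
f1 = f1_score(precision=0.68, recall=0.42)
print(round(f1, 2))  # 0.52
```

The result matches the reported F1 of 0.52. The gap between precision (0.68) and recall (0.42) indicates that the model missed more deaths than it falsely flagged.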

CONCLUSIONS

Supervised ML performed well in predicting mortality in AML patients (accuracy 80.0%, AUC ROC 0.79) and identified the most important predictive features. Similar ML algorithms may identify higher-risk AML patients earlier in the hospital stay, supporting earlier efforts to modify routine management.

Conference/Value in Health Info

2021-05, ISPOR 2021, Montreal, Canada

Value in Health, Volume 24, Issue 5, S1 (May 2021)

Code

PCN244

Topic

Clinical Outcomes, Health Service Delivery & Process of Care, Methodological & Statistical Research, Real World Data & Information Systems

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Outcomes Assessment, Disease Management, Health & Insurance Records Systems

Disease

Oncology, Rare and Orphan Diseases, Systemic Disorders/Conditions
