GENERALIZABLE MACHINE LEARNING FRAMEWORK FOR PREDICTIVE MODELING OF PATIENT OUTCOMES USING ONCOLOGY ELECTRONIC HEALTH RECORDS
Author(s)
Stasiw A¹, Falk S¹, Garapati S¹, Sridharma S¹, Mendelsohn D¹, Lakhtakia S¹, Rech A², Oldridge D², Adamson B¹, Chen R¹
¹Flatiron Health, New York, NY, USA; ²Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
OBJECTIVES: In oncology, accurate and reliable prognostic assessments enhance clinical decision-making in both practice and research contexts. We describe a generalizable, broadly applicable machine learning framework for predicting patient outcomes using oncology electronic health record (EHR) data.
METHODS: The framework consists of four steps: (i) research question identification, including a patient-level variable (“label”) to predict; (ii) index date and observation window definition, with extraction of patient features for a relevant cohort; (iii) model training using standard cross-validation techniques to optimize predictive ability; and (iv) model evaluation on an unseen cohort subset to predict the outcome label, evaluate model effectiveness, and rank features by their predictive importance. We applied this approach to a cohort of multiple myeloma patients in a predictive model of 5-year mortality after autologous transplant.
RESULTS: The framework provided two outputs: (1) four standard machine learning models (logistic regression, random forest, support vector machine, gradient boosted trees) with performance metrics (AUC, precision, recall, accuracy); and (2) a ranked list of outcome-predictive patient features according to the models. In the multiple myeloma analysis (n=1099), the label was 5-year survival after the transplant date, and EHR-defined features included demographics, medication administrations, lines of therapy, lab results, and cytogenetic or biomarker testing status. Random forest and gradient boosted trees achieved an AUC of 0.79 and an accuracy of 0.76; the most predictive (highest absolute weight) features identified were M-spike, chromosome 1 abnormalities, and age at diagnosis.
CONCLUSIONS: We have developed a generalizable machine learning framework, agnostic to specific cancer diagnosis, to improve the prediction of specific outcomes and to identify potentially predictive features using oncology EHR-derived data. Customized models based on this framework could be applied to adverse event prediction, early detection of disease progression, and hospital readmission risk with relatively little duplicated effort, streamlining HEOR opportunities.
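Step (ii) of the framework described in METHODS (index date, observation window, and outcome label) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Patient` record, the 180-day look-back window, the single lab feature, the coding of the label as 1 for death within the horizon, the rule that excludes patients with insufficient follow-up, and all example records are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical settings; the abstract does not state the window length used.
OBSERVATION_DAYS = 180       # assumed look-back window before the index date
HORIZON_DAYS = 5 * 365       # approximate 5-year outcome horizon


@dataclass
class Patient:
    patient_id: str
    index_date: date                  # e.g. the transplant date
    death_date: Optional[date]        # None if no death is recorded
    last_followup: date               # last date the patient was known alive
    labs: List[Tuple[date, float]]    # (result date, value) for one lab test


def window_features(p: Patient) -> Dict[str, Optional[float]]:
    """Summarize lab results dated within the look-back window ending at the index date."""
    start = p.index_date - timedelta(days=OBSERVATION_DAYS)
    values = [v for d, v in sorted(p.labs) if start <= d <= p.index_date]
    return {
        "lab_last": values[-1] if values else None,  # most recent value in window
        "lab_count": float(len(values)),             # number of results in window
    }


def five_year_label(p: Patient) -> Optional[int]:
    """1 = died within the horizon, 0 = known alive at the horizon, None = undetermined."""
    horizon = p.index_date + timedelta(days=HORIZON_DAYS)
    if p.death_date is not None and p.death_date <= horizon:
        return 1
    if p.last_followup >= horizon or (p.death_date is not None and p.death_date > horizon):
        return 0
    return None  # follow-up too short to assign a label (assumed exclusion rule)


def build_dataset(patients: List[Patient]):
    """Return (patient_id, features, label) rows for patients with a defined label."""
    rows = []
    for p in patients:
        label = five_year_label(p)
        if label is not None:
            rows.append((p.patient_id, window_features(p), label))
    return rows


# Made-up records for illustration only.
patients = [
    Patient("A", date(2012, 1, 10), date(2014, 6, 1), date(2014, 6, 1),
            [(date(2011, 6, 1), 3.0), (date(2011, 9, 1), 2.1),
             (date(2011, 12, 20), 1.4), (date(2012, 2, 1), 0.9)]),
    Patient("B", date(2013, 3, 5), None, date(2019, 1, 1),
            [(date(2013, 1, 15), 0.5)]),
    Patient("C", date(2018, 7, 1), None, date(2019, 12, 31), []),
]
dataset = build_dataset(patients)
for row in dataset:
    print(row)
```

Restricting features to results dated on or before the index date keeps post-index information out of the model inputs. Patient C is dropped in this sketch because follow-up ends before the 5-year horizon; a real analysis might instead handle such patients with survival methods.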
Conference/Value in Health Info
2020-05, ISPOR 2020, Orlando, FL, USA
Value in Health, Volume 23, Issue 5, S1 (May 2020)
Code
PCN286
Topic
Medical Technologies, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Digital Health
Disease
Oncology