GENERALIZABLE MACHINE LEARNING FRAMEWORK FOR PREDICTIVE MODELING OF PATIENT OUTCOMES USING ONCOLOGY ELECTRONIC HEALTH RECORDS

Author(s)

Stasiw A¹, Falk S¹, Garapati S¹, Sridharma S¹, Mendelsohn D¹, Lakhtakia S¹, Rech A², Oldridge D², Adamson B¹, Chen R¹
¹Flatiron Health, New York, NY, USA, ²Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

OBJECTIVES: In oncology, accurate and reliable prognostic assessments enhance clinical decision-making in both practice and research. We describe a generalizable machine learning framework for predicting patient outcomes from oncology electronic health record (EHR) data.

METHODS: The framework consists of four steps: (i) identifying the research question, including the patient-level variable (“label”) to predict; (ii) defining the index date and observation window, and extracting patient features for the relevant cohort; (iii) training models using standard cross-validation techniques to optimize predictive ability; and (iv) evaluating the models on an unseen subset of the cohort to predict the outcome label, assess model performance, and rank features by predictive importance. We applied this approach to a cohort of patients with multiple myeloma, building a predictive model of 5-year mortality after autologous transplant.

RESULTS: The framework produced two outputs: (1) four standard machine learning models (logistic regression, random forest, support vector machine, and gradient boosted trees) with performance metrics (AUC, precision, recall, and accuracy); and (2) a ranked list of outcome-predictive patient features according to the models. In the multiple myeloma analysis (n=1099), the label was 5-year survival after the transplant date, and EHR-defined features included demographics, medication administrations, lines of therapy, lab results, and cytogenetic or biomarker testing status. Random forest and gradient boosted trees achieved an AUC of 0.79 and an accuracy of 0.76; the most predictive features (those with the highest absolute weight) were M-spike, chromosome 1 abnormalities, and age at diagnosis.

CONCLUSIONS: We have developed a generalizable machine learning framework, agnostic to the specific cancer diagnosis, to improve the prediction of specific outcomes and to identify potentially predictive features using oncology EHR-derived data. Customized models based on this framework could be applied to adverse event prediction, early detection of disease progression, and hospital readmission risk with relatively little duplicated effort, which could streamline HEOR work.
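As an illustration only, steps (ii) and (iv) of the framework might be sketched in plain Python as follows. This is not the authors' implementation: the function names, the 180-day lookback, the lab-event tuples, and the 0.5 decision threshold are all hypothetical, the label ignores censoring (patients with less than 5 years of follow-up), and AUC is assumed here to mean area under the ROC curve. The scores passed to the evaluation function would come from whichever model was trained in step (iii).

```python
from datetime import date, timedelta

def build_example(index_date, death_date, lab_events,
                  lookback_days=180, horizon_days=5 * 365):
    """Step (ii), hypothetical: features from an observation window that ends
    at the index date, plus a binary label for death within the horizon."""
    window_start = index_date - timedelta(days=lookback_days)
    # Keep only lab results recorded inside the observation window.
    in_window = sorted(
        (when, value) for when, value in lab_events
        if window_start <= when <= index_date
    )
    features = {
        "n_labs": len(in_window),
        "last_lab": in_window[-1][1] if in_window else None,
    }
    # Label is 1 if death was recorded within the horizon, else 0.
    # Assumption: censoring is ignored in this sketch.
    died = (death_date is not None
            and death_date <= index_date + timedelta(days=horizon_days))
    return features, int(died)

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def evaluate(y_true, scores, threshold=0.5):
    """Step (iv), hypothetical: AUC, precision, recall, and accuracy on a
    held-out subset. Assumes at least one positive prediction and at least
    one example of each class."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
    return {
        "auc": roc_auc(y_true, scores),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / len(y_true),
    }

if __name__ == "__main__":
    labs = [(date(2014, 11, 1), 2.1), (date(2014, 12, 15), 1.4)]
    print(build_example(date(2015, 1, 1), date(2018, 6, 1), labs))
    print(evaluate([1, 1, 0, 0, 1, 0], [0.9, 0.6, 0.7, 0.2, 0.4, 0.1]))
```

In a real analysis the censoring question matters: a patient lost to follow-up before the 5-year mark has an unknown label, so such patients would need to be excluded or handled with a survival model rather than labeled as survivors.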

Conference/Value in Health Info

2020-05, ISPOR 2020, Orlando, FL, USA

Value in Health, Volume 23, Issue 5, S1 (May 2020)

Code

PCN286

Topic

Medical Technologies, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Digital Health

Disease

Oncology
