Real-World Data at Scale: How Machine Learning Can Enable Learning From All Patients
Author(s)
Discussion Leader: Lotte Steuten, PhD, MSc, Office of Health Economics, London, LON, UK
Discussants: Corey Benedum, PhD, MPH, Flatiron Health, New York, NY, USA; Maarten IJzerman, PhD, Cancer Health Services Research, Erasmus School of Health Policy and Management, Rotterdam, Netherlands; Natalia Kunst, PhD, MSc, Department of Health Management and Health Economics, Faculty of Medicine, University of Oslo, Oslo, Norway
PURPOSE: This workshop explores the application of machine learning (ML) to efficiently extract information from electronic health records (EHRs) and generate variables for analysis. First, we introduce use cases for ML extraction of patient data at scale. Second, we describe approaches to evaluate the fitness-for-purpose of ML-extracted real-world data (RWD). Third, we discuss HEOR considerations for decision-making based on evidence including ML-extracted RWD.
DESCRIPTION: Several data elements critical for outcomes research are stored in EHRs as unstructured data (e.g., clinical notes, pathology reports). Collecting these data is resource intensive and requires trained experts to manually review patient documents. ML offers a scalable solution: models can learn language patterns associated with characteristics of interest and then extract clinically relevant information from unstructured sources. Conventional evaluation practices focus on the performance of each model in isolation. Such an approach ignores how model errors interact and potentially introduce bias into downstream analyses involving many ML-extracted variables. Thus, it is necessary to evaluate not only the performance of individual ML models but also the entire dataset containing ML-extracted variables, to understand how model errors in combination may affect results and uncertainty in decision-making.
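To illustrate how errors from more than one extraction model can combine, the following is a minimal sketch and not the workshop's method. It assumes a hypothetical cohort, hypothetical sensitivity and specificity values of 0.9, and errors that are non-differential and independent between the two models. All names (`misclassify`, `observed_rr`, `TRUE_COHORT`) are illustrative only.

```python
def misclassify(pos, neg, sensitivity, specificity):
    """Expected counts labelled positive and negative by an imperfect model."""
    labelled_pos = pos * sensitivity + neg * (1 - specificity)
    labelled_neg = pos * (1 - sensitivity) + neg * specificity
    return labelled_pos, labelled_neg


def risk_ratio(a, b, c, d):
    """a, b: exposed with/without outcome; c, d: unexposed with/without outcome."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical true cohort (assumption): risk 0.30 in exposed, 0.10 in unexposed.
TRUE_COHORT = {"a": 300, "b": 700, "c": 100, "d": 900}


def observed_rr(cohort, exp_sens, exp_spec, out_sens, out_spec):
    """Risk ratio expected when exposure and outcome are both ML-extracted."""
    # Step 1: outcome model errors, applied within each true exposure group.
    a1, b1 = misclassify(cohort["a"], cohort["b"], out_sens, out_spec)
    c1, d1 = misclassify(cohort["c"], cohort["d"], out_sens, out_spec)
    # Step 2: exposure model errors, applied within each labelled outcome group.
    a2, c2 = misclassify(a1, c1, exp_sens, exp_spec)
    b2, d2 = misclassify(b1, d1, exp_sens, exp_spec)
    return risk_ratio(a2, b2, c2, d2)


rr_true = risk_ratio(**TRUE_COHORT)
rr_exposure_only = observed_rr(TRUE_COHORT, 0.9, 0.9, 1.0, 1.0)
rr_outcome_only = observed_rr(TRUE_COHORT, 1.0, 1.0, 0.9, 0.9)
rr_both = observed_rr(TRUE_COHORT, 0.9, 0.9, 0.9, 0.9)

print(f"true RR:            {rr_true:.2f}")
print(f"exposure errors:    {rr_exposure_only:.2f}")
print(f"outcome errors:     {rr_outcome_only:.2f}")
print(f"both error sources: {rr_both:.2f}")
```

Under these assumed numbers, each model looks accurate on its own, yet the estimated risk ratio moves from 3.0 to about 2.3 or 1.9 with a single error source and to about 1.65 when both variables are ML-extracted. This is the kind of combined effect that evaluating each model in isolation would miss.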
This workshop will focus on the strengths and limitations of using ML to extract RWD, approaches to evaluate whether the data are fit for use, and considerations for their use in HTA decision-making. Dr. IJzerman will provide an overview of how ML models are being used to generate RWD. Dr. Benedum will describe approaches for evaluating the fitness-for-use of ML-generated variables for outcomes research. Dr. Kunst will discuss which considerations are needed to enable the use of ML-extracted RWD for regulatory purposes. Panelists will pose 1-2 polling questions to guide a 15-minute interactive discussion with additional questions and comments from the audience.
Conference/Value in Health Info
Code: 128