Larger, Deeper, and in Real Time: Applications of Machine Learning and Natural Language Processing on Electronic Health Records to Learn From the Patient Journey at Scale


Selen Bozkurt, PhD, MS, Stanford University, Palo Alto, CA, USA, Katherine Tan, PhD, Flatiron Health, Stamford, CT, USA, Ravi Parikh, MD, MPP, University of Pennsylvania, Philadelphia, PA, USA and Joe Vandigo, MBD, PhD, Applied Patient Experience, Greensburg, PA, USA

PURPOSE: This session describes the pragmatic impact of applying machine learning (ML) and natural language processing (NLP) to electronic health records (EHRs) to generate and accelerate insights into the patient journey through scale, depth, and speed.

DESCRIPTION: The emergence of novel prognostic factors (biomarkers, mutations) and associated therapies has increased appreciation of, and emphasis on, precision medicine. To understand patient outcomes in increasingly small populations and to keep up with the rapidly evolving standard of care, real-world data (RWD) information systems need to be scalable, relevant, and timely. Applications to understand outcomes for patients with rare biomarkers, to identify at-risk populations, and to address inequities in care delivery and research all require large-scale, detailed RWD to provide precise estimates in all relevant subgroups and assist decision-making. Though conventional data sources such as claims data provide information at scale, they lack the clinical depth that unstructured EHR data (e.g., clinical notes, pathology reports) offer, such as information about biomarker results. Traditional manual curation by clinical experts provides rich information about the patient journey but is resource-intensive; ML/NLP techniques have improved upon these curation methods, generating deep insights at scale and in real time.

This session will focus on case studies where ML/NLP is applied to EHRs to obtain RWD and real-world evidence (RWE) quickly and at scale. First, an overview of using ML/NLP to generate RWD/RWE will be provided. Then, case studies on increasing stratification cohort sizes (scale), keeping up with standard of care (speed), and improving representation of underserved populations (depth) will be discussed. Speakers will share metrics to evaluate the pragmatic impact of using ML/NLP on their applications. Speakers will pose one or two polling questions to guide a discussion in which the audience is invited to share their perspectives and experiences.

Signal—ISPOR’s signature program—looks beyond today’s linear thinking to explore topics that will shape healthcare decision making over the next decade. Seeking to strengthen strategic foresight and adaptive leadership capacities in the complex world of healthcare, the Signal series focuses on the “big picture,” while also addressing how health economics and outcomes research (HEOR) can best contribute to solving healthcare’s greatest challenges.




Real World Data & Information Systems