Extracting Severity Markers from Unstructured Clinical Data of Congestive Heart Failure Patients Using a Pretrained Text-to-Text Transfer Transformer Model

Author(s)

Kumar V¹, Rasouliyan L¹, Althoff AG¹, Long S¹, Zema C², Rao MB¹
¹OMNY Health, Atlanta, GA, USA, ²Zema Consulting, Huntsville, AL, USA

OBJECTIVES: Our objective was to determine whether severity measures commonly present in unstructured clinical data of congestive heart failure (CHF) patients could be extracted using a text-to-text transfer transformer (T5) model.

METHODS: Comprehensive clinical notes collected from a large United States medical institution were included if they contained strings suggesting a patient history of CHF. A random sample of these notes was analyzed with a pretrained T5 model available in a licensed software package, using free-text sentences as the questions (e.g., “What is the ejection fraction?”) and the complete clinical note as the context. For New York Heart Association (NYHA) class, returned answers were marked correct if they were integers or Roman numerals between 1 and 4. For ejection fraction (EF), returned answers were marked correct if they were percentages, integers between 0 and 100, or decimals between 0 and 1. Ranges of these values were also marked correct.
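The answer-scoring rules for NYHA class and EF can be expressed as a small checker. This is a minimal Python sketch, not the authors' implementation: the function names (`is_valid_nyha`, `is_valid_ef`), the accepted range separators (hyphen, en dash, or the word "to"), case-insensitive Roman numerals, and the choice to validate each range endpoint independently are all assumptions. It also assumes the answer is judged as returned (after trimming whitespace), so an answer such as "class III" is not accepted.

```python
import re

# NYHA functional classes as Roman numerals (assumption: case-insensitive).
ROMAN_NYHA = {"I", "II", "III", "IV"}

# Assumed range separators: hyphen, en dash, or the word "to".
RANGE_SEP = re.compile(r"\s*(?:-|\u2013|\bto\b)\s*", re.IGNORECASE)


def _nyha_value(token: str) -> bool:
    """One NYHA value: an integer 1-4 or a Roman numeral I-IV."""
    token = token.strip().upper()
    if token in ROMAN_NYHA:
        return True
    return re.fullmatch(r"[0-9]+", token) is not None and 1 <= int(token) <= 4


def _ef_value(token: str) -> bool:
    """One EF value: a percentage, an integer 0-100, or a decimal 0-1."""
    token = token.strip()
    pct = re.fullmatch(r"([0-9]+(?:\.[0-9]+)?)\s*%", token)
    if pct:
        # Assumption: a percentage must itself lie between 0 and 100.
        return 0 <= float(pct.group(1)) <= 100
    if re.fullmatch(r"[0-9]+", token):
        return 0 <= int(token) <= 100
    if re.fullmatch(r"[0-9]*\.[0-9]+", token):
        return 0 <= float(token) <= 1
    return False


def _valid_value_or_range(answer: str, value_ok) -> bool:
    """Accept a single value, or a two-endpoint range whose endpoints both pass."""
    parts = RANGE_SEP.split(answer.strip())
    if len(parts) > 2:
        return False
    return all(value_ok(part) for part in parts)


def is_valid_nyha(answer: str) -> bool:
    return _valid_value_or_range(answer, _nyha_value)


def is_valid_ef(answer: str) -> bool:
    return _valid_value_or_range(answer, _ef_value)
```

For example, `is_valid_ef("55-60%")` and `is_valid_nyha("II-III")` return `True`, while `is_valid_ef("150")` and `is_valid_nyha("class III")` return `False`. A stricter variant could also require both endpoints of a range to share the same form.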

RESULTS: Of approximately 9.6 million notes collected, 440 thousand contained CHF-specific strings, and 200 of these were randomly sampled for analysis. Processing took approximately 0.7 hours on a 128-core CPU machine with 512 GB of RAM. Four questions (2 for EF and 2 for NYHA class) were posed to each of the 200 notes, for a total of 800 possible answers. Valid EF and NYHA class values or ranges were detected in 31/400 (7.75%) and 16/400 (4.0%) answers, respectively.
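The reported yields follow from posing 2 questions per data element to each of the 200 notes, giving 400 possible answers per element and 800 in total. The short Python check below reproduces that arithmetic; the helper name `yield_pct` is hypothetical, and the overall figure is simply derived from the two reported counts.

```python
NOTES = 200
QUESTIONS_PER_ELEMENT = 2  # 2 EF questions and 2 NYHA class questions


def yield_pct(valid: int, possible: int) -> float:
    """Percentage of posed questions that returned a valid value or range."""
    return 100 * valid / possible


possible_per_element = NOTES * QUESTIONS_PER_ELEMENT  # 400
total_possible = 2 * possible_per_element             # 800 (EF + NYHA)

ef_yield = yield_pct(31, possible_per_element)        # 7.75
nyha_yield = yield_pct(16, possible_per_element)      # 4.0
overall_yield = yield_pct(31 + 16, total_possible)    # 5.875

print(f"EF: {ef_yield:.2f}%  NYHA: {nyha_yield:.1f}%  overall: {overall_yield:.3f}%")
```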

CONCLUSIONS: These results suggest that T5 models can extract disease-specific knowledge related to CHF severity from unstructured clinical data. Potential applications include characterizing patients for incorporation into health economic models and improving predictive accuracy for CHF readmission and mortality. Further work is required to determine the sensitivity and specificity of valid data elements in analyzed notes and to improve the performance, yield, and automation of extracting these disease severity concepts.

Conference/Value in Health Info

2022-05, ISPOR 2022, Washington, DC, USA

Value in Health, Volume 25, Issue 6, S1 (June 2022)

Code

MSR43

Topic

Methodological & Statistical Research, Patient-Centered Research, Real World Data & Information Systems, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Distributed Data & Research Networks, Electronic Medical & Health Records, Patient-reported Outcomes & Quality of Life Outcomes

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
