Extraction of Ejection Fraction From Clinical Notes Using a Pretrained Albert-for-Question-Answering Model

Author(s)

Kumar V, Rasouliyan L, Althoff AG, Nikbakht M
OMNY Health, Atlanta, GA, USA

OBJECTIVES: Our goal was to determine the accuracy of a pretrained Albert-for-Question-Answering model in extracting ejection fraction (EF) values from relevant sentences in the clinical notes of patients with congestive heart failure.

METHODS: Comprehensive notes were collected for patients with a primary cardiovascular diagnosis from a large United States medical institution within the OMNY Health Platform. Encounter text was split into sentences; only sentences containing the terms “ef” or “ejection fraction” were processed further. Each sentence was presented to the model as context, with the strings “What is the EF?” or “What is the ejection fraction?” presented as questions. Model outputs (i.e., the answers) were retained if they were non-null and had a confidence score greater than or equal to 0.9. The retained answers were manually checked against the raw text for a matching numerical value, and precision and recall were calculated.
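The filtering and thresholding steps described in METHODS can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the sentence splitter, the function names (`split_sentences`, `extract_ef`), and the choice to keep the higher-scoring answer of the two questions are all assumptions. The `qa` argument is any callable that takes `question` and `context` and returns a dict with `answer` and `score` keys, which is the shape returned by a Hugging Face `question-answering` pipeline; no specific model checkpoint is implied.

```python
import re
from typing import Callable, Dict, List

# Whole-word "ef" or the phrase "ejection fraction", case-insensitive.
EF_TERM = re.compile(r"\b(ef|ejection fraction)\b", re.IGNORECASE)
QUESTIONS = ["What is the EF?", "What is the ejection fraction?"]
THRESHOLD = 0.9  # confidence cutoff stated in the abstract


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_ef(text: str, qa: Callable[..., Dict]) -> List[Dict]:
    """Return answers for EF-bearing sentences that pass the confidence cutoff."""
    kept = []
    for sentence in split_sentences(text):
        if not EF_TERM.search(sentence):
            continue  # sentence does not mention EF; skip it
        # Ask both questions; keeping the higher-scoring answer is an assumption.
        answers = [qa(question=q, context=sentence) for q in QUESTIONS]
        best = max(answers, key=lambda a: a["score"])
        # Keep only non-null answers at or above the threshold.
        if best["answer"].strip() and best["score"] >= THRESHOLD:
            kept.append({"sentence": sentence,
                         "answer": best["answer"],
                         "score": best["score"]})
    return kept
```

Passing the model in as a callable keeps the filtering logic testable without loading a model; the retained answers would then go to manual review, as in the abstract.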

RESULTS: Data from 11,107 congestive heart failure patients and 21,856 corresponding encounters were collected. After applying inclusion and exclusion criteria, a total of 5,563 sentences from 907 patients and 1,027 encounters were ultimately presented to the model. A total of 333 sentences from 242 patients and 263 encounters met the threshold confidence score and were manually examined. Of 275 raw sentences that had a numerical EF value, 270 (recall: 98.2%) had a correct numeric EF extracted. No incorrect EF values were extracted (precision: 100%).
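The reported precision and recall follow directly from the counts above. The short check below is a sketch that assumes the 5 sentences with a numerical EF value but no correct extraction are counted as false negatives; the helper name `precision_recall` is illustrative.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Counts taken from the RESULTS paragraph.
with_numeric_ef = 275     # raw sentences that had a numerical EF value
tp = 270                  # correct numeric EF extracted
fp = 0                    # no incorrect EF values were extracted
fn = with_numeric_ef - tp # assumption: the remaining 5 are misses

precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.1%} recall={recall:.1%}")
# prints: precision=100.0% recall=98.2%
```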

CONCLUSIONS: These results show that when a numerical EF value is present in a sentence of clinical documentation, a natural language processing pipeline centered around an Albert-for-Question-Answering model can extract it with high precision and recall. Further work is needed to generalize the pipeline to a variety of severity scores.

Conference/Value in Health Info

2022-11, ISPOR Europe 2022, Vienna, Austria

Value in Health, Volume 25, Issue 12S (December 2022)

Code

MSR55

Topic

Methodological & Statistical Research, Patient-Centered Research, Real World Data & Information Systems

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Distributed Data & Research Networks, Patient-reported Outcomes & Quality of Life Outcomes, PRO & Related Methods

Disease

SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory)
