Extraction of Ejection Fraction From Clinical Notes Using a Pretrained Albert-for-Question-Answering Model
Author(s)
Kumar V, Rasouliyan L, Althoff AG, Nikbakht M
OMNY Health, Atlanta, GA, USA
OBJECTIVES: Our goal was to determine the accuracy of a pretrained Albert-for-Question-Answering model in extracting ejection fraction (EF) values from relevant sentences in the clinical notes of patients with congestive heart failure.
METHODS: Comprehensive notes were collected for patients having a primary cardiovascular diagnosis from a large United States medical institution within the OMNY Health Platform. Encounter text was split into sentences; only sentences containing the terms “ef” or “ejection fraction” were further processed. The sentences were then presented to the model as context while the strings “What is the EF?” or “What is the ejection fraction?” were presented as questions. Results of the model (i.e., the answers) were further processed if they were non-null and if their confidence score was greater than or equal to 0.9. These results were manually examined for the presence of a numerical value that corresponded with the raw text, and precision and recall were calculated.
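The filtering and thresholding steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex sentence splitter, the word-boundary term match, and the names `split_sentences`, `candidate_sentences`, `extract_ef`, and `qa_fn` are all hypothetical. The model is passed in as a callable that returns a dictionary with `answer` and `score` keys, which is the shape returned by the Hugging Face `transformers` question-answering pipeline; the abstract does not state which library or checkpoint was used.

```python
import re
from typing import Callable, Dict, List

# The two question strings quoted in the METHODS text.
QUESTIONS = ("What is the EF?", "What is the ejection fraction?")

# Assumption: terms are matched case-insensitively on word boundaries,
# so that words such as "left" do not count as containing "ef".
EF_TERM = re.compile(r"\b(ef|ejection fraction)\b", re.IGNORECASE)

# Assumption: a naive splitter on sentence-final punctuation followed by space.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split encounter text into sentences (simplified, hypothetical splitter)."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def candidate_sentences(text: str) -> List[str]:
    """Keep only sentences that mention 'ef' or 'ejection fraction'."""
    return [s for s in split_sentences(text) if EF_TERM.search(s)]


def extract_ef(text: str, qa_fn: Callable[..., Dict], threshold: float = 0.9) -> List[Dict]:
    """Ask each question of each candidate sentence; keep non-null answers
    whose confidence score is at least `threshold`."""
    kept = []
    for sentence in candidate_sentences(text):
        for question in QUESTIONS:
            out = qa_fn(question=question, context=sentence)
            answer = (out.get("answer") or "").strip()
            if answer and out.get("score", 0.0) >= threshold:
                kept.append({"sentence": sentence, "question": question,
                             "answer": answer, "score": out["score"]})
    return kept


# One possible way to build qa_fn (assumption, not executed here):
#   from transformers import pipeline
#   qa_fn = pipeline("question-answering", model=...)  # an ALBERT QA checkpoint
```

Passing the model in as a callable keeps the filtering logic testable without downloading a checkpoint. The retained answers would still need the manual comparison against the raw text that the abstract describes.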
RESULTS: Data from 11,107 congestive heart failure patients and 21,856 corresponding encounters were collected. After applying inclusion and exclusion criteria, a total of 5,563 sentences from 907 patients and 1,027 encounters were ultimately presented to the model. A total of 333 sentences from 242 patients and 263 encounters met the threshold confidence score and were manually examined. Of the 275 examined sentences whose raw text contained a numerical EF value, 270 (recall: 98.2%) had the correct numeric EF extracted. No incorrect EF values were extracted (precision: 100%).
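The reported recall and precision follow directly from the counts above. The short Python check below reproduces the arithmetic; it assumes that the 5 sentences without a correct extraction count as false negatives, which is consistent with the statement that no incorrect EF values were extracted. The function name `precision_recall` is illustrative only.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


with_numeric_ef = 275   # examined sentences containing a numerical EF value
correct = 270           # sentences with the correct EF extracted
incorrect = 0           # incorrect EF values extracted

p, r = precision_recall(tp=correct, fp=incorrect, fn=with_numeric_ef - correct)
print(f"precision={p:.1%} recall={r:.1%}")  # precision=100.0% recall=98.2%
```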
CONCLUSIONS: These results show that when a numerical EF value is present in a sentence of clinical documentation, a natural language processing pipeline centered on an Albert-for-Question-Answering model can extract it with high precision and recall. Further work is needed to generalize the pipeline to a variety of severity scores.
Conference/Value in Health Info
Value in Health, Volume 25, Issue 12S (December 2022)
Code
MSR55
Topic
Methodological & Statistical Research, Patient-Centered Research, Real World Data & Information Systems
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Distributed Data & Research Networks, Patient-reported Outcomes & Quality of Life Outcomes, PRO & Related Methods
Disease
SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory)