Language Model-Based Approach for Extracting Comorbidities and Complications in Fabry Disease Clinical Notes
Author(s)
Cossio M¹, Gilardino R²
¹HE-Xperts Consulting LLC, Barcelona, Spain; ²MSD, Dubendorf, ZH, Switzerland
OBJECTIVES: Morbidity from rare diseases places a burden on healthcare systems, and equipping healthcare teams to manage these diseases can alleviate this strain. This study develops an automated system that uses language models to extract complications and comorbidities from the clinical notes of patients with Fabry disease.
METHODS: Clinical notes were analyzed using prompt engineering with the ChatGPT API (in Google Colaboratory with Python 3.9). The BioBERT model generated embeddings for the extracted terms, which were then grouped by K-means clustering, and ChatGPT extracted two representative terms per cluster. Several cluster numbers (CN) were tested and evaluated using the following metrics: the mean and standard deviation (SD) of the number of terms per cluster; the number of clusters (including outliers and clusters containing appropriate representative terms); and the number of clusters whose representative terms are pathognomonic of Fabry disease.
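The clustering step and the cluster-size metrics can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say which K-means implementation, initialisation, or distance was used, so the sketch assumes Lloyd's algorithm with Euclidean distance and a deterministic farthest-first initialisation, and reports the SD of terms per cluster as a population SD (also an assumption). The helper names (`farthest_first`, `kmeans`, `group_terms`, `cluster_size_stats`) are hypothetical, and toy 2-D vectors stand in for BioBERT embeddings, which are much higher-dimensional in practice.

```python
import math
from statistics import mean, pstdev


def farthest_first(points: list[list[float]], k: int) -> list[list[float]]:
    """Deterministic maximin initialisation (an assumption): start from the first
    point, then repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's algorithm; returns the cluster index assigned to each point."""
    centroids = farthest_first(points, k)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments unchanged -> converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def group_terms(terms: list[str], labels: list[int]) -> dict[int, list[str]]:
    """Collect the extracted terms that fall into each cluster."""
    clusters: dict[int, list[str]] = {}
    for term, lab in zip(terms, labels):
        clusters.setdefault(lab, []).append(term)
    return clusters


def cluster_size_stats(labels: list[int], k: int) -> tuple[float, float]:
    """Mean and population SD of the number of terms per cluster."""
    sizes = [labels.count(c) for c in range(k)]
    return mean(sizes), pstdev(sizes)


if __name__ == "__main__":
    # Hypothetical toy 2-D vectors standing in for BioBERT embeddings.
    terms = ["proteinuria", "albuminuria", "acroparesthesia",
             "burning pain in hands", "nausea", "dizziness"]
    vectors = [[0.9, 0.1], [0.85, 0.2], [0.1, 0.9],
               [0.2, 0.8], [0.5, 0.45], [0.55, 0.5]]
    labels = kmeans(vectors, k=3)
    print(group_terms(terms, labels))
    print(cluster_size_stats(labels, k=3))
```

Calling `kmeans` over a range of `k` values and recording `cluster_size_stats` for each corresponds to the CN sweep reported in the Results; the representative terms per cluster would then come from a separate ChatGPT prompt, which this sketch does not attempt to reproduce.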
RESULTS: Term extraction and transformation into embeddings took 7 seconds on average. Seventeen tests with different CN values were conducted. As anticipated, increasing the CN decreased both the mean and the SD of the number of terms per cluster (from 12.2 to 2.3 and from 8.5 to 1.6, respectively). In addition, the number of outliers decreased as the number of clusters increased, while agreement between the clusters and their representative terms improved. Beyond 16 clusters, there was no meaningful increase in representative terms pathognomonic of Fabry disease (e.g., proteinuria, acroparesthesia); instead, the representative terms beyond this threshold included nonspecific symptoms (e.g., dizziness, nausea).
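The plateau observed beyond 16 clusters can be expressed as a simple selection rule. The sketch below is one possible way to formalise it, not the authors' procedure: it counts the clusters that have at least one representative term in a hand-curated reference set of Fabry-specific terms, then returns the smallest CN at which that count reaches its maximum over the sweep. The reference set, the helper names (`count_specific_clusters`, `plateau_cluster_number`), and the toy sweep are hypothetical and are not study data; in the study the representative terms came from ChatGPT, whereas here they are typed in by hand. Exact string matching is a simplification, since synonyms would be missed.

```python
def count_specific_clusters(rep_terms: list[tuple[str, ...]], specific: set[str]) -> int:
    """Number of clusters with at least one representative term in the
    reference set (the set is assumed to be stored in lowercase)."""
    return sum(any(t.lower() in specific for t in pair) for pair in rep_terms)


def plateau_cluster_number(counts: dict[int, int]) -> int:
    """Smallest CN at which the count of Fabry-specific clusters reaches its
    maximum over the sweep, so larger CNs add no further specific clusters."""
    peak = max(counts.values())
    return min(cn for cn, n in counts.items() if n == peak)


if __name__ == "__main__":
    # Hypothetical reference set; matching is exact and case-insensitive.
    fabry_specific = {"proteinuria", "acroparesthesia"}
    # Toy sweep: CN -> two representative terms per cluster (illustrative only).
    sweep = {
        2: [("proteinuria", "edema"), ("nausea", "dizziness")],
        3: [("proteinuria", "edema"), ("Acroparesthesia", "pain"), ("nausea", "dizziness")],
        4: [("proteinuria", "edema"), ("acroparesthesia", "pain"),
            ("nausea", "vomiting"), ("dizziness", "fatigue")],
    }
    counts = {cn: count_specific_clusters(reps, fabry_specific) for cn, reps in sweep.items()}
    print(counts)                          # {2: 1, 3: 2, 4: 2}
    print(plateau_cluster_number(counts))  # 3
```

In this toy sweep, moving from 3 to 4 clusters only adds a cluster labeled with nonspecific symptoms, which is the same pattern the Results describe beyond 16 clusters.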
CONCLUSIONS: This study extracted comorbidities and complications from a set of 20 clinical notes in approximately 7 seconds on average. Although the approach successfully clustered terms with characteristics associated with Fabry disease, further research is needed to reduce the number of nonspecific terms in future analyses.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 11, S2 (December 2023)
Code
CO146
Topic
Clinical Outcomes, Medical Technologies, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Outcomes Assessment
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, Rare & Orphan Diseases