Large Language Models and Case Reports: An Innovative Approach to Real-World Data for Rare Disease Natural History Analysis
Author(s)
Paek H1, Lee K2, Huang LC2, Annan A2, Rastergar-mojarad M1, Wang X2
1IMO Health, Rosemont, IL, USA, 2IMO Health, Rosemont, IL, USA
OBJECTIVES: Clinical trials for rare diseases face unique challenges, including small patient populations and a limited understanding of each disease's natural history, which complicates the selection of clinical trial endpoints. Case reports often contain rich narratives with detailed clinical observations of individual patients. Despite their value as sources of real-world data (RWD), case reports are often underutilized. We aimed to develop a system that uses large language models (LLMs) to extract comprehensive clinical features of rare diseases from “case report” studies and structure them into a computable format.
METHODS: We selected two use cases, Fabry disease and Immunoglobulin A nephropathy (IGAN), and collected full-text “case reports” from PubMed. Using 20 abstracts from each disease group, we developed an LLM-based case report processing system that extracts all clinical features described in a case report. We then conducted both quantitative and qualitative evaluations of the system on 50 case reports for each disease.
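As an illustration of what a "computable format" for extracted features might look like, the following minimal sketch represents each feature as a name/value pair and serializes one report to JSON. The class names, field names, and sample values are hypothetical and are not taken from the authors' system.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ClinicalFeature:
    # Hypothetical schema: one extracted feature and its value.
    name: str
    value: str
    unit: Optional[str] = None


@dataclass
class CaseReportRecord:
    # Hypothetical container for all features extracted from one case report.
    disease: str
    features: List[ClinicalFeature] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses to plain dicts and lists.
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; not data from the study.
record = CaseReportRecord(
    disease="Fabry disease",
    features=[
        ClinicalFeature(name="age", value="42", unit="years"),
        ClinicalFeature(name="sex", value="male"),
    ],
)
serialized = record.to_json()
```

A flat name/value list of this kind is only one possible design; it keeps each feature independently countable, which is what a per-report feature count would require.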
RESULTS: For Fabry disease, our system extracted an average of 286 clinical features and corresponding values per report (range, 129 to 452). For IGAN, it extracted an average of 94 features and corresponding values per report (range, 67 to 127). These clinical features include patient demographics, disease characteristics such as diagnosis and genetic information, laboratory test results, comorbidities, treatment history, and outcomes. Our model achieved precision, recall, and F1 scores of 0.9956, 0.9966, and 0.9961, respectively, for Fabry disease, and 0.9835, 0.9736, and 0.9785, respectively, for IGAN. We also visualized the geographical location of each rare disease case using the first author’s affiliation.
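The reported F1 scores can be checked against the reported precision and recall, assuming the standard definition of F1 as their harmonic mean. The abstract does not state how the scores were aggregated across features or reports, so the sketch below is only a consistency check, and the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
fabry_f1 = f1_score(0.9956, 0.9966)
igan_f1 = f1_score(0.9835, 0.9736)

print(round(fabry_f1, 4), round(igan_f1, 4))
```

Under that assumption, both computed values round to the F1 scores given in the abstract (0.9961 and 0.9785).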
CONCLUSIONS: Our study supports the potential of case reports as sources of RWD and demonstrates the effectiveness of LLMs in extracting clinical data from them. This approach can strengthen the generation of real-world evidence and improve our understanding of the natural history of rare diseases.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
RWD51
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Rare & Orphan Diseases