Leveraging Large Language Models and EMR Data to Identify Undiagnosed Rare Diseases: A Hybrid AI Approach

Author(s)

Sandy Balkin, PhD.
SVP Strategy & Analytics, Royalty Pharma, New York, NY, USA.

OBJECTIVES: To evaluate how large language models (LLMs), applied to our electronic medical record (EMR) data sourced from NextGen EMR, can be leveraged to identify patients with undiagnosed rare diseases, and to assess how effectively hybrid AI approaches extract clinical phenotypes and prioritize high-risk cases for follow-up.
METHODS: Four foundation LLMs were used to analyze structured EMR data from patients with genetically confirmed rare diseases and to identify characteristic phenotype patterns. These patterns were then applied to the broader patient population, where LLMs and machine learning algorithms screened for individuals with similar profiles and flagged potential undiagnosed rare disease cases for further review.
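The screening step in METHODS (learn a phenotype pattern from confirmed cases, then rank the wider population by similarity to it) can be illustrated with a minimal sketch. The abstract does not specify the models, features, or scoring used, so this example is not the study's method: it swaps in a simple, hypothetical log-odds enrichment score in place of the LLM and machine learning components, and it assumes each patient's structured EMR record has already been reduced to a set of phenotype codes. The function names, codes, and patients are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: NOT the method used in the study. Each patient is assumed
# to be a set of phenotype codes already extracted from structured EMR fields.


def phenotype_weights(confirmed, population, pseudocount=1.0):
    """Log-odds enrichment of each phenotype code in confirmed cases vs. the population."""
    n_cases, n_pop = len(confirmed), len(population)
    case_counts = Counter(code for patient in confirmed for code in patient)
    pop_counts = Counter(code for patient in population for code in patient)
    weights = {}
    for code in set(case_counts) | set(pop_counts):
        # Smoothed frequencies keep both logits finite, even for rare codes.
        f_case = (case_counts[code] + pseudocount) / (n_cases + 2 * pseudocount)
        f_pop = (pop_counts[code] + pseudocount) / (n_pop + 2 * pseudocount)
        weights[code] = math.log(f_case / (1 - f_case)) - math.log(f_pop / (1 - f_pop))
    return weights


def flag_candidates(population, weights, top_k):
    """Rank patients (dict of id -> code set) by summed weights; return the top_k ids."""
    scores = {pid: sum(weights.get(c, 0.0) for c in codes) for pid, codes in population.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy, invented data for illustration only.
confirmed = [{"pheno_A", "pheno_B"}, {"pheno_A", "pheno_C"}]
population = {
    "p1": {"pheno_A", "pheno_B"},
    "p2": {"pheno_D"},
    "p3": {"pheno_A"},
    "p4": {"pheno_D", "pheno_E"},
}
weights = phenotype_weights(confirmed, list(population.values()))
print(flag_candidates(population, weights, top_k=2))  # ['p1', 'p3']
```

Summing per-code log-odds is a naive-Bayes-style choice that treats phenotypes as independent, which real EMR phenotypes are not; in practice, already-diagnosed patients would also be removed from `population` before ranking.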
RESULTS: LLM-driven analysis of structured EMR data identified characteristic phenotypes of genetically confirmed rare diseases. Applying these patterns to the full patient population flagged additional high-risk individuals, improved sensitivity and specificity relative to rule-based methods, and enabled earlier identification of candidates for genetic evaluation.
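RESULTS reports gains in sensitivity and specificity over rule-based methods without giving figures. As a reminder of what the two metrics measure, the sketch that follows computes them for a set of flagged patients against reference labels: sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP). The function name is hypothetical, and the patient ids and both sets of flags are invented toy values, not study data. The sketch assumes a labeled evaluation set in which every patient's true status is known, which in undiagnosed-disease screening typically requires follow-up genetic testing.

```python
def sensitivity_specificity(flagged, true_cases, all_ids):
    """Return (sensitivity, specificity) of a set of flagged ids against reference labels."""
    flagged, true_cases, all_ids = set(flagged), set(true_cases), set(all_ids)
    tp = len(flagged & true_cases)            # true cases that were flagged
    fn = len(true_cases - flagged)            # true cases that were missed
    fp = len(flagged - true_cases)            # non-cases that were flagged
    tn = len(all_ids - flagged - true_cases)  # non-cases correctly left unflagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy, invented evaluation set: 10 patients, 3 reference-positive cases.
all_ids = range(10)
true_cases = {0, 1, 2}
rule_based = {0, 5, 6}   # hypothetical rule-based flags
model_based = {0, 1, 5}  # hypothetical model-based flags

print(sensitivity_specificity(rule_based, true_cases, all_ids))   # sensitivity 1/3, specificity 5/7
print(sensitivity_specificity(model_based, true_cases, all_ids))  # sensitivity 2/3, specificity 6/7
```

Comparing two screeners on the same labeled set, as here, is what makes a "better sensitivity and specificity" claim meaningful; with a fixed flagging budget, one metric can usually only be raised at the cost of the other.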
CONCLUSIONS: LLM-based analysis of structured EMR data enables more accurate and scalable screening for rare diseases. Incorporating unstructured clinical data in the future could further enhance identification and support earlier diagnosis.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

RWD117

Topic

Methodological & Statistical Research, Patient-Centered Research, Real World Data & Information Systems

Disease

Rare & Orphan Diseases