Clinical Validation of Large Language Model for Automated Extraction of Genetic Testing Data

Moderator

Colleen Caleshu, Genome Medical, South San Francisco, CA, United States

Speakers

Wenjun He; Chloe Thorpe; Carra Eagen, Fort Gratiot, MI, United States; Sara Riordan; Andi Hila

OBJECTIVES: To evaluate the performance and clinical utility of an out-of-the-box large language model (LLM), Gemini Pro-1.5, in extracting structured genetic testing data from unstructured clinical notes, addressing a critical gap in real-world evidence generation, where genetic data remain largely inaccessible for research applications.
METHODS: A multidisciplinary validation study was conducted on 596 clinical notes from 592 unique patients, encompassing 1,048 discrete genetic variants. The extraction protocol focused on five genetic data elements: gene, DNA change and protein change (both in HGVS nomenclature), variant classification, and allelic state. The validation process incorporated three key components: (1) standardized validation protocols developed by genetic counselors, trainees, and data scientists; (2) performance evaluation using accuracy, precision, recall, and F1-score metrics; and (3) quality assessment of extracted data against source documentation. Each variant underwent independent verification for data integrity.
RESULTS: The model demonstrated robust performance across four of the five data elements. For gene names, DNA changes, protein changes, and variant classification, the model achieved accuracy (proportion of correct predictions) of 96-97%, precision (reliability of positive predictions) exceeding 99.7%, and F1-scores above 98%. The high precision indicates that when the model identified a genetic variant, its extraction was nearly always correct, demonstrating excellent reliability for real-world applications. The model had a hallucination rate of zero at the variant level, meaning that no fabricated or incorrectly generated genetic variants were produced during extraction. The high recall (>96%) demonstrates successful identification of relevant genetic information from clinical notes. Allelic state extraction presented the only notable challenge, with moderate performance (accuracy 78.7%, recall 68.3%).
CONCLUSIONS: This validation study establishes the feasibility of using a zero-shot LLM for automated genetic data extraction from clinical documentation. The high precision across most data elements supports potential applications in real-world evidence generation. These results point to a promising approach for improving access to structured genetic data in healthcare and health research settings.
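
ILLUSTRATION: The four metrics named in METHODS can be computed per data element as in the minimal Python sketch below. It is an illustration under stated assumptions, not the study's actual evaluation code: the abstract does not describe how matches were counted, so the counting rules, the names Counts, count_field, and metrics, and the toy gene values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Counts:
    tp: int = 0  # extracted value matches the reference annotation
    fp: int = 0  # a value was extracted but does not match the reference
    fn: int = 0  # the reference has a value but nothing was extracted
    tn: int = 0  # neither the reference nor the model has a value


def count_field(reference: Sequence[Optional[str]],
                extracted: Sequence[Optional[str]]) -> Counts:
    """Tally outcomes for one data element (assumed counting scheme)."""
    c = Counts()
    for ref, ext in zip(reference, extracted):
        if ext is None:
            if ref is None:
                c.tn += 1
            else:
                c.fn += 1
        elif ext == ref:
            c.tp += 1
        else:
            c.fp += 1
    return c


def metrics(c: Counts) -> dict[str, float]:
    """Standard accuracy, precision, recall, and F1 from outcome counts."""
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data for illustration only; these are not study data.
reference_genes = ["BRCA1", "TP53", "MLH1", None]
extracted_genes = ["BRCA1", "TP53", None, None]

gene_counts = count_field(reference_genes, extracted_genes)
gene_metrics = metrics(gene_counts)
print(gene_metrics)
```

In this toy example one reference value is missed and nothing is extracted incorrectly, so precision stays at 1.0 while recall falls. That is the same general pattern as the allelic state result above, where recall is lower than accuracy.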

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

RWD44

Topic

Real World Data & Information Systems

Topic Subcategory

Distributed Data & Research Networks, Health & Insurance Records Systems, Reproducibility & Replicability

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, STA: Genetic, Regenerative & Curative Therapies

Presentation (CTI)