USE OF A SMALL LANGUAGE MODEL TO IDENTIFY MG-ADL SCORES FROM ENCOUNTER NOTES IN AN EMR SYSTEM

Author(s)

Ravindra Telidevara, BS¹, Neisha Opper, PhD, MPH², Ishtiyaque Ahmad, PhD¹, Vivek Rudrapatna, MD, PhD³, Trinabh Gupta, PhD¹, Shivani Aggarwal, PhD, MS⁴.
¹DataUnite, Cupertino, CA, USA; ²Landmark Science, La Crescenta, CA, USA; ³Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; ⁴Landmark Science, Inc, Los Angeles, CA, USA.
OBJECTIVES: Myasthenia Gravis Activities of Daily Living (MG-ADL) scores are frequently used in clinical trials as a key indicator of functional status in patients with generalized myasthenia gravis (gMG). Electronic medical records (EMRs) are a rich source of real-world data on ADL-related information. Large language models show promise for extracting information from unstructured EMR data, but their infrastructure requirements limit feasibility in healthcare environments. Here, we characterize the performance of a computationally lightweight, CPU-only small language model (SLM) in extracting available MG-ADL scores within an EMR system.
METHODS: Adult patients (≥18 years) with prevalent gMG (ICD-10-CM diagnosis code G70.x, with the earliest diagnosis date as the index date) between January 2016 and September 2025 at the University of California San Francisco were included. MG-ADL scores were extracted from encounter notes using the SLM. A random sample of notes that the SLM had classified as containing or not containing an MG-ADL score (n=200) was clinically reviewed. Performance was evaluated for presence/absence classification, total score value, and domain values. Demographic and clinical characteristics were described for all gMG patients and for those with ≥1 MG-ADL score.
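The three evaluation targets (presence/absence, total score, domain values) suggest a structured extraction record per note. The sketch below is a minimal illustration of how such a record might be sanity-checked before clinical review; it is not the authors' pipeline. It relies on the standard MG-ADL layout of eight items scored 0 to 3 (total 0 to 24). The class name `MgAdlExtraction`, its fields, the item keys, and the `validate` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The eight MG-ADL items; the key spellings are illustrative, not from the abstract.
MG_ADL_ITEMS = (
    "talking", "chewing", "swallowing", "breathing",
    "brushing_teeth_hair", "arising_from_chair", "diplopia", "ptosis",
)


@dataclass
class MgAdlExtraction:
    """Hypothetical structured output for one encounter note."""
    present: bool                           # did the note contain an MG-ADL score?
    total: Optional[int] = None             # total score, 0-24
    domains: Optional[Dict[str, int]] = None  # item -> value, each 0-3


def validate(extraction: MgAdlExtraction) -> List[str]:
    """Return a list of consistency problems (an empty list means none found)."""
    problems: List[str] = []
    if not extraction.present:
        if extraction.total is not None or extraction.domains:
            problems.append("score values supplied for a note classified as absent")
        return problems

    if extraction.total is None or not 0 <= extraction.total <= 24:
        problems.append("total score missing or outside 0-24")

    if extraction.domains is not None:
        for item, value in extraction.domains.items():
            if item not in MG_ADL_ITEMS:
                problems.append(f"unknown item: {item}")
            elif not 0 <= value <= 3:
                problems.append(f"{item} outside 0-3")
        # Only compare the sum with the total when every item was extracted.
        if (set(extraction.domains) == set(MG_ADL_ITEMS)
                and extraction.total is not None
                and sum(extraction.domains.values()) != extraction.total):
            problems.append("domain values do not sum to total")
    return problems
```

A check like this cannot replace clinical review, but it could flag internally inconsistent extractions (for example, domain values that do not add up to the stated total) for closer inspection.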
RESULTS: Among 1,962 gMG patients, 7.8% (n=153) had ≥1 MG-ADL score on or after the index date. Demographic and clinical characteristics were similar between all gMG patients and those with ≥1 MG-ADL score. Of the 200 encounter notes reviewed, the SLM correctly classified 195 (97.5%). Sensitivity, specificity, negative predictive value, and positive predictive value ranged from 97% to 98% for presence/absence classification (F1-score=97.50%). Among true positives (n=97), the SLM extracted total scores with 100% accuracy; concordance of domain values ranged from 94.85% for ‘brushing teeth/hair’ to 98.97% for ‘arising from chair’ and ‘diplopia’.
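For readers less familiar with these metrics, the sketch below shows how they follow from a 2x2 confusion matrix. The abstract reports 97 true positives and 195 correct classifications out of 200, which implies 98 true negatives and 5 errors in total. How those 5 errors divide between false positives and false negatives is not stated, so the 3/2 split used here is an assumption chosen only for illustration.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (true for the counts used below).
    """
    return {
        "sensitivity": tp / (tp + fn),            # recall on notes with a score
        "specificity": tn / (tn + fp),            # recall on notes without a score
        "ppv": tp / (tp + fp),                    # precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Reported: tp = 97 and 195/200 correct, hence tn = 195 - 97 = 98.
# ASSUMPTION (not in the abstract): the 5 errors are 3 false positives
# and 2 false negatives.
metrics = classification_metrics(tp=97, fp=3, fn=2, tn=98)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under this assumed split, sensitivity, specificity, PPV, and NPV all fall between 97% and 98%, consistent with the reported range, but other splits of the 5 errors cannot be ruled out from the abstract alone.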
CONCLUSIONS: MG-ADL scores were infrequently documented within the EMR. Where present, a computationally lightweight SLM discriminated score presence from absence with high accuracy and showed strong concordance for total and domain-level values. These findings highlight the potential of SLM-assisted approaches for MG-ADL extraction to enable real-world outcomes research in gMG populations.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR213

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

SDC: Neurological Disorders
