USE OF A SMALL LANGUAGE MODEL TO IDENTIFY MG-ADL SCORES FROM ENCOUNTER NOTES IN AN EMR SYSTEM

Author(s)

Ravindra Telidevara, BS¹, Neisha Opper, PhD, MPH², Ishtiyaque Ahmad, PhD¹, Vivek Rudrapatna, MD, PhD³, Trinabh Gupta, PhD¹, Shivani Aggarwal, PhD, MS⁴.
¹DataUnite, Cupertino, CA, USA, ²Landmark Science, La Crescenta, CA, USA, ³Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA, ⁴Landmark Science, Inc, Los Angeles, CA, USA.

Presentation Documents

Telidevara_SLM for gMG, ISPOR 2026.pdf

OBJECTIVES: Myasthenia Gravis Activities of Daily Living (MG-ADL) scores are frequently used in clinical trials as a key indicator of functional status in patients with generalized myasthenia gravis (gMG). Electronic medical records (EMRs) are a rich source of real-world data on activities of daily living. Large language models show promise for extracting information from unstructured EMR data, but their infrastructure requirements limit their feasibility in healthcare environments. Here, we characterize the performance of a computationally lightweight, CPU-only small language model (SLM) in extracting documented MG-ADL scores from encounter notes within an EMR system.
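For reference, the MG-ADL instrument has a fixed structure: eight items (talking, chewing, swallowing, breathing, brushing teeth or combing hair, arising from a chair, double vision, and eyelid droop), each rated from 0 (normal) to 3, giving a total of 0 to 24. A minimal Python sketch of how one extracted score might be represented and sanity-checked follows; the class `MgAdlExtraction`, its field names, and the consistency check are illustrative assumptions, not part of the described SLM pipeline.

```python
from dataclasses import dataclass
from typing import Optional

# The eight MG-ADL items; each is scored 0 (normal) to 3 (most severe).
MG_ADL_DOMAINS = (
    "talking",
    "chewing",
    "swallowing",
    "breathing",
    "brushing_teeth_or_combing_hair",
    "arising_from_chair",
    "diplopia",
    "ptosis",
)


@dataclass
class MgAdlExtraction:
    """Hypothetical container for one MG-ADL score extracted from a note."""
    total: Optional[int]
    domains: dict[str, int]

    def problems(self) -> list[str]:
        """Return consistency problems; an empty list means the record looks valid."""
        issues = []
        for name, value in self.domains.items():
            if name not in MG_ADL_DOMAINS:
                issues.append(f"unknown domain: {name}")
            elif not 0 <= value <= 3:
                issues.append(f"{name} out of range: {value}")
        if self.total is not None:
            if not 0 <= self.total <= 24:
                issues.append(f"total out of range: {self.total}")
            # Compare against the domain sum only when all eight items were extracted.
            if set(self.domains) == set(MG_ADL_DOMAINS) and sum(self.domains.values()) != self.total:
                issues.append("total does not equal the sum of domain values")
        return issues
```

Checking the sum only when all eight items are present is a deliberate choice in this sketch: a note may record a total alongside only some of the items, and flagging every such note as an error would be too strict.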
METHODS: Adult patients (≥18 years) with prevalent gMG (ICD-10-CM diagnosis code G70.x; earliest diagnosis date used as the index date) between January 2016 and September 2025 at the University of California, San Francisco were included. MG-ADL scores were extracted from encounter notes using the SLM. A random sample of 200 notes classified by the SLM as MG-ADL present or absent was clinically reviewed. Performance was evaluated for presence/absence classification, total score values, and domain values. Demographic and clinical characteristics were described for all gMG patients and for those with ≥1 MG-ADL score.
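The cohort definition can be sketched in Python as below. This is a hypothetical illustration only: the `Diagnosis` record, the `build_cohort` and `age_on` helpers, assessing age on the index date, restricting index diagnoses to the study window, and treating September 30, 2025 as the window end are all assumptions, since the abstract does not specify these details.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    patient_id: str
    icd10_code: str  # e.g. "G70.00"
    dx_date: date


# Assumed study window boundaries (inclusive).
STUDY_START = date(2016, 1, 1)
STUDY_END = date(2025, 9, 30)


def age_on(birth_date: date, on: date) -> int:
    """Completed years of age on a given date."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def build_cohort(diagnoses: list[Diagnosis], birth_dates: dict[str, date]) -> dict[str, date]:
    """Map each eligible patient to an index date (earliest G70.x diagnosis in the window)."""
    index_dates: dict[str, date] = {}
    for dx in diagnoses:
        in_window = STUDY_START <= dx.dx_date <= STUDY_END
        if not (in_window and dx.icd10_code.upper().startswith("G70")):
            continue
        current = index_dates.get(dx.patient_id)
        if current is None or dx.dx_date < current:
            index_dates[dx.patient_id] = dx.dx_date
    # Keep adults only, assuming age is assessed on the index date.
    return {
        pid: idx
        for pid, idx in index_dates.items()
        if pid in birth_dates and age_on(birth_dates[pid], idx) >= 18
    }
```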
RESULTS: Among 1,962 gMG patients, 7.8% (n=153) had ≥1 MG-ADL score on or after the index date. Demographic and clinical characteristics were similar between the two groups. Of the 200 encounter notes reviewed, the SLM correctly classified 195 (97.5%). Sensitivity, specificity, positive predictive value, and negative predictive value for presence/absence ranged from 97% to 98% (F1-score = 97.50%). Among true positives (n=97), the SLM extracted total scores with 100% accuracy; concordance of domain values ranged from 94.85% for ‘brushing teeth/hair’ to 98.97% for ‘arising from chair’ and ‘diplopia’.
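The reported classification metrics all derive from a 2x2 confusion matrix of SLM classifications against clinical review, and domain concordance is a simple agreement proportion among true positives. The Python sketch below computes both; the names `ConfusionCounts`, `classification_metrics`, and `domain_concordance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # score present on review, classified present by the SLM
    fp: int  # score absent on review, classified present by the SLM
    fn: int  # score present on review, classified absent by the SLM
    tn: int  # score absent on review, classified absent by the SLM


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard presence/absence metrics from confusion-matrix counts."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
    }


def domain_concordance(extracted: list[int], reviewed: list[int]) -> float:
    """Share of true-positive notes where the extracted domain value matches clinical review."""
    if len(extracted) != len(reviewed) or not extracted:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(e == r for e, r in zip(extracted, reviewed))
    return matches / len(extracted)
```

The abstract reports 97 true positives and 195 of 200 notes classified correctly, which fixes TN at 98 and FP + FN at 5, but not how those 5 errors split between false positives and false negatives. A call such as `classification_metrics(ConfusionCounts(tp=97, fp=2, fn=3, tn=98))` therefore uses an illustrative split, not study data. Likewise, 92 matching values out of 97 corresponds to the 94.85% reported for the lowest-concordance domain.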
CONCLUSIONS: MG-ADL scores were infrequently documented within the EMR. Where scores were present, a computationally lightweight SLM identified score presence with high discriminative performance and showed strong concordance for total and domain-level values. These findings highlight the potential of SLM-assisted MG-ADL extraction to enable real-world outcomes research in gMG populations.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR213

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

SDC: Neurological Disorders
