USE OF A SMALL LANGUAGE MODEL TO IDENTIFY MG-ADL SCORES FROM ENCOUNTER NOTES IN AN EMR SYSTEM
Author(s)
Ravindra Telidevara, BS¹, Neisha Opper, PhD, MPH², Ishtiyaque Ahmad, PhD¹, Vivek Rudrapatna, MD, PhD³, Trinabh Gupta, PhD¹, Shivani Aggarwal, PhD, MS⁴.
¹DataUnite, Cupertino, CA, USA; ²Landmark Science, La Crescenta, CA, USA; ³Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; ⁴Landmark Science, Inc, Los Angeles, CA, USA.
OBJECTIVES: Myasthenia Gravis Activities of Daily Living (MG-ADL) scores are frequently used in clinical trials as a key indicator of functional status in patients with generalized myasthenia gravis (gMG). Electronic medical records (EMRs) are a rich source of real-world ADL-related data. Large language models show promise for extracting information from unstructured EMR text, but their infrastructure requirements limit feasibility in healthcare environments. Here, we characterize the performance of a computationally lightweight, CPU-only small language model (SLM) in extracting available MG-ADL scores within an EMR system.
METHODS: Adult patients (≥18 years) with prevalent gMG (ICD-10-CM diagnosis code G70.x, with the earliest diagnosis date as the index date) between January 2016 and September 2025 at the University of California San Francisco were included. MG-ADL scores were extracted from encounter notes using the SLM. A random sample of notes that the SLM classified as having an MG-ADL score present or absent (n=200) was clinically reviewed. Performance was evaluated for presence/absence classification, total score value, and domain values. Demographic and clinical characteristics were described among all gMG patients and among those with ≥1 MG-ADL score.
RESULTS: Among 1,962 gMG patients, 7.8% (n=153) had ≥1 MG-ADL score on/after the index date. Demographic and clinical characteristics were similar across groups. Of encounter notes reviewed, the SLM correctly classified 195/200. Sensitivity, specificity, negative predictive, and positive predictive values ranged from 97-98% for absence/presence (F1-score=97.50%). Among true positives (n=97), the SLM extracted total scores with 100% accuracy; concordance of domain values ranged from 94.85% for ‘brushing teeth/hair’ to 98.97% for ‘arising from chair’ and ‘diplopia’.
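The classification metrics reported above can be reproduced from a 2×2 confusion matrix. The following is a minimal sketch, not the authors' evaluation code. It takes the 97 true positives from the abstract and infers 98 true negatives from the 195/200 correctly classified notes. The abstract does not report how the 5 misclassified notes divide into false positives and false negatives, so the 2/3 split below is purely an illustrative assumption, and the names `ConfusionCounts` and `classification_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a 2x2 confusion matrix for score presence/absence."""
    tp: int  # score present, SLM says present
    fp: int  # score absent, SLM says present
    fn: int  # score present, SLM says absent
    tn: int  # score absent, SLM says absent


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return standard binary classification metrics as proportions."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        # F1 for the "present" class; the abstract does not state
        # which averaging convention was used for its F1-score.
        "f1_present": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# tp and the tp + tn total come from the abstract; the fp/fn split is ASSUMED.
counts = ConfusionCounts(tp=97, fp=2, fn=3, tn=98)
metrics = classification_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under either possible ordering of a 2/3 split, sensitivity, specificity, PPV, and NPV all fall within the 97-98% range given in the abstract.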
CONCLUSIONS: MG-ADL scores were infrequently documented within the EMR. Where present, a computationally lightweight SLM demonstrated highly discriminative performance for identifying score presence and strong concordance for total and domain-level values. These findings highlight the potential of SLM-assisted approaches for MG-ADL extraction to enable real-world outcomes research in gMG populations.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR213
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Neurological Disorders