Machine Learning Models to Estimate Disease Activity Measures in Real-World Data Sources: Lessons Learned from Four Autoimmune Diseases

Author(s)

Alves P¹, Spencer A¹, Bandaria J¹, Leavy M², Weiss S¹, Curhan G¹, Marci C¹, Boussios C¹
¹OM1, Inc., Boston, MA, USA; ²OM1, Inc., Falmouth, ME, USA

OBJECTIVES: Validated measures are important for tracking disease activity and outcomes longitudinally in autoimmune diseases, but these scores are often missing from real-world data (RWD) sources. This effort assessed the feasibility of applying machine learning methods to routinely recorded unstructured clinical data to estimate scores for validated measures in autoimmune diseases.

METHODS: Machine learning (ML) models were developed to estimate scores for the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) for systemic lupus erythematosus; Clinical Disease Activity Index (CDAI) for rheumatoid arthritis; Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) for ankylosing spondylitis; and Expanded Disability Status Scale (EDSS) for multiple sclerosis. For each model, training and validation cohorts were created from RWD sources, and performance metrics were calculated to assess the models.

RESULTS: The ML models discriminated well between low and high disease activity. When the outcome was binarized as low versus high at clinically meaningful thresholds, the area under the receiver-operating-characteristic curve (AUC) was 0.91 (SLEDAI model), 0.88 (CDAI model), 0.82 (BASDAI model), and 0.91 (EDSS model). Model development yielded several lessons that are informative for future machine learning efforts. First, the rich detail included in the clinical notes was sufficient to estimate scores for most patient encounters, suggesting that clinicians often document critical information about symptoms, progress, and medication needs even when they do not use a validated instrument to measure disease activity. Second, model features did not entirely overlap with the items on the validated instruments, emphasizing the value of a machine learning approach over simple string searches for phrases. Successful development of models for four different instruments and diseases suggests that this approach is scalable to other instruments and diseases.
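The binarized evaluation described in the results can be illustrated with a minimal sketch. It assumes a set of encounters that have both a clinician-recorded score and a model-estimated score. The threshold, the data values, and the function names (`binarize`, `roc_auc`) are hypothetical and chosen only for illustration; they are not the study's cutoffs, data, or code. The AUC is computed here as the probability that a randomly chosen high-activity encounter receives a higher estimated score than a randomly chosen low-activity encounter, with ties counted as one half.

```python
from typing import List, Sequence


def binarize(scores: Sequence[float], threshold: float) -> List[int]:
    """Label each recorded score as high (1) if above the threshold, else low (0)."""
    return [1 if s > threshold else 0 for s in scores]


def roc_auc(labels: Sequence[int], estimates: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [e for l, e in zip(labels, estimates) if l == 1]
    neg = [e for l, e in zip(labels, estimates) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one high and one low encounter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # high-activity encounter ranked above low
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical validation encounters: recorded scores and model estimates.
recorded = [2.0, 4.5, 8.0, 12.0, 15.5, 25.0]
estimated = [3.1, 12.0, 7.2, 11.4, 18.0, 21.3]
THRESHOLD = 10.0  # illustrative cutoff only, not taken from the abstract

labels = binarize(recorded, THRESHOLD)
auc = roc_auc(labels, estimated)
print(f"AUC = {auc:.2f}")
```

The model's continuous estimate is used directly as the ranking score, so only the recorded reference score is binarized. A pairwise computation like this is quadratic in the number of encounters; a rank-based implementation would be the usual choice for larger validation cohorts.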

CONCLUSIONS: Application of these models to RWD sources is useful for addressing missing data and increasing the number of patients available for real-world research studies focused on treatment response and outcomes.

Conference/Value in Health Info

2023-05, ISPOR 2023, Boston, MA, USA

Value in Health, Volume 26, Issue 6, S2 (June 2023)

Code

MSR99

Topic

Clinical Outcomes, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Clinician Reported Outcomes, Missing Data

Disease

Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal)
