Machine Learning Models to Estimate Disease Activity Measures in Real-World Data Sources: Lessons Learned from Four Autoimmune Diseases

Author(s)

Alves P¹, Spencer A¹, Bandaria J¹, Leavy M², Weiss S¹, Curhan G¹, Marci C¹, Boussios C¹
¹OM1, Inc., Boston, MA, USA, ²OM1, Inc., Falmouth, ME, USA

Presentation Documents

ISPOR23_Paulus_POSTER126315.pdf

OBJECTIVES: Validated measures are important for tracking disease activity and outcomes longitudinally in autoimmune diseases, but these scores are often missing from real-world data (RWD) sources. This effort assessed the feasibility of applying machine learning methods to routinely recorded unstructured clinical data to estimate scores for validated measures in autoimmune diseases.

METHODS: Machine learning (ML) models were developed to estimate scores for the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) for systemic lupus erythematosus; Clinical Disease Activity Index (CDAI) for rheumatoid arthritis; Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) for ankylosing spondylitis; and Expanded Disability Status Scale (EDSS) for multiple sclerosis. For each model, training and validation cohorts were created from RWD sources, and performance metrics were calculated to assess the models.

RESULTS: The ML models performed well when estimating scores. When the outcome was binarized as low versus high at clinically meaningful thresholds, the area under the receiver-operating-characteristic curve (AUC) was 0.91 (SLEDAI model), 0.88 (CDAI model), 0.82 (BASDAI model), and 0.91 (EDSS model). Model development yielded several lessons that are informative for future machine learning efforts. First, the rich detail included in the clinical notes was sufficient to estimate scores for most patient encounters, suggesting that clinicians often document critical information about symptoms, progress, and medication needs even when they do not use a validated instrument to measure disease activity. Second, model features did not entirely overlap with the items on the validated instruments, emphasizing the value of a machine learning approach over simple string searches for phrases. Finally, the successful development of models for four different instruments and diseases suggests that this approach is scalable to other instruments and diseases.
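The binarized evaluation reported in the results can be illustrated with a minimal sketch. It assumes that observed scores are labeled high or low at a threshold and that the model's estimated scores are then ranked against those labels. The abstract does not give the thresholds, the data, or the evaluation code, so the threshold value, the sample numbers, and the function names below are all hypothetical. The AUC is computed with the standard pairwise (Mann-Whitney) definition, which may differ from the tooling the authors used.

```python
from __future__ import annotations

from typing import Sequence


def binarize(scores: Sequence[float], threshold: float) -> list[int]:
    """Label each observed score 1 (high) if it exceeds the threshold, else 0 (low)."""
    return [1 if s > threshold else 0 for s in scores]


def auc(labels: Sequence[int], estimates: Sequence[float]) -> float:
    """AUC as the fraction of (high, low) pairs in which the high case
    receives the larger estimate; tied estimates count as half."""
    pos = [e for y, e in zip(labels, estimates) if y == 1]
    neg = [e for y, e in zip(labels, estimates) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case in each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up values for illustration only; not data from the study.
observed = [2.0, 4.0, 9.0, 12.0, 15.0, 22.0]     # clinician-recorded scores
estimated = [3.1, 11.0, 8.5, 13.2, 10.4, 19.8]   # model-estimated scores
HYPOTHETICAL_THRESHOLD = 10.0                     # assumed low/high cut point

labels = binarize(observed, HYPOTHETICAL_THRESHOLD)
print(f"AUC = {auc(labels, estimated):.2f}")
```

Only the observed scores are binarized; the estimates stay continuous so that the AUC reflects how well they rank high-activity encounters above low-activity ones.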

CONCLUSIONS: Application of these models to RWD sources is useful for addressing missing data and increasing the number of patients available for real-world research studies focused on treatment response and outcomes.

Conference/Value in Health Info

2023-05, ISPOR 2023, Boston, MA, USA

Value in Health, Volume 26, Issue 6, S2 (June 2023)

Code

MSR99

Topic

Clinical Outcomes, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Clinician Reported Outcomes, Missing Data

Disease

Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal)
