Machine Learning Models to Estimate Disease Activity Measures in Real-World Data Sources: Lessons Learned from Four Autoimmune Diseases
Author(s)
Alves P¹, Spencer A¹, Bandaria J¹, Leavy M², Weiss S¹, Curhan G¹, Marci C¹, Boussios C¹
¹OM1, Inc., Boston, MA, USA; ²OM1, Inc., Falmouth, ME, USA
OBJECTIVES: Validated measures are important for tracking disease activity and outcomes longitudinally in autoimmune diseases, but these scores are often missing from real-world data (RWD) sources. This study assessed the feasibility of applying machine learning methods to routinely recorded, unstructured clinical data to estimate scores for validated measures in autoimmune diseases.
METHODS: Machine learning (ML) models were developed to estimate scores for the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) for systemic lupus erythematosus; Clinical Disease Activity Index (CDAI) for rheumatoid arthritis; Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) for ankylosing spondylitis; and Expanded Disability Status Scale (EDSS) for multiple sclerosis. For each model, training and validation cohorts were created from RWD sources, and performance metrics were calculated to assess the models.
RESULTS: The ML models estimated scores with high discrimination. When the outcome was binarized as low versus high at clinically meaningful thresholds, the area under the receiver-operating-characteristic curve (AUC) was 0.91 (SLEDAI model), 0.88 (CDAI model), 0.82 (BASDAI model), and 0.91 (EDSS model). Model development yielded several lessons that are informative for future machine learning efforts. First, the rich detail in the clinical notes was sufficient to estimate scores for most patient encounters, suggesting that clinicians often document critical information about symptoms, progress, and medication needs even when they do not use a validated instrument to measure disease activity. Second, the model features did not entirely overlap with the items on the validated instruments, underscoring the value of a machine learning approach over simple string searches for phrases. Successful development of models for four different instruments and diseases suggests that this approach can scale to other instruments and diseases.
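The binarized evaluation described above can be illustrated with a minimal sketch. It assumes that each encounter has a reference score from the validated instrument and a continuous score estimated by the model. The reference score is dichotomized at a cut-off, and the model estimate is used directly to rank encounters. The function names, the threshold of 10, and the example numbers are hypothetical. This is not the authors' evaluation code. AUC is computed here with the pairwise (Mann-Whitney) formulation.

```python
from typing import Sequence


def binarize(scores: Sequence[float], threshold: float) -> list[int]:
    """Hypothetical helper: label an encounter 1 ('high') if its reference
    score is at or above the threshold, else 0 ('low')."""
    return [1 if s >= threshold else 0 for s in scores]


def auc(labels: Sequence[int], predictions: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    Over all (high, low) encounter pairs, count how often the high-activity
    encounter receives the larger model estimate; ties count as one half.
    """
    pos = [p for y, p in zip(labels, predictions) if y == 1]
    neg = [p for y, p in zip(labels, predictions) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one high and one low encounter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: reference instrument scores and model-estimated scores.
reference = [2, 4, 8, 12, 15, 20]
estimated = [3.0, 9.5, 5.0, 11.0, 9.0, 18.0]
labels = binarize(reference, threshold=10)  # hypothetical cut-off
print(labels)                               # [0, 0, 0, 1, 1, 1]
print(round(auc(labels, estimated), 3))     # 0.889
```

Because AUC depends only on how the estimates rank encounters, the model's continuous score can be evaluated against a binary "low versus high" label without choosing a second threshold for the predictions.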
CONCLUSIONS: Application of these models to RWD sources is useful for addressing missing data and increasing the number of patients available for real-world research studies focused on treatment response and outcomes.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 6, S2 (June 2023)
Code
MSR99
Topic
Clinical Outcomes, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Clinician Reported Outcomes, Missing Data
Disease
Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal)