Predictive Models Leveraging Machine Learning and Real-World Data for Early Diagnosis: An Application in Amyotrophic Lateral Sclerosis

Author(s)

Nathan R1, Miller C2, Shukla O2, Garbayo A2, Hagan M3, Harrison A4, Ciepielewska M5, Apple S5
1EVERSANA Life Sciences Inc., Milwaukee, WI, USA, 2EVERSANA Life Sciences Inc., Wayne, PA, USA, 3Mitsubishi Tanabe Pharma America, Inc., Rockaway, NJ, USA, 4Mitsubishi Tanabe Pharma America, Inc., Fleming Island, FL, USA, 5Mitsubishi Tanabe Pharma America, Inc., Jersey City, NJ, USA

OBJECTIVES: To assess the utility of machine learning for predicting early diagnosis of amyotrophic lateral sclerosis (ALS) based on real-world data (RWD).

METHODS: We identified 4,779 patients with ALS and without primary lateral sclerosis from the Optum® de-identified Electronic Health Record (EHR) dataset (2007-2020). The control cohort comprised 47,781 patients who did not have ALS and were demographically matched by age and gender at a target-to-control ratio of approximately 1:10. Mutual information was used to explore and identify features in RWD, including lab, microbiology, and natural language processing biomarkers available in EHR, by comparing the target population (ALS patients) with the demographically matched control cohort. We trained various machine learning models (eg, logistic regression, random forest, gradient boosting, support vector machines, neural networks, soft voting) on data from different time periods relative to a defined index date and compared their performance in predicting early diagnosis of ALS.
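As an illustration of the mutual information step, the following minimal sketch scores binary features against a binary case/control label and ranks them. It is not the study's implementation: the function names, the feature names, and the toy data are all hypothetical, and the abstract does not state which estimator or software was used.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def rank_features(features, labels):
    """Return (name, score) pairs sorted by mutual information, highest first."""
    scores = {name: mutual_information(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration only (1 = ALS case, 0 = matched control).
labels = [1, 1, 0, 0]
features = {
    "muscle_weakness_code": [1, 1, 0, 0],  # hypothetical feature, tracks the label
    "unrelated_code": [1, 0, 1, 0],        # hypothetical feature, independent of the label
}
ranking = rank_features(features, labels)
```

In this toy example the feature that tracks the label exactly scores ln 2 and the independent feature scores 0, so the former ranks first.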

RESULTS: Predictive models trained with gradient boosting on data closer to the defined index date, including lab tests from EHR, performed best (AUC=0.9463) and had a very low false positive rate. This model suggested that the top 5 predictors of undiagnosed ALS were muscle weakness (generalized), normal thyroid stimulating hormone levels, dysphagia (unspecified), cramp of limb/abnormal involuntary movements, and other musculoskeletal symptoms referable to limbs. Many of the features were diagnoses that could prompt an earlier evaluation for ALS in clinical practice. The model had a sensitivity of 1% and a specificity >99.0%, and it identified patients not yet diagnosed with ALS with a precision of 63%, suggesting that early screening for ALS would be beneficial.
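For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and precision follow from a confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's confusion matrix, which the abstract does not report.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and precision from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Return None when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases that are flagged
        "specificity": ratio(tn, tn + fp),  # share of controls correctly not flagged
        "precision": ratio(tp, tp + fp),    # share of flagged patients who are true cases
    }


# Invented counts for illustration only: 1,000 cases and 10,000 controls.
metrics = screening_metrics(tp=10, fp=6, tn=9994, fn=990)
```

With these invented counts, few cases are flagged (low sensitivity), yet most flagged patients are true cases (moderately high precision) because very few controls are flagged (high specificity).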

CONCLUSIONS: This study highlights opportunities to apply machine learning to EHR RWD to identify features that predict early diagnosis of ALS.

Conference/Value in Health Info

2021-05, ISPOR 2021, Montreal, Canada

Value in Health, Volume 24, Issue 5, S1 (May 2021)

Code

PND53

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Neurological Disorders
