Identifying Clinical Subgroups/Clusters of Alzheimer’s Patients from Optum’s De-Identified Market Clarity Database Using Machine Learning Techniques

Speaker(s)

Verma V1, Markan R2, Khan S2, Brooks L3, Khandelwal H2, Dawar V2, Roy A2, Bhargava S4, Gaur A2, Kukreja I5, Nayyar A2, Paul A2
1Optum, Gurgaon, HR, India, 2Optum, Gurugram, HR, India, 3Optum, Basking Ridge, NJ, USA, 4Optum Tech, Eden Prairie, MN, USA, 5Optum, New Delhi, DL, India

OBJECTIVES:

To identify and categorize Alzheimer’s disease (AD) patients into clinically relevant groups based on demographics, symptoms, and comorbidities.

METHODS:

AD patients were identified using ICD-9 and ICD-10 codes in the Optum® de-identified Market Clarity Dataset, which links medical and pharmacy claims with EHR data. Continuously eligible patients (3 years pre-index) above the age of 60 with at least 2 outpatient diagnoses (30 days apart) or one inpatient diagnosis recorded between 1 January 2019 and 31 December 2020 were included in the analysis. Patients with no symptoms or associated comorbidities, and patients with a prior AD claim in the pre-index period, were excluded from the analysis. To identify subgroups, we used ML techniques: K-Means with multiple correspondence analysis (MCA), agglomerative hierarchical clustering, and DBSCAN. The silhouette score was used to determine the optimal number of clusters (K).
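The silhouette-based choice of K described above can be illustrated with a minimal, dependency-free sketch. It is not the study's pipeline: it skips the MCA step and runs plain K-Means (Lloyd's algorithm with a deterministic farthest-point start) directly on binary flags. The feature names, the toy patient rows, and the helper names `farthest_point_init`, `kmeans`, and `silhouette_score` are all assumptions made for illustration.

```python
from math import dist

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette_score(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)   # mean intra-cluster distance
        b = min(sum(dist(p, q) for q in other) / len(other)  # nearest other cluster
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical binary flags per patient:
# (hypertension, diabetes, heart_failure, coronary_artery_disease, fall, confusion)
patients = [
    (1, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0), (1, 1, 0, 0, 0, 0),
    (1, 1, 1, 1, 0, 0), (1, 1, 1, 1, 0, 0), (1, 0, 1, 1, 0, 0), (1, 1, 1, 1, 0, 0),
    (1, 0, 0, 0, 1, 1), (1, 0, 0, 0, 1, 1), (1, 0, 0, 0, 1, 1), (0, 0, 0, 0, 1, 1),
]

# Score each candidate K and keep the one with the highest mean silhouette.
scores = {k: silhouette_score(patients, kmeans(patients, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real data one would more likely rely on an established library, for example scikit-learn's `KMeans` and `silhouette_score`, and apply it to the MCA coordinates rather than to the raw flags.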

RESULTS:

Among 108,714 patients, the mean age was 81 years and the population was predominantly female (61%). Twenty-one features were included in the study. The K-Means, hierarchical, and DBSCAN algorithms yielded 5, 6, and 4 clusters, respectively. K-Means with MCA gave the most consistent subgroups based on cosine distance measures and eigenvalues.

Hypertension was identified as a prominent risk factor and was present in all the clusters. Cluster 1 (Mild) had 49.5k patients with hypertension (~85%) and diabetes (~34%). Cluster 2 (Severe) had 31.8k patients with hypertension (~97%), diabetes (~67%), heart failure (~63%), coronary artery disease (~78%), kidney disease (~69%), and atrial fibrillation (~53%). Cluster 3 (Moderate) had 5k patients with hypertension (~91%), diabetes (~50%), and fall (~20%). Cluster 4 (Onset) had 6k patients with no significant comorbidities. Cluster 5 (Caution) had 16k patients with hypertension (~92%), fall (~85%), confusion (~54%), depression (~45%), and memory loss (~41%).
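Cluster profiles of this kind are per-cluster prevalences of each binary feature. As a small sketch of that arithmetic, assuming one dictionary of 0/1 flags per patient and one cluster label per patient (the helper name `prevalence_by_cluster` and the four example records are hypothetical and are not drawn from the study data):

```python
from collections import defaultdict

def prevalence_by_cluster(records, labels):
    """Return {cluster: {feature: percent of cluster members with the feature}}."""
    sizes = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for record, label in zip(records, labels):
        sizes[label] += 1
        for feature, present in record.items():
            counts[label][feature] += int(bool(present))
    return {
        label: {feature: 100.0 * n / sizes[label] for feature, n in feats.items()}
        for label, feats in counts.items()
    }

# Hypothetical patients and cluster assignments, for illustration only.
records = [
    {"hypertension": 1, "diabetes": 0, "fall": 0},
    {"hypertension": 1, "diabetes": 1, "fall": 0},
    {"hypertension": 1, "diabetes": 0, "fall": 1},
    {"hypertension": 0, "diabetes": 0, "fall": 1},
]
labels = [1, 1, 2, 2]

profile = prevalence_by_cluster(records, labels)
for cluster, feats in sorted(profile.items()):
    print(cluster, feats)
```

Reporting the cluster size alongside each percentage, as the abstract does, helps readers judge how stable each prevalence estimate is.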

CONCLUSIONS:

The clusters identified in this study can be leveraged for personalized and targeted healthcare interventions. They can also be used to investigate whether patients within a cluster share a similar genetic makeup that increases their risk of developing AD. Early intervention in such cases can thus be beneficial in slowing disease progression and reducing the overall cost of care.

Code

RWD12

Topic

Methodological & Statistical Research, Real World Data & Information Systems, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Electronic Medical & Health Records, Reproducibility & Replicability

Disease

Neurological Disorders