PREDICTIVE MODEL OF PARKINSON'S DISEASE IN LARGE ELECTRONIC HEALTH RECORDS DATABASE
Author(s)
Kabadi S1, Lee A2, Kuhn M1, Gray D3
1Pfizer, Inc., Groton, CT, USA, 2University of Florida, Gainesville, FL, USA, 3Pfizer, Inc., Cambridge, MA, USA
OBJECTIVES: The objective of this investigation was to analyze de-identified electronic health record (EHR) data to predict a Parkinson’s Disease (PD) diagnosis. METHODS: Patients ≥ 30 years of age with evidence of continuous activity from January 1, 2012 to December 31, 2013 were eligible for inclusion (n = 3,057,540). PD cases (n = 2,097) were identified by two diagnoses of PD (ICD-9: 332.0) in calendar year 2013, and controls (n = 2,548,563) had no diagnosis of PD. A “training” dataset (n = 1,912,996) was used for model development and a “test” dataset (n = 637,664) was reserved to confirm model performance. Sixty demographic, clinical diagnosis, and healthcare resource utilization (HRU) variables derived from calendar year 2012 were entered into logistic regression (LR), classification and regression tree (CART), and random forest (RF) models. The LR and CART models used the full dataset; however, downsampling was applied to the RF model to handle class imbalance. Variable importance was estimated, and predictive accuracy was evaluated using area under the curve (AUC). RESULTS: The LR model (AUC = 0.84) had the best fit on the training data, compared with the CART (AUC = 0.53) and RF (AUC = 0.72) models. Age, sex, diagnosis of postural instability, and diagnosis of sleep disorders were important variables in predicting a PD diagnosis. Furthermore, the number of levodopa prescriptions written and the number of visits to a general practitioner in the year prior to diagnosis were important HRU variables. LR model performance metrics were acceptable when applied to the test dataset (AUC = 0.85, specificity = 0.75, sensitivity = 0.81). CONCLUSIONS: Data mining methods can be used to identify patients with Parkinson’s Disease using 60 variables in EHR data with acceptable AUC, sensitivity, and specificity. Sleep disorders may be more predictive of PD in the year prior to diagnosis than previous research suggests.
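The evaluation steps named in the abstract (downsampling the majority class, AUC, sensitivity, and specificity) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 1:1 case-to-control ratio, the classification threshold, and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the study.

```python
import random


def downsample(labels, seed=0):
    """Keep every case (1) plus an equal-sized random sample of controls (0).

    The 1:1 ratio is an assumption; the abstract does not state the ratio used.
    Returns the sorted indices of the retained records.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    kept = rng.sample(controls, min(len(cases), len(controls)))
    return sorted(cases + kept)


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Computed from rank sums (Mann-Whitney U); tied scores share an average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based rank shared by the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data only (hypothetical labels and predicted probabilities).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 0.75 for this toy input
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, 0.5)

# Imbalanced toy labels: 10 controls and 2 cases become 2 and 2 after downsampling.
imbalanced = [0] * 10 + [1] * 2
balanced_idx = downsample(imbalanced)
```

In practice, a model would be fitted on the downsampled training records only, while AUC, sensitivity, and specificity would be computed on the untouched test set so that the reported metrics reflect the real class imbalance.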
Conference/Value in Health Info
2016-05, ISPOR 2016, Washington DC, USA
Value in Health, Vol. 19, No. 3 (May 2016)
Code
PND3
Topic
Epidemiology & Public Health
Topic Subcategory
Safety & Pharmacoepidemiology
Disease
Neurological Disorders