HARNESSING BIG DATA: A METHODOLOGICAL APPROACH TO LINKING ELECTRONIC HEALTH RECORDS WITH PATIENT-REPORTED SURVEY DATA
Author(s)
Liebert R1, Lee LK2, Jaffe DH3, Doane MJ4, Haskell T4
1Kantar Health, New York, NY, USA, 2Kantar Health, San Mateo, CA, USA, 3Kantar Health, Tel Aviv, Israel, 4Kantar Health, Horsham, PA, USA
OBJECTIVES: To assess the feasibility of linking a large, nationally representative patient-reported database with an electronic health records (EHR) database to enhance patient data.
METHODS: Patient-Centered-Research (PaCeR) datasets comprising 3 years (2015-2017; total N=270,207) of patient-reported data were included in a HIPAA-compliant linking methodology involving more than 50 million patients from an EHR database. Linking was performed by comparing Protected Health Information from the EHR database with Personally Identifiable Information from PaCeR. The data used for linking included first and last name, address, zip code, gender, date of birth, email address, and phone number. Once the data were linked, the prevalence of diagnosed type 2 diabetes (T2D), rheumatoid arthritis (RA), psoriasis, inflammatory bowel disease (IBD), depression, and migraine was examined for linked, non-linked, and all PaCeR respondents.
RESULTS: After linking, 7,266 PaCeR respondents were identified as having linked records in the EHR database. Of these, 941 self-reported a physician's diagnosis of T2D, 308 of RA, 271 of psoriasis, 149 of IBD, 1,902 of depression, and 1,028 of migraine. Prevalence estimates were highest in the linked subsample, intermediate in the full PaCeR sample, and lowest in the non-linked subsample. This ordering held for T2D (13.98% vs. 8.91% vs. 8.75%), RA (4.47% vs. 2.92% vs. 2.87%), psoriasis (3.68% vs. 2.72% vs. 2.69%), IBD (1.94% vs. 1.24% vs. 1.22%), depression (25.69% vs. 19.63% vs. 19.44%), and migraine (13.34% vs. 9.81% vs. 9.70%).
CONCLUSIONS: Linking the PaCeR and EHR databases using HIPAA-compliant methods was successful, yielding a subsample of linked patients for whom both patient-reported and clinical data can be used to address research questions.
Prevalence estimates for the linked, non-linked, and full PaCeR samples were as expected: prevalence was highest among those seeking care (linked respondents) and lowest among those who may or may not be seeking care (non-linked respondents).
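The linking-then-comparison workflow described above can be illustrated with a minimal, hypothetical Python sketch. It assumes exact (deterministic) matching on a composite key built from a subset of the listed identifiers (first and last name, date of birth, gender, zip code), with simple normalization. The field names (`first_name`, `ehr_id`, `t2d`, and so on), the helper functions, and the toy records are all invented for illustration. This is not the study's actual linking method, and it omits the privacy safeguards that a HIPAA-compliant process requires. The sketch also shows why the full-sample prevalence falls between the two subsamples: it is a size-weighted average of the linked and non-linked prevalences.

```python
"""Hypothetical sketch: deterministic record linkage plus subsample prevalence.

Assumptions (not from the abstract): records are plain dicts, identifiers are
normalized by trimming/lowercasing, and a respondent counts as "linked" when an
exact match exists on a composite key. Field names are illustrative only.
"""
import re
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, str]


def make_key(record: Record) -> Tuple[str, ...]:
    # Hypothetical composite key from a subset of the identifiers in the abstract.
    first = record.get("first_name", "").strip().lower()
    last = record.get("last_name", "").strip().lower()
    dob = record.get("dob", "").strip()  # assumed ISO format YYYY-MM-DD
    gender = record.get("gender", "").strip().upper()[:1]  # "female" -> "F"
    zip5 = re.sub(r"\D", "", record.get("zip", ""))[:5]  # "94401-1234" -> "94401"
    return (first, last, dob, gender, zip5)


def link(survey: Iterable[Record], ehr: Iterable[Record]) -> List[Optional[str]]:
    # Index EHR records by key; return the matched EHR id per respondent (or None).
    index: Dict[Tuple[str, ...], str] = {}
    for rec in ehr:
        index.setdefault(make_key(rec), rec["ehr_id"])  # keep first EHR match
    return [index.get(make_key(r)) for r in survey]


def prevalence(flags: List[bool]) -> float:
    # Percentage of True flags; NaN for an empty group.
    return 100.0 * sum(flags) / len(flags) if flags else float("nan")


def compare(survey: List[Record], matches: List[Optional[str]],
            condition: str) -> Dict[str, float]:
    # Prevalence of a self-reported condition in linked, non-linked, and all respondents.
    has = [r.get(condition) == "yes" for r in survey]
    linked = [h for h, m in zip(has, matches) if m is not None]
    non_linked = [h for h, m in zip(has, matches) if m is None]
    return {"linked": prevalence(linked),
            "non_linked": prevalence(non_linked),
            "all": prevalence(has)}


# Toy data (invented) to exercise the sketch.
survey = [
    {"first_name": "Ana", "last_name": "Diaz", "dob": "1970-01-02", "gender": "F", "zip": "21201", "t2d": "yes"},
    {"first_name": "Bo", "last_name": "Lee", "dob": "1985-05-05", "gender": "M", "zip": "94401-1234", "t2d": "no"},
    {"first_name": "Cy", "last_name": "Roe", "dob": "1990-09-09", "gender": "M", "zip": "19044", "t2d": "no"},
    {"first_name": "Di", "last_name": "Fox", "dob": "1960-03-03", "gender": "F", "zip": "10001", "t2d": "no"},
]
ehr = [
    {"ehr_id": "E1", "first_name": " ANA ", "last_name": "diaz", "dob": "1970-01-02", "gender": "female", "zip": "21201"},
    {"ehr_id": "E2", "first_name": "Bo", "last_name": "Lee", "dob": "1985-05-05", "gender": "m", "zip": "94401"},
]

matches = link(survey, ehr)
result = compare(survey, matches, "t2d")
print(matches)  # ['E1', 'E2', None, None]
print(result)   # {'linked': 50.0, 'non_linked': 0.0, 'all': 25.0}
```

In the toy output, the full-sample figure equals (2 × 50.0 + 2 × 0.0) / 4 = 25.0. Because the full sample is always a weighted mix of the two subsamples, its prevalence must lie between them. Because the linked group is small relative to the whole, the full-sample value sits close to the non-linked value, which matches the pattern in the reported estimates. The abstract's own percentages may reflect survey weighting or differing denominators, so they need not reproduce exactly from the counts. Real-world linkage also commonly uses fuzzy or probabilistic matching rather than the exact key shown here.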
Conference/Value in Health Info
2018-05, ISPOR 2018, Baltimore, MD, USA
Value in Health, Vol. 21, S1 (May 2018)
Code
PHP173
Topic
Health Service Delivery & Process of Care
Topic Subcategory
Health Care Research
Disease
Multiple Diseases