Quality of Electronic Health Records: Managing Aggregated Patient Demographic Information

Speaker(s)

Jaffe D1, Ruo P2, Way N3
1Cerner Enviza, an Oracle Company, Jerusalem, Israel, 2Cerner Enviza, an Oracle Company, Kansas City, MO, USA, 3Cerner Enviza, an Oracle Company, Santa Barbara, CA, USA

OBJECTIVES: Electronic health records (EHRs) are a real-time aggregate of patient health and related data for improved care and outcomes. Inconsistencies in patient data within and between healthcare systems may result from variability in collection methods, documentation, and coding practices, in addition to longitudinal changes in a patient’s profile. This study examines the impact of multiple entries for patient demographic information on data quality in the US Cerner Real-World Data (CRWD).

METHODS: Data were examined for all patients in the US CRWD, a cloud-based, de-identified, and Health Insurance Portability and Accountability Act-compliant dataset (extract 7/2022). Demographic data for age, gender, race, ethnicity, state of residence, marital status, and spoken language were examined. Patient data were considered informative if they satisfied acceptable values or coding standards. Uninformative data included null values, responses of refused/declined to answer, or uninterpretable data. Descriptive statistics were used to examine all CRWD patients (100+ million) and those with encounters in the past five years (50+ million) with multiple responses. This study received IRB exemption status.
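The informative/uninformative rule described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual implementation: the refusal markers, the allowed-value set, and the function name `is_informative` are all hypothetical, since the abstract does not list the coding standards applied to CRWD.

```python
from typing import Optional, Set

# Hypothetical refusal/decline markers; the real CRWD codes are not given in the abstract.
REFUSAL_MARKERS: Set[str] = {"refused", "declined", "declined to answer"}

# Hypothetical allowed values for one demographic variable (marital status).
ALLOWED_MARITAL: Set[str] = {"single", "married", "divorced", "widowed"}


def is_informative(value: Optional[str], allowed: Set[str]) -> bool:
    """Return True when a value is non-null, not a refusal, and matches an allowed code."""
    if value is None:
        return False  # null values are uninformative
    cleaned = value.strip().lower()
    if not cleaned or cleaned in REFUSAL_MARKERS:
        return False  # empty strings and refused/declined responses are uninformative
    # Anything outside the allowed set is treated as uninterpretable.
    return cleaned in allowed


examples = ["Married", None, "Refused", "xyz123"]
flags = [is_informative(v, ALLOWED_MARITAL) for v in examples]
```

In this sketch, only the first example value would be kept as informative; the null, the refusal, and the uninterpretable string are all flagged as uninformative.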

RESULTS: Approximately one in five patients had ≥1 demographic variable with multiple responses (all=18%; past five years=23%). Of those with multiple responses, more patients had ≥2 demographic variables with multiple responses in the past five years (46%) than overall (39%); however, fewer uninformative responses were observed for those with encounters in the past five years than overall (relative decrease range=10-70%). Multiple responses for year of birth, gender, ethnicity, and state predominantly comprised a single informative value (>93%) compared to race (74%), marital status (37%), or language (36%) (all data). Following removal of uninformative data, multiple discrepant responses represented <2% (all) or <3% (past five years) of a patient’s demographic data.
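One way to picture the step from "multiple responses" to "multiple discrepant responses" is to drop uninformative entries and then count the distinct values that remain per patient and variable. The sketch below assumes a simple (patient, variable, value) record layout and a hypothetical set of uninformative markers; neither is taken from CRWD, and the records are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical uninformative markers; the actual CRWD codes are not specified in the abstract.
UNINFORMATIVE: Set[str] = {"", "unknown", "refused", "declined"}

Record = Tuple[str, str, Optional[str]]  # (patient_id, variable, value), an assumed layout


def distinct_informative(records: Iterable[Record]) -> Dict[Tuple[str, str], Set[str]]:
    """Collect the distinct informative values seen for each (patient, variable) pair."""
    values: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for patient_id, variable, value in records:
        if value is None:
            continue  # skip nulls
        cleaned = value.strip().lower()
        if cleaned in UNINFORMATIVE:
            continue  # skip refused/unknown/empty entries
        values[(patient_id, variable)].add(cleaned)
    return dict(values)


def discrepant_keys(values: Dict[Tuple[str, str], Set[str]]) -> Set[Tuple[str, str]]:
    """Return the (patient, variable) pairs that still hold more than one distinct value."""
    return {key for key, vals in values.items() if len(vals) > 1}


# Invented example records: p1 has multiple entries but one informative value; p2 is discrepant.
records = [
    ("p1", "marital_status", "Married"),
    ("p1", "marital_status", None),
    ("p1", "marital_status", "Refused"),
    ("p2", "race", "White"),
    ("p2", "race", "Asian"),
]
values = distinct_informative(records)
discrepant = discrepant_keys(values)
```

Here patient p1 has three marital status entries that reduce to a single informative value, while patient p2 retains two different race values and would count as discrepant.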

CONCLUSIONS: Alongside improvements in data collection and quality over time, attention to data cleaning and purposeful data management are necessary for creating fit-for-purpose EHR data.

Code

RWD38

Topic

Real World Data & Information Systems, Study Approaches

Topic Subcategory

Data Protection, Integrity, & Quality Assurance, Electronic Medical & Health Records

Disease

No Additional Disease & Conditions/Specialized Treatment Areas