Validation of Race in Cerner Real-World Data: Understanding Missing and Uninterpretable Data
Author(s)
Jaffe D1, Sridhar S2, Balkaran BL3, Berliner E2
1Cerner Enviza, Jerusalem, Israel, 2Cerner Enviza, Kansas City, MO, USA, 3Cerner Enviza, Malvern, PA, USA
OBJECTIVES: Information on race is often missing or incorrect in Electronic Health Records (EHRs), potentially rendering study findings uninterpretable. This study seeks to characterize patients with unknown race.
METHODS: The 2021 US Cerner Real-World Data (CRWD), a cloud-based, de-identified, Health Insurance Portability and Accountability Act-compliant dataset, was examined. In the CRWD, race is mapped to a standardized set of values using the inputs and codes specific to each health system. Included were patients with ≥1 encounter in 2021 whose race was classified only as 'unknown' (11.1%); patients with an invalid age (1.3%) were excluded from this group. Patient, health system, and encounter characteristics were assessed. Unknown race was grouped as: not asked/unknown (NA/UNK), refused/declined to answer (REF/DEC), and ethnicity reported (ETH). Chi-square tests with pairwise comparisons were used to test group differences.
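The subgrouping step can be sketched as a simple lookup from a health system's raw race value to one of the three unknown-race subgroups, applied after dropping patients with an invalid age. This is a minimal illustration only: the raw source strings, the `unknown_race_subgroup` and `has_valid_age` helpers, and the 0-120 age range are hypothetical, since the abstract does not list CRWD's source codes or define an invalid age.

```python
from collections import Counter
from typing import Optional

# HYPOTHETICAL raw values; CRWD's actual per-health-system codes are not given.
REFUSED = {"refused", "declined", "patient declined", "declined to answer"}
ETHNICITY = {"hispanic", "latino", "hispanic or latino", "not hispanic or latino"}


def unknown_race_subgroup(raw_value: str) -> str:
    """Assign a patient whose race maps to 'unknown' to NA/UNK, REF/DEC, or ETH."""
    value = raw_value.strip().lower()
    if value in REFUSED:
        return "REF/DEC"
    if value in ETHNICITY:
        # An ethnicity was recorded in the race field instead of a race.
        return "ETH"
    # Everything else (blank, 'unknown', 'not asked', unmapped codes) is NA/UNK.
    return "NA/UNK"


def has_valid_age(age: Optional[int]) -> bool:
    """HYPOTHETICAL validity rule: the abstract does not define 'invalid age'."""
    return age is not None and 0 <= age <= 120


# Toy records of (raw race value, age); not study data.
patients = [
    ("Declined", 34),
    ("Hispanic or Latino", 8),
    ("", 71),
    ("Unknown", None),   # excluded: missing age
    ("not asked", 150),  # excluded: implausible age
]

subgroups = Counter(
    unknown_race_subgroup(race) for race, age in patients if has_valid_age(age)
)
print(subgroups)
```

Keeping the lookup sets explicit makes it easy to audit which source values land in each subgroup, which matters when every health system contributes its own codes.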
RESULTS: In 2021, 2,636,457 patients were identified as 'unknown' race: NA/UNK=79.0%, REF/DEC=11.8%, and ETH=9.2%. By age group, children (<18 years) were more likely than adults (18-64 years) or older adults (≥65 years) to report REF/DEC (14.9% versus 10.5% versus 10.6%; p<0.001). The proportion with ethnicity reported (ETH) decreased with age (<18 years=12.7%, 18-64 years=8.6%, ≥65 years=4.5%; p<0.001). Differences in unknown-race type by gender were minimal but statistically significant (p<0.001); differences by zip code zone and health system were more substantial (p<0.001). For example, patients residing in zip code zone 4 were more likely than those in other zones to report REF/DEC (39.3% versus 12.8%; p<0.001). Notably, zip code was unknown for 22.9% of patients and gender for 1.4%. Median annual numbers of encounters were NA/UNK=3, REF/DEC=7, and ETH=9.
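The age-group comparison can be illustrated with a Pearson chi-square test of independence on an age group × unknown-race subgroup table. This is a minimal sketch under stated assumptions: the abstract reports only percentages, so the counts below scale the reported age-group proportions (NA/UNK taken as the remainder) to a hypothetical 1,000 patients per group, and the resulting statistic is illustrative, not the study's. The Bonferroni adjustment for pairwise comparisons is also an assumption, since the abstract does not name the correction used. The p-value uses the closed-form chi-square tail for integer degrees of freedom, so no SciPy dependency is needed.

```python
import math
from itertools import combinations


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum x^(i-1) / (1*3*...*(2i-1))
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


def chi_square_independence(table):
    """Pearson chi-square test for an r x c table; returns (statistic, df, p)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# HYPOTHETICAL counts: reported percentages scaled to 1,000 patients per age group.
# Columns: NA/UNK, REF/DEC, ETH.
counts = {
    "<18": [724, 149, 127],
    "18-64": [809, 105, 86],
    ">=65": [849, 106, 45],
}

stat, df, p = chi_square_independence(list(counts.values()))
print(f"overall: chi2={stat:.1f}, df={df}, p={p:.2e}")

# Pairwise age-group comparisons with an ASSUMED Bonferroni adjustment.
pairs = list(combinations(counts, 2))
for a, b in pairs:
    s, d, p_pair = chi_square_independence([counts[a], counts[b]])
    p_adj = min(1.0, p_pair * len(pairs))
    print(f"{a} vs {b}: chi2={s:.1f}, df={d}, adjusted p={p_adj:.2e}")
```

With real patient-level counts, `scipy.stats.chi2_contingency` computes the same Pearson statistic; note that it applies Yates' continuity correction by default when the table has one degree of freedom.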
CONCLUSIONS: Missing and uninterpretable data, often captured in the 'unknown' category, can signal poor data quality and bias. This study distinguishes subgroups of patients with unknown race, offering future opportunities to improve data quality, identify at-risk groups, and develop health equity-related policies.
Conference/Value in Health Info
Value in Health, Volume 25, Issue 12S (December 2022)
Code
RWD46
Topic
Health Policy & Regulatory, Methodological & Statistical Research
Topic Subcategory
Health Disparities & Equity, Missing Data
Disease
No Additional Disease & Conditions/Specialized Treatment Areas