Validation of Race in Cerner Real-World Data: Understanding Missing and Uninterpretable Data

Author(s)

Jaffe D1, Sridhar S2, Balkaran BL3, Berliner E2
1Cerner Enviza, Jerusalem, Israel, 2Cerner Enviza, Kansas City, MO, USA, 3Cerner Enviza, Malvern, PA, USA

OBJECTIVES: Information on race is often missing or incorrect in electronic health records (EHRs), potentially rendering study findings uninterpretable. This study seeks to characterize patients with unknown race.

METHODS: The 2021 US Cerner Real-World Data (CRWD), a cloud-based, de-identified, and Health Insurance Portability and Accountability Act-compliant dataset, was examined. In the CRWD, race is mapped to a standardized set using the input and codes particular to each health system. Included were patients with ≥1 encounter in 2021 classified only as ‘unknown’ race (11.1%). Excluded from this group were patients with an invalid age (1.3%). Patient, health system, and encounter characteristics were assessed. Unknown race was grouped as: not asked/unknown (NA/UNK), refused/declined to answer (REF/DEC), and ethnicity reported (ETH). Chi-square tests and pairwise comparisons were used to test for group differences.
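The chi-square and pairwise comparisons named in METHODS might be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `chi_square_statistic`, the variable names, and every count in `hypothetical_counts` are assumptions invented for the example, and the abstract does not state whether or how the pairwise tests were adjusted for multiple comparisons.

```python
from itertools import combinations


def chi_square_statistic(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for an r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# HYPOTHETICAL counts, not taken from the study.
# Rows: age groups; columns: NA/UNK, REF/DEC, ETH.
age_groups = ["<18", "18-64", ">=65"]
hypothetical_counts = [
    [70, 15, 15],
    [80, 10, 10],
    [85, 10, 5],
]

# Overall test across all age groups
overall_stat, overall_dof = chi_square_statistic(hypothetical_counts)
print(f"overall: chi2={overall_stat:.2f}, dof={overall_dof}")

# Pairwise comparisons: repeat the test on each pair of age groups
for a, b in combinations(range(len(age_groups)), 2):
    stat, dof = chi_square_statistic([hypothetical_counts[a], hypothetical_counts[b]])
    print(f"{age_groups[a]} vs {age_groups[b]}: chi2={stat:.2f}, dof={dof}")
```

Converting each statistic to a p-value requires the chi-square distribution with the returned degrees of freedom, which a statistics library would normally supply.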

RESULTS: In 2021, 2,636,457 patients were identified as ‘unknown’ race, with NA/UNK=79.0%, REF/DEC=11.8%, and ETH=9.2%. By age group, children (<18 years) were more likely than adults (18-64 years) or older adults (≥65 years) to report REF/DEC (14.9% versus 10.5% versus 10.6%; p<0.001). The frequency of race reported as ETH decreased with age (<18 years=12.7%, 18-64 years=8.6%, ≥65 years=4.5%; p<0.001). Minimal but statistically significant differences in unknown-type reporting were observed by gender (p<0.001); however, more substantial differences were noted for zip code zone and health system (p<0.001). For example, patients residing in zip code zone 4 were more likely than those in other zones to report REF/DEC (39.3% versus 12.8%; p<0.001). Notably, data were unknown for zip code (22.9%) and gender (1.4%). The median annual number of encounters was NA/UNK=3, REF/DEC=7, and ETH=9.
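For readers who want patient counts rather than percentages, the subgroup sizes can be approximated from the figures in RESULTS. The sketch below is only an illustration: the function and variable names are hypothetical, and because the reported percentages are rounded to one decimal place, the resulting counts are estimates and not values reported by the authors.

```python
# Figures reported in RESULTS
TOTAL_UNKNOWN_RACE = 2_636_457
REPORTED_SHARES = {"NA/UNK": 0.790, "REF/DEC": 0.118, "ETH": 0.092}


def approximate_counts(total, shares):
    """Estimate subgroup counts from rounded percentage shares.
    The results are approximations because the shares are rounded."""
    return {group: round(total * share) for group, share in shares.items()}


approx = approximate_counts(TOTAL_UNKNOWN_RACE, REPORTED_SHARES)
for group, n in approx.items():
    print(f"{group}: approximately {n:,} patients")
```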

CONCLUSIONS: Missing and uninterpretable data, often quantified using the unknown category, can signal poor data quality and bias. This study distinguishes between subgroups of patients with unknown race, offering future opportunities for improving data quality, identifying at-risk groups, and developing health equity-related policies.

Conference/Value in Health Info

2022-11, ISPOR Europe 2022, Vienna, Austria

Value in Health, Volume 25, Issue 12S (December 2022)

Code

RWD46

Topic

Health Policy & Regulatory, Methodological & Statistical Research

Topic Subcategory

Health Disparities & Equity, Missing Data

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

