CURATING FIT-FOR-PURPOSE GEOGRAPHIC REAL-WORLD DATA

Author(s)

Dena H. Jaffe, PhD¹; Amy Price, PhD²
¹Oracle Health, Petah Tikva, Israel; ²Oracle Health, Kansas City, MO, USA
OBJECTIVES: Greater geospatial granularity in real-world data (RWD) is essential for understanding local variation in healthcare quality, outcomes, and equity. However, privacy regulations often require reduced geographic precision, for example by aggregating 5-digit postal (ZIP) codes to their 3-digit prefixes (ZIP3). This aggregation can introduce information bias and obscure meaningful differences. This study examines considerations for curating de-identified RWD with respect to geographic location.
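The ZIP3 aggregation described above can be sketched as follows. This is a minimal illustration, not the study's method. It assumes a hypothetical table of 5-digit ZCTA populations, approximates each ZIP3 population by summing the ZCTAs that share its prefix, and applies the HIPAA Safe Harbor convention: a ZIP3 may be retained only if its combined population exceeds 20,000, and is otherwise replaced with "000". The function names and population figures are invented for illustration and are not census counts.

```python
from collections import defaultdict

# Safe Harbor convention: ZIP3 areas with 20,000 or fewer people are suppressed.
POPULATION_THRESHOLD = 20_000
SUPPRESSED_ZIP3 = "000"


def zip3_populations(zcta_population: dict[str, int]) -> dict[str, int]:
    """Sum 5-digit ZCTA populations into 3-digit prefix totals."""
    totals: dict[str, int] = defaultdict(int)
    for zcta, population in zcta_population.items():
        totals[zcta[:3]] += population
    return dict(totals)


def generalize_zip(zip5: str, zip3_totals: dict[str, int]) -> str:
    """Truncate a ZIP5 to its ZIP3, suppressing small-population prefixes."""
    zip3 = zip5[:3]
    # Prefixes missing from the population table count as population 0,
    # so they are suppressed (a conservative, hypothetical design choice).
    if zip3_totals.get(zip3, 0) > POPULATION_THRESHOLD:
        return zip3
    return SUPPRESSED_ZIP3


# Hypothetical population figures, for illustration only.
zcta_population = {
    "10001": 21_000,
    "10002": 76_000,
    "05901": 1_200,
    "05902": 800,
}

totals = zip3_populations(zcta_population)
print(totals)                           # {'100': 97000, '059': 2000}
print(generalize_zip("10002", totals))  # 100
print(generalize_zip("05901", totals))  # 000
```

Treating prefixes absent from the population table as suppressed errs toward privacy; a production pipeline might instead flag them for manual review.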
METHODS: We performed a comparative analysis of geographies defined by USPS ZIP codes (3- and 5-digit; ZIP3, ZIP5) and by U.S. Census Bureau data, including 5-digit ZIP Code Tabulation Areas (ZCTAs) and population estimates from the 2010 and 2020 censuses.
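One way to operationalize the ZIP5-to-ZCTA comparison is sketched below, under stated assumptions. It assumes a hypothetical crosswalk stored as a dictionary from ZIP5 to ZCTA, in which a ZIP5 without a ZCTA of its own (such as a PO-box-only code) points to the ZCTA in which it lies. The codes are arbitrary and the mapping is invented. A ZIP5 counts as misaligned when its crosswalked ZCTA has a different code. ZIP5s missing from the crosswalk come back as None, so they can be flagged rather than silently dropped.

```python
from typing import Optional


def misalignment_rate(crosswalk: dict[str, str]) -> float:
    """Share of ZIP5 codes whose crosswalked ZCTA has a different code."""
    if not crosswalk:
        return 0.0
    misaligned = sum(1 for zip5, zcta in crosswalk.items() if zip5 != zcta)
    return misaligned / len(crosswalk)


def to_zcta(zip5: str, crosswalk: dict[str, str]) -> Optional[str]:
    """Map a ZIP5 to its ZCTA; None marks codes missing from the crosswalk."""
    return crosswalk.get(zip5)


# Hypothetical crosswalk (arbitrary codes, invented mapping).
crosswalk = {
    "12301": "12301",
    "12302": "12302",
    "12303": "12303",
    "12399": "12301",  # e.g., a PO-box-only ZIP5 assigned to its surrounding ZCTA
}

print(f"{misalignment_rate(crosswalk):.1%}")  # 25.0%

records = ["12301", "12399", "99998"]
print([to_zcta(z, crosswalk) for z in records])  # ['12301', '12301', None]
```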
RESULTS: We describe curation considerations for geographic identifiers in two domains: privacy and data quality. Privacy issues include the suppression of ZIP3s with populations <20,000 (1.3% [n=13/894] and 2.0% [n=18/894] of ZCTA-based ZIP3 areas in the 2010 and 2020 censuses, respectively). Certain ZIP3s also function as quasi-identifiers, such as USPS-designated military and diplomatic locations (e.g., 090-099) and federal government areas (e.g., 202-205), both of which typically have small populations. Data quality considerations for ZIP3s include cleaning or flagging non-residential areas designated for post office boxes (e.g., 311, 772) or single organizations (e.g., 842, 938), and identifying invalid or discontinued codes (e.g., 213, 517-519). Finally, because approximately 17.7% of ZIP5 codes are misaligned with ZCTAs, USPS ZIP5 codes should be crosswalked to ZCTAs when feasible to improve geographic accuracy.
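The ZIP3 curation checks in the results can be expressed as a lookup that attaches flags to values instead of silently deleting records. This is a minimal sketch. The reference sets contain only the example codes cited in this abstract, so they are illustrative and not a complete list. In practice they would need to be built from current USPS and Census sources. The flag names and the `flag_zip3` helper are hypothetical.

```python
from enum import Enum


class Zip3Flag(Enum):
    """Hypothetical curation flags for ZIP3 values."""
    MILITARY_DIPLOMATIC = "quasi-identifier: military/diplomatic"
    FEDERAL_GOVERNMENT = "quasi-identifier: federal government"
    PO_BOX_ONLY = "non-residential: post office boxes"
    SINGLE_ORGANIZATION = "non-residential: single organization"
    INVALID_OR_DISCONTINUED = "invalid or discontinued"
    MALFORMED = "not a 3-digit code"


# Example codes cited in the abstract only; NOT exhaustive reference lists.
REFERENCE_SETS = [
    ({f"{n:03d}" for n in range(90, 100)}, Zip3Flag.MILITARY_DIPLOMATIC),  # 090-099
    ({str(n) for n in range(202, 206)}, Zip3Flag.FEDERAL_GOVERNMENT),      # 202-205
    ({"311", "772"}, Zip3Flag.PO_BOX_ONLY),
    ({"842", "938"}, Zip3Flag.SINGLE_ORGANIZATION),
    ({"213", "517", "518", "519"}, Zip3Flag.INVALID_OR_DISCONTINUED),
]


def flag_zip3(zip3: str) -> list[Zip3Flag]:
    """Return every flag that applies to a ZIP3; an empty list means none apply."""
    if len(zip3) != 3 or not (zip3.isascii() and zip3.isdigit()):
        return [Zip3Flag.MALFORMED]
    return [flag for codes, flag in REFERENCE_SETS if zip3 in codes]


for code in ["094", "203", "772", "518", "606", "12"]:
    print(code, [flag.value for flag in flag_zip3(code)])
```

Returning flags rather than dropping rows keeps each curation decision visible, so analysts can document it and revisit it later.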
CONCLUSIONS: Enhancing transparency in the curation of geographic data to create fit-for-purpose RWD is critical to ensuring the validity and relevance of real-world evidence for decision-making and policy. Researchers must understand the limitations of USPS and Census geographies, document geographic transformations, and account for potential sources of bias.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

RWD135

Topic

Real World Data & Information Systems

Topic Subcategory

Data Protection, Integrity, & Quality Assurance, Reproducibility & Replicability

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
