WHEN ONE CODE DEFINES MILLIONS: SENSITIVITY OF PERIPHERAL ARTERY DISEASE COHORTS TO OPERATIONAL DEFINITION CHOICES IN REAL-WORLD DATA

Author(s)

Scott L. DuVall, PhD1, Jared H. Kamauu, BA2, Aimee Harrison, MFA3, Michael Buck, PhD3, Craig G Parker, MD, MS4, Allise G Kamauu, MS5, Aaron Kamauu, MPH, MS, MD6;
1PurpleLab Healthcare Analytics, Senior Vice President, Real-World Evidence, Taylorsville, UT, USA, 2Navidence Inc, Lehi, UT, USA, 3Navidence, Aurora, CO, USA, 4Navidence, Sandy, UT, USA, 5Navidence, Salt Lake City, UT, USA, 6Navidence, Inc., Bountiful, UT, USA
OBJECTIVES: Composite phenotypes such as peripheral artery disease (PAD) are commonly defined in real-world research using diagnosis code lists derived from clinical guidelines or prior studies. While variability in these operational definitions is recognized, the population-level impact of individual codes is rarely quantified. This study assessed the sensitivity of PAD cohort size to operational definition choices, with corroborating analyses using other phenotypes.
METHODS: Multiple published real-world PAD cohort definitions with explicit ICD-10-CM code lists were identified from the literature. Each definition was replicated using PurpleLab® CLEAR Claims. Cohort sizes were compared across definitions, and overlap was assessed at the patient and code levels. Code-level contributions to cohort inclusion were examined to identify high-impact diagnosis codes. Parallel sensitivity assessments were conducted for myocardial infarction (MI), stroke, and transient ischemic attack (TIA) to evaluate generalizability.
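The cohort replication and overlap comparison described in the methods can be illustrated with a minimal Python sketch. It is not the study's implementation: the claims rows, the definition names (`def_a`, `def_b`), the helper functions, and the choice of Jaccard similarity as the overlap measure are all assumptions made for illustration, and the ICD-10-CM codes shown are examples rather than the contents of any published code list.

```python
# Hypothetical toy claims: (patient_id, ICD-10-CM diagnosis code) pairs.
# Real claims data would be far larger and carry dates, settings, etc.
CLAIMS = [
    ("p1", "I73.9"),
    ("p2", "I70.211"),
    ("p2", "I73.9"),
    ("p3", "I70.212"),
    ("p4", "I74.3"),
    ("p5", "E11.51"),
]

# Hypothetical operational definitions (illustrative code lists only).
DEFINITIONS = {
    "def_a": {"I70.211", "I70.212", "I74.3"},
    "def_b": {"I70.211", "I73.9"},
}


def build_cohort(claims, code_list):
    """Return the set of patients with at least one claim in code_list."""
    return {patient for patient, code in claims if code in code_list}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Replicate each definition against the same claims source.
cohorts = {name: build_cohort(CLAIMS, codes) for name, codes in DEFINITIONS.items()}

# Overlap at the code level and at the patient level.
code_overlap = jaccard(DEFINITIONS["def_a"], DEFINITIONS["def_b"])
patient_overlap = jaccard(cohorts["def_a"], cohorts["def_b"])

for name, members in cohorts.items():
    print(name, "codes:", len(DEFINITIONS[name]), "patients:", len(members))
print("code-level overlap:", code_overlap)
print("patient-level overlap:", patient_overlap)
```

Computing both overlaps side by side makes it possible to see when two definitions that share few codes still capture many of the same patients, or the reverse.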
RESULTS: Five PAD cohorts were constructed using definitions ranging from 52 to 351 diagnosis codes. Three large code lists (324-351 codes) identified 6.21-6.91 million patients, while two smaller lists (52-57 codes) identified 5.15 and 5.76 million patients, respectively. Despite minimal overlap between the two smaller lists (60 shared codes), cohort sizes were comparable to those derived from substantially larger definitions. The three larger lists included approximately 300 codes absent from the smaller definitions, which collectively contributed minimal incremental patients, while a single diagnosis code (I73.9, peripheral vascular disease, unspecified) absent from all three larger lists accounted for approximately 1.88 million patients. Similar sensitivity to individual diagnosis codes was observed in all cohorts.
CONCLUSIONS: Operational definition choices for PAD can result in multi-million-patient differences in cohort size driven by a small number of high-impact diagnosis codes rather than overall code list size. These findings underscore the importance of data-informed phenotype design aligned with study intent, as inclusion or exclusion of specific codes may substantially alter cohort size and underlying patient populations.
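One way to surface high-impact codes of the kind described above is to count, for each code in a definition, how many patients carry it at all and how many qualify for the cohort through that code alone. The Python sketch below illustrates that idea under stated assumptions: the function name `code_contributions`, the toy claims rows, and the example code list are hypothetical, and the abstract does not specify how code-level contributions were actually computed.

```python
from collections import defaultdict

# Hypothetical toy claims: (patient_id, ICD-10-CM diagnosis code) pairs.
CLAIMS = [
    ("p1", "I73.9"),
    ("p2", "I70.211"),
    ("p2", "I73.9"),
    ("p3", "I73.9"),
    ("p4", "I70.212"),
]

# Hypothetical code list for one operational definition.
CODE_LIST = {"I73.9", "I70.211", "I70.212"}


def code_contributions(claims, code_list):
    """For each code, count patients with the code ("any") and patients
    who would leave the cohort if that code were removed ("only")."""
    codes_by_patient = defaultdict(set)
    for patient, code in claims:
        if code in code_list:
            codes_by_patient[patient].add(code)

    counts = {code: {"any": 0, "only": 0} for code in code_list}
    for codes in codes_by_patient.values():
        for code in codes:
            counts[code]["any"] += 1
        if len(codes) == 1:
            # This patient qualifies through a single code only.
            counts[next(iter(codes))]["only"] += 1
    return counts


contributions = code_contributions(CLAIMS, CODE_LIST)

# Rank codes by the number of patients who depend on them exclusively.
ranked = sorted(contributions.items(), key=lambda kv: kv[1]["only"], reverse=True)
for code, c in ranked:
    print(code, "any:", c["any"], "only:", c["only"])
```

The "only" count is the leave-one-out effect on cohort size, so a code with a large "only" value changes the cohort substantially when it is added or dropped, regardless of how long the rest of the code list is.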

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

RWD1

Topic

Real World Data & Information Systems

Topic Subcategory

Data Protection, Integrity, & Quality Assurance, Reproducibility & Replicability

Disease

SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory)
