IMPACT OF VARIANCE IN COHORT PHENOTYPE DEFINITIONS ON REAL-WORLD RESEARCH: AN ASSESSMENT OF ATHEROSCLEROTIC CARDIOVASCULAR DISEASE (ASCVD) DEFINITIONS ON REAL-WORLD DATA

Author(s)

Aaron Kamauu, MD, MS, MPH1, Jared H. Kamauu, BA2, Aimee Harrison, MFA3, Michael Buck, PhD4, Craig G Parker, MD, MS5, Allise Kamauu, MS1, Scott L. DuVall, PhD6;
1Navidence Inc., Bountiful, UT, USA, 2Navidence Inc., Lehi, UT, USA, 3Navidence Inc., Aurora, CO, USA, 4Navidence Inc., Highland, UT, USA, 5Navidence Inc., Sandy, UT, USA, 6PurpleLab Healthcare Analytics, Taylorsville, UT, USA
OBJECTIVES: Complex clinical concepts, such as Atherosclerotic Cardiovascular Disease (ASCVD), are often defined by combining multiple composite medical conditions, and these definitions are then used as cohort phenotypes in real-world research (RWR). However, even small differences in conceptual and computable operational definitions (CODefs) “may have a large impact on study results.” [FDA 2023] We assessed the impact of definition variability on cohort phenotypes in real-world data (RWD).
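To make the idea of a composite definition concrete, the sketch below represents a CODef as a mapping from each composite condition to a set of ICD-10-CM code prefixes. A patient qualifies if any diagnosis code matches any prefix in the definition. The condition names, prefixes, and patient codes are hypothetical and illustrative only; they are not the published code lists from the reviewed studies. The example shows how adding a single condition to an otherwise identical definition can change who qualifies.

```python
from typing import Dict, Iterable, Set

# Hypothetical representation: composite condition -> ICD-10-CM code prefixes.
CODef = Dict[str, Set[str]]


def qualifies(patient_codes: Iterable[str], codef: CODef) -> bool:
    """Return True if any patient diagnosis code starts with any code prefix
    of any composite condition in the definition (assumes codes and prefixes
    use the same dotted formatting)."""
    prefixes = [p for codes in codef.values() for p in codes]
    return any(code.startswith(p) for code in patient_codes for p in prefixes)


# Two hypothetical definitions that differ by one composite condition.
# Prefixes are illustrative, not taken from any published ASCVD code list.
definition_a: CODef = {
    "myocardial_infarction": {"I21"},
    "ischemic_stroke": {"I63"},
    "peripheral_artery_disease": {"I70.2"},
}
definition_b: CODef = {**definition_a, "aortic_aneurysm": {"I71"}}

patient = ["E11.9", "I71.9"]  # hypothetical claims history
print(qualifies(patient, definition_a))  # False
print(qualifies(patient, definition_b))  # True
```

The same patient falls inside one cohort and outside the other purely because of which conditions the definition includes.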
METHODS: Through a targeted literature review, we identified RWR ASCVD cohort definitions for which specific diagnosis code lists were published. Each cohort was replicated precisely (matching the published code lists exactly) using PurpleLab® CLEAR Claims. The overall defined cohorts were compared with one another to assess overlap in patient coverage and to identify significant differences that could affect study outcomes and results.
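One way to quantify overlap in patient coverage between replicated cohorts is to compare their patient-ID sets pairwise. The sketch below reports the number of shared patients and the Jaccard index (intersection size divided by union size) for each pair. The abstract does not specify which overlap metric was used; Jaccard is an assumption chosen here for illustration, and the cohort names and patient IDs are hypothetical toy data.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_overlap(cohorts: Dict[str, Set[str]]) -> Dict[Tuple[str, str], dict]:
    """For every pair of cohorts, count shared patients and compute the
    Jaccard index |A & B| / |A | B| (0.0 if both cohorts are empty)."""
    results = {}
    for a, b in combinations(sorted(cohorts), 2):
        shared = cohorts[a] & cohorts[b]
        union = cohorts[a] | cohorts[b]
        results[(a, b)] = {
            "shared": len(shared),
            "jaccard": len(shared) / len(union) if union else 0.0,
        }
    return results


# Hypothetical toy cohorts (patient IDs), not study data.
cohorts = {
    "def_1": {"p1", "p2", "p3"},
    "def_2": {"p2", "p3", "p4", "p5"},
    "def_3": {"p3"},
}
overlap = pairwise_overlap(cohorts)
for pair, stats in overlap.items():
    print(pair, stats)
```

At real-world scale the same comparison would typically run as joins in the claims database rather than on in-memory Python sets.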
RESULTS: Five ASCVD cohorts were identified and replicated; cohort CODefs ranged from 202 to 637 distinct ICD-10-CM codes covering 3 to 13 composite medical conditions. Peripheral artery disease (PAD), ischemic stroke, and myocardial infarction (MI) were common to all definitions, followed by the sub-conditions angina and aortic aneurysm (4 of 5 definitions). Cohort patient counts ranged from 9,179,200 to 17,632,632. A total of 7,145,904 patients qualified for all five cohort definitions; the next largest grouped subset (2,651,823 patients) qualified for three cohort definitions.
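The grouped subsets described above (patients qualifying for all definitions, for a particular combination of three, and so on) can be computed by recording, for each patient, the exact set of definitions they qualify for and counting patients per set. The sketch below is a minimal illustration of that grouping on hypothetical toy cohorts; it is not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Set


def membership_patterns(cohorts: Dict[str, Set[str]]) -> Counter:
    """Count patients by the exact combination of cohort definitions
    they qualify for (one frozenset of definition names per patient)."""
    membership: Dict[str, Set[str]] = {}
    for name, patients in cohorts.items():
        for pid in patients:
            membership.setdefault(pid, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Hypothetical toy cohorts (patient IDs), not study data.
cohorts = {
    "def_1": {"p1", "p2", "p3", "p4"},
    "def_2": {"p1", "p2", "p3"},
    "def_3": {"p1", "p2", "p5"},
}
patterns = membership_patterns(cohorts)

# Patients who qualify for every definition.
all_definitions = frozenset(cohorts)
print("qualify for all:", patterns[all_definitions])

# Remaining subsets, largest first.
for pattern, count in patterns.most_common():
    print(sorted(pattern), count)
```

Sorting the patterns by size reproduces the kind of ranking reported in the results: the all-definitions group first, then the next largest combination.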
CONCLUSIONS: Variance in CODefs for complex phenotypes, whether arising from differences in composite conditions (algorithms) or in code lists, creates the potential for compounding errors. Although the largest cohort had more than 3x as many codes (637 vs. 202), it yielded less than 2x the patient count (~17.6M vs. ~9.2M). This demonstrates that differences in patient counts do not scale directly with the number of codes; rather, the impact depends largely on which codes are included. This assessment underlines the importance of data-driven CODef selection for research cohorts, given the potentially significant impact of definitions on cohort creation in RWD and on downstream health outcomes/endpoints.
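The code-count versus patient-count comparison can be checked directly from the figures reported in the results. This short calculation assumes, as the conclusion's comparison implies, that the 637-code definition corresponds to the largest patient count and the 202-code definition to the smallest.

```python
# Figures reported in the results; the pairing of code counts with
# patient counts is assumed from the conclusion's comparison.
max_codes, min_codes = 637, 202
max_patients, min_patients = 17_632_632, 9_179_200

code_ratio = max_codes / min_codes
patient_ratio = max_patients / min_patients

print(f"code ratio:    {code_ratio:.2f}")     # 3.15
print(f"patient ratio: {patient_ratio:.2f}")  # 1.92
```

A roughly threefold increase in codes therefore corresponds to less than a twofold increase in patients.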

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

PT7

Topic

Methodological & Statistical Research

Topic Subcategory

Confounding, Selection Bias Correction, Causal Inference

Disease

SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory)
