IMPACT OF VARIANCE IN COHORT PHENOTYPE DEFINITIONS ON REAL-WORLD RESEARCH: AN ASSESSMENT OF ATHEROSCLEROTIC CARDIOVASCULAR DISEASE (ASCVD) DEFINITIONS ON REAL-WORLD DATA
Author(s)
Aaron Kamauu, MD, MS, MPH1, Jared H. Kamauu, BA2, Aimee Harrison, MFA3, Michael Buck, PhD4, Craig G Parker, MD, MS5, Allise Kamauu, MS1, Scott L. DuVall, PhD6;
1Navidence Inc., Bountiful, UT, USA, 2Navidence Inc., Lehi, UT, USA, 3Navidence Inc., Aurora, CO, USA, 4Navidence Inc., Highland, UT, USA, 5Navidence Inc., Sandy, UT, USA, 6PurpleLab Healthcare Analytics, Taylorsville, UT, USA
OBJECTIVES: Complex clinical concepts such as atherosclerotic cardiovascular disease (ASCVD) are often defined by combining multiple composite medical conditions, and these definitions are used in cohort phenotypes in real-world research (RWR). However, even small differences in conceptual and computable operational definitions (CODefs) “may have a large impact on study results” [FDA 2023]. We assessed the impact of definition variability on cohort phenotypes in real-world data (RWD).
METHODS: We identified RWR ASCVD cohort definitions from a targeted literature review, where specific diagnosis code lists were published. Each cohort was precisely replicated (matching code lists exactly) using PurpleLab® CLEAR Claims. The defined cohorts (overall) were compared to each other to assess overlap of patient coverage and significant differences that would have an impact on study outcomes and results.
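The replicate-and-compare step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the three code lists, the claims rows, and the helper names `build_cohorts` and `overlap_profile` are all hypothetical, and the toy lists are far smaller than the published definitions.

```python
from collections import Counter

# Hypothetical toy code lists (ICD-10-CM); NOT the published definitions.
definitions = {
    "def_A": {"I21.9", "I63.9", "I73.9"},
    "def_B": {"I21.9", "I63.9", "I73.9", "I20.9"},
    "def_C": {"I21.9", "I63.9", "I73.9", "I20.9", "I25.10"},
}

# Hypothetical claims rows: (patient_id, diagnosis_code).
claims = [
    ("p1", "I21.9"),
    ("p2", "I20.9"),
    ("p3", "I25.10"),
    ("p4", "E11.9"),   # not in any toy ASCVD list
    ("p1", "I25.10"),
]

def build_cohorts(claims, definitions):
    """Return {definition name: set of patients with at least one listed code}."""
    cohorts = {name: set() for name in definitions}
    for patient_id, code in claims:
        for name, codes in definitions.items():
            if code in codes:
                cohorts[name].add(patient_id)
    return cohorts

def overlap_profile(cohorts):
    """Return {k: number of patients who qualify for exactly k definitions}."""
    membership = Counter()
    for patients in cohorts.values():
        membership.update(patients)  # each patient counted once per cohort
    return Counter(membership.values())

cohorts = build_cohorts(claims, definitions)
profile = overlap_profile(cohorts)
```

In this sketch, `profile[len(definitions)]` is the count of patients who qualify for every definition, the analogue of the all-definitions subset reported in the results.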
RESULTS: Five ASCVD cohorts were identified and replicated; cohort CODefs ranged from 202 to 637 distinct ICD-10-CM codes covering 3 to 13 composite medical conditions. Peripheral artery disease (PAD), ischemic stroke, and myocardial infarction (MI) were common to all five definitions, followed by sub-conditions of angina and aortic aneurysm (4/5 definitions). Cohort patient counts ranged from 9,179,200 to 17,632,632. A total of 7,145,904 patients qualified for all five cohort definitions; the next largest grouped subset (2,651,823 patients) qualified for 3 cohort definitions.
CONCLUSIONS: Variance of CODefs for complex phenotypes, whether from differences in composite conditions (algorithms) or in code lists, poses the potential for compounding errors. Although the largest cohort had >3x the codes (637 compared to 202), it yielded <2x the patient count (~17.6M vs. ~9.2M). This demonstrates that the difference in patient count does not scale directly with the number of codes; rather, the impact is largely due to which codes are included. This assessment underlines the importance of data-driven CODef selection for research cohorts, given the potentially significant impact of definitions on cohort creation in RWD and on downstream health outcomes/endpoints.
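The ratios behind this comparison can be checked directly from the reported figures. The short Python sketch below only restates arithmetic on numbers given in the abstract; the variable names are illustrative, and pairing the 202-code definition with the smallest cohort follows the comparison made in the conclusions.

```python
# Figures reported in the abstract.
min_codes, max_codes = 202, 637
min_patients, max_patients = 9_179_200, 17_632_632
all_definitions = 7_145_904  # patients qualifying for every definition

# How much larger the largest definition is, in codes and in patients.
code_ratio = max_codes / min_codes            # a little over 3x
patient_ratio = max_patients / min_patients   # a little under 2x

# Patients meeting all definitions sit inside every cohort, so these
# shares show how much of each extreme cohort is common to all five.
share_of_smallest = all_definitions / min_patients
share_of_largest = all_definitions / max_patients

print(f"code ratio: {code_ratio:.2f}, patient ratio: {patient_ratio:.2f}")
print(f"shared by all definitions: {share_of_smallest:.1%} of smallest, "
      f"{share_of_largest:.1%} of largest")
```

Roughly three quarters of the smallest cohort and about two fifths of the largest cohort qualify under every definition, which is one way to read the gap between code count and patient count.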
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
PT7
Topic
Methodological & Statistical Research
Topic Subcategory
Confounding, Selection Bias Correction, Causal Inference
Disease
SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory)