HOW REPRESENTATIVE ARE LICENSABLE DATASETS COMPARED TO THEIR SOURCE POPULATION?
Author(s)
Amita G. Ketkar, MS, Other, Lauren E. Parlett, PhD, Judith J. Stephenson, BS, MS, Michael Grabner, PhD, Katherine Marie Harris, PhD, Vincent J. Willey, PharmD.
Carelon Research, Wilmington, DE, USA.
OBJECTIVES: As access to licensable real-world data grows, stakeholders seek to understand how well these data generalize to their source populations. Carelon Real World Data (CRWD) is a licensable subset of the Healthcare Integrated Research Database (HIRD®), a large US database composed of individuals with private insurance or Medicare Advantage coverage. While prior work assessed the representativeness of the HIRD relative to the US Census population, the representativeness of CRWD has not been analyzed. The current study addresses this gap by evaluating the representativeness of CRWD relative to the HIRD.
METHODS: This analysis compared individuals in the HIRD in 2024 with those in CRWD on age, sex, race/ethnicity, region, insurance and plan types, neighborhood socioeconomic status (SES), clinical characteristics such as the Quan-Charlson Comorbidity Index (QCI), and documentation of EHR and lab results. Healthcare utilization and costs in different settings were also compared. We compared the variables’ probability distributions using (1) the overlap index (η), where 0% means no overlap and 100% means complete overlap, and (2) standardized mean differences (SMD), where an SMD less than 0.2 suggests similar means.
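The two comparison metrics can be illustrated with a minimal Python sketch. The abstract does not state the exact formulas used, so this assumes a common formulation: for a categorical variable, η is the sum over categories of the smaller of the two proportions, and the SMD is the absolute difference in means divided by an equally weighted pooled standard deviation. The function names and all input values below are hypothetical and are not study data.

```python
import math


def overlap_index(p, q):
    """Overlap of two categorical distributions given as {category: proportion}.

    Assumed formulation: sum of the category-wise minimum proportions,
    so 0.0 means no overlap and 1.0 means identical distributions.
    """
    categories = set(p) | set(q)
    return sum(min(p.get(c, 0.0), q.get(c, 0.0)) for c in categories)


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """Absolute SMD using an equally weighted pooled standard deviation (assumed)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return abs(mean_a - mean_b) / pooled_sd


# Hypothetical region distributions for a source database and a subset.
source = {"Northeast": 0.20, "Midwest": 0.30, "South": 0.35, "West": 0.15}
subset = {"Northeast": 0.25, "Midwest": 0.25, "South": 0.35, "West": 0.15}

eta = overlap_index(source, subset)
smd = standardized_mean_difference(40.0, 10.0, 42.0, 10.0)

print(f"overlap index: {eta:.1%}")
print(f"SMD: {smd:.2f}")
```

Under these assumptions, a continuous variable would first need to be binned or have its density estimated before an overlap index can be computed; the sketch covers only the categorical case.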
RESULTS: The HIRD (N=13,970,339) and a CRWD subset (N=7,000,545) were compared. Sex showed nearly complete overlap (η=99.7%), and age groups (η=94.9%), region (η=92.9%), race and ethnicity (η=90.0%), SES (η=99.5%), and QCI (η=97.5%) also showed a high level of overlap; mean age was similar (SMD=0.06). Overall, clinical characteristics also had high overlap indices. Healthcare resource utilization (HCRU) and cost SMDs were below the 0.2 threshold except for the mean number of pharmacy claims (SMD=0.26).
CONCLUSIONS: We found the 2024 HIRD and CRWD populations to be very similar across the characteristics compared, suggesting that CRWD is representative of the overall HIRD population. As the availability of licensable datasets continues to expand, analyses such as this one are critical for interpreting and applying the results of studies conducted within these datasets.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
RWD134
Topic
Real World Data & Information Systems
Topic Subcategory
Reproducibility & Replicability
Disease
No Additional Disease & Conditions/Specialized Treatment Areas