Empirical Comparison of Four Approaches for Generating Confidence Intervals Around Weighted Trial-Level Correlation
Author(s)
Paul Serafini, BA; Victoria Wan, MSc; Mir Sohail Fazeli, PhD, MD; Murat Kurt, BS, MS, PhD.
Evidinno Outcomes Research Inc, Vancouver, BC, Canada.
OBJECTIVES: Trial-level surrogacy is an integral step in surrogate endpoint validation. It quantifies the association between the treatment effects on the two endpoints across studies, often using the weighted Pearson correlation and its confidence interval (CI). The appropriate choice of sample size in the standard error (SE) calculation for a weighted correlation is unclear. Bootstrapping, however, can generate a CI without requiring an SE. This study compared the performance of four approaches for generating CIs around the weighted correlation.
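A minimal sketch of the two ingredients named above: a weighted Pearson correlation across studies and a percentile bootstrap CI that needs no SE formula. The function names, the percentile variant of the bootstrap, and the choice to resample whole studies together with their weights are assumptions for illustration, not the authors' implementation. The weights are assumed to be inverse-variance weights (1/SE² of each study's treatment-effect estimate). Which endpoint's variance defines them is also an assumption here.

```python
import numpy as np


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                          # normalize weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)    # weighted means
    cov = np.sum(w * (x - mx) * (y - my))    # weighted covariance
    vx = np.sum(w * (x - mx) ** 2)           # weighted variances
    vy = np.sum(w * (y - my) ** 2)
    with np.errstate(invalid="ignore", divide="ignore"):
        return cov / np.sqrt(vx * vy)        # NaN if a variance is zero


def bootstrap_ci(x, y, w, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample studies (with their weights) with replacement."""
    rng = np.random.default_rng(seed)
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    k = len(x)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, k, size=k)     # k study indices drawn with replacement
        r = weighted_corr(x[idx], y[idx], w[idx])
        if np.isfinite(r):                   # skip degenerate resamples (zero variance)
            stats.append(r)
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Usage with hypothetical study-level data (15 studies)
rng = np.random.default_rng(42)
se = rng.uniform(0.1, 0.5, size=15)          # hypothetical per-study SEs
surrogate = rng.normal(size=15)              # hypothetical effects on the surrogate
true_ep = 0.8 * surrogate + rng.normal(scale=0.6, size=15)
weights = 1.0 / se ** 2                      # inverse-variance weights
print(weighted_corr(surrogate, true_ep, weights), bootstrap_ci(surrogate, true_ep, weights))
```

With equal weights, `weighted_corr` reduces to the ordinary Pearson correlation. Rescaling all weights by a constant leaves it unchanged.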
METHODS: Treatment effects on a hypothetical endpoint and its surrogate were repeatedly simulated from a hierarchical bivariate normal model for experiments with different numbers of studies (10-80) and within- and between-study correlation parameters (0.0-0.9). In each iteration, 95% CIs around the inverse-variance weighted correlation were computed using bootstrapping and three different choices of sample size in the SE formula: the number of studies [the naïve method] and two distinct definitions of effective sample size (ESS) [Kish’s and Hill’s]. For each method, the coverage rate (the fraction of replications in which the 95% CI captured the true correlation) was compared with the nominal 95% rate.
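The abstract does not state the SE formula or the ESS definitions. The sketch below therefore rests on labelled assumptions. The CI uses Fisher's z-transform with SE = 1/sqrt(n - 3), where n is the chosen sample size. Kish's ESS is (Σw)²/Σw². "Hill's ESS" is assumed to be the Hill number of order 1, that is, the exponential of the Shannon entropy of the normalized weights, which always lies between Kish's ESS and the number of studies. The simulation is a hypothetical version of a hierarchical bivariate normal model. It assumes unit between-study variances, within-study SEs drawn uniformly from 0.1 to 0.5 and shared by both endpoints, inverse-variance weights, and the between-study correlation as the "true" correlation. Bootstrapping is omitted to keep the sketch short.

```python
import math
from statistics import NormalDist

import numpy as np


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    w = np.asarray(w, dtype=float) / np.sum(w)
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    return cov / math.sqrt(np.sum(w * (x - mx) ** 2) * np.sum(w * (y - my) ** 2))


def ess_naive(w):
    return float(len(w))                       # number of studies


def ess_kish(w):
    w = np.asarray(w, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))  # Kish: (sum w)^2 / sum w^2


def ess_hill(w):
    p = np.asarray(w, dtype=float)
    p = p[p > 0] / p.sum()
    return float(np.exp(-np.sum(p * np.log(p))))  # assumed: Hill number of order 1


def fisher_ci(r, n, level=0.95):
    """CI for a correlation via Fisher's z-transform with SE = 1/sqrt(n - 3)."""
    if n <= 3:
        return -1.0, 1.0                       # SE undefined; uninformative interval
    z = math.atanh(max(min(r, 0.999999), -0.999999))
    half = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


def simulate_coverage(k=20, rho_between=0.7, rho_within=0.3, n_rep=500, seed=1):
    """Monte Carlo coverage of the Fisher-z CI under each sample-size choice."""
    rng = np.random.default_rng(seed)
    methods = {"naive": ess_naive, "kish": ess_kish, "hill": ess_hill}
    hits = dict.fromkeys(methods, 0)
    cov_between = np.array([[1.0, rho_between], [rho_between, 1.0]])
    corr_within = np.array([[1.0, rho_within], [rho_within, 1.0]])
    for _ in range(n_rep):
        # True study-level effects (surrogate, final endpoint)
        theta = rng.multivariate_normal([0.0, 0.0], cov_between, size=k)
        # Observed estimates = true effects + correlated within-study noise
        se = rng.uniform(0.1, 0.5, size=k)
        est = theta + se[:, None] * rng.multivariate_normal([0.0, 0.0], corr_within, size=k)
        w = 1.0 / se ** 2                      # inverse-variance weights
        r = weighted_corr(est[:, 0], est[:, 1], w)
        for name, ess in methods.items():
            lo, hi = fisher_ci(r, ess(w))
            hits[name] += lo <= rho_between <= hi
    return {name: h / n_rep for name, h in hits.items()}


print(simulate_coverage(k=20, rho_between=0.7, rho_within=0.3))
```

All three intervals in this sketch share the same centre, and a smaller n widens the interval. Because naïve ≥ Hill ≥ Kish for any weights, the Kish interval always contains the Hill interval, which contains the naïve one. Coverage is therefore ordered Kish ≥ Hill ≥ naïve in every run.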
RESULTS: Across all experiments, the median coverage rate using Hill’s ESS was 95.0% (best method [coverage rate closest to 95%] in 78.1% of experiments), using Kish’s ESS was 97.0% (best method in 17.2% of experiments), using the naïve method was 92.4% (best method in 4.7% of experiments), and using bootstrapping was 91.9% (never the best method). Applied to the evidence base of a published trial-level surrogacy analysis between progression-free survival and overall survival in lung cancer (Ostoros et al., 2023), Hill’s ESS yielded a 95% CI of 0.268-0.983, compared to the reported bootstrapped 95% CI of 0.809-0.992.
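The two summaries reported above are the median coverage per method across experiments and the share of experiments in which each method's coverage was closest to 95%. They can be sketched as follows. The function name, the input layout, and the tie-breaking rule (first method listed wins) are assumptions, and the coverage values in the usage example are hypothetical, not the study's results.

```python
from collections import Counter
from statistics import median


def summarize(experiments, target=0.95):
    """experiments: list of {method_name: coverage_rate} dicts, one per experiment.

    Returns (median coverage per method, share of experiments each method is "best").
    """
    methods = list(experiments[0])
    med = {m: median(exp[m] for exp in experiments) for m in methods}
    winners = Counter()
    for exp in experiments:
        # "Best" = coverage closest to the target; ties go to the first method listed
        best = min(methods, key=lambda m: abs(exp[m] - target))
        winners[best] += 1
    shares = {m: winners[m] / len(experiments) for m in methods}
    return med, shares


# Hypothetical coverage rates for four experiments
experiments = [
    {"hill": 0.951, "kish": 0.968, "naive": 0.925, "bootstrap": 0.918},
    {"hill": 0.944, "kish": 0.955, "naive": 0.930, "bootstrap": 0.915},
    {"hill": 0.962, "kish": 0.981, "naive": 0.947, "bootstrap": 0.940},
    {"hill": 0.949, "kish": 0.972, "naive": 0.921, "bootstrap": 0.910},
]
print(summarize(experiments))
```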
CONCLUSIONS: Bootstrapping may produce excessively narrow CIs around the weighted correlation. Using ESS for sample size in the SE calculation may produce more accurate CIs.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR84
Topic
Methodological & Statistical Research
Disease
Oncology