VIRTUAL POOLING IS ACCURATE AND LIGHTWEIGHT FOR MULTI-INSTITUTION CAUSAL INFERENCE WITHOUT CENTRALIZED DATA SHARING
Author(s)
Ishtiyaque Ahmad, PhD (1); Aryan Ayati, MD, MPH (2); Kunlong Liu, PhD (1); Stella Ko, PharmD, MS (3); Nicole Bonine, PhD, MPH (3); David Tabano, MA, PhD (3); Nina Malik, PharmD (3); Tianchu Lyu, MPH, PhD, MBBS (4); Kai Zheng, PhD (4); Vivek Rudrapatna, MD, PhD (2); Trinabh Gupta, PhD (1)
(1) DataUnite, Cupertino, CA, USA; (2) UCSF, San Francisco, CA, USA; (3) Genentech, South San Francisco, CA, USA; (4) UC Irvine, Irvine, CA, USA
OBJECTIVES: Multicenter retrospective studies often require pooling of patient-level data, creating significant regulatory and operational barriers. Federated analytics offers a privacy-preserving alternative, but evidence of real-world deployability and fidelity for causal inference remains limited. We evaluated whether virtual pooling (VP), a novel federated analysis system, can be deployed across institutions to reproduce a published causal inference study without centralizing data.
METHODS: We extended VP to support core components of EHR-based retrospective clinical studies, including data harmonization, feature engineering, imputation, propensity score estimation, patient matching, and model estimation. Using this enhanced system, we replicated a recently published study on diabetic eye disease screening practices at UCSF and UC Irvine (UCI) (N=8,240). Descriptive statistics and causal estimates generated via VP were compared with those from the original centralized analysis.
RESULTS: VP was deployed at UCSF and UC Irvine (UCI) without infrastructure modifications and without new or non-standard governance agreements; site-specific security approvals were completed within 32 days. Descriptive statistics across all 30 baseline covariates were numerically identical between VP and the original study. Univariate analyses likewise reproduced the original effect sizes across all covariates; in both VP and the original study, prior eye clinic referral within the past year (OR = 56.7; 95% CI: 42.1-76.4) and history of eye disease (OR = 6.4; 95% CI: 5.6-7.4) were the strongest predictors. Causal inference analyses estimating the effect of an automated referral system on screening adherence also matched, with screening rates increasing from 21% to 36% at UCSF and from 13% to 34% at UCI.
CONCLUSIONS: VP is an accurate, feasible, and secure platform for multicenter clinical research that does not require patient-level data sharing. Its successful deployment and our findings support its practical potential to expand real-world evidence generation to diverse healthcare systems where data sharing is time-consuming, administratively burdensome, or restricted.
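ILLUSTRATIVE SKETCH: The abstract does not describe VP's internal protocol, so the following is only a minimal Python sketch of one generic way a federated system can reproduce pooled descriptive statistics without moving patient-level rows. Each site shares only a count, a sum, and a sum of squares, and the coordinator combines them. The names `SiteSummary`, `summarize`, and `pooled_mean_sd` are hypothetical and are not part of VP; any values passed in would be toy numbers, not study data.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate-only summary of one covariate at one site (hypothetical type)."""
    n: int           # number of patients at the site
    total: float     # sum of the covariate values
    total_sq: float  # sum of the squared covariate values


def summarize(values: Sequence[float]) -> SiteSummary:
    """Runs locally at a site; only the three aggregates leave the site."""
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pooled_mean_sd(summaries: Iterable[SiteSummary]) -> Tuple[float, float]:
    """Combine site summaries into the pooled mean and sample SD (n - 1 denominator)."""
    summaries = list(summaries)
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations across all sites")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from sufficient statistics; clamp tiny negative rounding error.
    variance = max((total_sq - n * mean * mean) / (n - 1), 0.0)
    return mean, sqrt(variance)
```

Because the count, sum, and sum of squares are sufficient statistics for the mean and standard deviation, the combined result agrees with a centralized calculation up to floating-point rounding. Model-based steps such as propensity score estimation and matching need more involved protocols that this sketch does not cover.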
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
P22
Topic
Real World Data & Information Systems
Topic Subcategory
Data Protection, Integrity, & Quality Assurance, Distributed Data & Research Networks
Disease
SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity), SDC: Sensory System Disorders (Ear, Eye, Dental, Skin)