VIRTUAL POOLING IS ACCURATE AND LIGHTWEIGHT FOR MULTI-INSTITUTION CAUSAL INFERENCE WITHOUT CENTRALIZED DATA SHARING

Author(s)

Ishtiyaque Ahmad, PhD¹, Aryan Ayati, MD, MPH², Kunlong Liu, PhD¹, Stella Ko, PharmD, MS³, Nicole Bonine, PhD, MPH³, David Tabano, MA, PhD³, Nina Malik, PharmD³, Tianchu Lyu, MPH, PhD, MBBS⁴, Kai Zheng, PhD⁴, Vivek Rudrapatna, MD, PhD², Trinabh Gupta, PhD¹;
¹DataUnite, Cupertino, CA, USA, ²UCSF, San Francisco, CA, USA, ³Genentech, South San Francisco, CA, USA, ⁴UC Irvine, Irvine, CA, USA

OBJECTIVES: Multicenter retrospective studies often require pooling of patient-level data, creating significant regulatory and operational barriers. Federated analytics offers a privacy-preserving alternative, but evidence of real-world deployability and fidelity for causal inference remains limited. We evaluated whether virtual pooling (VP), a novel federated analysis system, can be deployed across institutions to reproduce a published causal inference study without centralizing data.
METHODS: We extended VP to support core components of EHR-based retrospective clinical studies, including data harmonization, feature engineering, imputation, propensity score estimation, patient matching, and model estimation. Using this enhanced system, we replicated a recently published study on diabetic eye disease screening practices at UCSF and UC Irvine (N=8,240). Descriptive statistics and causal estimates generated via VP were compared with those from the original centralized analysis.
RESULTS: VP was deployed at UCSF and UC Irvine (UCI) without infrastructure modifications and without new or non-standard governance agreements; site-specific security approvals were completed within 32 days. Descriptive statistics across all 30 baseline covariates were numerically identical between VP and the original study. Univariate analyses likewise reproduced the original effect sizes across all covariates; in both VP and the original study, prior eye clinic referral within the past year (OR = 56.7; 95% CI: 42.1-76.4) and history of eye disease (OR = 6.4; 95% CI: 5.6-7.4) were the strongest predictors. Causal inference analyses estimating the effect of an automated referral system on screening adherence also matched, with screening rates increasing from 21% to 36% at UCSF and from 13% to 34% at UCI.
CONCLUSIONS: VP is an accurate, feasible, and secure platform for multicenter clinical research that does not require patient-level data sharing. VP's successful deployment and our findings validate its practical potential to expand real-world evidence generation to diverse healthcare systems where data sharing is time-consuming, administratively burdensome, or restricted.
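
ILLUSTRATIVE SKETCH: The abstract does not describe VP's internal protocol, so the following is not VP's implementation. It is a minimal Python sketch, under stated assumptions, of why federated descriptive statistics and univariate odds ratios can match a centralized analysis: each site releases only aggregates (counts, sums, sums of squares, 2x2 cell counts), and the coordinator combines them. The names `SiteSummary`, `summarize`, `pooled_mean_var`, `pool_tables`, and `odds_ratio_wald` are hypothetical, the Wald interval is an assumed choice of confidence interval, and all numbers are toy values rather than study data.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregates a site could release instead of patient rows (hypothetical)."""
    n: int           # number of patients at the site
    total: float     # sum of the covariate
    total_sq: float  # sum of squared covariate values


def summarize(values):
    """Runs locally at a site; only the returned aggregates leave the site."""
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pooled_mean_var(summaries):
    """Pooled mean and sample variance computed from site aggregates only."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    var = (total_sq - n * mean * mean) / (n - 1)
    return mean, var


def pool_tables(tables):
    """Add per-site 2x2 cell counts (a, b, c, d) cell by cell."""
    return tuple(sum(cells) for cells in zip(*tables))


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% CI.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Toy data standing in for one covariate at two sites (not study data).
site_a = [50.0, 60.0, 70.0]
site_b = [40.0, 80.0]
mean, var = pooled_mean_var([summarize(site_a), summarize(site_b)])

# Toy 2x2 tables per site, pooled before estimating the odds ratio.
a, b, c, d = pool_tables([(12, 6, 3, 15), (8, 4, 2, 10)])
odds_ratio, lower, upper = odds_ratio_wald(a, b, c, d)
```

Because the pooled mean, variance, and cell counts are algebraically the same quantities a centralized analysis would compute, agreement up to floating-point rounding is expected for these statistics. The sum-of-squares variance formula can lose precision when the mean is large relative to the spread, so a production system would likely use a more stable merge. Steps such as imputation, propensity score estimation, and matching need iterative or more involved protocols that this sketch does not cover.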

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

P22

Topic

Real World Data & Information Systems

Topic Subcategory

Data Protection, Integrity, & Quality Assurance, Distributed Data & Research Networks

Disease

SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity), SDC: Sensory System Disorders (Ear, Eye, Dental, Skin)

Presentation (CTI)