DARAH Project: Deploying a DataSHIELD Federated Network in Real-World Healthcare Environments
Author(s)
Camille Bachot, MSc1, François Margraff, Master2, Thierry Chanet, MSc3, Maëlle Baillet, MSc3, Olivier Girardot, MSc3.
1Medical Data Platform Specialist, ROCHE, Boulogne Billancourt Cedex, France, 2Roche, Boulogne Billancourt Cedex, France, 3Arkhn, Paris, France.
OBJECTIVES: Federated analytics allows scientists to perform statistical analyses without direct access to the raw data held at each site. Despite some pilots and proofs of concept, federated analytics is still not widely used on real-world data and, to our knowledge, no real-world study has yet combined it with other privacy-enhancing techniques such as differential privacy. The first objective of this study was to deploy a federated network of hospitals in a real-world setting. The oncology study used for this deployment compared the healthcare management of patients with metastatic non-small cell lung cancer before and during/after the first wave of COVID-19. The second objective was to test differential privacy in this real-world scenario to assess its practicality and utility as a privacy-enhancing technology.
METHODS: A federated architecture platform was set up in 3 French hospitals. After harmonization of the data in each center, statistical analyses were performed using DataSHIELD, a federated analysis R library, and a new open-source differential privacy package for DataSHIELD, dsPrivacy, was implemented. 149 patients were enrolled and 7 variables were collected from the chemotherapy management software CHIMIO.
RESULTS: DataSHIELD proved to be a practical tool for conducting the study efficiently across all 3 centers without exposing data on a central node, once a secure network between the hospitals had been configured. All planned aggregated results were successfully generated. We also observed that differential privacy can be implemented in practice with promising trade-offs between privacy and accuracy, and the dsPrivacy library we built can be reused in future work.
CONCLUSIONS: The DARAH project illustrates that federated analytics is a workable way to conduct real-world data projects while improving data privacy, by keeping patient data stored in the hospitals and leveraging their existing data architecture. It also highlights some key challenges, such as specific data preparation and data structuring.
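ILLUSTRATION: The combination of federated aggregation and differential privacy described in METHODS can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the dsPrivacy implementation or the DataSHIELD API: the function names, the clipping bounds, the equal split of the privacy budget, and the site values are all hypothetical and synthetic. Each site clips its values, adds Laplace noise to its local sum and count, and shares only those two noisy aggregates.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def local_dp_summary(values, lower, upper, epsilon, rng):
    """Runs at one site (hypothetical helper): returns a noisy (sum, count).

    Raw patient-level values never leave this function.
    """
    # Clip each value so one patient's contribution to the sum is bounded.
    clipped = [min(max(v, lower), upper) for v in values]
    # Assumption: the privacy budget is split equally between sum and count.
    eps_each = epsilon / 2.0
    # Sensitivities when one patient is added or removed.
    sum_sensitivity = max(abs(lower), abs(upper))
    count_sensitivity = 1.0
    noisy_sum = sum(clipped) + laplace_noise(sum_sensitivity / eps_each, rng)
    noisy_count = len(clipped) + laplace_noise(count_sensitivity / eps_each, rng)
    return noisy_sum, noisy_count


def federated_mean(summaries):
    """Runs at the analyst side: combines per-site (sum, count) aggregates."""
    total_sum = sum(s for s, _ in summaries)
    total_count = sum(n for _, n in summaries)
    return total_sum / total_count


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic values for three hypothetical sites (not study data).
    sites = [[10, 20, 30], [40, 50], [60]]
    summaries = [
        local_dp_summary(v, lower=0, upper=100, epsilon=1.0, rng=rng)
        for v in sites
    ]
    print("Noisy federated mean:", federated_mean(summaries))
```

Smaller values of `epsilon` give stronger privacy but noisier estimates, which is the privacy/accuracy trade-off referred to in RESULTS; with very few patients per site, as in the synthetic example, the noise can dominate the estimate.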
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
RWD47
Topic
Real World Data & Information Systems
Topic Subcategory
Distributed Data & Research Networks
Disease
Oncology