Does Federated Analytics Preserve Statistical and Scientific Value of Real-World Data?

Speaker(s)

Pau D1, Diabate DI2, Kaczmarek L3, Chmiel J4, Jegou R5, Boucher M5, Monteil C6, Bachot C7
1Roche, Boulogne-Billancourt, France, 2Roche, Montpellier, France, 3F. Hoffmann-La Roche AG, Kaiseraugst, Switzerland, 4Avenga, Krakow, Poland, 5Keyrus Life Science, Nantes, France, 6Roche, Boulogne Billancourt, France, 7Roche, Boulogne Billancourt Cedex, 92, France


OBJECTIVES:

Medical service providers are not allowed to access each other's data, or to copy or store it in a common data infrastructure. Yet the combined data are a very valuable asset that may benefit all partners involved. Federated Analytics (FA) makes it possible to generate aggregated results from separate data sources without transferring data, while respecting the privacy and ownership of each data source.
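To make the FA principle concrete, here is a minimal Python sketch, assuming the simplest possible statistic (a mean). Each data source returns only a count and a sum, and the analyst combines these aggregates. The names `NodeSummary`, `node_summary` and `federated_mean` are hypothetical and are not part of DataSHIELD; the values are toy numbers, not KADOR data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NodeSummary:
    """Aggregate returned by a node: no individual-level values."""
    n: int
    total: float


def node_summary(values: Sequence[float]) -> NodeSummary:
    # Runs inside the node; only the count and the sum leave it.
    return NodeSummary(n=len(values), total=float(sum(values)))


def federated_mean(summaries: Sequence[NodeSummary]) -> float:
    # Runs on the analyst side; combines aggregates only.
    n = sum(s.n for s in summaries)
    return sum(s.total for s in summaries) / n


# Hypothetical toy values (not KADOR data), one list per data source.
node_a = [40.0, 55.0, 61.0]
node_b = [47.0, 52.0]
node_c = [38.0, 70.0, 66.0, 44.0]

fed = federated_mean([node_summary(v) for v in (node_a, node_b, node_c)])
pooled = node_a + node_b + node_c
raw = sum(pooled) / len(pooled)
print(fed, raw)
```

For a mean, the count and sum are sufficient statistics, so the federated result matches the result on pooled data. Not every analysis has this property, which is what the project sets out to check.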

The objective of this project is to evaluate whether FA methods preserve the statistical and scientific value of the data.

METHODS:

This research uses data from KADOR, a longitudinal real-world study in early breast cancer. The study data were randomly split into 3 datasets (nodes) stored in a data platform that integrates DataSHIELD (open-source data processing software). No individual-level data leave the nodes; each node can return only analysis results (aggregates).
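The node setup can be sketched in Python, assuming a random split into the three node sizes reported for KADOR and a node object that answers only aggregate queries. Everything here is hypothetical: the class `Node`, the helper `random_split`, and the disclosure threshold `MIN_CELL_COUNT` are illustrative names, not DataSHIELD's API or its actual disclosure-control settings. The patient records are synthetic.

```python
import random
from typing import Dict, List, Sequence, Tuple

MIN_CELL_COUNT = 5  # hypothetical disclosure threshold, not a DataSHIELD setting


class Node:
    """Holds individual-level rows but exposes only aggregate queries."""

    def __init__(self, name: str, rows: Sequence[Dict[str, float]]):
        self.name = name
        self._rows = list(rows)  # private: never returned to the analyst

    def count_and_sum(self, variable: str) -> Tuple[int, float]:
        values = [row[variable] for row in self._rows]
        if len(values) < MIN_CELL_COUNT:
            # Refuse to answer when the aggregate could reveal individuals.
            raise PermissionError(f"{self.name}: too few records to disclose")
        return len(values), float(sum(values))


def random_split(rows: Sequence[Dict[str, float]], sizes: Sequence[int],
                 seed: int = 0) -> List[Node]:
    """Shuffle rows once, then cut them into consecutive blocks of `sizes`."""
    if sum(sizes) != len(rows):
        raise ValueError("sizes must add up to the number of rows")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    nodes, start = [], 0
    for i, size in enumerate(sizes, start=1):
        nodes.append(Node(f"node{i}", shuffled[start:start + size]))
        start += size
    return nodes


# Synthetic stand-in for 315 patients (random ages, not KADOR values).
rng = random.Random(42)
patients = [{"age": rng.uniform(25, 80)} for _ in range(315)]
nodes = random_split(patients, sizes=[157, 94, 64])
```

The analyst only ever calls `count_and_sum` on each node; the private `_rows` attribute stands in for data that, on a real platform, never leave the node.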

Results for each of the 3 nodes will be generated using DataSHIELD statistical functions, and results from the 3 nodes will be aggregated using DataSHIELD aggregation functions or specific statistical methods (e.g., meta-analysis).
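The two aggregation routes can differ, and a short Python sketch shows how. It assumes (a) exact pooling, in which each node returns its count, sum and sum of squares, and (b) a fixed-effect inverse-variance meta-analysis of node means. This is a generic illustration with hypothetical function names and toy data. It does not reproduce DataSHIELD's own aggregation functions.

```python
import math
import statistics
from typing import Sequence, Tuple


def node_moments(values: Sequence[float]) -> Tuple[int, float, float]:
    # Computed inside each node: count, sum and sum of squares.
    return len(values), math.fsum(values), math.fsum(v * v for v in values)


def pooled_mean_sd(moments: Sequence[Tuple[int, float, float]]) -> Tuple[float, float]:
    # Exact pooling: rebuild the overall mean and sample SD from node moments.
    n = sum(m[0] for m in moments)
    s1 = math.fsum(m[1] for m in moments)
    s2 = math.fsum(m[2] for m in moments)
    variance = (s2 - s1 * s1 / n) / (n - 1)
    return s1 / n, math.sqrt(variance)


def node_mean_se(values: Sequence[float]) -> Tuple[float, float]:
    # Alternative node output for meta-analysis: mean and its standard error.
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))


def fixed_effect_meta(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    # Inverse-variance weighted average of node-level estimates.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = math.fsum(weights)
    pooled = math.fsum(w * est for w, (est, _) in zip(weights, estimates)) / total_w
    return pooled, 1.0 / math.sqrt(total_w)


# Hypothetical toy values (not KADOR data), one list per node.
node_data = [[40.0, 55.0, 61.0, 49.0], [47.0, 52.0, 58.0], [38.0, 70.0, 66.0, 44.0, 51.0]]
all_values = [v for node in node_data for v in node]

mean_fa, sd_fa = pooled_mean_sd([node_moments(v) for v in node_data])
mean_meta, se_meta = fixed_effect_meta([node_mean_se(v) for v in node_data])
```

Exact pooling returns the same mean and SD as the raw data. The meta-analytic estimate weights nodes by precision, so it need not equal the raw mean. A discrepancy of this kind is what comparing raw and aggregated results can reveal.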

Descriptive statistics, correlation matrices, regression models and survival analyses will first be performed on the raw KADOR study data; the same analyses will then be reproduced on each node and aggregated whenever possible. Results from the raw and aggregated data will be compared.
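For regression, the raw-versus-aggregated comparison can be illustrated in Python with ordinary least squares, assuming each node returns only the matrices X'X and X'y. Their sums across nodes are enough to recover the pooled coefficients exactly. The function names are hypothetical and the data are synthetic. Other models, such as logistic or Cox regression, generally need an iterative exchange between analyst and nodes, or a meta-analytic approach, so this exact equivalence should not be assumed for them.

```python
import numpy as np


def node_gram(X: np.ndarray, y: np.ndarray):
    # Computed inside each node: X'X (p x p) and X'y (p,), no patient rows.
    return X.T @ X, X.T @ y


def federated_ols(grams):
    # Sum node contributions, then solve the normal equations once.
    xtx = sum(g[0] for g in grams)
    xty = sum(g[1] for g in grams)
    return np.linalg.solve(xtx, xty)


rng = np.random.default_rng(0)
sizes = [157, 94, 64]
n = sum(sizes)
# Synthetic design: intercept + one covariate; not KADOR variables.
X = np.column_stack([np.ones(n), rng.normal(52.0, 12.0, n)])
y = X @ np.array([1.0, 0.05]) + rng.normal(0.0, 1.0, n)

# Cut the rows into three consecutive blocks, one per node.
bounds = np.cumsum([0] + sizes)
grams = [node_gram(X[a:b], y[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]

beta_fa = federated_ols(grams)
beta_raw, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here `beta_fa` and `beta_raw` agree up to floating-point error, so this analysis loses no statistical value when run in a federated way.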

RESULTS:

A total of 315 patients were included in the KADOR study. Mean age was 52.18 years (SD 12.64), 92.8% of patients had invasive ductal carcinoma, and 95.7% had SBR grade II or III tumors. A pathological complete response (pCR) at surgery was observed in 151 patients (47.9%).

The KADOR study population was split into 3 samples (N1 = 157, N2 = 94 and N3 = 64 patients). Analyses reproducing these results on each node using DataSHIELD are ongoing, and results will be presented at the conference.

CONCLUSIONS:

This project will evaluate whether the scientific value and statistical results of the data are maintained with FA.

Code

MSR119

Topic

Real World Data & Information Systems

Topic Subcategory

Distributed Data & Research Networks

Disease

No Additional Disease & Conditions/Specialized Treatment Areas