On the Validity of Statistical Analyses with Privacy-Preserving Synthetic Data

September 29, 2022


Open to all ISPOR Members and Non-members


Thursday, September 29, 2022
11:00AM EDT | 3:00PM UTC | 5:00PM CEST




Synthetic data generation is a contemporary method for preserving patient privacy in real-world health data, which reduces friction when sharing this data internally or externally for secondary analysis. Synthetic data is generated by training a machine learning model (called a generative model) to learn the patterns in an original dataset. That model then generates a new dataset that has the same structure as the original and behaves like it in analyses, with the intention of preserving the original's statistical properties. Different machine learning and deep learning techniques can be used to train such a generative model.
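As a rough illustration of the fit-then-generate idea described above, the following Python sketch fits a very simple parametric generative model to a toy two-column dataset and then samples new records from it. Everything here is an assumption made for illustration: the column meanings (`age`, `biomarker`), the simulated "original" data, and the function names `fit_sequential_model` and `generate_synthetic` are hypothetical, and the approach shown (a normal marginal followed by a linear regression) is far simpler than the machine learning and deep learning generators the webinar refers to.

```python
import random
import statistics


def fit_sequential_model(rows):
    """Fit x ~ Normal(mu_x, sd_x), then y | x ~ Normal(intercept + slope * x, resid_sd)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mu_x, sd_x = statistics.mean(xs), statistics.stdev(xs)
    mu_y = statistics.mean(ys)
    # Ordinary least squares for y regressed on x.
    sxx = sum((x - mu_x) ** 2 for x in xs)
    sxy = sum((x - mu_x) * (y - mu_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mu_y - slope * mu_x
    residuals = [y - (intercept + slope * x) for x, y in rows]
    resid_sd = (sum(r * r for r in residuals) / (len(rows) - 2)) ** 0.5
    return {
        "mu_x": mu_x,
        "sd_x": sd_x,
        "intercept": intercept,
        "slope": slope,
        "resid_sd": resid_sd,
    }


def generate_synthetic(model, n, rng):
    """Sample n new records from the fitted model; no original record is copied."""
    records = []
    for _ in range(n):
        x = rng.gauss(model["mu_x"], model["sd_x"])
        y = rng.gauss(model["intercept"] + model["slope"] * x, model["resid_sd"])
        records.append((x, y))
    return records


# Toy "original" dataset, simulated purely for illustration (not real patient data).
rng = random.Random(0)
original = []
for _ in range(500):
    age = rng.gauss(60, 10)
    biomarker = 2.0 + 0.05 * age + rng.gauss(0, 1)
    original.append((age, biomarker))

model = fit_sequential_model(original)
synthetic = generate_synthetic(model, len(original), rng)
```

The synthetic records are drawn from the fitted model rather than copied from the original rows, which is the basic reason synthetic data can lower disclosure risk. Whether a given generator preserves enough statistical structure, and how much privacy it actually provides, has to be evaluated separately.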

A key question about synthetic data is whether analyses performed on it yield valid statistical inferences. In this webinar we will present a brief tutorial on synthetic data generation, an overview of its privacy-preserving properties and its advantages over traditional de-identification methods, and then review the results of a simulation study on the validity of inference with synthetic oncology datasets. Using multiple imputation principles, we show that logistic regression parameter estimates on synthetic data have low bias, close to nominal coverage and power, and precision comparable to estimates from the original data. These results contribute to the growing evidence that inferences from synthetic datasets are valid. The appropriate parameterizations, strengths, and limitations of the approach will be discussed.
All materials will be presented with relevant illustrative examples.
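To make "multiple imputation principles" more concrete, the sketch below pools one regression coefficient estimated separately on m synthetic datasets. The description above does not state which combining rules the presenters use, so this is an assumption: it applies the rule commonly cited for partially synthetic data, in which the pooled estimate is the mean of the per-dataset estimates and the total variance is the average within-dataset variance plus the between-dataset variance divided by m. The function name `combine_synthetic_estimates` and the input numbers are hypothetical, and the interval uses a normal approximation rather than a t reference distribution.

```python
import math
import statistics


def combine_synthetic_estimates(estimates, variances):
    """Pool one coefficient across m synthetic datasets.

    estimates: per-dataset point estimates (e.g., a log odds ratio).
    variances: per-dataset squared standard errors for that coefficient.
    Assumed rule: total variance = mean within-variance + between-variance / m.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two synthetic datasets")
    q_bar = statistics.mean(estimates)   # pooled point estimate
    u_bar = statistics.mean(variances)   # average within-dataset variance
    b = statistics.variance(estimates)   # between-dataset variance (divisor m - 1)
    total_var = u_bar + b / m
    se = math.sqrt(total_var)
    # Normal-approximation 95% interval (a simplification for illustration).
    ci = (q_bar - 1.96 * se, q_bar + 1.96 * se)
    return q_bar, se, ci


# Hypothetical coefficient estimates and variances from five synthetic datasets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.010, 0.010, 0.010, 0.010, 0.010]

pooled, pooled_se, pooled_ci = combine_synthetic_estimates(estimates, variances)
```

Other combining rules exist for other synthesis settings (for example, fully synthetic data), and they give different variance estimates, so the appropriate rule depends on how the synthetic datasets were generated.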

Learning Objectives

  • Be able to describe basic techniques for synthetic data generation
  • Learn how to evaluate the privacy risks and utility of synthetic data
  • Understand how to perform statistically valid population inferences from synthetic data


Khaled El Emam, PhD, SVP and General Manager, Replica Analytics & Professor, School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada

Lucy Mosquera, MS, Director of Data Science, Replica Analytics, Ottawa, ON, Canada

Sponsored by: Replica Analytics

Please note:
On the day of the scheduled webinar, the first 1000 registered participants will be accepted into the webinar. For those who are unable to attend or would like to review the webinar at a later date, the full-length recording will be made available on the ISPOR Educational Webinar Series webpage approximately 2 days after the scheduled webinar.

Reservations are on a first-come, first-served basis.
