Evaluating the Potential of Synthetic Patient Data Generation to Accelerate Real-World Evidence (RWE) Generation

Author(s)

Törnqvist M, Dry L, Pinon G, Movschin A
Quinten Health, Paris, France

OBJECTIVES: The growing use of machine learning methods in medical research requires large-scale, high-quality patient data. However, concerns regarding privacy, cost, and availability limit access to such data. Synthetic data that mimics real-world data (RWD) is emerging as a promising solution and is increasingly considered by the pharmaceutical industry. This approach enables the generation of customized synthetic patient datasets of various sizes, without some of the limitations of RWD such as missing values and class imbalance. Recently, deep learning methods such as Generative Adversarial Networks (GANs) have demonstrated remarkable performance in generating data that mimics reference RWD, particularly in the field of economics. This study evaluated GANs for synthesizing electronic health records (EHRs).

METHODS: MIMIC-III, a publicly accessible database of EHRs from intensive care, was used to train two GANs specifically designed for synthesizing tabular data: CTGAN and CTABGAN. CTGAN addresses class imbalance by incorporating conditional generation, while CTABGAN can model a mixture of continuous and categorical variables through innovative data encoding. The synthetic data were then evaluated for fidelity, privacy, and correlation with the original data using statistical measures and comparative visualizations of data distributions.
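The abstract does not name the statistical measures used, so the following is only a minimal sketch of measures commonly applied to this kind of evaluation, not the authors' method. It assumes a Kolmogorov-Smirnov statistic for continuous variables and total variation distance for categorical variables (fidelity), Pearson correlation for comparing variable relationships, and distance to closest record as a simple privacy indicator. All function names and the toy values are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter


def ks_statistic(real, synth):
    """Fidelity, continuous variable: largest gap between the two empirical CDFs.
    0 means identical distributions, 1 means no overlap."""
    real, synth = sorted(real), sorted(synth)
    gap = 0.0
    for x in set(real) | set(synth):
        cdf_real = bisect_right(real, x) / len(real)
        cdf_synth = bisect_right(synth, x) / len(synth)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


def total_variation(real, synth):
    """Fidelity, categorical variable: half the L1 distance between
    the category frequencies of the two samples."""
    p, q = Counter(real), Counter(synth)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p[c] / len(real) - q[c] / len(synth)) for c in categories)


def pearson(x, y):
    """Pearson correlation between two numeric columns of equal length.
    Comparing this value on real and synthetic data shows whether
    pairwise relationships are preserved."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def distance_to_closest_record(real_rows, synth_rows):
    """Privacy indicator: for each synthetic row, the Euclidean distance to
    its nearest real row. A distance of 0 flags an exact copy of a patient."""
    return [min(math.dist(s, r) for r in real_rows) for s in synth_rows]


if __name__ == "__main__":
    # Hypothetical toy values (age, heart rate), not taken from MIMIC-III.
    real_rows = [(65.0, 80.0), (72.0, 95.0), (58.0, 70.0), (80.0, 110.0)]
    synth_rows = [(66.0, 82.0), (70.0, 90.0), (60.0, 75.0), (79.0, 105.0)]

    real_age = [row[0] for row in real_rows]
    synth_age = [row[0] for row in synth_rows]
    print("KS statistic (age):", ks_statistic(real_age, synth_age))

    real_hr = [row[1] for row in real_rows]
    synth_hr = [row[1] for row in synth_rows]
    print("Correlation gap:",
          abs(pearson(real_age, real_hr) - pearson(synth_age, synth_hr)))

    print("Distances to closest record:",
          distance_to_closest_record(real_rows, synth_rows))
```

In practice these measures would be computed per column on standardized data, since unscaled Euclidean distances are dominated by variables with large ranges.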

RESULTS: This study highlighted the potential of GAN-based deep learning approaches for generating synthetic patient data. The evaluation of two GANs on MIMIC-III demonstrated their ability to produce realistic synthetic health data while preserving privacy. However, GANs require large datasets and significant computational resources, and they can be difficult to train to convergence.

CONCLUSIONS: There is a need for consensus on the evaluation of synthetic data among researchers, regulators, and the pharmaceutical industry. The level and quantity of evidence required to consider synthetic data reliable and validated for practical use depend on the judgment criteria and the intended use. For instance, data augmentation for model improvement and regulatory-grade synthetic control arms may have different validation requirements.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR138

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Injury & Trauma, No Additional Disease & Conditions/Specialized Treatment Areas
