Augmenting External Control Arms Using Synthetic Data Generation

Author(s)

O Meachair S1, Malagarriga D1, Mosquera L2
1Aetion, Barcelona, Catalunya, Spain, 2Aetion, Ottawa, ON, Canada

OBJECTIVES: To determine whether Synthetic Data Generation (SDG) methods can improve parameter estimation in rare disease studies where limited or no control data are available from clinical trials, and where external control arms contain few patients who meet the inclusion/exclusion criteria needed to estimate treatment effectiveness.

METHODS: We compare two SDG methods for tabular data, Sequential Decision Trees and Bayesian Networks, with two standard baseline approaches: propensity score weighting of external control data, and bootstrap sampling of available data. The methods are evaluated on two datasets: a simulated dataset in which all true parameters are known, and a subset of diabetes patients from the Marketscan dataset. Each dataset is split into two cohorts: a ‘clinical trial’ cohort in which treatment is randomly assigned across the treatment and control arms, and an ‘external’ arm that is a biased sample from the overall population. The available trial control data and external data are pooled as input to the SDG methods and are then augmented with synthetic data. All methods are compared in terms of the bias and variance of the estimated median Progression Free Survival (PFS) in the control data, relative to the known population PFS.
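
As an illustration of the evaluation metric only, the bootstrap baseline and the bias/variance comparison described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: survival times are assumed exponential with no censoring (so the true population median is known in closed form), and the function names, sample sizes, and rate are hypothetical.

```python
import numpy as np


def bootstrap_median(sample, n_boot, rng):
    """Return n_boot bootstrap replicates of the sample median."""
    sample = np.asarray(sample, dtype=float)
    # Resample row indices with replacement, one row per bootstrap replicate.
    idx = rng.integers(0, len(sample), size=(n_boot, len(sample)))
    return np.median(sample[idx], axis=1)


def bias_and_variance(estimates, true_value):
    """Bias (mean estimate minus truth) and sample variance of the estimates."""
    estimates = np.asarray(estimates, dtype=float)
    return estimates.mean() - true_value, estimates.var(ddof=1)


def run_simulation(n_control=30, n_trials=200, n_boot=500, rate=0.1, seed=0):
    """Repeat a small-sample study and score the bootstrap median estimator.

    Assumption: uncensored exponential PFS times with hazard `rate`,
    so the population median is ln(2) / rate.
    """
    rng = np.random.default_rng(seed)
    true_median = np.log(2) / rate
    estimates = []
    for _ in range(n_trials):
        # Hypothetical small control arm drawn from the population.
        control = rng.exponential(scale=1.0 / rate, size=n_control)
        # Point estimate: mean of the bootstrap medians.
        estimates.append(bootstrap_median(control, n_boot, rng).mean())
    bias, variance = bias_and_variance(estimates, true_median)
    return true_median, bias, variance


if __name__ == "__main__":
    true_median, bias, variance = run_simulation()
    print(f"true median PFS: {true_median:.2f}")
    print(f"bias: {bias:.3f}, variance: {variance:.3f}")
```

An SDG method or a propensity score weighted estimator could be scored with the same `bias_and_variance` function by substituting its median PFS estimates for the bootstrap ones. Real PFS data are usually censored, in which case a Kaplan-Meier median would replace the plain sample median used here.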

RESULTS: We show that under certain conditions SDG methods provide more accurate effect estimates than baseline methods. For the Marketscan data, Sequential Decision Trees provide less biased estimates than the other methods in the majority of data scenarios, while for the simulated data, Bayesian Networks perform better than or comparably to the baseline approaches.

CONCLUSIONS: SDG improved the estimation of population parameters in certain data scenarios. Further work is required to determine which SDG method is appropriate given the characteristics of each dataset, and how to assess the performance of SDG methods in real-world scenarios where the true population parameters are unknown.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR175

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
