Validating the Use of Large Language Models to Generate Individual Patient-Level Data for Use in Health Technology Assessment
Author(s)
Emily Foreman, MSc, Oliver Pople, BSc, MSc, Laura Sawyer, BA, MSc, Bryony Langford, MPhil, MSc.
Symmetron, London, United Kingdom.
OBJECTIVES: Individual Patient-Level Data (IPD) are a cornerstone of economic evaluations and statistical analysis. The potential of generative AI, including Large Language Models (LLMs), across Health Economics and Outcomes Research has been documented extensively; however, the feasibility of using LLMs to generate IPD remains unclear. This study aims to assess whether LLMs can emulate IPD from limited information, and whether such data could support analyses in health technology assessments.
METHODS: IPD were simulated using R, containing characteristics often observed in real data, such as variable correlation and skewness, and issues such as missingness and ceiling effects. Baseline characteristics were summarised and entered into an LLM (ChatGPT 4.0) to generate IPD, which were compared with the original dataset using the standardised mean difference (SMD) for continuous variables, the categorised SMD for dichotomous variables, and data visualisations. The LLM dataset was then re-generated after providing incrementally more information about the characteristics of the original dataset, and its suitability was re-assessed.
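A minimal R sketch of the kind of simulation described above is shown below; the variable names, sample size, and parameter values are illustrative assumptions rather than the study's actual specification.

# Minimal sketch (illustrative only): simulate an "original" IPD set with
# correlated, skewed variables, missingness, and a ceiling effect.
set.seed(42)
n <- 500

# Correlated latent normals via the Cholesky factor of an assumed correlation matrix
rho <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(rho)

age  <- 60 + 10 * z[, 1]                                            # roughly normal
bmi  <- exp(log(27) + 0.15 * z[, 2])                                # right-skewed (log-normal), correlated with age
eq5d <- pmin(0.7 + 0.2 * scale(-age)[, 1] + rnorm(n, 0, 0.15), 1)   # utility with ceiling effect at 1
male <- rbinom(n, 1, 0.55)                                          # dichotomous variable

# Introduce missingness in one variable
bmi[sample(n, size = 0.1 * n)] <- NA

ipd <- data.frame(age, bmi, eq5d, male)
summary(ipd)   # summary statistics of the kind entered into the LLM prompt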
RESULTS: The LLM could generate IPD consistent with the distribution of the baseline characteristics provided; however, it assumed independence between variables unless told otherwise. Once prompted, the LLM could account for dependence and other inherent data characteristics (skewness and correlation), which was validated via comparison of data visualisations. The ability to write such prompts relied heavily on knowledge of the data characteristics, which is unlikely to be available in practice without clinical input. In the absence of clinical input, LLMs can make plausible clinical assumptions about data characteristics on request (such as likely relationships between variables), but these often rely on a general understanding of data characteristics rather than a population-specific one.
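A minimal R sketch of the comparison described above, assuming the simulated dataset from the previous sketch; ipd_llm stands in for the dataset returned by the LLM (here a bootstrap placeholder so the code runs) and is an assumption, not the study's actual output.

# Placeholder standing in for the LLM-generated dataset (same columns as ipd)
ipd_llm <- ipd[sample(nrow(ipd), replace = TRUE), ]

# Standardised mean difference for a continuous variable
smd <- function(x, y) {
  (mean(x, na.rm = TRUE) - mean(y, na.rm = TRUE)) /
    sqrt((var(x, na.rm = TRUE) + var(y, na.rm = TRUE)) / 2)
}
smd(ipd$age, ipd_llm$age)

# Categorised SMD for a dichotomous variable, based on proportions
p1 <- mean(ipd$male); p2 <- mean(ipd_llm$male)
(p1 - p2) / sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)

# Visual check: overlay densities to assess skewness and overall shape
plot(density(ipd$bmi, na.rm = TRUE), main = "BMI: original vs LLM-generated")
lines(density(ipd_llm$bmi, na.rm = TRUE), lty = 2)

# Dependence: does the generated data preserve the correlation between variables?
cor(ipd$age, ipd$bmi, use = "complete.obs")
cor(ipd_llm$age, ipd_llm$bmi, use = "complete.obs")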
CONCLUSIONS: The study suggests LLMs can simulate IPD efficiently when given detailed prompts. However, inherent data characteristics make reproducing IPD challenging, an issue that remains whether the dataset is generated with an LLM or manually.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR218
Topic
Clinical Outcomes, Health Technology Assessment, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference
Disease
No Additional Disease & Conditions/Specialized Treatment Areas