Synthetic Patient Trajectories Using Generative AI on Electronic Health Records
Author(s)
Eric Q. Wu, PhD¹, Jimmy Royer, PhD², Max Leroux, MSc², Intekhab Hossain, PhD¹, Robert W. Platt, PhD³
¹Analysis Group, Inc., Boston, MA, USA; ²Analysis Group, Inc., Montréal, QC, Canada; ³McGill University, Montréal, QC, Canada
OBJECTIVES: To provide a comprehensive understanding of the long-term disease progression of patients with obesity and the impact of weight loss interventions using a generative AI (GenAI) disease model.
METHODS: Data were obtained from Dandelion’s multimodal EHR database, which includes high-fidelity, detailed clinical data on over 10 million lives from three major US health systems. A Conditional Restricted Boltzmann Machine (CRBM) with >11,000 parameters was trained on 205,517 patients and subsequently used to simulate the characteristics and long-term outcomes (including weight, comorbidities, and laboratory results) of over 10,000 patients. The model was statistically validated by comparing simulated and observed variable distributions. To study the long-term effect of weight loss, a synthetic obese cohort was then simulated with the same baseline characteristics as observed obese patients while implementing a 10% weight reduction over one year. The subsequent impact on cardiovascular outcomes was assessed as incidence rate ratios (IRRs) during years 5 and 10 after weight loss.
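To make the modelling step more concrete, the following is a minimal sketch of a conditional RBM that generates a patient trajectory one time step at a time: each new state v_t is drawn by block Gibbs sampling from an RBM whose biases depend on the previous state v_{t-1}, and parameters are fitted with one-step contrastive divergence (CD-1). It is not the authors' model. It assumes binary visible units only (the study's model also covers continuous variables such as weight and laboratory values, whose handling the abstract does not describe), conditioning on a single previous time step, and CD-1 training; the class and method names (`BinaryCRBM`, `cd1_step`, `simulate`) and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BinaryCRBM:
    """Hypothetical minimal conditional RBM with binary visible and hidden units.

    Energy, with history c = v_{t-1}:
        E(v, h | c) = -v^T W h - (b + A c)^T v - (d + B c)^T h
    """

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        scale = 0.01
        self.W = scale * self.rng.standard_normal((n_visible, n_hidden))
        self.A = scale * self.rng.standard_normal((n_visible, n_visible))  # history -> visible bias
        self.B = scale * self.rng.standard_normal((n_hidden, n_visible))   # history -> hidden bias
        self.b = np.zeros(n_visible)
        self.d = np.zeros(n_hidden)

    def _bernoulli(self, p):
        # Draw 0/1 samples with the given probabilities.
        return (self.rng.random(p.shape) < p).astype(float)

    def p_hidden(self, v, c):
        # p(h_j = 1 | v, c); works for single vectors or batches (rows).
        return sigmoid(v @ self.W + self.d + c @ self.B.T)

    def p_visible(self, h, c):
        # p(v_i = 1 | h, c)
        return sigmoid(h @ self.W.T + self.b + c @ self.A.T)

    def cd1_step(self, V, C, lr=0.05):
        """One CD-1 update on a batch of (v_t, v_{t-1}) pairs (rows of V and C)."""
        ph0 = self.p_hidden(V, C)
        h0 = self._bernoulli(ph0)
        v1 = self._bernoulli(self.p_visible(h0, C))
        ph1 = self.p_hidden(v1, C)
        n = V.shape[0]
        # Positive phase (data) minus negative phase (one-step reconstruction).
        self.W += lr * (V.T @ ph0 - v1.T @ ph1) / n
        self.b += lr * (V - v1).mean(axis=0)
        self.d += lr * (ph0 - ph1).mean(axis=0)
        self.A += lr * (V - v1).T @ C / n
        self.B += lr * (ph0 - ph1).T @ C / n

    def sample_next(self, c, n_gibbs=20):
        """Draw v_t given history c = v_{t-1} by block Gibbs sampling."""
        v = c.copy()  # start the chain at the previous state
        for _ in range(n_gibbs):
            h = self._bernoulli(self.p_hidden(v, c))
            v = self._bernoulli(self.p_visible(h, c))
        return v

    def simulate(self, v0, n_steps, n_gibbs=20):
        """Roll the model forward from v0 to produce a synthetic trajectory."""
        traj = [np.asarray(v0, dtype=float)]
        for _ in range(n_steps):
            traj.append(self.sample_next(traj[-1], n_gibbs))
        return np.stack(traj)


# Toy usage on hypothetical binary patient states (50 time steps, 6 variables).
rng = np.random.default_rng(1)
data = (rng.random((50, 6)) < 0.3).astype(float)
model = BinaryCRBM(n_visible=6, n_hidden=4)
for _ in range(100):
    model.cd1_step(data[1:], data[:-1])
traj = model.simulate(data[0], n_steps=10)
```

In a sketch like this, an intervention could in principle be imposed by overwriting the relevant visible units before they are used as history for the next step; the abstract does not state how the 10% weight reduction was implemented in the study.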
RESULTS: Model validation showed strong alignment between synthetic and observed distributions in terms of binary proportions (correlation=0.96), continuous means (0.99), variances (0.98), and covariance structures (0.89). Validity was further supported by binary classification tests, which were unable to distinguish synthetic from observed patients. Finally, the model estimated significant reductions in the incidence of cardiovascular events following weight loss. The IRR was 0.81 (95% confidence interval=0.72-0.91) for heart failure and 0.84 (0.74-0.95) for atrial fibrillation during year 5, and 0.79 (0.70-0.89) for heart failure and 0.83 (0.74-0.95) for atrial fibrillation during year 10.
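For readers less familiar with the effect measure, an IRR compares event rates per unit of person-time between two cohorts. The sketch below shows one common way to compute an IRR with a Wald 95% confidence interval on the log scale, assuming Poisson-distributed event counts so that SE(log IRR) = sqrt(1/a + 1/b). The abstract does not say how its intervals were derived (with simulated cohorts, replicate simulations or bootstrapping are also plausible), and the function name and counts below are hypothetical.

```python
import math


def incidence_rate_ratio(events_tx, persontime_tx, events_ref, persontime_ref,
                         z=1.959963984540054):
    """IRR of a treated cohort vs a reference cohort, with a Wald CI on the log scale.

    Assumes Poisson event counts, so SE(log IRR) = sqrt(1/a + 1/b).
    """
    if events_tx <= 0 or events_ref <= 0:
        raise ValueError("a log-scale Wald CI needs at least one event in each cohort")
    irr = (events_tx / persontime_tx) / (events_ref / persontime_ref)
    se = math.sqrt(1.0 / events_tx + 1.0 / events_ref)
    lower = math.exp(math.log(irr) - z * se)
    upper = math.exp(math.log(irr) + z * se)
    return irr, lower, upper


# Hypothetical counts: 100 events in 10,000 person-years after weight loss
# vs 125 events in 10,000 person-years without it.
irr, lo, hi = incidence_rate_ratio(100, 10_000, 125, 10_000)
```

With these made-up counts the point estimate is 0.8; the interval widens as event counts shrink, which is why rare outcomes yield less precise IRRs.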
CONCLUSIONS: A robust GenAI-based obesity disease model was developed using EHR data from a large US cohort of obese patients. The model was validated as accurately representing patient journeys and the distributions of multiple clinical outcomes simultaneously, and has been successfully used to predict the long-term cardiovascular benefits of weight loss.
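The validation statistics reported above can be read as correlations, across variables, between per-variable summaries computed on observed and synthetic data. The sketch below shows one plausible way to compute them, assuming Pearson correlation of binary proportions, continuous means and variances, and the off-diagonal entries of the covariance matrices. The abstract does not specify the exact statistics or correlation type, and the function name and simulated data are hypothetical.

```python
import numpy as np


def summary_agreement(observed, synthetic, binary_cols):
    """Pearson correlations between observed and synthetic per-variable summaries.

    Rows are patients, columns are variables; binary_cols index the 0/1 columns.
    Needs at least two binary and two continuous columns with distinct summaries.
    """
    observed = np.asarray(observed, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    binary = np.zeros(observed.shape[1], dtype=bool)
    binary[list(binary_cols)] = True
    cont = ~binary

    def corr(x, y):
        return float(np.corrcoef(x, y)[0, 1])

    # Binary variables: proportions are the column means of 0/1 indicators.
    r_prop = corr(observed[:, binary].mean(axis=0), synthetic[:, binary].mean(axis=0))
    # Continuous variables: means and sample variances.
    r_mean = corr(observed[:, cont].mean(axis=0), synthetic[:, cont].mean(axis=0))
    r_var = corr(observed[:, cont].var(axis=0, ddof=1),
                 synthetic[:, cont].var(axis=0, ddof=1))
    # Dependence structure: off-diagonal entries of the covariance matrices.
    cov_o = np.cov(observed, rowvar=False)
    cov_s = np.cov(synthetic, rowvar=False)
    iu = np.triu_indices_from(cov_o, k=1)
    r_cov = corr(cov_o[iu], cov_s[iu])
    return {"proportions": r_prop, "means": r_mean,
            "variances": r_var, "covariances": r_cov}


# Hypothetical data: 3 binary and 3 continuous variables, with one dependency.
rng = np.random.default_rng(0)


def draw(n):
    binary = (rng.random((n, 3)) < [0.1, 0.3, 0.6]).astype(float)
    cont = rng.normal([30.0, 120.0, 5.5], [5.0, 15.0, 1.0], size=(n, 3))
    cont[:, 0] += 4.0 * binary[:, 2]  # make one continuous variable depend on a binary one
    return np.hstack([binary, cont])


obs, syn = draw(5000), draw(5000)
scores = summary_agreement(obs, syn, binary_cols=[0, 1, 2])
```

Correlating summaries across variables rewards getting the relative ordering and scale of many variables right at once; it does not by itself test joint or temporal structure, which is one reason the study also used classifier-based tests.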
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
CO144
Topic
Clinical Outcomes
Topic Subcategory
Relating Intermediate to Long-term Outcomes
Disease
SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory), SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity), SDC: Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), SDC: Urinary/Kidney Disorders