Evaluating Generative AI in Replicating Health Economic Models: A Case Study on Ulcerative Colitis
Author(s)
Sumeyye Samur, PhD1, Jakob Langer, MSc2, Emir Gursel, MS1, Ismail F. Yildirim, MSc1, Turgay Ayer, PhD3, Jag Chhatwal, PhD4, Ipek Ozer Stillman, MBA, MSc5
1Value Analytics Labs, Boston, MA, USA, 2Takeda Pharmaceuticals International AG, Zurich, Switzerland, 3Georgia Institute of Technology, Atlanta, GA, USA, 4Massachusetts General Hospital Institute for Technology Assessment, Harvard Medical School, Boston, MA, USA, 5Takeda Pharmaceuticals U.S.A., Inc., Boston, MA, USA
OBJECTIVES: Generative AI has shown promise in health economics, particularly in automating and accelerating model development. This study explores the feasibility of using Generative AI to replicate published health economic models, with implications for early-phase decision-making.
METHODS: We replicated a Markov model for ulcerative colitis described in a publication by Salcedo et al. Our approach consisted of two stages. In the first stage, we used ValueGen.AI, a GPT-4-based platform integrating multi-agent pipelines (CrewAI, LangChain, and OpenAI libraries), to extract the model structure and parameters from the publication. The extracted elements were implemented in R’s heemod package to replicate the model and evaluate outcomes. In the second stage, we repeated the same process using a more detailed technical report of the model. We assessed the performance of Generative AI based on its ability to accurately conceptualize health states, extract key parameters, and replicate the modeling approach.
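For readers unfamiliar with the kind of model being replicated, the following is a minimal Python sketch of a Markov cohort trace. It is not the Salcedo et al. model and not the heemod implementation used in the study: the three health states, the transition matrix, and the per-cycle costs are all hypothetical values chosen only to show which elements (states, transition probabilities, state-level inputs) must be extracted from a source document before a model can be run.

```python
# Hypothetical three-state cohort model. State names and every number
# below are illustrative assumptions, not values from the publication.
STATES = ["remission", "active_disease", "death"]

# Row i gives the per-cycle probabilities of moving from state i to each state.
TRANSITIONS = [
    [0.85, 0.14, 0.01],
    [0.30, 0.68, 0.02],
    [0.00, 0.00, 1.00],  # death is absorbing
]

# Hypothetical cost accrued per cycle spent in each state.
COST_PER_CYCLE = [500.0, 3000.0, 0.0]


def run_cohort(start, transitions, n_cycles):
    """Return the cohort distribution over states at cycles 0..n_cycles."""
    n_states = len(start)
    trace = [list(start)]
    for _ in range(n_cycles):
        prev = trace[-1]
        nxt = [
            sum(prev[i] * transitions[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        trace.append(nxt)
    return trace


def total_outcome(trace, per_state_values):
    """Sum an undiscounted per-cycle outcome over cycles 1..n."""
    return sum(
        sum(share * value for share, value in zip(row, per_state_values))
        for row in trace[1:]
    )


if __name__ == "__main__":
    trace = run_cohort([1.0, 0.0, 0.0], TRANSITIONS, n_cycles=10)
    print("Final distribution:", dict(zip(STATES, trace[-1])))
    print("Undiscounted cost per patient:", total_outcome(trace, COST_PER_CYCLE))
```

The sketch omits discounting, half-cycle correction, and treatment strategies, all of which a full replication would need to recover from the source as well.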
RESULTS: From the publication, the Generative AI platform successfully identified cost and quality-of-life inputs but had difficulty interpreting health states and transition probabilities because the text provided insufficient detail and ambiguous descriptions. These limitations led to incomplete or conflicting parameterization in the initial replication attempt. With the detailed technical report, the platform’s performance improved substantially, yielding clearer and more accurate extractions of model components. However, extracting transition probability formulas and adapting them to the specified model cycle length remained challenging, highlighting the dependency of AI-based approaches on the clarity and structure of input sources.
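To illustrate what adapting a transition probability to a model's cycle length can involve, here is a minimal Python sketch of one common approach: rescaling a probability under the assumption of a constant underlying event rate. The abstract does not state which formulas the original model used, so this is an assumed technique, and the function name and example values are hypothetical.

```python
def rescale_probability(p, from_length, to_length):
    """Rescale a probability reported over `from_length` to a cycle of `to_length`.

    Assumes a constant event rate over time (an assumption of this sketch,
    not something stated in the source). Both lengths must share a time unit.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if from_length <= 0 or to_length <= 0:
        raise ValueError("interval lengths must be positive")
    # Equivalent to converting p to a rate and back: 1 - exp(-rate * to_length)
    return 1.0 - (1.0 - p) ** (to_length / from_length)


if __name__ == "__main__":
    # Hypothetical example: a 52-week probability of 0.20 rescaled to an 8-week cycle.
    print(rescale_probability(0.20, from_length=52, to_length=8))
```

Simply dividing the probability by the number of cycles would give a different answer, which is one reason explicit reporting of the conversion matters. For models with several competing transitions out of a state, applying this formula to each probability separately is only an approximation.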
CONCLUSIONS: Generative AI has the potential to transform health economic modeling by introducing efficiencies in replicating or adapting existing models. However, its application is contingent on the availability of standardized and explicit reporting in model publications. This underscores the need for improved transparency and consistency in the documentation of health economic models to maximize the utility of AI in supporting and accelerating evidence generation for decision-making.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
MSR30
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Gastrointestinal Disorders