Feasibility of Replicating a Published Health Economic Model From an ICER Report Using Generative AI
Author(s)
Jag Chhatwal, PhD1, Sumeyye Samur, PhD2, Jade Xiao, PhD2, Elif Bayraktar, BS2, Ismail F. Yildirim, MSc2, Turgay Ayer, PhD3;
1Massachusetts General Hospital Institute for Technology Assessment, Harvard Medical School, Boston, MA, USA, 2Value Analytics Labs, Boston, MA, USA, 3Georgia Institute of Technology, Atlanta, GA, USA
OBJECTIVES: Generative AI has the potential to automate complex tasks, including health economic modeling. This study aimed to evaluate the feasibility and accuracy of using generative AI to replicate a previously published health economic model for Alzheimer’s disease, with the Institute for Clinical and Economic Review (ICER) report as the benchmark.
METHODS: We replicated a Markov model for Alzheimer’s disease from the report using ValueGen.AI, a GPT-4-based platform with multi-agent pipelines (CrewAI, LangChain, and OpenAI libraries). Python handled the large language model interactions, and the extracted parameters were implemented in the heemod package in R to construct and run the Markov model comparing lecanemab plus supportive care against supportive care alone. To validate the AI-based model, we compared its delta costs, delta QALYs, and incremental cost-effectiveness ratio with those in the report and calculated the error margin for each outcome.
RESULTS: The generative AI platform extracted health states and transition probabilities from the report but had difficulty with the baseline health state distribution, which required manual implementation in the R code. Because the report lacked detailed age distribution data, only the mean age could be used, limiting the accuracy of age-related adjustments. General population costs were not explicitly reported, and the absence of cited references limited both the AI’s extraction capabilities and human involvement. Despite these limitations, the AI-based model estimated the incremental cost-effectiveness ratio for lecanemab plus supportive care versus supportive care alone at $279,637, compared to $254,000 in the report, a 10.1% error margin. The delta cost and delta QALY error margins were 4.6% ($120,244 vs. $126,000) and 14% (0.43 vs. 0.50), respectively.
CONCLUSIONS: This study demonstrates the feasibility of using generative AI to replicate complex health economic models. While it shows that generative AI can approximate key outcomes, it also highlights the dependence on the clarity and completeness of model inputs, emphasizing the need for standardized reporting in HEOR. Future research should replicate more decision-analytic models to validate and refine this approach.
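The error margins quoted in RESULTS are consistent with a simple relative difference against the report's value. The abstract does not state the formula, so the sketch below assumes error margin = |replicated - reported| / |reported|; the function and variable names are illustrative and not taken from the authors' code.

```python
# Minimal sketch: recompute the error margins quoted in RESULTS.
# Assumption: error margin = |replicated - reported| / |reported|.
# The figures are the ones given in the abstract; names are hypothetical.

def relative_error(replicated: float, reported: float) -> float:
    """Absolute difference expressed as a fraction of the reported value."""
    return abs(replicated - reported) / abs(reported)

# (AI-based model value, ICER report value)
outcomes = {
    "icer": (279_637, 254_000),
    "delta_cost": (120_244, 126_000),
    "delta_qaly": (0.43, 0.50),
}

margins = {name: relative_error(ai, ref) for name, (ai, ref) in outcomes.items()}

# The ICER is the ratio of incremental cost to incremental QALYs, so the
# replicated ICER can be cross-checked against the replicated deltas.
# The QALY figure is rounded in the abstract, so this is only approximate.
implied_icer = outcomes["delta_cost"][0] / outcomes["delta_qaly"][0]

for name, value in margins.items():
    print(f"{name}: {value * 100:.1f}%")
print(f"implied ICER from replicated deltas: ${implied_icer:,.0f}")
```

Under this assumed definition the three margins round to the values reported above, and the replicated delta cost divided by the replicated delta QALYs gives approximately the replicated ICER.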
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
P60
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Neurological Disorders