GENERATIVE AI IN HEALTH ECONOMIC MODELLING: A TARGETED REVIEW OF EMERGING APPLICATIONS AND METHODOLOGICAL GAPS
Author(s)
Attila Imre, PharmD1, Bertalan Németh, PhD2, Ákos Bernard Józwiak, PhD2, Balázs Nagy, PhD1, László Balkányi, PhD2, Zoltan Kalo, PhD3, Antal Zemplényi, PhD2;
1Semmelweis University, Center for Health Technology Assessment, Budapest, Hungary, 2Syreon Research Institute, Budapest, Hungary, 3Center for Health Technology Assessment, Semmelweis University & Syreon Research Institute, Budapest, Hungary
OBJECTIVES: Generative artificial intelligence (Gen-AI) is increasingly explored for applications in health economic modeling (HEM). We aimed to map peer-reviewed Gen-AI applications in HEM, classify modeling areas addressed, and assess the practical reproducibility of published methods.
METHODS: Systematic searches of PubMed and EMBASE (2020 onward) combined Gen-AI-related terms with health economic modeling concepts. Articles were included if they reported a new application of Gen-AI for HEM; conference abstracts were excluded. Forward and backward citation chasing was applied. Data extracted from each study included sponsor, the modeling stage where the reported process could be applied, the Gen-AI model used, reported accuracy, reported upfront and operating costs, validation approach, and practitioner-facing adaptability (availability of sufficient detail on data, methods, and results to support reproduction).
RESULTS: From 999 records (199 duplicates), 800 underwent screening. Five peer-reviewed articles and five relevant prior reviews were identified. All included studies presented proof-of-concept applications using GPT-4/4o (n=5) or Claude 3.5 (n=1). The modeling stages addressed were: conceptualization/model development (n=1), model implementation (n=2), model updating/adaptation (n=1), and reporting (n=1). Reported accuracy was promising: model code generation achieved 60-100% error-free outputs with ICER deviations of ≤1-28% from published values; data extraction reached 88% concordance with researcher-validated extractions; and model adaptation pipelines achieved 95-100% accuracy at an API cost of $2.65-13.36 per model. Practitioner-facing adaptability was high in three studies (prompts, code, and technical parameters reported) and low in two (prompts or datasets not provided). Even among reproducible studies, key parameters affecting output variability (model temperature, seed, or exact version) were inconsistently reported. No study reported the upfront expert time required for prompt development and pipeline design.
CONCLUSIONS: Peer-reviewed Gen-AI applications in HEM remain limited to proof-of-concept studies across a narrow range of modeling tasks. While reported accuracy is encouraging, workflow reporting needs to be more transparent to facilitate adaptation in practice.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
EE305
Topic
Economic Evaluation
Disease
No Additional Disease & Conditions/Specialized Treatment Areas