Exploring LLMs in the Conceptual and Functional Construction of Health Economic Models: A Case Study on Alzheimer's Diagnostic Cost-Effectiveness Model

Author(s)

Emilija Veljanoska, MSc (1); Agota Szende, PhD (2).
(1) Market Access Consulting & HEOR, Fortrea, Munich, Germany; (2) Market Access Consulting & HEOR, Fortrea, Leeds, United Kingdom.
OBJECTIVES: To assess the performance of large language models (LLMs) in supporting the conceptual and functional construction of a health economic model in Excel.
METHODS: A targeted literature search was conducted to explore the role of LLMs in health economic modeling. Thereafter, a structured case study evaluated ChatGPT-4’s ability to propose a model structure, build it in Excel, and identify parameter values for a cost-effectiveness analysis of diagnostic alternatives in Alzheimer’s disease (AD), benchmarked against a published model (PMID: 40054769).
RESULTS: The literature search identified limited evidence on the application of LLMs in health economic model construction; one study reported automated survival data extraction, R script generation, and replication of a partitioned survival model, although it suggested that complex logic still required human oversight. In our case study, the initial model structure generated by ChatGPT-4 aligned with the published version, using a unidirectional, non-reversible disease trajectory with initial health states that distinguished between mild cognitive impairment (MCI) with and without AD pathology. A later iteration introduced amyloid status as a decision node to reflect diagnostic sequencing; however, ChatGPT-4 did not retain this refinement, owing to limited biomarker data and structural simplification. When prompted to construct an Excel-based model, the LLM generated sheet headers and structural placeholders, but produced only a static shell, with no formulas, lookup logic, named ranges, or cell-level computation. The LLM underestimated MCI incidence by 0.5-fold; cost estimates exceeded reference values by up to 2-fold; utility values deviated by less than 1.1-fold; and diagnostic sensitivity and specificity estimates deviated from benchmark inputs by 1-9%.
CONCLUSIONS: LLMs demonstrated capability in structural modeling and parameter retrieval. However, functional implementation remains more limited in Excel than in code-based platforms such as R. Hence, human input and oversight are still required for data sourcing and detailed health economic model construction.
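
As an illustration of what a functional (rather than static) implementation involves, the following minimal Python sketch computes a Markov cohort trace for a unidirectional, non-reversible disease trajectory of the kind described in the results. It is not the authors' model or the benchmark model: the state names, transition probabilities, per-state values, and discount rate are all hypothetical placeholders chosen only so that the example runs.

```python
# Minimal cohort-trace sketch. All states and numbers below are hypothetical
# placeholders; none are taken from the abstract or the published benchmark.

STATES = ["MCI", "Mild AD", "Moderate AD", "Severe AD", "Dead"]

# Annual transition matrix (rows = from-state, columns = to-state).
# Every entry below the diagonal is zero, so the cohort can never move
# back to an earlier state (unidirectional, non-reversible).
P = [
    [0.80, 0.15, 0.00, 0.00, 0.05],
    [0.00, 0.70, 0.20, 0.00, 0.10],
    [0.00, 0.00, 0.65, 0.20, 0.15],
    [0.00, 0.00, 0.00, 0.75, 0.25],
    [0.00, 0.00, 0.00, 0.00, 1.00],
]


def is_unidirectional(matrix):
    """Return True if no transition leads back to an earlier state."""
    return all(matrix[i][j] == 0.0 for i in range(len(matrix)) for j in range(i))


def cohort_trace(matrix, start, cycles):
    """Return the state distribution at cycle 0, 1, ..., cycles."""
    n = len(matrix)
    trace = [list(start)]
    for _ in range(cycles):
        prev = trace[-1]
        # Share arriving in state j = sum over i of (share in i) * P[i][j].
        trace.append([sum(prev[i] * matrix[i][j] for i in range(n)) for j in range(n)])
    return trace


def discounted_total(trace, per_state_value, rate):
    """Sum per-cycle values (e.g. costs or utilities), discounted per cycle."""
    total = 0.0
    for t, dist in enumerate(trace):
        cycle_value = sum(share * value for share, value in zip(dist, per_state_value))
        total += cycle_value / (1.0 + rate) ** t
    return total


# Usage: the whole cohort starts in MCI and is followed for 10 annual cycles.
trace = cohort_trace(P, [1.0, 0.0, 0.0, 0.0, 0.0], cycles=10)
hypothetical_utilities = [0.80, 0.70, 0.55, 0.35, 0.00]
qalys = discounted_total(trace, hypothetical_utilities, rate=0.03)
```

The matrix multiplication in `cohort_trace` and the discounted sum in `discounted_total` correspond to the cell-level formulas that a working Excel model would need and that a shell of headers and placeholders lacks.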

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

EE457

Topic

Economic Evaluation

Disease

Neurological Disorders
