Exploring LLMs in the Conceptual and Functional Construction of Health Economic Models: A Case Study on Alzheimer's Diagnostic Cost-Effectiveness Model

Author(s)

Emilija Veljanoska, MSc¹, Agota Szende, PhD².
¹Market Access Consulting & HEOR, Fortrea, Munich, Germany, ²Market Access Consulting & HEOR, Fortrea, Leeds, United Kingdom.

OBJECTIVES: To assess the performance of large language models (LLMs) in supporting the conceptual and functional construction of a health economic model in Excel.
METHODS: A targeted literature search was conducted to explore the role of LLMs in health economic modeling. Thereafter, a structured case study evaluated ChatGPT-4’s ability to propose a model structure, build it in Excel, and identify parameter values for a cost-effectiveness analysis of diagnostic alternatives in Alzheimer’s disease (AD), benchmarked against a published model (PMID: 40054769).
RESULTS: The literature search identified limited evidence on the application of LLMs in health economic model construction. One study reported automated survival data extraction, R script generation, and replication of a partitioned survival model, although it suggested that complex logic still required human oversight. In our case study, the initial model structure generated by ChatGPT-4 aligned with the published version, using a unidirectional, non-reversible disease trajectory with initial health states that distinguished mild cognitive impairment (MCI) with and without AD pathology. A later iteration introduced amyloid status as a decision node to reflect diagnostic sequencing; however, ChatGPT-4 did not retain this refinement due to limited biomarker data and structural simplification. When prompted to construct an Excel-based model, the LLM generated sheet headers and structural placeholders, but produced only a static shell, with no formulas, lookup logic, named ranges, or cell-level computation. For parameter values, compared with the benchmark inputs, the LLM underestimated MCI incidence by 0.5-fold; cost estimates exceeded reference values by up to 2-fold; utility values deviated by less than 1.1-fold; and diagnostic sensitivity and specificity estimates deviated by 1-9%.
CONCLUSIONS: LLMs demonstrated capability in structural modeling and parameter retrieval. However, functional implementation remains limited in Excel compared with code-based platforms such as R. Hence, human input and oversight would still be required for data sourcing and detailed health economic model construction.
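
ILLUSTRATION: To make the structural description above concrete, the following is a minimal sketch of a unidirectional, non-reversible cohort (Markov) trace of the general kind described in the results. It is not the authors' model or the benchmark model. The state names, the annual transition probabilities, and the helper functions (is_unidirectional, step, run_trace) are all hypothetical and chosen only for illustration.

```python
# Hypothetical health states, ordered from least to most severe.
# Neither the states nor the probabilities come from the abstract.
STATES = ["MCI", "Mild AD", "Moderate AD", "Severe AD", "Dead"]

# Assumed annual transition probabilities (row = from, column = to).
# All entries below the diagonal are zero, so the cohort can never
# move back to an earlier state. Each row sums to 1.
TRANSITIONS = [
    [0.80, 0.15, 0.00, 0.00, 0.05],
    [0.00, 0.70, 0.22, 0.00, 0.08],
    [0.00, 0.00, 0.65, 0.23, 0.12],
    [0.00, 0.00, 0.00, 0.80, 0.20],
    [0.00, 0.00, 0.00, 0.00, 1.00],
]


def is_unidirectional(matrix):
    """Return True if no transition leads back to an earlier state."""
    return all(
        matrix[i][j] == 0.0
        for i in range(len(matrix))
        for j in range(i)
    )


def step(cohort, matrix):
    """Advance the cohort distribution by one model cycle."""
    n = len(matrix)
    return [sum(cohort[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def run_trace(initial, matrix, cycles):
    """Return the cohort distribution at cycle 0 and after each cycle."""
    trace = [list(initial)]
    for _ in range(cycles):
        trace.append(step(trace[-1], matrix))
    return trace


if __name__ == "__main__":
    # Whole cohort starts in the first state (an assumption).
    trace = run_trace([1.0, 0.0, 0.0, 0.0, 0.0], TRANSITIONS, cycles=10)
    for cycle, row in enumerate(trace):
        print(cycle, [round(p, 3) for p in row])
```

In a full cost-effectiveness model, each row of such a trace would be multiplied by per-state costs and utilities and discounted. That cell-level computation is the part the abstract reports was missing from the static Excel shell.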

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

EE457

Topic

Economic Evaluation

Disease

Neurological Disorders
