EVALUATION OF OFF-THE-SHELF AGENT-BASED AI TOOL CLAUDE CODE FOR RAPID REPLICATION OF A PUBLISHED COST-EFFECTIVENESS MODEL

Author(s)

Attila Imre, PharmD¹, Bertalan Németh, PhD², Balázs Nagy, PhD¹
¹Semmelweis University, Center for Health Technology Assessment, Budapest, Hungary, ²Syreon Research Institute, Budapest, Hungary

OBJECTIVES: Replicating published models with artificial intelligence (AI) tools is of great interest to the health economic modelling community. Previous approaches combined base models with complex, problem-specific prompting, and earlier pilot projects did not account for the upfront resources needed to develop such techniques. In this study, we piloted an off-the-shelf agent-based programming tool for replicating a published health economic model without domain-specific prompting. Using a non-trivial cohort model based on the FINGER study (Wimo et al., 2023), we aimed to reproduce its cost-effectiveness analysis from the published paper and supplement.
METHODS: Reproduction involved converting the PDFs to text with Mistral DocumentAI and instructing Claude Code v2.0.76 to generate Python code based solely on these materials, implementing the base-case results and scenario analysis. The model estimates the long-term cost-effectiveness of a dementia prevention program in Sweden. It simulates the remaining lifetime of patients aged 60+ across six health states: normal cognition, mild cognitive impairment, mild, moderate, and severe dementia, and death. Methodology and input parameters were extracted solely from the paper. Reproduced results were compared with the published results, and the generated Python code was validated by expert human review.
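The six-state structure described above can be illustrated with a minimal cohort trace. This is a sketch under stated assumptions, not the published model or the code generated in the study: the annual cycle length, the 40-cycle horizon, the absence of discounting and half-cycle correction, and every transition probability in `P` are hypothetical placeholders chosen only to make the example run.

```python
# Illustrative six-state cohort (Markov) trace in plain Python.
# ASSUMPTION: all numbers below are invented for illustration and are
# NOT taken from Wimo et al. (2023) or from the reproduced model.

STATES = ["normal", "mci", "mild", "moderate", "severe", "dead"]

# Hypothetical annual transition probabilities; each row sums to 1.
P = [
    [0.90, 0.06, 0.00, 0.00, 0.00, 0.04],  # normal cognition
    [0.05, 0.80, 0.10, 0.00, 0.00, 0.05],  # mild cognitive impairment
    [0.00, 0.00, 0.75, 0.17, 0.00, 0.08],  # mild dementia
    [0.00, 0.00, 0.00, 0.70, 0.18, 0.12],  # moderate dementia
    [0.00, 0.00, 0.00, 0.00, 0.80, 0.20],  # severe dementia
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00],  # death (absorbing)
]


def step(dist, matrix):
    """Advance the cohort distribution by one cycle."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def run_cohort(start, matrix, cycles):
    """Return the cohort trace: one state distribution per cycle boundary."""
    trace = [list(start)]
    for _ in range(cycles):
        trace.append(step(trace[-1], matrix))
    return trace


def state_years(trace, indices):
    """Sum state occupancy at the start of each cycle (no half-cycle correction)."""
    return sum(sum(row[i] for i in indices) for row in trace[:-1])


# Everyone starts in normal cognition (an assumption of this sketch).
trace = run_cohort([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], P, cycles=40)
life_years = state_years(trace, range(5))        # all alive states
dementia_years = state_years(trace, [2, 3, 4])   # mild, moderate, severe
```

In a full cost-effectiveness model, per-state costs and utility weights would be applied to each row of `trace` and discounted; those inputs are omitted here because the abstract does not report them.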
RESULTS: AI-assisted reproduction was rapid and accurate, completing within 1 hour at a cost below $2. Reproduced results matched the published ones within 4% (reproduced vs published, with the ratio in parentheses). QALYs/patient usual care (UC): 8.649 vs 8.636 (100.1%); incremental QALYs: 0.044 vs 0.043 (102.3%); person-years alive UC: 15.40 vs 15.13 (101.8%); dementia years UC: 2.86 vs 2.76 (103.6%); incremental cases UC: 47,425 vs 46,297 (102.4%); cases prevented: 1,655 vs 1,623 (102.0%); NNT: 60 vs 62 (97.5%).
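The ratios in parentheses can be recomputed with a few lines of Python. This is a minimal sketch that assumes the first value in each pair is the reproduced result and the second is the published one; the dictionary keys and the helper name `ratio_pct` are illustrative. Because the inputs are the rounded figures printed above, a recomputed ratio may differ slightly from the reported percentage, which was presumably derived from unrounded values.

```python
# (reproduced, published) pairs as printed in the RESULTS paragraph.
# ASSUMPTION: the first number in each pair is the reproduced value.
RESULTS = {
    "QALYs per patient, UC": (8.649, 8.636),
    "Incremental QALYs": (0.044, 0.043),
    "Person-years alive, UC": (15.40, 15.13),
    "Dementia years, UC": (2.86, 2.76),
    "Incremental cases, UC": (47425, 46297),
    "Cases prevented": (1655, 1623),
    "NNT": (60, 62),
}


def ratio_pct(reproduced, published):
    """Reproduced value expressed as a percentage of the published value."""
    return 100.0 * reproduced / published


# Ratio per outcome, and the largest absolute deviation from 100%.
ratios = {name: ratio_pct(rep, pub) for name, (rep, pub) in RESULTS.items()}
max_deviation = max(abs(value - 100.0) for value in ratios.values())

for name, value in ratios.items():
    print(f"{name}: {value:.1f}%")
print(f"Largest deviation from published: {max_deviation:.1f} percentage points")
```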
CONCLUSIONS: Claude Code demonstrates rapid and accurate HEOR model reproduction without a large upfront investment in problem-specific prompts and expert time. However, expert validation and oversight are still required to guide reproduction and to interpret differences in modelling results.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR231

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)