Single-Agent vs. Multi-Agent RAG for Health Economic Model Replication: A Comparative Evaluation

Author(s)

Sumeyye Samur, PhD¹, Jakob Langer, MSc², Emir Gursel, MS³, Ismail Fatih Yildirim, MSc³, Elif Bayraktar, BS³, Turgay Ayer, PhD⁴, Rachael Fleurence, MSc, PhD³, Jag Chhatwal, PhD⁵, Ipek Ozer Stillman, MBA, MSc⁶.
¹VP, Head of Value & Access, Value Analytics Labs, Boston, MA, USA, ²Takeda, Zurich, Switzerland, ³Value Analytics Labs, Boston, MA, USA, ⁴Georgia Institute of Technology, Atlanta, GA, USA, ⁵Harvard Medical School / Massachusetts General Hospital, Boston, MA, USA, ⁶Takeda, Cambridge, MA, USA.

OBJECTIVES: As generative AI gains traction in health economics and outcomes research (HEOR), identifying configurations that enhance model replication is essential. This study compares the performance of single-agent and multi-agent Retrieval-Augmented Generation (RAG) systems in automating the replication of a published model.
METHODS: We developed and evaluated two generative AI workflows to extract structure and parameters from a cost-effectiveness model of ulcerative colitis by Salcedo et al. The first was a single-agent RAG system using LangChain with GPT-4o, integrating retrieval, reasoning, and generation in one workflow. The second was a custom multi-agent Self-Reflective RAG (Self-RAG) workflow using LangGraph, employing discrete retrieval and generation agents along with grading agents for relevance, grounding, and quality. Each approach was run 10 times to capture uncertainty and evaluated for accuracy, transparency and processing time in extracting health states, transitions, costs, utilities and therapy lines.
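The control flow of a self-reflective RAG loop of the kind described above can be sketched in plain Python. This is a minimal illustration, not the authors' LangGraph implementation: the function name `self_rag`, the three grader callables, the retry limit, and the toy passages are all assumptions made for the example, and the toy text is invented rather than taken from the Salcedo et al. model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SelfRagResult:
    answer: str          # empty string if no answer passed all graders
    attempts: int        # number of retrieve/generate attempts used
    trace: List[str] = field(default_factory=list)  # human-readable audit trail


def self_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    is_relevant: Callable[[str, str], bool],
    is_grounded: Callable[[str, List[str]], bool],
    is_useful: Callable[[str, str], bool],
    max_attempts: int = 3,  # hypothetical retry limit
) -> SelfRagResult:
    """Retrieve, filter by relevance, generate, then grade grounding and quality."""
    trace: List[str] = []
    for attempt in range(1, max_attempts + 1):
        # Relevance grading: drop retrieved passages that do not bear on the question.
        passages = [p for p in retrieve(question) if is_relevant(question, p)]
        trace.append(f"attempt {attempt}: kept {len(passages)} relevant passage(s)")
        if not passages:
            continue
        answer = generate(question, passages)
        # Grounding grading: reject answers not supported by the kept passages.
        if not is_grounded(answer, passages):
            trace.append(f"attempt {attempt}: rejected, not grounded")
            continue
        # Quality grading: accept only answers that address the question.
        if is_useful(question, answer):
            trace.append(f"attempt {attempt}: accepted")
            return SelfRagResult(answer, attempt, trace)
        trace.append(f"attempt {attempt}: rejected, does not answer the question")
    return SelfRagResult("", max_attempts, trace)


# Toy stand-ins for the retrieval, generation, and grading agents (all invented).
TOY_PASSAGES = [
    "Toy passage: the cycle length is N weeks.",
    "Toy passage: unrelated background text.",
]


def toy_retrieve(question: str) -> List[str]:
    return list(TOY_PASSAGES)


def toy_is_relevant(question: str, passage: str) -> bool:
    return "cycle" in passage.lower()


def toy_generate(question: str, passages: List[str]) -> str:
    return passages[0]


def toy_is_grounded(answer: str, passages: List[str]) -> bool:
    return answer in passages


def toy_is_useful(question: str, answer: str) -> bool:
    return "cycle length" in answer.lower()
```

In a real workflow each callable would wrap a retriever or a language-model call; the `trace` list is one simple way to obtain the kind of traceability the abstract attributes to the multi-agent design.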
RESULTS: Both approaches consistently identified core health states (active disease, remission, and death). The multi-agent system more reliably captured surgical states (90% vs. 50%) and identified remission after surgery and post-surgical complications with 80% success, while the single-agent correctly captured these in all runs. For transition probabilities, the multi-agent extracted the correct values, cycle length and time horizon, while the single-agent often hallucinated probabilities and time horizon. Both accurately extracted health state costs, but the multi-agent more consistently captured drug (80% vs. 50%) and administration costs (80% vs. 0%). Utility values were accurately extracted by both, but neither recognized the therapy lines. The multi-agent was slower but more traceable, while the single-agent was faster but lacked transparency.
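The percentages above can be read as the share of the 10 runs in which an element was captured. A minimal sketch of that calculation follows, assuming each run is summarized as the set of elements it extracted; the function name `capture_rates` and this data layout are hypothetical, not the authors' evaluation code.

```python
from typing import Dict, Iterable, List, Set


def capture_rates(runs: List[Set[str]], reference: Iterable[str]) -> Dict[str, float]:
    """Fraction of runs in which each reference element was extracted."""
    if not runs:
        raise ValueError("at least one run is required")
    return {
        item: sum(item in run for run in runs) / len(runs)
        for item in reference
    }
```

For example, an element extracted in 9 of 10 runs would score 0.9 under this definition.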
CONCLUSIONS: Multi-agent generative AI approaches demonstrated superior accuracy and transparency in replicating a health economic model, though with increased processing time. These findings suggest that multi-agent frameworks hold potential for advancing model replication in HEOR. Future work should explore alternative configurations, balancing performance gains with computational cost.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

MSR188

Topic

Economic Evaluation, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Gastrointestinal Disorders

Presentation (CTI)