Agentic AI Assistant Boosts Writer Engagement and Content Uptake vs. Retrieval-Augmented Generation in Health Technology Assessment Dossier Drafting: Randomized Crossover Pilot Study
Author(s)
Anton O. Wiehe1, Florian Woeste, MSc2, Pia Ana Cuk, MSc3.
1Hamburg, Germany, 2PHAROS Labs, Ahrensburg, Germany, 3PHAROS Labs GmbH, Hamburg, Germany.
OBJECTIVES: Rapid, high-quality HTA dossiers are central to timely patient access to innovative therapies. We compared writer engagement and preference between an iterative agentic AI assistant and a conventional retrieval-augmented generation (RAG) tool during dossier drafting.
METHODS: In a randomised, crossover pilot (interim n = 5 writers, target n = 38) at a specialised European medical-writing consultancy, each professional drafted live dossier sections with both systems. Every new session was randomised 1:1, ensuring all writers used both pipelines. Engagement metrics were aggregated at writer level: for each writer we calculated (i) the ratio of text copied from agentic versus RAG outputs and (ii) the proportion of forced-choice prompts favouring the agentic answer; writer-level values were then averaged across the cohort. Generation latency was compared with a Wilcoxon signed-rank test. The agentic assistant executed up to ten reasoning cycles per query, whereas the RAG tool used a single-pass retrieve-then-generate workflow. All analyses are exploratory and will be repeated with the full sample. AI assistance has been disclosed per ISPOR policy.
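The writer-level aggregation described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the log layouts, the function names, and the use of copied characters as the unit of "text copied" are all assumptions, and the numbers are invented toy values rather than study data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical copy log: (writer_id, arm, characters copied into the draft).
# Toy values for illustration only; these are NOT study data.
copied = [
    ("w1", "agentic", 400), ("w1", "agentic", 200), ("w1", "rag", 300),
    ("w2", "agentic", 300), ("w2", "rag", 200),
]

# Hypothetical forced-choice log: (writer_id, True if the agentic answer was chosen).
choices = [
    ("w1", True), ("w1", True), ("w1", True), ("w1", True), ("w1", False),
    ("w2", True), ("w2", False),
]


def copy_ratio_by_writer(copied):
    """Per writer: total text copied from agentic outputs / total copied from RAG outputs."""
    totals = defaultdict(lambda: {"agentic": 0, "rag": 0})
    for writer, arm, chars in copied:
        totals[writer][arm] += chars
    ratios = {}
    for writer, t in totals.items():
        if t["rag"] == 0:
            # The ratio is undefined if a writer copied nothing from the RAG tool;
            # how to handle that case is left open in this sketch.
            raise ValueError(f"writer {writer} has no RAG copy volume")
        ratios[writer] = t["agentic"] / t["rag"]
    return ratios


def preference_by_writer(choices):
    """Per writer: share of forced-choice prompts in which the agentic answer was picked."""
    wins = defaultdict(int)
    n = defaultdict(int)
    for writer, chose_agentic in choices:
        n[writer] += 1
        wins[writer] += int(chose_agentic)
    return {writer: wins[writer] / n[writer] for writer in n}


# Step 1: one value per writer. Step 2: unweighted mean across writers,
# so a writer with many sessions does not dominate the cohort estimate.
copy_ratios = copy_ratio_by_writer(copied)
preferences = preference_by_writer(choices)
mean_copy_ratio = mean(copy_ratios.values())
mean_preference = mean(preferences.values())
```

Averaging writer-level values, rather than pooling all sessions, treats the writer as the unit of analysis, which matches a crossover design in which each writer serves as their own control.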
RESULTS: Across 62 drafting sessions, agentic outputs were copied more often than RAG outputs (mean writer-level copy ratio 1.7; 95% CI 1.2-2.3; p = 0.01) and were preferred in 73% of forced choices (95% CI 64-81; p < 0.001). Writers exchanged a mean of 3.2 messages per session with the agentic assistant versus 2.4 with the RAG tool (difference +0.8; 95% CI 0.1-1.5; p = 0.04). Median generation time was 1.2 s shorter with the agentic assistant (95% CI -2.0 to -0.4; p = 0.006, Wilcoxon signed-rank test).
CONCLUSIONS: Interim data suggest that an agentic AI assistant improves writer engagement and is more frequently preferred than a conventional RAG tool, while slightly reducing response time. Should these trends persist in the full cohort, agentic systems could streamline HTA documentation and accelerate evidence delivery for reimbursement decision makers.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
HTA28
Topic
Health Technology Assessment, Methodological & Statistical Research, Real World Data & Information Systems
Topic Subcategory
Value Frameworks & Dossier Format
Disease
No Additional Disease & Conditions/Specialized Treatment Areas