Mapping Writer-AI Conversations for HTA: Preliminary Analysis of 7009 Messages
Author(s)
Anton O. Wiehe1, Pia Ana Cuk, MSc2, Florian Woeste, MSc3.
1Hamburg, Germany, 2PHAROS Labs GmbH, Hamburg, Germany, 3PHAROS Labs, Ahrensburg, Germany.
OBJECTIVES: To identify tasks an agentic AI assistant supports during health-technology-assessment (HTA) writing and how conversational tone varies, informing interface and model refinements that could shorten evidence timelines.
METHODS: Logs captured 7009 messages from 45 writers (16 Dec 2024 to 26 Jun 2025). Messages were embedded (OpenAI text-embedding-3) and clustered with k-means (k = 10); UMAP provided a 2-D map for visual inspection. GPT-4o labelled clusters after reviewing 50 sample messages each. Sentiment was computed as cosine similarity to positive and negative prototype vectors derived from 200 manually labelled messages, rescaled from the range −1 to 1 onto a 0 to 100 scale; the scale correlated with human ratings (ρ = 0.78). Sentiment differences were tested with ANCOVA adjusting for writer ID (α = 0.05). Engagement telemetry will be added in the final 2025 cut.
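The tone score described in METHODS can be illustrated with a minimal sketch. The abstract does not state how the two cosine similarities are combined or how the prototypes are built, so the sketch assumes (hypothetically) that each prototype is the mean embedding of its labelled messages and that the raw score is half the difference between the positive and negative similarities, which keeps it in the range −1 to 1 before rescaling. The function names and the toy 2-D vectors are illustrative only and are not taken from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a list of vectors (assumed prototype construction)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def tone_score(embedding, pos_proto, neg_proto):
    """Map a message embedding to a 0-100 tone score.

    Assumption: raw = (cos(e, pos) - cos(e, neg)) / 2, which lies in [-1, 1];
    the abstract only says the value is rescaled from that range to 0-100.
    """
    raw = (cosine(embedding, pos_proto) - cosine(embedding, neg_proto)) / 2.0
    return 50.0 * (raw + 1.0)


# Toy 2-D "embeddings" standing in for manually labelled messages.
positive_examples = [[1.0, 0.0], [1.0, 0.0]]
negative_examples = [[-1.0, 0.0], [-1.0, 0.0]]
pos_proto = mean_vector(positive_examples)
neg_proto = mean_vector(negative_examples)

print(tone_score([0.0, 1.0], pos_proto, neg_proto))  # orthogonal to both prototypes
```

Under these assumptions a message equidistant from both prototypes lands at the midpoint of the scale, so cluster means near 50 (such as the 46.3 and 55.4 reported in RESULTS) would indicate mildly negative or mildly positive tone rather than strong sentiment.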
RESULTS: Ten clusters covered 92 % of traffic. The largest were Clinical Endpoints & Study Design (18 %), Document Review & Data Analysis (15 %), and Regulatory Benefit-Assessment Queries (12 %). Sentiment differed across clusters (ANCOVA F(9, 7000) = 11.4; p < 0.001) and explained 19 % of variance. Medical Translation Requests showed the highest tone (mean 55.4, 95 % CI 53.1 to 57.8), whereas Document Review & Data Analysis scored lowest (46.3), a difference of −9.1 points (95 % CI −14.7 to −3.5). Example high-tone message: “Translate this AMNOG excerpt into plain English” (score 72); low-tone message: “Fix these inconsistent table references” (score 33).
CONCLUSIONS: Preliminary semantic mapping highlights high-tone task zones suitable for prompt libraries and low-tone friction points that warrant UI or model changes. Full-cohort analysis, including engagement metrics, will test whether such refinements accelerate dossier production and reduce writing costs.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
HTA227
Topic
Health Technology Assessment, Methodological & Statistical Research, Real World Data & Information Systems
Topic Subcategory
Value Frameworks & Dossier Format
Disease
No Additional Disease & Conditions/Specialized Treatment Areas