APPLICATION OF THE ELEVATE-GENAI FRAMEWORK TO AN AI AGENTIC SYSTEM FOR DATA ANALYTICS
Author(s)
Elise Berliner, PhD1, Jessica Santos, BSc, MSc, PhD2, Michael Fronstin, MBA3.
1Vireo Strategies, Hatfield, PA, USA, 2Konovo, Cambridge, United Kingdom, 3HealthVerity, Philadelphia, PA, USA.
OBJECTIVES: AI-agent-powered data analytics platforms let users query real-world data and design and generate real-world evidence studies in plain language, with no coding experience needed. Trust in these platforms depends on objective demonstration of validity, reliability, and accuracy. Emerging frameworks for evaluating healthcare AI mostly focus on use cases such as clinical tasks, prediction modeling, and systematic review. The ELEVATE-GenAI framework provides structured guidance on GenAI across multiple HEOR use cases but was previously tested only for systematic review and economic modeling. Our objective was to evaluate the applicability of the ELEVATE-GenAI framework to the validation of an agentic AI data analytics platform.
METHODS: Quantitative evaluation under the ELEVATE-GenAI framework focused on accuracy, comprehensiveness, and factuality; 50 unique research questions were run on the platform in five independent executions each, and the resulting outputs were systematically evaluated. Benchmarks included EQUATOR item accuracy and alignment with published literature.
RESULTS: Although ELEVATE-GenAI was not developed for AI agents, its elements, with some modifications, are relevant to AI agents and provide a framework for validation. AI-agent-specific metrics were incorporated into the comprehensiveness category, including whether the agent could spot and correct mistakes. Each run of the AI agent was independent, so different valid epidemiological design choices were sometimes suggested in different runs of the same question.
CONCLUSIONS: ELEVATE-GenAI is a good starting point for evaluating AI agent-based data platforms and can provide confidence in their reliability and accuracy. Agentic AI data platforms are intended as tools with a human in the loop. The agentic system suggests design elements, including code sets and outcome definitions, which users can adjust. As with any epidemiological study, the ultimate validity and applicability of findings will depend on the specific design choices.
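ILLUSTRATIVE SKETCH: The repeated-run design described in METHODS (each research question executed in five independent runs, with outputs then scored) could be summarized per question along the lines below. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the function name summarize_runs, the record layout, the 0-to-1 score scale, and the toy values are all hypothetical, and the abstract does not specify how scores were computed or aggregated.

```python
from collections import defaultdict
from statistics import mean


def summarize_runs(records):
    """Aggregate per-run scores into per-question summaries.

    records: iterable of (question_id, run_index, score) tuples.
    The score is assumed to lie in [0, 1]; this scale is hypothetical.
    """
    by_question = defaultdict(list)
    for question_id, _run_index, score in records:
        by_question[question_id].append(score)

    summary = {}
    for question_id, scores in by_question.items():
        summary[question_id] = {
            "n_runs": len(scores),
            # Average score across the independent runs of one question.
            "mean_score": mean(scores),
            # Spread across runs; a nonzero spread may reflect different but
            # still valid design choices rather than an error.
            "score_range": max(scores) - min(scores),
        }
    return summary


# Toy data for two hypothetical questions, five runs each (not study results).
toy_records = [
    ("Q1", 1, 1.0), ("Q1", 2, 1.0), ("Q1", 3, 0.5), ("Q1", 4, 1.0), ("Q1", 5, 1.0),
    ("Q2", 1, 0.5), ("Q2", 2, 0.5), ("Q2", 3, 0.5), ("Q2", 4, 0.5), ("Q2", 5, 0.5),
]

toy_summary = summarize_runs(toy_records)
for question_id, stats in sorted(toy_summary.items()):
    print(question_id, stats)
```

Reporting the spread alongside the mean keeps run-to-run variation visible, which matters here because independent runs sometimes proposed different valid designs for the same question; a reviewer would still need to judge whether a given spread reflects acceptable design variation or a genuine mistake.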
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
RWD169
Topic
Real World Data & Information Systems
Disease
No Additional Disease & Conditions/Specialized Treatment Areas