REAL-WORLD USE OF GENERATIVE AI ASSISTANT TOOLS ACROSS HEALTH ECONOMICS AND OUTCOMES RESEARCH WORKFLOWS: A SCOPING REVIEW
Author(s)
Eniola A. Olatunji, MPH, PhD1, Yao Ding, PhD2, Abimbola Williams, MPH, MS3.
1Principal Health Economics, Boston Scientific, Marlborough, MA, USA, 2Boston Scientific, Bethesda, MD, USA, 3Boston Scientific, Marlborough, MA, USA.
OBJECTIVES: To characterize the real-world use of generative artificial intelligence (GenAI) assistant tools across health economics and outcomes research (HEOR) workflows and to assess the methodological quality of the publications.
METHODS: A scoping review of GenAI applications in HEOR was conducted. English-language publications dated 2024-2025 were identified from PubMed, health technology agency websites, and ISPOR databases. Data were extracted on HEOR task, GenAI tool, integration pattern, evaluation metrics, reported benefits, and limitations. Study quality was assessed using the ISPOR GenAI-HEOR Quality Assessment Framework, which rates 10 domains (including security/privacy, deployment/efficiency, factuality, reproducibility/generalizability, fairness/bias, and accuracy) on a 0-2 scale.
RESULTS: Among 172 records screened, 23 studies were eligible. Most studies used GenAI for evidence-synthesis tasks (75%), with fewer applications in economic modelling/cost-effectiveness analysis (CEA) (22%) and health technology assessment (HTA) submission support (17%); task categories were not mutually exclusive. Reported uses included data extraction and screening for systematic reviews, CEA model parameter extraction, CEA model recreation, and qualitative coding. OpenAI GPT-based tools were the most frequently reported (87%), followed by other large language models (26%) and Claude (21%). Human-only workflows were the predominant comparator. Most implementations relied on prompting of base GenAI models, usually with a human-in-the-loop. Quality assessment showed that all studies assessed accuracy, and most (83%) reported moderate-to-high accuracy for structured extraction or screening tasks. Factuality and comprehensiveness were frequently evaluated (96%), and 35% of studies reported analyst time savings, ranging from approximately 15 minutes per study to 350 minutes per reviewer, depending on task complexity. However, security, fairness, and bias were infrequently assessed.
CONCLUSIONS: Published applications of GenAI in HEOR are concentrated primarily in evidence-synthesis workflows, mainly as supervised assistants. Empirical evaluations of GenAI in economic modelling and HTA submission workflows are less common. Across studies, evaluation of security, bias, and reproducibility was limited, indicating priorities for future methodological reporting.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR234
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas