Abstract
Although artificial intelligence (AI) has been part of the health economics and outcomes research (HEOR) toolkit for years in the form of predictive models and machine learning algorithms, the recent surge of generative AI (GenAI) has created a new wave of excitement. GenAI is moving at lightning speed. In the time it takes to conceive, draft, and submit an article on the use of AI in HEOR, the underlying AI tools will have already evolved or been replaced. For researchers in our field, this precipitous pace presents exhilarating opportunities but also many challenges and concerns. The creation of health economic models is being reimagined, evidence synthesis is being automated, and many other tasks are being reshaped—often before we have had time to understand and validate the latest breakthrough.
This themed section explores this dynamic intersection of GenAI and HEOR, spotlighting how these technologies are not just enhancing existing methodologies but redefining them entirely. Reflecting the topic’s maturity, 5 of the papers are about the use of GenAI to aid literature reviews1-5; one is about automating adaptations to health economic models6 and another is about generating synthetic patient data.7 Finally, there is an article about public preferences regarding AI in mobile health applications.8 Although not formally part of the special section, the ISPOR Working Group on Generative AI published 3 articles that address the need for taxonomy and terminology,9 provide an overview of the field,10 and propose reporting guidelines.11 Of note, none of the articles to which the reporting guidelines could have been applied would have fully met them. The contributions in this issue provide a window into a field in flux, in which innovation is constant, and catching up is part of our job description.
There are common themes across these articles. GenAI can assist humans by automating workflows, taking care of tedious, repetitive tasks, and minimizing error, thereby reducing labor and accelerating production. Just the same, concerns remain: accuracy, credibility, and transparency are often discussed, as is the need for standards.
All conclude that HEOR tasks still need humans (big sigh of relief?), and the systematic literature review on the role of GenAI5 appears to have been carried out entirely by humans (!). Yet the 2 literature review tools1,4 show impressive performance while substantially reducing human oversight, whereas the test of a plain-vanilla large language model2 for extracting health economic model inputs is somewhat less impressive. With the evolution of agentic AI, it is likely that some of that human oversight will itself soon be handed over to additional agents.
As noted in some of the articles, when problems surface, it may be difficult to distinguish between inherent AI failings and those created by the limitations of human users. The idea of “AI friendliness”6 is an enchanting complement to the more familiar “user friendliness.” Just as in the pre-GenAI era, the quality of source data remains very important2—GenAI cannot fix that problem. Or can it? GenAI can be used to create synthetic data,7 opening up intriguing possibilities and, of course, a host of questions and trepidations.
Validation of the GenAI work (is that what we should call it?) is a persistent theme. Several studies emphasize that it is essential for humans to verify the GenAI products, and some articles report detailed comparisons of GenAI against human reviewers, regarding the latter as the “gold standard.” The reality is that humans make plenty of mistakes and are not very good at finding them. This raises a critical question: is a high average concordance sufficient, or do we require consistently high concordance? Additionally, if human verification remains necessary for every task, what is the true efficiency gain from GenAI?
Although mentioned occasionally, none of the articles fully addressed the advantages of deploying general-purpose large language models versus domain-specific or fine-tuned ones. This remains an open question for our applications.
Just how fast GenAI is evolving is evident in these articles. In the one from the ISPOR Working Group on GenAI that was published in the February 2025 issue,10 early efforts at using GenAI in the creation of health economic models are noted, and agentic approaches are barely mentioned; now, only a few months later, we are seeing fully functional agentic GenAI apps deployed. Undoubtedly, many of you readers will find some of the material already dated, and it is very likely that we will need to move forward with “living” versions of much of what we do, even reports from ISPOR Task Forces, Working Groups, and Special Interest Groups. How this will affect publications and the journals they appear in will be a fascinating development.
Looking ahead, GenAI is poised to evolve from assisting with repetitive tasks to providing more autonomous analytic capabilities in HEOR. Future AI tools will increasingly act as evidence partners. Integration into regulatory and HTA frameworks will demand new standards for acceptable use, alongside focused efforts to mitigate bias and build public confidence. As GenAI continues to accelerate, the imperative for the HEOR community is not only to adapt but to critically evaluate, guide, and shape its application. Staying current is no longer sufficient; staying relevant requires proactive engagement with the tools and questions that will define the future of health economics and, ultimately, our future.
Article and Author Information
Authors
Jaime Caro, Jagpreet Chhatwal, Rachael L. Fleurence