Evaluating the Performance of an Artificial Intelligence (AI)-Powered Tool for Assessing the Quality of Published Economic Evaluations: A Comparison With Human Reviewers Using the Drummond Checklist

Author(s)

Maria Arregui, PhD1, Maria Koufopoulou, MSc2.
1Cencora, Bad Homburg v. d. Höhe, Germany, 2Cencora, London, United Kingdom.
OBJECTIVES: Economic evaluations are crucial in healthcare decision-making, influencing policy formulation, funding allocations, and drug pricing. Ensuring the quality of these evaluations is paramount to prevent inefficient resource utilization and safeguard patient care standards. The Drummond checklist, a validated and comprehensive 35-question tool addressing study design, data collection, and analysis, is widely employed to assess the quality of economic evaluations. Despite its value, applying this checklist can be labor-intensive and time-consuming. Recent advancements in AI offer opportunities to streamline the process. This study evaluates the performance of an AI-powered tool in assessing the quality of published economic evaluations.
METHODS: The AI tool was used to assess the quality of eight published economic evaluations: seven cost-effectiveness analyses (including two cost-utility analyses) and one cost-minimization analysis. A customized prompt based on the Drummond checklist guided the evaluation. The AI-generated outputs were then compared with assessments performed by trained human reviewers, and agreement rates were calculated both overall and for individual checklist items to gauge the AI tool's effectiveness.
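
The abstract does not report the computation itself; the following minimal Python sketch illustrates the kind of per-study agreement-rate and median calculation described in METHODS. All ratings and values below are hypothetical placeholders, not data from the study.

from statistics import median

# Hypothetical Yes/No/Unclear ratings for a few Drummond checklist items
# on one study -- placeholders, not the study's actual assessments.
human = {"Q1": "Yes", "Q2": "No", "Q3": "Unclear", "Q4": "Yes"}
ai = {"Q1": "Yes", "Q2": "No", "Q3": "Yes", "Q4": "Yes"}

# Per-study agreement: share of checklist items where the AI rating
# matches the human rating.
agreement = sum(human[q] == ai[q] for q in human) / len(human)
print(f"Agreement for this study: {agreement:.1%}")  # 75.0% in this toy case

# Repeating the comparison across all studies yields one rate per study,
# which can then be summarized as a range and a median (invented values).
per_study = [0.70, 0.85, 0.90, 0.95, 1.00]
print(f"Range: {min(per_study):.1%}-{max(per_study):.1%}, median: {median(per_study):.1%}")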
RESULTS: The AI tool achieved agreement rates with human reviewers ranging from 65.7% to 100%, with a median of 94.3%. Discrepancies were identified in six of the eight studies, primarily within the Data Collection section. Specifically, disagreements arose in 12.5% of studies for 12 questions, in 25% of studies for 5 questions, and in 37.5% of studies for 1 question. Key areas of contention included the reporting of effectiveness study details, benefit valuation methods, costing practices, and sensitivity analysis approaches, often due to unclear or missing information in the study texts.
CONCLUSIONS: The AI tool demonstrated strong potential for automating quality assessments in economic evaluations, achieving high overall agreement rates with human reviewers. Nonetheless, AI-assisted evaluations should complement, not replace, human oversight to ensure reliable and comprehensive assessments in healthcare decision-making.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

EE446

Topic

Economic Evaluation, Methodological & Statistical Research

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
