Abstract
Objectives
Cost-effectiveness analyses (CEA) generate extensive data that can support a wide range of health economic research. However, manual data collection is time-consuming and error-prone. Developments in artificial intelligence (AI) and large language models (LLMs) offer a way to automate this process. This study aims to evaluate the accuracy of LLM-based data extraction and assess its feasibility for supporting CEA data collection.
Methods
We evaluated the performance of a custom ChatGPT model (GPT), the Tufts CEA Registry data (TCRD), and the researcher-validated extraction (RVE) in extracting 36 predetermined variables from 34 selected structured articles. Concordance rates between GPT and RVE, between TCRD and RVE, and between GPT and TCRD were calculated and compared. Paired Student's t tests assessed differences in accuracy, and concordance rates across the 36 variables were reported.
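As a minimal sketch of the comparison described above (not the study's code), the following Python snippet computes per-variable concordance rates and applies a paired Student's t test across the 36 variables. All data here are synthetic stand-ins, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n_articles, n_variables = 34, 36

# Synthetic stand-in data: True where a source's extraction matches the
# researcher-validated value (RVE), False otherwise. Real inputs would be
# normalized extracted values compared for equality against RVE.
gpt_match = rng.random((n_articles, n_variables)) < 0.88
tcrd_match = rng.random((n_articles, n_variables)) < 0.90

# Concordance rate per variable: share of the 34 articles in agreement.
gpt_acc = gpt_match.mean(axis=0)
tcrd_acc = tcrd_match.mean(axis=0)

# Paired Student's t test across the 36 variables: does GPT's accuracy
# differ from TCRD's on the same set of variables?
t_stat, p_value = ttest_rel(gpt_acc, tcrd_acc)
print(f"GPT mean {gpt_acc.mean():.2f} (SD {gpt_acc.std(ddof=1):.2f}) vs "
      f"TCRD mean {tcrd_acc.mean():.2f} (SD {tcrd_acc.std(ddof=1):.2f}), "
      f"P = {p_value:.2f}")
```

Pairing on variables (rather than treating the two accuracy vectors as independent samples) matches the design implied above, since GPT and TCRD are scored on the same 36 variables.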
Results
The accuracy of GPT (GPT vs RVE concordance) was comparable to that of TCRD (TCRD vs RVE concordance) (mean 0.88, SD 0.06 vs mean 0.90, SD 0.06; P = .71). The performance of GPT varied across variables: it outperformed TCRD in capturing "Population and Intervention Details" but struggled with complex variables such as "Utility."
Conclusions
This study demonstrated that LLMs, such as GPT, can be a promising tool for automating CEA data extraction, offering accuracy comparable to that of established registries. However, human supervision and expertise are essential to address challenges with complex variables.
Authors
Xujun Gu, Hanwen Zhang, Divya Patil, Zafar Zafari, Julia Slejko, Eberechukwu Onukwugha