Evaluating the Performance of GPT-4o and Retrieval-Augmented Generation (RAG) in Extracting Data From Journal Articles: A Comparative Study

Author(s)

Huang WH, Poojary V, Kasireddy E, Fazeli MS
Evidinno Outcomes Research Inc., Vancouver, BC, Canada

OBJECTIVES: To evaluate the performance and efficacy of a custom-designed system utilizing GPT-4o and Retrieval-Augmented Generation (RAG) for extracting specific fields from scientific journal articles, compared to the gold standard of domain expert extraction.

METHODS: We developed a custom system using OpenAI's GPT-4o model and the Assistants API, enhanced with RAG capabilities. The evaluation compared machine extraction with domain expert extraction across 36 diverse studies, focusing on the consistency and completeness of data field identification. The key fields evaluated were study design, country, setting, sample size, RCT phase, and blinding, which span a range of extraction difficulty.

RESULTS: The system extracted 168 data fields across the studies. Of these, 141 fields aligned precisely with domain expert extractions, yielding a consistency rate of 84% (141/168) between expert and machine. Performance varied across field types, with the highest similarity to expert extraction observed for straightforward fields like country and sample size. More nuanced and complex fields, particularly study design, presented greater challenges, showing the lowest similarity to expert extractions.
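The consistency rate reported above is the share of machine-extracted fields that match the expert extraction. The sketch below is illustrative only and is not the authors' evaluation code: it reproduces the overall 141/168 figure and shows one way a per-field breakdown could be computed. The function names, the normalization rule (trim whitespace, ignore case), and the sample records in the usage note are assumptions made for the example; the abstract does not describe how matches were judged.

```python
from collections import defaultdict


def consistency_rate(matches: int, total: int) -> float:
    """Fraction of machine-extracted fields that agree with the expert."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matches / total


def normalize(value: str) -> str:
    # Assumed matching rule: ignore case and surrounding whitespace.
    return value.strip().lower()


def per_field_rates(pairs):
    """Agreement rate per field type.

    `pairs` is an iterable of (field_name, machine_value, expert_value).
    """
    counts = defaultdict(lambda: [0, 0])  # field -> [matches, total]
    for field, machine, expert in pairs:
        counts[field][1] += 1
        if normalize(machine) == normalize(expert):
            counts[field][0] += 1
    return {field: m / t for field, (m, t) in counts.items()}


# Overall figure from the abstract: 141 of 168 fields matched.
overall = consistency_rate(141, 168)
print(f"{overall:.1%}")
```

For a per-field view, `per_field_rates` could be called with records such as `[("country", "Canada", "canada"), ("study design", "RCT", "cohort")]`; these values are made up for illustration and are not data from the study.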

CONCLUSIONS: The GPT-4o and RAG-based system shows considerable potential for improving the efficiency and accuracy of scientific data extraction. The 84% match with the gold standard is promising, but it also leaves clear room for improvement. Further refinement and rigorous validation are needed to raise performance across all data field categories, especially for complex fields such as study design.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR28

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
