Integrating Large Language Models Into an Existing Review Process: Promises and Pitfalls
Author(s)
Edwards M1, Ferrante di Ruffano L2
1York Health Economics Consortium, York, YOR, UK, 2York Health Economics Consortium, York, NYK, UK
OBJECTIVES: The recent development and rise in accessibility of large language models (LLMs) have generated excitement around their potential to reduce the resource burden of conducting reviews. Following internal testing, we assessed the cost, accuracy, and accessibility of LLMs to reviewers, and considered which types of reviews LLMs are currently best suited to assist with.
METHODS: We conducted internal testing of an LLM, Claude 3 Opus, via its chat interface. We used the tool to conduct high-level data extraction for a targeted review, highly granular data extraction for a systematic review, and risk of bias assessment of randomized controlled trials (RCTs).
RESULTS: The LLM chat interface was highly accessible and inexpensive, and it saved significant time in conducting high-level qualitative data extraction for a pragmatic review. Outputs were standardized and easy to manipulate and integrate into our existing work process. Extracting accurate granular data for a systematic review proved more difficult: the model failed to interpret the complexities of patient flow, struggled to respond accurately to lengthy, detailed prompts, and the subsequent checking, correcting, and formatting outweighed any time saved. The model identified some relevant content for conducting risk of bias assessment with the Cochrane RoB 1 tool, although it lacked context, and human judgement was needed for final decision-making.
CONCLUSIONS: LLM chat interfaces offer significant time savings for pragmatic reviews, although copyright issues arise when uploading published papers for synthesis. Optimal performance for systematic reviews is unlikely to be achieved without fine-tuning a version of the model on archive data; this process is currently costly, commercial confidentiality must be considered, and the skill set required is beyond the scope of many review teams. Developers should ensure that any LLM-based reviewing tools can be integrated into clients’ existing processes through standardized import and export formats such as CSV or RIS.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
MSR118
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas