Can Artificial Intelligence (AI) Large Language Models (LLMs) Such as Generative Pre-Trained Transformer (GPT) Be Used to Automate Literature Reviews?

Author(s)

Guerra I¹, Gallinaro J¹, Rtveladze K², Lambova A¹, Asenova E¹
¹IQVIA, London, LON, UK; ²IQVIA, London, UK

OBJECTIVES: In healthcare, systematic literature reviews (SLRs) are commonly required to support market access activities for new products. The SLR process involves multiple time-consuming and potentially error-prone steps, such as publication screening, data extraction and reporting. The objective was to test whether AI could help automate the most time-consuming of these tasks, namely clinical data extraction, in a complex oncology indication.

METHODS: We implemented an algorithm that uses AI LLMs, such as GPT, to generate the first draft of the clinical data extraction file in Excel®, including information on study details, patient characteristics and interventions. We assessed performance by measuring the accuracy of the GPT-based extraction against a manual extraction performed by humans. For free-text variables, accuracy was estimated with BERTScore, an evaluation metric for text generation. We then iteratively engineered parts of the GPT-based extraction algorithm and re-evaluated performance for selected variables that had performed poorly.
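The evaluation step can be illustrated with a minimal Python sketch. The abstract does not specify how matches were scored, so this assumes exact, case-insensitive matching for categorical variables and a simplified BERTScore for free text: each token in one text is greedily matched to its most cosine-similar token in the other, giving precision, recall and their harmonic mean (F1). The function names and the toy two-dimensional vectors are hypothetical. A real BERTScore uses contextual embeddings from a pretrained transformer (for example via the open-source `bert-score` package) and can add idf weighting and baseline rescaling, both omitted here.

```python
import math


def categorical_accuracy(ai_values, human_values):
    """Share of entries where the AI-extracted value equals the human one.

    Assumption (not from the abstract): values are compared after trimming
    whitespace and lower-casing.
    """
    if len(ai_values) != len(human_values) or not human_values:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(
        a.strip().lower() == h.strip().lower()
        for a, h in zip(ai_values, human_values)
    )
    return matches / len(human_values)


def _cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def bertscore_f1(cand_emb, ref_emb):
    """Simplified BERTScore F1 from per-token embeddings (no idf, no rescaling).

    cand_emb: token vectors of the AI-generated text (hypothetical input)
    ref_emb:  token vectors of the human-extracted text (hypothetical input)
    """
    sim = [[_cosine(c, r) for r in ref_emb] for c in cand_emb]
    # Precision: each candidate token matched to its closest reference token.
    precision = sum(max(row) for row in sim) / len(cand_emb)
    # Recall: each reference token matched to its closest candidate token.
    recall = sum(
        max(sim[i][j] for i in range(len(cand_emb))) for j in range(len(ref_emb))
    ) / len(ref_emb)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical study-arm values: AI matches the human extraction in 3 of 4 studies.
    ai = ["Multi", "single", "multi", "single"]
    human = ["multi", "single", "multi", "multi"]
    print(categorical_accuracy(ai, human))  # 0.75

    # Hypothetical 2-D "embeddings": the AI text covers only one of two reference tokens.
    ref = [[1.0, 0.0], [0.0, 1.0]]
    cand = [[1.0, 0.0]]
    print(round(bertscore_f1(ref, ref), 3))   # 1.0 (identical texts)
    print(round(bertscore_f1(cand, ref), 3))  # 0.667 (precision 1.0, recall 0.5)
```

Scoring categorical and free-text variables separately reflects the abstract's distinction: an exact match is meaningful for fields such as study arm design, whereas a semantic similarity score tolerates rewording in fields such as inclusion and exclusion criteria.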

RESULTS: Across the measured variables, extraction accuracy with the initial (pre-engineering) version of the algorithm ranged from 17% to 100%. Variables where the AI performed well included study arm design (single- or multi-arm), primary endpoints and crossover, which were extracted with 100%, 87% and 77% accuracy, respectively. Iterative engineering of the GPT-based algorithm improved extraction accuracy for variables where the AI initially performed poorly: for example, accuracy increased from 40% to 70% for patient inclusion criteria and from 35% to 80% for patient exclusion criteria.

CONCLUSIONS: These results suggest that AI LLMs such as GPT, combined with iterative algorithm engineering, could be used to generate a first draft of the extraction file with good accuracy. Provided a human subject matter expert performs a quality check, these tools could make data extraction more efficient than manual extraction alone.

Conference/Value in Health Info

2023-11, ISPOR Europe 2023, Copenhagen, Denmark

Value in Health, Volume 26, Issue 11, S2 (December 2023)

Code

MSR92

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
