Developing and Testing AI-Generated PICOS Summaries to Aid in Literature Reviews

Speaker(s)

Rawal A1, Ashworth L2, Luedke H3, Tiwari S3, Thomas C4, Murton M4
1Costello Medical, Boston, MA, USA, 2Costello Medical, Manchester, UK, 3Costello Medical, London, UK, 4Costello Medical, Cambridge, UK

OBJECTIVES: The ever-increasing volume of published literature makes in-depth literature reviews highly resource-intensive. Artificial intelligence (AI)-based tools may improve efficiency. This project aimed to incorporate AI-generated PICOS summaries, consisting of one concise bullet point per PICOS domain (population, intervention, comparator, outcomes, study design), into a bespoke web application for abstract screening. Preliminary testing of time savings and accuracy during abstract screening was conducted.

METHODS: Different generative AI models and parameter settings were tested, and prompt engineering was conducted to achieve the desired output, including development of a generic context prompt applicable to any abstract. A workflow was established to integrate the output into a bespoke web application that manages record screening in literature reviews. Changes in efficiency and accuracy attributable to the PICOS summaries were measured separately in two test reviews. To measure efficiency, reviewers recorded their screening rate (abstracts/hour) with and without the PICOS summaries. To estimate accuracy, reviewers compared each generated PICOS summary against the corresponding abstract content.

RESULTS: The final selected model was gpt-3.5-turbo with a temperature setting of 0.2. The context prompt required the response to be returned in JSON format. In the efficiency test project (targeted review of respiratory disease databases; four reviewers), 1,522 abstracts were divided into two equal groups (with and without AI-generated PICOS summaries), with each abstract reviewed by a single individual. Abstract review rates were 60/hour without vs 90/hour with PICOS summaries, representing a 50% increase in efficiency. In the accuracy test project (systematic review of an infectious disease; six reviewers), 4% of articles (38/1,009) were identified as having an incorrect PICOS summary.
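As a rough illustration of the configuration reported above, the request construction and response validation might be sketched as follows in Python. The model name and temperature come from the abstract; the prompt wording, JSON keys, and function names are illustrative assumptions, not the authors' actual implementation.

```python
import json

# Assumed generic context prompt; the authors' actual wording is not published.
CONTEXT_PROMPT = (
    "You are assisting with literature review screening. For the abstract "
    "provided, return a JSON object with the keys 'population', "
    "'intervention', 'comparator', 'outcomes' and 'study_design', each "
    "containing one concise bullet-point summary of that PICOS domain."
)

def build_request(abstract_text: str) -> dict:
    """Build a chat-completion request payload for a PICOS summary."""
    return {
        "model": "gpt-3.5-turbo",   # final selected model (per abstract)
        "temperature": 0.2,         # final selected temperature (per abstract)
        "messages": [
            {"role": "system", "content": CONTEXT_PROMPT},
            {"role": "user", "content": abstract_text},
        ],
    }

def parse_summary(response_text: str) -> dict:
    """Validate that the model's reply is JSON covering all five PICOS domains."""
    summary = json.loads(response_text)
    expected = {"population", "intervention", "comparator",
                "outcomes", "study_design"}
    missing = expected - summary.keys()
    if missing:
        raise ValueError(f"PICOS summary missing domains: {sorted(missing)}")
    return summary
```

In a workflow like the one described, the payload would be sent to the model API for each abstract and the validated JSON stored alongside the record in the screening application, so reviewers see one bullet per domain next to each abstract.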

CONCLUSIONS: In limited beta testing, AI-generated PICOS summaries demonstrated substantial potential to improve efficiency in the abstract screening stage of literature reviews without compromising accuracy. Additional research is planned to test this technology in other literature review types and with alternative methodologies.

Code

MSR107

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas