Developing and Testing AI-Generated PICOS Summaries to Aid in Literature Reviews

Speaker(s)

Rawal A1, Ashworth L2, Luedke H3, Tiwari S3, Thomas C4, Murton M4
1Costello Medical, Boston, MA, USA, 2Costello Medical, Manchester, UK, 3Costello Medical, London, UK, 4Costello Medical, Cambridge, UK

OBJECTIVES: The ever-increasing volume of published literature makes in-depth literature reviews highly resource-intensive. Artificial intelligence (AI)-based tools may improve efficiency. This project aimed to incorporate AI-generated PICOS summaries, consisting of one concise bullet point per PICOS domain (population, intervention, comparator, outcomes, study design), into a bespoke web application for abstract screening. Preliminary testing of time savings and accuracy during abstract screening was conducted.

METHODS: Different generative AI models and parameter settings were tested, and prompt engineering was conducted to achieve the desired output, including development of a generic context prompt applicable to any abstract. A workflow was established to integrate the output into a bespoke web application that manages record screening in literature reviews. Changes in efficiency and accuracy attributable to the PICOS summaries were measured separately in two test reviews. To measure efficiency, reviewers recorded their screening rate (abstracts/hour) with and without the PICOS summaries. To estimate accuracy, reviewers compared each generated PICOS summary against the corresponding abstract content.

RESULTS: The final selected model was gpt-3.5-turbo with a temperature setting of 0.2. The context prompt required the response to be returned in JSON format. In the efficiency test project (targeted review of respiratory disease databases; four reviewers), 1,522 abstracts were divided into two equal groups (with and without AI-generated PICOS summaries), with each abstract reviewed by a single individual. Abstract review rates were 60/hour without vs 90/hour with PICOS summaries, representing a 50% increase in efficiency. In the accuracy test project (systematic review of an infectious disease; six reviewers), 4% of articles (38/1,009) were identified as having an incorrect PICOS summary.
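As a rough illustration of the configuration reported above, the request construction and response validation might be sketched as follows in Python. The model name and temperature come from the abstract; the prompt wording, JSON keys, and function names are illustrative assumptions, not the authors' actual implementation.

```python
import json

# Assumed generic context prompt; the authors' actual wording is not published.
CONTEXT_PROMPT = (
    "You are assisting with literature review screening. For the abstract "
    "provided, return a JSON object with the keys 'population', "
    "'intervention', 'comparator', 'outcomes' and 'study_design', each "
    "containing one concise bullet-point summary of that PICOS domain."
)

def build_request(abstract_text: str) -> dict:
    """Build a chat-completion request payload for a PICOS summary."""
    return {
        "model": "gpt-3.5-turbo",   # final selected model (per abstract)
        "temperature": 0.2,         # final selected temperature (per abstract)
        "messages": [
            {"role": "system", "content": CONTEXT_PROMPT},
            {"role": "user", "content": abstract_text},
        ],
    }

def parse_summary(response_text: str) -> dict:
    """Validate that the model's reply is JSON covering all five PICOS domains."""
    summary = json.loads(response_text)
    expected = {"population", "intervention", "comparator",
                "outcomes", "study_design"}
    missing = expected - summary.keys()
    if missing:
        raise ValueError(f"PICOS summary missing domains: {sorted(missing)}")
    return summary
```

In a workflow like the one described, the payload would be sent to the model API for each abstract and the validated JSON stored alongside the record in the screening application, so reviewers see one bullet per domain next to each abstract.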

CONCLUSIONS: In limited beta testing, AI-generated PICOS summaries demonstrated substantial potential to improve efficiency in the abstract screening stage of literature reviews without compromising accuracy. Additional research is planned to test this technology in other literature review types and with alternative methodologies.

Code

MSR107

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas