Developing and Testing AI-Generated PICOS Summaries to Aid in Literature Reviews
Speaker(s)
Rawal A1, Ashworth L2, Luedke H3, Tiwari S3, Thomas C4, Murton M4
1Costello Medical, Boston, MA, USA, 2Costello Medical, Manchester, UK, 3Costello Medical, London, UK, 4Costello Medical, Cambridge, UK
OBJECTIVES: Ever-increasing volumes of published literature make in-depth literature reviews increasingly resource-intensive. Artificial intelligence (AI)-based tools may improve efficiency. This project aimed to incorporate AI-generated PICOS summaries, consisting of a concise bullet point for each PICOS domain, into a bespoke web application for abstract screening. Preliminary testing of time savings and accuracy during abstract screening was then conducted.
METHODS: Different generative AI models and parameter settings were tested, and prompts were engineered to produce the desired output, including development of a generic context prompt that could be applied to any abstract. A workflow was then established to integrate the output into a bespoke web application that manages record screening in literature reviews. Changes in efficiency and accuracy attributable to the PICOS summaries were measured separately in two test reviews. To measure efficiency, reviewers recorded their screening rate (abstracts/hour) with and without the PICOS summaries. To estimate accuracy, reviewers compared each generated PICOS summary against the content of the corresponding abstract.
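As a rough illustration of the prompt-engineering step described above, the sketch below shows how a fixed, generic context prompt might be paired with any abstract to request one concise bullet per PICOS domain. The prompt wording, function names and JSON keys are assumptions for illustration only, not the authors' actual prompt or application code.

```python
# Illustrative sketch only: prompt wording, helper names and JSON keys are
# assumptions, not taken from the authors' application.

# A generic context prompt that can be applied to any abstract. It asks for
# one concise bullet per PICOS domain, returned as JSON.
CONTEXT_PROMPT = (
    "You summarise study abstracts to support literature-review screening. "
    "Return a JSON object with the keys 'population', 'intervention', "
    "'comparator', 'outcomes' and 'study_design'. Each value must be a single "
    "concise bullet point based only on the abstract provided."
)

def build_messages(abstract_text: str) -> list[dict]:
    """Pair the fixed context prompt with an arbitrary abstract."""
    return [
        {"role": "system", "content": CONTEXT_PROMPT},
        {"role": "user", "content": abstract_text},
    ]

if __name__ == "__main__":
    example = "Background: ... Methods: ... Results: ... Conclusions: ..."
    for message in build_messages(example):
        print(message["role"], ":", message["content"][:60])
```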
RESULTS: The final selected model was gpt-3.5-turbo with a temperature setting of 0.2. The context prompt specified that the response must be returned in JSON format. In the efficiency test project (a targeted review of respiratory disease databases; four reviewers), 1,522 abstracts were divided into two equal groups (with and without AI-generated PICOS summaries) and reviewed by a single individual. Abstract review rates were 60/hour without versus 90/hour with the PICOS summaries, a 50% increase in efficiency. In the accuracy test project (a systematic review in an infectious disease; six reviewers), 4% of articles (38/1,009) were identified as having an incorrect PICOS summary.
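The sketch below shows one way the reported configuration (gpt-3.5-turbo, temperature 0.2, JSON-formatted response) might be invoked and parsed before a summary is passed to a screening application. It assumes the OpenAI Python client and an API key in the environment; the prompt text, key names and error handling are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch, assuming the OpenAI Python client and an API key in the
# environment; prompt text and JSON keys are assumptions, not the authors' code.
import json
from openai import OpenAI

client = OpenAI()

PICOS_KEYS = ["population", "intervention", "comparator", "outcomes", "study_design"]
CONTEXT_PROMPT = (
    "Summarise the abstract as a JSON object with one concise bullet point "
    "for each of these keys: " + ", ".join(PICOS_KEYS) + "."
)

def picos_summary(abstract_text: str) -> dict | None:
    """Request a PICOS summary using the settings reported above."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # final selected model
        temperature=0.2,        # final selected temperature
        messages=[
            {"role": "system", "content": CONTEXT_PROMPT},
            {"role": "user", "content": abstract_text},
        ],
    )
    try:
        summary = json.loads(response.choices[0].message.content)
    except json.JSONDecodeError:
        return None  # flag for manual review rather than showing a malformed summary
    # Keep only the expected PICOS fields before display in the screening app.
    return {key: summary.get(key, "") for key in PICOS_KEYS}
```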
CONCLUSIONS: In limited beta testing, AI-generated PICOS summaries demonstrated potential to improve efficiency in the abstract review stage of literature reviews without compromising accuracy. Additional research is planned to test this technology in other types of literature review and with alternative methodologies.
Code
MSR107
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas