Non-Systematic Literature Reviews: Can AI Enhance Current Methods?
Author(s)
Baisley W1, Perriello L2, Shoushi G2, Nguyen K3, Lahue B2
1Alkemi LLC, Georgetown, TX, USA, 2Alkemi LLC, Manchester Center, VT, USA, 3University of Illinois at Chicago College of Pharmacy, Chicago, IL, USA
OBJECTIVES: Targeted literature reviews (TLRs) involve non-systematic querying of literature databases followed by synthesis of the results. The objective of this methods research was to evaluate artificial intelligence (AI) tools for conducting TLRs.
METHODS: Guided by an engineering framework, we evaluated process, efficiency, and quality for an abbreviated TLR method. Research steps were detailed, two rare-disease prompts were formulated, and outcomes were pre-specified for the Control and AI methods. Prompt#1 requested five health economic sources published in 2018 or later; Prompt#2 requested three health economic sources with a one-paragraph narrative. Three researchers entered each prompt into three AI tools (ChatGPT, Microsoft Bing, Google Bard). An independent researcher served as the Control, using the same TLR prompts to identify abstracts in PubMed. Researchers documented TLR steps and total time, and verified each source's accuracy, topic-relevance, and publication date (2018-2023 required for Prompt#1 only). The overall quality of each Prompt#2 response was scored by a blinded independent researcher on topic-relevance, format, and readability (each criterion scored 1=worst to 3=best; 9 points maximum).
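The three-criterion quality rubric above can be sketched as follows. This is a minimal illustration only; the function names and the per-tool averaging step are assumptions for clarity, not the authors' actual scoring procedure.

```python
def score_response(topic_relevance: int, fmt: int, readability: int) -> int:
    """Sum three criterion ratings (each 1=worst to 3=best) into a total out of 9."""
    for rating in (topic_relevance, fmt, readability):
        if not 1 <= rating <= 3:
            raise ValueError("each criterion must be rated 1-3")
    return topic_relevance + fmt + readability


def mean_tool_score(totals: list[int]) -> float:
    """Average the total scores of one tool's responses (hypothetical aggregation)."""
    return sum(totals) / len(totals)


# Example with hypothetical ratings for three responses from one tool:
tool_scores = [
    score_response(3, 3, 3),
    score_response(3, 2, 3),
    score_response(2, 3, 3),
]
average = mean_tool_score(tool_scores)
```

Averaging per-response totals in this way would yield tool-level values on the 0-9 scale comparable to the 6.3-8.7 range reported in the results.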
RESULTS: Twenty TLR responses (n=18 AI, n=2 Control) were generated. Compared with the Control, the AI methods reduced the total number of TLR process steps, despite adding two new quality-control steps. AI responses also reduced total time (Prompt#1: 6:22-7:30; Prompt#2: 4:37-7:23 min:sec) compared with the Control (Prompt#1: 9:55; Prompt#2: 15:49). Of 45 requested sources for Prompt#1, the AI tools accurately returned 18 (40%), of which 17/18 (94%) were deemed topic-relevant and 12/18 (67%) were recently published. For Prompt#2, the AI tools accurately returned 93% (25/27) of requested sources, with 100% (25/25) topic-relevant. One AI tool could not identify sources published after 2021. Control-sourced references were 100% (8/8) topic-relevant and recently published. Blinded evaluation of the Prompt#2 narratives found variable quality across AI tools (range: 6.3-8.7 of 9 points).
CONCLUSIONS: Despite time savings, TLRs executed with AI may yield unacceptable response quality while requiring careful prompt curation and additional source-verification steps. Given the high quality of standard TLR methods and the evolving landscape of research-specific AI platforms, further methods research is required.
Conference/Value in Health Info
Value in Health, Volume 26, Issue 11, S2 (December 2023)
Code
MSR49
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, Rare & Orphan Diseases