Non-Systematic Literature Reviews: Can AI Enhance Current Methods?

Author(s)

Baisley W1, Perriello L2, Shoushi G2, Nguyen K3, Lahue B2
1Alkemi LLC, Georgetown, TX, USA, 2Alkemi LLC, Manchester Center, VT, USA, 3University of Illinois at Chicago College of Pharmacy, Chicago, IL, USA

OBJECTIVES: Targeted literature reviews (TLRs) involve non-systematic queries of literature databases followed by synthesis of the results. The objective of this methods research was to evaluate artificial intelligence (AI) tools for conducting TLRs.

METHODS: Guided by an engineering framework, we studied process, efficiency, and quality for an abbreviated TLR method. Research steps were detailed, two rare-disease prompts were formulated, and outcomes were pre-specified for the Control and AI methods. Prompt#1 requested five health economic sources published in 2018 or later; Prompt#2 requested three health economic sources with a one-paragraph narrative. Three researchers entered each prompt into three AI tools (ChatGPT, Microsoft Bing, Google Bard). An independent researcher served as Control, using the same TLR prompts to identify abstracts in PubMed. Researchers documented TLR steps and total time, and verified each source’s accuracy, topic-relevance, and publication date (2018-2023 required for Prompt#1 only). The overall quality of each Prompt#2 response was ranked by a blinded independent researcher on topic-relevance, format, and readability (each criterion scored 1=worst to 3=best; 9 points total).
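The blinded quality ranking described above sums three criterion scores into a 9-point total. A minimal sketch of that tally (function and variable names are hypothetical, not from the study):

```python
def quality_score(topic_relevance: int, format_rank: int, readability: int) -> int:
    """Sum three blinded rankings (each 1=worst to 3=best) into a 9-point total."""
    for rank in (topic_relevance, format_rank, readability):
        if not 1 <= rank <= 3:
            raise ValueError("each ranking must be between 1 and 3")
    return topic_relevance + format_rank + readability

# A response ranked best on all three criteria earns the maximum 9 points.
print(quality_score(3, 3, 3))  # → 9
```

Averaging these totals across evaluated responses yields per-tool scores on the 0-9 scale reported in the results.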

RESULTS: Twenty TLR responses (n=18 AI, n=2 Control) were generated. Compared with Control, the AI methods reduced the total number of TLR research process steps, despite adding two new quality-control steps. AI responses required less total time (Prompt#1: 6:22-7:30; Prompt#2: 4:37-7:23 [min:sec]) than Control (Prompt#1: 9:55; Prompt#2: 15:49). Of 45 requested sources for Prompt#1, AI tools accurately returned 18 (40%), with 17/18 (94%) deemed topic-relevant and 12/18 (71%) recently published. For Prompt#2, AI tools accurately returned 93% (25/27) of requested sources, with 100% (25/25) topic-relevant. One AI tool could not identify sources published after 2021. Control-sourced references were 100% (8/8) topic-relevant and recently published. Blinded evaluation of the Prompt#2 narratives found variable quality by AI tool (range: 6.3-8.7 of 9 points).

CONCLUSIONS: Despite time savings, TLRs executed with AI may yield unacceptable response quality while requiring careful prompt curation and additional source-verification steps. Given the high quality of standard TLR methods and the rapid evolution of research-specific AI platforms, further methods research is required.

Conference/Value in Health Info

2023-11, ISPOR Europe 2023, Copenhagen, Denmark

Value in Health, Volume 26, Issue 11, S2 (December 2023)

Code

MSR49

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, Rare & Orphan Diseases

