An Artificial Intelligence (AI)-Assisted Systematic Literature Review (SLR) of the Economic Burden in Metastatic Pancreatic Adenocarcinoma: A Proof-of-Concept Study
Author(s)
Carolina Casañas i Comabella, MSc1, Mansee Jajoo, MSc2, Jing Wang-Silvanto, PhD3, He Guo, MPP, MSc4, Allie Cichewicz, MSc5, Rishi Ohri, MSc2.
1Evidera, London, United Kingdom, 2Astellas Pharma Global Development, Inc., Chicago, IL, USA, 3Global HEOR, Astellas Pharma Ltd, Helsinki, Finland, 4Astellas Pharma Ltd, Harrison, NJ, USA, 5Evidera, Wilmington, NC, USA.
OBJECTIVES: To explore the potential efficiencies of leveraging AI to assist with the screening and extraction stages of an SLR and inform future use cases.
METHODS: A traditional SLR on the economic burden of metastatic pancreatic adenocarcinoma was replicated by deploying AI models from Nested Knowledge® where possible. Titles and abstracts were screened independently by one human reviewer and a machine-learning (ML) model trained on human-reviewed records. Each full text was screened and extracted independently by one human reviewer and a large language model (LLM), with the LLM using prompts based on Population, Intervention, Comparator, Outcome, Study design (PICOS) criteria.
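The abstract does not disclose the actual prompts or the Nested Knowledge® interface, so the following is only a minimal sketch of what a PICOS-criteria screening prompt for an LLM might look like; the helper name, dictionary, and criterion wording are hypothetical placeholders, not the study's materials.

```python
# Hypothetical sketch of a PICOS-based full-text screening prompt (not the study's
# actual prompt or the Nested Knowledge(R) API). Criterion wording is placeholder text
# informed only by the review topic stated in the abstract.
PICOS_CRITERIA = {
    "Population": "<e.g. patients with metastatic pancreatic adenocarcinoma>",
    "Intervention": "<intervention criterion, if any>",
    "Comparator": "<comparator criterion, if any>",
    "Outcomes": "<e.g. costs, healthcare resource use, economic burden>",
    "Study design": "<eligible publication/study types>",
}

def build_screening_prompt(full_text: str) -> str:
    """Compose a full-text screening prompt from PICOS criteria (hypothetical helper)."""
    criteria = "\n".join(f"- {name}: {rule}" for name, rule in PICOS_CRITERIA.items())
    return (
        "Decide whether the study below meets ALL of the following PICOS criteria.\n"
        f"{criteria}\n"
        "Answer INCLUDE or EXCLUDE and name the first criterion that fails.\n\n"
        f"Study text:\n{full_text}"
    )
```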
RESULTS: After training on 25% of 1,307 titles and abstracts, the ML model achieved an accuracy, recall, and precision of 87%, 82%, and 40%, respectively, and advanced more records to full-text screening than the human reviewer (251 vs 85). During full-text screening, the LLM included only 27.7% of the records deemed eligible by the traditional SLR (17 vs 61). This was largely driven by the LLM's inability to identify publication/study type (81.4%) and disease stage (69.2%) information in the records. During extraction, mean accuracy was 72.93%, with 13.35% incorrect, 7.92% missing, and 5.28% incomplete data. AI reduced title/abstract screening time by 44% compared with human review but advanced 166 more records to full-text screening. LLM-based data extraction reduced time by 41%.
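For readers less familiar with the screening metrics reported above, the sketch below shows how accuracy, recall, and precision are conventionally computed from a 2×2 confusion matrix of AI versus human inclusion decisions. The underlying counts are not reported in the abstract, so the numbers used here are placeholders for illustration only.

```python
# Standard definitions of the screening metrics named in the abstract, assuming
# human decisions are the reference standard. Counts are illustrative placeholders,
# not the study's confusion matrix.

def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, recall, and precision of AI inclusion decisions vs. human decisions."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # records where AI agreed with the human reviewer
        "recall": tp / (tp + fn),        # human-included records the AI also included
        "precision": tp / (tp + fp),     # AI-included records the human also included
    }

# Placeholder example (not the study's data): accuracy 0.86, recall 0.80, precision 0.40
print(screening_metrics(tp=80, fp=120, fn=20, tn=780))
```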
CONCLUSIONS: AI saved time during title and abstract screening compared with human review; however, this saving was offset by the larger number of records AI advanced to full-text screening and by studies being wrongly excluded at both screening levels. While AI also saved time during data extraction, considerable rates of incorrect, missing, and incomplete information highlight the need for human supervision to ensure high-quality results. Based on this study using Nested Knowledge®, AI may be best suited to rapid, targeted reviews or scoping activities, where speed is prioritized over perfect accuracy. Further improvements in AI-supported SLRs are needed.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
PT3
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, SDC: Oncology