Bigger, Better Studies: Prioritizing Literature Review Using an Approach Based on Large Language Models
Author(s)
Polly Field, MSc, DPhil, Christian Eichinger, PhD, Marta Radwan, PhD, Kim Wager, BSc (Hons), MSc, DPhil.
Oxford PharmaGenesis, Oxford, United Kingdom.
OBJECTIVES: To develop an artificial intelligence (AI)-supported approach for rapidly and reproducibly reviewing large bodies of literature across multiple questions, with sensitivity that can be dialled up or down according to data availability, to prioritize the most robust and relevant publications (according to factors that cannot be specified in search strings).
METHODS: We designed a sensitive, reproducible Boolean search strategy, searched bibliographic databases via OVID, and exported the results to Excel. Subject matter experts (SMEs) and AI co-developed the prompt-engineering strategy for initial title/abstract screening with GPT-4o; Python was then used for data cleaning and processing (e.g. deduplication by DOI and parsing GPT-4o outputs into individual columns). Elicit was used for multiple rounds of data extraction (limited datapoints) ahead of subsequent prompting rounds, yielding a ranking of publications by question and data relevance. SMEs reviewed the output, tightening or relaxing selection criteria according to data availability, then used Elicit to extract additional data and reported the results. To maintain and verify quality, SMEs refined prompts based on the initial responses (checked against human decisions), and the data extraction used for prioritization was subject to spot checks of binary screening decisions and categories. We also compared machine run time with manual time.
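The Python post-processing steps named above (deduplication by DOI and parsing model outputs into individual columns) could be sketched as follows. This is an illustrative sketch only, not the authors' actual pipeline; the record fields and the "decision|reason" output format are assumptions for the example.

```python
# Illustrative sketch of two post-processing steps described in the abstract:
# deduplication by DOI and splitting a structured screening response into
# separate columns. Field names and response format are hypothetical.

def deduplicate_by_doi(records):
    """Keep the first record seen for each DOI; records without a DOI are kept."""
    seen = set()
    unique = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()  # normalize for matching
        if doi:
            if doi in seen:
                continue  # drop later duplicates of the same DOI
            seen.add(doi)
        unique.append(rec)
    return unique

def parse_screening_output(raw):
    """Split an assumed 'decision|reason' model response into named columns."""
    decision, _, reason = raw.partition("|")
    return {"decision": decision.strip(), "reason": reason.strip()}
```

In practice the parsed columns would be written back alongside the bibliographic data (e.g. in Excel) so SMEs can spot-check binary screening decisions against the model's stated reasons.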
RESULTS: A test project identified 23 997 publications (after deduplication) by OVID searching; initial screening reduced this number to 9532. Approximately 10 rounds of characterization led to 250 prioritized publications for SME review, with 120 reported in full. Screening required time for prompt setup, data processing, prompt iterations and SME review; this was <20% of the estimated time for the fully manual workflow. Total machine run time was approximately 50 hours.
CONCLUSIONS: This approach, supported by large language models, allows rapid reviews of large bodies of literature based on sensitive, reproducible OVID searches. The iterative approach to screening and data extraction allows SMEs to adjust sensitivity and prioritize the most relevant studies.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
SA17
Topic
Study Approaches
Topic Subcategory
Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas