Bigger, Better Studies: Prioritizing Literature Review Using an Approach Based on Large Language Models

Author(s)

Polly Field, MSc, DPhil, Christian Eichinger, PhD, Marta Radwan, PhD, Kim Wager, BSc (Hons), MSc, DPhil.
Oxford PharmaGenesis, Oxford, United Kingdom.
OBJECTIVES: To develop an artificial intelligence (AI)-supported approach to rapidly and reproducibly review large bodies of literature across multiple questions, prioritizing the most robust and relevant publications (according to factors that cannot be specified in search strings) and dialling sensitivity up or down according to data availability.
METHODS: We designed a sensitive, reproducible Boolean search strategy, searched bibliographic databases via OVID and exported the results to Excel. AI tools and subject matter experts (SMEs) co-developed the prompt-engineering strategy for initial title/abstract screening (GPT-4o, followed by Python for data cleaning and processing, e.g. deduplication by DOI and parsing GPT-4o outputs into individual columns). Elicit was used for multiple data extraction rounds (limited datapoints) ahead of subsequent prompting rounds, yielding a ranking of publications by question and data relevance. SMEs reviewed the output, tightening or relaxing selection criteria according to data availability, then used Elicit to extract additional data and reported the results. To maintain and verify quality, SMEs refined prompts based on the initial responses (checked against human decisions), and the data extraction used for prioritization was subject to spot checks of binary screening decisions or categories. We also compared machine run time with manual time.
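The Python cleaning step described above (deduplication by DOI and parsing of GPT-4o outputs into individual columns) could be sketched as follows. This is a minimal illustration, not the authors' code: the column names (`doi`, `gpt_response`) and the pipe-delimited `decision|reason` response format are hypothetical assumptions about how the model was prompted to reply.

```python
import pandas as pd


def dedupe_and_parse(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate screening records by DOI and split a single
    GPT-4o response column into separate decision/reason columns.

    Assumes (hypothetically) that the model was prompted to answer
    in the form 'decision|reason'.
    """
    # Normalize DOIs so case/whitespace variants collapse to one record
    df = df.assign(doi=df["doi"].str.strip().str.lower())
    df = df.drop_duplicates(subset="doi", keep="first")
    # Parse the raw GPT-4o output into individual columns
    parsed = df["gpt_response"].str.split("|", n=1, expand=True)
    return df.assign(
        decision=parsed[0].str.strip(),
        reason=parsed[1].str.strip(),
    ).reset_index(drop=True)


# Toy example: two DOI variants of the same record plus one other
records = pd.DataFrame({
    "doi": ["10.1000/abc", "10.1000/ABC ", "10.1000/xyz"],
    "gpt_response": [
        "include|reports relevant efficacy data",
        "include|duplicate of earlier record",
        "exclude|no relevant outcomes",
    ],
})
screened = dedupe_and_parse(records)
```

In practice the binary `decision` column would feed the screening counts, while the `reason` column supports the SME spot checks of screening decisions mentioned above.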
RESULTS: A test project identified 23 997 publications (after deduplication) by OVID searching; initial screening reduced this number to 9532. Approximately 10 rounds of characterization led to 250 prioritized publications for SME review, with 120 reported in full. Screening required time for prompt setup, data processing, prompt iterations and SME review; this was <20% of the estimated time for the fully manual workflow. Total machine run time was approximately 50 hours.
CONCLUSIONS: This approach, supported by large language models, allows rapid reviews of large bodies of literature based on sensitive, reproducible OVID searches. The iterative approach to screening and data extraction allows SMEs to adjust sensitivity and prioritize the most relevant studies.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

SA17

Topic

Study Approaches

Topic Subcategory

Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
