An AI-Powered RAG Based Framework for Data Extractions in Systematic Literature Reviews

Author(s)

Rajdeep Kaur, PhD¹, Barinder Singh, RPh², Pankaj Rai, MS¹, Vedant Soni, BE¹, Sunil Kumar, M.Pharm¹, Mrinal Mayank, BE¹;
¹Pharmacoevidence, Mohali, India, ²Pharmacoevidence, London, United Kingdom


OBJECTIVES: Data extraction is an important step in systematic literature reviews (SLRs), used to collect detailed information from the included studies. The aim of this study was to develop an AI-automated, RAG-driven platform that streamlines the extraction of relevant information from studies included in SLRs and reduces the time required for data extraction.
METHODS: The Embase and MEDLINE databases were searched to identify cost-burden studies conducted in patients with retinitis pigmentosa (RP) and published in the last 15 years (2009 to 2024). A dynamic Retrieval-Augmented Generation (RAG) pipeline was developed in which article content was first standardized using optical character recognition (OCR). The standardized content was then divided into small chunks, and their embeddings were stored in a vector database. A multi-agent approach was used within this framework to extract the relevant information. Domain experts with at least 10 years of experience evaluated the extraction results and cross-verified them against the data extraction grid to ensure accuracy and consistency.
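The retrieval steps described in the methods (chunking standardized text, embedding the chunks, storing them in a vector database, and retrieving context for extraction agents) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the OCR tool, chunk size, embedding model, vector database, or agent design. Here the input is assumed to be text already standardized by OCR, the bag-of-words `embed` function stands in for a learned embedding model, the hypothetical `InMemoryVectorStore` class stands in for a vector database, and the chunk sizes, agent names (`AGENT_FIELDS`), and prompt wording are all invented for illustration. The call to a language model that would consume each prompt is deliberately omitted.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split OCR-standardized text into overlapping windows of `size` words.

    The default size and overlap are hypothetical; the abstract only says "small chunks".
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window's overlap.
    return [" ".join(words[start:start + size])
            for start in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding: a term-frequency vector over lowercase word tokens.

    Assumed stand-in for the (unnamed) learned embedding model used in the study.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical list-backed stand-in for a vector database: stores (chunk, embedding) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Hypothetical agents, one per extraction domain listed in the abstract's results.
AGENT_FIELDS = {
    "study_agent": "study characteristics",
    "population_agent": "population characteristics",
    "cost_agent": "direct and indirect cost outcomes",
    "findings_agent": "key findings",
}


def build_agent_prompts(store: InMemoryVectorStore, k: int = 3) -> dict[str, str]:
    """Retrieve context per field and build one prompt per agent (the LLM call is omitted)."""
    prompts = {}
    for agent, field in AGENT_FIELDS.items():
        context = "\n---\n".join(store.search(field, k))
        prompts[agent] = (f"Using only the context below, extract the {field}. "
                          f"Answer 'Not reported' if absent.\n\nContext:\n{context}")
    return prompts


if __name__ == "__main__":
    # Invented sample text for illustration only; not taken from any included study.
    sample = ("Study design was a retrospective claims analysis. "
              "Direct costs covered outpatient visits and low vision aids. "
              "Indirect costs covered productivity loss and caregiver time.")
    store = InMemoryVectorStore()
    for chunk in chunk_text(sample, size=8, overlap=2):
        store.add(chunk)
    print(store.search("indirect costs productivity loss", k=1)[0])
    print(build_agent_prompts(store, k=1)["cost_agent"])
```

The bag-of-words embedding keeps the sketch dependency-free and easy to inspect; a production pipeline of the kind the abstract describes would presumably swap `embed` and `InMemoryVectorStore` for a learned embedding model and a dedicated vector database, while keeping the same chunk, store, retrieve, and prompt flow.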
RESULTS: The SLR included a total of six studies conducted in the United States (US) (n=2), Japan (n=2), Spain (n=1), and in a multinational setting (US and Canada, n=1). The AI platform was used to extract study characteristics, population characteristics, direct and indirect cost outcomes, and key findings from the included studies. Domain experts rated 92% of the AI-extracted responses as “strongly agree”. However, in two instances (approximately 8% of the tested prompts), the AI introduced extra content, noise, or hallucinations, including one notable inaccuracy involving cross-referenced data from a linked study.
CONCLUSIONS: The development of the AI-powered, RAG-based framework represented a significant advancement in automating the extraction phase of SLRs. Future work will focus on expanding the system's capabilities to handle more complex extraction scenarios, such as linked studies and data presented in graphs, tables, and figures.

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

MSR149

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)