From Evidence to Excel: Generative AI for Automated SLR Data Extractions
Author(s)
Barinder Singh, RPh, Mrinal Mayank, B.Tech, Ritesh Dubey, PharmD, Marjana Bharali, B.Tech, Rajdeep Kaur, PhD, Shubhram Pandey, MSc
Pharmacoevidence Pvt. Ltd., Mohali, India.
OBJECTIVES: Data extraction is one of the most time- and resource-intensive steps in the evidence generation process. Leveraging large language models (LLMs) can streamline this process by reducing manual effort and improving efficiency. This study evaluated a generative AI-powered tool developed to extract structured information from unstructured data sources (regulatory submission dossiers, reimbursement submissions, clinical study publications, and guidelines) commonly used in HTA and HEOR research.
METHODS: The tool was developed in Python, using AWS Bedrock for language model processing, retrieval-augmented generation (RAG) for unstructured data, and PostgreSQL for structured data storage. Twenty publicly available publications of randomized controlled trials (RCTs) in diabetes, focusing on efficacy and safety outcomes, were uploaded to the RAG component. Custom extraction tables were defined by specifying field names (e.g., "Age", "Sample size"), data types (e.g., numerical, categorical), and extraction instructions (e.g., "extract mean age for all treatment groups"). Results were exported as Excel workbooks and validated by subject matter experts (SMEs) for completeness, clarity, and traceability of the extracted data.
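As an illustration of the field-based table definition described above, the following is a minimal Python sketch, not the tool's actual implementation. The names `FieldSpec`, `build_prompt`, and `coerce_row`, the JSON reply format, and the "Sample size" instruction are all hypothetical. The model call itself (for example through AWS Bedrock) is omitted, and the reply string shown in the usage note is a made-up placeholder rather than study data.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One column of a user-defined extraction table (hypothetical structure)."""
    name: str         # field name, e.g. "Age"
    dtype: str        # "numerical" or "categorical"
    instruction: str  # natural-language extraction instruction


def build_prompt(fields, context):
    """Combine the field definitions and retrieved text into one prompt."""
    lines = [f'- "{f.name}" ({f.dtype}): {f.instruction}' for f in fields]
    return (
        "Extract the following fields from the context. "
        "Reply with a JSON object keyed by field name; "
        "use null when a value is not reported.\n"
        + "\n".join(lines)
        + "\n\nContext:\n"
        + context
    )


def coerce_row(fields, raw_json):
    """Parse a JSON reply and coerce each value to its declared data type."""
    data = json.loads(raw_json)
    row = {}
    for f in fields:
        value = data.get(f.name)
        if value is None:
            row[f.name] = None  # missing values stay visible for SME review
        elif f.dtype == "numerical":
            row[f.name] = float(value)
        else:
            row[f.name] = str(value).strip()
    return row


# Field names and the first instruction come from the abstract's examples;
# the second instruction is an assumption added for illustration.
FIELDS = [
    FieldSpec("Age", "numerical", "extract mean age for all treatment groups"),
    FieldSpec("Sample size", "numerical", "extract the number of randomized patients"),
]
```

In use, `build_prompt(FIELDS, retrieved_text)` would be sent to the language model, and a reply such as the placeholder `'{"Age": "58.3", "Sample size": 120}'` would be passed to `coerce_row` to produce one table row. The sketch simplifies each field to a single value per study; per-arm values and the Excel export step (which would need a library such as openpyxl) are left out.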
RESULTS: Three separate extraction tables were produced, capturing study characteristics, patient demographics, intervention details, and clinical outcomes. SMEs verified that all data points related to study and patient characteristics were accurately extracted, with no omissions and complete traceability to the source documents. A minor issue was noted in the clinical outcomes table, where the names of two secondary outcomes were initially missing and were subsequently corrected manually. Overall, SMEs confirmed that the tool effectively extracted structured data, enabling users to download analysis-ready Excel workbooks and reduce manual effort by approximately 70%.
CONCLUSIONS: The tool demonstrated strong potential to reduce manual effort and save time by flexibly extracting data into user-defined tables. Its ability to produce downloadable, analysis-ready Excel outputs further enhances usability, supporting streamlined data processing across diverse use cases.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR112
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas