AUTOMATED STUDY FEASIBILITY USING AGENTIC ARTIFICIAL INTELLIGENCE TO RAPIDLY IDENTIFY FIT-FOR-PURPOSE SECONDARY DATA
Author(s)
Nicola Sawalhi-Leckenby, MSc1, Sophie E. Graham, PhD1, Dimitra Lambrelli, MASc, MSc, PhD1, Mireia Raluy Callado, MSc2, Ashwin Kumar Rai, MS3, Mark Yates, BSc, PhD, MD1;
1Thermo Fisher Scientific, London, United Kingdom, 2Thermo Fisher Scientific, Stockholm, Sweden, 3Thermo Fisher Scientific, Overland Park, KS, USA
OBJECTIVES: Identifying appropriate secondary data sources for real-world evidence generation studies is increasingly challenging as research questions demand granular clinical, temporal, and biomarker data. ISPOR task force guidance emphasizes transparent, reproducible evaluation of data source suitability; however, feasibility assessments often rely on manual processes and fragmented institutional knowledge. Scalable approaches that operationalize these principles across diverse data sources remain limited.
METHODS: A metadata-driven feasibility framework was developed, combining a standardized data catalogue with an AI agent-based decision-support interface. Data source catalogue development followed a reproducible workflow using a unified Data Element Grid (DEG) template, harmonized from internal knowledge and EMA catalogue metadata. For initial development, 100 claims and EHR databases were prioritized based on current use. DEGs were curated by subject-matter experts using a structured extraction process, with independent review by a second knowledgeable reviewer. The AI agent-based interface is being designed to interpret study requirements, align them with catalogue metadata, and generate standardized, transparent recommendations on data source suitability. The interface front end will allow non-expert users to enter study requirements as free text, which are then cross-checked against catalogued metadata to present data source recommendations to the user.
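The matching step described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the DataElementGrid fields, the VOCAB synonym table, and the assess() coverage score are all assumptions, and a simple keyword lookup stands in for the AI agent's interpretation of free-text requirements. Only catalogue entries flagged as second-reviewed are considered, mirroring the dual-review curation step.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical controlled vocabulary mapping free-text phrases to harmonized DEG terms.
# In the described framework an AI agent would perform this interpretation step.
VOCAB = {
    "hba1c": "lab_hba1c",
    "glycated haemoglobin": "lab_hba1c",
    "egfr": "lab_egfr",
    "prescriptions": "drug_dispensing",
    "dispensing": "drug_dispensing",
    "death": "mortality",
    "mortality": "mortality",
}


@dataclass(frozen=True)
class DataElementGrid:
    """Hypothetical DEG record: one entry per catalogued database."""
    name: str
    source_type: str           # assumed values such as "claims" or "EHR"
    country: str
    elements: frozenset[str]   # harmonized data-element terms the source captures
    reviewed: bool = False     # True once the independent second review is complete


@dataclass(frozen=True)
class StudyRequirements:
    source_types: frozenset[str]       # empty set means "any source type"
    countries: frozenset[str]          # empty set means "any country"
    required_elements: frozenset[str]


def extract_elements(text: str) -> frozenset[str]:
    """Keyword stand-in for the agent: find vocabulary phrases as whole words."""
    lowered = text.lower()
    return frozenset(
        term
        for phrase, term in VOCAB.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    )


def assess(
    catalogue: list[DataElementGrid], req: StudyRequirements
) -> list[tuple[str, float, list[str]]]:
    """Rank reviewed sources by the share of required elements they capture."""
    results = []
    for deg in catalogue:
        if not deg.reviewed:
            continue  # unreviewed DEGs are excluded from recommendations
        if req.source_types and deg.source_type not in req.source_types:
            continue
        if req.countries and deg.country not in req.countries:
            continue
        missing = req.required_elements - deg.elements
        total = len(req.required_elements)
        coverage = 1.0 - len(missing) / total if total else 1.0
        # Report the gaps explicitly so the recommendation stays transparent.
        results.append((deg.name, coverage, sorted(missing)))
    # Highest coverage first; ties broken alphabetically for repeatability.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results
```

Returning the missing elements alongside each score, rather than a bare ranking, reflects the abstract's emphasis on transparent and reproducible suitability assessments; the deterministic tie-break keeps repeated runs identical.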
RESULTS: The framework includes information on 100 databases, with consistent terminology and structure to facilitate access and interpretation by a broad spectrum of users. Pilot evaluations indicate that the metadata feasibility framework can offer substantial efficiency gains, with less manual searching, improved repeatability, and fewer errors than traditional approaches.
CONCLUSIONS: Combining a harmonized metadata foundation with an AI-driven decision-support interface can operationalize ISPOR principles for data source selection at scale and within shorter timeframes than manual assessments. This approach has the potential to support greater transparency, reproducibility, and confidence in selecting complex real-world data sources. Formal impact evaluations are ongoing to quantify time savings and output quality.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
RWD35
Topic
Real World Data & Information Systems
Topic Subcategory
Reproducibility & Replicability
Disease
No Additional Disease & Conditions/Specialized Treatment Areas