AN AGENTIC AI FRAMEWORK INTEGRATING DATA-DRIVEN HYPOTHESIS GENERATION WITH TRADITIONAL VALIDATION FOR REAL-WORLD EVALUATION OF EMERGING THERAPIES
Author(s)
Peichang Shi, PhD.
Booz Allen Hamilton, McLean, VA, USA.
OBJECTIVES: For emerging diseases and newly introduced therapies, predefined hypotheses regarding population-level effectiveness, safety risks, and long-term outcomes are often unavailable. Conventional hypothesis-driven analyses are therefore limited in early evidence generation. This study presents an agentic AI framework that integrates autonomous, data-driven hypothesis generation with traditional hypothesis-driven validation to support real-world evaluation under uncertainty.
METHODS: We developed an agentic AI workflow in which a transformer-based language model (e.g., BERT) autonomously performs iterative tasks, including exploratory pattern detection in real-world healthcare claims data, structured literature-informed reasoning to assess clinical plausibility, and prioritization of candidate hypotheses for formal testing. Human-in-the-loop oversight is incorporated to review and approve AI-generated hypotheses before validation. Final hypothesis testing is conducted using established epidemiological and causal inference methods. As a case study, we analyzed 2016 data from the Synthetic Healthcare Database for Research, comprising inpatient, outpatient, and pharmacy claims for over 2 million patients. A transformer-based model was used for propensity score matching to construct control cohorts with comparable baseline ICD and CPT code profiles to patients initiating apixaban. Difference-in-differences analyses were then applied to evaluate pre- and post-treatment changes in diagnosis codes, testing AI-prioritized effectiveness and safety hypotheses.
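The matching and difference-in-differences steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not state the matching algorithm, caliper, or observation windows, so the greedy 1:1 nearest-neighbor matching, the 0.05 caliper, the `Patient` record, and the toy cohort below are all assumptions made for illustration. Propensity scores are taken as given numbers here rather than produced by a transformer-based model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Patient:
    # Hypothetical record layout, not the schema of the study database.
    pid: str
    treated: bool   # True if the patient initiated the therapy
    score: float    # propensity score, assumed to be precomputed
    pre: int        # 1 if the diagnosis code appears in the baseline window
    post: int       # 1 if the diagnosis code appears in the follow-up window


def match_controls(patients, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement."""
    treated = [p for p in patients if p.treated]
    pool = [p for p in patients if not p.treated]
    pairs = []
    for t in treated:
        if not pool:
            break
        # Index of the unmatched control whose score is closest to this patient.
        i = min(range(len(pool)), key=lambda k: abs(pool[k].score - t.score))
        if abs(pool[i].score - t.score) <= caliper:
            pairs.append((t, pool.pop(i)))
    return pairs


def did_estimate(pairs):
    """Pre-to-post change in code prevalence, treated minus matched controls."""
    if not pairs:
        raise ValueError("no matched pairs")
    treated_change = mean(t.post - t.pre for t, _ in pairs)
    control_change = mean(c.post - c.pre for _, c in pairs)
    return treated_change - control_change


# Toy cohort (fabricated for illustration only, not study data).
cohort = [
    Patient("t1", True, 0.62, pre=1, post=0),
    Patient("t2", True, 0.40, pre=1, post=1),
    Patient("c1", False, 0.60, pre=1, post=1),
    Patient("c2", False, 0.41, pre=0, post=0),
    Patient("c3", False, 0.10, pre=0, post=1),
]
pairs = match_controls(cohort)
effect = did_estimate(pairs)
print([(t.pid, c.pid) for t, c in pairs], effect)
```

A negative estimate means the diagnosis code became less prevalent among treated patients relative to their matched controls. A real analysis would also report uncertainty, for example with a regression-based difference-in-differences model and standard errors that account for the matching.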
RESULTS: The agentic AI identified hypotheses consistent with known therapeutic benefits, including reduced stroke risk among patients with atrial fibrillation (ICD code I48.91) and ischemic cardiomyopathy (ICD code I25.5), as well as potential safety signals such as increased urinary tract bleeding and chest pain. These findings were confirmed through traditional statistical validation and aligned with published clinical evidence.
CONCLUSIONS: By combining agentic AI-driven hypothesis generation, literature-aware reasoning, and human-in-the-loop governance with conventional causal inference, this framework enables transparent and scalable real-world evaluation when prior knowledge is limited. The proposed methodology enhances early evidence generation for emerging diseases and new therapies while maintaining the rigor required for regulatory and health technology assessment decision-making.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR139
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas