VALIDATION OF AN AI-ASSISTED EVIDENCE SYNTHESIS IN HEOR
Author(s)
Cyrus Nouroozi1, Amir Saeidmehr, MSc2, Bright Huo, BSc3, Josh L. Howard, PhD4
1The Synthesis Company of California Ltd., CEO, San Francisco, CA, USA, 2The Synthesis Company of California, Ltd., San Francisco, CA, USA, 3McMaster University, Department of Surgery, Hamilton, ON, Canada, 4Monash University, Melbourne, Australia
OBJECTIVES: HEOR decisions increasingly require rapid, reproducible evidence synthesis, yet conventional systematic review workflows remain slow and resource-intensive. We developed an AI-assisted workflow and validated its ability to perform screening and data extraction while preserving traceability to source documents.
METHODS: We validated an AI-assisted screening and extraction pipeline on two systematic review datasets of RCTs spanning clinical psychology and medicine. In total, 2,924 pieces of information were compared between human and AI procedures. AI title and abstract screening was compared against a gold-standard dual-human process and evaluated using sensitivity, specificity, and F1 score. Automated AI extraction from PDF documents was evaluated against two different human-coded datasets to establish accuracy. Extraction tasks included verbatim reproduction of study figures, basic calculation of effect sizes, categorization tasks, and inferential tasks such as Risk of Bias assessment.
RESULTS: Across datasets, AI screening achieved 97.1% sensitivity and 98.1% specificity, outperforming human screening (89.3% and 97.6%, respectively). AI screening was approximately 90% more time-efficient. AI data extraction achieved a mean accuracy of 95.5% across datasets, exceeding human extraction accuracy (92.1%). Most errors occurred in effect size calculations, with occasional errors in Risk of Bias assessments, indicating areas for further refinement. AI extraction was again significantly more time-efficient than the human procedure.
CONCLUSIONS: AI-assisted evidence synthesis can deliver large reductions in time-to-dataset while maintaining high screening and extraction accuracy with auditable traceability to source documents. This capability supports faster systematic review cycles, more frequent evidence updates, and a more scalable foundation for HEOR decision-making.
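The screening metrics named in METHODS (sensitivity, specificity, F1) can be computed from paired include/exclude decisions. The following is a minimal illustrative sketch, not the authors' pipeline: the function name `screening_metrics`, the `ScreeningMetrics` container, and the toy decision lists are all hypothetical, and the reference labels stand in for the dual-human gold standard.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreeningMetrics:
    sensitivity: float  # share of truly relevant records that were included
    specificity: float  # share of truly irrelevant records that were excluded
    f1: float           # harmonic mean of precision and sensitivity


def screening_metrics(reference: Sequence[bool],
                      predicted: Sequence[bool]) -> ScreeningMetrics:
    """Compare screening decisions (True = include) against reference labels.

    Hypothetical helper for illustration; `reference` is assumed to hold the
    gold-standard decisions and `predicted` the decisions under evaluation.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = [(bool(r), bool(p)) for r, p in zip(reference, predicted)]
    tp = sum(1 for r, p in pairs if r and p)          # correctly included
    tn = sum(1 for r, p in pairs if not r and not p)  # correctly excluded
    fp = sum(1 for r, p in pairs if not r and p)      # wrongly included
    fn = sum(1 for r, p in pairs if r and not p)      # wrongly excluded

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan
    return ScreeningMetrics(sensitivity, specificity, f1)


# Toy data (invented for illustration, not from the study):
# 3 relevant and 5 irrelevant records.
toy_reference = [True, True, True, False, False, False, False, False]
toy_predicted = [True, True, False, False, False, False, False, True]
toy_result = screening_metrics(toy_reference, toy_predicted)
```

On the toy data there are 2 true positives, 1 false negative, 4 true negatives, and 1 false positive, so sensitivity is 2/3, specificity is 4/5, and F1 is 2/3. In screening, sensitivity usually matters most, because a wrongly excluded study is lost to all later review stages.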
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR181
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas