AUTOMATING EVIDENCE GAP ANALYSIS USING A MULTI-AGENT AI FRAMEWORK: A MASH CASE STUDY
Author(s)
Jag Chhatwal, PhD1, Sumeyye Samur, PhD2, Ismail F. Yildirim, MSc2, Mine Tekman, PhD2, Turgay Ayer, PhD2
1Mass. General Hospital/Harvard Medical School, Boston, MA, USA, 2Value Analytics Labs, Boston, MA, USA
OBJECTIVES: Evidence gap analysis requires cross-functional teams to manually review, synthesize, and continuously update diverse evidence sources. Increasing complexity and compressed timelines highlight the need for scalable, systematic approaches. We evaluated whether an AI-enabled approach can automate and standardize gap analysis, using metabolic dysfunction-associated steatohepatitis (MASH) as an illustrative use case.
METHODS: ValueGen.AI, a generative-AI platform tailored for the life sciences industry, was developed as a multi-agent deep research tool built in Python using LangGraph. A central agent decomposed each query and coordinated specialized sub-agents, which extracted evidence from multiple sources via Model Context Protocol (MCP) tools implemented using FastMCP; this design enabled parallel execution and relevance evaluation. The tool was applied to the MASH evidence landscape, synthesizing regulatory guidance, clinical trial data, health technology assessment (HTA) reports, real-world evidence, guidelines, and peer-reviewed literature. Evidence gaps were structured across payer- and HTA-relevant domains, including clinical endpoints, real-world effectiveness, diagnostic pathways, and economic modeling.
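The orchestration pattern described in METHODS (a central agent that decomposes a query, sub-agents that run in parallel, and a relevance check on what they return) can be illustrated with a minimal sketch. This is not the ValueGen.AI implementation: it uses only Python's standard library (asyncio) as a stand-in for LangGraph and FastMCP, and every name in it (Evidence, decompose, sub_agent, score_relevance, run_gap_query, fake_retrieve), the one-task-per-source decomposition, the keyword-overlap relevance score, and the 0.5 threshold are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical source labels, based on the evidence types named in METHODS.
SOURCES = ["regulatory_guidance", "clinical_trials", "hta_reports",
           "real_world_evidence", "guidelines", "literature"]

# A retriever takes (source, question) and returns extracted text.
Retriever = Callable[[str, str], Awaitable[str]]


@dataclass
class Evidence:
    source: str       # which evidence source the text came from
    text: str         # extracted snippet
    relevance: float  # placeholder score in [0, 1]


def decompose(query: str) -> list[tuple[str, str]]:
    """Central-agent step (assumed): one (source, question) task per source."""
    return [(source, query) for source in SOURCES]


def score_relevance(question: str, text: str) -> float:
    """Placeholder relevance: share of question terms that appear in the text."""
    terms = set(question.lower().split())
    if not terms:
        return 0.0
    lowered = text.lower()
    return sum(1 for term in terms if term in lowered) / len(terms)


async def sub_agent(source: str, question: str, retrieve: Retriever) -> Evidence:
    """Specialized sub-agent: fetch text for one source, then score it."""
    text = await retrieve(source, question)
    return Evidence(source, text, score_relevance(question, text))


async def run_gap_query(query: str, retrieve: Retriever,
                        min_relevance: float = 0.5) -> list[Evidence]:
    """Run all sub-agents concurrently and keep only relevant evidence."""
    tasks = [sub_agent(src, q, retrieve) for src, q in decompose(query)]
    results = await asyncio.gather(*tasks)  # results keep the order of SOURCES
    return [ev for ev in results if ev.relevance >= min_relevance]


async def fake_retrieve(source: str, question: str) -> str:
    """Hypothetical retriever with canned text; a real one would call a tool."""
    await asyncio.sleep(0)  # stands in for tool or network latency
    canned = {"clinical_trials":
              "MASH durability data beyond 18-24 months are limited"}
    return canned.get(source, "")


if __name__ == "__main__":
    for ev in asyncio.run(run_gap_query("MASH durability", fake_retrieve)):
        print(ev.source, ev.relevance)
```

Passing the retriever in as a parameter keeps the orchestration logic separate from the tool layer, so the canned retriever used here could be swapped for real tool calls without changing the fan-out or filtering code.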
RESULTS: ValueGen.AI identified and categorized high-impact evidence gaps in MASH, including: (1) absence of validated links between accelerated-approval histology endpoints (MASH resolution without worsening of fibrosis) and hard outcomes (decompensation, hepatocellular carcinoma [HCC], transplantation, mortality); (2) feasibility constraints and limitations of liver biopsy (sampling error, interobserver variability); (3) incomplete validation and real-world performance data for non-invasive tests (NITs), with no single NIT fully qualified to replace biopsy; (4) limited durability evidence beyond 18-24 months and uncertainty after discontinuation; (5) inconsistent inclusion of patient-reported outcomes (PROs) and heterogeneous utilities, limiting quality-adjusted life-year (QALY) estimation; (6) constrained generalizability due to underrepresentation of key subgroups; and (7) high sensitivity of cost-effectiveness models to assumptions on fibrosis progression, adherence, duration, discontinuation/relapse, and cardiometabolic risk modification.
CONCLUSIONS: AI-enabled gap analysis can substantially reduce the time and organizational burden of evidence assessment while improving consistency and decision relevance. ValueGen.AI demonstrates the feasibility of scalable, continuously updated gap mapping applicable across therapeutic areas beyond MASH.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
HTA2
Topic
Health Technology Assessment
Topic Subcategory
Value Frameworks & Dossier Format
Disease
SDC: Gastrointestinal Disorders