AUTOMATING EVIDENCE GAP ANALYSIS USING A MULTI-AGENT AI FRAMEWORK: A MASH CASE STUDY

Author(s)

Jag Chhatwal, PhD¹, Sumeyye Samur, PhD², Ismail F. Yildirim, MSc², Mine Tekman, PhD², Turgay Ayer, PhD²
¹Massachusetts General Hospital/Harvard Medical School, Boston, MA, USA, ²Value Analytics Labs, Boston, MA, USA

OBJECTIVES: Evidence gap analysis requires cross-functional teams to manually review, synthesize, and continuously update diverse evidence sources. Growing evidence complexity and compressed timelines highlight the need for scalable, systematic approaches. We evaluated whether an AI-enabled approach can automate and standardize gap analysis, using metabolic dysfunction-associated steatohepatitis (MASH) as an illustrative use case.
METHODS: ValueGen.AI, a generative-AI platform tailored for the life sciences industry, was developed as a multi-agent deep research tool built in Python using LangGraph. A central agent decomposed each query and coordinated specialized sub-agents, which extracted evidence from multiple sources through Model Context Protocol (MCP) tools implemented using FastMCP. This design enabled parallel execution and relevance evaluation. The tool was applied to the MASH evidence landscape, synthesizing regulatory guidance, clinical trial data, HTA reports, real-world evidence, guidelines, and peer-reviewed literature. Evidence gaps were structured across payer- and HTA-relevant domains, including clinical endpoints, real-world effectiveness, diagnostic pathways, and economic modeling.
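The orchestration pattern described in the Methods (query decomposition, parallel sub-agents, relevance evaluation) can be illustrated with a minimal sketch. It is not the ValueGen.AI implementation: it uses only Python's standard-library asyncio in place of LangGraph and FastMCP, and every name in it (SOURCES, Finding, decompose, sub_agent, run_gap_query), the stubbed evidence text, the toy relevance scores, and the 0.5 threshold are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical source labels, based on the evidence types named in the Methods.
SOURCES = [
    "regulatory guidance",
    "clinical trials",
    "HTA reports",
    "real-world evidence",
    "guidelines",
    "peer-reviewed literature",
]


@dataclass
class Finding:
    source: str
    sub_query: str
    text: str
    relevance: float  # 0.0 to 1.0; toy score assigned by the stub sub-agent


def decompose(query: str) -> list[str]:
    """Central-agent step (stub): derive one sub-query per evidence source."""
    return [f"{query} [{source}]" for source in SOURCES]


async def sub_agent(source: str, sub_query: str) -> Finding:
    """Stand-in for a sub-agent that would call an MCP tool for its source."""
    await asyncio.sleep(0)  # placeholder for tool or network latency
    # Toy relevance rule (assumption); a real system would use a model or ranker.
    relevance = 0.9 if "trial" in source or "HTA" in source else 0.4
    return Finding(source, sub_query, f"stub evidence from {source}", relevance)


async def run_gap_query(query: str, threshold: float = 0.5) -> list[Finding]:
    """Fan out to all sub-agents concurrently, then keep relevant findings."""
    sub_queries = decompose(query)
    findings = await asyncio.gather(
        *(sub_agent(source, sq) for source, sq in zip(SOURCES, sub_queries))
    )
    return [f for f in findings if f.relevance >= threshold]


if __name__ == "__main__":
    for finding in asyncio.run(run_gap_query("MASH durability evidence")):
        print(finding.source, finding.relevance)
```

In this sketch asyncio.gather runs the sub-agents concurrently and returns results in source order, so the relevance filter operates on a stable list. In a framework such as LangGraph the same fan-out and filtering would typically be expressed as graph nodes rather than plain coroutines.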
RESULTS: ValueGen.AI identified and categorized high-impact evidence gaps in MASH, including: (1) absence of validated links between accelerated-approval histology endpoints (MASH resolution without worsening of fibrosis) and hard outcomes (decompensation, hepatocellular carcinoma [HCC], transplantation, mortality); (2) feasibility constraints and limitations of liver biopsy (sampling error, interobserver variability); (3) incomplete validation and real-world performance data for non-invasive tests (NITs), with no single NIT fully qualified to replace biopsy; (4) limited durability evidence beyond 18–24 months and uncertainty about outcomes after treatment discontinuation; (5) inconsistent inclusion of patient-reported outcomes (PROs) and heterogeneous utility values, limiting quality-adjusted life-year (QALY) estimation; (6) constrained generalizability due to underrepresentation of key subgroups; and (7) high sensitivity of cost-effectiveness models to assumptions on fibrosis progression, adherence, treatment duration, discontinuation/relapse, and cardiometabolic risk modification.
CONCLUSIONS: AI-enabled gap analysis can substantially reduce the time and organizational burden of evidence assessment while improving consistency and decision relevance. ValueGen.AI demonstrates the feasibility of scalable, continuously updated gap mapping applicable across therapeutic areas beyond MASH.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

HTA2

Topic

Health Technology Assessment

Topic Subcategory

Value Frameworks & Dossier Format

Disease

SDC: Gastrointestinal Disorders

Presentation (CTI)