Piloting a GenAI Agent for Structured Quality Appraisal of Network Meta-Analyses Using the ISPOR-AMCP-NPC Checklist

Author(s)

Sahil Sharma, M. Pharm1, Georgios Xydopoulos, PhD2, Larisa Gofman, PhD3, Nils Fischer, BSc, MPH4, Molebedi Segwagwe, BSc, MSc5.
1ZS Associates, Gurugram, India, 2ZS Associates, Cambridge, United Kingdom, 3ZS Associates, Princeton, NJ, USA, 4ZS Associates, Boston, MA, USA, 5ZS Associates, London, United Kingdom.
OBJECTIVES: As health technology assessments (HTAs) and EU Joint Clinical Assessments (JCAs) emphasize the need for transparent, credible evidence, structured quality evaluation of Indirect Treatment Comparisons (ITCs) and Network Meta-Analyses (NMAs) has become critical. We developed and piloted a Generative Artificial Intelligence (Gen AI) agent trained on the ISPOR-AMCP-NPC checklist to support consistent and scalable appraisal of NMA studies.
METHODS: The Gen AI agent was developed using prompt engineering, embedded with guardrails, and informed by detailed training on the checklist's structure and interpretative guidance. The agent was piloted on two published NMA studies in hormone receptor-positive (HR+)/HER2-negative advanced breast cancer (ABC). Independent assessments by an experienced human reviewer served as the gold standard. The agent’s responses were compared with human evaluations across six checklist domains using percent agreement and Cohen’s kappa.
RESULTS: The Gen AI agent showed high percent agreement with the human reviewer (range: 85-100%) across most checklist domains, particularly population relevance, comparator appropriateness, and outcome alignment. Cohen’s kappa indicated fair to moderate agreement, with lower concordance in subjective domains such as bias assessment and methodological credibility. These discrepancies helped identify areas where human expertise remains critical and where further agent training can enhance accuracy. The agent also completed evaluations faster than manual review (>85% efficiency gain), indicating potential time savings for large-scale evidence assessments.
CONCLUSIONS: This pilot demonstrates the feasibility of deploying a Gen AI agent to support quality assessment of NMAs using established checklists. While early results are promising in core domains, further optimization is underway to improve performance in complex or assumption-prone areas. The approach shows strong potential to enhance efficiency and consistency in HTA/JCA evidence review processes, ultimately supporting timely access to innovative therapies.
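The two agreement statistics named in METHODS can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names and the yes/no ratings are hypothetical and are not data from the pilot. It also shows how high percent agreement can coexist with only fair kappa when one rating category dominates, because kappa discounts the agreement expected by chance.

```python
from collections import Counter
import math


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal share, per category.
    p_e = sum(counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)) / (n * n)
    if math.isclose(p_e, 1.0):
        return float("nan")  # undefined when both raters use a single category
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for 20 checklist items (illustrative only).
human = ["yes"] * 17 + ["no"] * 3
agent = ["yes"] * 16 + ["no"] + ["no"] + ["yes"] * 2

print(f"percent agreement: {percent_agreement(human, agent):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(human, agent):.2f}")
```

In this made-up example the raters agree on 17 of 20 items (85%), but because both answer "yes" most of the time the chance-expected agreement is 78%, so kappa is only about 0.32.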

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

MSR167

Topic

Health Technology Assessment, Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Oncology
