Lessons From the NHS AI Awards: Experience of Evaluating AI Tools in Real-World Environments

Author(s)

Catriona Inverarity, PhD1, Benjamin Caswell-Midwinter, PhD1, Babak Jamshidi, PhD1, Emily Kwong, PhD2, Chloe Black, MEng2, Lauren Gatting, PhD3, Fowzia Ibrahim, PhD3, Anthony Tsang, MRes1, Minjie Gao, PhD4, Akashdeep Singh Chauhan, PhD1, Jo Waller, PhD5, Alison Griffiths, MSc6, Jon Hindmarsh, PhD4, Juan I. Baeza, PhD4, Salma Ayis, PhD7, Angela A. Kehagia, MD PhD1, Anna Barnes, PhD1.
1King's Technology Evaluation Centre (KiTEC), King's College London, London, United Kingdom, 2Clinical Engineering, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom, 3King's College London, London, United Kingdom, 4Department of Public Services Management & Organisation, King's College London, London, United Kingdom, 5Centre for Cancer Prevention, Screening and Early Diagnosis, Queen Mary University of London, London, United Kingdom, 6Research Economics, Solihull, United Kingdom, 7School of Population Health & Environmental Sciences, King's College London, London, United Kingdom.
OBJECTIVES: The NHS AI in Health and Care Awards programme supported the development and evaluation of AI technologies across a range of clinical contexts. We conducted independent evaluations of several AI tools funded through this programme, which varied in purpose, clinical area, and their acceptability and suitability within the departments in which they were evaluated. Here we present key lessons from these evaluations, focusing on methodological, practical, and system-level considerations.
METHODS: A mixed-methods approach based on a systems engineering framework (iitoolkit.com) enabled structured and adaptive assessment of the introduction of AI across complex clinical pathways. This included pathway mapping, stakeholder interviews, quantitative data collection, and health economic modelling incorporating both direct and indirect system impacts. Qualitative research, comprising semi-structured interviews with relevant staff and patients, was conducted in parallel to describe the impact of the AI tools from different perspectives and to identify factors not captured in the quantitative studies.
RESULTS: Heterogeneity in digital maturity, data formats, and data completeness across NHS departments is a feature of real-world evaluation, complicating both implementation and comparability. While many AI tools demonstrated strong standalone performance, their integration into local systems often faced operational friction, for example where tools were insufficiently sensitive to local context or generated unsustainable additional workload. Local pathway audits, workforce planning, and ongoing review support the suitability and success of implementation. The importance of auditing not only the directly affected pathways but also adjacent, indirectly affected services emerged as a key theme.
CONCLUSIONS: Real-world, independent mixed-method evaluations are essential to understanding the contextual and system-level impacts of AI tools. Successful and sustainable adoption requires early stakeholder involvement, alignment with local needs, and infrastructure to support high-quality data use. Evaluations must go beyond headline accuracy metrics to assess ongoing feasibility, workflow fit, and impact on system dynamics. Without this, even technically sound AI tools risk limited utility or premature abandonment in practice.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

EE564

Topic

Economic Evaluation, Health Technology Assessment, Medical Technologies

Disease

Cardiovascular Disorders (including MI, Stroke, Circulatory), Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), Oncology
