Beyond Black Boxes: Case Studies of Transparent, Validated LLM Workflows for Accelerating Global HTA Submissions and Decisions

Moderator

Beth Devine, MBA, MSc, PharmD, PhD, University of Washington, Seattle, WA, United States

Speakers

Bill Malcolm, MSc, Bristol Myers Squibb, Middlesex, United Kingdom; Tim Reason, MSc, Estima Scientific, London, United Kingdom; Lockwood Taylor, PhD, MPH, Flatiron Health, Austin, TX, United States

Validation of AI/ML and LLM-enabled automation and their outputs cannot be one-size-fits-all; it must be tailored to each application's evidentiary role and downstream decision-making impact. For clinical data extraction, automation workflows, and health economic model parameterization, this requires transparent, auditable frameworks with clearly defined gold standards, robust sampling strategies, and quality assurance pipelines. For higher-order applications such as network meta-analyses and physician insight synthesis, validation must address both the accuracy of extracted source data and the reliability of model-synthesized conclusions. Key issues include managing heterogeneity in model behavior across settings, ensuring privacy and clinical oversight at scale, and communicating uncertainty introduced by model choices. Although scientific consensus on a validation framework may not yet have been reached, we can review case studies of recent frameworks applied in practice. Strengthening these validation practices will be essential to unlock the efficiency gains of LLM-enabled evidence generation while maintaining scientific rigor and enabling equitable, evidence-based decision-making globally.

Dr. Devine opens the session with an overview that sets the context (8 minutes). Dr. Taylor introduces the application of LLMs to the extraction of unstructured EHR data and physician insights, along with the VALID framework for assessing accuracy and reliability for decision making, and presents a case study on disease progression dates across multiple countries. Mr. Reason demonstrates how validated LLM-generated data can be integrated into AI workflows for network meta-analysis, highlighting implications for transparency, reproducibility, and HTA use. Finally, Mr. Malcolm outlines how AI-enabled automation has reduced HTA submission burden, particularly in LMIC contexts, and how automated evidence assembly can accelerate submissions while maintaining data quality standards (12 minutes). Dr. Devine then facilitates a discussion with audience participation (15 minutes).

Topic

Health Technology Assessment, Methodological & Statistical Research, Real World Data & Information Systems