CHOOSING PUBLIC LLMS AND AI AGENTS FOR PATIENT OUTCOMES RESEARCH: PRACTICAL PROS/CONS AND GOVERNANCE IMPLICATIONS

Author(s)

Sherrine Eid, BS, MPH;
SAS Institute, Global Head, Epidemiology, RWE & Observational Research, Macungie, PA, USA
OBJECTIVES: Large language models (LLMs) and AI agents are increasingly applied in patient outcomes research (POR), yet they are often adopted as general automation tools rather than governed as statistical instruments. We aimed to evaluate publicly available LLMs and agent frameworks for POR use and to characterize the model selection, deployment architecture, and governance controls (including explicit documentation of data lineage, model behavior, validation procedures, and human oversight) needed to align efficiency gains with statistical integrity, transparency, and regulatory readiness.
METHODS: We conducted a structured technical review and expert benchmarking of publicly available LLMs (OpenAI GPT-4 class models, Google Gemini via Vertex AI, Anthropic Claude, and open-weight models such as Meta Llama 3.1) and agent frameworks (e.g., AutoGen, Vertex AI Agent Builder, and retrieval-augmented generation [RAG] toolchains). Evaluation criteria included: (1) statistical reproducibility, (2) data governance and privacy controls, (3) transparency and auditability, (4) bias and drift management, and (5) fitness-for-purpose in regulated POR workflows. Assessments were aligned to established guidance including the NIST AI Risk Management Framework, TRIPOD-AI, CONSORT-AI, and FDA RWD/RWE considerations.
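The five evaluation criteria above can be operationalized as a simple scoring rubric. The sketch below is a hypothetical illustration, not the authors' actual benchmarking instrument: the criterion names, the 0–5 scale, and the unweighted total are all assumptions made for demonstration.

```python
from dataclasses import dataclass

# Hypothetical rubric based on the five criteria listed in METHODS.
# Criterion keys and the 0-5 scale are illustrative assumptions.
CRITERIA = (
    "statistical_reproducibility",
    "data_governance_privacy",
    "transparency_auditability",
    "bias_drift_management",
    "fitness_for_regulated_POR",
)

@dataclass
class ModelAssessment:
    name: str
    scores: dict  # criterion -> integer score, 0..5

    def total(self) -> int:
        # Refuse to produce a total until every criterion is scored,
        # mirroring a structured (rather than ad hoc) review.
        missing = set(CRITERIA) - set(self.scores)
        if missing:
            raise ValueError(f"unscored criteria: {sorted(missing)}")
        return sum(self.scores[c] for c in CRITERIA)

# Example with made-up scores (3 on each of the 5 criteria).
assessment = ModelAssessment("open-weight-llm", {c: 3 for c in CRITERIA})
print(assessment.total())  # 15
```

A weighted sum, or separate pass/fail gates for governance-critical criteria, would be straightforward extensions of the same structure.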
RESULTS: Closed, enterprise-grade LLMs demonstrated strong performance for NLP-driven phenotype extraction, protocol summarization, and exploratory analyses when deployed with RAG and human-in-the-loop validation. However, model opacity and vendor-driven version changes introduced reproducibility risks in the absence of formal versioning and output archiving. Open-weight models enabled greater statistical control, audit logging, and data-sovereignty compliance but required significant MLOps investment and validation rigor. Agentic workflows improved analytical throughput but increased the risk of error propagation, underscoring the need for tracing, constraint enforcement, and independent statistical review.
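The versioning and output-archiving controls flagged in the results can be sketched as an audit record that binds each LLM output to the exact model version and a hash of the prompt, so that results remain checkable after a vendor updates the model. This is a minimal illustration under assumed field names (`archive_llm_output`, `model_id`, `prompt_sha256`, etc.), not a reference implementation.

```python
import hashlib
import json
import time

def archive_llm_output(prompt: str, output: str,
                       model_id: str, model_version: str) -> dict:
    """Build an audit record for one LLM call.

    Hashing the prompt and output lets reviewers verify that an
    archived result matches a later re-run; recording the model
    version exposes vendor-driven changes. Field names here are
    illustrative assumptions.
    """
    return {
        "model_id": model_id,
        "model_version": model_version,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "output": output,
        "archived_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

# Example usage: archive one (made-up) protocol-summarization call,
# then serialize the record for an append-only audit log.
rec = archive_llm_output(
    prompt="Summarize protocol X",
    output="Example summary text",
    model_id="gpt-4-class",
    model_version="2025-06-01",
)
print(json.dumps(rec, indent=2))
```

In a regulated workflow the serialized records would be written to immutable, access-controlled storage alongside the analysis code, rather than printed.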
CONCLUSIONS: LLMs and AI agents can materially enhance POR efficiency when governed as statistical instruments rather than automation tools. Model choice, deployment architecture, and governance controls should be treated as core methodological decisions to ensure transparency, validity, and regulatory readiness.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR142

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Confounding, Selection Bias Correction, Causal Inference

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
