REAL-WORLD DATA LARGE LANGUAGE MODEL ASSISTIVE SQL CODING SYSTEM
Author(s)
Vladimir Turzhitsky, MS, PhD1, Varun Kumar Nomula, MS1, Yezhou Sun, MS1, Tesfagabir Meharizghi, MS2, Henry Wang, MS2, Aude Genevay, PhD2, Shinan Zhang, MS2, Tim Shear, MS2, Andy Mitchell, AA2;
1Merck & Co. Inc, Rahway, NJ, USA, 2Amazon Web Services, Seattle, WA, USA
OBJECTIVES: To develop and evaluate a Large Language Model (LLM)-enabled text-to-SQL assistive programming system that accelerates and standardizes SQL generation for real-world data (RWD) analysis, and to characterize its methodological components and early performance on representative RWD tasks.
METHODS: We implemented a web-based assistant that integrates foundation LLMs with retrieval-augmented generation (RAG). The system embeds database-specific metadata (table structures, variable descriptions, DDL statements, example rows) and retrieves verified few-shot “Golden Examples” based on semantic similarity to the user prompt. Prompts combine user intent, metadata, and examples to produce SQL plus an explanation, which users can review, edit, and execute in the interface. Sessions preserve context for iterative refinement. The Golden Examples are stored in an Amazon OpenSearch Serverless vector database and surfaced via AWS-based retrieval. Access is governed through Merck’s Real-World Data Exchange (RWDEx) with single sign-on and role-based permissions. Preliminary performance was assessed on a 40-question benchmark derived from the DE-SynPUF Medicare claims dataset using Anthropic’s Claude 3.5 Sonnet. Early production deployment includes multiple frequently used commercial claims and EHR datasets, with ongoing collection of initial user case studies.
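The retrieval-and-prompt-assembly flow described above can be sketched as follows. This is a minimal illustrative sketch, not the production system: the function and class names (embed, GoldenExample, retrieve, build_prompt) are assumptions, the toy hash embedding stands in for a real embedding model, and the in-memory similarity search stands in for the OpenSearch Serverless k-NN index.

```python
import math
import zlib
from dataclasses import dataclass

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy bag-of-words hash embedding; a stand-in for a real embedding model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

@dataclass
class GoldenExample:
    """A verified question/SQL pair, analogous to the 'Golden Examples' above."""
    question: str
    sql: str

def retrieve(examples: list[GoldenExample], user_prompt: str, k: int = 2) -> list[GoldenExample]:
    """Return the k examples most semantically similar to the user prompt."""
    q = embed(user_prompt)
    ranked = sorted(examples, key=lambda ex: cosine(q, embed(ex.question)), reverse=True)
    return ranked[:k]

def build_prompt(user_prompt: str, metadata_ddl: str, examples: list[GoldenExample]) -> str:
    """Combine user intent, database metadata, and few-shot examples into one prompt."""
    shots = "\n\n".join(f"-- Q: {ex.question}\n{ex.sql}" for ex in examples)
    return (
        "You are a SQL assistant for real-world data analysis.\n"
        f"Schema:\n{metadata_ddl}\n\n"
        f"Verified examples:\n{shots}\n\n"
        f"User request: {user_prompt}\n"
        "Return SQL and a short explanation."
    )
```

The generated prompt would then be sent to the LLM; the returned SQL and explanation are surfaced in the interface for review, editing, and execution.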
RESULTS: On the DE-SynPUF benchmark, first-attempt SQL generation accuracy was 82.5% and increased to 97.5% within two attempts. Accuracy by difficulty was: easy (N=5) 80% first attempt, 100% within two; medium (N=14) 86% first attempt, 100% within two; hard (N=21) 81% first attempt, 95% within two. Initial user case studies demonstrate feasible integration into RWD workflows for claims and EHR use cases; a crossover study to quantify efficiency gains (e.g., time-to-correct query) is planned.
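The per-difficulty and overall figures above are mutually consistent, which a short arithmetic check makes explicit. Note that the per-difficulty correct counts used here (4/5, 12/14, 17/21 on the first attempt; 5/5, 14/14, 20/21 within two) are inferred by back-calculating from the rounded percentages reported in the abstract, not stated directly.

```python
# Consistency check of the reported benchmark accuracies.
# Correct counts per difficulty are inferred from the rounded percentages.
totals = {"easy": 5, "medium": 14, "hard": 21}
first_attempt = {"easy": 4, "medium": 12, "hard": 17}   # ~80%, ~86%, ~81%
within_two = {"easy": 5, "medium": 14, "hard": 20}      # 100%, 100%, ~95%

n = sum(totals.values())                     # 40 questions
acc_first = sum(first_attempt.values()) / n  # 33/40 = 0.825
acc_two = sum(within_two.values()) / n       # 39/40 = 0.975
print(f"{acc_first:.1%}, {acc_two:.1%}")     # 82.5%, 97.5%
```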
CONCLUSIONS: An LLM-driven, RAG-enhanced text-to-SQL assistant can reliably generate executable SQL for RWD tasks and support iterative query refinement. Early results indicate high accuracy across diverse question types. Future work will expand benchmarking, characterize error modes, compare models, and quantify efficiency and usability in controlled studies.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR148
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas