AI Powered SQL for Real-World Data: From Questions to Insights
Author(s)
Vasileios Kontonis, MSc, Dimitra Bokou, MSc, Nikolaos Kountouris, MSc.
Pfizer, Athens, Greece.
OBJECTIVES: Accessing and analyzing Real-World Data (RWD) is essential for informed decision-making in healthcare. However, many professionals who rely on these data, such as physicians, lack SQL expertise, which makes database queries a significant barrier. This project aims to facilitate RWD exploration for users without technical backgrounds by leveraging AI to translate natural language questions into SQL queries.
METHODS: We developed an AI-powered system that enables users to interact with databases using natural language input, minimizing the need for manual SQL writing. At the core of this system is a Large Language Model (LLM) from OpenAI, fine-tuned specifically for SQL generation. The model is trained to interpret domain-specific queries and produce accurate, executable SQL code. Users enter questions in plain language, in any language; the system translates each question into SQL and executes the query on a synthetic RWD (Synthea) database. Results are displayed through an intuitive user interface that shows the generated SQL query, tabular output, and optional visualizations, so users can review the code for validation.
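The question-to-SQL-to-results flow described in METHODS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fine-tuned LLM is replaced by a hypothetical stub (`stub_translate`), the `patients` table is a toy stand-in loosely modeled on a Synthea-style schema, and the read-only check (`is_read_only`) is an assumed safeguard that the abstract does not mention.

```python
import sqlite3
from typing import Callable


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory table standing in for a synthetic RWD database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients (id TEXT PRIMARY KEY, gender TEXT, birth_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?, ?)",
        [("p1", "F", 1950), ("p2", "M", 1982), ("p3", "F", 1990)],
    )
    conn.commit()
    return conn


def stub_translate(question: str) -> str:
    """Hypothetical placeholder for the LLM call: maps one known question to SQL."""
    canned = {
        "how many female patients are there?":
            "SELECT COUNT(*) AS n FROM patients WHERE gender = 'F'",
    }
    return canned[question.strip().lower()]


def is_read_only(sql: str) -> bool:
    """Assumed safeguard: accept only a single statement that starts with SELECT."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def answer(conn: sqlite3.Connection, question: str,
           translate: Callable[[str], str]) -> dict:
    """Translate a question, run the SQL, and return the SQL alongside the results."""
    sql = translate(question)
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    # Returning the SQL with the rows lets the user review the generated code.
    return {"sql": sql, "columns": columns, "rows": rows}


if __name__ == "__main__":
    result = answer(build_demo_db(), "How many female patients are there?", stub_translate)
    print(result["sql"])
    print(result["columns"], result["rows"])
```

In this sketch, `answer` returns the generated SQL together with the column names and rows, mirroring the idea that users see both the query and its tabular output. A real deployment would swap `stub_translate` for a call to the fine-tuned model.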
RESULTS: The system allows non-technical users to explore data, identify patterns, and perform basic feasibility assessments without needing programming knowledge. It also supports technically skilled users by enabling quick generation and testing of simple queries, improving efficiency. Rather than replacing data professionals, the tool complements their work by simplifying routine or exploratory tasks.
CONCLUSIONS: By streamlining access to complex health data, the system enhances productivity and supports early-stage insight generation. This work demonstrates how AI can responsibly broaden data access, enabling faster, more informed decision-making across a range of technical backgrounds.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR20
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas