AI ENABLED RWE CODE REFACTORING OF ANALYTICS PIPELINE

Author(s)

Madhur Garg, MS
NTT DATA, Real-World Evidence, Market Access & HEOR, Plano, TX, USA
OBJECTIVES: Reproducible and validated code is essential for trustworthy RWE analyses. We aimed to develop an AI-driven framework to convert legacy RWE scripts into standardized, maintainable pipelines. Objectives included establishing a clear modular structure, removing redundant code, enabling flexible parameterization, and improving performance on large healthcare datasets.
METHODS: We set up a GitHub-based Model Context Protocol (MCP) server. The MCP server ensured that AI agents used trusted library functions and gave them instructions covering which functions exist, what parameters they take, and how they work together. Refactoring proceeded in 4 stages: (1) Notebook Standardization: adding consistent headers and comments, eliminating redundant or duplicate code, and externalizing study parameters. (2) Library Integration: replacing custom logic with calls to validated HEOR library functions under MCP guidance. (3) Performance Tuning: rewriting inefficient queries (e.g., replacing self-joins with window functions) and enabling batch processing, selective column loading, checkpoints, and memory optimization. (4) Validation: checking all post-refactoring results for validity in terms of cohort attrition numbers and outcomes analyses.
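The stage 3 query rewrite and the stage 4 equivalence check can be illustrated with a minimal sketch. It is not the study's code: the `claims` table, its columns, and the sample rows are hypothetical, and an in-memory SQLite database (version 3.25 or later, which supports window functions) stands in for the real data platform. Both queries derive each patient's earliest claim date, and the refactored result is compared with the legacy result.

```python
import sqlite3

# Hypothetical claims rows (patient_id, claim_date, dx_code); illustrative only.
CLAIMS = [
    ("p1", "2021-03-01", "E11"),
    ("p1", "2021-01-15", "E11"),
    ("p2", "2020-07-09", "I10"),
    ("p2", "2020-11-30", "I10"),
    ("p3", "2022-02-02", "E11"),
]

# Legacy pattern: anti self-join that keeps rows with no earlier claim
# for the same patient.
SELF_JOIN_SQL = """
SELECT a.patient_id, a.claim_date
FROM claims a
LEFT JOIN claims b
  ON a.patient_id = b.patient_id AND b.claim_date < a.claim_date
WHERE b.patient_id IS NULL
ORDER BY a.patient_id
"""

# Refactored pattern: one pass over the table with a window function.
# Note: with tied earliest dates the self-join returns every tied row,
# while ROW_NUMBER keeps exactly one; the sample data has no ties.
WINDOW_SQL = """
SELECT patient_id, claim_date
FROM (
  SELECT patient_id, claim_date,
         ROW_NUMBER() OVER (
           PARTITION BY patient_id ORDER BY claim_date
         ) AS rn
  FROM claims
)
WHERE rn = 1
ORDER BY patient_id
"""


def index_dates(sql):
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE claims (patient_id TEXT, claim_date TEXT, dx_code TEXT)"
        )
        conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", CLAIMS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


legacy = index_dates(SELF_JOIN_SQL)
refactored = index_dates(WINDOW_SQL)

# Validation in miniature: the refactored query must reproduce the legacy
# cohort (same patients, same index dates).
assert legacy == refactored
```

In a real pipeline the same comparison would presumably be run on patient counts at each attrition step and on the outcome estimates, not only on a single intermediate table.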
RESULTS: The AI framework refactored the codebases of five RWE study repositories, producing uniformly structured notebooks. Two independent RWE SMEs reported a subjective increase in readability, indicating clearer organization. Optimized logic and batch processing cut runtime by >10% on average and reduced peak memory usage by >15%. Consolidating logic into shared configs and libraries shrank bespoke code by ~10-20%, easing maintenance. These gains align with findings that AI-based refactoring emphasizes maintainability and readability.
CONCLUSIONS: AI-enabled refactoring with a domain-aware MCP server can potentially improve RWE analytics workflows. The revamped codebases are more consistent, efficient, and reliable, enhancing scalability, traceability, and compliance. These improvements support regulatory-grade RWE generation, where auditability, validation, and reproducibility are critical. Future work will extend the MCP knowledge base across complex analyses, study designs, therapeutic areas, and data models.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

RWD138

Topic

Real World Data & Information Systems

Topic Subcategory

Health & Insurance Records Systems, Reproducibility & Replicability

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
