LEVERAGING GENERATIVE AI AND MACHINE LEARNING FOR ROBUST REAL-WORLD COHORT DEFINITION AND PATIENT JOURNEY INSIGHTS IN HIV RESEARCH

Author(s)

Ching Yi Chuo, PhD1, Shikai Jin, PhD1, Jordan Guillot, PhD1, Jason A. Rivera, MS2, Mona Fathollahi, PhD3, Luke Liu, PhD1, Andrea Marongiu, PhD4, Travis Lim, PhD5, Rajiv Arora, BTech1, Bryce Chong, BS1, Saeid Shahraz, PhD6, Mary J. J. Christoph, PhD1, Gary Leung, PhD1, Li Tao, PhD1, Wenyi Wang, PhD7, Alex Asiimwe, PhD1;
1Gilead Sciences, Foster City, CA, USA, 2University of California San Francisco, San Francisco, CA, USA, 3Amazon Web Services, Palo Alto, CA, USA, 4Gilead Sciences, Uxbridge, United Kingdom, 5Gilead Sciences, Washington, DC, USA, 6Gilead Sciences, Mountain View, CA, USA, 7Gilead Sciences, Parsippany, NJ, USA
OBJECTIVES: When conducting real-world evidence (RWE) studies, researchers often rely on existing literature and clinical guidelines to define patient cohorts. While foundational, this approach is time-consuming and may constrain definitions toward guideline-driven assumptions, oversimplifying patient journeys and missing real-world treatment patterns. We developed an interactive AI-driven framework to streamline cohort definition, uncover treatment patterns, and enable scalable trial emulation. By leveraging graph neural networks and clustering algorithms, we aimed to accelerate insight generation for actionable interventions.
METHODS: We analyzed HealthVerity claims and lab data (2022-2025) on 22,431 adults with HIV who were receiving antiretroviral therapy (ART) and had ≥18 months of post-ART follow-up. We built graphs representing treatments as time-linked nodes and applied a graph encoder to model each patient’s ART journey. Agglomerative clustering was performed on curated ART “lines of therapy” data (e.g., regimen complexity and timelines) to identify patterns. Outputs were reviewed with subject-matter experts to ensure interpretability, then exposed through a Model Context Protocol (MCP) interface and integrated into a Claude-based chat interface for interactive cohort definition.
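As an illustration of the two modeling steps described in METHODS, the following minimal Python sketch builds a time-linked treatment graph for each patient and then groups patients by agglomerative clustering. Everything specific in it is an assumption made for illustration and is not taken from the abstract: the record schema (regimen label, start day, end day), the two features (number of lines of therapy and mean drugs per regimen), the average-linkage rule, and the four example patients, which are invented. The graph encoder itself is not shown.

```python
from itertools import combinations


def build_journey_graph(regimens):
    """Build one patient's ART journey graph.

    regimens: list of (label, start_day, end_day) tuples (hypothetical schema).
    Returns nodes in start order and edges (from_idx, to_idx, gap_days)
    linking each regimen to the next one in time.
    """
    ordered = sorted(regimens, key=lambda r: r[1])
    nodes = [{"regimen": lab, "start": s, "end": e} for lab, s, e in ordered]
    edges = [(i, i + 1, nodes[i + 1]["start"] - nodes[i]["end"])
             for i in range(len(nodes) - 1)]
    return nodes, edges


def journey_features(nodes):
    """Illustrative features for a non-empty journey:
    (number of lines of therapy, mean number of drugs per regimen)."""
    n_lines = len(nodes)
    mean_drugs = sum(len(n["regimen"].split("+")) for n in nodes) / n_lines
    return (float(n_lines), mean_drugs)


def agglomerative(points, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per point and repeatedly merges the two
    clusters with the smallest mean pairwise Euclidean distance.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def linkage(c1, c2):
        total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Invented example records, for illustration only.
patients = {
    "p1": [("BIC+FTC+TAF", 0, 540)],
    "p2": [("DTG+ABC+3TC", 0, 600)],
    "p3": [("DRV+COBI+FTC+TAF", 0, 200), ("DRV+RTV+DTG", 210, 560)],
    "p4": [("ATV+RTV+FTC+TDF", 0, 250), ("DRV+COBI+FTC+TAF", 250, 600)],
}
features = [journey_features(build_journey_graph(r)[0]) for r in patients.values()]
clusters = agglomerative(features, n_clusters=2)
```

At the scale reported here, a library implementation such as scikit-learn's `AgglomerativeClustering` would be the more practical choice; the pure-Python loop above only makes the merge rule explicit.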
RESULTS: The graph encoder predicted randomly masked treatments with over 95% accuracy, confirming its ability to model individual treatment sequences. Clustering identified three distinct ART groups: a large group on standard integrase strand transfer inhibitor (INSTI) plus nucleoside reverse transcriptase inhibitor (NRTI) regimens (mostly treatment-naïve or stable); a second group on protease inhibitor (PI)-heavy regimens (treatment-experienced); and a small group on novel multi-drug regimens (heavily treatment-experienced). Internally, the prototype was used on an experimental basis to review cohort and pattern definitions against epidemiologic standards.
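The masked-treatment accuracy reported in RESULTS can be read as the following kind of evaluation: hide one treatment in each sequence, ask the model to fill it in, and count exact matches. The Python sketch below shows that loop under stated assumptions. The abstract does not describe the masking scheme, so masking one random position per patient is an assumption, and `neighbor_baseline` is a hypothetical stand-in predictor, not the authors' graph encoder.

```python
import random

MASK = "<MASK>"  # placeholder token for the hidden treatment (assumed name)


def neighbor_baseline(masked_seq, pos):
    """Stand-in predictor: copy the previous regimen, or the next one
    when the masked position is the first in the sequence."""
    if pos > 0:
        return masked_seq[pos - 1]
    return masked_seq[pos + 1] if len(masked_seq) > 1 else None


def masked_accuracy(sequences, predict, seed=0):
    """Mask one random position per sequence and return the fraction
    of masked treatments that `predict` recovers exactly."""
    rng = random.Random(seed)
    correct = 0
    total = 0
    for seq in sequences:
        if not seq:
            continue  # nothing to mask in an empty journey
        pos = rng.randrange(len(seq))
        masked = list(seq)
        masked[pos] = MASK
        if predict(masked, pos) == seq[pos]:
            correct += 1
        total += 1
    return correct / total if total else 0.0
```

Any model with the same `predict(masked_seq, pos)` signature could be passed in place of the baseline, which makes it straightforward to compare a learned encoder against a simple carry-forward rule.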
CONCLUSIONS: Integrating generative AI and deep learning into RWE studies improved efficiency in defining patient cohorts and generating insights while preserving epidemiologic rigor through validation, sensitivity analyses, and expert-reviewed guideline mapping. The MCP interface enables interactive discovery of treatment patterns. Future work will evaluate this framework’s performance and generalizability across therapeutic areas and clinical trials.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR104

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

SDC: Infectious Disease (non-vaccine)
