LEVERAGING GENERATIVE AI AND MACHINE LEARNING FOR ROBUST REAL-WORLD COHORT DEFINITION AND PATIENT JOURNEY INSIGHTS IN HIV RESEARCH
Author(s)
Ching Yi Chuo, PhD1, Shikai Jin, PhD1, Jordan Guillot, PhD1, Jason A. Rivera, MS2, Mona Fathollahi, PhD3, Luke Liu, PhD1, Andrea Marongiu, PhD4, Travis Lim, PhD5, Rajiv Arora, Bachelor of Technology1, Bryce Chong, BS1, Saeid Shahraz, PhD6, Mary J. J. Christoph, PhD1, Gary Leung, PhD1, Li Tao, PhD1, Wenyi Wang, PhD7, Alex Asiimwe, PhD1
1Gilead Sciences, Foster City, CA, USA, 2University of California San Francisco, San Francisco, CA, USA, 3Amazon Web Services, Palo Alto, CA, USA, 4Gilead Sciences, Uxbridge, United Kingdom, 5Gilead Sciences, Washington, DC, USA, 6Gilead Sciences, Mountain View, CA, USA, 7Gilead Sciences, Parsippany, NJ, USA
OBJECTIVES: When conducting real-world evidence (RWE) studies, researchers often rely on existing literature and clinical guidelines to define patient cohorts. While foundational, this approach is time-consuming and may constrain definitions toward guideline-driven assumptions, oversimplifying patient journeys and missing real-world treatment patterns. We developed an interactive AI-driven framework to streamline cohort definition, uncover treatment patterns, and enable scalable trial emulation. By leveraging graph neural networks and clustering algorithms, we aimed to accelerate insight generation for actionable interventions.
METHODS: We analyzed HealthVerity claims and lab data (2022–2025) on 22,431 adults with HIV who were receiving antiretroviral therapy (ART) and had ≥18 months of post-ART follow-up. We represented each patient’s ART journey as a graph of time-linked treatment nodes and applied a graph encoder to model it. Agglomerative clustering was performed on curated ART “lines of therapy” data (e.g., regimen complexity and timelines) to identify treatment patterns. Outputs were reviewed with subject-matter experts to ensure interpretability, then packaged behind a Model Context Protocol (MCP) interface and integrated into a Claude-based chat interface for interactive cohort definition.
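As an illustration of the two steps described in the methods, the sketch below builds a time-linked graph from one patient's lines of therapy and then groups patients with average-linkage agglomerative clustering. It is a minimal sketch under stated assumptions, not the authors' implementation: the `TherapyLine` record, the two summary features (number of lines, mean drug classes per line), the linkage choice, and the toy patients are all hypothetical, and the graph encoder itself is not reproduced.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class TherapyLine:
    """Hypothetical record: one line of therapy and the date it started."""
    regimen: tuple  # drug classes in the regimen, e.g. ("INSTI", "NRTI")
    start: date


def build_journey_graph(lines):
    """Return (nodes, edges) for one patient.

    Nodes are the therapy lines in chronological order. Each edge links two
    consecutive lines and carries the gap in days between their start dates.
    """
    nodes = sorted(lines, key=lambda line: line.start)
    edges = [
        (i, i + 1, (nodes[i + 1].start - nodes[i].start).days)
        for i in range(len(nodes) - 1)
    ]
    return nodes, edges


def journey_features(lines):
    """Assumed features: (number of lines, mean drug classes per line)."""
    return (float(len(lines)),
            sum(len(line.regimen) for line in lines) / len(lines))


def agglomerative_clusters(points, n_clusters):
    """Average-linkage agglomerative clustering on Euclidean distance.

    Starts with one cluster per point and repeatedly merges the closest pair
    of clusters until n_clusters remain. Returns lists of point indices.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def linkage(c1, c2):
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Toy, made-up journeys (not study data).
toy_patients = {
    "p1": [TherapyLine(("INSTI", "NRTI"), date(2022, 3, 1))],
    "p2": [TherapyLine(("INSTI", "NRTI"), date(2022, 5, 10))],
    "p3": [
        TherapyLine(("PI", "NRTI"), date(2022, 1, 15)),
        TherapyLine(("PI", "NRTI", "INSTI"), date(2023, 2, 1)),
        TherapyLine(("PI", "INSTI", "NNRTI", "NRTI"), date(2024, 1, 20)),
    ],
}

p3_nodes, p3_edges = build_journey_graph(toy_patients["p3"])
toy_features = [journey_features(lines) for lines in toy_patients.values()]
toy_clusters = agglomerative_clusters(toy_features, n_clusters=2)
```

On real data the features would likely need scaling before distances are computed, and a library implementation would be preferable to this quadratic-per-merge loop for a cohort of this size.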
RESULTS: The graph encoder predicted randomly masked treatments with over 95% accuracy, confirming its ability to model individual treatment sequences. Clustering identified three distinct ART groups: a large group on standard integrase strand transfer inhibitor (INSTI) plus nucleoside reverse transcriptase inhibitor (NRTI) regimens (mostly treatment-naïve or stable); a second group on protease inhibitor (PI)-heavy regimens (treatment-experienced); and a small group on novel multi-drug regimens (heavily treatment-experienced). Internally, the prototype was used on an experimental basis to assess cohort and pattern definitions against epidemiologic standards.
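The masked-treatment accuracy reported above can be read as the share of hidden treatments that a model recovers. The sketch below shows one way such a metric could be computed, assuming one randomly masked position per patient sequence. The predictor here is a simple most-frequent-successor table used purely as a stand-in; it is not the graph encoder, and the function names, the `MASK` token, and the toy sequences are hypothetical.

```python
import random
from collections import Counter, defaultdict

MASK = "<MASK>"  # hypothetical placeholder token for the hidden treatment


def mask_one(sequence, rng):
    """Hide one randomly chosen position; return (masked, index, truth)."""
    idx = rng.randrange(len(sequence))
    masked = list(sequence)
    truth = masked[idx]
    masked[idx] = MASK
    return masked, idx, truth


def fit_successor_table(sequences):
    """Stand-in model: count which regimen follows each regimen."""
    counts = defaultdict(Counter)
    overall = Counter()
    for seq in sequences:
        overall.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, overall


def predict_masked(masked, idx, counts, overall):
    """Predict from the preceding regimen, else fall back to the global mode."""
    if idx > 0 and counts[masked[idx - 1]]:
        return counts[masked[idx - 1]].most_common(1)[0][0]
    return overall.most_common(1)[0][0]


def masked_accuracy(sequences, counts, overall, seed=0):
    """Fraction of sequences whose single masked treatment is recovered."""
    rng = random.Random(seed)
    hits = 0
    for seq in sequences:
        masked, idx, truth = mask_one(seq, rng)
        hits += predict_masked(masked, idx, counts, overall) == truth
    return hits / len(sequences)


# Toy, made-up regimen sequences (not study data).
toy_sequences = [
    ["INSTI+NRTI", "INSTI+NRTI"],
    ["INSTI+NRTI", "PI+NRTI", "PI+NRTI"],
    ["PI+NRTI", "PI+NRTI", "multi-drug"],
]
toy_counts, toy_overall = fit_successor_table(toy_sequences)
toy_accuracy = masked_accuracy(toy_sequences, toy_counts, toy_overall)
```

For simplicity the toy example fits and scores on the same sequences; a meaningful evaluation would score held-out patients, and fixing the seed keeps the masked positions reproducible between runs.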
CONCLUSIONS: Integrating generative AI and deep learning into RWE studies improved efficiency in defining patient cohorts and generating insights while preserving epidemiologic rigor through validation, sensitivity analyses, and expert-reviewed guideline mapping. The MCP interface enables interactive discovery of treatment patterns. Future work will evaluate this framework’s performance and generalizability across therapeutic areas and clinical trials.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR104
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Infectious Disease (non-vaccine)