INTEGRATING REAL-WORLD PHARMACOGENOMIC AND CLINICAL DATA USING MACHINE LEARNING TO IMPROVE DRUG RESPONSE PREDICTION: IMPLICATIONS FOR PRECISION MEDICINE DECISION MAKING
Author(s)
Jashuva T, PharmD1, Manoj Kumar Mudigubba, MPH, PharmD, PhD2.
1Student, Raghavendra Institute of Pharmaceutical Education & Research (RIPER), Anantapur, India, 2Pharmacy Practice, Raghavendra Institute of Pharmaceutical Education & Research (RIPER), Anantapur, India.
OBJECTIVES: To evaluate the extent to which integrating real-world genotype-phenotype data with curated pharmacogenomic (PGx) annotations enhances the performance and generalizability of machine learning (ML) models for predicting drug response across five therapeutically relevant drug classes, and to assess the implications for clinical decision support and for evidence generation relevant to Health Technology Assessment (HTA). This is the first study to systematically integrate real-world pharmacogenomic phenotypes with curated annotations across multiple drug classes using machine learning.
METHODS: Genotype-phenotype pairs are derived from the eMERGE-PGx cohort, which provides linked sequencing and clinical outcome data, and are integrated with PharmGKB Level 1A/1B variant-drug associations and ClinVar entries overlapping pharmacogenes. The study covers five drug classes and their key pharmacogenes: Clopidogrel-CYP2C19, Warfarin-CYP2C9/VKORC1, Statins-SLCO1B1, Thiopurines-TPMT, and Fluoropyrimidines-DPYD. Drug response phenotypes follow standardized eMERGE definitions, which use binary or ordinal metabolizer/response categories. Models are developed using ancestry-stratified 10-fold cross-validation and are compared against counterparts trained solely on curated pharmacogenetic annotations. Feature importance is assessed to check biological consistency.
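The abstract does not specify how the ancestry stratification is implemented. One plausible reading is that each of the 10 folds preserves the joint mix of ancestry group and response label. The following is a minimal sketch under that assumption, using only the Python standard library; the function name `ancestry_stratified_folds` and the ancestry codes in the usage note are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def ancestry_stratified_folds(ancestry, labels, n_folds=10, seed=0):
    """Return a fold index (0..n_folds-1) for every sample.

    Assumption (not stated in the abstract): stratification is done on the
    joint (ancestry, response label) stratum, so that each fold sees a
    similar ancestry-by-outcome composition.
    """
    if len(ancestry) != len(labels):
        raise ValueError("ancestry and labels must have the same length")

    rng = random.Random(seed)

    # Group sample indices by their (ancestry, label) stratum.
    strata = defaultdict(list)
    for idx, key in enumerate(zip(ancestry, labels)):
        strata[key].append(idx)

    fold_of = [None] * len(labels)
    offset = 0  # carried across strata so leftover samples spread over folds
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        # Deal the shuffled members round-robin into the folds.
        for pos, idx in enumerate(members):
            fold_of[idx] = (offset + pos) % n_folds
        offset += len(members)
    return fold_of
```

For example, calling `ancestry_stratified_folds(["EUR", "EUR", "SAS", "AFR"], [0, 1, 0, 1], n_folds=2)` returns one fold index per sample; for each fold, a model would be trained on the samples assigned to the other folds and evaluated on the held-out ones. Strata smaller than the number of folds cannot appear in every fold, which is worth checking for under-represented ancestry groups.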
RESULTS: XGBoost achieved the highest predictive performance across drug classes (AUC 0.71-0.79), followed by random forest (RF; AUC 0.69-0.76). Integrating real-world phenotypes from eMERGE with curated PGx knowledge yielded a 7-10% improvement in AUC over models trained on curated annotations alone. Feature importance rankings were consistent with established pharmacogenomic relationships, supporting clinical interpretability. Performance declined by 6-11% in South Asian and African ancestry groups when models were trained primarily on European ancestry data, highlighting equity-relevant limitations in current pharmacogenomic evidence.
CONCLUSIONS: Integrating real-world clinical outcomes with curated pharmacogenomic annotations significantly enhances ML-based drug response prediction while maintaining interpretability. These findings support the development of PGx-informed clinical decision support tools and provide decision-relevant evidence for HTA, reimbursement, and guideline development. Improved prediction of drug response has the potential to reduce adverse drug events, trial-and-error prescribing, and downstream healthcare utilization, thereby advancing equitable implementation of precision medicine.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
RWD142
Topic
Real World Data & Information Systems
Topic Subcategory
Data Protection, Integrity, & Quality Assurance
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, STA: Personalized & Precision Medicine