Personalized Treatment Optimization Using Reinforcement Learning on Real-World Data (RWD)
Author(s)
Evangelos Vagianos, MSc, Angeliki Revelou, MSc, Nikolaos Kountouris, MSc.
R&D-RWD, Pfizer, Athens, Greece.
OBJECTIVES: This study aimed to evaluate the effectiveness of a Reinforcement Learning (RL) model in optimizing treatment strategies for type 2 diabetes. The goal was to determine whether RL-based recommendations align with established clinical guidelines and to assess the model’s potential to support adaptive clinical decision-making, using synthetic data that emulate real-world data (RWD).
METHODS: A simulation environment was developed using synthetic patient-level data representing diverse patient characteristics and disease progression patterns in type 2 diabetes. An RL model based on the Advantage Actor-Critic (A2C) algorithm was trained to optimize treatment decisions over time, using clinical variables such as HbA1c, body mass index (BMI), and cardiovascular risk as inputs. The action space included common antidiabetic treatments (e.g., metformin, SGLT-2 inhibitors, insulin). Treatment sequences were evaluated with a cumulative reward function that prioritized glycemic control and long-term outcomes. A user interface was developed to visualize patient trajectories and model recommendations.
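The abstract does not describe the simulator, state encoding, reward weights, or network architecture. Purely as an illustration of how the pieces fit together, the sketch below assumes a toy simulator in which an untreated HbA1c baseline drifts upward and each of four hypothetical actions (no treatment, metformin, SGLT-2 inhibitor, insulin) lowers it by a fixed, made-up amount; BMI and cardiovascular risk are omitted for brevity. The assumed reward penalizes HbA1c above a 7.0% target, overtreatment below 6.0%, therapy switches, and a per-drug burden cost. Training uses a tabular one-step actor-critic in which the TD error serves as the advantage estimate; this is a simplified stand-in for A2C (no neural networks, no parallel workers), not the authors' implementation. All names (`ToyDiabetesEnv`, `reward`, `train_actor_critic`, `EFFECT`, `COST`) and all numbers are hypothetical.

```python
import math
import random

# Hypothetical action set mirroring the drug classes named in the abstract.
ACTIONS = ["none", "metformin", "sglt2", "insulin"]
EFFECT = [0.0, 0.8, 0.6, 1.5]   # assumed HbA1c reduction (% points) per action
COST = [0.0, 0.05, 0.1, 0.3]    # assumed treatment-burden penalty per action
TARGET = 7.0                    # assumed HbA1c target (%)
HORIZON = 12                    # assumed number of decision points per episode


def reward(hba1c, action, prev):
    """Hypothetical per-step reward; higher is better."""
    glycemic = -max(0.0, hba1c - TARGET)       # penalize HbA1c above target
    overtreat = -1.0 if hba1c < 6.0 else 0.0   # crude proxy for hypoglycemia risk
    switch = -0.2 if action != prev else 0.0   # discourage therapy changes
    return glycemic + overtreat + switch - COST[action]


class ToyDiabetesEnv:
    """Toy simulator: state = (HbA1c band, previous action)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.baseline = self.rng.uniform(6.0, 10.0)  # untreated HbA1c
        self.hba1c = self.baseline
        self.prev, self.t = 0, 0
        return self.state()

    def state(self):
        # Five 1-point bands: <7, 7-8, 8-9, 9-10, >=10 (values below 6 fall in band 0).
        band = min(4, max(0, int(self.hba1c - 6.0)))
        return band, self.prev

    def step(self, action):
        self.baseline += 0.05 + self.rng.gauss(0.0, 0.05)  # slow disease progression
        self.hba1c = self.baseline - EFFECT[action]         # effect of current therapy
        r = reward(self.hba1c, action, self.prev)
        self.prev = action
        self.t += 1
        return self.state(), r, self.t >= HORIZON


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(episodes=3000, gamma=0.95, alpha_pi=0.05, alpha_v=0.1, seed=0):
    """Tabular one-step actor-critic; the TD error is used as the advantage estimate."""
    env = ToyDiabetesEnv(seed)
    rng = random.Random(seed + 1)
    n = len(ACTIONS)
    theta = {}  # actor: state -> action preferences
    value = {}  # critic: state -> V(s)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            prefs = theta.setdefault(s, [0.0] * n)
            pi = softmax(prefs)
            a = rng.choices(range(n), weights=pi)[0]
            s2, r, done = env.step(a)
            target = r if done else r + gamma * value.get(s2, 0.0)
            advantage = target - value.get(s, 0.0)
            value[s] = value.get(s, 0.0) + alpha_v * advantage  # critic update
            for b in range(n):
                # Actor update: gradient of log softmax w.r.t. preference b.
                prefs[b] += alpha_pi * advantage * ((1.0 if b == a else 0.0) - pi[b])
            s = s2
    return theta, value


def greedy_action(theta, state):
    prefs = theta.get(state, [0.0] * len(ACTIONS))
    return ACTIONS[max(range(len(ACTIONS)), key=lambda a: prefs[a])]


if __name__ == "__main__":
    theta, value = train_actor_critic()
    for band in range(5):
        print(band, greedy_action(theta, (band, 0)))
```

With these made-up parameters, and ignoring the switch penalty, the per-step reward ranks no treatment first when untreated HbA1c is between 6% and 7%, metformin first at about 8%, and insulin first at about 9% or higher. Any resemblance to a guideline ordering therefore comes from the assumed effects and costs, which is why reward design deserves as much scrutiny as the learning algorithm.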
RESULTS: The RL model’s recommendations were consistent with clinical guidelines: metformin was the most frequently recommended first-line therapy, and insulin was suggested only at later stages, when HbA1c remained above target. In scenarios with well-controlled diabetes (HbA1c <6.5%), the model occasionally recommended no pharmacologic treatment. The A2C algorithm outperformed the alternative approaches evaluated in achieving sustained glycemic control and minimizing therapy changes.
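Consistency findings of this kind can be expressed as simple audits over recommended treatment sequences. The sketch below assumes each trajectory is a list of recommended actions paired with the HbA1c measured at each decision point. The rules it checks (first pharmacologic therapy, insulin started only after a non-insulin therapy at a visit with HbA1c above an assumed 7.0% target, number of therapy changes) are simplified illustrations, not the authors' evaluation criteria or a complete guideline. Function names and the example trajectories are hypothetical.

```python
from collections import Counter


def count_therapy_changes(actions):
    """Number of visits at which the recommended therapy differs from the previous one."""
    return sum(1 for prev, cur in zip(actions, actions[1:]) if cur != prev)


def first_line(actions):
    """First pharmacologic therapy recommended, or None if treatment was never started."""
    for a in actions:
        if a != "none":
            return a
    return None


def insulin_rule_ok(actions, hba1c, target=7.0):
    """Simplified rule: insulin is started only after a non-insulin therapy,
    at a visit where HbA1c (hba1c[t], measured at that visit) is above target."""
    for t, a in enumerate(actions):
        started_insulin = a == "insulin" and (t == 0 or actions[t - 1] != "insulin")
        if started_insulin and (t == 0 or actions[t - 1] == "none" or hba1c[t] <= target):
            return False
    return True


def summarize(trajectories, target=7.0):
    """trajectories: list of (actions, hba1c) pairs of equal length."""
    n = len(trajectories)
    return {
        "first_line": Counter(first_line(acts) for acts, _ in trajectories),
        "mean_changes": sum(count_therapy_changes(acts) for acts, _ in trajectories) / n,
        "insulin_rule_rate": sum(insulin_rule_ok(acts, h, target) for acts, h in trajectories) / n,
    }


# Hypothetical example trajectories (illustrative values only).
example = [
    (["metformin", "metformin", "sglt2", "insulin"], [8.1, 7.6, 7.4, 7.9]),
    (["none", "none", "none"], [6.2, 6.3, 6.4]),
]
print(summarize(example))
```

Applied to trajectories generated by a trained policy, the same counters yield a first-line distribution, a mean number of therapy changes per patient, and the share of patients whose insulin initiation follows the assumed rule; the thresholds themselves remain assumptions.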
CONCLUSIONS: The RL model demonstrated alignment with guideline-based care and showed potential to personalize treatment for type 2 diabetes. While developed using synthetic data, this approach offers a scalable foundation for clinical decision support systems trained on real-world data, with implications for improving patient outcomes and reducing unnecessary treatment costs.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR166
Topic
Methodological & Statistical Research, Patient-Centered Research, Real World Data & Information Systems
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Diabetes/Endocrine/Metabolic Disorders (including obesity), No Additional Disease & Conditions/Specialized Treatment Areas