EVALUATING EARLY TREATMENT INITIATION REGIMES IN TREATMENT-RESISTANT DEPRESSION: A REINFORCEMENT LEARNING-BASED OFF-POLICY EVALUATION APPROACH
Author(s)
You Wang, BS.
Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA.
OBJECTIVES: To develop, pilot, and empirically evaluate a reinforcement learning (RL)-based off-policy evaluation framework that uses observational data to estimate the comparative effectiveness of early treatment initiation regimes on suicide attempt or intentional self-harm among individuals with treatment-resistant depression (TRD).
METHODS: We analyzed longitudinal U.S. administrative claims data from Kythera Labs comprising 1,331,775 patients with TRD and 13,476,105 monthly transitions over the 12 months after the TRD index date, with suicide-related events modeled as an absorbing outcome. A double deep Q-network with constrained joint action support was trained to estimate state-action values across factored antidepressant therapy, psychotherapy, esketamine, and neuromodulation status. The policy estimand was defined as the 12-month risk under alternative regimes specified by initiation versus non-initiation of treatment classes within the first 3 months. Policy evaluation was conducted using weighted doubly robust off-policy evaluation, with importance weighting restricted to the early initiation window to mitigate weight explosion. Episode-level bootstrapping was used to construct risk differences (RD) and confidence intervals (CI).
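The abstract does not give the estimator's formula, so the following is only a minimal sketch of a weighted doubly robust (WDR) estimate with importance ratios restricted to an early window. It assumes the standard self-normalized WDR form with no discounting, episodes padded to a fixed horizon (zeros after the absorbing event), and a reward equal to the monthly event indicator, so that the value is a 12-month risk. The function name, argument names, and array layout are hypothetical and not taken from the study.

```python
import numpy as np

def wdr_estimate(ratios, rewards, q_hat, v_hat, window=3):
    """Weighted doubly robust value estimate (sketch, undiscounted).

    All inputs are arrays of shape (n_episodes, horizon):
      ratios  - per-step pi_target(a|s) / pi_behavior(a|s)
      rewards - per-step outcome (here: 1 in the month of the event, else 0)
      q_hat   - model estimate Q(s_t, a_t) for the observed action
      v_hat   - model estimate V(s_t) under the target regime
    Assumption: after `window` steps the target regime follows observed
    care, so ratios there are set to 1 and cannot inflate the weights.
    """
    ratios = np.asarray(ratios, dtype=float).copy()
    rewards = np.asarray(rewards, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)

    # Restrict importance weighting to the early initiation window.
    ratios[:, window:] = 1.0

    # Cumulative ratio rho_{0:t} per episode, then self-normalize per step.
    # (Assumes at least one episode has nonzero cumulative ratio at each step.)
    cum = np.cumprod(ratios, axis=1)
    w = cum / cum.sum(axis=0, keepdims=True)

    # Weights lagged by one step; the step before t=0 uses uniform 1/n.
    n = ratios.shape[0]
    w_prev = np.concatenate([np.full((n, 1), 1.0 / n), w[:, :-1]], axis=1)

    # Importance-weighted outcome plus the model-based control variate.
    return float(np.sum(w * rewards - (w * q_hat - w_prev * v_hat)))
```

Because the ratios are fixed at 1 outside the window, the cumulative weight for each episode stops changing after month 3, which is one way to read "restricted to the early initiation window". When the model terms are all zero and every ratio is 1, the estimate reduces to the observed event proportion.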
RESULTS: Eighteen joint treatment combinations were supported by the data. Early psychotherapy initiation compared with no psychotherapy was associated with an estimated 12-month risk difference of 0.36 (95% CI: 0.09 to 0.39). Early neuromodulation showed smaller and statistically non-significant contrasts (RD −0.08; 95% CI: −0.12 to 0.20). Regimes involving rarely observed treatments, such as early esketamine initiation, produced unstable or extreme estimates. Exploratory enumeration of supported early-initiation regimes showed that the lowest estimated risks clustered around common real-world care patterns, such as continuous antidepressant care and psychotherapy.
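Risk differences and intervals of this kind come from resampling whole patient episodes rather than individual monthly transitions. The sketch below illustrates that idea under stated assumptions: it swaps in a simple self-normalized importance-weighted risk in place of the full weighted doubly robust estimator, and it uses a percentile interval, which the abstract does not specify. The function name, the per-episode weight arrays, and the defaults are hypothetical.

```python
import numpy as np

def bootstrap_risk_difference(outcomes, w_treat, w_control,
                              n_boot=1000, alpha=0.05, seed=0):
    """Episode-level bootstrap of a risk difference (sketch).

    outcomes  - per-episode 0/1 indicator of the event within 12 months
    w_treat   - per-episode weight under the early-initiation regime
    w_control - per-episode weight under the comparator regime
    Returns (point_estimate, ci_lower, ci_upper) using a percentile interval.
    Assumes weights are positive so every resample has a nonzero weight sum.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    w_treat = np.asarray(w_treat, dtype=float)
    w_control = np.asarray(w_control, dtype=float)
    n = len(outcomes)
    rng = np.random.default_rng(seed)

    def risk_difference(idx):
        # Self-normalized weighted risk under each regime, then the contrast.
        y = outcomes[idx]
        risk_t = np.sum(w_treat[idx] * y) / np.sum(w_treat[idx])
        risk_c = np.sum(w_control[idx] * y) / np.sum(w_control[idx])
        return risk_t - risk_c

    point = risk_difference(np.arange(n))

    # Resample episodes (not transitions) with replacement.
    draws = np.array([
        risk_difference(rng.integers(0, n, size=n)) for _ in range(n_boot)
    ])
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return float(point), float(lower), float(upper)
```

Resampling at the episode level keeps each patient's months together, so within-patient correlation is carried into the interval. With rarely observed treatments, many resamples contain few or no informative episodes, which is one plausible source of unstable estimates.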
CONCLUSIONS: This study demonstrates a practical RL-based workflow for comparative evaluation of early treatment initiation regimes using observational data. The proposed framework offers a methodological foundation for future comparative effectiveness research on dynamic mental health treatment strategies, while highlighting key challenges posed by rare outcomes and by the need to restrict evaluation to decisions supported by real-world treatment patterns.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR55
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
STA: Multiple/Other Specialized Treatments, STA: Personalized & Precision Medicine