EVALUATING EARLY TREATMENT INITIATION REGIMES IN TREATMENT-RESISTANT DEPRESSION: A REINFORCEMENT LEARNING-BASED OFF-POLICY EVALUATION APPROACH

Author(s)

You Wang, BS.
Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA.

Presentation Documents

ISPOR26_Wang_MSR55_POSTER.pdf

OBJECTIVES: This research aims to develop, pilot, and empirically evaluate a reinforcement learning (RL)-based off-policy evaluation framework for estimating, from observational data, the comparative effectiveness of early treatment initiation regimes on suicide attempt or intentional self-harm among individuals with treatment-resistant depression (TRD).
METHODS: We analyzed longitudinal U.S. administrative claims data from Kythera Labs comprising 1,331,775 patients with TRD and 13,476,105 monthly transitions over the 12 months after the TRD index date, with suicide-related events modeled as an absorbing outcome. A double deep Q-network with constrained joint action support was trained to estimate state-action values across factored antidepressant therapy, psychotherapy, esketamine, and neuromodulation status. The policy estimand was defined as the 12-month risk under alternative regimes specified by initiation versus non-initiation of treatment classes within the first 3 months. Policy evaluation was conducted using weighted doubly robust off-policy evaluation, with importance weighting restricted to the early initiation window to mitigate weight explosion. Episode-level bootstrapping was used to construct confidence intervals (CIs) for risk differences (RDs).
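The weighted doubly robust step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `wdr_risk`, the array layout, and the choice to set per-step importance ratios to 1 after the initiation window are assumptions made for illustration, following the standard weighted doubly robust form with self-normalized cumulative weights. The event indicator plays the role of the reward, so the estimated value is a cumulative risk.

```python
import numpy as np


def wdr_risk(events, q_hat, v_hat, ratios, window=3, gamma=1.0):
    """Weighted doubly robust estimate of cumulative event risk (sketch).

    All inputs have shape (n_episodes, horizon); names are hypothetical.
      events[i, t]  1.0 if the event occurs in month t, else 0.0
                    (all zeros after an absorbing event)
      q_hat[i, t]   model estimate Q(s_t, a_t) of remaining risk
      v_hat[i, t]   model estimate V(s_t) under the target regime
      ratios[i, t]  pi_target(a_t | s_t) / pi_behavior(a_t | s_t)
    """
    events = np.asarray(events, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    n, horizon = events.shape

    # Assumption: importance weighting applies only inside the early
    # initiation window; later per-step ratios are treated as 1.
    step = np.array(ratios, dtype=float)  # copy so the input is untouched
    step[:, window:] = 1.0

    # Cumulative ratios, self-normalized across episodes at each step.
    cum = np.cumprod(step, axis=1)
    totals = cum.sum(axis=0, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("no episode is consistent with the target regime")
    w = cum / totals

    # Weight from the previous step; 1/n before the first action.
    w_prev = np.hstack([np.full((n, 1), 1.0 / n), w[:, :-1]])

    discount = gamma ** np.arange(horizon)
    terms = w * events - (w * q_hat - w_prev * v_hat)
    return float((discount * terms).sum())


# Toy check: 4 episodes, 12 months, one event, target equal to behavior.
events = np.zeros((4, 12))
events[0, 5] = 1.0
zeros = np.zeros_like(events)
ones = np.ones_like(events)
print(wdr_risk(events, zeros, zeros, ones))
```

With all ratios equal to 1 and a zero value model, the estimate reduces to the observed event proportion, which is a convenient sanity check. The `ValueError` branch reflects the support problem: a regime that no observed episode follows cannot be evaluated this way.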
RESULTS: Eighteen joint treatment combinations were supported by the data. Early psychotherapy initiation compared with no psychotherapy was associated with an estimated 12-month risk difference of 0.36 (95% CI: 0.09 to 0.39). Early neuromodulation showed a smaller, statistically non-significant contrast (RD −0.08; 95% CI: −0.12 to 0.20). Regimes involving rarely observed treatments, such as early esketamine initiation, produced unstable or extreme estimates. Exploratory enumeration of supported early-initiation regimes showed that the lowest estimated risks clustered around common real-world care patterns, such as continuous antidepressant care and psychotherapy.
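Risk differences and intervals of the kind reported above can be obtained by resampling whole episodes. The sketch below is an illustration under stated assumptions rather than the study's code: `bootstrap_rd` and its arguments are hypothetical names, the interval is a simple percentile interval, and both regimes are evaluated on the same resampled episodes. Each `estimate_*` argument is any function that maps an array of episode indices to a risk estimate.

```python
import numpy as np


def bootstrap_rd(estimate_a, estimate_b, n_episodes,
                 n_boot=1000, alpha=0.05, seed=0):
    """Episode-level bootstrap for a risk difference (sketch).

    estimate_a, estimate_b: callables taking an integer index array of
    episodes and returning the estimated risk under regime A or B.
    Returns (point estimate, lower bound, upper bound).
    """
    rng = np.random.default_rng(seed)

    # Point estimate on the full set of episodes.
    all_idx = np.arange(n_episodes)
    point = estimate_a(all_idx) - estimate_b(all_idx)

    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Resample whole episodes with replacement, so the months of
        # one patient always stay together.
        idx = rng.integers(0, n_episodes, size=n_episodes)
        draws[b] = estimate_a(idx) - estimate_b(idx)

    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return float(point), float(lo), float(hi)


# Toy usage with made-up per-episode values (not study data).
toy_rng = np.random.default_rng(1)
y_a = toy_rng.binomial(1, 0.30, size=200).astype(float)
y_b = toy_rng.binomial(1, 0.20, size=200).astype(float)
rd, lo, hi = bootstrap_rd(lambda i: y_a[i].mean(),
                          lambda i: y_b[i].mean(), n_episodes=200)
print(rd, lo, hi)
```

Resampling at the episode level rather than the transition level preserves within-patient dependence across months. When a regime is rarely observed, some resamples may contain few or no supporting episodes, which is one way the unstable estimates noted above can arise.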
CONCLUSIONS: This study demonstrates a practical RL-based workflow for comparative evaluation of early treatment initiation regimes using observational data. The proposed framework offers a methodological foundation for future comparative effectiveness research on dynamic mental health treatment strategies, while highlighting two key challenges: evaluation is limited to decisions supported by real-world treatment patterns, and the outcomes are rare.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR55

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

STA: Multiple/Other Specialized Treatments, STA: Personalized & Precision Medicine

Presentation (CTI)