Clinical Relevance of a Machine Learning Model for Automated Analyses of Depression Severity: The ePHQ-9 in Treatment-Resistant Depression

Author(s)

Pedro Alves, BS¹, Carl D. Marci, MD¹, Joseph W. Zabinski, PhD, MEM¹, Michael Batech, DrPH, MPH², Costas Boussios, PhD¹.
¹OM1, Inc., Boston, MA, USA; ²OM1, Inc., Frankfurt, Germany.
OBJECTIVES: The Patient Health Questionnaire-9 (PHQ-9), a validated measure of depressive symptom severity, is inconsistently documented in real-world data (RWD). This limits RWD studies of diagnosis, treatment response, and the patient journey. A previous AI effort estimated PHQ-9 scores (ePHQ-9) from clinical notes with strong analytic performance. This study assessed the association of observed PHQ-9 and ePHQ-9 scores with physician-attested treatment-resistant depression (TRD), a known phenomenon in the literature, to further validate the ePHQ-9’s utility.
METHODS: A large US real-world dataset, including claims and electronic medical records with clinical notes, was used to identify patients with major depression. A subcohort with physician-attested TRD was labeled via text analysis. All patients had an observed PHQ-9 score, an ePHQ-9 score, or both, within 30 days of TRD attestation. The association between (e)PHQ-9 disease severity (five categories) and TRD status was evaluated by quantifying the proportion of TRD patients in each severity category. To facilitate interpretation, subsamples with a fixed 1:4 ratio of TRD to non-TRD patients were evaluated.
RESULTS: The dataset included 77,871 patients (29,608 with an observed PHQ-9 score, 61,794 with an ePHQ-9 score). Physician-attested TRD was present in 1,927 patients in the observed PHQ-9 group and 1,424 patients in the ePHQ-9 group. The TRD proportion increased monotonically with severity for both measures. For the observed PHQ-9, the TRD proportion in a fixed-ratio subsample ranged from 8.9% (none-minimal) to 36.5% (severe) (r=0.31). The ePHQ-9 showed a stronger relationship (r=0.42), with the TRD proportion ranging from 4.1% (none-minimal) to 52.7% (severe).
CONCLUSIONS: The AI-estimated PHQ-9 showed a stronger association with physician-attested TRD than observed PHQ-9 scores did. This confirms the ePHQ-9’s utility as a depression severity measure. Unlike the patient-reported observed PHQ-9, the ePHQ-9 is derived from psychiatrists’ clinical narratives, providing consistency in the assessment of severity and treatment-resistant status.
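
The analysis described in METHODS can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that the five severity categories are the standard PHQ-9 bands (0-4, 5-9, 10-14, 15-19, 20-27), that each patient is a dictionary with hypothetical keys `score` and `trd`, and that the fixed-ratio subsample keeps every TRD patient and randomly draws four non-TRD patients per TRD patient. The abstract does not state how the subsample was actually drawn.

```python
import random
from collections import Counter

# Standard PHQ-9 severity bands as (upper bound, label). Assumed here to be
# the "five categorical divisions" mentioned in the abstract.
SEVERITY_BANDS = [
    (4, "none-minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]


def severity_category(score):
    """Map an integer PHQ-9 (or ePHQ-9) total score, 0-27, to its band."""
    if not 0 <= score <= 27:
        raise ValueError(f"PHQ-9 score out of range: {score}")
    for upper, label in SEVERITY_BANDS:
        if score <= upper:
            return label


def fixed_ratio_subsample(patients, ratio=4, seed=0):
    """Keep all TRD patients and draw `ratio` non-TRD patients per TRD patient.

    Hypothetical sampling scheme. If there are too few non-TRD patients,
    all of them are kept and the ratio falls short of 1:`ratio`.
    """
    rng = random.Random(seed)
    trd = [p for p in patients if p["trd"]]
    non_trd = [p for p in patients if not p["trd"]]
    n_controls = min(len(non_trd), ratio * len(trd))
    return trd + rng.sample(non_trd, n_controls)


def trd_proportion_by_severity(patients):
    """Return {severity label: share of patients in that band with TRD}."""
    totals = Counter()
    trd_counts = Counter()
    for p in patients:
        label = severity_category(p["score"])
        totals[label] += 1
        trd_counts[label] += int(p["trd"])
    # Report bands in clinical order and skip bands with no patients.
    return {
        label: trd_counts[label] / totals[label]
        for _, label in SEVERITY_BANDS
        if totals[label]
    }
```

In an exact 1:4 subsample the overall TRD share is 1/(1+4) = 20%, so a category proportion above 20% means TRD patients are over-represented in that severity band and a proportion below 20% means they are under-represented.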

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

RWD35

Topic

Clinical Outcomes, Epidemiology & Public Health, Real World Data & Information Systems

Disease

Mental Health (including addiction), No Additional Disease & Conditions/Specialized Treatment Areas
