Clinical Relevance of a Machine Learning Model for Automated Analyses of Depression Severity: The ePHQ-9 in Treatment-Resistant Depression

Author(s)

Pedro Alves, BS¹, Carl D. Marci, MD¹, Joseph W. Zabinski, PhD, MEM¹, Michael Batech, DrPH, MPH², Costas Boussios, PhD¹.
¹OM1, Inc., Boston, MA, USA, ²OM1, Inc., Frankfurt, Germany.

OBJECTIVES: The PHQ-9, a validated measure of depressive symptom severity, is inconsistently documented in real-world data (RWD). This limits RWD studies on diagnosis, treatment response, and the patient journey. A previous AI effort estimated PHQ-9 scores (ePHQ-9) from clinical notes with strong analytic performance. To further validate the ePHQ-9’s utility, this study assessed how observed PHQ-9 scores and ePHQ-9 scores are each associated with physician-attested treatment-resistant depression (TRD), a phenomenon well documented in the literature.
METHODS: A large US real-world dataset, including claims and electronic medical records with clinical notes, was used to identify patients with major depression. A subcohort with physician-attested TRD was labeled via text analysis. All patients had an observed PHQ-9 score, an ePHQ-9 score, or both, within 30 days of TRD attestation. The association between (e)PHQ-9 disease severity (in five severity categories) and TRD status was evaluated by quantifying the proportion of TRD patients in each severity category. To facilitate interpretation, subsamples with a fixed ratio (1:4) of TRD to non-TRD patients were evaluated.
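The per-category calculation described in METHODS can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the 1:4 subsample was drawn, so simple random sampling of non-TRD patients is assumed here, and the function names, the record layout (`score`, `trd`), and the toy data are all hypothetical. The five bands are the standard PHQ-9 severity cutoffs, and scores are assumed to be integers from 0 to 27.

```python
import random
from collections import Counter

# Standard PHQ-9 severity bands for a total score of 0-27.
SEVERITY_BANDS = [
    (0, 4, "none-minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def severity_category(score):
    """Map an integer (e)PHQ-9 total score to its severity band."""
    for low, high, label in SEVERITY_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"PHQ-9 score out of range: {score}")


def fixed_ratio_subsample(patients, ratio=4, seed=0):
    """Keep every TRD patient and randomly draw `ratio` non-TRD patients
    per TRD patient (assumed sampling scheme, not stated in the abstract)."""
    rng = random.Random(seed)
    trd = [p for p in patients if p["trd"]]
    non_trd = [p for p in patients if not p["trd"]]
    n_controls = min(len(non_trd), ratio * len(trd))
    return trd + rng.sample(non_trd, n_controls)


def trd_proportion_by_severity(patients):
    """Return the proportion of TRD patients within each severity band."""
    totals, trd_counts = Counter(), Counter()
    for p in patients:
        label = severity_category(p["score"])
        totals[label] += 1
        trd_counts[label] += int(bool(p["trd"]))
    return {
        label: trd_counts[label] / totals[label]
        for _, _, label in SEVERITY_BANDS
        if totals[label] > 0
    }


if __name__ == "__main__":
    # Synthetic toy records, for illustration only (not study data).
    toy = (
        [{"score": 22, "trd": True}] * 3
        + [{"score": 12, "trd": True}] * 1
        + [{"score": 3, "trd": False}] * 30
        + [{"score": 21, "trd": False}] * 10
    )
    sample = fixed_ratio_subsample(toy, ratio=4, seed=42)
    for band, proportion in trd_proportion_by_severity(sample).items():
        print(f"{band}: {proportion:.1%}")
```

Fixing the ratio before computing proportions gives every subsample the same overall TRD share, so the per-category proportions can be compared across the observed PHQ-9 and ePHQ-9 groups even though the two groups differ in size.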
RESULTS: The dataset included 77,871 patients (29,608 with observed PHQ-9, 61,794 with ePHQ-9). Physician-attested TRD was present in 1,927 patients in the observed PHQ-9 group and 1,424 patients in the ePHQ-9 group. The TRD proportion increased monotonically with severity for both measures. For observed PHQ-9, the TRD proportion in a fixed-ratio subsample ranged from 8.9% (none-minimal) to 36.5% (severe) (r=0.31). The ePHQ-9 showed a stronger relationship (r=0.42), with the TRD proportion ranging from 4.1% (none-minimal) to 52.7% (severe).
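Two reference points help when reading these figures, and both follow by simple arithmetic from numbers stated in the abstract. The sketch below assumes that 77,871 is the count of patients with at least one of the two scores, and that the 1:4 ratio holds exactly in each subsample; the function names are hypothetical.

```python
def overlap_count(n_a, n_b, n_union):
    """Inclusion-exclusion: members counted in both groups."""
    return n_a + n_b - n_union


def baseline_proportion(cases, controls):
    """Overall share of cases in a sample with a fixed case:control ratio."""
    return cases / (cases + controls)


# Patients with both an observed PHQ-9 and an ePHQ-9 score,
# assuming 77,871 is the union of the two groups.
n_both = overlap_count(29_608, 61_794, 77_871)
print(n_both)  # 13531

# Overall TRD share in a 1:4 TRD to non-TRD subsample.
print(baseline_proportion(1, 4))  # 0.2
```

Under these assumptions, a severity category with no relationship to TRD would sit near 20%, so the reported ranges (8.9% to 36.5% for observed PHQ-9, 4.1% to 52.7% for ePHQ-9) fall below that baseline in the none-minimal category and above it in the severe category.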
CONCLUSIONS: The AI-estimated PHQ-9 showed a stronger association with physician-attested TRD than did observed PHQ-9 scores. This confirms the ePHQ-9’s utility as a depression severity measure. Unlike the patient-reported observed PHQ-9, the ePHQ-9 is derived from psychiatrists’ clinical narratives, providing consistency in the assessment of severity and treatment-resistant status.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

RWD35

Topic

Clinical Outcomes, Epidemiology & Public Health, Real World Data & Information Systems

Disease

Mental Health (including addiction), No Additional Disease & Conditions/Specialized Treatment Areas

ISPOR–The Professional Society for
Health Economics and Outcomes Research
