SOCIAL DETERMINANTS OF HEALTH, MORTALITY, AND EXTENDED HOSPITAL STAY IN ATRIAL FIBRILLATION: LEVERAGING OPEN-SOURCE LARGE LANGUAGE MODELS
Author(s)
Vishnu Bharadwaj Suresh, MSc1, Won Lee, PhD2.
1Axtria HEOR/RWE, Berkeley Heights, NJ, USA, 2Axtria HEOR/RWE, San Francisco, CA, USA.
OBJECTIVES: To evaluate the feasibility of using small-scale open-source large language models (LLMs) to extract social determinants of health (SDoH) from clinical discharge summaries and assess whether these features improve prediction of 30-day mortality and extended length of stay (LOS) among patients with atrial fibrillation.
METHODS: We analyzed 1,184 atrial fibrillation admissions from the MIMIC-IV database (2008-2022). Only open-source LLMs with <7 billion parameters were considered; the best-performing model (Qwen3:1.7B) was selected based on extraction quality and stability. Thirteen SDoH attributes spanning employment status, social support, and relationship status were identified from 1,000 discharge summaries. Predictive models (Lasso logistic regression, Random Forest, and XGBoost) were trained on clinical features alone and on clinical features combined with SDoH features to predict 30-day mortality and extended LOS (>7 days).
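The two outcome definitions can be illustrated with a minimal sketch. It assumes admission, discharge, and death timestamps are available per admission; the function names are hypothetical, and counting the 30-day window from admission is an assumption, since the abstract does not state the reference date. The dates are synthetic, not study data.

```python
from datetime import datetime, timedelta
from typing import Optional

LOS_THRESHOLD_DAYS = 7      # from the abstract: extended LOS is >7 days
MORTALITY_WINDOW_DAYS = 30  # from the abstract: 30-day mortality


def extended_los(admit: datetime, discharge: datetime) -> bool:
    """True when the stay is strictly longer than 7 days."""
    return (discharge - admit) > timedelta(days=LOS_THRESHOLD_DAYS)


def died_within_30_days(admit: datetime, death: Optional[datetime]) -> bool:
    """True when death falls within 30 days of admission.

    Assumption: the window starts at admission; the abstract does not say.
    """
    if death is None:
        return False
    return timedelta(0) <= (death - admit) <= timedelta(days=MORTALITY_WINDOW_DAYS)


# Synthetic example, not study data.
admit = datetime(2020, 1, 1, 8, 0)
discharge = datetime(2020, 1, 10, 12, 0)
death = datetime(2020, 1, 25)

labels = {
    "extended_los": extended_los(admit, discharge),
    "mortality_30d": died_within_30_days(admit, death),
}
print(labels)
```

A stay of exactly 7 days is not labeled as extended here, which matches the strict ">7 days" wording.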
RESULTS: LLM-based extraction achieved a 32.9% detection rate under high-confidence validation. For extended LOS (37.1% prevalence), adding SDoH modestly improved performance: Random Forest AUC increased from 0.81 to 0.82 (+0.6%), and Lasso AUC from 0.77 to 0.78 (+1.3%). Retired employment status was the strongest SDoH predictor, followed by relationship status. For 30-day mortality (9.1% prevalence), clinical-only models outperformed SDoH-augmented models (best AUC: 0.83 vs. 0.79, XGBoost), although limited social support showed a moderate association with mortality (Lasso coefficient = 0.45).
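For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it by direct pair counting and compares two sets of scores. The labels and scores are synthetic toy values chosen for illustration, not outputs of the study's models, and the variable names are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic toy data, not study data.
y = [1, 0, 1, 0, 0, 1]
clinical_only = [0.7, 0.6, 0.4, 0.3, 0.5, 0.8]   # hypothetical baseline scores
with_sdoh = [0.7, 0.6, 0.55, 0.3, 0.5, 0.8]      # hypothetical augmented scores

auc_base = auc(y, clinical_only)
auc_aug = auc(y, with_sdoh)
print(f"clinical only: {auc_base:.3f}, with SDoH: {auc_aug:.3f}")
print(f"absolute change: {auc_aug - auc_base:+.3f}")
```

Pair counting is quadratic in sample size, which is acceptable for a cohort of this scale; a rank-based implementation would be the usual choice for larger data.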
CONCLUSIONS: Small open-source LLMs can reliably extract meaningful SDoH from clinical notes and provide incremental value for predicting extended LOS without reliance on large proprietary models. Observed associations between SDoH and outcomes highlight opportunities for tailored discharge planning and care coordination.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
RWD144
Topic
Real World Data & Information Systems
Topic Subcategory
Health & Insurance Records Systems
Disease
SDC: Cardiovascular Disorders (including MI, Stroke, Circulatory)