LEVERAGING AI-DRIVEN NATURAL LANGUAGE PROCESSING TO ENHANCE SYMPTOM CAPTURE IN PRIMARY BILIARY CHOLANGITIS (PBC) AND PRIMARY SCLEROSING CHOLANGITIS (PSC)
Author(s)
Anthony D. Perez, PhD, Ashley Jaksa, MPH
Target RWE, Durham, NC, USA
OBJECTIVES: Fatigue and pruritus are hallmark symptoms of cholestatic liver diseases such as primary biliary cholangitis (PBC) and primary sclerosing cholangitis (PSC), and they significantly impair patient quality of life. However, these symptoms are often under-captured in structured real-world data (RWD), such as coded fields in electronic health records (EHRs). This study evaluated how natural language processing (NLP) of unstructured clinical narratives can improve prevalence estimates for pruritus (in PBC and PSC) and fatigue (in PSC specifically).
METHODS: Patients with PBC and PSC were identified within the TARGET-LD and TARGET-GASTRO cohorts using ICD-10 codes K74.3 and K83.0. Baseline symptom prevalence—pruritus in both cohorts and fatigue in the PSC cohort—was calculated using structured EHR data. For patients with available unstructured clinician notes, NLP models were deployed to detect symptom mentions and assign polarity scores (affirming vs. negating). NLP-derived insights were integrated with structured data to generate multimodal prevalence estimates, which were compared against code-based identification alone.
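The multimodal logic described in the methods can be illustrated with a minimal Python sketch. It is not the study's pipeline: the abstract does not describe the NLP models, so a simple keyword-plus-negation rule stands in for them here. The term lists, the `Patient` record, the function names, and the choice of the full cohort as the denominator are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword patterns; the real NLP models are not described in the abstract.
SYMPTOM_TERMS = {
    "pruritus": re.compile(r"\b(pruritus|itch(?:ing|y|es)?)\b", re.I),
    "fatigue": re.compile(r"\b(fatigued?|tired(?:ness)?|exhaust(?:ed|ion))\b", re.I),
}
# Hypothetical negation cues, checked only in the text preceding the mention.
NEGATION = re.compile(r"\b(no|not|denies|denied|without|negative for)\b", re.I)


def affirmed_mention(note: str, symptom: str) -> bool:
    """Return True if some sentence mentions the symptom with no negation cue before it."""
    pattern = SYMPTOM_TERMS[symptom]
    for sentence in re.split(r"[.;\n]+", note):
        match = pattern.search(sentence)
        if match and not NEGATION.search(sentence[: match.start()]):
            return True
    return False


@dataclass
class Patient:
    patient_id: str
    coded_symptoms: set = field(default_factory=set)  # symptoms found in structured data
    notes: list = field(default_factory=list)         # free-text notes, possibly empty


def prevalence(patients: list, symptom: str, multimodal: bool) -> float:
    """Share of patients with the symptom, from codes alone or codes OR affirmed note mentions."""
    if not patients:
        return 0.0
    hits = 0
    for p in patients:
        coded = symptom in p.coded_symptoms
        in_notes = multimodal and any(affirmed_mention(n, symptom) for n in p.notes)
        hits += coded or in_notes
    return hits / len(patients)


cohort = [
    Patient("A", coded_symptoms={"pruritus"}),
    Patient("B", notes=["Patient reports severe itching at night."]),
    Patient("C", notes=["Denies pruritus. No fatigue."]),
    Patient("D"),
]
structured_only = prevalence(cohort, "pruritus", multimodal=False)
combined = prevalence(cohort, "pruritus", multimodal=True)
```

In this toy cohort, patient B is found only through the note and patient C's negated mention is excluded, which mirrors the affirming versus negating polarity step. A production system would need a validated model rather than these illustrative rules.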
RESULTS: Preliminary analyses included 668 patients with PBC and 3,958 with PSC. Relying solely on structured data yielded low apparent prevalence for pruritus (11% PBC; 17% PSC) and fatigue (20% PSC). Integrating unstructured data substantially increased capture in both cohorts, roughly tripling each estimate. Multimodal prevalence for pruritus rose to 37% in PBC and 53% in PSC. In the PSC cohort, fatigue identification increased to 66% through the addition of NLP-derived insights.
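To make the size of these gains explicit, the short Python sketch below computes the absolute (percentage-point) and relative (fold) increase for each reported pair. The figures are the rounded percentages from the results, so the fold values are approximate. The function name `capture_gain` is illustrative only.

```python
# Reported prevalence (%), structured-only vs. multimodal, taken from the results above.
reported = {
    ("pruritus", "PBC"): (11, 37),
    ("pruritus", "PSC"): (17, 53),
    ("fatigue", "PSC"): (20, 66),
}


def capture_gain(structured_pct: float, multimodal_pct: float) -> tuple:
    """Return (percentage-point increase, fold increase rounded to one decimal)."""
    return multimodal_pct - structured_pct, round(multimodal_pct / structured_pct, 1)


gains = {key: capture_gain(*pair) for key, pair in reported.items()}
for (symptom, cohort), (points, fold) in gains.items():
    print(f"{symptom} in {cohort}: +{points} percentage points, {fold}x")
# pruritus in PBC: +26 percentage points, 3.4x
# pruritus in PSC: +36 percentage points, 3.1x
# fatigue in PSC: +46 percentage points, 3.3x
```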
CONCLUSIONS: Diagnostic codes alone are insufficient to capture the fatigue and pruritus burden of rare cholestatic liver diseases. Incorporating NLP-processed unstructured data markedly increased the sensitivity of symptom identification. This multimodal approach provides a more robust framework for understanding the natural history and real-world burden of fatigue and pruritus, offering critical insights for payers and health technology assessors evaluating rare disease interventions.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
RWD129
Topic
Real World Data & Information Systems
Topic Subcategory
Data Protection, Integrity, & Quality Assurance
Disease
SDC: Rare & Orphan Diseases