HARNESSING AGENTIC AI FOR COHORT IDENTIFICATION: A CASE STUDY IN CLL/SLL PATIENTS TREATED WITH BTK INHIBITORS AT A US ACADEMIC HEALTH SYSTEM
Author(s)
Sandeep Kumar, MBBS, M.Tech1, Gnaanikko Patchiappan Pa, MBBS1, Melissa Hagan, MPH, PhD2, Melissa Hagan, PhD3, Allison Smither, PhD4;
1nference, Bangalore, India, 2Be One Medicine, Florham Park, NJ, USA, 3BeOne Medicines Ltd, San Carlos, CA, USA, 4nference, Cambridge, MA, USA
OBJECTIVES: Advancements in artificial intelligence (AI), particularly in natural language processing (NLP) and large language models (LLMs), have significantly improved the abstraction of information from clinical notes within medical records. This study investigated the use of AI technologies in identifying a cohort of de-identified Bruton tyrosine kinase (BTK) inhibitor-naïve patients with chronic lymphocytic leukemia (CLL) or small lymphocytic lymphoma (SLL) who initiated BTK inhibitor treatment at a large US academic health system.
METHODS: Patients diagnosed with CLL/SLL between January 1, 2005, and December 31, 2024, were identified using three approaches: (1) structured ICD codes together with unstructured notes transformed into computable variables using nference’s proprietary NLP algorithms (ICD+NLP cohort); (2) LLM-assisted analysis of unstructured data (LLM cohort); and (3) a combined approach using ICD codes, NLP, and LLM analysis (ICD+NLP+LLM; combined cohort). Initial BTK inhibitor use from January 1, 2020, to December 31, 2024, was determined from structured medication orders and confirmed in clinical notes using either NLP or LLM techniques. The accuracy of cohort inclusion was compared using Pearson’s chi-square test of independence, with the Benjamini-Hochberg correction for multiple comparisons.
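The statistical comparison described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 2x2 counts, and the p-values are hypothetical and chosen only to show the mechanics of a Pearson chi-square test (without continuity correction) followed by Benjamini-Hochberg adjustment.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: rows are two cohorts, columns are correct / incorrect inclusion.
example_table = [[10, 20], [20, 10]]
chi2, p = pearson_chi2_2x2(example_table)
print(f"chi2={chi2:.3f}, p={p:.4f}")

# Hypothetical raw p-values from three pairwise comparisons.
raw_p = [0.01, 0.04, 0.03]
print(benjamini_hochberg(raw_p))
```

The closed-form p-value holds only for one degree of freedom, which is the case for any 2x2 comparison of two cohorts on a binary outcome.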
RESULTS: A total of 573 patients were identified in the ICD+NLP cohort, 361 in the LLM cohort, and 208 in the combined cohort. Demographic distributions were similar across cohorts: 93.8%-95.0% of patients identified as White, 95.1%-95.8% as non-Hispanic or Latinx, and 65.9%-67.2% were male. The median age at BTK inhibitor initiation ranged from 66.7 to 68.3 years. The LLM cohort had the highest specificity (92%), while the combined cohort had the highest sensitivity (92%). Both the LLM cohort (86%) and the combined cohort (87%) achieved higher accuracy of cohort inclusion than the ICD+NLP cohort (75%) (P=0.012 and P=0.032, respectively).
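For readers less familiar with the metrics reported above, the following Python sketch shows how sensitivity, specificity, and accuracy are derived from a confusion matrix. The abstract does not describe the reference standard or the underlying counts, so the function name and the numbers below are hypothetical and do not reproduce the study's data.

```python
def inclusion_metrics(tp, fp, fn, tn):
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true-positive-class and one true-negative-class case")
    sensitivity = tp / (tp + fn)   # share of truly eligible patients that were included
    specificity = tn / (tn + fp)   # share of truly ineligible patients that were excluded
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


# Hypothetical validation sample of 100 manually reviewed patients.
metrics = inclusion_metrics(tp=45, fp=10, fn=5, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Sensitivity and specificity can move in opposite directions across methods, which is why a single method need not lead on both, as in the results above.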
CONCLUSIONS: This analysis demonstrates that patient-level abstraction of clinical text using LLMs outperforms traditional ICD-code-based and sentence-level NLP approaches, establishing a more accurate method for cohort identification.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
SA28
Topic
Study Approaches
Disease
SDC: Oncology, SDC: Rare & Orphan Diseases