Training and Validation of CARAai™: A Multi-LLM Platform and Data Model to Address Oncology-Specific Challenges in Clinical Data Extraction

Author(s)

Jennifer Rider, ScD¹, Vivek P. Vaidya, BS², Kuldeep Jiwani, BS², Jeffrey Elton, PhD³, Louis Culot, MLA³, Pyeush Gurha, BS²;
¹ConcertAI, Vice President, Real World Evidence Services, Cambridge, MA, USA, ²ConcertAI, Bengaluru, India, ³ConcertAI, Cambridge, MA, USA

Presentation Documents

Rider et al. ISPOR 2025 CARAai Validation Poster_FINAL.pdf

OBJECTIVES: In oncology, performance status, tumor characteristics, biomarkers, treatments, and tumor progression or response allow for analysis of outcomes and effectiveness. These concepts are derived from the unstructured portion of patient EHRs. Historically, extracting this information relied on time- and resource-intensive human abstraction, limiting study sample sizes and delaying insights by months after the actual clinical activities. Large language models (LLMs) are an alternative approach. However, oncology presents a unique challenge due to ambiguity in terminology (e.g., “stage 3” may refer to chronic kidney disease or to cancer stage). To enable use of LLMs with performance comparable to human curation, we used the ConcertAI Oncology Real-world data set to train and validate the CARAai™ platform of multiple oncology-tuned LLMs.
METHODS: We validated the performance of the CARAai™ models based on precision, recall, and the F1 score (the harmonic mean of precision and recall) using records from 50,000 patients across 13 solid tumor types, split into an 80% training set and a 20% testing set. The same records, processed by oncology-trained human clinical abstractors, were used as the gold standard.
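As an illustration of how precision, recall, and F1 can be computed against a human-abstracted gold standard, here is a minimal Python sketch. It is not the authors' evaluation pipeline: the names `Counts`, `count_matches`, `precision`, `recall`, and `f1_score` are hypothetical, the records are toy data, and exact matching on (patient, data element, value) triples is an assumed scoring rule that the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # extracted by the model and present in the gold standard
    fp: int  # extracted by the model but absent from the gold standard
    fn: int  # present in the gold standard but missed by the model


def count_matches(predicted: set, gold: set) -> Counts:
    """Compare model output with the gold standard by exact match (assumed rule)."""
    return Counts(
        tp=len(predicted & gold),
        fp=len(predicted - gold),
        fn=len(gold - predicted),
    )


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Toy (patient, data element, value) triples; purely illustrative.
gold = {
    ("p1", "stage", "III"),
    ("p1", "histology", "adenocarcinoma"),
    ("p2", "stage", "II"),
}
predicted = {
    ("p1", "stage", "III"),
    ("p2", "stage", "II"),
    ("p2", "histology", "squamous"),
}

counts = count_matches(predicted, gold)
print(f"precision={precision(counts):.2f} "
      f"recall={recall(counts):.2f} "
      f"F1={f1_score(counts):.2f}")
```

In this toy case the model has two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.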
RESULTS: For performance status, tumor stage, histology, tumor grade, procedure type, metastatic diagnosis and medication, precision was >0.90 (±0.05), recall ranged from 0.91-0.99, and F1 scores were >0.95. Precision, recall and F1 scores were 0.95, 0.98, and 0.96 for biomarker names, 0.87, 0.84, and 0.85 for biomarker categorical results, and 0.86, 0.94, and 0.90 for biomarker numeric test results.
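Because F1 is the harmonic mean of precision and recall, the reported biomarker figures can be cross-checked directly. The short Python sketch below recomputes F1 from each reported precision and recall pair; the function name `f1_from` and the dictionary layout are illustrative choices, and the numbers are taken from the results above.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract's results.
reported = {
    "biomarker names": (0.95, 0.98, 0.96),
    "biomarker categorical results": (0.87, 0.84, 0.85),
    "biomarker numeric test results": (0.86, 0.94, 0.90),
}

for name, (p, r, f1) in reported.items():
    computed = f1_from(p, r)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {f1:.2f}")
```

For all three biomarker elements, the recomputed F1 agrees with the reported value to two decimal places.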
CONCLUSIONS: The CARAai™ LLM suite achieved high precision relative to human curation for key oncology data elements, allowing larger data sets with lower latency. The CARAai™ LLMs will facilitate improved statistical power and timeliness for HEOR and epidemiological studies of outcomes and safety.

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

RWD116

Topic

Real World Data & Information Systems

Topic Subcategory

Reproducibility & Replicability

Disease

SDC: Oncology

Presentation (CTI)