Development of an LLM-Based Computable Phenotype for Pediatric Lymphoma Subtypes Using Structured and Unstructured Electronic Medical Record Data
Author(s)
Yoona Choi, Ph.D.1, Mihyun Park, M.Pharm.1, Bo Kyung Kim, M.D.2, Hyoung Jin Kang, M.D., Ph.D.2.
1Child Cancer and Rare Disease Project, Seoul National University Hospital, Seoul, Republic of Korea, 2Department of Pediatrics, Seoul National University Hospital, Seoul, Republic of Korea.
OBJECTIVES: This study aimed to develop and evaluate a computable phenotype algorithm for pediatric lymphoma subtypes that integrates structured and unstructured electronic medical record (EMR) data, using a Large Language Model (LLM)-based Natural Language Processing (NLP) approach.
METHODS: We conducted a retrospective analysis of pediatric patients with at least one lymphoma-related diagnosis recorded between 2000 and 2024 in a tertiary hospital’s Clinical Data Warehouse. The computable phenotype algorithm was developed as a multi-step rule-based framework: (1) identify patients whose diagnostic codes contain the term "lymphoma"; (2) analyze pathology reports with a locally deployed LLaMA 2-7B model run through the Ollama platform; (3) where pathology findings are inconclusive or discordant, incorporate bone marrow, flow cytometry, and molecular testing (e.g., FISH, NGS, PCR) reports using the same NLP pipeline; (4) finalize the lymphoma subtype through rule-based aggregation of all available test outputs. The final algorithm was applied to the full cohort, and its performance was validated in a randomly sampled subset of 100 patients against expert-reviewed gold-standard labels.
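The stepwise logic of steps (2) to (4) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the aggregation rules, so the majority vote used here is an assumption, and the names `resolve_subtype` and `keyword_extract` are hypothetical. The `extract` argument stands in for the LLM call; a simple keyword matcher is substituted so the example runs without a model.

```python
from collections import Counter
from typing import Callable, Mapping, Optional, Sequence, Tuple

Extractor = Callable[[str], Optional[str]]


def keyword_extract(text: str) -> Optional[str]:
    """Hypothetical stand-in for the LLM extraction step (not a real model call)."""
    lowered = text.lower()
    for key, label in (("burkitt", "Burkitt lymphoma"), ("hodgkin", "Hodgkin lymphoma")):
        if key in lowered:
            return label
    return None  # inconclusive report


def resolve_subtype(
    reports: Mapping[str, Sequence[str]],
    extract: Extractor,
    ancillary_sources: Sequence[str] = ("bone_marrow", "flow_cytometry", "molecular"),
) -> Tuple[Optional[str], str]:
    """Return (subtype, evidence level) for one patient."""
    # Step 2: label every pathology report, dropping inconclusive ones.
    pathology = [extract(text) for text in reports.get("pathology", [])]
    pathology = [label for label in pathology if label is not None]
    if len(set(pathology)) == 1:
        return pathology[0], "pathology"

    # Step 3: pathology was inconclusive or discordant, so add ancillary tests.
    votes = Counter(pathology)
    for source in ancillary_sources:
        for text in reports.get(source, []):
            label = extract(text)
            if label is not None:
                votes[label] += 1

    # Step 4: aggregate. ASSUMPTION: simple majority vote, ties left unresolved.
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None, "unresolved"
    return ranked[0][0], "ancillary"


patient = {
    "pathology": ["Atypical lymphoid infiltrate, subtype undetermined."],
    "molecular": ["FISH: MYC rearrangement, Burkitt pattern."],
}
print(resolve_subtype(patient, keyword_extract))
```

Returning the evidence level alongside the subtype makes it straightforward to count how many patients were resolved by pathology alone versus with ancillary data.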
RESULTS: Of the 557 patients initially identified, 430 were included after exclusion of cases with insufficient or irrelevant data. The stepwise algorithm resolved subtype classification using pathology alone in 85% of cases; the remaining 15% required integration of ancillary test data. Compared to the gold standard, the algorithm achieved a precision of 71.0%, recall of 63.0%, and F1-score of 66.0%. When rare subtypes grouped as “others” were excluded, performance improved (precision 75.0%, recall 68.0%, F1-score 71.0%).
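For readers who want to reproduce this kind of evaluation, the sketch below computes multi-class precision, recall, and F1 from gold and predicted labels. The abstract does not state how the metrics were averaged across subtypes, so macro averaging is an assumption here, as is the handling of the `exclude` argument (patients whose gold label is excluded are dropped). The function name `macro_prf` and the toy labels are hypothetical and are not study data.

```python
from typing import Iterable, Sequence, Tuple


def macro_prf(
    gold: Sequence[str], pred: Sequence[str], exclude: Iterable[str] = ()
) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall, and F1 over the gold-standard classes."""
    excluded = set(exclude)
    # ASSUMPTION: patients whose gold label is excluded are removed entirely.
    pairs = [(g, p) for g, p in zip(gold, pred) if g not in excluded]
    labels = sorted({g for g, _ in pairs})
    if not labels:
        return 0.0, 0.0, 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels for illustration only.
gold = ["HL", "HL", "BL", "BL", "others"]
pred = ["HL", "BL", "BL", "BL", "HL"]
print(macro_prf(gold, pred))
print(macro_prf(gold, pred, exclude=["others"]))
```

Under macro averaging, the reported F1 is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.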
CONCLUSIONS: This study demonstrates the feasibility of a rule-based computable phenotype algorithm for complex disease subtype classification by combining LLM-based NLP with multimodal test interpretation. Further optimization and external validation are warranted.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
RWD63
Topic
Clinical Outcomes, Epidemiology & Public Health, Real World Data & Information Systems
Topic Subcategory
Health & Insurance Records Systems, Reproducibility & Replicability
Disease
Oncology, Pediatrics, Rare & Orphan Diseases