Evaluating a Deep Learning Model for Classifying Pediatric Pneumonia in Israel

Author(s)

Dong Wang, Ph.D.¹, Meghan White, PharmD¹, Gazit Sivan, MD, MA², Boshu Ru, Ph.D.¹, Jessica Weaver, MPH, PhD¹, Tal Patalon, MD, LLB, MBA², Craig Roberts, MBA, PharmD¹.
¹Merck & Co., Inc., Rahway, NJ, USA, ²Maccabi Healthcare Services, Tel Aviv, Israel.

OBJECTIVES: Primary Endpoint Pneumonia (PEP) is a WHO-defined radiological measure used in vaccine studies for its high specificity for bacterial pneumonia, particularly Streptococcus pneumoniae. Retrospective studies using ICD codes for all-cause pneumonia are less time-consuming and have high sensitivity for bacterial pneumonia but lack specificity. Deep learning models can automate chest X-ray (CXR) classification, improving efficiency for large-scale studies while maintaining specificity for bacterial pneumonia. This study aims to validate a deep learning model for identifying PEP using CXRs from Maccabi Healthcare Services in Israel, following its successful validation in Hong Kong.
METHODS: The dataset comprised 537 anonymized CXR images from children <5 years of age in Israel (2004-2018). Three radiologists independently classified each CXR as PEP positive or negative. The deep learning model's predictions were compared to the radiologists' majority consensus. Model performance was evaluated using accuracy, precision, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). The impact of an auto-cropping algorithm on model performance was also tested.
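The reference standard described above can be sketched as a simple majority vote over three independent reads. This is only an illustration: the reads below are hypothetical, the function names are invented for this example, and treating "agreement" as unanimity among the three readers is an assumption, since the abstract does not define it.

```python
from collections import Counter

def majority_label(reads):
    """Return the label given by most readers (binary labels, odd reader count)."""
    return Counter(reads).most_common(1)[0][0]

def is_unanimous(reads):
    """True when every reader gave the same label (assumed meaning of 'agreed')."""
    return len(set(reads)) == 1

# Hypothetical reads for three CXRs: 1 = PEP positive, 0 = PEP negative.
reads_per_image = [(1, 1, 1), (0, 1, 0), (0, 0, 0)]

reference = [majority_label(r) for r in reads_per_image]
agreement = [is_unanimous(r) for r in reads_per_image]
```

With three readers and two possible labels, a tie cannot occur, so the majority label is always defined.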
RESULTS: Among the 537 CXRs, radiologists classified 78 as PEP (14.5%), while the model classified 25 as PEP (4.7%). The model achieved an accuracy of 89.4%, sensitivity of 29.5%, specificity of 99.6%, precision of 92.0%, and AUROC of 91.7%. Performance varied with inter-observer agreement: AUROC was 95.2% when radiologists agreed (n=443; 82.5%) and 74.5% when they disagreed (n=94; 17.5%). Model classification of auto-cropped images performed comparably to classification of the original images.
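The reported metrics are mutually consistent, which can be checked by reconstructing the confusion matrix. The abstract does not report the individual cell counts; the sketch below assumes 23 true positives, a value inferred from 92.0% precision over 25 model-positive images, and derives the remaining cells from the reported totals.

```python
n_total = 537    # CXRs in the dataset (reported)
ref_pos = 78     # PEP positive by radiologist majority consensus (reported)
model_pos = 25   # PEP positive by the model (reported)
tp = 23          # ASSUMPTION: inferred from 92.0% precision, not reported

fp = model_pos - tp           # model positive, reference negative
fn = ref_pos - tp             # reference positive, model negative
tn = n_total - tp - fp - fn   # both negative

precision = tp / (tp + fp)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / n_total

for name, value in [("precision", precision), ("sensitivity", sensitivity),
                    ("specificity", specificity), ("accuracy", accuracy)]:
    print(f"{name}: {value:.1%}")
```

Under this assumption the four values round to 92.0%, 29.5%, 99.6%, and 89.4%, matching the reported figures. AUROC cannot be recovered this way because it depends on the model's continuous scores rather than on the thresholded counts.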
CONCLUSIONS: The deep learning-based model identified PEP on pediatric CXRs with 92.0% precision and 99.6% specificity relative to the radiologists' majority consensus, although sensitivity was low (29.5%). Lower performance when radiologists disagreed suggests that the model's uncertainty aligns with physician uncertainty. The model's performance suggests its potential for classifying CXR images for PEP in future research within this health system.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

MSR89

Topic

Methodological & Statistical Research, Real World Data & Information Systems

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

Infectious Disease (non-vaccine), Pediatrics

Presentation (CTI)