Abstract
Objectives
To evaluate the performance of Claude 3.7 Sonnet in automating data extraction for systematic literature reviews (SLRs).
Methods
An artificial intelligence (AI) extraction model based on the Claude 3.7 Sonnet large language model was developed through a structured process, including targeted training using a master data list and selected full-text articles. The master data list enhanced the model’s contextual knowledge, guiding data extraction. Seven full-text articles from 4 oncology-focused treatment efficacy and safety SLRs were used for early testing and iterative refinement through error analysis. Model performance was then evaluated using 20 full-text articles, drawn from the same SLRs but not used for model development, and benchmarked against human extractions. Evaluation metrics included precision, recall, and F1 score. Extraction time was also compared across 3 different approaches: AI model-only, hybrid (AI model with human verification), and traditional human extraction.
Results
The AI model extracted 117 889 data points across 106 variables, achieving an overall precision of 98.2%, recall of 96.6%, and F1 score of 97.4%. Extraction performance was highest for Study Characteristics (precision: 97.7%, recall: 98.7%) and Participant Characteristics (precision: 97.3%, recall: 98.7%). Outcome data showed 98.7% precision and 96.4% recall. Intervention Characteristics achieved 97.5% precision and 94.6% recall. Extraction using the AI model alone averaged 4.5 minutes per article, compared with 64.5 minutes for the hybrid approach and approximately 240 minutes for traditional human extraction.
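As a consistency check on the metrics above, the F1 score is the harmonic mean of precision and recall; the sketch below (not part of the study's pipeline, purely illustrative) recomputes the overall F1 from the reported precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Overall figures reported in the Results section.
overall = f1_score(0.982, 0.966)
print(f"{overall:.1%}")  # → 97.4%
```

The recomputed value matches the reported overall F1 score of 97.4%.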
Conclusions
The Claude 3.7 Sonnet-based model demonstrated strong performance, supporting efficient and reliable AI-driven data extraction in oncology SLRs, with potential for broader applicability.
Authors
Ellen Kasireddy, Cuthbert Chow, Jun Collet, Mir-Masoud Pourrahmat, Mir Sohail Fazeli