EVALUATION OF MACHINE LEARNING-ASSISTED DATA EXTRACTION FOR SUPPORTING SYSTEMATIC REVIEWS: A CASE STUDY OF A REVIEW IN MOVEMENT DISORDERS
Author(s)
Michaela Lunan, PhD1, Sathushan Thurairajah, BS Pharmacology1, Jose Marcano Belisario, PhD2, Louise Hartley, PhD2.
1RTI Health Solutions, Manchester, United Kingdom, 2RTI Health Solutions, Manchester, United Kingdom.
OBJECTIVES: Artificial intelligence (AI) enables automation of systematic literature review (SLR) components. Evaluation of AI data extraction (DE) is needed to ensure rigour and appropriate integration into workflows. The objectives were to evaluate AI-assisted DE in an SLR of randomised controlled trials (RCTs) in movement disorders, considering consistency (measured by accuracy, recall and precision), time spent and implications for SLR researchers.
METHODS: DE was conducted in Nested Knowledge using AI-assisted Adaptive Smart Tags. Prompts for DE covered study characteristics, treatment description, patient characteristics, clinical outcomes and safety. Multiple text types were included: full-text articles, abstracts and ClinicalTrials.gov records. AI DE was quality checked and supplemented by human researchers. Data extracted for references were collated for linked studies to remove duplicate information. Time spent building and piloting prompts and quality checking/supplementing AI extraction was compared with researcher averages for manual DE.
RESULTS: The SLR included 39 references (24 individual studies). AI performed best when extracting study and patient characteristics, with high accuracy (0.94 and 0.93), recall (0.99 and 0.93) and precision (0.95 and 0.88). Outcomes and safety data had low recall (0.40 and 0.31, respectively). The review extracted multiple pain and disease severity scales, and AI would misclassify data from one tool as another; for safety, specific adverse events were misclassified (e.g. headaches as dysphagia). Although AI extraction was quick, significant time was required for prompt building and error correction, so total time savings were modest, averaging 50 minutes for AI-assisted extraction versus 1 hour for fully manual extraction.
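The consistency metrics reported above follow standard definitions based on counts of correct and incorrect extraction decisions. As an illustrative sketch only (not the authors' code, and using hypothetical counts chosen solely to show the arithmetic):

```python
# Standard definitions of the consistency metrics named in the abstract.
# These are generic formulas; how the study operationalised a "true
# positive" for an extracted data item is an assumption here.

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of all extraction decisions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp: int, fn: int) -> float:
    """Proportion of items present in the source that the AI extracted."""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Proportion of AI-extracted items that were correct."""
    return tp / (tp + fp)

# Hypothetical counts for illustration only (not study data):
tp, tn, fp, fn = 90, 4, 5, 1
print(round(accuracy(tp, tn, fp, fn), 2))  # 0.94
print(round(recall(tp, fn), 2))            # 0.99
print(round(precision(tp, fp), 2))         # 0.95
```

Low recall, as seen for outcomes and safety data, corresponds to a high false-negative count: items present in the article that the AI failed to extract or misclassified.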
CONCLUSIONS: AI extraction performed well for objective and atomic data but had low recall for outcome and safety data. Prompt building and error correction resulted in minimal time savings overall. AI DE may prove more worthwhile in areas with standardised efficacy outcomes (e.g. oncology) than in rare disorders, as prompts could be reused and misclassification between tools reduced.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
SA11
Topic
Study Approaches
Topic Subcategory
Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas