EVALUATION OF MACHINE LEARNING-ASSISTED DATA EXTRACTION FOR SUPPORTING SYSTEMATIC REVIEWS: A CASE STUDY OF A REVIEW IN MOVEMENT DISORDERS
Author(s)
Michaela Lunan, PhD1, Sathushan Thurairajah, BS Pharmacology1, Jose Marcano Belisario, PhD2, Louise Hartley, PhD2.
1RTI Health Solutions, Manchester, United Kingdom, 2RTI Health Solutions, Manchester, United Kingdom.
OBJECTIVES: Artificial intelligence (AI) enables automation of systematic literature review (SLR) components. Evaluation of AI data extraction (DE) is needed to ensure rigour and appropriate integration into workflows. The objectives were to evaluate AI-assisted DE in an SLR of randomised controlled trials (RCTs) in movement disorders, considering consistency (measured by accuracy, recall and precision), time spent and implications for SLR researchers.
METHODS: DE was conducted in Nested Knowledge using AI-assisted Adaptive Smart Tags. Prompts for DE covered study characteristics, treatment description, patient characteristics, clinical outcomes and safety. Multiple text types were included: full-text articles, abstracts and ClinicalTrials.gov records. AI DE was quality checked and supplemented by human researchers. Data extracted for references were collated for linked studies to remove duplicate information. Time spent building and piloting prompts and quality checking/supplementing AI extraction was compared with researcher averages for manual DE.
RESULTS: The SLR included 39 references (24 individual studies). AI performed best when extracting study and patient characteristics, with high accuracy (0.94 and 0.93), recall (0.99 and 0.93) and precision (0.95 and 0.88). Outcomes and safety data had low recall (0.40 and 0.31, respectively). The review extracted multiple pain and disease severity scales, and AI would misclassify data from one tool as another; for safety, specific adverse events were misclassified (e.g. headaches as dysphagia). Although AI extraction was quick, significant time was required for prompt building and error correction, so total time savings were modest, averaging 50 minutes for AI-assisted extraction versus 1 hour for fully manual extraction.
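The consistency metrics reported above follow standard definitions based on counts of correct and incorrect extraction decisions. As an illustrative sketch only (not the authors' code, and using hypothetical counts chosen solely to show the arithmetic):

```python
# Standard definitions of the consistency metrics named in the abstract.
# These are generic formulas; how the study operationalised a "true
# positive" for an extracted data item is an assumption here.

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of all extraction decisions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp: int, fn: int) -> float:
    """Proportion of items present in the source that the AI extracted."""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Proportion of AI-extracted items that were correct."""
    return tp / (tp + fp)

# Hypothetical counts for illustration only (not study data):
tp, tn, fp, fn = 90, 4, 5, 1
print(round(accuracy(tp, tn, fp, fn), 2))  # 0.94
print(round(recall(tp, fn), 2))            # 0.99
print(round(precision(tp, fp), 2))         # 0.95
```

Low recall, as seen for outcomes and safety data, corresponds to a high false-negative count: items present in the article that the AI failed to extract or misclassified.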
CONCLUSIONS: AI extraction performed well for objective and atomic data but had low recall for outcome and safety data. Prompt building and error correction resulted in minimal time savings overall. AI DE may prove more worthwhile in areas with standardised efficacy outcomes (e.g. oncology) than in rare disorders, as prompts could be reused and misclassification between tools reduced.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
SA11
Topic
Study Approaches
Topic Subcategory
Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas