Can Artificial Intelligence Tools Enhance Data Abstraction During Systematic Literature Reviews?

Speaker(s)

Cribbs K1, Baisley W1, Blackmore L2, Lahue B1
1Alkemi LLC, Manchester Center, VT, USA, 2Alkemi LLC, Norwich, VT, USA


OBJECTIVES: In health economics and outcomes research (HEOR), systematic literature reviews (SLRs) are integral to obtaining data inputs and assessing the impact of health technologies, but the data abstraction process is prone to human error, which can compromise research integrity. Our study assessed abstraction errors using a publicly available artificial intelligence (AI) tool.

METHODS: Seven AI prompts aligned to pre-established SLR abstraction domains (Publication Information, Treatments Studied, Study Design and Methodology, Baseline Patient Characteristics, Treatment Parameters, Efficacy Outcomes, Safety Outcomes) were developed. One reviewer ran each prompt in Microsoft Copilot for 33 SLR publications and transcribed the AI-generated responses into Excel. Two independent reviewers conducted quality checks, comparing AI responses with the source publications and color-coding responses as inaccurate (i.e., wrong) or of variable quality (i.e., key information or context omitted). A third reviewer reconciled discrepancies. Descriptive statistics were used to analyze AI error rates (accuracy errors, quality errors, or both) across domains and publications.
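
A minimal sketch of how the error-rate tabulation described above could be scripted, assuming the reconciled quality-check codes are exported from Excel to a CSV; the file name and column names (publication_id, domain, code) are hypothetical, not the authors' actual workflow:

    # Tabulate AI abstraction error rates by domain from reconciled
    # quality-check codes. File and column names are hypothetical.
    import csv
    from collections import defaultdict

    totals = defaultdict(int)   # populated data cells per domain
    errors = defaultdict(int)   # cells coded inaccurate or variable quality

    with open("qc_codes.csv", newline="") as f:
        for row in csv.DictReader(f):
            domain = row["domain"]
            totals[domain] += 1
            if row["code"] in ("inaccurate", "variable_quality"):
                errors[domain] += 1

    for domain in sorted(totals):
        rate = 100 * errors[domain] / totals[domain]
        print(f"{domain}: {errors[domain]}/{totals[domain]} ({rate:.1f}%)")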

RESULTS: AI responses yielded a total of 1,089 populated data cells across the 33 publications. The overall error rate (inaccurate plus variable-quality responses) was 13% (142/1,089). Most errors were inaccuracies (106/1,089, 9.7%). Errors per publication ranged from 0 (2/33 articles) to 10 (1/33 articles), with a mean of 4.3 ± 2.28 errors per article. The Publication Information domain had the lowest error rate (2.2%, 5/231), while Efficacy Outcomes had the highest (42.4%, 42/99). Disaggregated findings revealed that accuracy errors were more prevalent than quality errors in all but 2 abstraction domains (Publication Information, Treatment Parameters), with inaccuracies observed in 32.3% and 25.8% of efficacy and safety responses, respectively.
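
The reported rates follow directly from the counts given above; a brief check reproducing the arithmetic (all counts are taken from the abstract):

    # Reproduce the reported error rates from the abstract's counts.
    overall = 142 / 1089      # overall error rate -> ~13.0%
    inaccurate = 106 / 1089   # accuracy errors only -> ~9.7%
    pub_info = 5 / 231        # Publication Information domain -> ~2.2%
    efficacy = 42 / 99        # Efficacy Outcomes domain -> ~42.4%
    print(f"{overall:.1%} {inaccurate:.1%} {pub_info:.1%} {efficacy:.1%}")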

CONCLUSIONS: While the publicly available AI tool evaluated had limitations in extracting accurate and comprehensive information from research publications, especially outcomes data, the observed error rates were lower than documented human error rates. AI can improve SLR data abstraction, but robust quality control measures are needed to minimize errors.

Code

MSR9

Topic

Clinical Outcomes, Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Outcomes Assessment, Literature Review & Synthesis

Disease

Medical Devices, Oncology