AI-Augmented Data Extraction in Literature Reviews: Toward First-Pass Accuracy Competitive With Human Performance
Author(s)
Ross De Burgh, PhD1, Karl Moritz Herrmann, PhD2, Sam Mardini, BSc3, Christoph R Schlegel, PhD4.
1MedScope Review Solutions, Theoule Sur Mer, France, 2Reliant AI Europe GmbH, Berlin, Germany, 3Reliant AI Inc., Boston, MA, USA, 4Co-founder, Reliant AI Europe GmbH, Berlin, Germany.
OBJECTIVES: AI tools for literature reviews are well known for improving efficiency, but questions around accuracy and repeatability remain key barriers to adoption in regulatory and scientific settings. Human first-pass data extraction is never 100% accurate, owing to inherent variability (typos, interpretation errors). Demonstrating that AI can match or outperform human accuracy at first pass is critical for building trust and driving adoption. Our objective was to evaluate the first-pass accuracy of AI-based structured data extraction (Reliant Tabular) versus first-pass human extraction in systematic literature reviews.
METHODS: A standardized literature review protocol was developed with predefined structured data fields (e.g., sample sizes, outcome measures, study design elements). First-pass extractions were performed using both Reliant Tabular and human reviewers. Each output was compared to a gold-standard dataset (manually validated). Accuracy rates and inter-rater agreement were calculated for both extraction methods.
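The comparison described above can be sketched in code. The snippet below is a minimal illustration, not the authors' actual pipeline: field names, values, and the error in the human extraction are hypothetical, accuracy is computed as the fraction of fields matching the gold standard, and inter-rater agreement is computed as Cohen's kappa over per-field correct/incorrect indicators.

```python
def accuracy(extracted, gold):
    """Fraction of fields whose extracted value matches the gold standard."""
    return sum(extracted[k] == gold[k] for k in gold) / len(gold)

def cohens_kappa(a, b):
    """Cohen's kappa for two lists of binary (0/1) ratings."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                 # observed agreement
    pa = (sum(a) / n) * (sum(b) / n) \
         + ((n - sum(a)) / n) * ((n - sum(b)) / n)             # chance agreement
    return 1.0 if pa == 1 else (po - pa) / (1 - pa)

# Hypothetical structured fields for one study (illustrative only)
gold  = {"sample_size": 120, "design": "RCT", "primary_outcome": "HbA1c"}
ai    = {"sample_size": 120, "design": "RCT", "primary_outcome": "HbA1c"}
human = {"sample_size": 102, "design": "RCT", "primary_outcome": "HbA1c"}  # digit transposition

print(accuracy(ai, gold))     # 1.0
print(accuracy(human, gold))  # 2 of 3 fields correct

# Agreement between two raters' per-field correctness over several fields
print(cohens_kappa([1, 1, 0, 1, 0, 1], [1, 0, 0, 1, 0, 1]))
```

In practice one would aggregate such field-level comparisons across all included studies before reporting accuracy ranges and agreement statistics.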
RESULTS: AI-based extraction demonstrated accuracy comparable to that of human reviewers across key structured fields in the literature review. Human first-pass extraction typically achieves an accuracy rate of approximately 80-95%. The AI system approached this range, with most discrepancies arising from challenges in entity resolution and handling ambiguous phrasing. In contrast, human reviewers most frequently made errors due to typographical mistakes, omissions, or inconsistent interpretations of study elements.
CONCLUSIONS: AI-based data extraction not only accelerates literature workflows but can exceed human first-pass accuracy, offering a strong foundation for scalable, repeatable, regulator-friendly evidence generation.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR23
Topic
Health Policy & Regulatory, Health Technology Assessment, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas