Automated Data Extraction Using Artificial Intelligence to Accelerate Systematic Literature Reviews in Rheumatoid Arthritis

Author(s)

von Hein M1, Talbot-Watt N2, Chen CH3, Lundie G4
1Gilead, London, LON, UK, 2Gilead Sciences Ltd, London, LON, UK, 3Virtual Science AI, London, London, UK, 4Gilead Sciences, London, UK

OBJECTIVES: To test whether a custom-built artificial intelligence (AI) can automatically extract data from publications and thereby accelerate the systematic literature review (SLR) process in rheumatoid arthritis (RA).

METHODS: A conventional SLR in RA, conducted in 2018 and rerun in 2024 using the exact search strategy of the 2018 SLR, served as the dataset to train the AI to extract data from the final records included in the SLR. The AI's data extraction output was compared against the human-curated data extraction grid from the 2018 SLR using the Jaccard similarity coefficient, with the human-curated data extraction assumed to be perfectly accurate. Additionally, a human reviewer evaluated the accuracy rate of the AI, checking for agreement in meaning rather than exact matches against the data extraction grid.
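
For readers unfamiliar with the metric, the Jaccard similarity coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. The abstract does not state how extracted fields were turned into sets, so the following is only a minimal sketch under stated assumptions: each field is lowercased and split into alphanumeric word tokens, and the per-field scores are averaged. The function names (`tokenize`, `jaccard`, `mean_jaccard`) and the field names in the example are hypothetical and are not taken from the study.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens.

    Assumed tokenization scheme; the abstract does not specify one.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of the token sets of two strings."""
    set_a, set_b = tokenize(a), tokenize(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty fields count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_jaccard(ai_row, human_row):
    """Average the per-field Jaccard score over the fields of the human reference."""
    scores = [
        jaccard(ai_row.get(field, ""), value)
        for field, value in human_row.items()
    ]
    return sum(scores) / len(scores)


# Illustrative values only, not data from the SLR.
human = {"study_type": "randomized controlled trial", "participants": "120"}
ai = {"study_type": "randomized trial", "participants": "120"}
score = mean_jaccard(ai, human)
```

In this illustration the AI's "randomized trial" shares two of the three tokens in the human entry, so a token-overlap score penalizes a wording difference that a reviewer checking for meaning might accept.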

RESULTS: The 2018 RA SLR identified 992 records through database searches of MEDLINE and Cochrane and included 89 records after full-text screening. The search strategy replicated in 2024, applying the same search terms and time horizon specified in the 2018 RA SLR, identified 1043 records. The human-curated data extraction grid included both text and numerical variables, such as study type, number of participants, intervention, comparator and patient-reported outcomes. The AI achieved a 74% accuracy rate based on the Jaccard similarity coefficient, whereas the human-assessed accuracy rate of the AI was 84%. The human review also identified minor errors in the 2018 SLR data extraction grid, calling into question the assumption of perfect accuracy for the reference data against which the accuracy rates were calculated.

CONCLUSIONS: Training an AI to automate data extraction for SLRs in RA is feasible, but further work is needed to improve accuracy rates. Human-supervised quality control of data extraction results remains important to ensure the quality, transparency and validity of SLR outputs.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

MSR29

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), No Additional Disease & Conditions/Specialized Treatment Areas
