Machine Learning Outperforms Manual Screening in Literature Reviews: A Case Study With Rare Disease Natural History Data

Speaker(s)

Rapoport M¹, Randet A², Cooper C³
¹Wickenstones Ltd, Horton, STS, UK, ²Wickenstones Ltd, Oxford, UK, ³Independent Researcher, London, London, UK

OBJECTIVES: Machine learning (ML)-supported screening of titles and abstracts (ti/ab) has emerged as a promising application of artificial intelligence in the labour-intensive literature review workflow. We used ASReview, an open-source, active learning-based application, to screen ti/ab for a review of the natural history of osteogenesis imperfecta, a rare genetic disorder. Due to the paucity of natural history data for rare conditions, the review included longitudinal and cross-sectional studies and relevant subgroup data (CRD42024536369).

METHODS: Reports were identified in database searches and deduplicated using EndNote X8. Two reviewers conducted a pilot screening of 100 random reports in Excel. One reviewer then screened in ASReview (recommended settings), while the other screened in Excel in random order. All reports from the pilot screen were provided as prior knowledge to ASReview. Four stopping criteria were tested: the SAFE procedure, 50 consecutive irrelevant records, 2.5% consecutive irrelevant records, and 95% of estimated relevant records (based on the pilot). For the SAFE procedure, after reaching the stopping criterion, ASReview was set to doc2vec and random forest.
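The two count-based stopping rules and the estimated-recall rule can be illustrated with a minimal Python sketch. It is not ASReview code and does not implement the SAFE procedure; the function names are hypothetical, labels are assumed to be 1 (relevant) or 0 (irrelevant) in screening order, and reading "2.5% consecutive irrelevant" as 2.5% of the deduplicated sample is an assumption.

```python
import math


def stop_after_consecutive_irrelevant(labels, threshold):
    """Return how many records are screened before stopping.

    Stops once `threshold` irrelevant records (label 0) occur in a row.
    Returns len(labels) if the rule never triggers.
    """
    run = 0
    for screened, label in enumerate(labels, start=1):
        run = 0 if label == 1 else run + 1
        if run >= threshold:
            return screened
    return len(labels)


def stop_at_estimated_recall(labels, pilot_relevant, pilot_size, total, target=0.95):
    """Stop once `target` of the estimated number of relevant records is found.

    The estimate scales the pilot prevalence to the full sample (an assumption
    about how the pilot-based estimate is formed).
    """
    estimated_relevant = pilot_relevant / pilot_size * total
    needed = math.ceil(target * estimated_relevant)
    found = 0
    for screened, label in enumerate(labels, start=1):
        found += label
        if found >= needed:
            return screened
    return len(labels)


# Toy screening order, not study data.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 1]
print(stop_after_consecutive_irrelevant(toy_labels, threshold=3))  # 7

# Assumed reading of the 2.5% rule: 2.5% of the 1,447 deduplicated reports.
percent_threshold = math.ceil(0.025 * 1447)
print(percent_threshold)  # 37
```

Under that reading, the percentage rule is the fixed-count rule with a threshold that scales with the size of the sample.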

RESULTS: Database searches retrieved 2,521 reports, of which 1,447 remained after deduplication. Screening in ASReview found relevant reports notably sooner than screening in Excel: in ASReview, most (51%) relevant reports (n=90) were found after screening only 12% of the sample, whereas in Excel, 52% of the sample had to be screened to reach the same recall. Among the tested stopping rules, the SAFE procedure achieved the best recall with the fewest reports screened (recall 98% after 53% screened).
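The comparison above rests on two quantities: recall (the share of all relevant reports found so far) and the share of the sample screened to reach it. The following Python sketch shows one way to compute them; the function name and the toy label sequences are hypothetical and are not the study data.

```python
import math


def records_to_reach_recall(labels, target_recall):
    """Return how many records must be screened, in the given order,
    to find `target_recall` of all relevant records (label 1)."""
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("no relevant records in labels")
    needed = math.ceil(target_recall * total_relevant)
    found = 0
    for screened, label in enumerate(labels, start=1):
        found += label
        if found >= needed:
            return screened
    return len(labels)


# Toy orderings of the same 10 records with 4 relevant ones.
prioritised_order = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # relevant records ranked early
random_order = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]       # relevant records scattered

for name, order in [("prioritised", prioritised_order), ("random", random_order)]:
    n = records_to_reach_recall(order, target_recall=0.5)
    print(f"{name}: {n} of {len(order)} screened ({n / len(order):.0%}) for 50% recall")
```

In this toy case the prioritised order reaches 50% recall after 2 records and the random order after 5, mirroring the kind of gap reported between active learning and random-order screening.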

CONCLUSIONS: Despite the unusually high heterogeneity of the literature in this review, ASReview performed efficiently, enabling screening of fewer reports. Testing of the SAFE procedure showed that it offers a simple-to-implement but robust process for determining when screening should stop. Limiting uncertainties in ML-supported screening will open up the use of the technology to new users.

Code

RWD63

Topic

Methodological & Statistical Research, Organizational Practices, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Best Research Practices, Literature Review & Synthesis

Disease

Musculoskeletal Disorders (Arthritis, Bone Disorders, Osteoporosis, Other Musculoskeletal), Rare & Orphan Diseases