Are the Machines Ready to Take Over? Can Artificial Intelligence Replace a Human Reviewer for Literature Screening and Selection for Systematic Literature Reviews?

Speaker(s)

Nass P1, Rekowska D2, Arcà E3, Nowak A4, Sadowska E5, Borowiack E6, Halfpenny N2
1OPEN Health, Stuttgart, BW, Germany, 2OPEN Health, Rotterdam, Netherlands, 3OPEN Health, Brussels, Belgium, 4Evidence Prime, Krakow, MA, Poland, 5Evidence Prime, Krakow, Poland, 6Evidence Prime, Kraków, Poland

OBJECTIVES: The exponential increase in clinical research literature and aggressive timelines for Health Technology Assessment (HTA) submissions, especially under the upcoming EU HTA regulation, are making systematic literature reviews (SLRs) more resource-intensive and costly. Recent studies have shown that artificial intelligence (AI) could accelerate SLR preparation by serving as a second reviewer during title/abstract (TI/AB) screening. Here we present a case study testing LaserAI’s functionality for TI/AB screening.

METHODS: The case study used an existing comprehensive clinical SLR of biologic treatments for Crohn’s disease, comprising eight updates, in which two human reviewers screened and selected the literature, with conflicts resolved by a third reviewer. The original SLR was used to train the AI; inputs were the original search results (7272 records), the studies selected for full-text review (176 records), and the final study inclusions (63 records). The AI then replicated the human reviewers’ screening for all eight updates (3257 records), and its output was compared against the human literature screening/selection. The main outcomes were sensitivity and workload savings.

RESULTS: Across all updates the human reviewers included 165 records for full text review, while the AI selected 466 records. In seven of eight updates, the AI identified all studies that had been included by the human reviewers, which corresponds to 100% sensitivity. However, in update 6, one of three studies included by the human reviewers was missed by the AI, resulting in 67% sensitivity for this update. The average workload saving across all updates was 45.4%.
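The two outcome metrics reported above can be sketched as simple ratios. Note the abstract does not state the exact workload-savings formula used with LaserAI, so the `workload_saving` helper below is one plausible definition (the share of records a second human reviewer would no longer screen), not the tool's actual calculation; the function names are illustrative only.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity (recall) of the AI screener: the proportion of
    human-included records that the AI also flagged for full-text review."""
    return true_positives / (true_positives + false_negatives)

def workload_saving(records_flagged: int, records_total: int) -> float:
    """One plausible (assumed) definition of workload saving: the share of
    screened records the second reviewer no longer needs to read, if the
    AI's exclusions are accepted without human re-screening."""
    return 1 - records_flagged / records_total

# Update 6 from the abstract: the AI caught 2 of the 3 human inclusions.
print(round(sensitivity(2, 1), 2))   # ≈ 0.67, the 67% reported for update 6
```

For the seven other updates, the AI missed no human inclusions, so `false_negatives = 0` and sensitivity is 1.0 by this formula. The reported average workload saving of 45.4% was computed per update and averaged; it is not reproduced here because the per-update record counts are not given in the abstract.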

CONCLUSIONS: The results of this study support the use of AI as a second reviewer for TI/AB screening during update searches. The AI demonstrated a good level of sensitivity, and its use could save considerable resources. Additional workload savings might have been achieved if we had retrained the AI after each update.

Code

SA94

Topic

Study Approaches

Topic Subcategory

Literature Review & Synthesis

Disease

Biologics & Biosimilars, Gastrointestinal Disorders