Prospects for Automation of Systematic Literature Reviews (SLRs) With Artificial Intelligence and Natural Language Processing
Speaker(s)
Royer J1, Wu EQ2, Ayyagari R2, Parravano S2, Pathare U2, Kisielinska M2
1Analysis Group, Inc., Montreal, QC, Canada, 2Analysis Group, Inc., Boston, MA, USA
OBJECTIVES: This research evaluates the performance of recent artificial intelligence (AI) techniques for assisting with SLRs, with the goal of reducing review time while maintaining high accuracy.
METHODS: Sourcing abstracts from an SLR of attention-deficit/hyperactivity disorder (ADHD)-related studies, we approach the problem with two techniques. For the first, we use a pre-trained sentence embedder from Hugging Face to vectorize abstracts and apply a binary classifier to predict whether a study should be included. For the second, we provide abstracts to OpenAI’s GPT-3.5 and ask questions corresponding to the inclusion/exclusion criteria of the review. We use GPT-3.5’s responses both to assess inclusion directly and to train a binary classifier that predicts inclusion/exclusion decisions. Since SLRs prioritize capturing all relevant studies for a given topic, we assess performance by the proportion of irrelevant studies correctly excluded from screening while minimizing the exclusion of relevant studies.
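The abstract does not report implementation details. The Python sketch below illustrates how the two screening techniques could be assembled from off-the-shelf components (sentence-transformers, scikit-learn, and the OpenAI API); the embedding model name, criterion questions, prompts, data loading, and train/test handling are illustrative assumptions, not the authors’ exact pipeline.

```python
# Hypothetical sketch of the two abstract-screening approaches described in METHODS.
# All model names, prompts, and placeholder data are assumptions for illustration.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

abstracts = [...]           # abstract texts from the ADHD SLR (placeholder)
included = np.array([...])  # human inclusion (1) / exclusion (0) labels (placeholder)

# --- Technique 1: pre-trained sentence embeddings + binary classifier ---
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any Hugging Face sentence embedder
X = embedder.encode(abstracts)

# Train on ~20% of the abstracts, mirroring the study design
X_train, X_test, y_train, y_test = train_test_split(
    X, included, train_size=0.2, stratify=included, random_state=0
)
svm = SVC(probability=True, class_weight="balanced").fit(X_train, y_train)
p_include_svm = svm.predict_proba(X_test)[:, 1]  # screening scores for held-out abstracts

# --- Technique 2: GPT-3.5 answers to inclusion/exclusion questions ---
client = OpenAI()
criteria = [
    "Does the study enroll patients with ADHD?",      # illustrative criteria only
    "Does the study report an outcome of interest?",
]

def criterion_answers(abstract: str) -> list[int]:
    """Ask GPT-3.5 one yes/no question per criterion; encode 'yes' as 1."""
    answers = []
    for question in criteria:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": f"Abstract:\n{abstract}\n\n{question} Answer yes or no.",
            }],
        )
        text = resp.choices[0].message.content.strip().lower()
        answers.append(int(text.startswith("yes")))
    return answers

# Use the answers directly (e.g., as a relevance score) or as classifier features
gpt_features = np.array([criterion_answers(a) for a in abstracts])
F_train, F_test, yf_train, yf_test = train_test_split(
    gpt_features, included, train_size=0.2, stratify=included, random_state=0
)
logreg = LogisticRegression().fit(F_train, yf_train)
p_include_gpt = logreg.predict_proba(F_test)[:, 1]
```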
RESULTS: We find that training the classifiers on 20% of the available data (151 of 752 abstracts) yields the best predictions. For the sentence embedding technique, a trained Support Vector Machine excludes 40.1% (+1.6/-1.8) of all irrelevant articles while retaining 94.2% (+1.2/-0.6) of relevant ones with 90% confidence. By comparison, using GPT-3.5’s responses directly to rank abstracts by their likely relevance allows us to exclude 25.0% of irrelevant abstracts while retaining 95.8% of all relevant ones. Furthermore, combining the GPT-3.5 responses with the training set of 151 abstracts, a logistic regression excludes 40.4% (+1.3/-1.6) of irrelevant abstracts while keeping 96.5% (+1.0/-0.9) of relevant ones with 90% confidence.
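The trade-off reported above can be summarized as: fix a target share of relevant abstracts to retain, then measure how many irrelevant abstracts fall below the resulting score threshold. The sketch below shows one plausible way to compute this; the exact thresholding and confidence-interval procedure used by the authors is not described in the abstract, and the variable names follow the illustrative sketch after METHODS.

```python
# Hedged sketch of the screening trade-off metric: exclusion of irrelevant
# abstracts at a chosen retention level for relevant ones (assumed procedure).
import numpy as np

def screening_tradeoff(p_include, y_true, target_recall=0.95):
    """Return (share of irrelevant abstracts excluded, share of relevant retained)."""
    relevant_scores = np.sort(p_include[y_true == 1])
    # Threshold below which at most (1 - target_recall) of relevant studies fall
    cutoff_idx = int(np.floor((1 - target_recall) * len(relevant_scores)))
    threshold = relevant_scores[cutoff_idx]
    excluded_irrelevant = np.mean(p_include[y_true == 0] < threshold)
    retained_relevant = np.mean(p_include[y_true == 1] >= threshold)
    return excluded_irrelevant, retained_relevant

# Example usage with the SVM scores from the METHODS sketch (hypothetical)
exclusion_rate, retention_rate = screening_tradeoff(p_include_svm, y_test)
```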
CONCLUSIONS: Recent AI developments show promise for substantially reducing the time required to conduct SLRs by better targeting abstracts for human review. In particular, human reviewers can focus on the more uncertain or inconclusive AI recommendations while the AI rapidly trims the screening set by dropping irrelevant abstracts identified with high confidence.
Code
MSR131
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas