Prospects for Automation of Systematic Literature Reviews (SLRs) With Artificial Intelligence and Natural Language Processing
Speaker(s)
Royer J1, Wu EQ2, Ayyagari R2, Parravano S2, Pathare U2, Kisielinska M2
1Analysis Group, Inc., Montreal, QC, Canada, 2Analysis Group, Inc., Boston, MA, USA
OBJECTIVES: This research evaluates the performance of recent artificial intelligence (AI) techniques for assisting with SLRs, with the goal of reducing review time while maintaining high accuracy.
METHODS: Sourcing abstracts from an SLR of attention-deficit/hyperactivity disorder (ADHD)-related studies, we approach the problem with two techniques. For the first, we use a pre-trained sentence embedder from Hugging Face to vectorize abstracts and apply a binary classifier to predict whether a study should be included. For the second, we provide abstracts to OpenAI’s GPT-3.5 and ask questions corresponding to the inclusion/exclusion criteria of the review. We use GPT-3.5’s responses both to assess inclusion directly and to train a binary classifier that predicts inclusion/exclusion decisions. Since SLRs prioritize capturing all relevant studies for a given topic, we assess performance by the proportion of irrelevant studies correctly excluded from screening while minimizing the exclusion of relevant studies.
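The abstract does not report implementation details. The Python sketch below illustrates how the two screening techniques could be assembled from off-the-shelf components (sentence-transformers, scikit-learn, and the OpenAI API); the embedding model name, criterion questions, prompts, data loading, and train/test handling are illustrative assumptions, not the authors’ exact pipeline.

```python
# Hypothetical sketch of the two abstract-screening approaches described in METHODS.
# All model names, prompts, and placeholder data are assumptions for illustration.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

abstracts = [...]           # abstract texts from the ADHD SLR (placeholder)
included = np.array([...])  # human inclusion (1) / exclusion (0) labels (placeholder)

# --- Technique 1: pre-trained sentence embeddings + binary classifier ---
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any Hugging Face sentence embedder
X = embedder.encode(abstracts)

# Train on ~20% of the abstracts, mirroring the study design
X_train, X_test, y_train, y_test = train_test_split(
    X, included, train_size=0.2, stratify=included, random_state=0
)
svm = SVC(probability=True, class_weight="balanced").fit(X_train, y_train)
p_include_svm = svm.predict_proba(X_test)[:, 1]  # screening scores for held-out abstracts

# --- Technique 2: GPT-3.5 answers to inclusion/exclusion questions ---
client = OpenAI()
criteria = [
    "Does the study enroll patients with ADHD?",      # illustrative criteria only
    "Does the study report an outcome of interest?",
]

def criterion_answers(abstract: str) -> list[int]:
    """Ask GPT-3.5 one yes/no question per criterion; encode 'yes' as 1."""
    answers = []
    for question in criteria:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": f"Abstract:\n{abstract}\n\n{question} Answer yes or no.",
            }],
        )
        text = resp.choices[0].message.content.strip().lower()
        answers.append(int(text.startswith("yes")))
    return answers

# Use the answers directly (e.g., as a relevance score) or as classifier features
gpt_features = np.array([criterion_answers(a) for a in abstracts])
F_train, F_test, yf_train, yf_test = train_test_split(
    gpt_features, included, train_size=0.2, stratify=included, random_state=0
)
logreg = LogisticRegression().fit(F_train, yf_train)
p_include_gpt = logreg.predict_proba(F_test)[:, 1]
```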
RESULTS: We find that training the classifiers on 20% of the available data (151 of 752 abstracts) yields the best predictions. For the sentence embedding technique, a trained Support Vector Machine excludes 40.1% (+1.6/-1.8) of all irrelevant articles while retaining 94.2% (+1.2/-0.6) of relevant ones with 90% confidence. By comparison, using GPT-3.5’s responses directly to rank abstracts by their likely relevance allows us to exclude 25.0% of irrelevant abstracts while retaining 95.8% of all relevant ones. Furthermore, combining the GPT-3.5 responses with the training set of 151 abstracts, a logistic regression excludes 40.4% (+1.3/-1.6) of irrelevant abstracts while keeping 96.5% (+1.0/-0.9) of relevant ones with 90% confidence.
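The trade-off reported above can be summarized as: fix a target share of relevant abstracts to retain, then measure how many irrelevant abstracts fall below the resulting score threshold. The sketch below shows one plausible way to compute this; the exact thresholding and confidence-interval procedure used by the authors is not described in the abstract, and the variable names follow the illustrative sketch after METHODS.

```python
# Hedged sketch of the screening trade-off metric: exclusion of irrelevant
# abstracts at a chosen retention level for relevant ones (assumed procedure).
import numpy as np

def screening_tradeoff(p_include, y_true, target_recall=0.95):
    """Return (share of irrelevant abstracts excluded, share of relevant retained)."""
    relevant_scores = np.sort(p_include[y_true == 1])
    # Threshold below which at most (1 - target_recall) of relevant studies fall
    cutoff_idx = int(np.floor((1 - target_recall) * len(relevant_scores)))
    threshold = relevant_scores[cutoff_idx]
    excluded_irrelevant = np.mean(p_include[y_true == 0] < threshold)
    retained_relevant = np.mean(p_include[y_true == 1] >= threshold)
    return excluded_irrelevant, retained_relevant

# Example usage with the SVM scores from the METHODS sketch (hypothetical)
exclusion_rate, retention_rate = screening_tradeoff(p_include_svm, y_test)
```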
CONCLUSIONS: Recent AI developments show promise for substantially reducing the time required to conduct SLRs by better targeting abstracts for human review. In particular, human reviewers can focus on the more uncertain or inconclusive AI recommendations while the AI rapidly trims the screening set by dropping irrelevant abstracts identified with high confidence.
Code
MSR131
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas