Can Artificial Intelligence (AI) Replace a Human Reviewer in Systematic Literature Review (SLR)? Validation of the LiveSTART^TM Tool

Author(s)

Liu J¹, Jafar R², Girard LA³, Thorlund K⁴, Forsythe A⁵
¹Cytel Inc., Toronto, ON, Canada, ²Cytel Inc., Vancouver, BC, Canada, ³Cytel Inc., Montreal, QC, Canada, ⁴McMaster University, Hamilton, ON, Canada, ⁵Cytel Inc., Waltham, MA, USA

OBJECTIVES: SLRs are labor-intensive and time-consuming, but they are required for regulatory and health technology assessments (HTA). The new PRISMA guidelines (Page et al. 2020) allow the inclusion of automated tools in screening. We developed the LiveSTART^TM AI tool, which uses transfer learning to perform the title and abstract (TiAb) review stage of the SLR process.

METHODS: LiveSTART^TM uses deep learning (a 12-layer neural network) to identify text relevant to population, intervention/comparator, outcome, and study design (PICOS), and then hierarchically predicts publication acceptance based on the given inclusion/exclusion criteria. LiveSTART^TM comprises 4 functions: 1) de-duplicates records by grouping abstracts with the same or similar content; 2) provides a probability of inclusion for each PICOS criterion; 3) predicts the inclusion of each publication by comparing its abstract to the inclusion/exclusion criteria; and 4) predicts the reason for rejection based on PICOS, following a pre-specified hierarchy. LiveSTART^TM was trained on 59 SLR datasets comprising 65,328 publications, all manually annotated by two independent reviewers, with discrepancies verified by a third senior reviewer.
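The hierarchical rejection step (functions 2 to 4 above) can be illustrated with a minimal sketch. This is not the LiveSTART^TM implementation: the order of the hierarchy, the 0.5 threshold, and the names `HIERARCHY` and `screen` are all assumptions made for illustration, since the abstract states only that the hierarchy is pre-specified.

```python
from typing import Dict, Optional, Tuple

# Hypothetical hierarchy order; the abstract does not state the actual order.
HIERARCHY: Tuple[str, ...] = ("population", "intervention", "outcome", "study_design")


def screen(
    probs: Dict[str, float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the abstract
    hierarchy: Tuple[str, ...] = HIERARCHY,
) -> Tuple[bool, Optional[str]]:
    """Return (include, rejection_reason) for one abstract.

    `probs` maps each PICOS criterion to its predicted probability of
    inclusion. The first criterion in the hierarchy that falls below the
    threshold is reported as the single reason for rejection.
    """
    for criterion in hierarchy:
        if probs[criterion] < threshold:
            return False, criterion
    return True, None


# Two criteria fail here; only the one higher in the hierarchy is reported.
example = {"population": 0.9, "intervention": 0.2, "outcome": 0.1, "study_design": 0.8}
print(screen(example))  # (False, 'intervention')
```

Because per-criterion probabilities are kept separately from the final decision, a change in SLR scope would in principle only require re-running this last step rather than re-scoring every abstract.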

RESULTS: The 59 datasets covered 17 oncology and 6 non-oncology indications and comprised 47 clinical, 6 economic, and 6 health-related quality-of-life SLRs. In validation against the decisions of two independent reviewers and a third verifier, LiveSTART^TM achieved an accuracy of 0.92, precision of 0.91, recall of 0.86, F1-score of 0.89, and AUC of 0.91. LiveSTART^TM reviews 1000 publications in ≈12.5 minutes and requires no dataset preparation beyond that needed for manual review. Hierarchical rejection by PICOS criteria provides traceability and allows flexibility when the SLR scope changes.
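As a quick arithmetic check on the reported figures, the sketch below recomputes the F1-score from the rounded precision and recall and converts the stated throughput to seconds per publication. The function names are illustrative only. The abstract does not report the underlying confusion matrix, so the F1 value here is derived from two-decimal inputs and may differ slightly from the reported 0.89.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def seconds_per_publication(n_publications: int, minutes: float) -> float:
    """Average screening time per publication, in seconds."""
    return minutes * 60 / n_publications


f1 = f1_score(0.91, 0.86)
rate = seconds_per_publication(1000, 12.5)

print(f"F1 from rounded inputs: {f1:.3f}")      # 0.884
print(f"Seconds per publication: {rate:.2f}")   # 0.75
```

The recomputed 0.884 is consistent with the reported 0.89 once rounding of precision and recall is taken into account: unrounded values as high as 0.915 and 0.865 would give an F1 of about 0.889.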

CONCLUSIONS: Given its unique algorithm, rigorous training on broad datasets, and highly reliable and transparent output, LiveSTART^TM AI combined with a single reviewer could potentially yield accuracy comparable to that of conventional dual review, with significant time savings. However, adoption by regulatory and HTA authorities will be required.

Conference/Value in Health Info

2022-11, ISPOR Europe 2022, Vienna, Austria

Value in Health, Volume 25, Issue 12S (December 2022)

Code

MSR74

Topic

Methodological & Statistical Research, Study Approaches

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
