Training and Validation Methods of Artificial Intelligence Tools For Automated Systematic Reviews In Pharmacoepidemiology and Health Technology Assessment: A Systematic Review

Author(s)

Aurore BERGAMASCO, PharmD, MSc1, Richard CHIV, MSc2, Yola Moride, PhD2.
1YOLARX Consultants, Paris, France, 2YOLARX Consultants, Montreal, QC, Canada.

OBJECTIVES: Systematic reviews (SRs) are pivotal in pharmacoepidemiology (PE) and health technology assessment (HTA) to guide decision-making. However, traditional manual processes for screening search outputs and extracting data are labor-intensive and time-consuming. Since the introduction of ChatGPT, the use of artificial intelligence (AI) to automate SR workflows has garnered increasing attention. Although several commercial AI tools are available, concerns about their transparency persist, even as NICE acknowledges their potential in evidence generation. This SR evaluated how commercially available AI tools for SRs in PE and HTA are trained, validated, and tested.
METHODS: A systematic search of MEDLINE and Embase (01 Jan 2019 to 02 Jan 2025) was conducted to identify studies on the training, validation, and testing methods of commercially available AI tools for SRs. Extracted data included tool name, purpose (screening, extraction, and/or study quality assessment), stage of development, evaluation methods, and dataset characteristics (type, size, and therapeutic area). Pragmatic searches were also performed to minimize publication bias. The protocol followed the PRISMA-P 2015 checklist and was submitted to the International Prospective Register of Systematic Reviews (PROSPERO CRD pending).
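The extraction items listed in the Methods can be sketched as a typed record, together with a helper that summarizes how often each evaluation method is described, the kind of proportion reported in the Results. This is a minimal illustration under stated assumptions: the class `ExtractionRecord`, the function `reporting_rates`, the category labels, and the example tools ("ToolA", "ToolB") are all hypothetical and do not reproduce the authors' actual extraction form or data.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional

# Hypothetical category labels mirroring the Methods; not the authors' coding scheme.
PURPOSES = {"screening", "extraction", "quality_assessment"}
METHODS = ("training", "validation", "testing", "prompt_engineering")


@dataclass
class ExtractionRecord:
    """One study's extracted items (illustrative field names)."""
    tool_name: str
    purposes: set  # subset of PURPOSES ("and/or", so more than one is allowed)
    development_stage: str
    methods_described: set = field(default_factory=set)  # subset of METHODS
    dataset_type: str = ""
    dataset_size: Optional[int] = None
    therapeutic_area: str = ""

    def __post_init__(self) -> None:
        # Reject labels outside the assumed coding scheme.
        unknown = (self.purposes - PURPOSES) | (self.methods_described - set(METHODS))
        if unknown:
            raise ValueError(f"unrecognized category: {sorted(unknown)}")


def reporting_rates(records: Iterable[ExtractionRecord]) -> dict:
    """Percentage of records describing each evaluation method, rounded to 1 decimal."""
    records = list(records)
    if not records:
        return {m: 0.0 for m in METHODS}
    n = len(records)
    return {
        m: round(100 * sum(m in r.methods_described for r in records) / n, 1)
        for m in METHODS
    }


if __name__ == "__main__":
    # Toy, made-up records for illustration only.
    toy = [
        ExtractionRecord("ToolA", {"screening"}, "commercial", {"training", "testing"}),
        ExtractionRecord("ToolB", {"extraction"}, "commercial", {"training"}),
    ]
    print(reporting_rates(toy))
```

With the two toy records, training is described in both (100.0%), testing in one (50.0%), and validation and prompt engineering in neither (0.0%). Keeping the categories as fixed label sets makes the denominators explicit, which matters when percentages are compared across methods.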
RESULTS: Of the 1,540 studies identified through literature and pragmatic searches, 96 were eligible. The most frequently reported tools were Abstrackr, Rayyan, DistillerSR, and Nested Knowledge. The majority of AI tool applications focused on title/abstract screening (75.9%), with data extraction accounting for 24.1%. Methods for training, validation, testing, and prompt engineering were described in 58.6%, 34.5%, 37.9%, and 37.9% of studies, respectively.
CONCLUSIONS: This SR highlights the growing adoption of AI tools for automating SR workflows in PE and HTA. While title/abstract screening remains the primary application, reporting of training, validation, and testing methods varies widely, raising concerns about consistency and transparency. These findings underscore the need for standardized evaluation frameworks to ensure the reliability and reproducibility of AI tools in SRs.

Conference/Value in Health Info

2025-05, ISPOR 2025, Montréal, Quebec, CA

Value in Health, Volume 28, Issue S1

Code

MSR35

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
