An Evaluation of PhlexNeuron, an Internal, Proprietary Artificial Intelligence (AI) Tool for Systematic Literature Review (SLR) Screening
Author(s)
Nicole Szydlowski, PharmD1, Brittany Galloway, PharmD2, Malia Gill, MS2, Derek Swiger, MS, PharmD2, Daniel Koppers,2 Suresh Shankar, MBA2, Kimberly M. Ruiz, EdM2, Nicole Fusco, ScD2.
1Medical Communications Research Fellow, Cencora, Inc., Conshohocken, PA, USA, 2Cencora, Inc., Conshohocken, PA, USA.
OBJECTIVES: Systematic literature reviews (SLRs) are an essential tool for evidence-based decision making. However, their rigorous methodology requires substantial time and cost investment. Several tools are available to perform literature screening assisted by artificial intelligence (AI); these tools typically require a training set of example references for each new SLR. The aim of this research was to assess the performance of a proprietary, internal AI tool for literature screening that does not require a training set, which may yield time savings.
METHODS: Title/abstract screening was previously completed in 4 SLRs by human reviewers. The SLRs evaluated clinical, costs and health-care resource utilization (HCRU), economic evaluation, and humanistic outcomes. Eligibility questions were generated using the population, intervention, comparator, outcome, and study design (PICOS) criteria from the original SLRs. The AI tool was prompted to answer the PICOS questions and to provide a screening recommendation (include, exclude, or uncertain). Analyses were conducted to compare human and AI results, including sensitivity and specificity.
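The human-versus-AI comparison described in METHODS can be illustrated with a short sketch. The labels below are hypothetical, and the handling of "uncertain" AI recommendations (counted as inclusions, a conservative choice) is an assumption not stated in the abstract; the authors' actual analysis code is not described.

```python
# Illustrative sketch of comparing human screening decisions to AI
# recommendations. All data here are hypothetical; "uncertain" AI
# recommendations are mapped to "include" (an assumed, conservative rule).

def confusion_counts(human, ai):
    """Count TP/FP/TN/FN, treating AI 'uncertain' as 'include'."""
    tp = fp = tn = fn = 0
    for h, a in zip(human, ai):
        a = "include" if a == "uncertain" else a
        if h == "include" and a == "include":
            tp += 1
        elif h == "exclude" and a == "include":
            fp += 1
        elif h == "exclude" and a == "exclude":
            tn += 1
        else:  # human included, AI excluded
            fn += 1
    return tp, fp, tn, fn

def sensitivity_specificity(human, ai):
    tp, fp, tn, fn = confusion_counts(human, ai)
    sensitivity = tp / (tp + fn)  # share of human-included references the AI kept
    specificity = tn / (tn + fp)  # share of human-excluded references the AI rejected
    return sensitivity, specificity

# Toy example with five references
human = ["include", "include", "exclude", "exclude", "exclude"]
ai = ["include", "uncertain", "exclude", "include", "exclude"]
sens, spec = sensitivity_specificity(human, ai)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# → sensitivity=1.00, specificity=0.67
```

In SLR screening, sensitivity is usually prioritized over specificity, since a missed relevant reference (false negative) is costlier than an extra reference passed to full-text review.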
RESULTS: The clinical, costs/HCRU, economic evaluation, and humanistic datasets included 4233, 1908, 1289, and 2637 references, respectively. When the AI screening recommendations were compared to human reviewers, the sensitivity and specificity estimates were 91%/65%, 89%/50%, 93%/45%, and 89%/71% for the clinical, costs/HCRU, economic evaluation, and humanistic SLRs, respectively. The AI tool also provided an explanation for its response to each PICOS question.
CONCLUSIONS: The AI literature screening tool identified the majority of relevant articles, with sensitivity estimates of at least 89% and specificity estimates of at least 45%. Therefore, AI-assisted screening using prompts based on the PICOS framework is feasible, and the explanations provided alongside the AI responses to each PICOS question can increase transparency. These study results can be used to inform future refinement of AI tools that do not require training sets for SLR screening processes.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
MSR121
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Sensory System Disorders (Ear, Eye, Dental, Skin)