Assessment of Reasoning Agents for Building Literature Search Strategies
Author(s)
Joshua Twaites, MS; Kevin Kallmes, BS, MA, JD; Karl Holub, BS
Nested Knowledge, Inc., St. Paul, MN, USA
OBJECTIVES: The application of large language models (LLMs) to screening and extraction in systematic literature reviews (SLRs) is well studied. However, current LLM search tools are ‘black boxes’ that lack transparency and human feedback, and they can hallucinate, including hallucinating MeSH terms. Embedding-based search methods may address hallucination, but at the cost of human interpretability. We propose a novel approach that uses human-in-the-loop reasoning agents to build Boolean search strings for SLRs and targeted literature reviews.
METHODS: We deployed ‘Smart Search,’ a reasoning agent that combines LLM-based chain-of-thought reasoning with a generator-critic loop, in the Nested Knowledge SLR software. Smart Search iteratively generates and assesses Boolean strings based on the user's textual research question and subsequent chat-based clarifications from the user. To validate Smart Search, we provided the Objectives/Aims statements from ten Cochrane SLRs to Smart Search and assessed Recall, using the PubMed-indexed records included in each Cochrane SLR as the gold standard. We repeated this test on twenty SLRs performed in the Nested Knowledge system, and also assessed black-box LLMs (specifically GPT) on the same tasks.
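The abstract does not describe Smart Search's internals. As a minimal sketch, assuming the generator and critic can be treated as plain callables (in practice these would be LLM calls; here they are replaced by toy rule-based stubs), a generator-critic loop alternates proposal and critique until the critic raises no issues or a round budget runs out. The function names, the "critique is a list of issues" convention, and the `max_rounds` parameter are hypothetical, and the user's chat-based clarifications are not modeled.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not the Smart Search API):
# a generator proposes a Boolean string from the research question and the
# previous critique; a critic returns a list of issues (empty = accept).
Generator = Callable[[str, Optional[List[str]]], str]
Critic = Callable[[str, str], List[str]]


def generator_critic_loop(question: str, generate: Generator, critique: Critic,
                          max_rounds: int = 5) -> Tuple[str, List[List[str]]]:
    """Alternate generation and critique until the critic accepts or the
    budget is exhausted. The critique history is returned so a human
    reviewer can inspect why each revision was made."""
    feedback: Optional[List[str]] = None
    history: List[List[str]] = []
    query = ""
    for _ in range(max_rounds):
        query = generate(question, feedback)
        feedback = critique(question, query)
        history.append(feedback)
        if not feedback:  # critic raised no issues: accept this string
            break
    return query, history


# Toy stand-ins for LLM calls, for illustration only.
def toy_generate(question: str, feedback: Optional[List[str]]) -> str:
    terms = ['"multiple sclerosis"[MeSH Terms]']
    # Revise only if the critic asked for free-text synonyms.
    if feedback and any("free-text" in issue for issue in feedback):
        terms.append('"multiple sclerosis"[tiab]')
    return "(" + " OR ".join(terms) + ")"


def toy_critique(question: str, query: str) -> List[str]:
    issues = []
    if "[tiab]" not in query:
        issues.append("add free-text synonyms alongside MeSH terms")
    return issues


query, history = generator_critic_loop(
    "Efficacy of treatment X in multiple sclerosis", toy_generate, toy_critique)
print(query)    # ("multiple sclerosis"[MeSH Terms] OR "multiple sclerosis"[tiab])
print(history)  # [['add free-text synonyms alongside MeSH terms'], []]
```

Keeping the critique history, rather than only the final string, is one way to preserve the transparency the objectives call for: each revision can be traced to a specific, human-readable objection.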
RESULTS: The Cochrane reviews covered the following topics: Multiple Sclerosis, Non-small Cell Lung Cancer, Renal Cell Carcinoma, Subfertility, Non-alcoholic Fatty Liver Disease, Epilepsy, Human Immunodeficiency Virus, Statins, Ischemic Conditioning, and Prostate Cancer. Smart Search achieved 76.8% Recall against the Cochrane reviews and 79.6% Recall against reviews performed in Nested Knowledge, compared with 13.0% Recall for black-box LLM search construction.
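Recall here is the fraction of gold-standard records (the PubMed-indexed studies included in a reference review) that the generated search retrieves: Recall = |retrieved ∩ gold| / |gold|. The abstract does not state whether the reported percentages are averaged per review or pooled across reviews, so the sketch below shows both under that assumption. The function names are illustrative, and the record identifiers are made-up placeholders, not real PMIDs or study data.

```python
from typing import Iterable, List, Sequence, Tuple


def recall(retrieved: Iterable[str], gold: Iterable[str]) -> float:
    """Fraction of gold-standard records that the search retrieved.
    Extra (non-gold) records do not affect Recall; they affect precision."""
    gold_set = set(gold)
    if not gold_set:
        raise ValueError("gold-standard set is empty")
    return len(gold_set & set(retrieved)) / len(gold_set)


def macro_recall(reviews: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Mean of per-review Recall values (every review weighted equally)."""
    return sum(recall(r, g) for r, g in reviews) / len(reviews)


def micro_recall(reviews: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Pooled Recall: total gold records found over total gold records."""
    hits = total = 0
    for retrieved, gold in reviews:
        gold_set = set(gold)
        hits += len(gold_set & set(retrieved))
        total += len(gold_set)
    if total == 0:
        raise ValueError("no gold-standard records")
    return hits / total


# Hypothetical toy data: (retrieved IDs, gold-standard IDs) per review.
reviews = [
    (["a", "b", "c", "x", "y"], ["a", "b", "c", "d"]),  # 3 of 4 found
    (["e"], ["e", "f"]),                                # 1 of 2 found
]
print(recall(*reviews[0]))    # 0.75
print(macro_recall(reviews))  # 0.625
print(micro_recall(reviews))  # 0.666... (4 of 6)
```

The two averages diverge when reviews have very different numbers of included studies, which is why the aggregation method matters when comparing Recall figures across studies.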
CONCLUSIONS: Our results demonstrate the potential of human-in-the-loop reasoning agents to generate search strategies for SLRs and targeted reviews. Specifically, Smart Search outperformed black-box LLMs and achieved acceptable Recall in validation against SLRs covering diverse clinical topics. Further research, particularly comparison of searches built by reasoning agents against expert-drafted search strategies, is required to assess the appropriateness of LLMs for SLR search strategies.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
P22
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
STA: Multiple/Other Specialized Treatments