Leveraging Artificial Intelligence to Streamline Study Quality Assessment in Systematic Literature Reviews
Author(s)
Maria Arregui, PhD1, Maria Koufopoulou, MSc2, Sarah Cadarette, MPH3, Erika Wissinger, PhD3.
1Assistant director, Cencora, Hannover, Germany, 2Cencora, London, United Kingdom, 3Cencora, Conshohocken, PA, USA.
OBJECTIVES: Ensuring the reliability and validity of findings in a systematic literature review (SLR) relies on the meticulous evaluation of individual study quality. This process, involving thorough scrutiny of study methodology and reporting, can be labor-intensive. Artificial intelligence (AI)-powered tools offer the potential to streamline this task. This study explores the use of an internal, closed-system AI tool to assess study quality in an SLR.
METHODS: Tailored prompts, aligned with the pertinent quality assessment tools, were generated for the different study designs included in the SLR. These tools were the Cochrane Risk-of-Bias tool for randomized controlled trials (RCTs; n=6), the Newcastle-Ottawa scale for prospective cohort and case-control studies (n=4), and the Motheral checklist for retrospective cohort and registry studies (n=18). The AI system was tasked with analyzing each study publication, answering the quality assessment questions, and supporting each response with verbatim text excerpts from the publication. To gauge precision and dependability, the AI-generated responses were qualitatively compared against assessments by a trained systematic reviewer.
RESULTS: In RCTs, AI-generated responses aligned with the systematic reviewer's evaluations in 81% of cases, with discrepancies mainly revolving around the interpretation of allocation concealment. For retrospective observational studies, the AI tool achieved an 83% concordance with the systematic reviewer, with discrepancies often linked to assessments of data source reliability and validity. In prospective observational studies, the AI tool exhibited an 82% agreement, with discrepancies primarily concerning the comparability of exposed and non-exposed cohorts based on design or analysis. Overall, the tool reliably delivered detailed responses and saved time.
CONCLUSIONS: The findings highlight the potential value of AI-powered tools in facilitating the quality assessment of studies within SLRs. While AI tools can reduce the time required to complete this task, human oversight remains indispensable to ensure the accuracy and robustness of the assessment process.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
SA28
Topic
Study Approaches
Topic Subcategory
Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas