REPORTING OF STUDY DETAILS IN ABSTRACTS: INFORMATIVE FOR ARTIFICIAL INTELLIGENCE (AI) OR UNDERWHELMING?
Author(s)
Allie Cichewicz, MSc1; Marius Sauca, BSc, MSc2; Kevin Kallmes, BS, MA, JD3
1Nested Knowledge, Boston, MA, USA, 2Nested Knowledge, Utrecht, Netherlands, 3Nested Knowledge, St. Paul, MN, USA
OBJECTIVES: Advancements in AI, particularly large language models, streamline the initial review phase by helping researchers quickly identify relevant studies. However, accuracy is limited by the completeness of study information reported in abstracts. Checklists such as CONSORT-A, PRISMA-A, and STARD have helped standardize reporting, but differences persist across authors, journals, and study types. We aimed to synthesize evidence on the reporting frequency of critical study details in abstracts, identifying trends and gaps to inform AI-assisted screening methods.
METHODS: A comprehensive, living review was undertaken to identify studies evaluating the prevalence of key concepts commonly used to determine abstract-level eligibility for literature reviews: study type, data source(s), study registration, patient population, treatment(s), sample size, and outcomes.
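To make the eligibility-concept checklist concrete, the sketch below models the seven abstract-level concepts as a simple presence/absence record in Python. The field names and the reporting_rate helper are illustrative assumptions for this sketch, not the authors' actual extraction form.

```python
# A minimal sketch of the seven abstract-level eligibility concepts as a
# presence/absence record. Field names are illustrative, not the
# authors' extraction form.
from dataclasses import dataclass, asdict

@dataclass
class AbstractConceptPresence:
    study_type: bool = False
    data_source: bool = False
    registration: bool = False
    patient_population: bool = False
    treatment: bool = False
    sample_size: bool = False
    outcomes: bool = False

    def reporting_rate(self) -> float:
        """Fraction of the seven key concepts present in one abstract."""
        values = list(asdict(self).values())
        return sum(values) / len(values)

# Example: an abstract reporting only population, treatment, and sample size.
record = AbstractConceptPresence(patient_population=True, treatment=True, sample_size=True)
print(f"{record.reporting_rate():.0%} of key concepts reported")  # -> 43%
```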
RESULTS: As of December 2025, 47 studies were included, covering 37,177 abstracts. Most studies evaluated randomized controlled trials (RCTs) (10,132 abstracts [27.3%]; n=33 studies); the remainder covered systematic reviews (742 [2.0%]; n=6), observational studies (650 [1.8%]; n=2), diagnostic accuracy studies (616 [1.7%]; n=4), RCTs plus observational studies (130 [0.4%]; n=1), and all study types (24,907 [67.0%]; n=1). Across all study types, intervention/treatment (88%) and disease/condition (86%) were consistently well reported, while participant eligibility (60%), effectiveness outcomes (62%), and sample size (58%) showed moderate reporting; safety outcomes (38%), data source/setting (38%), and registration (27%) were poorly reported. Notably, diagnostic accuracy studies had strong sample size reporting (78%) but the poorest eligibility (26%) and registration (2%) details. Systematic reviews had strong study type identification (89%) but weak registration (6%), and RCTs showed particularly poor data source/setting reporting (32%) with highly variable study registration (1-99%).
CONCLUSIONS: The abstract-reporting evidence base is heavily skewed toward RCTs (70% of included studies), with limited representation of other study types. Most assessments used reporting guidelines (e.g., CONSORT-A) based on rigorous methodological requirements; this may overestimate gaps for AI-assisted screening, which may need only to assess the basic presence of concepts. Future assessments focused on PICO-based concept presence, rather than reporting quality, may provide more actionable insights for AI-based screening prompts.
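As a hedged illustration of the PICO-based concept-presence approach suggested above, the Python sketch below builds a screening prompt that asks only whether each PICO concept is mentioned, not whether it meets a reporting-guideline standard. The concept list and prompt wording are hypothetical; the resulting prompt would be passed to whatever model client a reviewer uses.

```python
# A minimal sketch of a concept-presence screening prompt, assuming the
# PICO framing suggested in the conclusions. Concept names and prompt
# wording are illustrative, not a validated screening instrument.
PICO_CONCEPTS = ["population", "intervention", "comparator", "outcomes"]

def build_presence_prompt(abstract_text: str) -> str:
    """Ask whether each PICO concept is mentioned at all, rather than
    whether it satisfies a reporting checklist such as CONSORT-A."""
    concept_list = ", ".join(PICO_CONCEPTS)
    return (
        "For the abstract below, state ONLY whether each of these "
        f"concepts is mentioned: {concept_list}. Respond as JSON with "
        "true/false per concept; do not judge reporting quality.\n\n"
        f"Abstract:\n{abstract_text}"
    )

if __name__ == "__main__":
    print(build_presence_prompt("Adults with type 2 diabetes were randomized..."))
```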
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR177
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas