REPORTING OF STUDY DETAILS IN ABSTRACTS: INFORMATIVE FOR ARTIFICIAL INTELLIGENCE (AI) OR UNDERWHELMING?

Author(s)

Allie Cichewicz, MSc1, Marius Sauca, BSc, MSc2, Kevin Kallmes, BS, MA, JD3;
1Nested Knowledge, Boston, MA, USA, 2Nested Knowledge, Utrecht, Netherlands, 3Nested Knowledge, St. Paul, MN, USA
OBJECTIVES: Advancements in AI, particularly large language models, streamline the initial review phase to help researchers quickly identify relevant studies. However, accuracy is limited by the completeness of study information reported in abstracts. Checklists like CONSORT-A, PRISMA-A, and STARD have helped standardize reporting, but differences still exist among authors, journals, and study types. We aimed to synthesize evidence on reporting frequencies of critical study details in abstracts to identify trends and gaps to inform screening methods leveraging AI.
METHODS: A comprehensive, living review was undertaken to identify studies that evaluate the prevalence of key concepts commonly used to determine abstract-level eligibility for literature reviews: study type, data source(s), study registration, patient population, treatment(s), sample size, and outcomes.
RESULTS: As of December 2025, 47 studies were included covering 37,177 abstracts, predominantly from randomized controlled trials (RCTs) (10,132 abstracts [27.3%]; n=33 studies), systematic reviews (742 [2.0%]; n=6), observational studies (650 [1.8%]; n=2), diagnostic accuracy studies (616 [1.7%]; n=4), RCT+observational (130 [0.4%]; n=1), and all study types (24,907 [67.0%]; n=1). Across all study types, intervention/treatment (88%) and disease/condition (86%) were consistently well-reported, while participant eligibility (60%), effectiveness outcomes (62%), and sample size (58%) showed moderate reporting; safety outcomes (38%), data source/setting (38%), and registration (27%) were poorly reported. Notably, diagnostic accuracy studies had strong sample size reporting (78%) but the poorest eligibility (26%) and registration (2%) details. Systematic reviews had strong study type identification (89%) but weak registration (6%), and RCTs showed particularly poor data source/setting reporting (32%) with highly variable study registration (1-99%).
CONCLUSIONS: The abstract-reporting evidence base is heavily skewed toward RCTs (70% of included studies), with limited representation of other study types. Most assessments used reporting guidelines (e.g., CONSORT-A) based on rigorous methodological requirements; this may overestimate gaps for AI-assisted screening, which may only need to assess whether a basic concept is present. Future assessments focused on PICO-based concept presence rather than reporting quality may provide more actionable insights for AI-based screening prompts.

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR177

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
