WHAT'S IN AN ABSTRACT? PICOS REPORTING FREQUENCIES TO INFORM AI-ASSISTED SCREENING

Author(s)

Allie Cichewicz, MSc¹, Marius Sauca, BSc, MSc², Kevin Kallmes, BS, MA, JD³
¹Nested Knowledge, Boston, MA, USA, ²Nested Knowledge, Utrecht, Netherlands, ³Nested Knowledge, St. Paul, MN, USA

Presentation Documents

ISPOR26_Cichewicz_MSR127_POSTER.pdf

OBJECTIVES: Large language models (LLMs) for abstract screening do not require training, but depend on explicit inclusion/exclusion criteria provided by researchers. However, incomplete reporting and ambiguous language in abstracts often require inferential judgments traditionally made by experienced reviewers. Optimizing prompts for AI-assisted screening requires understanding which concepts are reliably reported versus where gaps necessitate flexible criteria or human oversight. We aimed to assess what concepts are sufficiently reported in abstracts of randomized controlled trials (RCTs) and observational studies to inform LLM screening.
METHODS: PubMed was searched via Nested Knowledge to identify a random sample of RCT/observational abstracts assessing treatment efficacy/effectiveness and/or safety. Abstracts were reviewed for the presence of key Population, Intervention/Comparator, Outcome, Study Design (PICOS) concepts used to screen for eligibility in literature reviews.
RESULTS: Among 600 abstracts (300 RCTs, 300 observational studies), treatment/intervention (99.0% of RCTs vs 98.7% of observational studies, p=1.00), disease/condition (93.7% vs 92.7%, p=0.75), and sample size (96.7% vs 94.7%, p=0.32) were reported at consistently high frequencies across study types. However, RCT abstracts more frequently reported study design (99.0% vs 79.3%, p<0.001) and efficacy/effectiveness outcomes (97.7% vs 71.3%, p<0.001). Observational abstracts more frequently reported data source/setting (76.3% vs 23.7% of RCTs, p<0.001), geography (48.7% vs 19.0%, p<0.001), and safety outcomes (85.3% vs 60.7%, p<0.001). Age was reported in fewer than half of abstracts for both RCTs and observational studies (37.7% vs 45.7%, p=0.057). Among RCTs, registration (19.0%) and trial phase (15.3%) were infrequently reported.
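The between-group comparisons above can be approximated from the reported percentages. The abstract does not name the statistical test used, so the following is only a sketch under stated assumptions: counts are reconstructed by rounding each percentage against 300 abstracts per group, and a two-sided Fisher's exact test is used as one plausible choice. The function names `fisher_exact_two_sided` and `compare_reporting` are hypothetical helpers, not part of the authors' workflow.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(
        p for p in map(prob, range(lo, hi + 1)) if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def compare_reporting(pct_rct: float, pct_obs: float, n_per_group: int = 300) -> float:
    """Rebuild counts from percentages (an assumption) and test RCT vs observational."""
    rct_yes = round(pct_rct / 100 * n_per_group)
    obs_yes = round(pct_obs / 100 * n_per_group)
    return fisher_exact_two_sided(
        rct_yes, n_per_group - rct_yes, obs_yes, n_per_group - obs_yes
    )


if __name__ == "__main__":
    # Percentages taken from the RESULTS text (RCT first, observational second).
    for concept, rct, obs in [
        ("treatment/intervention", 99.0, 98.7),
        ("study design", 99.0, 79.3),
    ]:
        print(f"{concept}: p = {compare_reporting(rct, obs):.3g}")
```

Under these assumptions, the treatment/intervention comparison (297 vs 296 of 300) yields a p-value of 1.00 and the study design comparison falls well below 0.001, both in line with the reported values. Other comparisons may differ slightly from the abstract if the authors used a different test or if rounding does not recover the exact counts.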
CONCLUSIONS: Due to highly consistent reporting rates, LLM-assisted abstract screening can reliably assess treatment, disease, and sample size across study types, but should account for study design-specific reporting patterns, particularly lower reporting of effectiveness outcomes in observational studies and lower safety, setting, and geography reporting in RCTs. Criteria requiring concepts with low or variable reporting frequencies (e.g., registration, trial phase, age) may benefit from flexible prompt language or tolerance for missing information to avoid inappropriate exclusions.
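One way to act on this recommendation is to let the reporting frequencies decide how each screening criterion treats missing information. The sketch below is illustrative only: the 90% cutoff, the policy labels, the prompt wording, and the function names `missing_info_policy` and `criterion_line` are all hypothetical and do not come from the abstract. Only the percentages are taken from the RESULTS.

```python
# Reporting frequencies from the RESULTS, as (RCT %, observational %).
REPORTING = {
    "treatment/intervention": (99.0, 98.7),
    "disease/condition": (93.7, 92.7),
    "sample size": (96.7, 94.7),
    "study design": (99.0, 79.3),
    "efficacy/effectiveness outcomes": (97.7, 71.3),
    "data source/setting": (23.7, 76.3),
    "geography": (19.0, 48.7),
    "safety outcomes": (60.7, 85.3),
    "age": (37.7, 45.7),
}

DESIGN_INDEX = {"RCT": 0, "observational": 1}


def missing_info_policy(concept: str, design: str, strict_at: float = 90.0) -> str:
    """Return 'strict' if the concept is reported often enough to require it.

    strict_at is a hypothetical cutoff, not a value from the abstract.
    """
    rate = REPORTING[concept][DESIGN_INDEX[design]]
    return "strict" if rate >= strict_at else "lenient"


def criterion_line(concept: str, required_value: str, design: str) -> str:
    """Build one criterion sentence for a screening prompt (wording is illustrative)."""
    if missing_info_policy(concept, design) == "strict":
        return f"Exclude unless the abstract states {concept}: {required_value}."
    return (
        f"Exclude only if the abstract explicitly contradicts {concept}: "
        f"{required_value}; if not reported, do not exclude on this criterion."
    )


if __name__ == "__main__":
    print(criterion_line("disease/condition", "type 2 diabetes", "RCT"))
    print(criterion_line("geography", "Europe", "RCT"))
```

With this cutoff, a geography criterion applied to RCT abstracts (19.0% reporting) is phrased to tolerate missing information, while a disease criterion (93.7% reporting) remains a hard requirement. Registration and trial phase, which the abstract reports for RCTs only, could be added to the table in the same way.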

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MSR127

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)