WHAT'S IN AN ABSTRACT? PICOS REPORTING FREQUENCIES TO INFORM AI-ASSISTED SCREENING
Author(s)
Allie Cichewicz, MSc1, Marius Sauca, BSc, MSc2, Kevin Kallmes, BS, MA, JD3
1Nested Knowledge, Boston, MA, USA, 2Nested Knowledge, Utrecht, Netherlands, 3Nested Knowledge, St. Paul, MN, USA
OBJECTIVES: Large language models (LLMs) for abstract screening do not require training, but depend on explicit inclusion/exclusion criteria provided by researchers. However, incomplete reporting and ambiguous language in abstracts often require inferential judgments traditionally made by experienced reviewers. Optimizing prompts for AI-assisted screening requires understanding which concepts are reliably reported versus where gaps necessitate flexible criteria or human oversight. We aimed to assess what concepts are sufficiently reported in abstracts of randomized controlled trials (RCTs) and observational studies to inform LLM screening.
METHODS: PubMed was searched via Nested Knowledge to identify a random sample of RCT/observational abstracts assessing treatment efficacy/effectiveness and/or safety. Abstracts were reviewed for the presence of key Population, Intervention/Comparator, Outcome, Study Design (PICOS) concepts used to screen for eligibility in literature reviews.
RESULTS: Among 600 abstracts (300 RCTs, 300 observational studies), treatment/intervention (99.0% vs 98.7%, p=1.00), disease/condition (93.7% vs 92.7%, p=0.75), and sample size (96.7% vs 94.7%, p=0.32) demonstrated consistently high reporting frequencies across study types. However, RCT abstracts more frequently reported study design (99.0% vs 79.3%, p<0.001) and efficacy/effectiveness outcomes (97.7% vs 71.3%, p<0.001). Observational studies more frequently reported data source/setting (76.3% vs 23.7%, p<0.001), geography (48.7% vs 19.0%, p<0.001), and safety outcomes (85.3% vs 60.7%, p<0.001). Age was moderately reported in both RCTs and observational studies (37.7% vs 45.7%, p=0.057). Among RCTs, registration (19.0%) and trial phase (15.3%) were infrequently reported.
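The abstract does not name the statistical test behind the p-values above. As a minimal sketch, assuming a two-sided Fisher's exact test and counts back-calculated from the reported percentages (for example, 99.0% of 300 = 297), a comparison of reporting frequencies between study types could look like this. The function name and the reconstructed counts are illustrative assumptions, not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are study types (RCT, observational); columns are
    (concept reported, concept not reported).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "reported" abstracts in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Counts back-calculated from the reported percentages (assumption).
N = 300
treatment_p = fisher_exact_two_sided(297, N - 297, 296, N - 296)  # 99.0% vs 98.7%
design_p = fisher_exact_two_sided(297, N - 297, 238, N - 238)     # 99.0% vs 79.3%
```

Under these assumptions, the treatment/intervention comparison gives p = 1.00 and the study design comparison gives p < 0.001, in line with the reported values. Agreement on two comparisons does not establish which test the authors used.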
CONCLUSIONS: Because treatment, disease, and sample size are reported at consistently high rates in both RCT and observational abstracts, LLM-assisted abstract screening can reliably assess these concepts across study types. Screening criteria should nonetheless account for study design-specific reporting patterns, particularly lower reporting of efficacy/effectiveness outcomes in observational studies and lower reporting of safety outcomes, data source/setting, and geography in RCTs. Criteria that depend on concepts with low or variable reporting frequencies (e.g., registration, trial phase, age) may benefit from flexible prompt language or tolerance for missing information to avoid inappropriate exclusions.
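One way to act on this recommendation is to let the reporting frequency of each concept decide how a screening prompt treats missing information. The sketch below is illustrative only: the 90% cutoff, the rule names, and the prompt wording are hypothetical choices, not part of the study. Only the frequencies come from the RESULTS.

```python
# Reporting frequencies (%) from the RESULTS: (RCT, observational).
# None marks concepts that were only assessed among RCTs.
REPORTING = {
    "treatment/intervention": (99.0, 98.7),
    "disease/condition": (93.7, 92.7),
    "sample size": (96.7, 94.7),
    "study design": (99.0, 79.3),
    "efficacy/effectiveness outcomes": (97.7, 71.3),
    "safety outcomes": (60.7, 85.3),
    "data source/setting": (23.7, 76.3),
    "geography": (19.0, 48.7),
    "age": (37.7, 45.7),
    "registration": (19.0, None),
    "trial phase": (15.3, None),
}

HIGH_CUTOFF = 90.0  # hypothetical threshold, not taken from the abstract


def missing_info_rule(concept, design):
    """Return 'strict' or 'flexible' for a concept in a given study design."""
    rate = REPORTING[concept][0 if design == "rct" else 1]
    if rate is None:
        raise ValueError(f"{concept} was not assessed for {design} abstracts")
    return "strict" if rate >= HIGH_CUTOFF else "flexible"


def criterion_text(concept, requirement, design):
    """Build one line of prompt text for an inclusion criterion."""
    if missing_info_rule(concept, design) == "strict":
        return f"Exclude unless the abstract reports {concept} matching: {requirement}."
    return (
        f"Exclude only if the abstract explicitly reports {concept} that "
        f"contradicts: {requirement}. If {concept} is not reported, "
        f"do not exclude on this criterion."
    )


phase_line = criterion_text("trial phase", "phase 3", "rct")
```

With this cutoff, treatment, disease, and sample size are handled strictly for both designs, while trial phase, registration, and age tolerate missing information. Any real threshold would need to be validated against human screening decisions.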
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR127
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas