PERFORMANCE OF ADAPTIVE SMART TAGS IN NESTED KNOWLEDGE FOR AUTOMATED EXTRACTION OF STUDY CHARACTERISTICS
Author(s)
Priccila Zuchinali, PhD1, Sophie Yoon, MPH2, Joanna Kamar, MPH1, Amber Martin, BSc2;
1Thermo Fisher Scientific, Montreal, QC, Canada, 2Thermo Fisher Scientific, Waltham, MA, USA
OBJECTIVES: Nested Knowledge (NK) is an evidence synthesis platform used in systematic literature reviews (SLRs). This study evaluated the accuracy of Adaptive Smart Tags (ASTs) in NK for extracting study characteristics.
METHODS: An SLR of randomized trials evaluating the efficacy and safety of treatments for refractory chronic cough was conducted. Study characteristics were extracted using ASTs, which use OpenAI large language models within NK to identify relevant text or numeric values for predefined data elements. AST development involved (1) writing question-based prompts ("tags") for each data element and organizing them hierarchically to convey concepts and relationships to the model, which then searched the full-text publications for optimal responses, and (2) applying a human-in-the-loop approach in which two full-text articles were used to pilot and refine the prompts before deployment across all studies.
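To illustrate what "organizing tags hierarchically" can mean, the sketch below models tags as a tree of question prompts and flattens it into indented lines so each child question carries its parent's context. This is a hypothetical illustration only: the `Tag` class, the `render` function, and the question wording are assumptions made for this example and do not reflect the actual Nested Knowledge data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical data element: a label plus the question posed to the model."""
    name: str
    question: str
    children: list["Tag"] = field(default_factory=list)


def render(tag: Tag, depth: int = 0) -> list[str]:
    """Flatten the tag tree into indented lines so parent/child context is explicit."""
    lines = ["  " * depth + f"{tag.name}: {tag.question}"]
    for child in tag.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical hierarchy built from data elements named in this abstract.
study_characteristics = Tag(
    "Study characteristics",
    "What are the characteristics of this randomized trial?",
    [
        Tag("Publication type", "Is this a journal article, conference abstract, or other?"),
        Tag("Trial registration number", "What registry identifier (e.g., NCT number) is reported?"),
        Tag(
            "Treatment arms",
            "Which treatments were compared?",
            [
                Tag("Intervention", "Which investigational treatment and dose were given?"),
                Tag("Comparator", "Which control treatment (e.g., placebo) was given?"),
            ],
        ),
    ],
)

print("\n".join(render(study_characteristics)))
```

Nesting "Intervention" and "Comparator" under "Treatment arms" is one way to signal that the two answers describe arms of the same trial rather than unrelated facts.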
RESULTS: AST performance was evaluated across 18 study characteristics in 37 studies. Accuracy was highest for bibliographic and core design elements, with correct information and formatting achieved for publication type (n=35; 95%), trial registration number (n=35; 95%), comorbidities (n=34; 92%), and interventions (n=29; 78%). For most outcomes of interest, including cough severity, 24-hour cough frequency, urge-to-cough, Leicester Cough Questionnaire scores, and other patient-reported outcomes, outcome availability was captured in 30 to 33 studies (81% to 89%). Performance declined for complex or inconsistently reported elements. Subgroup reporting showed poor accuracy, with incorrect information identified in 32 studies (86%). Formatting errors were common for comparator identification (n=30; 81%), timepoints assessed (n=20; 54%), and trial name (n=15; 41%).
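Each percentage in the results is the study count divided by the 37 evaluated studies. The short check below, assuming whole-number rounding, reproduces the reported figures from the reported counts; the dictionary layout and the `pct` helper are illustrative only.

```python
TOTAL_STUDIES = 37

# Counts as reported in the RESULTS paragraph.
correct = {
    "Publication type": 35,
    "Trial registration number": 35,
    "Comorbidities": 34,
    "Interventions": 29,
}
formatting_errors = {
    "Comparator identification": 30,
    "Timepoints assessed": 20,
    "Trial name": 15,
}


def pct(n: int, total: int = TOTAL_STUDIES) -> int:
    """Share of studies as a whole-number percentage."""
    return round(n * 100 / total)


for label, counts in (("Correct", correct), ("Formatting errors", formatting_errors)):
    for element, n in counts.items():
        print(f"{label} | {element}: n={n}; {pct(n)}%")
```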
CONCLUSIONS: Within NK, AI-driven ASTs showed good performance for structured, consistently reported study characteristics but were less reliable for nuanced, complex, or variably reported data elements. A human-in-the-loop workflow remains essential to ensure accuracy, particularly for subgroup data and detailed outcome specifications.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR189
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas