FROM DATA TO INSIGHTS: VALIDATING ARTIFICIAL INTELLIGENCE (AI)-GENERATED WRITING IN EVIDENCE SYNTHESIS
Author(s)
Allie Cichewicz, MSc;
Independent Consultant, Boston, MA, USA
OBJECTIVES: As generative AI is increasingly integrated into literature review software, automated written synthesis remains a key unmet capability and a challenge for reviewers to validate. Following the release of Smart Insights in Nested Knowledge, this work aims to evaluate the quality and reliability of AI-generated narrative synthesis using a novel, structured framework for assessing scientific writing produced by AI.
METHODS: Data were extracted for three oncology reviews using Adaptive Smart Tags in Nested Knowledge: clinical effectiveness from real-world evidence; clinical efficacy and safety from randomized trials; and comparative effectiveness from matching-adjusted indirect comparisons. For all tagged data, AI-generated summaries and supporting claims were produced via Smart Insights, with each summary and claim linked back to the underlying citation and evidence for traceability. Each summary was evaluated across six domains, each scored from 1 (poor) to 5 (excellent): Faithfulness to Sources (Source), Citation Accuracy and Integrity (Citation), Synthesis Quality (Synthesis), Completeness and Representativeness (Completeness), Nuance and Uncertainty Handling (Nuance), and Writing Quality (Writing). To assess consistency, Smart Insights was run twice on the same datasets, and the outputs were compared qualitatively.
RESULTS: Smart Insights demonstrated the strongest performance on Citation (mean score 4.9/5), Synthesis (4.4), and Writing (4.3), followed by Source (3.6) and Completeness (3.4). Performance was lowest on Nuance (2.7). No hallucinated citations were identified; however, some claims omitted relevant evidence or citations. Repeated Smart Insights generations produced only minimal, non-meaningful differences, though greater textual variation was observed for summaries of larger evidence bases and for qualitative domains (e.g., study limitations or author conclusions).
CONCLUSIONS: AI-generated written synthesis within Nested Knowledge achieves high performance in citation accuracy and integrity, coherence, and clarity, supporting its use as an efficiency-enhancing tool in evidence synthesis. Limitations in capturing nuance and uncertainty underscore the need for structured validation and human oversight of any AI-generated summary.
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MSR164
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
SDC: Oncology