THE ANALYTICAL FRAMEWORK OF CLINICAL TRIALS EVALUATING CLINICAL OUTCOMES OF ARTIFICIAL INTELLIGENCE-BASED DIGITAL HEALTH INTERVENTIONS: A SYSTEMATIC LITERATURE REVIEW
Author(s)
Vlad Zah, PhD, Filip Stanicic, PhD (c), Dimitrije Grbic, PhD (c)
ZRx Outcomes Research, Inc., Mississauga, ON, Canada
OBJECTIVES: The aim of this systematic literature review (SLR) was to provide an analytical framework for clinical trials evaluating clinical outcomes of artificial intelligence-based digital health interventions (AI-DHI).
METHODS: The SLR was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. PubMed and Embase were searched, and a hand-search was conducted to ensure that all relevant studies were included. The population included patients with any disease using an AI-DHI as an intervention or comparator. Only English-language publications designed as clinical trials exploring clinical outcomes were considered. Quality appraisal was performed using the National Institute for Health and Care Excellence checklist.
RESULTS: The final sample comprised 84 studies, most of them (86.9%) published from 2020 onward. The most common indications were metabolic (28.6%), musculoskeletal (20.2%), and mental health disorders (19.0%). The largest share of studies (33.3%) was conducted only in the US. Most studies (75.0%) were controlled, parallel-group clinical trials with at least two arms. Most trials compared AI-DHI with standard-of-care or waitlist groups, while pharmacological treatments and other DHIs were rarely used as comparators. Blinding should be clearly denoted, but 22.6% of trials did not report the blinding level. Although the type of intervention often precludes blinding (64.3% of trials were open-label), double-blinding is strongly recommended (used in only 6.0%). Only 9.5% of studies were conducted at multiple sites across different countries. Appropriate sample sizes should be determined using a power analysis (reported in 59.5% of studies). Dropout rates in the total sample and in each study arm should be <20% at all endpoints (achieved in 64.3%). Statistical tests were selected according to the outcome measures and data type. Reported limitations varied, but most studies noted small sample sizes and limited generalizability of findings.
CONCLUSIONS: The SLR results highlight current methodological gaps and can guide researchers in designing future clinical trials that generate reliable evidence of AI-DHI efficacy.
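ILLUSTRATION: The recommendations on power analysis and a <20% dropout rate can be sketched for the simplest case, a two-arm parallel-group trial with a continuous outcome. The sketch below is not taken from the reviewed studies. It assumes a two-sided test, equal allocation, and the normal-approximation formula n = 2((z_(1-alpha/2) + z_(1-beta)) / d)^2 per arm, where d is a standardized effect size. The function names, the effect size of 0.5, and the 20% dropout allowance are hypothetical choices made for illustration only.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a two-arm trial comparing means.

    Uses the normal approximation for a two-sided test with equal
    allocation; effect_size is the standardized mean difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def inflate_for_dropout(n: int, dropout_rate: float) -> int:
    """Enrollment needed per arm so that n participants remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n / (1 - dropout_rate))


# Hypothetical planning values: d = 0.5, alpha = 0.05, power = 0.80.
completers = n_per_arm(0.5)                      # 63 per arm
enrolled = inflate_for_dropout(completers, 0.20)  # 79 per arm
print(completers, enrolled)
```

Under these assumptions, 63 completers per arm are needed, and enrolling 79 per arm leaves that number if dropout reaches 20%. Binary or time-to-event outcomes, unequal allocation, and clustered designs require different formulas.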
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
MT28
Topic
Medical Technologies
Topic Subcategory
Digital Health
Disease
No Additional Disease & Conditions/Specialized Treatment Areas