THE ANALYTICAL FRAMEWORK OF CLINICAL TRIALS EVALUATING CLINICAL OUTCOMES OF ARTIFICIAL INTELLIGENCE-BASED DIGITAL HEALTH INTERVENTIONS: A SYSTEMATIC LITERATURE REVIEW

Author(s)

Vlad Zah, PhD, Filip Stanicic, PhD (c), Dimitrije Grbic, PhD (c);
ZRx Outcomes Research, Inc., Mississauga, ON, Canada

OBJECTIVES: The aim of this systematic literature review (SLR) was to provide an analytical framework for clinical trials evaluating clinical outcomes of artificial intelligence-based digital health interventions (AI-DHI).
METHODS: The SLR was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. PubMed and Embase were searched, and a supplementary hand-search was conducted to ensure that all relevant studies were included. The population included patients with any disease using an AI-DHI as an intervention or comparator. Only English-language publications of clinical trials exploring clinical outcomes were considered. Quality appraisal was performed using the National Institute for Health and Care Excellence checklist.
RESULTS: The final sample comprised 84 studies, most of them published from 2020 onward (86.9%). The most common indications were metabolic (28.6%), musculoskeletal (20.2%), and mental health disorders (19.0%). The largest share of studies was conducted exclusively in the US (33.3%). Most studies (75.0%) were controlled, parallel-group clinical trials with at least two arms. Most trials compared AI-DHI with standard-of-care or waitlist groups, while pharmacological treatments and other DHIs were rarely used as comparators. Blinding should be clearly reported, but 22.6% of trials did not state the blinding level. Although the type of intervention often precludes blinding (64.3% of trials were open-label), double-blinding is strongly recommended; only 6.0% of trials used it. Only 9.5% of studies were conducted at multiple sites across different countries. Appropriate sample sizes should be determined using a power analysis, which 59.5% of studies reported. Dropout rates in the total sample and in each study arm should be <20% at all endpoints, a criterion met by 64.3% of studies. The choice of statistical tests depended on the outcome measures and data types. Reported limitations varied, but most studies cited small sample sizes and limited generalizability of findings.
CONCLUSIONS: The SLR results highlight current methodological gaps and should guide researchers in designing future clinical trials that generate reliable evidence of AI-DHI efficacy.
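ILLUSTRATION: The two sample-size recommendations above (power analysis and dropout below 20%) can be sketched for a two-arm, parallel-group trial with a continuous outcome. This is a minimal sketch, not a method taken from the reviewed trials. It assumes a two-sided test, equal allocation, and the normal approximation for comparing two means. The function names and the example values (standardized effect size 0.5, alpha 0.05, power 0.80, 20% dropout) are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per arm to compare two means.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (assumed, not from the review).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def inflate_for_dropout(n: int, dropout_rate: float) -> int:
    """Enrollment target so that n completers remain after expected dropout."""
    return math.ceil(n / (1 - dropout_rate))


# Hypothetical planning values, chosen only for illustration.
completers = n_per_arm(effect_size=0.5)
enrolled = inflate_for_dropout(completers, dropout_rate=0.20)
print(f"Completers per arm: {completers}; enroll per arm: {enrolled}")
```

An exact calculation based on the t-distribution gives a slightly larger number per arm, so the figures from this approximation are best treated as a lower bound during planning.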

Conference/Value in Health Info

2026-05, ISPOR 2026, Philadelphia, PA, USA

Value in Health, Volume 29, Issue S6

Code

MT28

Topic

Medical Technologies

Topic Subcategory

Digital Health

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)