Abstract
Objectives
Heterogeneous treatment effects (HTEs) refer to differences in how individual patients or subgroups respond to the same treatment. Estimating HTEs helps target care to those most likely to benefit, improving outcomes and avoiding unnecessary interventions. Machine learning (ML) enables the use of real-world data (RWD) to estimate HTEs when randomized controlled trials are not feasible. However, practical guidance for applying these methods in health economics is lacking. To support method selection, we identified and categorized ML approaches to estimating HTEs in RWD and assessed the methodological quality of studies applying them.
Methods
We conducted a scoping review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines. PubMed, Scopus, Web of Science, EBSCO, and MEDLINE were searched for studies published between 2014 and 2025 that applied ML to estimate HTEs from RWD. Methodological quality was assessed using a standardized checklist.
Results
Of 1743 records screened, 74 met the inclusion criteria. We grouped the included studies into 3 categories: those using prediction-only approaches unsuited to HTE estimation (n = 8), those applying outcome modeling (n = 9), and those using customized conditional average treatment effect estimation (n = 58). Most innovations originated in the ML and statistics communities, with minimal uptake in health economics. Methodological quality was inconsistent and requires improvement.
Conclusions
ML methods for HTE estimation are increasingly applied to RWD. Tree-based models are most common, and interest in customized conditional average treatment effect approaches is growing. Better evaluation standards and more transparent reporting are needed for these methods to become reliable tools for health economics research.
Authors
Michael Möller, Eva-Maria Wild, Winnie Tan, Jonas Schreyögg