Abstract
Objectives
To identify how machine learning (ML) approaches were implemented in mapping studies and to determine the extent to which ML improved performance compared with regression models (RMs).
Methods
A systematic literature search was conducted in 12 databases from inception to December 2023 to identify studies that applied ML to develop mapping algorithms. A data template was applied to extract data set information, source and target measures, ML approaches and RMs, mapping types (direct vs indirect), goodness-of-fit indicators (mean absolute error, mean squared error, root mean squared error, R-squared, and intraclass correlation coefficient), and validation methods. Differences in goodness-of-fit indicators between ML and RMs were summarized. Potential advantages and challenges for ML were further discussed.
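As a minimal sketch of how the mean-based goodness-of-fit indicators named above can be computed and compared between an ML model and an RM, consider the following. The data values, the function names `goodness_of_fit` and `improvement`, and the sign convention (positive means ML fits better) are illustrative assumptions, not taken from the reviewed studies; the intraclass correlation coefficient is omitted for brevity.

```python
import math


def goodness_of_fit(observed, predicted):
    """Return MAE, MSE, RMSE, and R-squared for predicted vs observed values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


def improvement(ml, rm):
    """Difference between ML and RM fit; positive values favour ML.

    Assumed convention: for error indicators (lower is better) the
    difference is RM minus ML; for R-squared (higher is better) it is
    ML minus RM.
    """
    diff = {k: rm[k] - ml[k] for k in ("mae", "mse", "rmse")}
    diff["r2"] = ml["r2"] - rm["r2"]
    return diff


# Hypothetical observed target-measure values and two sets of predictions.
observed = [0.2, 0.5, 0.7, 0.9]
rm_pred = [0.3, 0.4, 0.8, 0.8]      # hypothetical regression-model predictions
ml_pred = [0.25, 0.45, 0.75, 0.85]  # hypothetical ML predictions

rm_fit = goodness_of_fit(observed, rm_pred)
ml_fit = goodness_of_fit(observed, ml_pred)
delta = improvement(ml_fit, rm_fit)

for name, value in delta.items():
    print(f"{name}: {value:.4f}")
```

Averaging such per-study differences across studies, indicator by indicator, would give a summary of the kind reported in the Results, provided every study uses the same sign convention.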
Results
Thirteen mapping studies were identified, all of which adopted both ML and RMs. Bayesian networks were the most frequently used ML approach (n = 6), followed by the least absolute shrinkage and selection operator (n = 4). The ordinary least squares model was the most frequently used RM (n = 8), followed by the censored least absolute deviation and multinomial logit models (n = 5 each). The average improvements in goodness-of-fit of ML over RMs, by indicator, were 0.007 (mean absolute error), 0.004 (mean squared error), 0.058 (R-squared), 0.016 (intraclass correlation coefficient), and −0.0004 (root mean squared error).
Conclusions
An increasing number of studies use ML to develop mapping algorithms. Generally, only a minor improvement in goodness-of-fit was observed compared with RMs when using mean-based comparisons. Issues such as how to interpret, apply, and externally validate ML-based outputs would affect their implementation. Future studies are warranted to verify the advantages of ML approaches.
Authors
Tianqi Hong, Shitong Xie, Xinran Liu, Jing Wu, Gang Chen