Assessing the Use of Variational Bayes for Large Real-World Data
Author(s)
Buckley B, O'Hagan A, Galligan M
University College Dublin, Dublin, D, Ireland
OBJECTIVES: Bayesian approaches to real-world observational analyses have been limited by the computational challenges of applying the Markov chain Monte Carlo (MCMC) approach to large real-world data. MCMC is considered the gold standard for Bayesian inference because, in the limit, it is guaranteed to converge to the true posterior distribution. Approximate Bayesian inference via optimization of the variational evidence lower bound, usually called Variational Inference (VI), has been successfully demonstrated for other applications. We investigate the performance and characteristics of currently available R and Python VI software implementations for real-world observational data.
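For reference, the evidence lower bound mentioned above can be written in its standard textbook form. The notation is an assumption introduced here and does not come from the abstract: x denotes the observed data, θ the model parameters and latent variables, and q(θ) the approximating distribution.

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(\theta)}\!\left[\log p(x,\theta)\right]
      - \mathbb{E}_{q(\theta)}\!\left[\log q(\theta)\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid x)\right)
```

Because the KL term is non-negative and log p(x) does not depend on q, maximizing ELBO(q) over a chosen family of distributions is equivalent to minimizing the KL divergence from q to the true posterior.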
METHODS: Four R implementations and four Python implementations are compared. The implementations include several algorithms for VI: coordinate ascent mean-field VI, stochastic VI, automatic differentiation VI and two application-specific R packages. We applied these VI methods to a Bayesian latent class analysis of a large real-world dataset, Optum™ EHR, containing 1,133,214 paediatric patients with a rare Type 1 form of diabetes, incorporating missing data. We conclude the study with a simulation analysis to explore in more detail where differences occur.
RESULTS: Coordinate ascent and stochastic mean-field VI with pre-specified objective functions performed best on both model predictive accuracy and computational runtime. Both are practical alternatives to MCMC for logistic Bayesian models.
CONCLUSIONS: We find that automatic VI approaches require more effort and technical knowledge to set up for accurate posterior estimation and are very sensitive to algorithm hyperparameters. We find that several data characteristics common in clinical data, for example a very high proportion of zeroes, significantly affect the posterior accuracy of automatic VI methods compared with conditionally conjugate mean-field methods. We propose further investigation of automatic VI methods with a view to improving posterior accuracy and computational runtime from default settings.
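To illustrate what coordinate ascent VI with a pre-specified objective can look like for a logistic Bayesian model, the following is a minimal sketch. It is not the authors' implementation or any of the packages they compared. It assumes a plain Bayesian logistic regression (not the latent class model of the study), a zero-mean isotropic Gaussian prior, and the well-known Jaakkola-Jordan quadratic bound on the logistic likelihood, which gives closed-form updates. The function names `jj_lambda` and `cavi_logistic`, the prior variance, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def jj_lambda(xi):
    """Jaakkola-Jordan coefficient lambda(xi) = tanh(xi / 2) / (4 * xi)."""
    xi = np.maximum(np.abs(xi), 1e-8)  # guard the xi -> 0 limit (value 1/8)
    return np.tanh(xi / 2.0) / (4.0 * xi)


def cavi_logistic(X, y, prior_var=10.0, n_iter=50):
    """Coordinate ascent on the Jaakkola-Jordan bound.

    Assumes labels y in {0, 1} and a N(0, prior_var * I) prior on the weights.
    Returns the mean and covariance of the Gaussian approximation q(w).
    """
    n, d = X.shape
    prior_precision = np.eye(d) / prior_var
    xi = np.ones(n)  # one local variational parameter per observation
    for _ in range(n_iter):
        # Update q(w) = N(m, S) with the local parameters xi held fixed.
        S_inv = prior_precision + 2.0 * (X.T * jj_lambda(xi)) @ X
        S = np.linalg.inv(S_inv)
        m = S @ (X.T @ (y - 0.5))
        # Update each xi_n with q(w) held fixed: xi_n^2 = x_n^T E[w w^T] x_n.
        second_moment = S + np.outer(m, m)
        xi = np.sqrt(np.einsum("ni,ij,nj->n", X, second_moment, X))
    return m, S


# Small simulated example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n_obs = 2000
w_true = np.array([-1.0, 2.0])  # intercept and one slope
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
p = 1.0 / (1.0 + np.exp(-X @ w_true))
y = (rng.uniform(size=n_obs) < p).astype(float)

m, S = cavi_logistic(X, y)
print("approximate posterior mean:", m)
print("approximate posterior sd:  ", np.sqrt(np.diag(S)))
```

Each iteration alternates two closed-form updates, so no step size or other optimizer hyperparameter has to be tuned. That is one plausible reason such conditionally conjugate schemes are easier to set up than automatic differentiation VI, although the sketch makes no claim about how the compared packages are implemented.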
Conference/Value in Health Info
Value in Health, Volume 25, Issue 12S (December 2022)
Code
RWD121
Topic
Methodological & Statistical Research, Real World Data & Information Systems
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Reproducibility & Replicability
Disease
SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity), SDC: Pediatrics