Assessing the Use of Variational Bayes for Large Real-World Data

Author(s)

Buckley B, O'Hagan A, Galligan M
University College Dublin, Dublin, D, Ireland

OBJECTIVES: Bayesian approaches to real-world observational analyses have been limited by the computational challenges of applying the Markov chain Monte Carlo (MCMC) approach to large real-world data. MCMC is considered the gold standard for Bayesian inference because, in the limit, it is guaranteed to converge to the true posterior distribution. Approximate Bayesian inference via optimization of the variational evidence lower bound, usually called Variational Inference (VI), has been successfully demonstrated for other applications. We investigate the performance and characteristics of currently available R and Python VI software implementations for real-world observational data.
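
For readers unfamiliar with the evidence lower bound mentioned above, the standard identity behind VI can be written as follows. The notation is an assumption for illustration and does not come from the abstract: x denotes the observed data, z the latent variables and parameters, and q(z) the variational approximation to the posterior.

```latex
\log p(x) = \mathrm{ELBO}(q) + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big),
\qquad
\mathrm{ELBO}(q) = \mathbb{E}_{q(z)}\big[\log p(x, z)\big] - \mathbb{E}_{q(z)}\big[\log q(z)\big]
```

Because the KL term is non-negative and log p(x) does not depend on q, maximizing the ELBO over a chosen family of distributions q is equivalent to minimizing the KL divergence from q to the true posterior.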

METHODS: We compare four R and four Python implementations. They cover several VI algorithms (coordinate ascent mean-field VI, stochastic VI and automatic differentiation VI) as well as two application-specific R packages. We applied these VI methods to a Bayesian latent class analysis of a large real-world dataset, Optum™ EHR, containing 1,133,214 paediatric patients with a rare Type 1 form of diabetes, incorporating missing data. We conclude the study with a simulation analysis to explore in more detail where differences occur.
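
To make the idea of coordinate ascent mean-field VI for a latent class model concrete, here is a minimal Python sketch. It is not the authors' code and does not reproduce any of the packages compared in the study. It assumes binary indicators, no missing data, a Dirichlet prior on class weights and Beta priors on item probabilities, so that every update is conditionally conjugate. The function name `cavi_latent_class`, its parameters and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.special import digamma


def cavi_latent_class(X, n_classes, n_iter=100, alpha0=1.0, a0=1.0, b0=1.0, seed=0):
    """Illustrative coordinate ascent mean-field VI for a Bernoulli latent class model.

    X is an (n, d) array of 0/1 indicators with no missing values (an assumption).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random initial responsibilities q(z_i = k); each row sums to 1.
    r = rng.dirichlet(np.ones(n_classes), size=n)
    for _ in range(n_iter):
        # Update q(pi) = Dirichlet(alpha) and q(theta_kj) = Beta(a_kj, b_kj).
        alpha = alpha0 + r.sum(axis=0)
        a = a0 + r.T @ X
        b = b0 + r.T @ (1.0 - X)
        # Expected log parameters under the current variational factors.
        e_log_pi = digamma(alpha) - digamma(alpha.sum())
        e_log_theta = digamma(a) - digamma(a + b)
        e_log_1m_theta = digamma(b) - digamma(a + b)
        # Update q(z): unnormalised log responsibilities, then normalise per row.
        log_r = e_log_pi + X @ e_log_theta.T + (1.0 - X) @ e_log_1m_theta.T
        log_r -= log_r.max(axis=1, keepdims=True)  # subtract row max for stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
    return r, alpha, a, b


# Synthetic example (hypothetical data): two well-separated classes, six binary items.
data_rng = np.random.default_rng(1)
true_theta = np.array([[0.9] * 6, [0.1] * 6])
z = np.repeat([0, 1], 200)
X = (data_rng.random((400, 6)) < true_theta[z]).astype(float)

r, alpha, a, b = cavi_latent_class(X, n_classes=2)
post_mean_theta = a / (a + b)  # posterior mean item probabilities per class
labels = r.argmax(axis=1)      # most probable class for each record
```

Each pass updates one block of variational factors while holding the others fixed, which is what makes the objective "pre-specified": the updates are derived by hand for this particular model rather than obtained by automatic differentiation. Class labels are only identified up to permutation, so recovered classes may appear in a different order from the simulated ones.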

RESULTS: Coordinate ascent and stochastic mean-field VI with pre-specified objective functions performed best on both model predictive accuracy and computation. Both are practical alternatives to MCMC for logistic Bayesian models.

CONCLUSIONS: We find that automatic VI approaches require more effort and technical knowledge to set up for accurate posterior estimation and are very sensitive to algorithm hyperparameters. We find that several data characteristics common in clinical data, for example a very high proportion of zeroes, significantly affect the posterior accuracy of automatic VI methods compared with conditionally conjugate mean-field methods. We propose further investigation of automatic VI methods with a view to improving posterior accuracy and computational runtime from default settings.

Conference/Value in Health Info

2022-11, ISPOR Europe 2022, Vienna, Austria

Value in Health, Volume 25, Issue 12S (December 2022)

Code

RWD121

Topic

Methodological & Statistical Research, Real World Data & Information Systems

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Reproducibility & Replicability

Disease

SDC: Diabetes/Endocrine/Metabolic Disorders (including obesity), SDC: Pediatrics
