
Abstract

I consider the development of Markov chain Monte Carlo (MCMC) methods, from late-1980s Gibbs sampling to present-day gradient-based methods and piecewise-deterministic Markov processes. In parallel, I show how these ideas have been implemented in successive generations of statistical software for Bayesian inference. These software packages have been instrumental in popularizing applied Bayesian modeling across a wide variety of scientific domains. They provide an invaluable service to applied statisticians by hiding the complexities of MCMC from the user while offering a convenient modeling language and tools to summarize the output of a Bayesian model. As research into new MCMC methods remains very active, it is likely that future generations of software will incorporate new methods to improve the user experience.
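The complexity that these packages hide from the user can be illustrated with a minimal hand-rolled sampler. The sketch below (my own illustration, not code from any package discussed in this review) implements random-walk Metropolis for a standard normal target, working in log space as real software must to avoid numerical underflow:

```python
import math
import random

def rw_metropolis(log_post, init, step, n_iter, seed=0):
    """Random-walk Metropolis: propose x' = x + step * N(0, 1) and
    accept with probability min(1, post(x') / post(x))."""
    rng = random.Random(seed)
    x = init
    lp = log_post(x)
    samples = []
    for _ in range(n_iter):
        prop = x + step * rng.gauss(0.0, 1.0)
        lp_prop = log_post(prop)
        # Accept/reject in log space to avoid underflow of the density ratio.
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Target: standard normal, via its log density up to an additive constant.
draws = rw_metropolis(lambda x: -0.5 * x * x, init=0.0, step=1.0, n_iter=20000)
mean = sum(draws) / len(draws)
```

Everything a user of BUGS, JAGS, or Stan never sees is already present in this toy: choosing the proposal scale `step`, tuning it for a reasonable acceptance rate, deciding how long to run, and diagnosing convergence of the resulting chain.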

DOI: 10.1146/annurev-statistics-122121-040905
Published: 2023-03-09
  • Article Type: Review Article