
Abstract

Markov chain Monte Carlo is the engine of modern Bayesian statistics, used to approximate the posterior distribution and derived quantities of interest. Despite this, how the output of a Markov chain is postprocessed and reported is often overlooked. Convergence diagnostics can be used to control bias via burn-in removal, but these do not account for (common) situations where a limited computational budget engenders a bias-variance trade-off. The aim of this article is to review state-of-the-art techniques for postprocessing Markov chain output. Our review covers methods based on discrepancy minimization, which directly address the bias-variance trade-off, as well as general-purpose control variate methods for approximating expected quantities of interest.
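To make the control variate idea concrete, the following minimal Python sketch fits a first-order zero-variance control variate under strong simplifying assumptions: the target is a standard Gaussian, so its score is known in closed form, and i.i.d. draws stand in for MCMC output. None of the variable names or numbers below come from the article; this is an illustration of the general technique only.

```python
import numpy as np

# Minimal sketch of a first-order zero-variance control variate.
# Assumption: the target pi is N(0, 1), so its score d/dx log pi(x) = -x
# is available in closed form, and i.i.d. draws stand in for chain output.

rng = np.random.default_rng(0)
x = rng.normal(size=5_000)   # stand-in for (post-burn-in) MCMC samples
score = -x                   # score of the target pi = N(0, 1)
f = x + x**2                 # integrand of interest; E_pi[f] = 1

# The score has zero mean under pi, so theta * score is a valid control
# variate for any theta; least squares gives the variance-minimizing choice.
C = np.cov(f, score)
theta = C[0, 1] / C[1, 1]
f_cv = f - theta * score     # same expectation, smaller variance

print(f"vanilla estimate: {f.mean():.4f}")
print(f"CV estimate:      {f_cv.mean():.4f}")
print(f"variance ratio:   {f_cv.var() / f.var():.3f}")  # roughly 2/3 here
```

In realistic settings the score of the posterior is typically available as a by-product of gradient-based samplers such as Hamiltonian Monte Carlo, which is what makes score-based control variates broadly applicable as a postprocessing step.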

