Postprocessing of MCMC

Leah F. South; Marina Riabiz; Onur Teymur; Chris J. Oates

doi:10.1146/annurev-statistics-040220-091727

Annual Review of Statistics and Its Application

Volume 9, 2022

Review Article

Leah F. South¹, Marina Riabiz²,³, Onur Teymur³,⁴, and Chris J. Oates³,⁵

Affiliations:
¹School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland 4000, Australia; email: [email protected]
²Department of Biomedical Engineering, King's College London, London SE1 7EH, United Kingdom
³Alan Turing Institute, London NW1 2DB, United Kingdom
⁴School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury CT2 7FS, United Kingdom
⁵School of Mathematics, Statistics & Physics, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
Vol. 9:529–555 (Volume publication date March 2022) https://doi.org/10.1146/annurev-statistics-040220-091727
First published as a Review in Advance on November 29, 2021
Copyright © 2022 by Annual Reviews. All rights reserved

Abstract

Markov chain Monte Carlo is the engine of modern Bayesian statistics, being used to approximate the posterior and derived quantities of interest. Despite this, the issue of how the output from a Markov chain is postprocessed and reported is often overlooked. Convergence diagnostics can be used to control bias via burn-in removal, but these do not account for (common) situations where a limited computational budget engenders a bias-variance trade-off. The aim of this article is to review state-of-the-art techniques for postprocessing Markov chain output. Our review covers methods based on discrepancy minimization, which directly address the bias-variance trade-off, as well as general-purpose control variate methods for approximating expected quantities of interest.

Keywords: bias removal, control variates, Markov chain, Monte Carlo, Stein discrepancy, thinning, variance reduction


