
Abstract

Rich data-generating mechanisms are ubiquitous in this age of information and require complex statistical models to draw meaningful inference. While Bayesian analysis has seen enormous development in the last 30 years, benefiting from the impetus given by the successful application of Markov chain Monte Carlo (MCMC) sampling, the combination of big data and complex models conspires to produce significant challenges for traditional MCMC algorithms. We review modern algorithmic developments addressing the latter and compare their performance using numerical experiments.
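For readers new to MCMC, the following is a minimal, hypothetical sketch of a random-walk Metropolis sampler of the kind the abstract refers to; it is illustrative only and not code from the reviewed article. The standard-normal target, the Gaussian proposal scale, and the function names are assumptions made for the example.

```python
# Minimal random-walk Metropolis sketch (illustrative only; not from the article).
# Assumes a toy standard-normal target and a Gaussian proposal with step size `scale`.
import numpy as np

def log_target(x):
    # Log-density of the toy target (standard normal, up to an additive constant).
    return -0.5 * x**2

def random_walk_metropolis(n_iter=5000, scale=1.0, x0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    x, logp = x0, log_target(x0)
    for i in range(n_iter):
        # Propose a local move and accept it with the Metropolis ratio.
        y = x + scale * rng.normal()
        logq = log_target(y)
        if np.log(rng.uniform()) < logq - logp:
            x, logp = y, logq
        chain[i] = x
    return chain

samples = random_walk_metropolis()
print(samples.mean(), samples.std())  # roughly 0 and 1 for the toy target
```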

