Abstract

This article reviews recent progress in the high-dimensional bootstrap. We first review high-dimensional central limit theorems for distributions of sample mean vectors over rectangles, bootstrap consistency results in high dimensions, and the key techniques used to establish those results. We then review selected applications of the high-dimensional bootstrap: construction of simultaneous confidence sets for high-dimensional vector parameters, multiple hypothesis testing via step-down procedures, postselection inference, intersection bounds for partially identified parameters, and inference on best policies in policy evaluation. Finally, we comment on a couple of future research directions.

/content/journals/10.1146/annurev-statistics-040120-022239
2023-03-09
2024-06-21
  • Article Type: Review Article