
Abstract

We present a unifying view of various statistical estimation techniques, including penalization, variational, and thresholding methods. These estimators are analyzed in the context of statistical linear inverse problems, with nonparametric regression, change point regression, and high-dimensional linear models as examples. Our approach reveals many seemingly unrelated estimation schemes as special instances of a general class of variational multiscale estimators, called MIND (multiscale Nemirovskii–Dantzig). These estimators result from minimizing certain regularization functionals under convex constraints that can be seen as multiple statistical tests for local hypotheses. For computational purposes, we recast MIND in terms of simpler unconstrained optimization problems via Lagrangian penalization as well as Fenchel duality. The performance of several MINDs is demonstrated on numerical examples.
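The link between constrained and penalized formulations described above can be illustrated in its simplest instance: for an identity design, the Dantzig-type constrained problem (minimize the ℓ1 norm subject to a sup-norm bound on the residuals) and the Lagrangian lasso-type penalized problem share the same closed-form solution, soft thresholding. The sketch below (not from the article; function names and the dyadic-interval multiscale statistic are illustrative assumptions) demonstrates this equivalence and a toy version of the multiscale test statistic underlying the convex constraints:

```python
import numpy as np

def soft_threshold(y, q):
    # Closed-form solution of the penalized (lasso-type) problem with
    # identity design: argmin_theta 0.5*||y - theta||^2 + q*||theta||_1.
    return np.sign(y) * np.maximum(np.abs(y) - q, 0.0)

def constrained_min_l1(y, q):
    # Solution of the constrained (Dantzig-type) problem with identity design:
    # argmin_theta ||theta||_1  subject to  ||y - theta||_inf <= q.
    # The constraint decouples per coordinate: each |theta_i| is minimized by
    # shrinking y_i toward zero by at most q, which is again soft thresholding.
    return np.sign(y) * np.maximum(np.abs(y) - q, 0.0)

def multiscale_stat(r):
    # Toy multiscale statistic on residuals r: the maximum over a dyadic
    # system of disjoint intervals I of |sum_{i in I} r_i| / sqrt(|I|).
    # A MIND-type constraint bounds this statistic by a quantile, acting as
    # simultaneous statistical tests for local hypotheses.
    n = len(r)
    s = np.concatenate([[0.0], np.cumsum(r)])  # prefix sums
    best, length = 0.0, 1
    while length <= n:
        for start in range(0, n - length + 1, length):
            best = max(best, abs(s[start + length] - s[start]) / np.sqrt(length))
        length *= 2
    return best
```

For general (non-identity) operators the two formulations no longer coincide coordinate-wise, which is where the Lagrangian and Fenchel-duality reformulations discussed in the abstract become essential.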


  • Article Type: Review Article